SEMINAR

Dry run of talks for Nodalida 2013 in Oslo

Yvonne Adesam & Gerlof Bouma: Experiments on sentence segmentation in Old Swedish editions

Malin Ahlberg & Peter Andersson: Towards automatic tracking of lexical change: linking historical lexical resources

Date: 2013-05-16 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

The ICALL* platform Lärka is an open-source web-based application that uses principles of Service-Oriented Architecture and reuses Korp and Karp for exercise generation.

The platform is aimed primarily at learners of Swedish as a second/foreign language. It is divided into several modules: an exercise generator with activities for university students of linguistics and for second/foreign language learners, including multiple-choice and spelling exercises; and modules facilitating different aspects of development and research. The latter currently consist of an experimental sentence readability module for level-wise selection of appropriate dictionary examples and exercise items, and an editor for learner-oriented corpora.

The platform is under active development, and in this talk we will describe its current state, including the exercises and the principles behind their generation, as well as two projects relevant to Lärka's development: the collection of a corpus of course book texts used in CEFR**-based language teaching, and the sentence readability project.
We expect our talk to be of interest to computational linguists, language teachers, lexicographers and linguists in general.

*ICALL = Intelligent Computer-Assisted Language Learning
**CEFR = Common European Framework of Reference for Languages

Date: 2013-05-02 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Readability formulas are methods for matching texts to a reader's reading level. Several methodological paradigms have been investigated in the field. The most popular paradigm dates back several decades and gave rise to well-known readability formulas such as the Flesch formula (among several others).

In this talk, I will present the results of a study carried out in collaboration with Thomas François from UCLouvain. We compare traditional readability formula approaches (henceforth "classic") with an emerging paradigm that uses sophisticated NLP-enabled features and machine learning techniques.

Our experiments, carried out on a corpus of texts for French as a foreign language, yield four main results: (1) the new readability formula performed better than the "classic" formula; (2) "non-classic" features were slightly more informative than "classic" features; (3) modern machine learning algorithms did not improve the explanatory power of our readability model, but did allow new observations to be classified better; and (4) combining "classic" and "non-classic" features resulted in a significant gain in performance.
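For readers unfamiliar with the "classic" paradigm, the well-established Flesch Reading Ease formula combines average sentence length and average syllables per word. A minimal sketch (the counts are supplied by hand here, since tokenisation and syllable counting are separate problems, and the example numbers are invented):

```python
def flesch_reading_ease(n_sentences, n_words, n_syllables):
    """Flesch Reading Ease: higher scores indicate easier text.

    The formula penalises long sentences (words per sentence) and
    long words (syllables per word).
    """
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))

# A hypothetical text of 2 sentences, 20 words and 28 syllables:
score = flesch_reading_ease(n_sentences=2, n_words=20, n_syllables=28)
# score is about 78, i.e. fairly easy text on the Flesch scale
```

Machine-learning approaches replace this fixed linear combination with learned weights over many more features, which is exactly the contrast the study investigates.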

https://sites.google.com/site/elenimi2/home

Date: 2013-04-25 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Four short talks in preparation for the FrameNet workshop in Berkeley.
 

  • Richard Johansson: Automatic Lexicon Expansion in the Swedish FrameNet
  • Dimitrios Kokkinakis: Medical Event Extraction using the Swedish FrameNet
  • Leif-Jöran Olsson: The lexical editing system of Karp
  • Rudolf Rydstedt: Globally defined semantic roles in the Swedish Constructicon

Date: 2013-04-11 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Rule-based machine translation systems need explicit linguistic resources, which are usually coded by human experts in a highly time-consuming task. When experts are not available for a given language pair, these resources can be acquired automatically or semi-automatically.

In this talk, a method to build monolingual dictionaries from the contributions of non-expert users is presented. A strategy to learn shallow-transfer rules from small parallel corpora is also described. Finally, the integration of shallow-transfer rules in statistical machine translation is addressed.
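As an illustration of the general idea of a shallow-transfer rule, which operates on short sequences of tagged words rather than full parse trees, here is a minimal sketch; the rule, tag set and example are invented for this sketch and are not taken from the talk:

```python
def apply_adj_noun_rule(tagged):
    """Reorder ADJ NOUN chunks to NOUN ADJ, as a shallow-transfer
    step might do when translating from English into a language
    with postposed adjectives (e.g. Spanish)."""
    out, i = [], 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ"
                and tagged[i + 1][1] == "NOUN"):
            # swap the adjective and the noun
            out.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            out.append(tagged[i])
            i += 1
    return out

sentence = [("red", "ADJ"), ("car", "NOUN"), ("runs", "VERB")]
result = apply_adj_noun_rule(sentence)
# result: [("car", "NOUN"), ("red", "ADJ"), ("runs", "VERB")]
```

Learning such rules from a small parallel corpus, as the talk describes, amounts to inferring which local reorderings and tag changes best explain the aligned sentence pairs.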

Victor Sánchez-Cartagena (Alacant)

Date: 2013-04-04 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

A wordnet is a kind of lexical semantic network that describes word meanings in terms of the lexical semantic relations into which they enter. This is a limited kind of description, but it has made it possible to build large wordnets (e.g. the seminal Princeton WordNet, or plWordNet), which are useful in various Natural Language Processing applications. A wordnet must be large enough to provide practical support for such applications, yet its construction process is laborious, which is a serious limitation. However, the indispensable manual work can be effectively supported by language knowledge extracted from large corpora. One example is pairs of words linked by various lexico-semantic relations, which are processed by the WordnetWeaver system to produce suggestions for wordnet expansion.

Pattern-based methods exploit occurrences of word pairs in search of lexico-syntactic constructions that can be markers of particular lexico-semantic relations. Distributional semantics methods analyse statistically significant similarities among different word uses in order to identify words that are semantically related. The results produced by the two types of methods are complementary to some extent: pattern-based methods extract pairs of words that appear to stand in particular lexico-semantic relations, while distributional semantics produces measures of semantic relatedness between words. Advantages, disadvantages and limitations of both paradigms will be discussed on the basis of rich practical experience in their utilisation.
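A well-known instance of the pattern-based approach is Hearst-style lexico-syntactic patterns such as "X such as Y", which signal hypernymy. A minimal sketch, using a single pattern and an invented example sentence (a real system would use many patterns over parsed or tagged text):

```python
import re

# Hearst-style pattern: "<hypernym> such as <hyponym>"
PATTERN = re.compile(r"(\w+) such as (\w+)")

def extract_pairs(text):
    """Return (hyponym, hypernym) candidate pairs found by the pattern."""
    return [(hypo, hyper) for hyper, hypo in PATTERN.findall(text)]

pairs = extract_pairs("They keep animals such as goats and ducks.")
# pairs: [("goats", "animals")] -- a candidate hyponymy link
```

Candidate pairs harvested this way are exactly the kind of noisy relation evidence that a system like WordnetWeaver can rank and present to linguists as expansion suggestions.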

The WordnetWeaver system can utilise the results of a number of different extraction methods and suggest the likely location of a new word in the wordnet. Each suggestion defines a potential sense of a new word. The suggestions are presented visually on the relation network graph. Linguists can browse the suggestions, and modify and freely edit the wordnet structure.

The complete process of data processing and relation extraction will be discussed from the perspective of our experience of wordnet building. A corpus-based lexicographic process supported by the WordnetWeaver system will be presented. The possibilities and limitations of semi-automated wordnet expansion will be discussed on the basis of examples collected during the expansion of plWordNet.

The work was co-funded by the European Union Innovative Economy Programme (Project POIG.01.01.02-14-013/09) and the Polish Ministry of Science and Higher Education (Project N N516 068637).

Maciej Piasecki (Wroclaw)

www.nlp.pwr.wroc.pl

www.plwordnet.pwr.wroc.pl

Date: 2013-03-21 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

In this talk, I will present two versions of a system for computing Gricean implicatures based on the Question Under Discussion, first in classical possible worlds semantics (sufficient for most purposes) and then in inquisitive semantics (for fancier tricks).

With the classical version, we can compute that, for example, "She speaks English" implies "She does not speak Swedish" in a context where the question is what languages she speaks and Swedish is one of the relevant languages (a 'scalar' or 'exhaustivity' implicature). The system is based on Stalnaker's (1978) notion of common ground and Groenendijk and Stokhof's (1984) semantics and pragmatics of questions.

In the second part of the talk, I will combine these ideas with inquisitive semantics (Groenendijk & Roelofsen 2009), which makes it possible to distinguish between "She speaks English", which can imply that she does not speak Swedish, and "She speaks at least English", which does not.
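The exhaustivity implicature in the "She speaks English" example can be sketched in the classical possible-worlds setting: worlds record which of the contextually relevant languages she speaks, and one simple way to exhaustify an answer is to keep only the minimal worlds that verify it. This toy minimal-worlds formulation is a simplification of the cited accounts, offered only to make the idea concrete:

```python
from itertools import combinations

# The relevant languages fixed by the Question Under Discussion.
LANGS = ("English", "Swedish")

def worlds():
    """Each world records the set of relevant languages she speaks."""
    return [frozenset(c) for r in range(len(LANGS) + 1)
            for c in combinations(LANGS, r)]

def answer(lang):
    """Classical meaning of 'She speaks <lang>': the worlds where she does."""
    return {w for w in worlds() if lang in w}

def exhaustify(prop):
    """Keep only the minimal worlds verifying the answer, i.e. those
    with no extra relevant languages (proper-subset comparison)."""
    return {w for w in prop if not any(v < w for v in prop)}

exh = exhaustify(answer("English"))
# exh contains only the world where she speaks English and not Swedish,
# capturing the implicature "She does not speak Swedish"
```

The inquisitive-semantics refinement discussed in the talk would then let "at least English" denote something that resists this strengthening, which the classical version cannot distinguish.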

Date: 2013-03-14 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Ontologies have become a popular method for modelling knowledge about specific domains such as biomedicine, as well as general domains, as in the DBpedia ontology. However, current ontology models contain little information about how their labels are used in natural language, and they are difficult to apply in a multilingual setting. To fill this gap, we propose a model called lemon (Lexicon Model for Ontologies), which enables existing ontologies to be extended with an independent lexical layer.

lemon allows ontologies to be connected with lexical, syntactic, terminological and morphological descriptions of words and terms, and in particular it describes how lexical entries are mapped to ontological predicates. As lemon is based on the linked-data model RDF, it is also ideally suited to representing machine-readable dictionaries on the Web and linking them with existing semantic and lexical resources. I will describe the use of lemon for leveraging existing resources as part of the Lexical Linked Open Data Cloud and the challenges this presents, as well as recent developments in the model that have occurred under the W3C Community Group on Ontology Lexica.
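Since lemon entries are RDF, the shape of a lexical entry and its mapping to an ontological predicate can be sketched as plain subject-predicate-object triples. The URIs below are abbreviated CURIE-style strings and the entry itself is invented for illustration; the property names follow the lemon core vocabulary:

```python
# Triples for one lemon lexical entry, as (subject, predicate, object).
triples = [
    (":cat", "rdf:type", "lemon:LexicalEntry"),
    (":cat", "lemon:canonicalForm", ":cat_form"),
    (":cat_form", "lemon:writtenRep", '"cat"@en'),
    (":cat", "lemon:sense", ":cat_sense"),
    # the sense links the word to an ontology concept, e.g. in DBpedia
    (":cat_sense", "lemon:reference", "dbpedia:Cat"),
]

def objects(subject, predicate):
    """Look up all objects for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

ref = objects(":cat_sense", "lemon:reference")
# ref: ["dbpedia:Cat"] -- the ontological anchor of this sense
```

Keeping the lexical layer as ordinary RDF like this is what lets lemon lexica be published and interlinked on the Web alongside the ontologies they describe.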

Finally, I will briefly describe applications of lemon in NLP, in particular in a question answering system developed in Bielefeld that uses lemon alongside the Grammatical Framework (GF) to answer questions by means of queries over the linked data cloud.

More about John McCrae (post-doctoral researcher at the University of Bielefeld)

More about lemon

Date: 2013-02-28 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

The question of whether it is possible to characterise grammatical knowledge in probabilistic terms is central to determining the relationship of linguistic representation to other cognitive domains. We present a statistical model of grammaticality which maps the probabilities of a statistical model for sentences in parts of the British National Corpus (BNC) into grammaticality scores, using various functions of the parameters of the model. We test this approach with a classifier on test sets containing different levels of syntactic infelicity. With appropriate tuning, the classifiers achieve encouraging levels of accuracy. These experiments suggest that it may be possible to characterise grammaticality judgements in probabilistic terms using an enriched language model.
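One of the simplest functions for mapping language-model probabilities into grammaticality scores is a length-normalised mean log probability with a threshold classifier. The toy unigram model and threshold below are invented for illustration and are far cruder than the enriched model the talk describes:

```python
import math

# Toy unigram probabilities (invented for illustration).
UNIGRAM = {"the": 0.05, "dog": 0.01, "barks": 0.005, "barked": 0.004}
UNSEEN = 1e-6  # floor probability for out-of-vocabulary words

def mean_logprob(tokens):
    """Length-normalised log probability: dividing by sentence length
    stops long sentences from being penalised merely for being long."""
    return sum(math.log(UNIGRAM.get(t, UNSEEN)) for t in tokens) / len(tokens)

def classify(tokens, threshold=-10.0):
    """Threshold the score to get a binary grammaticality judgement."""
    return mean_logprob(tokens) >= threshold

ok = classify(["the", "dog", "barks"])
```

The point of the experiments is that richer functions of the model's parameters, not just this mean, are needed before such scores track human grammaticality judgements well.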

Shalom Lappin, King's College London

(Joint work with Alexander Clark and Gianluca Giorgolo)

Date: 2013-02-21 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

The LekBot project, part 1, was a collaboration between DART, Talkamatic and GU in 2010-2011. The project developed a talking and playing robot for children with communicative disabilities, with the aim of providing a toy that is easy and fun to use and that offers opportunities for genuine play: play that is spontaneous, independent, on equal terms, etc. Three test groups participated in the project, each consisting of a child with cerebral palsy, a peer and pre-school staff. All groups were recorded interacting with various versions of the system.

The LekBot project, part 2, started in 2012 and is a collaboration between GU and DART. The focus is now on the analysis of the recorded interactions. In the talk we will present current results of the analysis and discuss implications for further development of the LekBot system.

Date: 2013-02-14 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8
