seminar

SEMINAR

Writer-based and reader-based views of text-meaning are reflected by the respective questions "What is the author trying to tell me?" and "What does this text mean to me personally?" Contemporary computational linguistics, however, generally takes neither view; applications do not attempt to answer either question. Instead, a text is regarded as an object that is independent of, or detached from, its author or provenance, and as an object that has the same meaning for all readers. This is not adequate, however, for the further development of sophisticated NLP applications for intelligence gathering and question answering.

I will discuss different views of text-meaning from the perspective of the needs of computational text analysis, and then extend the analysis to include discourse as well – in particular, the collaborative construction of meaning and the collaborative repair of misunderstanding.

Graeme Hirst (Toronto)

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Karin Cavallin (PhD student in GSLT) will present her forthcoming PhD thesis:

Detecting Lexical change via Semantic Distribution - Investigating meaning change in and through Lexical Sets

Opponent: Richard Johansson, Språkbanken

Location: T340, Olof Wijksgatan 6

SEMINAR

Research in spoken language technology (SLT) has made tremendous strides during the last 50 years. From offline isolated-word recognition in the late 1960s and early 1970s to today's real-time multimodal (and multipurpose) spoken dialogue systems, the initially limited scientific area has expanded, and its research outcomes have become broadly available. As a result, new fields of application have been developed, with an increasing demand for high-quality products. Still, the available technology has not yet reached the performance of human communication.

A recent trend includes biologically-inspired hypotheses and perceptually-relevant assumptions in order to find a path for a new breakthrough in human-machine interaction by incorporating different scientific areas in a joint, multidisciplinary research effort. In this talk, we examine the importance of human perception in SLT and give examples of relevant applications.

Web page at KTH

Location: T346, Olof Wijksgatan 6

SEMINAR

Wordnets are mostly constructed either by applying the transfer method to Princeton WordNet or by extracting knowledge from monolingual dictionaries. Neither method could be applied in the construction of plWordNet: there were no publicly available bilingual Polish-English dictionaries or monolingual Polish lexical resources. Moreover, we wanted plWordNet to be a faithful description of the Polish lexical system.

Thus, from the very beginning, the plWordNet development process was based on the exploration of a huge Polish corpus. Language tools were employed at every possible step of plWordNet development, from data gathering through data analysis to data presentation. A set of language tools for advanced corpus browsing and for the extraction of lexical semantic knowledge was developed and applied. The extracted knowledge is the input to the WordnetWeaver system, which suggests nodes in the wordnet structure as potential attachment places for new synsets. The suggestions are presented visually on the relation network graph. Linguists can browse the suggestions and modify and edit the wordnet structure. Automatically discovered senses are also described by automatically identified usage examples.

During the seminar, we will discuss the complete plWordNet development cycle: corpus gathering and preprocessing, lemma and lexico-semantic relation extraction, visual wordnet editing supported by the extracted knowledge, and coordination supported by a system for monitoring the work of a team of linguists. Expansion of derivationally motivated lexico-semantic relations is facilitated by tools for example-based relation learning and corpus-based discovery of new relation instances. Next, we will present the process of mapping plWordNet to Princeton WordNet and the tools that facilitate it.

In its most recent version, the corpus comprises 1.8 billion words. The complete process of data processing and relation extraction will be discussed from the perspective of our experience of wordnet building. A corpus-based lexicographic process supported by the WordnetWeaver system will be presented. Possibilities and limitations of semi-automated wordnet expansion will be discussed on the basis of examples collected during plWordNet expansion.

The work was co-funded by the European Union Innovative Economy Programme (Project POIG.01.01.02-14-013/09) and the Polish Ministry of Science and Higher Education (Project N N516 068637).

www.nlp.pwr.wroc.pl

www.plwordnet.pwr.wroc.pl

Location: L307, Lennart Torstenssonsgatan 8

SEMINAR

Wordnets are built of synsets, not of words. A synset consists of words. Synonymy is a relation between words. Words go into a synset because they are synonyms. Later, a wordnet treats words as synonymous because they belong in the same synset... Such circularity, which is a well-known problem, poses a practical difficulty in wordnet construction, notably when it comes to maintaining consistency.

plWordNet – a very large Polish wordnet – is a net of lexical units. We will discuss our assumptions and present their implementation in a steadily growing Polish wordnet. A small set of constitutive relations allows us to construct synsets automatically out of groups of lexical units of the same connectivity.
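One way to read "groups of lexical units of the same connectivity" is sketched below in Python. It is only an illustration under stated assumptions, not the plWordNet procedure: the function name `build_synsets`, the relation names, and the toy English data are all hypothetical, and "same connectivity" is taken to mean "identical set of outgoing constitutive relations".

```python
from collections import defaultdict

# Hypothetical toy data: (lexical unit, constitutive relation, target unit).
# Neither the relation names nor the units come from plWordNet.
TOY_RELATIONS = [
    ("car", "hypernym", "vehicle"),
    ("car", "meronym", "engine"),
    ("automobile", "hypernym", "vehicle"),
    ("automobile", "meronym", "engine"),
    ("bicycle", "hypernym", "vehicle"),
    ("bicycle", "meronym", "pedal"),
]


def build_synsets(relations):
    """Group lexical units whose outgoing relation sets are identical.

    Assumption: two units share a synset exactly when they have the same
    non-empty set of (relation, target) links. Units with no outgoing
    links are kept as singletons rather than merged with each other.
    """
    signature = defaultdict(set)
    units = set()
    for source, relation, target in relations:
        units.update((source, target))
        signature[source].add((relation, target))

    groups = defaultdict(list)
    for unit in sorted(units):
        links = frozenset(signature.get(unit, ()))
        # An empty signature carries no evidence of synonymy, so give
        # each such unit a key of its own.
        key = links if links else ("isolated", unit)
        groups[key].append(unit)
    return sorted(groups.values())


if __name__ == "__main__":
    for synset in build_synsets(TOY_RELATIONS):
        print(synset)
```

In this toy data, "car" and "automobile" fall into one group because their links are identical, while "bicycle" stays apart because its meronym link differs. Grouping by relations rather than by a prior synonymy judgement is what sidesteps the circularity described above.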

The plWordNet system of relations will be presented and compared to the systems of relations in several influential wordnets. Additional synset-forming mechanisms, such as stylistic registers and verb aspect, will also be discussed. The rich morphology of Polish underlies the important role played by lexico-semantic relations that are derivationally motivated.

The work was co-funded by the European Union Innovative Economy Programme (Project POIG.01.01.02-14-013/09) and the Polish Ministry of Science and Higher Education (Project N N516 068637).

www.nlp.pwr.wroc.pl

www.plwordnet.pwr.wroc.pl

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

The LekBot project, part 1, was a collaboration between DART, Talkamatic and GU in 2010-2011. The project developed a talking and playing robot for children with communicative disabilities, with the aim of providing a toy that is easy and fun to use, and that provides opportunities for genuine play in the sense of play that is spontaneous, independent, on equal terms, etc. Three test groups participated in the project, with each test group consisting of a child with cerebral palsy, a peer and pre-school staff. All groups were recorded in interactions with various versions of the system.

The LekBot project, part 2, started in 2012, and is a collaboration between GU and DART. The focus now is on the analysis of the recorded interactions. In the talk we will present current results of the analysis, and discuss implications for further development of the LekBot system.

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

In this talk I deal with the automated acquisition of linguistic knowledge as a means of enhancing the robustness of lexicalised grammars for real-life applications.

I focus on Multiword Expressions (henceforward MWEs). Specifically, in the first part of the talk I take a closer look at the linguistic properties of MWEs, in particular their lexical, syntactic, and semantic characteristics.

With the observations about the linguistic properties of MWEs at hand, I turn in the second part of the talk to methods for the automated acquisition of these properties for robust grammar engineering and parsing. To this effect, I first investigate the hypothesis that MWEs can be detected by the distinct statistical properties of their component words, regardless of their type, comparing various statistical measures, a procedure which leads to extremely interesting conclusions. I then investigate the influence of the size and quality of different corpora, using the BNC and the Web search engines Google and Yahoo. I conclude that, in terms of language usage, web-generated corpora are fairly similar to more carefully built corpora such as the BNC, indicating that the lack of control and balance in these corpora is probably compensated for by their size.
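As a rough illustration of what a statistical association measure over component words looks like, the Python sketch below scores adjacent word pairs with pointwise mutual information (PMI). PMI is chosen here only as a familiar example; the abstract does not say which measures the talk compares, and the function name `pmi_scores` and the sample sentence are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (in bits) for every adjacent word pair in `tokens`.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated by relative frequency in this one token sequence.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (first, second), count in bigrams.items():
        p_pair = count / n_bigrams
        p_first = unigrams[first] / n_unigrams
        p_second = unigrams[second] / n_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


if __name__ == "__main__":
    # Hypothetical toy text; a real study would use a large corpus.
    text = "by and large the plan worked and by and large the team agreed"
    scores = pmi_scores(text.split())
    for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(pair, round(score, 2))
```

On a text this small the scores are dominated by pairs that occur once, which is a known weakness of PMI with low counts and one reason why comparing several measures, and corpora of different sizes, matters.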

Then, I show a qualitative evaluation of the results of automatically adding extracted MWEs to existing linguistic resources. To this effect, I first discuss two main approaches commonly employed in NLP for treating MWEs: the words-with-spaces approach, which models an MWE as a single lexical entry and can adequately capture fixed MWEs like "by and large", and compositional approaches, which treat MWEs by general and compositional methods of linguistic analysis. On this basis, I argue that the automatic addition of extracted MWEs to existing linguistic resources improves qualitatively if a more compositional approach to automated grammar and lexicon extension is adopted.

Finally, I propose that the methods developed for the acquisition of linguistic knowledge in the case of English MWEs can be tuned to enhance the robustness of parsing with lexicalised grammars for languages with richer morphology and freer word order, as is the case for German.

Valia Kordoni is at Humboldt University, Berlin and at Saarland University

Location: TBA

SEMINAR

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Jyrki Nummenmaa (Tampere)

Location: L308, Lennart Torstenssonsgatan 8
