SEMINAR

In natural language processing, finite-state string transducer methods have proved useful for solving a range of practical problems, from language identification via morphological processing and generation to part-of-speech tagging and named-entity recognition, as long as the problems lend themselves to a formulation based on matching and transforming local context. In his talk, Krister Lindén will focus on probabilistic parallel replace rules and their applications using HFST (Helsinki Finite-State Technology).
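In HFST, such rules are written as Xerox-style replace expressions (for example, a -> b || L _ R) and compiled into weighted finite-state transducers. The following is only a toy, pure-Python sketch of the idea behind probabilistic parallel replace rules: several weighted rewrite rules compete at every position of the input, and the most probable rewrite wins. The rules, probabilities and the greedy left-to-right strategy are invented for illustration (contextual conditions are omitted for brevity) and do not reflect the HFST API.

    # Toy sketch of probabilistic parallel replace rules (not the HFST API):
    # all rules compete at every position; the lowest-weight (most probable)
    # rewrite is chosen. Rules and probabilities are invented for illustration.
    import math

    # Each rule: (target, replacement, probability).
    RULES = [
        ("colour", "color", 0.9),   # spelling normalization
        ("ise",    "ize",   0.6),   # competing variants of the same target
        ("ise",    "ise",   0.4),
    ]

    def best_rewrite(text):
        """Greedy left-to-right application; a real FST setup composes
        weighted transducers and finds the globally best path instead."""
        out, i, weight = [], 0, 0.0
        while i < len(text):
            candidates = [(-math.log(p), rep, len(tgt))
                          for tgt, rep, p in RULES if text.startswith(tgt, i)]
            if candidates:
                w, rep, step = min(candidates)   # most probable rewrite here
                out.append(rep)
                weight += w
                i += step
            else:
                out.append(text[i])
                i += 1
        return "".join(out), round(weight, 3)

    print(best_rewrite("colourise"))   # ('colorize', <total weight>)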

Date: 2014-04-24 10:30 - 11:30

Location: K333, Lennart Torstenssonsgatan 6

SEMINAR

Driver distraction is a common cause of accidents, and it often arises when the driver interacts with technologies such as mobile phones, media players or navigation systems. In SIMSI, we have taken steps towards developing a system that enables safe interaction with technologies in vehicles, by reducing the cognitive load imposed by the interaction and minimizing head-down time.

The primary goal of the project has been to carry out research focusing on reducing driver distraction using integrated multimodality and dialogue strategies for cognitive load management. Based on the research, the project has developed interaction strategies for minimizing distraction, and empirically investigated different interaction strategies from a safety perspective. To reach this goal, a number of activities have been carried out:

  • A technical setup was constructed to enable simulator tests and demonstrations of the system.

  • A couple of applications to interact with were designed and implemented, to enable testing of the generic interaction strategies developed in the project.

  • Interaction strategies for reducing visual and cognitive distraction were designed and implemented. The interaction strategies were divided into (1) multimodal solutions to reduce head-down time (visual distraction), and (2) solutions for reducing the cognitive load of the driver (see the sketch after this list).

  • Tests of the implemented applications and strategies were carried out, allowing results to feed back into the development cycle.
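Below is a minimal, hypothetical sketch of the second kind of strategy: spoken output is held back when an estimated driving workload is high and resumed with a short recap once it drops. The workload signal, threshold and class names are assumptions for illustration, not the SIMSI implementation.

    # Hypothetical sketch of a cognitive-load strategy: hold back spoken output
    # when the estimated driving workload is high, resume with a short recap.
    # Workload signal, threshold and names are invented; not the SIMSI system.
    from dataclasses import dataclass, field
    from typing import List, Optional

    WORKLOAD_THRESHOLD = 0.7   # assumed normalized workload score in [0, 1]

    @dataclass
    class SpeechManager:
        queue: List[str] = field(default_factory=list)   # pending utterances
        paused: bool = False

        def say(self, utterance: str) -> None:
            self.queue.append(utterance)

        def update(self, workload: float) -> Optional[str]:
            """Call periodically with the current workload estimate."""
            if workload > WORKLOAD_THRESHOLD:
                self.paused = True
                return None                        # stay silent for now
            if self.paused and self.queue:
                self.paused = False
                return "As I was saying: " + self.queue.pop(0)   # resume + recap
            return self.queue.pop(0) if self.queue else None

    mgr = SpeechManager()
    mgr.say("Turn left after the next traffic light.")
    print(mgr.update(workload=0.9))   # None: the utterance is held back
    print(mgr.update(workload=0.3))   # resumed with a recap prefix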

We will present some initial results at the seminar. Although test results are still being analyzed, we believe that the SIMSI system will be shown to reduce distraction, cognitive load and head-down time considerably when compared to other state-of-the-art in-vehicle interaction models.

Date: 2014-04-10 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Machine translation (MT) can be divided into quality-oriented and coverage-oriented approaches (also known as dissemination and assimilation, respectively). The current mainstream is coverage-oriented: most people use MT to get an idea of what a document is about, but do not rely on it when they want to publish their own documents. Coverage-oriented systems must be able to translate everything, whereas quality-oriented systems usually have to sacrifice coverage and specialize in some domain.

Most available coverage-oriented systems are statistical (Google Translate, Bing), but there are also rule-based systems available (Systran, Apertium). In MT research, the main line of work is hybrid systems that combine statistics with linguistic knowledge. In this talk, we will present a hybrid MT approach based on GF (Grammatical Framework).

Most of the previous work in GF has focused on small, quality-oriented systems working on controlled languages; the main asset has been the scalability to high numbers of parallel languages. But recent developments in GF runtime algorithms and language resources have made it possible to address the coverage-oriented task of "translating everything". This of course comes with some loss of quality, but the great advantage of GF (and some other knowledge-based systems) is that we can make a clear distinction between levels of confidence. We have exploited this distinction in translation programs by marking translations as green (reliable), yellow (grammatically correct but unreliable), and red (unreliable but "still better than nothing"). There is also a clear recipe for improving quality by increasing the size of the "green" area.
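As a sketch of the confidence marking described above, the snippet below maps translation outcomes to the three levels. The concrete criteria used here (which grammar managed to produce the output) are an assumed reading of the paragraph for illustration, not the exact logic of the GF system.

    # Sketch of the green/yellow/red confidence marking described above.
    # The criteria (which grammar produced the translation) are an assumption
    # for illustration, not the exact logic of the GF translation system.
    from enum import Enum

    class Confidence(Enum):
        GREEN = "reliable"                     # e.g. curated domain grammar
        YELLOW = "grammatical but unreliable"  # e.g. wide-coverage grammar
        RED = "still better than nothing"      # e.g. chunk-by-chunk fallback

    def mark(domain_parse_ok: bool, wide_coverage_parse_ok: bool) -> Confidence:
        if domain_parse_ok:
            return Confidence.GREEN
        if wide_coverage_parse_ok:
            return Confidence.YELLOW
        return Confidence.RED

    # A front end could then colour each output segment accordingly:
    for segment, conf in [("the patient has a fever", mark(True, True)),
                          ("dosage twice daily if needed", mark(False, True)),
                          ("word-by-word leftovers", mark(False, False))]:
        print(f"[{conf.name}] {segment}  ({conf.value})")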

The talk will explain how grammars of different levels are created and combined, how statistics is used in the translation process and for bootstrapping grammars, and how the resulting system performs in comparative evaluation. The current system is available in ten languages and will soon be released both as a web service and as a mobile Android app.

Aarne Ranta, Krasimir Angelov, Inari Listenmaa, Prasanth Kolachina, Ramona Enache, Thomas Hallgren

Date: 2014-04-03 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Domain knowledge is supposed to help guide automated natural language understanding (NLU), but how exactly? In this talk we present work on using domain models to guide text interpretation, in the context of a project that aims to interpret English questions as a sequence of queries to be answered from structured databases.

We discuss how a broad-coverage and ambiguity-enabled natural language processing (NLP) system is adapted to produce domain-specific logical forms, using knowledge of the domain to zero in on the appropriate interpretation. A theorem prover then attempts to prove the logical forms by reasoning over an axiomatic domain theory, a higher-level abstraction of the contents of a set of related databases; it identifies the groundings and retrieves the values through procedural attachments semantically linked to the databases. The linguistic analysis component is thus responsible for providing a high-level, implementation-independent description of the domain-specific information in the NL question. The reasoning and retrieval component then aligns the language with the database structure and pulls the desired information out of the databases.

Knowledge of the domain can be used to construct the intended interpretation by providing a basis for disambiguation and relation specification. But an external domain model is not directly a model of interpretation for the language of the questions. Getting at such an interpretation requires an intermediate layer of semantic analysis that constructs a network of implicit semantic links based on the syntactically available links between expressions. We describe how these intermediate structures help both to disambiguate the text and to convert it into appropriate logical forms that can be instantiated by values constructed from fields in the relevant databases.
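The toy pipeline below illustrates the division of labour described above: a (stubbed) linguistic analysis step produces an implementation-independent logical form, and a reasoning and retrieval step grounds its predicate in a database through a procedural attachment. The question, predicate names and table are invented for illustration and do not come from the project.

    # Toy illustration of the division of labour described above. The question,
    # predicate names and database contents are invented for illustration.
    EMPLOYEES = [
        {"name": "Ada",   "dept": "research", "salary": 60000},
        {"name": "Grace", "dept": "research", "salary": 72000},
        {"name": "Alan",  "dept": "sales",    "salary": 55000},
    ]

    def attachment_works_in(dept):
        """Procedural attachment grounding the predicate works_in(x, dept)."""
        return [row["name"] for row in EMPLOYEES if row["dept"] == dept]

    ATTACHMENTS = {"works_in": attachment_works_in}

    def analyse(question):
        """Stubbed linguistic analysis: maps a question to a logical form.
        A real system uses broad-coverage parsing plus domain-guided
        disambiguation; here one reading is hard-coded."""
        assert question == "Who works in research?"
        return ("lambda x", ("works_in", "x", "research"))

    def answer(logical_form):
        """Reasoning/retrieval: aligns the logical form with the database."""
        _, (pred, _var, value) = logical_form
        return ATTACHMENTS[pred](value)

    print(answer(analyse("Who works in research?")))   # ['Ada', 'Grace']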

Date: 2014-03-20 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

One challenge for dialogue modelling, as for other areas of artificial intelligence, is the fact that human reasoning is non-monotonic. Among other things, this reflects the fact that we usually do not have access to all the information regarding an issue, or that we have access to information and principles of reasoning which are in fact incompatible.

We suggest that, rather than default rules of logic, we use rhetorical rules of thumb (topoi) to underpin our non-logical arguments, which in rhetoric are referred to as enthymemes. Enthymemes and topoi are defeasible, and the set of topoi accessed by one individual may be inconsistent. A rhetorical perspective highlights the importance of individual agents’ points of view and goals in interaction, and in order to account for dialogue participants’ individual takes on the interaction we model their information states during the course of a reasoning dialogue in Type Theory with Records.
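As a toy rendering of the idea, the sketch below gives one agent a small, mutually inconsistent set of topoi and lets a more specific topos defeat a more general one, so that an earlier conclusion is retracted when more is known. The topoi, the specificity heuristic and the representation are invented for illustration; they are not the Type Theory with Records account presented in the talk.

    # Toy rendering of enthymematic reasoning with topoi: each agent has its own
    # (possibly inconsistent) set of defeasible rules of thumb, and a more
    # specific topos can defeat a more general one. The topoi are invented.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Topos:
        premises: frozenset       # facts the topos needs in order to apply
        topic: str                # what it is a rule of thumb about
        conclusion: str
        specificity: int          # crude proxy: more premises = more specific

    def conclude(facts, topoi):
        """Defeasible conclusions for one agent: among applicable topoi that
        disagree on the same topic, the most specific one wins."""
        best = {}
        for t in topoi:
            if t.premises <= facts:
                if t.topic not in best or t.specificity > best[t.topic].specificity:
                    best[t.topic] = t
        return {topic: t.conclusion for topic, t in best.items()}

    # One agent's (mutually inconsistent) topoi, invented for illustration.
    AGENT_A = [
        Topos(frozenset({"outdoors"}),           "clothing", "bring a coat", 1),
        Topos(frozenset({"outdoors", "summer"}), "clothing", "no coat needed", 2),
    ]

    print(conclude({"outdoors"}, AGENT_A))             # {'clothing': 'bring a coat'}
    print(conclude({"outdoors", "summer"}, AGENT_A))   # retracted: 'no coat needed'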

Date: 2014-03-13 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Automatic summarization can help users extract the most important pieces of information from the vast amount of text digitized into electronic form every day. Central to automatic summarization is the notion of similarity between sentences. We propose the use of continuous vector representations as a basis for semantically aware sentence representations and for measuring similarity between them. The approach is evaluated on a standard dataset using the ROUGE evaluation measures. Our experiments show that this method improves the performance of a state-of-the-art summarization framework and strongly indicate the benefits of continuous word vector representations for automatic summarization.
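The core similarity measure could look roughly like the sketch below: sentence vectors are built from continuous word vectors and compared with cosine similarity. Averaging word vectors is one simple composition, assumed here for illustration; the evaluated system may compose representations differently, and the toy vectors are invented.

    # Sketch of the core similarity measure: sentence vectors built from
    # continuous word vectors, compared by cosine similarity. Averaging word
    # vectors is one simple composition; the actual system may differ.
    import numpy as np

    # Toy word vectors; a real system would load pre-trained embeddings.
    WORD_VECS = {
        "stocks":  np.array([0.9, 0.1, 0.0]),
        "fell":    np.array([0.7, 0.2, 0.1]),
        "shares":  np.array([0.8, 0.2, 0.1]),
        "dropped": np.array([0.6, 0.3, 0.1]),
        "cats":    np.array([0.0, 0.1, 0.9]),
        "sleep":   np.array([0.1, 0.0, 0.8]),
    }

    def sentence_vector(sentence):
        vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
        return np.mean(vecs, axis=0) if vecs else np.zeros(3)

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    s1, s2, s3 = "stocks fell", "shares dropped", "cats sleep"
    print(cosine(sentence_vector(s1), sentence_vector(s2)))  # high: near-paraphrases
    print(cosine(sentence_vector(s1), sentence_vector(s3)))  # low: unrelated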

Date: 2014-03-06 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Over the last few decades there have been a number of hand-crafted computational grammar development projects attempting to build comprehensive wide-coverage grammars for natural languages using different grammatical formalisms. Such formalisms have been designed to be richer than context-free grammars in terms of their generative capacity. While the effort put into defining these formalisms has resulted in grammars with detailed linguistic analyses, such grammars also lack the distributional information necessary for disambiguation tasks such as parsing. By contrast, grammars constructed with the necessary distributional information from annotated corpora such as treebanks have been shown to be effective in a wide variety of NLP applications, but are typically not linguistically interesting.

However, these efforts to construct grammars from annotated corpora often interleave language-specific and annotation-specific information into the extraction of the grammars' linguistic units. Such annotation-specific information can be abstracted away during grammar extraction, allowing uniform extraction of grammars for multiple languages. This has been verified in the case of context-free grammars, for which language-independent methods of constructing grammars from corpora have been proposed over time. In my talk, I will address this issue in the context of Tree Adjoining Grammars (TAG). TAG grammars, proposed by Joshi et al. (1976), have been developed for a wide range of languages and put to use in a multitude of NLP applications ranging from parsing to generation.

I propose a ‘normative’ grammar extraction procedure to extract multi-lingual TAG grammars by separating language- and annotation-specific details out of the extraction procedure. As part of this, I will address the specific problem of inducing the argument/adjunct distinction in syntactic structures without using annotation-specific details. I will present the results of my experiments on the Swedish treebank Talbanken, and show that the procedure can indeed work in an annotation-neutral manner. The results show that the extracted grammars can serve as a first-order approximation to hand-crafted grammars, useful in creating wide-coverage grammars.
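To give a flavour of what extraction involves, the toy sketch below splits a dependency-style analysis into TAG-like elementary trees by classifying each dependent as an argument (a substitution slot in the head's initial tree) or an adjunct (its own auxiliary tree). The data format and the relation-based heuristic are invented for illustration; the procedure presented in the talk is annotation-neutral and induces the distinction rather than stipulating it.

    # Toy flavour of TAG extraction from a dependency-style treebank: each
    # dependent is classified as argument (kept in the head's initial tree) or
    # adjunct (split off as an auxiliary tree). Heuristic and data are invented.

    # One sentence as (dependent, relation, head) triples, for illustration only.
    SENTENCE = [
        ("hunden", "subj",   "jagar"),   # the dog chases
        ("katten", "obj",    "jagar"),   # the cat
        ("snabbt", "advmod", "jagar"),   # quickly
    ]

    ARGUMENT_RELATIONS = {"subj", "obj"}   # crude stand-in for an induced distinction

    def extract_elementary_trees(deps):
        initial, auxiliary = {}, []
        for dep, rel, head in deps:
            if rel in ARGUMENT_RELATIONS:
                # argument: a substitution slot in the head's initial tree
                initial.setdefault(head, []).append(f"{rel}↓")
                initial.setdefault(dep, [])   # dependent gets its own initial tree
            else:
                # adjunct: its own auxiliary tree that adjoins to the head
                auxiliary.append(f"{dep} adjoins to {head}")
        return initial, auxiliary

    init, aux = extract_elementary_trees(SENTENCE)
    print(init)   # {'jagar': ['subj↓', 'obj↓'], 'hunden': [], 'katten': []}
    print(aux)    # ['snabbt adjoins to jagar']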

Date: 2014-02-27 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

The question of whether grammaticality is a binary categorical or a gradient property has been the subject of ongoing debate in linguistics and psychology for many years. Linguists have tended to use constructed examples to test speakers’ judgements on specific sorts of constraint violation.

We applied machine translation to randomly selected subsets of the British National Corpus (BNC) to generate a large test set containing well-formed English source sentences and sentences that exhibit a wide variety of grammatical infelicities. We tested a large number of speakers through (filtered) crowdsourcing, with three distinct modes of classification: one binary and two ordered scales. We found a high degree of correlation in mean judgements for sentences across the three classification tasks. We also ran two visual image classification tasks to obtain benchmarks for binary and gradient judgement patterns, respectively.

The sentence judgement distributions for individual speakers strongly resemble the gradience benchmark pattern. This evidence suggests that speakers represent grammatical well-formedness as a gradient property. In current work we are studying the extent to which enriched lexical n-gram models and other probabilistic models track the Amazon Mechanical Turk (AMT) judgements in our experiment. We are also employing machine learning techniques to identify the most significant features of these models. We briefly describe this modeling work.
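The kind of comparison described could look roughly like the sketch below: each sentence gets a length-normalized score from a probabilistic model (here simply the mean log probability), which is then correlated with its mean human judgement. The normalization and the toy numbers are assumptions for illustration; the actual work compares several such measures.

    # Sketch of the model-to-judgement comparison: a length-normalized
    # log probability per sentence is correlated with the mean crowd judgement.
    # The normalization (mean log prob) and the numbers are invented.
    from statistics import correlation   # Pearson's r, Python 3.10+

    # (sum of log probs under some language model, length, mean judgement)
    SENTENCES = [
        (-12.4, 6, 3.8),   # well-formed BNC source sentence
        (-21.7, 6, 2.1),   # round-trip MT output with agreement errors
        (-30.2, 5, 1.3),   # badly garbled output
    ]

    model_scores = [logp / length for logp, length, _ in SENTENCES]  # mean log prob
    human_scores = [judgement for _, _, judgement in SENTENCES]

    print(f"Pearson r = {correlation(model_scores, human_scores):.3f}")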

Shalom Lappin, King's College London

Joint work with Jey Han Lau and Alexander Clark, King's College London

Date: 2014-02-20 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Date: 2014-02-06 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Date: 2014-01-30 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8
