SEMINAR

A computational grammar and lexicon for Maltese

John J. Camilleri will defend his M. Sc. thesis in Computer Science (Algorithms, Languages and Logic track).

Abstract:

Maltese is the national language of Malta and an official language of the European Union. While classified as Semitic, Maltese has been heavily influenced by the Romance languages and English, and features both root-and-pattern and concatenative morphologies. Despite its active use, the language is highly under-resourced in digital terms. This thesis contributes two computational resources for Maltese: a grammar and an online full-form lexicon.

The first part of this thesis deals with a computational grammar for Maltese, implemented in the Grammatical Framework (GF). GF's Resource Grammar Library (RGL) already covers the morphology and basic syntax of some 27 languages from around the world. Maltese is the 28th addition to the RGL and the first Semitic language in the library. The smart paradigms implemented in the morphological part of the grammar allow full inflection tables to be produced for any lexical unit, often requiring only a lemmatised form. We will look at some of the more interesting implementation details of the grammar and discuss the compromises that had to be made along the way.

The second part covers gathering various small lexical resources into a single searchable collection, using a schema-less database to accommodate partial data from heterogeneous sources. We then use the smart paradigms from the morphological part of the grammar to automatically produce some 4 million inflected forms, extending the collection into a full-form computational lexicon that can be used for morphological lookup and spell checking.
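To illustrate how a full-form lexicon supports morphological lookup and spell checking, here is a minimal Python sketch. It assumes a hypothetical `inflect` function standing in for a smart paradigm; its toy English-like suffix rules and the example words are invented for illustration and are not the Maltese paradigms or the database schema from the thesis.

```python
from collections import defaultdict


def inflect(lemma: str) -> dict[str, str]:
    """Hypothetical 'smart paradigm': derive an inflection table from the lemma alone.
    Toy English-like rules, NOT the Maltese paradigms described in the thesis."""
    past = lemma + "d" if lemma.endswith("e") else lemma + "ed"
    return {"base": lemma, "3sg": lemma + "s", "past": past}


def build_full_form_lexicon(lemmas):
    """Invert the inflection tables: surface form -> list of (lemma, tag) analyses."""
    index = defaultdict(list)
    for lemma in lemmas:
        for tag, form in inflect(lemma).items():
            index[form].append((lemma, tag))
    return index


def analyse(index, word):
    """Morphological lookup: all analyses of a surface form (empty list if unknown)."""
    return index.get(word, [])


def is_spelled_correctly(index, word):
    """Naive spell check: a word is accepted iff it is a known full form."""
    return word in index


lexicon = build_full_form_lexicon(["walk", "bake"])
print(analyse(lexicon, "baked"))               # [('bake', 'past')]
print(is_spelled_correctly(lexicon, "walkd"))  # False
```

Because every form is generated ahead of time, both lookup and spell checking reduce to a dictionary access; the cost is moved to generation time and storage, which is the trade-off a full-form lexicon makes.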

All the software and resources described in this thesis are open-source and free to use for any purpose.

The thesis is available on the web in a draft version.

Date: 2013-09-12 15:15 - 16:00

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)

SEMINAR

Translation is a text production mode that imposes cognitive (and cultural) constraints on the text producer. The product of this process, known as translationese, reflects these constraints; translated texts are therefore ontologically different from texts written originally in the same language. Many of the special properties of translationese are believed to be universal, in that they are manifest in any translated text regardless of the source and target languages.

In this work we test several Translation Studies hypotheses using a computational methodology that is based on supervised machine learning. Casting the problem in the paradigm of authorship attribution, we define dozens of classifiers that implement various linguistically-informed features that reflect translation universals. While the practical task of distinguishing original from translated texts is easy, we focus not on improving the accuracy of classification, but rather on designing linguistically meaningful features and assessing their contribution to the task. We demonstrate that some feature sets are indeed good indicators of translationese, thereby corroborating some hypotheses, whereas others perform much worse (sometimes at chance level), indicating that some 'universal' assumptions have to be reconsidered.
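As a rough illustration of what a "linguistically-informed feature" fed to a supervised classifier can look like, the sketch below maps texts to relative frequencies of a few function words and assigns labels with a nearest-centroid classifier. The word list, the labels `"O"` (original) and `"T"` (translated), and the choice of nearest-centroid classification are all assumptions for this sketch; the study itself uses dozens of feature sets and its own learner.

```python
import math
from collections import Counter

# Hypothetical feature set: relative frequencies of a few English function words.
FUNCTION_WORDS = ["the", "of", "and", "that", "which", "it"]


def features(text: str) -> list[float]:
    """Map a text to the relative frequency of each function word."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty texts
    return [counts[w] / n for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(labelled_texts):
    """labelled_texts: (text, label) pairs; returns one centroid per label."""
    by_label = {}
    for text, label in labelled_texts:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(model, text):
    """Pick the label whose centroid is closest (Euclidean distance)."""
    x = features(text)
    return min(model, key=lambda label: math.dist(x, model[label]))


model = train([("the cat sat on the mat", "O"),
               ("it is the case that the idea which", "T")])
print(classify(model, "that which it"))
```

In the spirit of the abstract, the interesting output is not the accuracy of such a classifier but how much a given feature set contributes to separating the two classes.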

While our results are limited to the case of translationese, this methodology can be adopted for studying other kinds of texts produced under different cognitive constraints, such as texts produced by non-native speakers, by people with learning disabilities or medical problems, or by children acquiring a language.

Shuly Wintner, Department of Computer Science, University of Haifa

Date: 2013-09-12 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Tobias Günther in the MLT programme will defend his master's thesis "Sentiment Analysis of Microblogs".

In this work we examine the problem of sentiment analysis in microblogs, which has become a popular research topic in recent years. We provide a detailed review of previous work in the field and a survey summarizing common practices and available resources. Furthermore, we conduct a series of machine learning experiments using the largest manually annotated dataset for this task to date, evaluating techniques that had not been compared before, had not been used at all, or for which previous studies reported conflicting results.

Our final model outperforms previously reported results on different datasets. Our entry in the SemEval-2013 shared task on sentiment analysis in Twitter, developed as part of this work, was ranked first or second in three of the four experimental conditions of the shared task.

Supervisor: Richard Johansson
Opponent: Ildikó Pilán
Examiner: Torbjörn Lager

Date: 2013-06-20 14:30 - 15:30

Location: T219, Olof Wijksgatan 6

SEMINAR

Ildikó Pilán in the MLT programme will defend her master's thesis "NLP-based Approaches to Sentence Readability for Second Language Learning Purposes".

The aim of this thesis was to select sentences suitable for learners of Swedish as a second language from native-language corpora in a (semi-)automatic way, using Natural Language Processing (NLP) techniques. Besides being understandable, the sentences also needed to meet the criteria for suitable exercise items and for illustrating the meaning of vocabulary terms.

A number of lexical, semantic and syntactic factors were identified as influential for these purposes. We experimented with a purely heuristic approach as well as with a combination of rules and supervised machine learning classification. Both algorithms have been made available as a web service in Lärka, an online language learning platform. The selected sentences can be used, among other things, for the automatic generation of exercises and for finding example sentences during dictionary compilation. The potential users of the web service therefore include not only teachers and learners of Swedish as an L2, but also lexicographers.
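A purely heuristic sentence filter of the kind mentioned above can be sketched as a set of threshold checks. The two factors used here (sentence length and the share of words found in a learner vocabulary list) and their threshold values are hypothetical stand-ins; the thesis's actual lexical, semantic and syntactic factors are not listed in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    # Hypothetical thresholds, not taken from the thesis.
    min_tokens: int = 6
    max_tokens: int = 20
    min_known_ratio: float = 0.9  # share of tokens found in a learner vocabulary list


def is_candidate(sentence: str, vocabulary: set[str], c: Criteria = Criteria()) -> bool:
    """Accept a sentence if its length is in range and most words are known."""
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    if not tokens or not (c.min_tokens <= len(tokens) <= c.max_tokens):
        return False
    known = sum(1 for t in tokens if t in vocabulary)
    return known / len(tokens) >= c.min_known_ratio


def select(sentences, vocabulary, c: Criteria = Criteria()):
    """Keep only the sentences that pass every heuristic check."""
    return [s for s in sentences if is_candidate(s, vocabulary, c)]


vocab = {"jag", "har", "en", "katt", "som", "heter", "maja"}
print(select(["Jag har en katt som heter Maja.",
              "Hej!",
              "Jag har en gammal bil som aldrig startar."], vocab))
```

A machine-learning variant would replace the fixed thresholds with a classifier trained on labelled sentences, keeping hard rules only for cases that must always be excluded.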

We carried out an empirical evaluation with participants representing each of these user groups. The results indicate that the algorithms satisfy the understandability criterion rather well, but that there is room for improvement in selecting better exercise items and dictionary examples.

Supervisors: Elena Volodina and Richard Johansson
Opponent: Tobias Günther
Examiner: Torbjörn Lager

Date: 2013-06-20 13:15 - 14:15

Location: T219, Olof Wijksgatan 6

SEMINAR

Cuicui Sun in the MLT programme will defend her master's thesis "Free Robust Parsing".

This master's thesis is dedicated to the implementation of a free robust parser that helps simplify the building of a dialogue interface for a dialogue-controlled application. The parser is both grammar-independent and domain-independent. On the one hand, it does not require a predefined grammar, which would otherwise limit what can be parsed; on the other hand, it is built on a generalised common template that can be applied to more than one domain. For testing purposes, the parser has been implemented and tested in the domain of a music player application. The parser aims to translate the user's input into the correct command, which drives the application to carry out the task.

"Free" and "robust" mean that the user is not restricted to any predefined pattern when making a request (input); instead, the user may phrase an arbitrary domain-specific request. The goal of the parser is to extract the key information from the input as accurately as possible, so as to satisfy the user's request to the greatest possible degree.

The parser has a Python interface and was implemented on Linux. Two main collections were involved in building the parser: a test corpus of various inputs, and a music database created from raw data obtained from the Internet.
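One common way to get grammar-free robustness is keyword and entity spotting: look for known action phrases and database entries anywhere in the utterance and ignore everything else. The sketch below is that generic technique, not necessarily the thesis's design; the tiny `MUSIC_DB`, the `ACTION_WORDS` table and the command format are hypothetical.

```python
import re

# Hypothetical miniature music database and action vocabulary.
MUSIC_DB = {
    "artist": {"abba", "the beatles"},
    "song": {"dancing queen", "yesterday"},
}
ACTION_WORDS = {"play": "PLAY", "put on": "PLAY", "stop": "STOP", "pause": "PAUSE"}


def normalise(text: str) -> str:
    """Lower-case and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def parse(utterance: str) -> dict:
    """Extract an action and any known entities, ignoring all other words."""
    # Pad with spaces so phrase matching respects word boundaries.
    text = f" {' '.join(normalise(utterance).split())} "
    command = {"action": None}
    for phrase, action in ACTION_WORDS.items():
        if f" {phrase} " in text:
            command["action"] = action
            break
    for slot, values in MUSIC_DB.items():
        hits = [v for v in values if f" {v} " in text]
        if hits:
            # Prefer the longest match when several entries occur.
            command[slot] = max(hits, key=len)
    return command


print(parse("Could you please put on Dancing Queen by ABBA?"))
# {'action': 'PLAY', 'artist': 'abba', 'song': 'dancing queen'}
```

Because only the action table and the database are domain-specific, swapping them out is what would let the same template serve another application.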

Supervisor: Peter Ljunglöf
Opponent: Dijana Pijetlovic
Examiner: Torbjörn Lager

Date: 2013-06-19 15:45 - 16:45

Location: T219, Olof Wijksgatan 6

SEMINAR

Dijana Pijetlovic in the MLT programme will defend her master's thesis "Swedish spelling game: Developing Swedish spelling exercises on the ICALL platform Lärka using Text-to-Speech Technology".

The purpose of this thesis is to develop web services on the ICALL platform Lärka for the automatic generation of Swedish spelling exercises using text-to-speech (TTS) technology.

The spelling exercises cover five levels, among which the user can choose: word, inflected word, phrase, sentence, and a performance-based level. The performance-based level lets the user train the linguistic levels according to his/her performance. An embedded avatar pronounces a random item from the selected level, which the user then has to spell. Users can also practise their own words at the different linguistic levels. A result tracker, which records the total number of answers and the number of correct answers, keeps track of the user's performance.

This thesis describes the user interface and the exercise principles, giving special attention to the algorithms for item generation at the different linguistic levels. To analyse typical spelling mistakes and provide better feedback, misspellings are collected in a database. The usability of the spelling exercises, with respect to the different linguistic levels and the quality of the speech, was tested through a questionnaire-based evaluation with 10 participants.

From a research point of view, two questions play an important role in this thesis: the analysis of spelling errors together with appropriate feedback, and whether text-to-speech technology for Swedish is applicable in a language learning environment.

The results of the evaluation showed that text-to-speech technology for Swedish is mature enough for use in an L2 context.
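A performance-based level combined with a result tracker could work roughly as sketched below: count total and correct answers, move up a level after a run of correct answers and down after a run of mistakes. The promotion and demotion thresholds and the exact-match scoring are assumptions for illustration; the abstract does not say how Lärka adapts the level.

```python
from dataclasses import dataclass

LEVELS = ["word", "inflected word", "phrase", "sentence"]


@dataclass
class Tracker:
    """Hypothetical result tracker with performance-based level selection."""
    level: int = 0           # index into LEVELS
    total: int = 0
    correct: int = 0
    streak: int = 0          # >0: consecutive correct answers, <0: consecutive errors
    promote_after: int = 3   # assumed threshold, not from the thesis
    demote_after: int = 2    # assumed threshold, not from the thesis

    def record(self, target: str, answer: str) -> bool:
        """Score one answer (exact, case-sensitive match) and adapt the level."""
        ok = answer.strip() == target
        self.total += 1
        if ok:
            self.correct += 1
            self.streak = self.streak + 1 if self.streak > 0 else 1
        else:
            self.streak = self.streak - 1 if self.streak < 0 else -1
        if self.streak >= self.promote_after and self.level < len(LEVELS) - 1:
            self.level += 1
            self.streak = 0
        elif -self.streak >= self.demote_after and self.level > 0:
            self.level -= 1
            self.streak = 0
        return ok

    @property
    def current_level(self) -> str:
        return LEVELS[self.level]


t = Tracker()
for _ in range(3):
    t.record("hus", "hus")
print(t.current_level, t.correct, t.total)  # inflected word 3 3
```

Storing each failed `(target, answer)` pair alongside these counters would also give the misspelling database needed for error analysis.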

Supervisor: Elena Volodina
Opponent: Cuicui Sun
Examiner: Torbjörn Lager

Date: 2013-06-19 14:30 - 15:30

Location: T219, Olof Wijksgatan 6

SEMINAR

David Junger in the MLT programme will defend his master's thesis "Client-side SCXML for Multimodal User Interfaces".

This thesis explores the appeal and use of client-side SCXML to control multimodal Web applications, and details the workings of a client-side JavaScript implementation of SCXML enabling the proposed design.

Supervisors: Simon Dobnik and Torbjörn Lager
Opponent: Benjamin Glass
Examiner: Bengt Nordström

Date: 2013-06-10 15:45 - 16:45

Location: T219, Olof Wijksgatan 6

SEMINAR

Sevasti Louizou in the MLT programme will defend her master's thesis "GF Modern Greek Resource Grammar".

The subject of this master's thesis is the development of a bilingual lexical resource of Greek and Swedish psych-verbs, extracted from available monolingual, bilingual and multilingual lexical resources. The semantic and syntactic properties of Greek and Swedish verbs of psychological state are encoded mainly in terms of selectional restrictions and subcategorisation information.

The report also presents the methodology used to develop the Computational Bilingual (Greek-Swedish and Swedish-Greek) Lexicon of Psych-verbs from existing resources. The thesis provides a detailed description of the lexicon and possible application scenarios.

Supervisors: Dimitrios Kokkinakis and Voula Giouli
Opponent: Ioanna Papadopoulou
Examiner: Torbjörn Lager

Date: 2013-06-10 14:30 - 15:30

Location: T219, Olof Wijksgatan 6

SEMINAR

Apostolos Apostolidis in the MLT programme will defend his master's thesis "Text-to-Speech News Article Reader on Talkamatic Dialog Manager".

This thesis describes the development of an automatic text-to-speech News Article Reader (NAR) on Talkamatic Dialog Manager (TDM). TDM is an issue-based dialog manager, developed by Talkamatic, that enables the building of complex multi-modal applications. NAR is an application that provides multi-modal access to news article abstracts, basic full-article reading functions and a number of preference functionalities.

NAR was built by implementing specific dialogue design techniques and adapting them to enhance the intuitiveness and efficiency of the application. It is also used to demonstrate how TDM's strongest features can enhance application development and how its current limitations can be overcome. Additionally, it serves as a proving ground for a stream ranking algorithm adapted from Reddit's ranking algorithm to better meet NAR's requirements. Finally, the application's usability is discussed, based on the results of a number of small evaluation sessions.
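The abstract does not describe how the Reddit algorithm was adapted, so the sketch below shows only the widely published starting point: Reddit's classic "hot" ranking, which adds a log-scaled vote score to a linear recency term. The constants `1134028003` and `45000` come from Reddit's published ranking code; any NAR-specific modification is left out.

```python
from datetime import datetime, timezone
from math import log10

# Reference timestamp used in Reddit's published ranking code (December 2005).
REDDIT_EPOCH = 1134028003


def hot(ups: int, downs: int, posted: datetime) -> float:
    """Reddit's classic 'hot' score: log10 of the net vote score plus a recency bonus.

    Every 45000 s (12.5 h) of recency is worth one order of magnitude of votes.
    `posted` should be timezone-aware; a naive datetime is interpreted as local time.
    """
    score = ups - downs
    order = log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = posted.timestamp() - REDDIT_EPOCH
    return round(sign * order + seconds / 45000, 7)


t = datetime(2013, 5, 30, tzinfo=timezone.utc)
later = datetime(2013, 5, 30, 12, 30, tzinfo=timezone.utc)  # 45000 s after t
print(hot(100, 0, t), hot(10, 0, later))  # approximately equal scores
```

The logarithm means the first ten votes count as much as the next ninety, while the time term steadily pushes older items down; a news reader would typically replace votes with its own relevance or preference signal.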

Supervisors: Peter Ljunglöf and Alexander Berman
Opponent: Rajsekhar Iyer
Examiner: Torbjörn Lager

Date: 2013-05-30 15:45 - 16:45

Location: T219, Olof Wijksgatan 6

SEMINAR

Benjamin Glass and Rajsekhar Iyer in the MLT programme will defend their master's thesis "A Dialog Based Search System".

This thesis describes the creation of a chat-based dialog system for drilling down into search engine results in an easy-to-use manner. We describe the use of named entity recognition for facet-based filtering of search results, as well as n-grams for query refinement. We then evaluate the system using a survey and present the results of the evaluation.
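The two techniques named above can be sketched generically: named entities attached to each result become facets the user can filter on, and frequent word n-grams containing a query term become refinement suggestions. The result format (a `text` string plus a list of `(type, name)` entity pairs, assumed to come from some NER step) and the parameters `n` and `k` are hypothetical; this is not the system's actual implementation.

```python
from collections import Counter


def facets(results):
    """Count (entity_type, entity) pairs across results to offer as filter options."""
    counts = Counter()
    for r in results:
        for etype, name in r["entities"]:
            counts[(etype, name)] += 1
    return counts


def filter_by_facet(results, etype, name):
    """Keep only results mentioning the chosen entity."""
    return [r for r in results if (etype, name) in r["entities"]]


def refinement_suggestions(results, query, n=2, k=3):
    """Top-k word n-grams from result texts that contain and extend a query term."""
    q = set(query.lower().split())
    counts = Counter()
    for r in results:
        words = r["text"].lower().split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if q & set(gram) and not set(gram) <= q:
                counts[" ".join(gram)] += 1
    return [g for g, _ in counts.most_common(k)]


results = [
    {"text": "jaguar car dealer in gothenburg", "entities": [("LOC", "Gothenburg")]},
    {"text": "jaguar car review", "entities": []},
    {"text": "the jaguar is a big cat", "entities": [("LOC", "Amazon")]},
]
print(refinement_suggestions(results, "jaguar"))  # 'jaguar car' ranks first
```

In a chat-based interface, the system could offer the top facets and suggestions as options in its next turn, and the user's choice narrows the result set for the following round.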

Supervisors: Peter Ljunglöf, Svetoslav Marinov, and Alexander Berman
Opponent: Apostolos Apostolidis
Examiner: Torbjörn Lager

Date: 2013-05-30 14:30 - 15:30

Location: T219, Olof Wijksgatan 6
