SEMINAR

Within the NLP community, interoperability has been a major issue for the last 25 years. It has been the subject of several standardization efforts, but it remains, at best, a partially solved problem.

Interoperability of linguistic resources involves two major aspects: structural interoperability (annotations of different origin are represented using the same formalism) and conceptual interoperability (annotations of different origin are linked to a common vocabulary). Recently, it has been argued that both aspects can be addressed by representing linguistic resources using Semantic Web formalisms and in accordance with the Linked Data paradigm (Chiarcos et al., 2013).

In particular, the RDF data model (labeled directed multi-graphs) makes it possible to generalize over the concept of feature structures that underlies existing efforts to standardize corpora (ISO TC37/SC4:LAF, TEI), linguistic annotations (EAGLES, ISO TC37/SC4:ISOcat), and lexical resources (ISO TC37/SC4:LMF, TEI), thereby contributing to the interoperability between these standardization efforts.
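As a rough illustration of how a feature structure maps onto a labeled directed graph, the sketch below flattens a nested attribute-value structure into subject-predicate-object triples. It is a minimal sketch only: the function name, the `ex:` identifiers, the `_:b` node labels and the example features are all hypothetical, and it does not use any real RDF vocabulary or the formalisms discussed in the talk.

```python
from itertools import count


def feature_structure_to_triples(root, fs):
    """Flatten a nested feature structure into (subject, label, object) triples.

    Nested structures become fresh intermediate nodes, so the result is a
    labeled directed graph. All identifiers here are illustrative assumptions.
    """
    fresh = count(1)
    triples = []

    def walk(subject, features):
        for attribute, value in features.items():
            if isinstance(value, dict):
                # A complex value gets its own node, linked by a labeled edge.
                node = f"_:b{next(fresh)}"
                triples.append((subject, attribute, node))
                walk(node, value)
            else:
                # An atomic value becomes the object of the edge directly.
                triples.append((subject, attribute, value))

    walk(root, fs)
    return triples


# Hypothetical annotation of a single token.
token = {
    "form": "dogs",
    "pos": "noun",
    "agreement": {"number": "plural", "person": "3"},
}
triples = feature_structure_to_triples("ex:token1", token)
for triple in triples:
    print(triple)
```

Because every resource ends up as a set of triples over shared node identifiers, annotations from a corpus and entries from a lexicon could in principle be merged by taking the union of their triple sets.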

This talk provides a general introduction to the topic and elaborates on two selected use cases:
– exploiting structural interoperability: combining annotated corpora and lexical resources
– exploiting conceptual interoperability: dealing with heterogeneous annotations in NLP pipelines

References:

Christian Chiarcos (2010), Towards Robust Multi-Tool Tagging. An OWL/DL-Based Approach. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, July 2010, 659--670.

Christian Chiarcos (2012), POWLA: Modeling linguistic corpora in OWL/DL. In: E. Simperl et al. (eds.), Proceedings of the 9th Extended Semantic Web Conference (ESWC 2012). Springer, Heidelberg, Heraklion, Crete, May 2012 (LNCS 7295), 225--239.

Christian Chiarcos, John McCrae, Philipp Cimiano, and Christiane Fellbaum (2013), Towards open data for linguistics: Lexical Linked Data. In: Alessandro Oltramari, Piek Vossen, Lu Qin, and Eduard Hovy (eds.), New Trends of Research in Ontologies and Lexical Resources. Springer, Heidelberg.

Date: 2015-02-12 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8


SEMINAR

In my thesis I investigate natural science textbooks used in Swedish secondary and upper secondary school with regard to traditional readability measures, e.g. LIX, OVIX and nominal ratio. In addition, I extract typical vocabulary, nominal phrases and syntactic structures according to a proposed quantitative method, based on linguistic features, labelled "the index principle". This method ensures that the variables are used frequently, within a wide range and with an even distribution. The selected variables are also typical of the specific text type in question, i.e. textbooks, in relation to their occurrences in reference corpora, such as textbooks in social sciences and mathematics, and narrative and academic texts.
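For readers unfamiliar with LIX, the sketch below computes it using the standard definition: average sentence length in words plus the percentage of words longer than six letters. This is a minimal sketch and not the implementation used in the thesis; the tokenization and the sentence splitting on `.`, `!`, `?` and `:` are simplifying assumptions, and the function name is hypothetical.

```python
import re


def lix(text: str) -> float:
    """Compute LIX: words per sentence plus percentage of long words.

    Long words are words with more than six letters. Tokenization and
    sentence splitting are deliberately naive (assumptions of this sketch).
    """
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, Unicode-aware
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = sum(1 for word in words if len(word) > 6)
    return len(words) / len(sentences) + 100 * long_words / len(words)


sample = "The cat sat. Photosynthesis converts energy."
print(round(lix(sample), 1))
```

Here the sample has six words in two sentences and two long words, so the score is 3 + 100 * 2 / 6.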

The results show that textbooks in natural science contain a large amount of content-specific, technical vocabulary that is distinct from everyday language. The language is also highly nominal, with most of the complexity lying within nominal phrases. The textbook language at large shows relatively low complexity compared with academic language. In the transition between secondary and upper secondary school, the texts score higher on almost every readability measure, indicating an increase in the linguistic demands on the readers.

Examiner: Kristina Nilsson Björkenstam, Stockholm University

Date: 2015-05-04 15:15 - 18:00

Location: G312, Renströmsgatan 6


SEMINAR

The Master's Programme in Language Technology welcomes you to the upcoming thesis defences:

10:30 - 12:00 -- Scharolta Sienčnik, "Improving GF German resources with HPSG. A study of extending grammatical knowledge across frameworks."
Examiner: Elisabet Engdahl. Supervisor: Aarne Ranta.
Thesis draft: http://demo.spraakdata.gu.se/richard/mlt2015/siencnik.pdf

14:30 - 16:00 -- Mehdi Ghanimifard, "Improving word-sense embeddings with context enrichment. Translations as supplementary context."
Examiner: Lars Borin. Supervisor: Richard Johansson.
Thesis draft: http://demo.spraakdata.gu.se/richard/mlt2015/ghanimifard.pdf

Date: 2015-06-11 10:30 - 16:00

Location: K332, Lennart Torstenssonsgatan 6


SEMINAR

GF Offline Translator is the demonstrator for our heroic effort to scale GF from small controlled languages to a framework for processing free text. We will review the current status, reflect on the feedback that we get from current users, and outline the existing problems and how they can be solved. Last but not least, we will present the first prototype of GF Offline Translator for iOS, a feature that was requested by far too many users.

Link to the presentation

Date: 2015-06-04 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8


SEMINAR

Natural language processing (NLP) is becoming increasingly important in people's everyday lives, as shown, for example, by the success of word prediction, spelling correction and instant online translation. Building linguistic resources and tools, however, is expensive and time-consuming, and one of the great challenges in computational linguistics is to port existing models to new languages and domains. Modern NLP requires data, often annotated with explicit linguistic information, and tools that can learn from it. However, sufficient quantities of electronic data are available only for a handful of languages, whereas most other languages cannot draw on such resources. Speakers of low-density languages and the countries they live in are not able to invest in large data collection and time-consuming annotation efforts. The goal of cross-lingual NLP is to share the rich linguistic information with poorly supported languages, making it possible to build tools and resources without starting from scratch. In this talk I will look in particular at transfer models for statistical dependency parsing. In my experiments I test these approaches on the recently released data sets with cross-lingually harmonized dependency annotation, and I will show the potential of simple yet effective annotation and treebank translation techniques. I will also discuss the shortcomings and problems of these models and welcome suggestions for future work.

Date: 2015-05-28 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8


SEMINAR

Date: 2015-05-21 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8


SEMINAR

In this talk, I will report on ongoing work with colleagues that seeks to acquire scale-related information for sets of lexical items. Building on earlier work in which we compared various methods to induce intensity orderings for adjectives (good < great < excellent), we are currently exploring ways to order adverb-adjective combinations (involving the same adjective) by intensity, and to generalize from there to an overall ordering of adverbs. In presenting and trying to make sense of various unexpected findings, I will extensively discuss connections to and relevance for the purposes of linguistics, lexicography, and (frame-based) sentiment analysis.

Date: 2015-04-23 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8


SEMINAR

Statistical parsers are effective but typically limited to producing projective dependencies or constituents. Linguistically rich parsers, on the other hand, recognize long-distance relations and analyze both form and function phenomena, but rely on extensive manual grammar engineering. We combine the advantages of the two by building a statistical parser that produces richer analyses.

We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS) and uses a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. The other system encodes the discontinuities in the labels of phrase-structure trees, allowing for efficient context-free grammar parsing.

The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data.

Andreas van Cranenburgh
Institute for Logic, Language and Computation
University of Amsterdam
http://andreasvc.github.io/

Date: 2015-04-16 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8


SEMINAR

A growing responsibility on the part of individuals to make decisions about health issues implies a need for access to health information and for the personal skills to comprehend it. Health literacy comprises skills in obtaining, understanding and acting on information about health issues in ways that promote and maintain health. The phenomenon may be approached in different ways. In one approach, health literacy is expressed as a polarized phenomenon, focusing on the extremes of low and high health literacy. The definitions of health literacy in this approach are characterized by a functional understanding, pointing out certain basic skills needed to understand health information. The other approach represents a complex understanding of health literacy, acknowledging a breadth of skills in interaction with social and cultural contexts, which means that an individual's health literacy may fluctuate from one day to another according to the context. The complex approach stresses the interactive and critical skills needed to use information or knowledge as a basis for appropriate health decisions. Health literacy is a heterogeneous phenomenon that has significance for both the individual and society.

Lena Mårtensson
Institute of Neuroscience and Physiology, Sahlgrenska

Creator of http://www.halsolitteracitet.se/

Note: this seminar will be given in Swedish.

Date: 2015-04-09 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8


SEMINAR

The role of semantic aspects in the automatic assessment of linguistic complexity (readability) remains little explored in the literature, mostly due to the lack of reliable word-sense disambiguation (WSD) methods. This talk will give an overview of our research in the areas of both readability classification and WSD, with the aim of working towards their future combination. In the first part, I will present a machine learning approach for classifying coursebook materials for teaching Swedish as a second language according to their linguistic complexity. Besides results for text-level analysis, I will also discuss performance at a finer-grained (sentence) level. The second half of the presentation will be dedicated to a first attempt at a knowledge-based WSD system using information from the SALDO lexicon. After the description of the method, some example sentences with both correctly and incorrectly disambiguated senses will follow. Finally, I will conclude by outlining the potential of sense-based information for readability classification.

Date: 2015-03-19 10:30 - 12:00

Location: L308, Lennart Torstenssonsgatan 8
