SEMINAR

 


Data and Methodology in the PhD-Project “Poetry Processes on the Internet”

Poetry communities are good examples of how the Internet has radically changed the conditions for the production and consumption of literary texts during the last decade. On the Web, everyone can publish, read and discuss poetry. One of these new arenas is the Swedish community www.poeter.se, which has about 26,000 registered members. This community is the main focus of Julia Pennlert’s PhD project “Poetry Processes on the Internet”. The project deals with questions concerning the content of web-published poetry, the comments on these poems, and their valuation (i.e. processes of canonization) in this particular web-based context.

On poeter.se, thousands of users publish an enormous number of poems, comments, discussion posts and value judgements, and there are different ways of “using” this arena for anyone interested in poetry. Together, these texts constitute a large research material with a complex structure. This kind of material also opens up fascinating possibilities for using language technology as a tool for answering new kinds of questions, for literary scholars as well as for other web-based research. The material and these questions also pose a challenge to language technology.

In the seminar we will talk about how we work together and present examples of how the material can be represented and explored in ways that are meaningful from the point of view of literary studies.

Julia Pennlert (Umeå University) and Sverker Lundin (University of Gothenburg)

 

Date: 2011-03-24 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR


So far, GF has been used only for parsing small controlled languages, but the improvements in parsing performance over the last few years have made it possible to dream about parsing open-domain unrestricted text. In the resource libraries we already have wide-coverage grammars for many languages, but having a grammar is only part of the problem. However much we improve our grammars, it will always be possible to find syntactic constructions that they do not cover. Another problem is that adding more syntactic constructions to a grammar usually makes it more ambiguous. We need a parser that is robust and can do statistical ranking when there are ambiguities in the grammar. I did some preliminary experiments in robust parsing, but the general conclusion is that we need a good treebank that we can use for statistical training. Since we don't want to build our own treebanks, an attractive alternative is to try to convert an existing one.

In this talk I will present the current state of the GF port of the Penn Treebank. All parse trees in the treebank were converted to abstract syntax trees for the English Resource Grammar. Where a syntactic construction is unknown, we simply leave a placeholder in the abstract tree. Currently we have matched 69% of the constructions with the grammar. More is possible but takes time.

Date: 2011-03-17 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR


In this talk, we present some explorations in using phoneme n-grams for automatic language classification based on Swadesh word lists. In recent years, historical linguistics has seen a large amount of work on verifying hypotheses with computational methods. The tasks addressed include estimating language distance and studying the relation between the diversity of phonological inventories and the time depth and spatial spread of language families. We observe a good correlation between phonological diversity and both the spatial spread and the time depth of a language family.
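
As a rough illustration of the general idea of comparing languages through phoneme n-grams, the following minimal sketch builds bigram profiles from short word lists and compares them with cosine similarity. It is not the speakers' method: the function names, the choice of bigrams, the cosine measure and the three tiny word lists (ordinary spellings standing in for phonemic transcriptions of Swadesh items) are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def phoneme_ngrams(word, n=2):
    """Return the n-grams of a word given as a sequence of phoneme symbols."""
    # "#" marks the word boundary so that initial and final sounds are kept.
    padded = ["#"] + list(word) + ["#"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def ngram_profile(word_list, n=2):
    """Count all phoneme n-grams over a word list."""
    counts = Counter()
    for word in word_list:
        counts.update(phoneme_ngrams(word, n))
    return counts


def cosine_similarity(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy data: plain spellings used as stand-ins for transcriptions.
lang_a = ["hand", "water", "stone"]
lang_b = ["hand", "wasser", "stein"]
lang_c = ["käsi", "vesi", "kivi"]

profile_a = ngram_profile(lang_a)
profile_b = ngram_profile(lang_b)
profile_c = ngram_profile(lang_c)

sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

On this toy data the first two lists share many bigrams and the third shares none with the first, so the first pair scores higher. A real experiment would use full transcribed word lists and a proper classifier or clustering step.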

Date: 2011-03-03 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR


Never before in history have data been generated and collected at such high volumes as today. As the volumes of data available to business people, scientists, and the public increase, their effective use becomes more challenging. Keeping up to date with the flood of data using standard tools for data analysis and exploration is fraught with difficulty. The field of visual analytics seeks to provide people with better and more effective ways to understand and analyze large datasets, while enabling them to act upon their findings immediately. Visual analytics integrates the analytic capabilities of the computer and the abilities of the human analyst, allowing novel discoveries and empowering individuals to take control of the analytical process. The talk presents the challenges of visual analytics and illustrates them with several application examples, showing the exciting potential of visual analysis techniques but also their limitations.

http://www.informatik.uni-konstanz.de/en/arbeitsgruppen/infovis/members/prof-dr-daniel-keim/

 

Date: 2011-02-24 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

 


Planning event for the CLT seminar series - Spring semester 2011

 

Date: 2011-01-13 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

 


Anette Hulth (Swedish Institute for Communicable Disease Control)

Traditional infectious disease surveillance is based on reports from clinicians and laboratories. It thus requires that those who are ill seek care. As a complement to this traditional surveillance, sources collected for purposes other than surveillance (for example sales of certain medicines or the number of telephone calls to a medical help line) can be exploited. At the Swedish Institute for Communicable Disease Control, a system based on queries submitted to a Swedish medical web site, Vårdguiden.se, has been implemented and functions on a regular basis as an additional source of surveillance. In the work performed at the institute, we have shown that web queries are an accurate, cheap and labour-extensive source, which gives access to individuals who are not (yet) seeking care. In this presentation, I will focus on the web query-based influenza and norovirus surveillance. I will also give a very short introduction to infectious disease surveillance in general.

Date: 2011-02-10 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR


http://www.speech.kth.se/~annah/

Date: 2011-03-10 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR


A talk on two related subjects:

1. Information Extraction in the Clinical E-Science Framework

The Clinical E-Science Framework (CLEF) project built a system to extract clinically significant information from the textual component of medical records. Conventional clinical Information Extraction (IE) systems often use purpose-built software, and many involve some degree of knowledge engineering to encode clinical and linguistic knowledge. The CLEF IE system is built largely from off-the-shelf components, and involves no additional knowledge engineering. Instead, clinical knowledge is provided by human-annotated examples, which are used to learn statistical models of the text. This talk will describe the CLEF IE system, and the building of a training data set and gold standard. The talk will give evaluations of system performance for both entity and relation extraction, with comparisons between training sets of different sizes and types. Finally, the talk will illustrate the quantity of data that can be extracted by describing the application of the system to a corpus of half a million clinical narratives and reports.

2. Pattern grammar based clinical information extraction: an agile process for building practical systems

When developing any IE application, both software and data must be considered. From a software engineering point of view, the last decade has seen the emergence of re-useable NLP frameworks and tool-kits. The task of building an NLP application for processing medical records can thus move from de novo systems development to the adaptation of these tool-kits and frameworks. From a data point of view, it must be considered that virtually all usable NLP techniques require significant volumes of manually prepared examples. One of the main stumbling blocks to developing medical NLP applications is the lack of such example data.

This talk will describe the application of an increasingly popular software engineering technique - an agile methodology - to tackle both IE systems adaptation and annotation of example texts at the same time, in a large hospital setting. Agile methodologies replace the linear requirements-design-implement approach to software engineering with early implementation and the iterative evolution of requirements. The approach taken maximises the involvement of clinician and medical researcher end-users, at low cost to their time. We believe that this has a beneficial effect on requirements gathering, and on final system quality. The talk will be illustrated with quantitative results from a working Proof-of-Concept application, and will discuss ongoing work to develop further applications in the same institutional setting, where we now have a successful and expanding production system.

In contrast to CLEF, the system implemented is based on hand-crafted pattern matching grammars. The talk will discuss this difference, and make comparisons between the two approaches.

Angus Roberts on the web: http://www.dcs.shef.ac.uk/~angus/

 

Date: 2011-01-20 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Monday seminar at the Department of Swedish:

Bernard Comrie

Max Planck Institute for Evolutionary Anthropology and
University of California Santa Barbara

Typology of Ditransitive Constructions

A ditransitive construction can be defined as a construction including an agent (A) argument, a recipient (R) argument, and a theme (T) argument, as in "Mary (A) gives a book (T) to John (R)". While some languages express corresponding content without using a three-place predicate (e.g. by using a serial verb construction), the vast majority of the world's languages have three-place predicates. Such languages present different alignment types (the relation between ditransitive and monotransitive constructions), the three main ones being indirective (T = P ≠ R), secundative (T ≠ P = R), and neutral (T = P = R), where P is the patient-like argument of a monotransitive predicate. Differences in alignment can be seen clearly in flagging (marking by case/adpositions) and verb indexing, i.e. coding properties, but a variety of other syntactic constructions are considered that provide evidence of alignment of behavioral properties, including passivization and relativization. Finally, some issues of lexical variation among ditransitive predicates will be considered, including from the perspective of semantic maps.
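
The three main alignment types defined above can be written as a small decision rule. The sketch below is only an illustration of those definitions, looking at flagging alone. The function name, the string labels for markings and the catch-all value "other" (for patterns outside the three main types) are assumptions made for this example.

```python
def ditransitive_alignment(t_marking, r_marking, p_marking):
    """Classify alignment by comparing the flagging of T and R with that of P.

    t_marking: marking of the theme (T) in the ditransitive construction
    r_marking: marking of the recipient (R) in the ditransitive construction
    p_marking: marking of the patient (P) in the monotransitive construction
    """
    t_like_p = t_marking == p_marking
    r_like_p = r_marking == p_marking
    if t_like_p and r_like_p:
        return "neutral"       # T = P = R
    if t_like_p:
        return "indirective"   # T = P, R marked differently
    if r_like_p:
        return "secundative"   # R = P, T marked differently
    return "other"             # not one of the three main types


# "Mary gives a book to John": T is unmarked like P, R is flagged with "to".
print(ditransitive_alignment(t_marking="zero", r_marking="to", p_marking="zero"))

# Hypothetical pattern in which R is unmarked like P and T carries a marker.
print(ditransitive_alignment(t_marking="with", r_marking="zero", p_marking="zero"))
```

The same comparison could be repeated for verb indexing or for behavioral properties such as passivization, each of which may point to a different alignment in one and the same language.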

Date: 2010-10-25 15:15 - 17:00

Location: Lilla hörsalen, Humanisten

SEMINAR

Stefan Schulz (Freiburg) - Automated codification of medical documents

Abstract:

The advance of electronic health records not only in industrialized countries but also in the developing world has made large amounts of medical narratives electronically available. These texts exhibit special characteristics, such as un- and paragrammatical sentences, the use of acronyms and abbreviations, and the abundance of typing errors.

In a case study from a Brazilian university hospital, selected content of medical discharge summaries in Portuguese was automatically mapped to SNOMED CT, a comprehensive medical terminology. SNOMED CT provides (chiefly multi-word) English and Spanish terms. In order to optimally align text sequences and terms, token chains that exhibited term-typical POS tags were selected from the source. These term candidates, together with all SNOMED terms, were submitted to a morphosemantic normalization process, which consists of the extraction of meaningful subwords and their mapping to a language-independent interlingua.

The accuracy of the mapping, measured against a gold standard built by two domain experts (kappa 0.89), ranged from 0.66 for two-word terms to 0.89 for five-word terms.
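
The abstract does not say which kappa statistic was used. Assuming it is Cohen's kappa for two annotators, agreement is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below shows that calculation on invented labels. The function name and the toy annotations are hypothetical and unrelated to the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same single label
    return (observed - expected) / (1.0 - expected)


# Invented toy annotations: the two experts disagree on the last item only.
expert_1 = ["ok", "ok", "ok", "bad", "bad", "ok", "bad", "ok", "ok", "bad"]
expert_2 = ["ok", "ok", "ok", "bad", "bad", "ok", "bad", "ok", "ok", "ok"]

kappa = cohen_kappa(expert_1, expert_2)
print(f"kappa = {kappa:.2f}")
```

Because kappa discounts chance agreement, it comes out lower than the raw share of items on which the two annotators agree.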

Schulz on the web

Date: 2010-10-08 10:15 - 12:00

Location: L307, Lennart Torstenssonsgatan 8
