A Semantically Annotated Swedish Medical Corpus

Sourcetitle: 
Proceedings of the 6th Language Resources and Evaluation Conference (LREC)
Year of publication: 
2008
PublicationType: 
Conference paper - peer reviewed

With the information overload in the life sciences, there is an increasing need for annotated corpora, particularly corpora annotated with biological and biomedical entities; such corpora are the driving force behind data-driven language processing applications and the empirical approach to language study. Inspired by the GENIA Corpus, one of the very few such corpora and one used extensively in the biomedical field, and in order to meet the needs of our own research, we have collected a Swedish medical corpus, the MEDLEX Corpus. MEDLEX is a large, structurally and linguistically annotated document collection consisting of a variety of text documents from various medical subfields. It does not focus on a particular medical genre, because large Swedish resources within any single medical subdomain are lacking. From this collection we selected 300 documents, which were manually examined by two human experts who inspected, corrected and/or modified the automatically provided annotations according to a set of labelling guidelines. The annotations consist of medical terminology from the Swedish and English MeSH® (Medical Subject Headings) thesauri, as well as named entity labels produced by enhanced named entity recognition software.

Page updated: 2012-01-30 14:04
