Morphologically complex languages are rich in inflectional and derivational morphology and in compound formation. In contrast to languages like English, Hindi or Chinese, such languages may have tens or even hundreds of inflectional forms for a single noun, for instance. This variation is a challenge for Information Retrieval (IR) methods that match query keywords against text indexes. The talk discusses reductive methods (such as stemming and lemmatization) and generative methods (inflectional stem generation and full word form generation) for IR in several languages, covering both index construction and query processing. Emphasis is given to IR effectiveness and to the contribution of morphological processing to it.
Kalervo Järvelin (Tampere)