Automating the development of multilingual grammars

Year of publication: 
Licentiate thesis

This thesis explores alternative ways of automating the development of multilingual GF (Grammatical Framework) grammars. The goal is to achieve semantics-preserving machine translation within a limited or semi-limited domain.

First, we describe an experiment that investigates the relation between language skills, programming skills and the effort needed to develop a grammar for a natural language. Along with this, we present a prototype of an example-based system aimed at automating and simplifying the task of GF grammar writing. It does so by partially alleviating the burden of GF programming and by facilitating the integration of SMT (statistical machine translation) tools and feedback from native informants into the development of a GF grammar.

Secondly, we present work on ontology representation in GF. The goal is to automatically build a robust, language-independent semantic interlingua for the multilingual grammars by projecting the ontology as a GF grammar. In this way the ontology can be verbalized with little effort in a number of languages. The resulting GF grammar is a controlled language describing ontological concepts and their relations as defined in the initial ontology.

A third approach to automating and enhancing GF grammars is a hybrid translation system that combines rule-based, grammar-driven translation with SMT. To increase the coverage of the grammar, we enriched it with a bilingual lexicon built on the fly with the aid of an SMT system specialised on the corpus. The goal of the hybrid system is to parse English patent claims from the biomedical domain and translate them into French. This work represents the first large-scale experiment in using GF to parse arbitrary unstructured text.

A final direction is the development of a general language resource for Romanian. Domain-specific resources can build on it as a library for handling syntactic phenomena. In this way more multilingual GF grammars can be ported to Romanian without re-implementing the linguistic technicalities of the language every time.


Page updated: 2012-01-30 14:04
