CLT seminar: Detmar Meurers - On the Automatic Analysis of Learner Corpora: Modeling between Surface Features and Linguistic Abstraction

SEMINAR

Learner corpora, collections of language produced by language learners, have been systematically collected since the 1990s. With increasing numbers and types of learner corpora becoming available, there is in principle a growing empirical basis on which theories of second language acquisition can be informed and applications can be trained and tested. While most research on learner corpora has analyzed the (co)occurrence of (sequences of) words or manual error annotation, tools for automatically analyzing large corpora in terms of linguistic abstractions such as parts of speech, syntactic constituency, or dependency are available, though they also raise fundamental conceptual questions related to the linguistic annotation of learner language.

The situation also raises questions reminiscent of the discussion on the role of exemplars vs. prototypes in language, namely when surface forms as such are useful in a corpus-based analysis, and when linguistic categories abstracting and generalizing over surface forms are.

In this talk, I want to illustrate some of the underlying conceptual issues and then exemplify the trade-off between surface-based and deeper linguistic modeling based on our experiments in Native Language Identification, the task of automatically determining the native language of a non-native writer.

This talk is based on joint work with Serhiy Bykh and Julia Krivanek:

  • Detmar Meurers, Julia Krivanek and Serhiy Bykh (2014): On the Automatic Analysis of Learner Corpora: Native Language Identification as Experimental Testbed of Language Modeling between Surface Features and Linguistic Abstraction. Diachrony and Synchrony in English Corpus Studies edited by Alexandro Alcaraz Sintes and Salvador Valera. Frankfurt am Main: Peter Lang. 285-314.

Depending on interests/time, I may also include aspects of:

  • Serhiy Bykh and Detmar Meurers (2014): Exploring Syntactic Features for Native Language Identification: A Variationist Perspective on Feature Encoding and Ensemble Optimization. Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland.

  • Serhiy Bykh, Sowmya Vajjala, Julia Krivanek, and Detmar Meurers (2013): Combining Shallow and Linguistically Motivated Features in Native Language Identification. Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), Atlanta, GA, USA.

  • Serhiy Bykh and Detmar Meurers (2012): Native Language Identification Using Recurring N-grams - Investigating Abstraction and Domain Dependence. Proceedings of the 24th International Conference on Computational Linguistics (COLING), Mumbai, India.

http://www.sfs.uni-tuebingen.de/~dm/

Date: 2014-09-25 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8


Page updated: 2014-08-29 11:34
