MLT seminar: Detmar Meurers - Readability analysis as an experimental sandbox for exploring linguistic complexity

SEMINAR

The analysis of readability has traditionally relied on surface properties of language, such as average sentence and word lengths and specific word lists. At the same time, second language acquisition (SLA) research has a long tradition of analyzing the Complexity, Accuracy, and Fluency (CAF) of language produced by learners. Reusing SLA measures of learner language complexity to analyze readability, Sowmya Vajjala and I explored which aspects of linguistic modeling can successfully be used to predict the readability of a native-language text.
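As a rough illustration of the traditional surface features mentioned above, the following Python sketch computes average sentence length (in words) and average word length (in characters). It is not the feature extractor used in the work described here: the function name `surface_features` is hypothetical, and the naive regex-based sentence and word splitting is an assumption made for brevity.

```python
import re


def surface_features(text: str) -> dict:
    """Compute two classic surface readability features.

    Assumption (for illustration only): sentences end at '.', '!' or '?',
    and a word is a run of letters, digits or apostrophes. Real systems
    use proper tokenizers and sentence splitters.
    """
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat any run of letters, digits or apostrophes as a word.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        return {"avg_sentence_length": 0.0, "avg_word_length": 0.0}
    return {
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


if __name__ == "__main__":
    print(surface_features("The cat sat. It was happy on the mat."))
```

On the two-sentence example above, this gives 4.5 words per sentence and 3.0 characters per word. Measures like these capture only the surface of a text, which is the gap the broader linguistic feature set discussed in this talk is meant to address.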

Using various machine learning setups and corpora, we show that a broad range of linguistic properties is highly indicative of the readability of documents, from graded readers to web pages and TV programs targeting different age groups. Based on performance on the Common Core State Standards data set, the readability model using our full linguistic feature set is currently the best non-commercial readability model available for English, and second overall after the commercial ETS model.

This talk focuses on our document-level readability models and links them with our proficiency classification work, i.e., the task of determining a writer's language proficiency from a text they wrote in their second language. The following publications, available at http://purl.org/dm/papers, provide more detail:

Sowmya Vajjala and Detmar Meurers (in press) "Readability Assessment for Text Simplification: From Analyzing Documents to Identifying Sentential Simplifications". International Journal of Applied Linguistics, Special Issue on Current Research in Readability and Text Simplification, edited by Thomas François & Delphine Bernhard.

Sowmya Vajjala and Detmar Meurers (2014) "Assessing the relative reading level of sentence pairs for text simplification". Proceedings of EACL. Gothenburg, Sweden.

Sowmya Vajjala and Detmar Meurers (2014) "Exploring Measures of 'Readability' for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs". Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), EACL. Gothenburg, Sweden.

Sowmya Vajjala and Detmar Meurers (2013) "On The Applicability of Readability Models to Web Texts". Proceedings of the Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), ACL. Sofia, Bulgaria.

Julia Hancke, Sowmya Vajjala and Detmar Meurers (2012) "Readability Classification for German using lexical, syntactic, and morphological features". Proceedings of COLING. Mumbai, India.

Sowmya Vajjala and Detmar Meurers (2012) "On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition". Proceedings of BEA7, ACL. Montreal, Canada.

Date: 2014-09-25 15:15 - 16:30

Location: L308, Lennart Torstenssonsgatan 8

Page updated: 2014-09-19 16:22