Most machine learning methods we use in NLP are methods for learning from unbiased labeled data. In NLP, however, we always learn from biased data. When we train a parser on a treebank, for example, and apply it to emails, legal text, or university websites, our training data is biased in terms of genre, style, recency, and possibly dialect. In this talk we present learning algorithms that automatically correct for this bias, that is, algorithms for learning under sample bias.
The first part of the talk focuses on large-margin perceptron learning algorithms for learning from weighted data. We discuss sampling vs. weighting and different weight functions. In the second part of the talk we consider the more challenging scenario where the target data cannot be assumed to form a single, coherent distribution, but where instead we need to adapt our model to every new data point on the fly.
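As a rough illustration of learning from weighted data, the sketch below shows one common way to fold per-instance weights into a margin perceptron: each update is scaled by the weight of the example that triggered it. This is a minimal sketch under stated assumptions, not the algorithm presented in the talk. The function names, the binary labels in {-1, +1}, the fixed margin, and the toy weights are all hypothetical choices made for the example; in practice the weights would come from some estimate of how closely each training example resembles the target data.

```python
def train_weighted_perceptron(X, y, instance_weights, epochs=10, margin=1.0):
    """Margin perceptron with instance weighting (illustrative sketch).

    X: list of feature vectors (lists of floats)
    y: labels in {-1, +1}
    instance_weights: non-negative floats, one per training example
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i, c_i in zip(X, y, instance_weights):
            score = sum(wj * xj for wj, xj in zip(w, x_i)) + b
            # Update whenever the example is misclassified or inside the margin.
            if y_i * score <= margin:
                # The instance weight c_i scales the size of the update.
                w = [wj + c_i * y_i * xj for wj, xj in zip(w, x_i)]
                b += c_i * y_i
    return w, b


def predict(w, b, x):
    """Return +1 or -1 for a single feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Toy data: two positive and two negative points, with hypothetical weights.
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
weights = [1.0, 0.5, 1.0, 0.5]

w, b = train_weighted_perceptron(X, y, weights)
predictions = [predict(w, b, x) for x in X]
```

Weighting of this kind is the deterministic counterpart of sampling: instead of drawing training examples with probability proportional to their weight, every example is visited and its influence on the model is scaled directly.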
Anders Søgaard at Center For Sprogteknologi, Copenhagen: http://cst.dk/anders/main.html