SEMINAR

The publication boom of the last decade has created demand for automatic access to the content of the scientific literature. I will present a method that exploits automatically generated scientific discourse annotations and uses them for a number of biomedical applications, including the classification of paper types, the detection of relevant evidence, and the creation of a content model for the summarisation of scientific articles. The scientific discourse is captured in terms of the CoreSC scheme, which annotates 11 content-based concepts, such as Hypothesis, Result, and Conclusion, at the sentence level.
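
As a toy illustration of the kind of sentence-level tagging CoreSC involves, here is a minimal sketch using scikit-learn; the category names come from the abstract, but the training sentences and the choice of classifier are illustrative assumptions, not the speaker's actual system.

```python
# Minimal sketch of sentence-level CoreSC-style tagging. The category names
# follow the abstract; the training sentences and the choice of classifier
# are illustrative assumptions, not the speaker's system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled sentences (CoreSC distinguishes 11 categories; three shown).
train_sentences = [
    "We hypothesise that the mutation disrupts ligand binding.",
    "Expression levels increased threefold after treatment.",
    "These findings suggest a regulatory role for the protein.",
]
train_labels = ["Hypothesis", "Result", "Conclusion"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(train_sentences, train_labels)

# Tag a new sentence with its most likely CoreSC category.
print(model.predict(["We conclude that the pathway is conserved."]))
```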

Date: 2013-12-05 10:30 - 11:30

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)

SEMINAR

I have designed a protocol on top of HTTP and written a first version of a small (fewer than 1000 lines of code) Prolog library and a tiny JavaScript program that implement it. The protocol (tentatively named Logic Programming Transfer Protocol, or LPTP for short) enables us to "program in logic" on the web, concurrently if we choose to, and in a distributed fashion if we want. It allows us to develop applications in a style that I would like to refer to as "web logic programming", and it has changed the way I think about web programming in a big way.
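
Since the abstract does not specify LPTP's wire format, the following is a purely hypothetical sketch of what posing a Prolog goal over HTTP might look like; the endpoint path, JSON shape, and use of POST are all assumptions made for illustration, not the protocol's actual design.

```python
# Hypothetical sketch of a client for an LPTP-style service. The abstract
# does not specify the wire format; the endpoint path, JSON shape, and use
# of POST are assumptions made purely for illustration.
import json
import urllib.request

def ask(server, goal):
    """POST a Prolog goal to the server and return the bindings it reports."""
    payload = json.dumps({"query": goal}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/query",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # e.g. {"solutions": [{"X": "anna"}]}

# Usage (assumes a Prolog server is listening locally):
# print(ask("http://localhost:8080", "parent(X, bob)"))
```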

I'm going to make a number of claims (outrageous claims, you may think) about the potential I see this having: 1) It might provide us with an interesting basis for designing and implementing new agent programming platforms. 2) It might be a suitable point of departure for a new kind of semantic web, a "wild semantic web" that we could start spinning tomorrow if we wanted to. 3) It constitutes an ideal way to interface Prolog (and probably many other relational/logic programming languages) with JavaScript, the most important programming language available in web browsers. 4) It allows us to fairly easily build dialogue managers based on state machines (as in SCXML), frames (as in VoiceXML), information states, or even AI-style planning. 5) It suggests a logical interface with other NLP tools (such as GF) and might thus be a promising way to finally implement CLT Cloud.

In this talk I will argue for 1, 2, and 3, as well as demonstrate with running examples. I may also touch briefly on 4 and 5, but most of that will probably have to wait.

Date: 2013-11-28 10:30 - 11:30

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)

SEMINAR

The seminar will consist of brief presentations and discussions related to the CLT members' participation in the CIKM conference - the 22nd International Conference on Information and Knowledge Management - held in Burlingame, USA, from October 27 to November 1, 2013 <http://www.cikm2013.org/workshops.php>.

The topics of the seminar include the following:

  • Workshop: Mining Unstructured Big Data Using Natural Language Processing
  • Workshop: Exploiting Semantic Annotations in Information Retrieval
  • Selected Papers from the main conference

Date: 2013-11-14 10:30 - 11:30

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)

SEMINAR

Date: 2013-11-07 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Who is the female counterpart of Justin Bieber?

Recent advances in the field of neural networks have enabled the training of deep networks, i.e. networks that are able to learn multiple levels of representation. This allows a network to learn highly non-linear mappings and makes it less dependent on clever feature engineering to achieve good performance. In fact, these networks automatically learn feature representations, directly from the data, that may be extracted and used in other applications. When trained on text, the learned word representations have been shown to express multiple dimensions of similarity, encoded as a simple superposition of semantic and syntactic basis vectors.

I will describe how these word representations may be derived using deep learning, and subsequently employ them to answer the question posed at the beginning of this text.
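
To give a flavour of the arithmetic involved, here is a minimal sketch of the classic analogy query over word vectors, using tiny hand-made vectors in place of learned ones; with real embeddings trained on text, applying the same offset to the vector for "Justin Bieber" is what answers the question in the title.

```python
# Sketch of the analogy arithmetic over word vectors, using tiny hand-made
# vectors in place of real learned embeddings. With embeddings trained on
# text, the same offset trick applied to v("bieber") answers the title
# question.
import numpy as np

vectors = {  # toy 3-d "embeddings", for illustration only
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
}

def nearest(target, exclude):
    """Word whose vector is most cosine-similar to the target."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cos(vectors[w], target))

# "man is to king as woman is to ?": v(king) - v(man) + v(woman)
query = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(query, exclude={"king", "man", "woman"}))  # -> queen
```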

Date: 2013-10-31 10:30 - 11:30

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)

SEMINAR

We present a novel method for entity disambiguation in anonymized graphs based on local neighborhood structure.

Most existing approaches leverage node information (entity attributes), which might not be available in many contexts due to privacy concerns, or information about the sources of the data. We consider this problem in the supervised setting, where we are provided only with a base graph and a set of nodes labelled as ambiguous or unambiguous. We characterize the similarity between two nodes based on their local neighborhood structure using graph kernels, and solve the resulting classification task using SVMs. We give empirical evidence on two real-world datasets, comparing our approach to a state-of-the-art method and highlighting the advantages of our approach. We show that, despite using less information, our method is significantly better in terms of speed, accuracy, or both.

We also present extensions of two existing graph kernels, namely the direct product kernel and the shortest-path kernel, with significant improvements in accuracy. For the direct product kernel, our extension also provides significant computational benefits. Moreover, we design and implement the algorithms of our method to work in a distributed fashion using the GraphLab framework, ensuring high scalability.
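
For readers unfamiliar with the general pattern, here is a minimal sketch of a neighborhood-based graph-kernel classifier: each node is represented by its ego network, neighborhoods are compared with a simple shortest-path kernel, and an SVM with a precomputed Gram matrix does the classification. The graph, labels, and kernel below are illustrative assumptions; the paper's extended kernels and its distributed GraphLab implementation are not reproduced here.

```python
# Minimal sketch of the pipeline's shape: represent each node by its local
# neighborhood graph, compare neighborhoods with a simple shortest-path
# kernel, and classify with an SVM. The paper's extended kernels and its
# distributed GraphLab implementation are not reproduced here.
from collections import Counter

import networkx as nx
import numpy as np
from sklearn.svm import SVC

def sp_histogram(graph):
    """Histogram of pairwise shortest-path lengths in a graph."""
    counts = Counter()
    for _, lengths in nx.all_pairs_shortest_path_length(graph):
        counts.update(lengths.values())
    return counts

def sp_kernel(g1, g2):
    """Shortest-path kernel: dot product of path-length histograms."""
    h1, h2 = sp_histogram(g1), sp_histogram(g2)
    return sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())

def gram_matrix(graphs):
    n = len(graphs)
    return np.array([[sp_kernel(graphs[i], graphs[j]) for j in range(n)]
                     for i in range(n)], dtype=float)

# Toy data: ego networks of nodes labelled ambiguous (1) or unambiguous (0).
base = nx.karate_club_graph()
nodes = [0, 1, 33, 2]
neighborhoods = [nx.ego_graph(base, v, radius=1) for v in nodes]
labels = [0, 0, 1, 1]  # illustrative labels only

svm = SVC(kernel="precomputed").fit(gram_matrix(neighborhoods), labels)
print(svm.predict(gram_matrix(neighborhoods)))
```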

Date: 2013-10-17 09:00 - 10:00

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)

SEMINAR

Finding coordinations provides useful information for many NLP endeavors. However, major treebanks do not reliably annotate coordination. This makes it virtually impossible to detect coordinations in which two conjuncts are separated by punctuation rather than by a coordinating conjunction.

In this talk I present an annotation scheme for the Penn Treebank which introduces a distinction between coordinating and non-coordinating punctuation. General annotation guidelines and problematic cases are discussed, as well as classification experiments.
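
To make the problem concrete, the sketch below contrasts a conjunction-based coordination with a punctuation-based one; the example trees and the naive detector are illustrative only and do not reproduce the annotation scheme presented in the talk.

```python
# Illustration of the phenomenon: in "sell securities, cut rates" the two
# verb phrases are coordinated by a comma rather than a conjunction, yet the
# Penn Treebank gives that comma the same bare (, ,) tag as any other comma.
# The trees and detector below are illustrative, not the talk's scheme.
from nltk import Tree

# Conjunction-based coordination: easy to detect via the CC node.
with_cc = Tree.fromstring(
    "(VP (VP (VB sell) (NNS securities)) (CC and) (VP (VB cut) (NNS rates)))")

# Punctuation-based coordination: the comma does the coordinating.
with_comma = Tree.fromstring(
    "(VP (VP (VB sell) (NNS securities)) (, ,) (VP (VB cut) (NNS rates)))")

def looks_coordinated(tree):
    """Naive detector: flags coordination only when an explicit CC is present."""
    return any(child.label() == "CC"
               for child in tree if isinstance(child, Tree))

print(looks_coordinated(with_cc))     # True
print(looks_coordinated(with_comma))  # False -- the case the scheme recovers
```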

Wolfgang Maier (Düsseldorf)

Date: 2013-11-21 10:30 - 11:30

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)

SEMINAR

One of the fundamental tasks of Grammatical Inference (GI) is the induction of a formal grammar that, on the basis of access to either only positive or both positive and negative examples, correctly characterizes (linguistic) data. The task is known to be computationally intractable in various incarnations, even when the chosen formalism is fairly limited in generative power, such as a finite-state automaton. However, recent heuristic and machine learning approaches have improved the picture somewhat: fairly large non-probabilistic and probabilistic automata can now be learned with some degree of success. These can both yield useful generalizations and be used in applications.

In this talk I present some recent results related to the grammatical inference of finite automata, discuss some practical applications related to phonology and morphology, and draw some connections with alternative approaches to inducing small grammars from data, such as Bayesian methods based on Minimum Description Length/Minimum Message Length.
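
As a concrete point of reference, here is a sketch of the textbook starting point for inducing a finite automaton from positive examples: the prefix tree acceptor (PTA), which state-merging learners such as RPNI then generalize by merging states. The heuristics and results discussed in the talk are not reproduced here.

```python
# Sketch of the textbook starting point for inducing a finite automaton from
# positive examples: a prefix tree acceptor (PTA). State-merging learners
# (e.g. RPNI) generalize by merging PTA states; the heuristics discussed in
# the talk are not reproduced here.
def build_pta(positive_examples):
    """Build a PTA as (transitions, accepting states); state 0 is initial."""
    transitions, accepting, next_state = {}, set(), 1
    for word in positive_examples:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting

def accepts(automaton, word):
    transitions, accepting = automaton
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting

pta = build_pta(["ab", "abab", "abb"])  # toy positive sample
print(accepts(pta, "abab"))  # True
print(accepts(pta, "ba"))    # False: the PTA accepts exactly the sample
```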

Date: 2013-09-26 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8

SEMINAR

Figurative language poses a serious challenge to NLP systems. The use of idiomatic and metaphoric expressions is not only extremely widespread in natural language; many figurative expressions, in particular idioms, also behave idiosyncratically. These idiosyncrasies are not restricted to a non-compositional meaning but often also extend to syntactic properties, selectional preferences etc. To deal appropriately with such expressions, NLP tools need to detect figurative language and assign the correct analyses to non-literal expressions.

While there has been quite a bit of work on determining the general 'idiomaticity' of an expression (type-based approaches), this only solves part of the problem, as many expressions, such as "break the ice" or "play with fire", can also have a literal, perfectly compositional meaning (e.g. "break the ice on the duck pond"). Such expressions have to be disambiguated in context (token-based approaches). Token-based approaches have received increased attention recently. In this talk, I will present an unsupervised method for token-based idiom detection. The method exploits the fact that well-formed texts exhibit lexical cohesion, i.e. words are semantically related to other words in the context.
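
To illustrate the cohesion intuition (not the talk's actual relatedness measure), the sketch below scores how well an idiom component fits its context using toy word vectors: high relatedness suggests a literal reading, low relatedness an idiomatic one. The vectors, words, and scoring are made up for illustration.

```python
# Sketch of the lexical-cohesion idea: if the component words of "break the
# ice" are semantically unrelated to their context, the expression is likely
# idiomatic; if they cohere with the context ("pond", "frozen"), it is likely
# literal. The toy vectors are illustrative assumptions; the talk's actual
# relatedness measure is not reproduced here.
import numpy as np

vectors = {  # toy word embeddings
    "ice":     np.array([1.0, 0.1, 0.0]),
    "pond":    np.array([0.9, 0.2, 0.1]),
    "frozen":  np.array([0.8, 0.1, 0.2]),
    "deal":    np.array([0.0, 1.0, 0.1]),
    "meeting": np.array([0.1, 0.9, 0.2]),
}

def cohesion(component, context_words):
    """Mean cosine similarity between an idiom component and its context."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.mean([cos(vectors[component], vectors[w])
                    for w in context_words])

print(cohesion("ice", ["pond", "frozen"]))   # high -> literal reading
print(cohesion("ice", ["deal", "meeting"]))  # low  -> idiomatic reading
```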

Caroline Sporleder (Computational Linguistics and Digitization, Universität Trier)

Date: 2013-09-19 10:30 - 11:30

Location: T219, Olof Wijksgatan 6

SEMINAR

Miriam R L Petruck holds a Ph.D. in Linguistics from the University of California, Berkeley. Having written the first dissertation in Frame Semantics and the lexicon under the supervision of Charles J. Fillmore, Petruck has been contributing to the development of the theory since before Fillmore's founding of FrameNet in 1997. She has published numerous papers demonstrating the efficacy of Frame Semantics for the characterization of the lexicon, as well as on FrameNet methodology and practice. Petruck is a member of the Artificial Intelligence group at the International Computer Science Institute (ICSI), working primarily on FrameNet.

The abstract for the talk is attached as a PDF document.

Date: 2013-10-24 10:30 - 11:30

Location: EDIT Room 3364, E-building, Chalmers (Johanneberg)
