In an enterprise environment, a search system often crawls and indexes a large number of different data sources – databases, content management systems, external web pages, file shares with different types of documents, etc. Each of the data sources or sub-sources may have a primary target group – e.g. Sales, Engineers, Marketing, Doctors, Nurses, etc., depending on the type of organization.
The purpose of the (unified) search system is to serve as a platform (a single entry point) that satisfies the information needs of all the different groups in an organization. However, since search queries are often short (~2.2 words) and ambiguous, and users have different backgrounds, the system employs a number of techniques for filtering and drilling down into the search results. One such technique is facets, i.e. filtering based on data source, additional keywords, dates, time, etc.
On the other hand, there are at least two types of users (user behaviours): those who know exactly what they are looking for and how to find it, and who use search instead of menu clicks; and those who do not know exactly what they are looking for, nor where the information might be found. We can consider these two groups as the two extremes on a much more fine-grained scale.
We would like to concentrate on the second group of users, who often engage in some sort of dialogue with the search system. Such users may interact with the system in several ways during a search session – they may rewrite and expand their original query, filter it by facets, and click on documents until they finally discover (or fail to discover) what they were looking for.
Spoken dialogue systems are computer systems which use speech as their primary input and output channels. Dialogue systems are primarily used in situations where the visual and tactile channels are not available, for instance while driving, but also to replace human operators, for instance in call centres. Recently, spoken dialogue systems have become more widespread with the arrival of Apple's Siri and Google's Voice Actions, even outside the traditional areas of use. As speech and voice have the potential to transmit large quantities of information very fast compared to traditional GUI interaction, this development is likely to continue.
A spoken dialogue system typically consists of a dialogue manager, an automatic speech recogniser (ASR), a text-to-speech engine, modules for the interpretation and generation of utterances, and finally some kind of application logic.
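As a minimal illustration of this architecture (a hedged Python sketch; the class, module names and interfaces are hypothetical placeholders, not those of any particular system):

    # A minimal sketch of the classic spoken dialogue system loop.
    # All module names and interfaces here are hypothetical placeholders.

    class DialogueSystem:
        def __init__(self, asr, parser, dialogue_manager, generator, tts, app):
            self.asr = asr                  # automatic speech recogniser: audio -> text
            self.parser = parser            # interpretation: text -> dialogue moves
            self.dm = dialogue_manager      # updates dialogue state, selects next move
            self.generator = generator      # generation: dialogue move -> text
            self.tts = tts                  # text-to-speech engine: text -> audio
            self.app = app                  # application logic (database, device, ...)

        def turn(self, audio_in):
            text = self.asr.recognise(audio_in)
            user_moves = self.parser.interpret(text)
            system_move = self.dm.update(user_moves, self.app)
            response = self.generator.generate(system_move)
            return self.tts.synthesise(response)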
Voice search is a term that has emerged in recent years. The user speaks a search query, and the system responds by returning a hit list, much like an ordinary Google search. If the hit list does not contain the desired hit (document, music file, web site, etc.), the user needs to do a new voice search with a modified utterance.
The idea of this project is to replace voice search with dialogue-based search, where the user and the system engage in a dialogue over the search results in order to refine the search query.
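As a rough sketch of how such a dialogue could be driven (hypothetical Python; the index interface, the ask_user helper and the hit-count threshold are all invented for illustration):

    # Hypothetical dialogue-based search loop: instead of returning a long
    # hit list and stopping, the system asks the user to refine by facets.
    ask_user = input  # stand-in for the spoken or GUI interaction channel

    def dialogue_search(query, index):
        results = index.search(query)
        while len(results) > 10:  # invented threshold for "too many hits"
            # Ask about the facet that best splits the current result set.
            facet, values = index.most_discriminating_facet(results)
            choice = ask_user("Which %s are you interested in? %s " % (facet, values))
            query[facet] = choice  # narrow the query and search again
            results = index.search(query)
        return results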
The task of the Master's thesis is to explore the possibilities of using dialogue systems and dialogue acts to satisfy the information needs of certain groups of users in a search system. The target group consists of several types of users:
Before documents are sent for indexing in the search system, they are augmented with metadata. The metadata allows us to do a number of things:
The format of an indexed document could look like this:
    <doc>
      <field name="id">6H500F0</field>
      <field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</field>
      <field name="manufacturer">Maxtor Corp.</field>
      <field name="category">electronics</field>
      <field name="category">hard drive</field>
      <field name="features">SATA 3.0Gb/s, NCQ</field>
      <field name="features">8.5ms seek</field>
      <field name="features">16MB cache</field>
      <field name="price">350</field>
      <field name="popularity">6</field>
      <field name="inStock">true</field>
      <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
    </doc>

    <doc>
      <field name="id">1</field>
      <field name="title">London</field>
      <field name="body">London is the capital of UK. London has 7.8 million inhabitants</field>
      <field name="places">London</field>
      <field name="date">2012-11-30</field>
      <field name="author">John Pear</field>
      <field name="author_email">firstname.lastname@example.org</field>
      <field name="author_phone">+44 123 456 789</field>
    </doc>
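Given documents indexed with such fields, a search platform can compute facet counts at query time. As an illustration (a hedged example in the standard query syntax of Apache Solr, whose document format the above resembles; the host and port are the usual local defaults, not taken from the original):

    http://localhost:8983/solr/select?q=hard+drive&fq=inStock:true&facet=true&facet.field=manufacturer&facet.field=category

The response would then include, alongside the hit list, counts of matching documents per manufacturer and per category, which the interface can offer to the user as drill-down filters.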
Peter Ljunglöf (CS) together with Findwise AB and Talkamatic AB.
Talkamatic builds dialogue systems and currently uses a GF-based grammar tool for parsing and generation. A unified language description is compiled into a speech recognition grammar (for Nuance Vocon ASR, PocketSphinx and others), a parser, and a generator.
The problem with this is that the parser can only handle utterances which the ASR can recognize from the ASR grammar. The parser is thus not robust, and if an open dictation grammar is used (such as Dragon Dictate, used in Apple's Siri), the parser is mostly useless.
Currently TDM (the Talkamatic Dialogue Manager) requires all concepts used in the dialogue to be known in advance. Hence, for a dialogue-controlled music player, all artists, songs, genres etc. need to be known and explicitly declared beforehand.
There are disadvantages with this approach. For example, it requires access to an extensive music database in order to be able to build a dialogue interface for a music player.
To simplify the building of dialogue interfaces for this kind of application, it would be useful to have a more robust parser, which can identify sequences of dialogue moves from arbitrary user input strings.
"Play Like a Prayer with Madonna"
answer("Like a Prayer":song_title)
"Play Sisters of Mercy"
answer("Sisters of Mercy":song_name)
"Play Sisters of Mercy"
answer("Sisters of Mercy":artist_name)
"I would like to listen to Jazz"
answer("Jazz":genre)
Several different methods can be used: named entity recognizers, regular expressions, databases, etc., or combinations of these. A strong requirement is that the parser should be built automatically or semi-automatically from a small corpus or database. Computational efficiency is also desirable but less important. The parser must have a Python interface and run on Linux.
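As a toy illustration of what such a robust parser could look like (a sketch only; the move notation follows the examples above, and the tiny in-line database stands in for a real music catalogue):

    import re

    # Toy database standing in for a real music catalogue.
    DATABASE = {
        "artist_name": {"Madonna", "Sisters of Mercy"},
        "song_name":   {"Like a Prayer", "Sisters of Mercy"},
        "genre":       {"Jazz", "Rock"},
    }

    def parse(utterance):
        """Return all candidate dialogue moves found in an arbitrary input
        string. Ambiguous entities ("Sisters of Mercy") yield several moves."""
        moves = []
        for entity_type, names in DATABASE.items():
            for name in names:
                if re.search(re.escape(name), utterance, re.IGNORECASE):
                    moves.append('answer("%s":%s)' % (name, entity_type))
        return moves

    print(parse("Play Sisters of Mercy"))
    # ['answer("Sisters of Mercy":artist_name)', 'answer("Sisters of Mercy":song_name)']

Because the parser searches the whole input string rather than matching it against a fixed grammar, it still produces dialogue moves for out-of-grammar utterances such as open dictation output.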
Peter Ljunglöf (Chalmers, Computer Science and Engineering) or Staffan Larsson (FLoV), together with Talkamatic AB. Talkamatic is a university research spin-off company based in Göteborg.
A small compensation may be paid by Talkamatic AB when the thesis is completed.
The goal of the project is to equip a robotic companion/dialogue manager with topic modelling and information extraction from corpora – for example Wikipedia articles and topic-oriented dialogue corpora – to guide the conversation with a user. Rather than concentrating on a task, a companion engages in free conversation with a user, and must therefore supplement traditional rule-based dialogue management with data-driven models. The project thus attempts to examine ways in which text-driven semantic extraction techniques can be integrated with rule-based dialogue management.
Possible directions of this project are:
A. Topic modelling
The system must robustly recognise the topics of the user's utterances in order to respond appropriately. This method can be used in addition to a rule-based technique. Given a suitable corpus of topic-oriented conversations, a topic model could be trained and used to classify incoming utterances.
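For instance, an off-the-shelf LDA implementation could be used (a minimal sketch using the gensim library; the corpus and parameter choices are placeholders for real data):

    from gensim import corpora, models

    # Placeholder corpus: each item is one tokenised conversation turn.
    texts = [
        ["paris", "holiday", "travel", "france"],
        ["eiffel", "tower", "paris", "visit"],
        ["jazz", "music", "listen", "saxophone"],
    ]

    dictionary = corpora.Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(text) for text in texts]

    # Train a small LDA model; num_topics would be tuned on real data.
    lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary)

    # Classify a new utterance by its topic distribution.
    new_turn = dictionary.doc2bow(["i", "love", "jazz", "music"])
    print(lda.get_document_topics(new_turn))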
B. Named entity recognition and information extraction for question generation
The system could take the initiative and guide the conversation. It could start from some (Wikipedia) article and identify named entities. If any of the entities match the domain of questions that it can handle, it should generate questions about them.
User: I've been to Paris for holiday.
DM: Paris... I see. Have you been to the Eiffel tower?
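A sketch of this pipeline using the spaCy library for named entity recognition (the question templates and their mapping to entity labels are invented for illustration):

    import spacy

    nlp = spacy.load("en_core_web_sm")

    # Invented question templates, keyed by spaCy entity labels.
    TEMPLATES = {
        "GPE": "Have you ever been to {}?",
        "PERSON": "What do you think about {}?",
    }

    def questions_from_text(text):
        doc = nlp(text)
        return [TEMPLATES[ent.label_].format(ent.text)
                for ent in doc.ents if ent.label_ in TEMPLATES]

    print(questions_from_text("I've been to Paris for holiday."))
    # e.g. ["Have you ever been to Paris?"]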
C. Question answering
Supervisors: Simon Dobnik and possibly others from the Dialogue Technology Lab
The task of the project is to use machine learning to learn a mapping between natural language descriptions on the one hand, and sensory observations and commands issued to a simple mobile robot (Lego NXT) on the other. The project would involve building a corpus of descriptions paired with actions: one person guides the robot while another person describes what it does. Multimodal machine-learning models would then be built from this corpus to predict descriptions, actions, or perceptual observations. Finally, the models should be integrated with a simple dialogue manager with which humans can interact, to test the success of the learning in context.
The system should be implemented in ROS (Robot Operating System), which provides access to the sensors and actuators of the robot and allows writing new models in a simple, well-organised manner in Python.
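A minimal sketch of what a ROS node for this project could look like (rospy is ROS's standard Python client library; the topic names and message types below are common defaults, not fixed by the project):

    import rospy
    from geometry_msgs.msg import Twist
    from sensor_msgs.msg import Range

    def on_range(msg):
        # Log each range reading; a learned model would pair such
        # observations with the descriptions collected in the corpus.
        rospy.loginfo("range reading: %.2f m", msg.range)

    rospy.init_node("description_learner")
    rospy.Subscriber("/sonar", Range, on_range)
    cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=10)

    # Issue a simple forward command, the kind of action a human
    # guide would trigger during corpus collection.
    cmd = Twist()
    cmd.linear.x = 0.2
    cmd_pub.publish(cmd)
    rospy.spin()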
Contributions/possible research directions of this thesis:
Supervisors: Simon Dobnik and possibly others from the Dialogue Technology Lab
Call for papers
The Workshop on Language, Action and Perception (APL) is intended to be a networking and community-building event for researchers who are interested in any form of interaction of natural language with the physical world in a computational framework. Both theoretical and practical proposals are welcome. Example areas include semantic theories of human language, action and perception, situated dialogue, situated language acquisition, grounding of language in action and perception, spatial cognition, generation and interpretation of scene descriptions from images and videos, integrated robotic systems, and others. We would also like to welcome researchers from the computer vision and robotics communities, who are increasingly using linguistic representations such as ontologies to improve image interpretation, object recognition, localisation and navigation.
Johan Boye (KTH)
Robin Cooper (University of Gothenburg)
Nigel Crook (Oxford Brookes University)
Simon Dobnik (University of Gothenburg)
Raquel Fernandez (University of Amsterdam, The Netherlands)
John Kelleher (Dublin Institute of Technology, Ireland)
Staffan Larsson (University of Gothenburg)
Peter Ljunglöf (Chalmers University of Technology)
Robert Ross (Dublin Institute of Technology, Ireland)
Please submit your abstract as a PDF document, with author details removed, through EasyChair here.
The submitted abstracts will be published on the workshop web page, and the authors will be given an opportunity to present their work at the workshop in the form of brief oral presentations followed by a poster session.
Following the workshop, the contributing authors will be invited to submit full-length (8-page) papers to be published online in the CEUR Workshop Proceedings (ISSN 1613-0073).
The workshop will take place on October 25th 2012 in room E 1145 of the E building of LTH (Lunds tekniska högskola, part of Lund University), close to the other two SLTC 2012 workshops. You can find a map here.
The room will have a projector and a wifi connection. Eduroam should allow you to connect to the internet. If you don't have an Eduroam account, let us know in advance so that we can request wifi vouchers from the SLTC organisers.
Workshop programme and proceedings
Simon Dobnik, Staffan Larsson, Robin Cooper, Centre for Language Technology and Department of Philosophy, Linguistics, and Theory of Science, Gothenburg University
name [dot] surname [at] gu [dot] se or apl2012 [at] easychair [dot] org
Develop a version of the FraCaS test suite in your native language
The FraCaS test suite was created as part of the FraCaS project back in the nineties. A few years ago, Bill MacCartney (Stanford) made a machine-readable XML version of it, and it has since been used in connection with textual entailment. This project involves developing the test suite further as a multilingual, web-accessible resource for computational semantics.
Robin Cooper, Department of Philosophy, Linguistics and Theory of Science. The project will be carried out in connection with the Dialogue Technology Lab, associated with the Centre for Language Technology.
Based on the previous TrindiKit implementation of the ISU (information-state update) approach to dialogue management (which used a proprietary Prolog), we are now developing Maharani, an open-source Python-based ISU dialogue manager, together with Talkamatic AB. The first release is expected in the spring of 2012.
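In the ISU approach, the dialogue manager maintains an information state (for instance an agenda and a list of questions under discussion) and repeatedly applies update rules consisting of preconditions and effects. A schematic illustration (hypothetical Python only; not the actual TrindiKit or Maharani API):

    # Schematic information-state update (ISU) machinery; hypothetical,
    # not the actual TrindiKit or Maharani API.

    state = {
        "agenda": [],          # the system's planned moves
        "qud": [],             # questions under discussion
        "latest_move": None,   # the user's latest dialogue move
    }

    def integrate_answer(state):
        """Update rule: if the user answered the topmost question under
        discussion, pop it and plan an acknowledgement."""
        move = state["latest_move"]
        if move and move["type"] == "answer" and state["qud"]:
            question = state["qud"].pop(0)
            state["agenda"].append(("acknowledge", question, move["content"]))
            return True
        return False

    RULES = [integrate_answer]

    def update(state):
        # Apply rules until no rule fires (a simple control strategy).
        while any(rule(state) for rule in RULES):
            pass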
Funding: DTL internal
Researchers: Staffan Larsson, Sebastian Berlin
Our purpose is to annotate seven pragmatic categories in the DICO corpus (Villing and Larsson, 2006) of spoken language in an in-vehicle environment, in order to find out more about the distribution of these categories and how they correlate. Some of the annotations have already been made, by one annotator.
To strengthen the results of this work, we are interested in establishing the degree of inter-coder reliability for the annotations. Also, as far as we know, no attempts have been made to annotate enthymemes (Breitholtz and Villing, 2008), a type of defeasible argument, in spoken dialogue. A corpus of spoken discourse annotated for enthymemes would therefore be a welcome addition to the resources that are currently available.
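Inter-coder reliability for categorical annotations of this kind is typically measured with a chance-corrected agreement coefficient such as Cohen's kappa. A minimal sketch with scikit-learn (the category labels and annotations below are invented for illustration):

    from sklearn.metrics import cohen_kappa_score

    # Invented example: two annotators labelling the same ten utterances
    # with pragmatic categories.
    annotator_1 = ["ack", "check", "ack", "repair", "ack",
                   "check", "ack", "ack", "repair", "check"]
    annotator_2 = ["ack", "check", "ack", "ack", "ack",
                   "check", "ack", "ack", "repair", "repair"]

    kappa = cohen_kappa_score(annotator_1, annotator_2)
    print("Cohen's kappa: %.2f" % kappa)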
Researchers: Jessica Villing, Ellen Breitholtz, Staffan Larsson (supervisor)
Funding: CLT internal
The dialogue system currently used at the Dialogue Lab, GoDiS, depends on cut-off values to control turn-taking. This means that when the user has not spoken for a period of time, the system assumes the user is finished and takes the turn. This can lead both to interruptions and to unnecessarily long waits for the user.
To solve this problem, the system has to be able to detect whether a speaker is finished or just pausing within an utterance. If the system can reliably detect the user's end of utterance, it can take the turn more rapidly when the user is finished, and avoid interrupting the user when he/she is not.
To detect end of utterance, we assume that the system needs information from several sources: syntactic information, prosodic information, and the information state. Using machine learning, we will create a statistical model for end-of-utterance detection. For this we will use the Weka toolkit.
We will attempt to create a model that allows the system to differentiate between user pauses within an utterance, and user pauses at the end of an utterance.
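The project itself uses the Weka toolkit; as a language-neutral illustration of the same idea, here is a sketch with scikit-learn instead (the features correspond to the kinds of cues mentioned above, and the training data is invented):

    from sklearn.tree import DecisionTreeClassifier

    # Invented feature vectors: [pause length (s), final pitch slope,
    # syntactic completeness (0/1)] for observed user pauses.
    X = [
        [1.2, -0.8, 1],   # long pause, falling pitch, complete phrase
        [0.3,  0.4, 0],   # short pause, rising pitch, incomplete phrase
        [0.9, -0.5, 1],
        [0.2,  0.1, 0],
    ]
    y = [1, 0, 1, 0]      # 1 = end of utterance, 0 = mid-utterance pause

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[1.0, -0.6, 1]]))   # likely end of utterance -> [1]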
Funding: CLT internal.
Duration: August 2011 - October 2012
Researchers: Kristina Lundholm Fors, Staffan Larsson (supervisor).
Driver distraction is a common cause of accidents, and is often in turn caused by the driver interacting with technologies such as mobile phones, media players, or navigation systems. A multimodal HMI (human-machine interface) system complements the traditional interaction modalities (visual output and haptic input) with spoken interaction. Speech solutions generally aim to increase safety, but immature solutions may end up distracting the driver and decreasing safety.
In the SIMSI project, we aim to integrate an existing safety-oriented multimodal HMI system based on academic research into a commercial-grade HMI platform, and to use this integrated system for research on dialogue strategies for cognitive load management and integrated multimodality. We expect to achieve an HMI solution which can be reliably shown to increase safety by considerably reducing distraction, cognitive load, and head-down time compared to other state-of-the-art in-vehicle interaction models. The HMI will be evaluated both in simulators and in real traffic.
Funding agency: Vinnova (FFI programme)
Partners: Talkamatic AB, Mecel AB
Contact person at CLT: Staffan Larsson