Current Question Answering (QA) technology focuses mainly on mining written text sources to extract answers to questions, both from open-domain and restricted-domain document collections. However, most human interaction occurs through speech, e.g. meetings, seminars, lectures and telephone conversations. All these scenarios provide large amounts of information that could be mined by QA systems. As a consequence, the exploitation of speech sources brings QA a step closer to many real-world applications.
In addition, speech transcriptions differ from classical written text in many respects, which makes QA on speech transcriptions an interesting research area. The most common differences are:
The aim of this third year of QAST is to provide a framework in which QA systems can be evaluated in a realistic scenario, where the answers to oral and written questions (factual and definitional) in English, French and Spanish have to be extracted from speech transcriptions (manual and automatic) in the respective language.
The particular scenario consists of answering oral and written questions related to speech presentations: European Parliament Sessions (Spanish or English) and Broadcast News (French).
Relevant points will be:
The proposed evaluation of QA on automatic speech transcriptions is best understood from the perspective of the target application: searching audio streams with natural language questions. In this application, the input is a spontaneous oral question or a written question that is matched against the automatic transcriptions generated behind the scenes for all available audio streams. However, even though the QA system searches automatic transcriptions, the output made available to the user is a pair of start/end pointers to the location in the audio stream where the exact answer is found.
Consider the following example: one audio stream contains the information "Jacques Chirac went to Berlin" and the user wants to know where the French president has been: "Where did Jacques Chirac go?". If a perfect transcription of the audio stream were available, this example would have an obvious solution and the whole problem would be no different from regular QA on written text. However, consider the case where the automatic transcription of the above stream contains two errors: "went" is transcribed as "ate" and "Berlin" as "Barcelona". Hence the automatic transcription of the full stream is "Jacques Chirac ate to Barcelona". In this case, the correct answer to be extracted is "Barcelona", because this is the text that points to the correct answer in the audio stream.
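The mechanics of this example can be sketched in a few lines of code. The sketch below is purely illustrative and is not part of the QAST specification: it assumes a hypothetical word-level time alignment of the ASR output (all names, timestamps and the `locate_answer` helper are invented for illustration) and shows how the misrecognized token "Barcelona" is returned together with the start/end pointers that locate the correct answer in the audio stream.

```python
# Illustrative sketch only: a word-aligned automatic transcription and a
# helper that maps an extracted answer back to audio start/end pointers.
# All data and names are hypothetical; real QAST systems may differ.
from dataclasses import dataclass

@dataclass
class Word:
    token: str    # word hypothesized by the ASR system
    start: float  # start time in the audio stream (seconds)
    end: float    # end time in the audio stream (seconds)

# Automatic transcription with two ASR errors ("went" -> "ate",
# "Berlin" -> "Barcelona"), each word time-aligned to the audio.
transcript = [
    Word("Jacques", 0.0, 0.4),
    Word("Chirac", 0.4, 0.9),
    Word("ate", 0.9, 1.1),
    Word("to", 1.1, 1.2),
    Word("Barcelona", 1.2, 1.9),
]

def locate_answer(transcript, answer_tokens):
    """Return (start, end) audio pointers for the answer span, or None."""
    n = len(answer_tokens)
    for i in range(len(transcript) - n + 1):
        window = transcript[i:i + n]
        if [w.token.lower() for w in window] == [t.lower() for t in answer_tokens]:
            return window[0].start, window[-1].end
    return None

# The answer string extracted from the errorful transcription is
# "Barcelona"; the user receives the time span (1.2, 1.9), which points
# at the segment of audio where the correct answer ("Berlin") is spoken.
print(locate_answer(transcript, ["Barcelona"]))
```

The point of the sketch is that evaluation operates on the transcription ("Barcelona" is the string to extract), while the user-facing result is the audio region those pointers delimit.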
The above example illustrates the two principles that guide this track: