
Tasks

Participants in QAST 2009 can take part in six different tasks, based on three corpora: European Parliament debates in English (T1) and in Spanish (T2), and French broadcast news (T3). For each corpus, the data include the manual transcriptions and three different ASR outputs (automatic transcriptions).
 
The tasks defined in the QAST track are the following:
  • T1a: Question Answering of English written questions in the manual and automatic transcriptions of European Parliament Plenary sessions in English (EPPS English corpus).
  • T1b: Question Answering of manual transcriptions of English spontaneous oral questions in the manual and automatic transcriptions of European Parliament Plenary sessions in English (EPPS English corpus).
  • T2a: Question Answering of Spanish written questions in the manual and automatic transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus).
  • T2b: Question Answering of manual transcriptions of Spanish spontaneous oral questions in the manual and automatic transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus).
  • T3a: Question Answering of French written questions in the manual and automatic transcriptions of French broadcast news (ESTER corpus).
  • T3b: Question Answering of manual transcriptions of French spontaneous oral questions in the manual and automatic transcriptions of French broadcast news (ESTER corpus).
 

Evaluation procedure:

For each corpus, a development set and an evaluation set will be released:
  • Development sets:
    • European Parliament Plenary Session English: 6 sessions and 50 questions
    • European Parliament Plenary Session Spanish: 6 sessions and 50 questions
    • French broadcast news: 18 shows and 50 questions
  • Evaluation sets:
    • European Parliament Plenary Session English: the same sessions as in the development set, and 100 new questions
    • European Parliament Plenary Session Spanish: the same sessions as in the development set, and 100 new questions
    • French broadcast news: the same shows as in the development set, and 100 new questions
Two types of question are considered this year:
  • Factual questions
  • Definition questions
The proportions of factual questions, definition questions and NIL questions (i.e. questions that have no answer in the document collection) vary from corpus to corpus (EPPS EN, EPPS ES, ESTER), because the questions were generated spontaneously.
 
Factual questions are those whose expected answer is a Named Entity. The possible Named Entities are:
  • Person: names of real or fictional humans, and of real or fictional non-human individuals (animal names, however, are tagged as “Misc” rather than “Person”)
  • Organisation: names of businesses, multinational organisations, political parties, religious groups, etc. For example: “CIA”, “IBM”, but also sometimes “the Bulls” or “Washington” when they display the characteristics of an organisation.
  • Location: geographical, political or astronomical entities. For example: “California”, “South of California”, “Earth”...
  • Time: a date or a specific moment in time, including absolute and relative time expressions. For example: “March 28th”, “last week”, “at four o’clock in the morning”
  • Measure: measures of length, width or weight; generally, a quantity and a unit of measurement. For example: “five kilometres”, “20 hertz”... but also ages, periods of time, etc.
Definition questions are divided into the following types:
  • Person: questions asking for information about someone
    Q: Who is George Bush?
    A: The President of the United States of America
  • Organisation: questions asking for information about an organisation
    Q: What is the Cortes?
    A: Parliament of Spain
  • Other: questions asking for a description of natural phenomena, technologies, legal procedures, etc.
    Q: What is Eurovision?
    A: Song contest

Correctness:

A token sequence is considered a correct answer if it consists of the smallest number of tokens required to contain the correct answer as it occurs in the audio stream.

For instance, consider the following perfect manual transcription of an audio recording (with capitalization and punctuation):
The next meeting will be in Barcelona and it would be interesting to know who will be responsible. It's the first time that the committee will meet in Barcelona, so we need to start finding an appropriate hotel for the event.
the following automatic transcription:
The next Met [unknown] being bears alone and it would be interested in tomorrow who will be the responsable [silence] it is the first time that they become i. t. will need in Berlin, so we need to start finding an approapriated hotel for the event
and the following question:
Where will the next meeting be organized? 

The correct answer in the manual transcription is “Barcelona”, because this is the minimum sequence of tokens required to express the correct answer with respect to the audio.

In the automatic transcription, however, the correct answer is not unique: both “bears alone” and “Berlin” are possible answers. Each is a distinct token sequence in the automatic transcription that contains an occurrence of the correct answer in the audio with the minimum number of tokens. (An equivalent definition can be given for lattices.)
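The correctness criterion can be illustrated with a small sketch. It assumes, which this page does not state, that every transcription token carries word-level start and end times aligned with the audio, and that the audio interval in which the correct answer is spoken is known; the `Token` class, the `minimal_answer_span` function and all time values below are hypothetical. Under these assumptions, any token sequence that contains the answer must include every token overlapping the answer interval, so the minimal sequence runs from the first to the last such token. Lattices are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A transcription token with hypothetical word-level time stamps (seconds)."""
    text: str
    start: float
    end: float


def minimal_answer_span(tokens: list[Token], ans_start: float, ans_end: float) -> str:
    """Return the shortest contiguous token sequence covering the audio
    interval [ans_start, ans_end] in which the correct answer is spoken.

    Assumption: a token covers part of the answer if its time interval
    overlaps the answer interval; tokens are sorted by time.
    """
    overlapping = [
        i for i, tok in enumerate(tokens)
        if tok.start < ans_end and tok.end > ans_start
    ]
    if not overlapping:
        return ""  # no token covers this occurrence of the answer
    first, last = overlapping[0], overlapping[-1]
    return " ".join(tok.text for tok in tokens[first:last + 1])


# Hypothetical excerpt of the automatic transcription; "Barcelona" is
# assumed to be spoken at 1.60-2.30 s and again at 6.90-7.50 s.
asr = [
    Token("being", 1.20, 1.55),
    Token("bears", 1.60, 1.95),
    Token("alone", 1.95, 2.30),
    Token("and", 2.35, 2.50),
    Token("in", 6.70, 6.85),
    Token("Berlin", 6.90, 7.50),
    Token("so", 7.60, 7.75),
]
print(minimal_answer_span(asr, 1.60, 2.30))  # bears alone
print(minimal_answer_span(asr, 6.90, 7.50))  # Berlin
```

Both occurrences yield an acceptable answer, which matches the observation that the correct answer in an automatic transcription is not unique.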

Metrics:

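The following two metrics, already used in CLEF, will be used in the evaluation:
Mean Reciprocal Rank (MRR): measures how high, on average, the correct answer is ranked in the list of 5 possible answers. For each question, the score is the reciprocal of the rank of the first correct answer (0 if no correct answer appears in the list), and MRR is the mean of these scores over all questions.
Accuracy: the fraction of questions for which the answer ranked first in the list of 5 possible answers is correct.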
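Both metrics can be sketched in a few lines of Python. The sketch assumes that every answer has already been judged and simplifies that judgement to membership in a set of acceptable answer strings. In the actual evaluation, correctness is assessed by ELDA according to the criterion described above. The function names and the example data are hypothetical.

```python
def reciprocal_rank(ranked: list[str], correct: set[str], k: int = 5) -> float:
    """Reciprocal rank of the first correct answer in the top k, or 0.0 if none."""
    for rank, answer in enumerate(ranked[:k], start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(judged: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """MRR: average of the per-question reciprocal ranks."""
    return sum(reciprocal_rank(r, c, k) for r, c in judged) / len(judged)


def accuracy(judged: list[tuple[list[str], set[str]]]) -> float:
    """Fraction of questions whose first-ranked answer is correct."""
    hits = sum(1 for ranked, correct in judged if ranked and ranked[0] in correct)
    return hits / len(judged)


# Hypothetical judged output for three questions: (ranked answers, acceptable answers).
judged = [
    (["Madrid", "Barcelona", "Berlin"], {"Barcelona"}),  # correct at rank 2 -> 1/2
    (["Barcelona", "Madrid"], {"Barcelona"}),             # correct at rank 1 -> 1
    (["Paris", "Rome"], {"Berlin"}),                      # no correct answer -> 0
]
print(mean_reciprocal_rank(judged))  # 0.5
print(accuracy(judged))              # 0.3333333333333333
```

Answers ranked below position 5 are ignored by `reciprocal_rank`, which mirrors the limit of 5 possible answers per question.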

Submissions:

For each task, results must be submitted for all the data: the manual transcriptions and the three ASR outputs (automatic transcriptions).
 
Each participant may submit up to 48 runs (2 runs per task and per transcription, i.e. 6 tasks × 4 transcriptions × 2 runs), providing ELDA with a list of 5 possible answers per question and per run. ELDA will carry out the evaluation.
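The maximum of 48 runs follows from the six tasks, the four transcriptions per corpus (one manual, three ASR outputs) and the 2 runs allowed for each combination. The short sketch below simply enumerates the possible run slots. The transcription labels (`manual`, `asr_1`, ...) are hypothetical placeholders, not official file names.

```python
from itertools import product

# The six QAST 2009 tasks.
TASKS = ["T1a", "T1b", "T2a", "T2b", "T3a", "T3b"]
# Manual transcription plus the three ASR outputs (hypothetical labels).
TRANSCRIPTIONS = ["manual", "asr_1", "asr_2", "asr_3"]
RUNS_PER_SLOT = 2  # runs allowed per task and transcription

# Every (task, transcription, run number) combination a participant may submit.
run_slots = list(product(TASKS, TRANSCRIPTIONS, range(1, RUNS_PER_SLOT + 1)))
print(len(run_slots))  # 48
```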
 
Each participant will send a single archive (.tar or .zip) to ELDA containing all the submission files (one per task and run). An acknowledgement of receipt will be sent within 24 hours.