Participants in QAST 2009 will be able to take part in six different tasks, each involving different data sets related to European Parliament debates in English (T1) and Spanish (T2), and to French broadcast news (T3), including the manual transcriptions and three different ASR outputs (automatic transcriptions).
The tasks defined in the QAST track are the following:
Evaluation procedure: For each of the corpora, a development set and an evaluation set will be released:
The ratio of factual questions, definition questions and NIL questions (i.e. questions having no answer in the document collection) depends on the corpus (EPPS EN, EPPS ES, ESTER), because the questions were generated in a spontaneous way.
Factual questions are those whose expected answer is a Named Entity. The possible Named Entities are:
Correctness: A token sequence will be considered a correct answer if it consists of the smallest number of tokens required to contain the correct answer in the audio stream.
For instance, consider the following perfect manual transcription of an audio recording (with capitalization and punctuation):

The next meeting will be in Barcelona and it would be interesting to know who will be the responsible. It's the first time that the committee will meet in Barcelona, so we need to start finding an appropriate hotel for the event.

the following automatic transcription:

The next Met [unknown] being bears alone and it would be interested in tomorrow who will be the responsible [silence] it is the first time that they become i. t. will need in Berlin, so we need to start finding an appropriate hotel for the event

and the following question:
Where will the next meeting be organized?
The right answer in the manual transcription is Barcelona, because this is the minimum sequence of tokens required to express the right answer with respect to the audio.
The correct answer in the automatic transcription, however, is not unique: both bears alone and Berlin are possible answers, because each is a distinct sequence of tokens in the automatic transcription that contains the right answer in the audio with the minimum number of tokens. (An equivalent definition is possible for lattices.)
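For time-aligned transcriptions, this judgement can be pictured as searching for the smallest contiguous token span whose time interval covers the place in the audio where the answer is spoken. The sketch below is only an illustration of that reading, with hypothetical token timestamps; it is not the official assessment procedure, which relies on human judges:

def minimal_covering_span(tokens, ans_start, ans_end):
    # tokens: time-ordered list of (word, start_time, end_time) triples
    # from one transcription; (ans_start, ans_end) is the time interval
    # in the audio where one occurrence of the answer is spoken.
    # Last token starting at or before the answer onset:
    i = max(k for k, (_, ts, _) in enumerate(tokens) if ts <= ans_start)
    # First token ending at or after the answer offset:
    j = min(k for k, (_, _, te) in enumerate(tokens) if te >= ans_end)
    return [word for word, _, _ in tokens[i:j + 1]]

# Hypothetical timings for the start of the automatic transcription above:
asr = [("The", 0.0, 0.2), ("next", 0.2, 0.5), ("Met", 0.5, 0.9),
       ("[unknown]", 0.9, 1.0), ("being", 1.0, 1.3),
       ("bears", 1.3, 1.7), ("alone", 1.7, 2.1)]
print(minimal_covering_span(asr, 1.4, 2.0))  # -> ['bears', 'alone']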
Metrics: The following two metrics, also used in CLEF, will be used in the evaluation:
Mean Reciprocal Rank (MRR): measures, on average, how highly the right answer is ranked in the list of 5 possible answers.
Accuracy: the fraction of questions whose correct answer is ranked in the first position in the list of 5 possible answers.
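In other words, MRR = (1/|Q|) * sum over questions q of 1/rank_q, where rank_q is the rank of the first correct answer returned for q (questions with no correct answer in the list contribute 0), and Accuracy counts only the cases where rank_q = 1. A minimal sketch of how both scores could be computed from assessed runs (the function name and input format are illustrative, not those of the official scoring tools):

def mrr_and_accuracy(judged_runs):
    # judged_runs: one list per question, holding up to 5 booleans in
    # rank order (True = the answer at that rank was judged correct).
    reciprocal_sum = 0.0
    top_correct = 0
    for judgements in judged_runs:
        for rank, correct in enumerate(judgements, start=1):
            if correct:
                reciprocal_sum += 1.0 / rank
                if rank == 1:
                    top_correct += 1
                break  # only the best-ranked correct answer counts
    n = len(judged_runs)
    return reciprocal_sum / n, top_correct / n

# Three questions: correct at rank 1, correct at rank 3, no correct answer.
runs = [[True, False, False, False, False],
        [False, False, True, False, False],
        [False, False, False, False, False]]
mrr, acc = mrr_and_accuracy(runs)  # MRR = (1 + 1/3 + 0) / 3 ~ 0.44; Accuracy = 1/3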
For each task, it is mandatory to submit results for all the data: the manual transcriptions and the three ASR outputs (automatic transcriptions).
Each participant will be able to perform up to 48 runs (2 runs per task and transcription, i.e. 2 runs x 6 tasks x 4 transcriptions), and to provide a list of 5 possible answers per question and run to ELDA, which will perform the evaluation procedure.
Each participant will send one archive (.tar or .zip) to ELDA. This archive should include all the submission files (one per task and run). A return receipt will be sent within 24 hours.