Participants in the QAst track can take part in
four different tasks, each one involving
different data sets (automatic or manual transcriptions) related to two
different scenarios (lectures in English about "speech and
language processing" or meetings in English about "design
of television remote control").
The tasks defined in the QAst track are the following:
T1: Question Answering in manual transcriptions of lectures.
T2: Question Answering in automatic transcriptions of
lectures.
(Word lattices from an ASR system will be made
available as an additional input source, for systems that
prefer to decide internally what the best automatic
segmentation is.)
T3: Question Answering in manual transcriptions of meetings.
T4: Question Answering in automatic transcriptions of meetings.
Evaluation procedure:
For each scenario, a development set and an evaluation set will be released:
Development sets:
Lectures: 10 seminars and 50 questions (the same 10 seminars used in the CHIL evaluation).
Meetings: 50 meetings and 50 questions.
Evaluation sets:
Lectures: 15 seminars (probably more if more recordings become available) and 100 questions.
Meetings: 118 meetings and 100 questions.
All the questions will be written in English. The possible
answer types can be the following:
person
location
organization
language
system/method
measure (including money -e.g., twelve fifty euros, Pound- and percent)
time (including date and duration -e.g., two weeks, couple of minutes-)
colour -e.g., grey, tomato red-
shape -e.g., curve, square, oyster shape-
material -e.g., plastic, foam, rubber-
Each participant may perform 2 runs for each
of the tasks they select, providing a list of 5 possible
answers per question and run to ELDA, which will carry out the
evaluation. It is desirable, but not mandatory, to perform both
tasks related to the same scenario (i.e., T1+T2 and T3+T4).
Correctness:
A token sequence will be considered a correct answer if
it consists of the smallest number of tokens required to contain
the correct answer in the audio stream.
For instance, consider the following perfect
manual transcription of an audio recording (with
capitalization and punctuation):
The next meeting will be in Barcelona and it would
be interesting to know who will be the responsable. It's the
first time that the committee will meet in Barcelona, so we need
to start finding an approapriated hotel for the event.
the following automatic transcription:
The next Met [unknown] being bears alone and it would
be interested in tomorrow who will be the responsable
[silence] it is the first time that they become i. t. will
need in Berlin, so we need to start finding an approapriated
hotel for the event
and the following question:
Where will the next meeting be organized?
The right answer in the manual transcription is
"Barcelona", because this is the minimum sequence of
tokens required to express the right answer with respect to the audio.
The correct answer in the automatic transcription, however, is
not unique. Possible answers are "bears alone" and
"Berlin", because each is a sequence of tokens in the automatic
transcription that contains the right answer in the audio with
the minimum number of tokens. (An equivalent definition is
possible for lattices.)
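The minimality criterion above can be illustrated with a small sketch. It assumes that the set of accepted minimal answers for a question has already been fixed by an assessor, and that a candidate counts as correct only if its tokens match one of them exactly after lowercasing and stripping punctuation. The function names and the normalization are assumptions made for this illustration, not part of the official evaluation procedure.

```python
import string
from typing import Iterable, Tuple


def normalize(text: str) -> Tuple[str, ...]:
    """Lowercase, split on whitespace, and strip punctuation from each token."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return tuple(tok for tok in tokens if tok)


def is_correct(candidate: str, accepted: Iterable[str]) -> bool:
    """True if the candidate matches one accepted minimal answer token for token.

    Hypothetical helper: extra tokens around the answer make the candidate
    non-minimal, so it is rejected.
    """
    accepted_sequences = {normalize(answer) for answer in accepted}
    return normalize(candidate) in accepted_sequences


# Accepted minimal answers taken from the example in the text.
accepted_manual = ["Barcelona"]
accepted_automatic = ["bears alone", "Berlin"]

print(is_correct("Barcelona", accepted_manual))
print(is_correct("Berlin,", accepted_automatic))
print(is_correct("need in Berlin", accepted_automatic))
```

Under these assumptions, "need in Berlin" is rejected because it contains more tokens than are required to express the answer.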
Metrics:
The following two metrics, both used in CLEF, will be used
in the evaluation:
Mean Reciprocal Rank (MRR): measures how well the right answer is ranked, on average, in the list of 5 possible answers.
Accuracy: the fraction of questions for which a correct answer is ranked in the first position in the list of 5 possible answers.
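As an illustration of the two metrics, the sketch below computes them for one run. It assumes that each question is represented by a list of correctness judgements, one per returned answer in rank order, and that a question with no correct answer contributes a reciprocal rank of 0 (the usual convention). The function names and the sample data are invented for this example.

```python
from typing import List


def reciprocal_rank(judgements: List[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, correct in enumerate(judgements, start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(run: List[List[bool]]) -> float:
    """Average the reciprocal rank over all questions of a run."""
    if not run:
        raise ValueError("run contains no questions")
    return sum(reciprocal_rank(j) for j in run) / len(run)


def accuracy(run: List[List[bool]]) -> float:
    """Fraction of questions whose first-ranked answer is correct."""
    if not run:
        raise ValueError("run contains no questions")
    return sum(1 for j in run if j and j[0]) / len(run)


# Hypothetical judgements for four questions, 5 ranked answers each.
run = [
    [True, False, False, False, False],   # correct answer at rank 1
    [False, False, True, False, False],   # correct answer at rank 3
    [False, False, False, False, False],  # no correct answer
    [False, True, False, False, False],   # correct answer at rank 2
]

print(f"MRR      = {mean_reciprocal_rank(run):.4f}")
print(f"Accuracy = {accuracy(run):.2f}")
```

Here the reciprocal ranks are 1, 1/3, 0 and 1/2, so the MRR is their mean, while only one of the four questions has a correct answer in the first position.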
Participants:
In CLEF 2006 several groups were interested
in this task, including Laboratoire d'Informatique pour la
mécanique et les sciences de l'ingénieur (LIMSI), Tokyo Institute of
Technology and Universitat Politècnica de Catalunya (UPC).
Please contact the organizers if you are interested in
participating in this track.
Submissions:
Each participant will send one archive (.tar or .zip) to
ELDA. This archive should
include all the submission files (one per task and run). Each
line of these submission files should follow this
template: