Resources

The data for the QAst pilot track consists of three resources, one for each language:
  1. French broadcast news scenario: the ESTER corpus will be used. It consists of 10 hours of French broadcast news from several sources (Radio France International, France Inter, France Culture, Radio Classique, Radio Television du Maroc). Three automatic speech recognition (ASR) outputs are provided, each with a different word error rate (WER). The manual transcriptions were produced by ELDA.
  2. Spanish parliament scenario: the TC-STAR05 and TC-STAR06 EPPS Spanish corpus will be used. It consists of 6 hours of recordings from the European Parliament in Spanish and was first used in the TC-STAR project. Three automatic speech recognition (ASR) outputs are provided, each with a different word error rate (WER). The manual transcriptions were produced by ELDA.
  3. English parliament scenario: the TC-STAR05 and TC-STAR06 EPPS English corpus will be used. It consists of 6 hours of recordings from the European Parliament in English and was first used in the TC-STAR project. Three automatic speech recognition (ASR) outputs are provided, each with a different word error rate (WER). The manual transcriptions were produced by ELDA.

How to get the data

If you are interested in participating in the QAst tasks, please first visit www.clef-campaign.org -> 2009 -> "How to Participate" for instructions and the registration form.
Then download the agreement document and follow its instructions to obtain the appropriate data from ELDA.