Task #09: Multilevel Semantic Annotation of Catalan and Spanish



Formats and examples | Data | Evaluation

Formats and examples

Here you can find an example of a fully annotated sentence in the column-based format:   Updated! 9th March

INPUT-------------------------------------------------------------> OUTPUT------------------------------------->
BASIC_INPUT_INFO----> EXTRA_INPUT_INFO----------------------------> NE----> NS-------> SR----------------------->
WORD          TN  TV  LEMMA        POS       SYNTAX                 NE      NS         SC  PROPS---------------->

Las           -   -   el           da0fp0    (S(sn-SUJ(espec.fp*)   *       -          -   *            (Arg1-TEM*
conclusiones  *   -   conclusión   ncfp000   (grup.nom.fp*          *       05059980n  -   *            *
de            -   -   de           sps00     (sp(prep*)             *       -          -   *            *
la            -   -   el           da0fs0    (sn(espec.fs*)         (ORG*   -          -   *            *
comisión      *   -   comisión     ncfs000   (grup.nom.fs*          *       06172564n  -   *            *
Zapatero      -   -   Zapatero     np00000   (grup.nom*)            (PER*)  -          -   *            *
,             -   -   ,            Fc        (S.F.R*                *       -          -   *            *
que           -   -   que          pr0cn000  (relatiu-SUJ*)         *       -          -   (Arg0-CAU*)  *
ampliará      -   *   ampliar      vmif3s0   (gv*)                  *       -          a1  (V*)         *
el            -   -   el           da0ms0    (sn-CD(espec.ms*)      *       -          -   (Arg1-PAT*   *
plazo         *   -   plazo        ncms000   (grup.nom.ms*          *       10935385n  -   *            *
de            -   -   de           sps00     (sp(prep*)             *       -          -   *            *
trabajo       *   -   trabajo      ncms000   (sn(grup.nom.ms*)))))  *       00377835n  -   *)           *
,             -   -   ,            Fc        *))))))                *)      -          -   *            *)
quedan        -   *   quedar       vmip3p0   (gv*)                  *       -          b3  *            (V*)
para          -   -   para         sps00     (sp-CC(prep*)          *       -          -   *            (ArgM-TMP*
después_del   -   -   después_del  spcms     (sp(prep*)             *       -          -   *            *
verano        *   -   verano       ncms000   (sn(grup.nom.ms*))))   *       10946199n  -   *            *)
.             -   -   .            Fp        *)                     *       -          -   *            *

There is one line for each token, and a blank line after the last token of each sentence. The columns, separated by blank spaces, each hold one annotation layer of the sentence, with one tag per word. For structured annotations (named entities, parse trees and arguments), we use the Start-End format.
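
As a rough illustration of the layout just described (one token per line, whitespace-separated columns, a blank line closing each sentence), the following Python sketch groups lines into sentences. It is not the official conversion or evaluation software; the function name read_sentences and the two-sentence sample are made up for this example.

```python
from typing import Iterable, List


def read_sentences(lines: Iterable[str]) -> List[List[List[str]]]:
    """Group whitespace-separated token lines into sentences.

    A blank line closes the current sentence. A final sentence that is
    not followed by a blank line is still returned.
    """
    sentences: List[List[List[str]]] = []
    current: List[List[str]] = []
    for line in lines:
        fields = line.split()          # columns are separated by blank spaces
        if fields:
            current.append(fields)     # one token per line
        elif current:
            sentences.append(current)  # blank line: sentence boundary
            current = []
    if current:
        sentences.append(current)
    return sentences


# Toy input with two very short "sentences" (hypothetical, three columns only).
sample = "quedan  -  *\n.  -  -\n\nverano  *  -\n"
for sentence in read_sentences(sample.splitlines()):
    print([token[0] for token in sentence])
```

Each sentence comes back as a list of token rows, and each row as a list of column values, so later steps can address columns by position.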

The Start-End format represents phrases (syntactic constituents, named entities, and arguments) that constitute a well-formed bracketing in a sentence (that is, phrases do not overlap, though they admit embedding). Each tag is of the form STARTS*ENDS, and represents the phrases that start and end at the corresponding word. A phrase of type k places a "(k" parenthesis in the STARTS part of its first word, and a ")" parenthesis in the ENDS part of its last word.
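
The Start-End convention can be decoded with a simple stack. The sketch below is only an illustration under the description above, not the task's own software; the function name decode_start_end is hypothetical. It is applied here to the NE column of the example sentence, with tokens numbered from 0.

```python
import re
from typing import List, Tuple


def decode_start_end(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (label, first_token, last_token) for every phrase in a column.

    Each tag has the form STARTS*ENDS: STARTS holds zero or more "(label"
    openings and ENDS holds zero or more ")" closings.
    """
    spans: List[Tuple[str, int, int]] = []
    stack: List[Tuple[str, int]] = []
    for i, tag in enumerate(tags):
        starts, star, ends = tag.partition("*")
        if not star:
            raise ValueError(f"token {i}: tag {tag!r} has no '*'")
        if re.sub(r"\([^()*]+", "", starts) or ends.strip(")"):
            raise ValueError(f"token {i}: malformed tag {tag!r}")
        for label in re.findall(r"\(([^()*]+)", starts):
            stack.append((label, i))        # phrase opens at this token
        for _ in ends:
            if not stack:
                raise ValueError(f"token {i}: unmatched ')'")
            label, first = stack.pop()      # innermost open phrase closes here
            spans.append((label, first, i))
    if stack:
        raise ValueError(f"unclosed phrase(s): {stack}")
    return spans


# NE column of the example sentence (19 tokens).
ne_column = ["*"] * 3 + ["(ORG*", "*", "(PER*)"] + ["*"] * 7 + ["*)"] + ["*"] * 5
print(decode_start_end(ne_column))
# prints [('PER', 5, 5), ('ORG', 3, 13)]
```

Phrases are reported in the order in which they close, so the embedded PER entity appears before the ORG entity that contains it.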

The different annotations in a sentence are grouped in five main categories:

[1] BASIC_INPUT_INFO. The basic input information that the participants need:
  • WORD (column 1): words of the sentence.
  • TN (column 2): target nouns of the sentence (those that are to be assigned WordNet synsets); marked with '*'
  • TV (column 3): target verbs of the sentence (those that are to be annotated with semantic roles); marked with '*'
[2] EXTRA_INPUT_INFO. The extra input information provided to the participants:
  • LEMMA (column 4): lemmas of the words
  • POS (column 5): part-of-speech tags
  • SYNTAX (column 6): Full syntactic tree.
[3] NE (column 7). Named Entities
     (output information = to be predicted when testing ; available only for trial/training sets).

[4] NS (column 8). WordNet sense of target nouns (output information)

[5] SR. Information on semantic roles:
  • SC (column 9). The lexico-semantic class of the verb (output information).
  • PROPS (columns 10-[10+N-1]). For each of the N target verbs, a column representing the argument structure of the target verb (output information). Core numbered arguments are enriched with the thematic role label (e.g., Arg1-TEM). ArgM's are the adjuncts. Columns have been ordered according to the textual order of the predicates. By textual order we understand that the left-most SRL column corresponds to the first predicate in the sentence, the second left-most SRL column corresponds to the second predicate in the sentence and so on. Both prediction and gold files must follow textual order of the predicates since it is a requirement to perform the evaluation.   Updated! 9th March
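
The column layout listed above can be checked mechanically before submitting a file. The following Python sketch names the nine fixed columns, treats every remaining column as a PROPS column, and verifies the textual-order requirement. It is an illustration only: the function names are hypothetical, and the check assumes that each PROPS column marks its predicate with a "(V*" tag on the target verb's own line, as in the example sentence.

```python
from typing import Dict, List, Tuple

FIXED = ["WORD", "TN", "TV", "LEMMA", "POS", "SYNTAX", "NE", "NS", "SC"]


def split_columns(rows: List[List[str]]) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Split the token rows of one sentence into named columns and PROPS columns."""
    widths = {len(row) for row in rows}
    if len(widths) != 1 or min(widths) < len(FIXED):
        raise ValueError("rows must share one width of at least 9 columns")
    fixed = {name: [row[i] for row in rows] for i, name in enumerate(FIXED)}
    n_props = len(rows[0]) - len(FIXED)
    props = [[row[len(FIXED) + k] for row in rows] for k in range(n_props)]
    return fixed, props


def check_textual_order(fixed: Dict[str, List[str]], props: List[List[str]]) -> None:
    """Raise ValueError unless the k-th PROPS column belongs to the k-th target verb."""
    targets = [i for i, mark in enumerate(fixed["TV"]) if mark == "*"]
    if len(props) != len(targets):
        raise ValueError(f"{len(targets)} target verbs but {len(props)} PROPS columns")
    for k, (column, verb_row) in enumerate(zip(props, targets)):
        # Assumption: the predicate itself is tagged "(V*..." in its own column.
        marked = [i for i, tag in enumerate(column) if tag.startswith("(V*")]
        if marked != [verb_row]:
            raise ValueError(f"PROPS column {k + 1} does not match target verb at token {verb_row}")


# Three rows taken from the example sentence (tokens "que", "ampliará", "quedan").
rows = [
    "que - - que pr0cn000 (relatiu-SUJ*) * - - (Arg0-CAU*) *".split(),
    "ampliará - * ampliar vmif3s0 (gv*) * - a1 (V*) *".split(),
    "quedan - * quedar vmip3p0 (gv*) * - b3 * (V*)".split(),
]
fixed, props = split_columns(rows)
check_textual_order(fixed, props)  # passes silently: columns follow textual order
```

Swapping the two PROPS columns in these rows makes check_textual_order raise, which is the situation the textual-order requirement is meant to rule out.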

Note 1
  Full documentation explaining the tagsets and other information needed to understand all the annotation levels is provided through the Download section

Note 2  All these annotations in column format are extracted automatically from the syntactic-semantic trees of the CESS-ECE corpora, which are also distributed with the datasets (see description below). These are constituency trees enriched with semantic labels for NE, NS and SR. The format is similar to that of the Penn Treebank and is fully described in the accompanying documentation. As an example, the following tree represents the same sentence shown above in column format.
        (da0fp0 Las el))
        (ncfp000 conclusiones conclusión 01207975n)
            (sps00 de de))
              (da0fs0 la el))
              (ncfs000 comisión comisión 01207975n)
                  (np0000p Zapatero Zapatero)))
                (Fc , ,)
                  (pr0cn000 que que))
                  (vmif3s0 ampliará ampliar-a1))
                    (da0ms0 el el))
                    (ncms000 plazo plazo 01207975n)
                        (sps00 de de))
                          (ncms000 trabajo trabajo 01207975n))))))
                (Fc , ,)))))))
      (vmip3p0 quedan quedar-b3))
        (sps00 para para))
          (spcms después_del después_del))
            (ncms000 verano verano 01207975n)))))
    (Fp . .)))

The scripts for automatically converting these trees into the column format are also distributed as part of the resources for the task. If you want to use them, see the Download section for instructions on how to download, install, and use the software.

Data


The data corpus is split into three sets, i.e., trial, training, and test sets. All of them share the above formatting, with the exception that trial and training sets contain both input and output information, while the test set only comes with the input information. The output columns (NE, NS, and SR) of the test set are the semantic information to be produced by the participant systems (also in the same column format).
Trial datasets were released on January 10, 2007. Training and test sets were distributed in compliance with the general schedule of the SemEval-2007 evaluation period (see the official webpage).

When registering as participants, which will occur just before downloading training data, teams will have to sign a license agreement (free of any charge) which will allow them to use all data and resources for the SemEval-2007 evaluation exercise and also for any academic and research purpose. Visit the Download section to get the license form.

Note 1  Initial definitions of the task assumed that we would be able to produce both gold-standard and automatic syntactic annotations (using state-of-the-art statistical parsers for Spanish and Catalan). Our goal was to run the evaluation under a realistic scenario and provide only automatic parses for the test sets. Unfortunately, the parsers are not yet ready, so we have to restrict the evaluation to the use of gold-standard syntactic annotation.

Note 2  Datasets are not distributed through this task-specific webpage. Dataset downloads are centralized on the official SemEval-2007 website.

Evaluation


The evaluation of task #9 was run under the general schedule of the SemEval-2007 evaluation, following the general rules described in the Guidelines-for-participants document. Visit the official SemEval-2007 website to consult this information. However, the evaluation period for task #9 was extended to 4 weeks: from the moment you download the training dataset, you have 4 weeks to upload the outputs of your system on the test dataset.

Note that the trial dataset was updated (a number of errors were fixed) and posted again on February 22. Download the new version from the SemEval-2007 webpage for task #9 and check the changes in the README file. The corrected trial dataset is already included in the training material, so you no longer need to take the trial dataset into account when developing the final systems.

External resources, apart from the training datasets, may also be used to build a system for the task. However, we strongly encourage participant teams to explicitly describe all external resources used in the system description paper to be prepared after the evaluation period. By "external resources" we mean any knowledge or data that cannot be directly inferred from the training sets provided in this release.

As stated in the task description, we will use standard evaluation metrics for each of the defined subtasks (SRL, NSD, NER), namely precision, recall, and F1, since these are basically recognition tasks. Classification accuracy will also be calculated for verb disambiguation and NSD. Consult the Download section to find the evaluation software. The first version, which includes basic evaluation measures for all subtasks, has already been posted.
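
For orientation, the standard definitions of these measures can be sketched in Python as follows. This is not the official evaluation software from the Download section. It assumes exact matching of (label, first_token, last_token) items, which is a simplification made for this example, and the function names and the predicted values are hypothetical.

```python
from typing import Hashable, Sequence, Set, Tuple


def precision_recall_f1(gold: Set[Hashable], predicted: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over two sets of items, using exact matching."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of target items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Gold named entities of the example sentence versus a made-up system output.
gold_ne = {("ORG", 3, 13), ("PER", 5, 5)}
predicted_ne = {("PER", 5, 5), ("LOC", 3, 4)}
print(precision_recall_f1(gold_ne, predicted_ne))
# prints (0.5, 0.5, 0.5)

# Gold verb classes of the example sentence versus a made-up system output.
print(accuracy(["a1", "b3"], ["a1", "a2"]))
# prints 0.5
```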

The evaluation on the test set will be carried out by the organizers based on the outputs submitted by participant systems. System performance will be analyzed at two levels:
  • All systems will be ranked and studied according to the official evaluation metrics in each of the six subtasks independently (SRL-ca, NSD-ca, NER-ca, SRL-es, NSD-es, NER-es).

  • Additionally, global measures will be derived as a combination of all partial evaluations to rank systems' performance per language and for the complete global task (language independent).
The organization will prepare baseline processors for each of the subtasks. Participant teams that do not present results for a given subtask will be evaluated using the baseline processor for that subtask in order to obtain global performance scores. Participants are free to download the processors and improve them directly.


Last update: May 22nd, 2007

 For more information, visit the SemEval-2007 home page.