Task #09: Multilevel Semantic Annotation of Catalan and Spanish

This section provides partial post-competition information, including:

  • Task description and participant papers.
  • Full datasets (train + test) and gold standard.
  • Participants' outputs.
  • Baselines and results obtained by the best participating system on each of the three subtasks on the test set.

The full version of this section will be available after the SemEval-2007 Workshop.

Task description and participant papers

(Links to the papers will be available after the SemEval-2007 Workshop.)
  • Task description paper:
    • SemEval-2007 Task 09: Multilevel Semantic Annotation of Catalan and Spanish.
      Lluís Màrquez, Luis Villarejo, M. Antònia Martí and Mariona Taulé.

  • Participant papers:
    • UPC: Experiments with Joint Learning within SemEval Task 9.
      Lluís Màrquez, Lluís Padró, Mihai Surdeanu and Luis Villarejo.
    • ILK2: Semantic Role Labeling of Catalan and Spanish using TiMBL.
      Roser Morante and Bertjan Busser.

Full datasets and gold standard


Participants' outputs

Only two of the roughly dozen teams that initially expressed interest submitted results.
  • ILK2 (Tilburg University): the SRL system is based on memory-based classification of syntactic constituents using a rich feature set (including semantic features derived from WordNet generalizations). A post-processing step based on manual rules was applied to improve results on adjuncts. [ILK2 output]
  • UPC (Technical University of Catalonia): several machine learning algorithms were used to address the different subtasks (AdaBoost, SVM, Perceptron). For SRL, the system implements a re-ranking strategy using global features; the candidates are generated by a state-of-the-art SRL base system (see the sketch after this list). [UPC output]
No system attempted a joint resolution of the different subproblems.
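
As a rough illustration of the re-ranking idea used in the UPC system, the sketch below (Python) scores each complete candidate labeling with a simple linear model over global features and keeps the highest-scoring one. The candidate representation, the feature names, and the weights are hypothetical and chosen only for illustration; the actual UPC features and learning machinery are not reproduced here.

    # Illustrative sketch of candidate re-ranking with global features.
    # The features and weights below are hypothetical, not the UPC system's.
    from typing import Dict, List

    Candidate = Dict[str, float]  # one complete labeling, described by its global feature values

    def rerank(candidates: List[Candidate], weights: Dict[str, float]) -> Candidate:
        """Return the candidate labeling with the highest global linear score."""
        def score(cand: Candidate) -> float:
            return sum(weights.get(name, 0.0) * value for name, value in cand.items())
        return max(candidates, key=score)

    # Hypothetical usage: a base SRL system proposes a few complete role assignments
    # for a sentence; each is described by features of the assignment as a whole.
    candidates = [
        {"repeated_core_role": 1.0, "num_core_args": 2.0},
        {"repeated_core_role": 0.0, "num_core_args": 3.0},
    ]
    weights = {"repeated_core_role": -2.0, "num_core_args": 0.5}
    best = rerank(candidates, weights)  # picks the second candidate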

Baselines & results

Baselines and results are presented along two dimensions:
  • (a) language ('ca' = Catalan; 'es' = Spanish)
  • (b) corpus source ('in' = in-domain corpus; 'out' = out-of-domain corpus)
A 'language.source' pair denotes a particular test set, and '*' denotes the union of the two subcorpora along either the language or the source dimension.
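
The NERC and SRL tables below report precision, recall and F1, and the NSD table reports accuracies. As a minimal illustration (not the official task scorer), and assuming the pooled '*' rows are obtained by micro-averaging, i.e. by summing raw counts over the subcorpora before taking ratios, the scores could be computed as follows (Python):

    # Minimal sketch of precision/recall/F1 with pooling over subcorpora.
    # The pooling behaviour of the '*' rows is an assumption, not the official scorer.

    def prf(tp: int, fp: int, fn: int):
        """Precision, recall and F1 from true-positive/false-positive/false-negative counts."""
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return prec, rec, f1

    def pooled_prf(counts):
        """counts: one (tp, fp, fn) triple per subcorpus, e.g. for 'ca.in' and 'ca.out'."""
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        return prf(tp, fp, fn)

The percentages in the tables are these ratios multiplied by 100, with F1 being the harmonic mean of precision and recall (F1 = 2·P·R / (P + R)); for instance, the best-system NERC row 'ca.*' gives 2 · 80.94 · 77.96 / (80.94 + 77.96) ≈ 79.42.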


  • Baseline and best system results on the NERC subtask (Named Entity Recognition and Classification):

    Test     Baseline                    Best system
             Prec.   Recall   F1         Prec.   Recall   F1
    ca.*     75.85   15.45    25.68      80.94   77.96    79.42
    es.*     71.88   12.07    20.66      70.65   65.69    68.08
    *.in     83.06   17.43    28.82      78.21   74.04    76.09
    *.out    68.63   12.20    20.72      76.21   72.51    74.31
    *.*      74.45   14.11    23.72      76.93   73.08    74.96


  • Baseline and best system accuracies on the NSD subtask (Noun Sense Disambiguation):

    Test     All words                   Selected words
             Baseline   Best system      Baseline   Best system
    ca.*     85.49      86.47            70.06      72.75
    es.*     84.22      85.10            61.80      65.17
    *.in     84.84      86.49            67.30      72.24
    *.out    85.02      85.33            67.07      67.87
    *.*      84.94      85.87            67.19      70.12


  • Baseline and best system results on the SRL subtask, semantic class tagging (SC):

    Test     Baseline    Best system
             F1          Prec.   Recall   F1
    ca.*     63.99       90.25   88.50    89.37
    es.*     49.21       84.30   83.63    83.83
    *.in     52.50       84.68   83.11    83.89
    *.out    60.69       90.04   89.08    89.56
    *.*      56.60       87.12   85.81    86.46


  • Baseline and best system results on the SRL subtask, semantic role labeling (SR):

    Test     Baseline                    Best system
             Prec.   Recall   F1         Prec.   Recall   F1
    ca.*     83.28   76.88    79.95      84.72   82.12    83.40
    es.*     81.61   76.05    78.73      84.30   83.98    84.14
    *.in     82.07   80.70    81.38      84.71   84.12    84.41
    *.out    82.88   71.48    76.76      84.26   81.84    83.03
    *.*      82.42   76.46    79.32      84.50   83.07    83.78


Last update: June 15th, 2007


 For more information, visit the SemEval-2007 home page.