================================================================
[2] Semantic Role Labeling. Combining outputs
================================================================

GOAL: The goal of this work is to use an existing Semantic Role
Labeling system (SwiRL) to run several experiments on the CoNLL-2005
datasets.

SETTING: The task setting is fixed to that of the CoNLL-2005 shared
task on Semantic Role Labeling. Please visit the website
http://www.lsi.upc.edu/~srlconll/
to obtain the datasets, the problem description, useful scripts for
data processing, and an extensive state-of-the-art bibliography.

SOFTWARE: The SRL prototype to be used is SwiRL, a standard tree-node
labeling system that uses both AdaBoost and SVMs as base learners and
is released under a GNU GPL license. Find it at the following URL:
http://www.lsi.upc.edu/~surdeanu/swirl.html

STEPS TO BE CARRIED OUT

* Download, install, and check the SwiRL software on the CoNLL-2005
  datasets. Also read the paper describing the core of SwiRL
  (Surdeanu and Turmo, 2005).

* Use the Charniak parser that comes with the distribution to generate
  the N best parses for the training data (say, the 5 best trees). You
  can convert these annotations to the column-style format of
  CoNLL-2005 using the scripts downloaded from the shared task webpage
  (srlconll-1.0.tgz package; wsj-to-se.pl script). Generate N variants
  of the training set, one for each of the syntactic parses, and train
  a different SwiRL system on each of them.

* Test the previous N systems on the test set. Does performance
  degrade with lower-scoring parse trees?

* Program a simple combination module (voting style) that combines the
  output of the N SRL systems to generate a single new output.
  To go a step beyond simple voting (weighted combination), you can
  make use of the probabilities of the source parse trees and the
  scores assigned by SwiRL to each of the arguments (to obtain these
  numbers you might have to go through the SwiRL source code). No
  learning is intended to be applied in this step.

  NOTE-1: Due to the high computational cost of each experiment, we
  suggest that you select a subset of the training corpus (say, 20% of
  the data) for all the experiments regarding tuning and system
  development. You can then run a single final experiment on the
  complete dataset (if you have time and resources available).

* Report the best F1 values obtained with the combination system and
  compare them to the basic system and also to the state of the art
  (see the CoNLL-2005 webpage; always use the official scorer).

OPTIONAL (just in case you are still hungry!)

[a] Check the impact of feature types: report results of the basic
    SwiRL system trained with each of the different families of
    features separately (you will have to modify the SwiRL source
    code). Where does the biggest contribution come from?

[b] Training set size: provide a learning curve showing performance
    for increasing training set sizes (e.g., 1%, 5%, 10%, 20%, ...,
    100%). Is the training set size critical?

================================================================
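
As an illustration of the combination step described above, the
following Python sketch shows one possible voting scheme for the
arguments of a single predicate. It is only a starting point under
stated assumptions, not the way SwiRL works internally: the function
name combine_arguments, the (start, end, label, score) tuple layout,
the system_weights parameter, and the min_support threshold are all
hypothetical. Reading the CoNLL-2005 column format into these tuples,
and extracting parse probabilities or argument scores from SwiRL, are
not shown. With uniform weights and scores of 1.0 it reduces to plain
majority voting; passing parse probabilities as weights gives a simple
weighted combination with no learning involved.

```python
from collections import defaultdict


def combine_arguments(system_outputs, system_weights=None, min_support=0.5):
    """Combine the arguments proposed by N SRL systems for ONE predicate.

    system_outputs: list of N lists of (start, end, label, score) tuples.
        Token spans are inclusive. `score` is a hypothetical per-argument
        confidence; use 1.0 everywhere for plain voting.
    system_weights: optional list of N non-negative weights (for example,
        normalized parse probabilities). Defaults to uniform weights.
    min_support: fraction of the total system weight that a candidate
        argument must collect in order to be kept.
    """
    if system_weights is None:
        system_weights = [1.0] * len(system_outputs)
    total = sum(system_weights)

    # Accumulate weighted votes for every distinct (start, end, label).
    votes = defaultdict(float)
    for weight, args in zip(system_weights, system_outputs):
        seen = set()
        for start, end, label, score in args:
            key = (start, end, label)
            if key in seen:  # count each system at most once per candidate
                continue
            seen.add(key)
            votes[key] += weight * score

    # Best-supported candidates first; ties are broken by span and label.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

    chosen = []
    for (start, end, label), support in ranked:
        if support / total < min_support:
            break  # every remaining candidate has even less support
        # Keep the candidate only if it overlaps no already chosen span.
        if not any(start <= e and s <= end for s, e, _ in chosen):
            chosen.append((start, end, label))
    return sorted(chosen)


# Toy example: three systems, plain voting (all scores set to 1.0).
sys1 = [(0, 1, "A0", 1.0), (3, 5, "A1", 1.0)]
sys2 = [(0, 1, "A0", 1.0), (3, 6, "A1", 1.0)]
sys3 = [(0, 1, "A0", 1.0), (3, 5, "A1", 1.0), (7, 8, "AM-TMP", 1.0)]
print(combine_arguments([sys1, sys2, sys3]))
# prints [(0, 1, 'A0'), (3, 5, 'A1')]
```

The greedy overlap check reflects the fact that the arguments of one
predicate may not overlap. Whatever scheme you choose, convert the
combined output back to the CoNLL-2005 column format and evaluate it
with the official scorer.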