%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% CoNLL Shared Task on Semantic Role Labeling Authors: Xavier Carreras and Lluis Marquez TALP Research Center Technical University of Catalonia (UPC) Contact: carreras@lsi.upc.edu PropBank Version : PropBank-1.0 Created : January 2004 Revised : March 2005 Revised 2005/12/01 : Added options to process NomBank; see below This software is distributed to support the CoNLL-2005 Shared Task. It is free for research and educational purposes. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + LINKING PropBank TO THE TreeBank : link_tbpb.pl +------------------------------------------------------------ The link_tbpb.pl script takes a WSJ tree and PropBank annotations of a sentence, and outputs the sentece in the CoNLL-2005 shared task column format, with a column for each predicate of the sentence representing the arguments of the predicate. The script expects two parameters: the name of the file containing WSJ trees, and the name of the file containing the PropBank annotations corresponding to the trees. The PB annotations must be ordered with respect to sentence number (2nd column) and predicate position in each sentence (3rd column). In the srlconll package, the directory data contains two files, WSJ trees and PB annotations for the 1500 WSJ file. With such files, we can play with the link_tbpb.pl script : $ link_tbpb.pl data/tb_1500 data/pb_1500 |& less <...> Moreover, a number of options can be specified to control the behavior of link_tbpb. Executing the script without parameters lists the options with a brief explanation. The verbosity level can be controlled by the -v option. The messages always are output through the STDERR channel. Apart from this, the options control which propositions are processed (or, actually, which ones are skipped). When generating the data for the CoNLL-2005 shared task, only the options related to traces were used: -st and -ft. The command was: $ link_tbpb.pl -noi -st -ft data/tb_1500 data/pb_1500 > data/srltask_1500 <...> PREPROCESSING OF THE PropBank "prop.txt" file -------------------------------------------------- The PropBank project distributes the propositional annotations for the WSJ TreeBank in a single file, containing the annotations for all the WSJ sections and files of the section. However, the link_tbpb.pl script expects in the second argument the propositions corresponding to a WSJ TreeBank file. Thus, the PropBank "prop.txt" file must be split into the TreeBank file structure. The "nb_propsbyfiles.sh" script performs this operation for the WSJ sections which compose the CoNLL datasets. + + NOMBANK +------------------------- - Before using "link_pbtb.pl", it is necessary to split the nombank propositions file into many files, following WSJ TreeBank structure. To do so, use the "nb_probsbyfiles.sh" script under "bin". - With link_pbtb.pl use option "-nb" to read nombank entries. E.g. : $ link_pbtb.pl -nb data/tb_1500 data/nb_1500