================================================================
Comparative Study of Learning Approaches for Sequential Labeling:
A Case Study on Syntactic Chunking
================================================================

GOAL:

The goal of this work is to compare several learning architectures for sequential learning on the concrete NLP task of syntactic base chunking. The datasets and feature types will be fixed, and the learning approaches will be compared along several dimensions: accuracy on the task, efficiency (training and test time), parameter tuning (stability), sensitivity to the features, etc.

SETTING

The task setting will be fixed to that of the CoNLL-2000 shared task on chunking. Please visit the website http://www.cnts.ua.ac.be/conll2000/chunking to obtain the datasets, the problem description, and a large state-of-the-art bibliography from which you can borrow the set of standard features commonly used in the task.

COMPARATIVE STUDY

Regarding the learning approaches to compare, you can consider, in increasing level of complexity:

* A simple HMM-based tagger (e.g., use TnT with the lexical specialization described at http://www.dsic.upv.es/~fpla/demo.html).

* A machine learning sequential tagger based on local classifiers trained for each of the labels. Here, we can differentiate between greedy local inference, with or without left chaining, and Viterbi-style decoding, which optimizes the probability of the whole sequence. You may use: SVMtool (for fast linear SVM training/decoding), YamCha (if you want to use kernels), or Maximum Entropy models (MEMM), which can be found in the MALLET suite.

* A global learning algorithm for sequential labeling. For instance, you can use Conditional Random Fields (available in MALLET) or SVMstruct if you want to apply a large-margin approach.

Other approaches suggested by students can also be considered, but your proposal must first be accepted by the course advisor.
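
To illustrate the difference between greedy local inference with left chaining and Viterbi-style decoding mentioned in the list above, here is a minimal sketch. It assumes that a local classifier has already produced per-token log-scores (called `emissions` here) and that label-to-label log-scores (`transitions`) are available. All names and numbers are hypothetical toy values, not taken from any of the tools listed.

```python
# Toy comparison of greedy left-to-right decoding and Viterbi decoding.
# All scores are made-up log-scores; higher is better.

LABELS = ["B-NP", "I-NP", "O"]

def greedy_decode(emissions, transitions, labels):
    """Choose the best label at each position, given the label already chosen to its left."""
    path, prev = [], None
    for scores in emissions:
        def local_score(y):
            chain = transitions[(prev, y)] if prev is not None else 0.0
            return scores[y] + chain
        prev = max(labels, key=local_score)
        path.append(prev)
    return path

def viterbi_decode(emissions, transitions, labels):
    """Return the label sequence with the highest total score."""
    # delta[y] holds the best score of any partial path ending in label y.
    delta = {y: emissions[0][y] for y in labels}
    backpointers = []
    for scores in emissions[1:]:
        new_delta, pointers = {}, {}
        for y in labels:
            best_prev = max(labels, key=lambda p: delta[p] + transitions[(p, y)])
            new_delta[y] = delta[best_prev] + transitions[(best_prev, y)] + scores[y]
            pointers[y] = best_prev
        backpointers.append(pointers)
        delta = new_delta
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: delta[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def sequence_score(path, emissions, transitions):
    """Total score of a complete label sequence."""
    total = emissions[0][path[0]]
    for i in range(1, len(path)):
        total += transitions[(path[i - 1], path[i])] + emissions[i][path[i]]
    return total

# Two-token toy sentence; O -> I-NP is heavily penalized (not a valid IOB2 step).
emissions = [
    {"B-NP": -0.9, "I-NP": -5.0, "O": -0.6},
    {"B-NP": -2.0, "I-NP": -0.2, "O": -2.5},
]
transitions = {(p, y): 0.0 for p in LABELS for y in LABELS}
transitions[("O", "I-NP")] = -10.0

greedy_path = greedy_decode(emissions, transitions, LABELS)
viterbi_path = viterbi_decode(emissions, transitions, LABELS)
greedy_total = sequence_score(greedy_path, emissions, transitions)
viterbi_total = sequence_score(viterbi_path, emissions, transitions)
print(greedy_path, greedy_total)
print(viterbi_path, viterbi_total)
```

In this toy case the greedy decoder commits to "O" for the first token because it looks best locally, and then cannot reach the high-scoring "I-NP" on the second token. Viterbi decoding considers the whole sequence and recovers "B-NP I-NP".
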
Links to all the software mentioned above are posted on the course webpage.

You have to try a minimum of three approaches (one per student). You can work independently, but make sure that you all share exactly the same setting and that you produce an integrated comparison and presentation. The level of difficulty may vary greatly depending on your choices. Please contact L. Màrquez before starting your work.
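
One way to keep the setting identical across students is to score every system with the same chunk-level evaluation. The sketch below computes precision, recall and F1 over chunks from IOB2 tag sequences (B-X, I-X, O). It is a simplified illustration with hypothetical function names and toy data, and it is not meant to replace the official CoNLL evaluation script (conlleval), which should be used for the reported results.

```python
# Simplified chunk-level precision/recall/F1 for IOB2-tagged sentences.

def extract_chunks(tags):
    """Return the set of (start, end, type) chunks in one tag sequence (end is exclusive)."""
    chunks = set()
    start, chunk_type = None, None
    # A final "O" sentinel closes any chunk still open at the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag == "O":
            prefix, tag_type = "O", None
        else:
            prefix, _, tag_type = tag.partition("-")
        # Close the open chunk on B-, on O, or when the chunk type changes.
        if chunk_type is not None and (prefix in ("B", "O") or tag_type != chunk_type):
            chunks.add((start, i, chunk_type))
            start, chunk_type = None, None
        # Open a new chunk (an I- tag with no open chunk is treated as a start).
        if prefix in ("B", "I") and chunk_type is None:
            start, chunk_type = i, tag_type
    return chunks

def chunk_prf(gold_sentences, pred_sentences):
    """Precision, recall and F1 over chunks, pooled across all sentences."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_chunks, pred_chunks = extract_chunks(gold), extract_chunks(pred)
        correct += len(gold_chunks & pred_chunks)
        n_gold += len(gold_chunks)
        n_pred += len(pred_chunks)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data: the prediction merges the last NP into the preceding VP.
gold = [["B-NP", "I-NP", "B-VP", "B-NP", "O"]]
pred = [["B-NP", "I-NP", "B-VP", "I-VP", "O"]]
precision, recall, f1 = chunk_prf(gold, pred)
print(precision, recall, f1)
```

A chunk counts as correct only if its boundaries and its type both match the gold standard, so a single wrong tag can cost both a predicted chunk and a gold chunk.
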