Some time ago, building a supervised WSD system meant compiling a corpus, annotating it manually with senses, running semantic and syntactic analyzers over it, applying a feature extractor, and implementing a machine learning technique before finally having a working system. Today, many resources are freely available, and the effort required to build a supervised WSD system has been dramatically reduced.
The task for this assignment is to build two versions of a supervised WSD system making use of freely available resources.
In the context of the SemEval-2007 competition, many resources have been made available. Among them, the IXA NLP group has released machine learning features for all content words with more than 10 occurrences in SemCor. These features can be freely used for developing all-words supervised Word Sense Disambiguation systems. The sense tags correspond to synsets of WordNet 1.6, but the senses can easily be mapped to other versions (see for instance http://www.lsi.upc.es/~nlp/tools/mapping.html).
You can download the features from the SemEval WSD-CLIR task website or directly from here.
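As a starting point, here is a minimal sketch of a per-word supervised classifier (Naive Bayes with add-one smoothing) trained on sets of features. It does not read the actual IXA files: the sense labels (`bank_money`, `bank_river`) and feature strings (`w:money`, etc.) are made up for illustration, and you will need to write your own loader for the downloaded format. Any other learning technique can be swapped in.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Train on a list of (sense, feature_set) pairs for ONE target word."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for sense, features in examples:
        sense_counts[sense] += 1
        for feature in features:
            feature_counts[sense][feature] += 1
            vocabulary.add(feature)
    return sense_counts, feature_counts, vocabulary


def predict(model, features):
    """Return the most probable sense for a set of feature strings."""
    sense_counts, feature_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        # Log prior of the sense.
        score = math.log(count / total)
        # Add-one (Laplace) smoothing over the feature vocabulary.
        denominator = sum(feature_counts[sense].values()) + len(vocabulary)
        for feature in features:
            if feature in vocabulary:  # features never seen in training are ignored
                score += math.log((feature_counts[sense][feature] + 1) / denominator)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy training data (hypothetical labels and features, not the IXA format).
toy_examples = [
    ("bank_money", {"w:money", "w:loan"}),
    ("bank_money", {"w:money", "w:account"}),
    ("bank_river", {"w:river", "w:water"}),
]
toy_model = train_naive_bayes(toy_examples)
toy_prediction = predict(toy_model, {"w:river", "w:fishing"})
```

In an all-words setting you would train one such model per target word and fall back to a default (for example the most frequent sense) for words without training data.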
The task consists of:
Write a short report (around 3 pages) describing the experimental setup and the features you used, focusing on the differences in the results achieved by the two systems.
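For the comparison in the report, a small script along these lines may help. It is only a sketch: it assumes you already have, for the same test instances, a list of gold senses and one list of predictions per system (the names `gold`, `pred_a`, `pred_b` and the sense labels are hypothetical). Besides accuracy, it counts the instances where only one of the two systems is right, which is often more informative than two accuracy figures alone.

```python
def accuracy(gold, predicted):
    """Fraction of instances whose predicted sense equals the gold sense."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def compare_systems(gold, pred_a, pred_b):
    """Count agreement patterns between two systems on the same instances."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for g, a, b in zip(gold, pred_a, pred_b):
        if a == g and b == g:
            counts["both"] += 1
        elif a == g:
            counts["only_a"] += 1
        elif b == g:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts


# Toy data with made-up sense labels.
gold = ["s1", "s2", "s1", "s3"]
pred_a = ["s1", "s2", "s2", "s3"]
pred_b = ["s1", "s1", "s1", "s3"]

acc_a = accuracy(gold, pred_a)
acc_b = accuracy(gold, pred_b)
breakdown = compare_systems(gold, pred_a, pred_b)
```

Two systems can reach the same accuracy while being right on different instances, as in the toy data above; looking at those instances is a good place to start the discussion.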
Here you can find a study by Audibert on different feature types and their impact on WSD. Be creative! ;)
Audibert, L. (2004). Word sense disambiguation criteria: a systematic study. In Proceedings of the 20th International Conference on Computational Linguistics.
Don't hesitate to send any questions or doubts to villarejo at gmail dot com.