PLN-PMT: Natural Language Processing for Massive Textual Data Management
Master in Artificial Intelligence
First term 2009/10
News
2009/09/16: The detailed program and the main materials for the course have been posted
2009/09/16: Classes will start next week (October 22 and 23)
2009/09/15: The Web page has been set up; welcome to the course!
Timetable
Thursday: 11:00 - 13:00
Friday: 12:00 - 14:00
Course start: October 22nd 2009
Rooms:
(Thursdays) A5104, UPC, Campus Nord
(Fridays) A6105, UPC, Campus Nord
Advisors
Lluís Màrquez (LM, classes on Thursdays)
(Campus Nord, Omega-S120, lluism@lsi.upc.edu)
Jordi Turmo
(JT, classes on Fridays)
(Campus Nord, Omega-215, turmo@lsi.upc.es)
Summary
The main goal of this course is to provide students with in-depth
knowledge of the techniques, methods and tools, both symbolic and
empirical, of Natural Language Processing (NLP). The course focuses on
systems that analyze and process massive quantities of textual data.
Applications in this domain usually work in batch mode and operate over
the Internet and very large textual databases. After taking this course,
we expect students to be familiar with the basic bibliography of this
area of NLP and to have the skills needed to carry out in-depth research
on any of the themes covered by the course. The range of applications
studied also allows students to bridge the gap between the language
technologies studied and the real-world applications in which they are
used. A final goal of the course is to present the most active research
areas within the topics of the course.
This course is closely linked to the course covering Natural Language
applications for person-machine communication (Natural Language
Processing for Human-Machine Communication). By taking both courses,
the student will acquire sufficient knowledge of the two basic
paradigms of NLP in the framework of the two most frequent scenarios.
Find a full description of the course and the evaluation method here
(an even more complete description in Catalan)
Detailed program
1. Introduction (5%)
1.1 The necessity of automatically processing massive quantities of textual data. Main applications in this domain.
2. Advanced Topics in Machine Learning (30%)
2.1 Review of the main concepts of Machine Learning
2.2 Discriminative Learning Methods: Boosting, Support Vector Machines
2.3 Machine Learning for relational and structured prediction
2.4 Semi-supervised Learning: Bootstrapping, co-training and variants (not covered this year)
3. Generic Subtasks (20%)
3.1 Partial parsing: chunking and clause boundary detection
3.2 Word Sense Disambiguation (not covered this year)
3.3 Semantic Role Labeling
4. Information Extraction: typology, adaptability, multilinguality, evaluation (45%)
5. Other Applications (5%)
5.1 Document Categorization: thematic classification, using hierarchies of concepts from the Web, subjective classification (intention, sentiment, etc.)
5.2 Automatic Summarization: single document, multi-document, multilingual (not covered this year)
Scheduling
October 2009
22 (LM), 23 (JT), 29 (LM), 30 (LM)
November 2009
5 (LM), 6 (JT), 12 (LM), 13 (JT), 19 (LM), 20 (JT), 26 (LM)
December 2009
3 (LM), 4 (JT), 10 (LM), 11 (JT), 17 (LM), 18 (JT)
January 2010
14 (tentative): Presentation and discussion of students' complementary readings
21 (tentative): Public presentation of students' practical works
Course materials
First package for topics in points 1, 2 and 3
Tutorial on Semantic Role Labeling at ACL-IJCNLP 2009 (pdf, bibliography) (point 3.3)
Complementary materials for points 1, 2 and 3
Slides on Information Extraction (point 4):
Introduction and architectures of IE systems
Multilinguality and Evaluation
Adaptability
Presentation of complementary readings
List with candidate papers to appear soon
Some guidelines for the presentation
Tentative date: January 14, 2010 (morning session)
Room and hour to be announced
Practical works
Some guidelines for the presentation
Tentative date: January 21, 2010 (morning session)
Room and hour to be announced
References
Natural Language Processing
* R. Dale, H. Moisl, H. Somers (eds.). Handbook of Natural Language Processing, Marcel Dekker, New York, 2000.
* D. Jurafsky, J. H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, Upper Saddle River, N.J., 2000.
* C. Manning, H. Schütze. Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, Mass., 1999.
* R. Mitkov (ed.). The Oxford Handbook of Computational Linguistics, Oxford University Press, 2004.
Machine Learning
* N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines (and other kernel-based learning methods). Cambridge University Press, 2000.
* T. Hastie, R. Tibshirani and J. H. Friedman. The Elements of Statistical Learning. Springer, 2001.
* T. Mitchell. Machine Learning, McGraw Hill, 1997.
* J. Hernández-Orallo, M. J. Ramírez-Quintana, C. Ferri. Introducción a la Minería de Datos, Prentice Hall / Addison-Wesley, 2004.
Surveys/Tutorials on techniques, tasks, and applications
* Xavier Carreras, Lluís Màrquez, and Enrique Romero. Máquinas de Vectores Soporte. Chapter in Introducción a la Minería de Datos, Hernández, J., Ramírez, M. J., and Ferri, C. (eds.), Pearson Prentice Hall, 353-382.
* C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 1998.
* Ide, N., & Véronis, J. (1998). Introduction to the special issue on word sense disambiguation: the state of the art. Computational Linguistics, 24(1), 1-40.
* L. Màrquez, G. Escudero, D. Martínez and G. Rigau. Supervised Corpus-based Methods for Word Sense Disambiguation. Chapter in Eneko Agirre and Phil Edmonds (Eds.), Word Sense Disambiguation: Algorithms and Applications, Kluwer, 2006 (draft version available).
* J. Turmo, A. Ageno, N. Català (2006). Adaptive Information Extraction. ACM Computing Surveys, vol. 38, issue 2 (draft version in pdf).
* Fabrizio Sebastiani. Text categorization. In Alessandro Zanasi (ed.), Text Mining and its Applications, WIT Press, Southampton, UK, 2005, pp. 109-129.
* Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47, 2002.
* Alonso, Laura; Castellon, Irene; Climent, Salvador; Fuentes, María; Padró, Lluís; Rodríguez, Horacio (2003). Approaches to Text Summarization: Questions and Answers. Revista Iberoamericana de Inteligencia Artificial (November 2003). Special Issue on Multilingual Information Access.
* Mani, Inderjeet. Automatic Summarization. John Benjamins, xi+285pp, paperback ISBN 1-58811-060-5, Natural Language Processing, 3, 2001.
If you need more information don't hesitate to email me: lluism@lsi.upc.es
Last Update: October 16, 2009