German Rigau Claramunt Home Page
Associate Professor, PhD
I am a member of the Artificial Intelligence Section of my department. More
specifically, I work in the Natural Language Processing Research Group within
the TALP Research Center for Language and Speech Technology and Applications.
My research focuses on the automatic acquisition of lexical knowledge from
machine-readable dictionaries (MRDs) and on Word Sense Disambiguation.
Recently, we participated in SENSEVAL, a Word Sense Disambiguation evaluation
exercise (sponsored by ACL SIGLEX). We are now involved in its second edition,
SENSEVAL-2.
With Eneko Agirre, I co-authored the tutorial "Semi-automatic Methods for
WordNet Construction", given at the Global WordNet conference in Mysore,
India, 2002.
Related topics: Artificial Intelligence,
Natural Language Processing, Computational Linguistics, Lexical Semantics.
PhD thesis
Automatic Acquisition of Lexical Knowledge from MRDs
German Rigau i Claramunt
Advisor: Horacio Rodríguez Hontoria.
PhD Thesis, Departament de Llenguatges i Sistemes Informàtics,
Universitat Politècnica de Catalunya. Barcelona, 1998.
PhD thesis students
Jordi Atserias (co-advisor Lluís Padró)
Jordi Daudé (co-advisor Lluís Padró)
Gerard Escudero (co-advisor Lluís Màrquez)
Andrés Montoyo (co-advisor Manuel Palomar)
Toni Tuells
Past projects
-
ACQUILEX I and II
Duration: 3 + 3 years
Funding: European Commission under the Basic Research initiative (BRA-3030
and BRA-7315).
Summary: The goal of the
first project was to explore the utility of constructing a multilingual
lexical knowledge base from machine-readable versions of conventional dictionaries.
The second project extended this goal by exploring the utility of machine
readable textual corpora as a source of lexical information not coded in
conventional dictionaries, and by adding dictionary publishing partners
to exploit the lexical database and corpus extraction software developed
by the projects for conventional lexicography. The ACQUILEX II project
finished in September 1995. Partners in this project were: University of
Cambridge, University of Amsterdam and Istituto di Linguistica Computazionale
of CNR, with the cooperation of the NL research groups from UPC and UB.
-
EuroWordNet:
Building a multilingual wordnet with semantic relations between words
Duration: 3 years
Funding: European Union through the Language Engineering sector of the
Telematics Applications Programme (LE-2 4003).
Summary: This project aims to develop a generic multilingual database with
WordNets for several European languages (English, Dutch, Italian and Spanish),
each with 30,000 senses. These WordNets will be linked through the English
WordNet, so each English synonym will be associated with its equivalent in the
other languages.
Partners in this project are: University of Amsterdam, University of
Sheffield, Istituto di Linguistica Computazionale of CNR, Lernout & Hauspie
and the NL research groups from UPC, UNED and UB.
WordNet homepage at http://www.cogsci.princeton.edu/~wn/
You can now try our Web Interface to the Multilingual (English, Spanish,
Catalan and Basque) WordNet.
EuroWordNet homepage at http://www.illc.uva.nl/EuroWordNet/
You can now try our Web Interface to the Multilingual (English, Spanish,
Catalan and Basque) EuroWordNet.
The final results of EuroWordNet are now available.
The complete mapping between WordNet 1.5 and WordNet 1.6 is now available.
The complete mapping between WordNet 1.6 and WordNet 1.7 is now available.
-
ITEM:
Textual Information Retrieval in a multilingual environment using NL Techniques
Duration: 3 years
Funding: CICYT (TIC96-1234-C03-02)
Summary: The project aims to explore how natural language processing
techniques can improve the state of the art in retrieval systems for full-text
and multimedia databases. It seeks better methods for retrieving relevant
documents and for allowing multilingual access to them.
ITEM is a project developed by our group from the Technical University of
Catalonia (UPC), together with members of the University of Barcelona (UB)
Computational Linguistics research group, the NLP group from Basque Country
University (EHU), and the NLP research group in the Universidad Nacional de
Educación a Distancia (UNED). Researchers from the Jaume I University of
Castellón also participate and work in close relationship with us. More
information on the ITEM project.
You can now test our analyzers for unrestricted Spanish and Catalan text.
You can now test the Cross-Language Information Retrieval System developed by
UNED.
-
RILE:
Spanish Linguistic Resources Server
Duration: 6 months.
Funding: ATYCA (Culture and Language sector, Technologies and Applications
for the Information Society program). Ministerio de Industria y Energía.
Summary: This project is a pilot test for the development of a Spanish
Linguistic Resources Server including some of the linguistic engineering tools
developed in ITEM.
This project is developed by our group from the Technical University of
Catalonia (UPC), together with members of the University of Barcelona (UB)
Computational Linguistics research group, the NLP research group in the
Universidad Nacional de Educación a Distancia (UNED), Sema Group and
Observatorio Español de Industrias de la Lengua - Instituto Cervantes.
-
Catalan WordNet
Duration: 3 years
Funding: CREL (Centre de Referència en Enginyeria Lingüística), recently
created by CIRIT.
Summary: This project aims to develop the Catalan WordNet following the
framework of EuroWordNet. The Catalan WordNet will be linked through the
English WordNet, so each English synonym will be associated with its
equivalent in Catalan.
The partner of this project is the NL research group of UB.
WordNet homepage at http://www.cogsci.princeton.edu/~wn/
You can now try our Web Interface to the Multilingual (English, Spanish,
Catalan and Basque) WordNet.
EuroWordNet homepage at http://www.illc.uva.nl/EuroWordNet
The complete mapping between WordNet 1.5 and WordNet 1.6 is now available.
The complete mapping between WordNet 1.6 and WordNet 1.7 is now available.
-
Spontaneous-Speech Dialogue System in Limited Domains
Duration: 3 years
Funding: CICYT (TIC98-423-C06)
Summary: This project proposes the development of a spoken human-machine
interface, by way of dialogue, for a semantically limited task (for instance,
queries to a database with information on ticket reservation, tourist
information, first aid care, or similar tasks). Spoken interfaces based on
dialogue will allow the development of services unavailable today for
accessing information, getting help or carrying out transactions over the
telephone (alternatively, they may make current services with important
performance shortcomings more efficient).
This project is developed by our group from the Technical University of
Catalonia (UPC), together with the Computational Linguistics research group of
the University of Barcelona (UB), the Grupo de Tratamiento del Habla, also
from UPC, the Grupo de Reconocimiento Automático del Habla from Basque Country
University (EHU), the Grupo de Aprendizaje Computacional, Reconocimiento
Automático y Traducción del Habla from the Jaume I University of Castellón,
the Grupo de Reconocimiento de Formas e Inteligencia Artificial from the
University of Valencia, and the Grupo de Tecnologías de las Comunicaciones
from the University of Zaragoza.
More information on the project.
-
NAMIC:
News Agencies Multilingual Information Categorisation
Duration: 2 years.
Funding: European Union through the Information Society Technologies
programme (IST-1999-12392).
Summary: This project aims to develop and bring to a marketable stage advanced
Natural Language Processing technologies for multilingual news customisation
and broadcasting throughout distributed services, a major problem for
International and National News Agencies (NAs) as well as for the spread of
Web technologies. Within their own business cases, NAs need to integrate into
their repositories news distributed by other NAs, usually in different
languages and according to different classification standards. Mismatches
occur at the language level, since different languages are used, as well as at
the conceptual level, since news is organized and stored according to
diverging schemes. This project will include the linguistic engineering tools
developed in ITEM, RILE and EuroWordNet.
This project is developed by our group from the Technical University of
Catalonia (UPC), together with members of the University of Barcelona (UB)
Computational Linguistics research group, the research groups of the
University of Sheffield, University of Roma Tor Vergata and Vrije Universiteit
Brussel, and Itaca, International Press Telecommunications Council, Agenzia
ANSA, Agencia EFE and Financial Times.
Current projects
-
HERMES:
Electronic Libraries with Multilingual Information Retrieval and Semantic
Processing
Duration: 3 years (2001-2003)
Funding: CICYT (TIC2000-0335-C03-02)
Summary: The HERMES project will develop two applications that facilitate the
retrieval of multilingual textual information. These applications will offer
significant improvements over current systems by incorporating techniques
taken from the field of linguistic engineering. The improvements will be seen
in terms of the relevance of the results obtained and also in the analysis and
presentation of the selected information. The first application will be a
system that deals with requests for information contained in a multilingual
(Basque, Catalan, English and Spanish) digital news archive. The second
application will be a Web-based multilingual search system for online news
(e.g., Web-based newspapers).
HERMES is a project developed by our group from the Technical University of
Catalonia (UPC), together with members of the University of Barcelona (UB)
Computational Linguistics research group, the NLP group from Basque Country
University (EHU), and the NLP research group in the Universidad Nacional de
Educación a Distancia (UNED).
-
MEANING:
Developing Multilingual Web-scale Language Technologies
Duration: 3 years (2002-2005).
Funding: European Union
through Information Society Technologies
programme (IST-2001-34460).
Summary:
MEANING
will be concerned with automatically collecting and analysing language
data from the WWW on a large scale, and building more comprehensive multilingual
lexical knowledge bases to support improved word sense disambiguation (WSD).
Current web access applications are based
on words; MEANING will open the way for access to the Multilingual Web
based on concepts, providing applications with capabilities that significantly
exceed those currently available. MEANING will facilitate development of
concept-based open domain Internet applications (such as Question/Answering,
Cross Lingual Information Retrieval, Summarisation, Text Categorisation,
Event Tracking, Information Extraction, Machine Translation, etc.). Furthermore,
MEANING will supply a common conceptual structure to Internet documents,
thus facilitating knowledge management of web content.
This project will include the linguistic engineering tools developed in ITEM,
RILE, EuroWordNet and NAMIC.
This project is developed by our group from the Technical University of
Catalonia (UPC) with the research groups of ITC-IRST, Ixa Group at EHU,
University of Sussex, Irion Technologies B.V. and Reuters Limited.
Publications
-
Ageno A., Castellón I., Martí M.A., Ribas F., Rigau G., Rodriguez H.,
Taulé M. and Verdejo M.F., SEISD: An environment for extraction of Semantic
Information from on-line dictionaries. Proceedings of the 3rd Conference on
Applied Natural Language Processing ANLP'92. Trento, Italy, 1992.
-
Rigau G., An experiment on Automatic Semantic Tagging of Dictionary Senses.
Workshop "The Future of Dictionary". Aix-les-Bains, France, 1994.
Also as a research report LSI-95-31-R. Dept. de Llenguatges i Sistemes
Informàtics. UPC. Barcelona. June 1995.
-
Ageno A., Ribas F., Rigau G., Rodríguez H. and Samiotou A., TGE: Tlinks
Generation Environment. Proceedings of the 15th International Conference on
Computational Linguistics, COLING'94. Kyoto, Japan, 1994.
-
Copestake A., Briscoe E., Vossen P., Ageno
A., Castellón I., Ribas F., Rigau G., Rodríguez H. and Samiotou
A., Acquisition
of Lexical Translation Relations from MRDs. Machine Translation:
Special Issue on the lexicon, 9:3,33-69, 1995.
-
Rigau G. and Agirre E., Disambiguating
bilingual nominal entries against WordNet. Proceedings of workshop
"The Computational Lexicon". Ed. F. Verdejo. 7th European Summer School
in Logic, Language and Information, ESSLLI'95. Barcelona, Spain, 1995.
-
Agirre E. and Rigau G., A Proposal for Word Sense Disambiguation using
Conceptual Distance. Proceedings of the International Conference "Recent
Advances in Natural Language Processing" RANLP'95, Tzigov Chark, Bulgaria,
1995. Also in Recent Advances in Natural Language Processing. Eds. Mitkov R.
and Nicolov N., Vol. 136 of the series Current Issues in Linguistic Theory.
John Benjamins Publishing Company. Amsterdam, The Netherlands. 1997.
-
Rigau G., Rodríguez H. and Turmo J., Automatically extracting Translation
Links using a wide coverage semantic taxonomy. Proceedings of the 15th
International Conference AI'95. Montpellier, France. 1995.
-
Agirre E. and Rigau G., Word Sense Disambiguation using Conceptual Density.
Proceedings of the 16th International Conference on Computational Linguistics,
COLING'96. Copenhagen, Denmark, 1996.
-
Atserias J., Climent S., Farreres J., Rigau G. and Rodríguez H., Combining
Multiple Methods for the Automatic Construction of Multilingual WordNets.
Proceedings of the International Conference "Recent Advances in Natural
Language Processing" RANLP'97. Tzigov Chark, Bulgaria, 1997.
-
Rigau G., Atserias J. and Agirre E.,
Combining
Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation.
Proceedings of joint 35th Annual Meeting of the Association for Computational
Linguistics and 8th Conference of the European Chapter of the Association
for Computational Linguistics ACL/EACL'97. Madrid, Spain, 1997.
-
Benítez L., Cervell S., Escudero G.,
López M., Rigau G. and Taulé M., Methods
and Tools for Building the Catalan WordNet. Proceedings of
the ELRA Workshop on Language Resources for European Minority Languages,
First International Conference on Language Resources & Evaluation,
Granada, Spain. 1998.
-
Rigau G., Rodríguez H. and Agirre E.,
Building
Accurate Semantic Taxonomies from Monolingual MRDs. Proceedings
of the 17th International Conference on Computational Linguistics and 36th
Annual Meeting of the Association for Computational Linguistics COLING-ACL'98.
Montreal, Canada. 1998.
-
Farreres X., Rigau G. and Rodríguez
H., Using
WordNet for Building WordNets. Proceedings of COLING-ACL Workshop
"Usage of WordNet in Natural Language Processing Systems". Montreal, Canada.
1998.
-
Vossen P., Bloksma L., Alonge A., Marinai E., Peters C., Castellón I.,
Martí M.A. and Rigau G., Compatibility in Interpretation of Relations in
EuroWordNet. Computers and the Humanities, Double Special Issue on
EuroWordNet. Eds. Nancy Ide and Dan Greenstein. 32:2,3, 153-184, 1998. Also in
EuroWordNet: A Multilingual Database with Lexical Semantic Networks, ed. Piek
Vossen. Kluwer Academic Publishers. Dordrecht, The Netherlands. 1998.
-
Daudé J., Padró L. and Rigau
G., Mapping
Multilingual Hierarchies using Relaxation Labelling, Proceedings
of Joint SIGDAT Conference on Empirical Methods in Natural Language Processing
and Very Large Corpora (EMNLP/VLC'99). Maryland, United States, 1999.
-
Atserias J., Castellón I., Civit M. and Rigau G., Using a Diathesis Model for
Semantic Parsing, 1st Venezia per il Trattamento Automatico delle Lingue
(VExTAL'99). Venice, Italy, 1999.
-
Agirre E., Atserias J., Padró L. and Rigau G., Combining Supervised and
Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation.
Computers and the Humanities, Special Double Issue on SensEval. Eds. Martha
Palmer and Adam Kilgarriff. 34:1,2, 2000.
-
Atserias J., Castellón I., Civit M.
and Rigau G., Semantic
Analysis based on Verbal Subcategorization. Proceedings of the
Conference on Intelligent Text Processing and Computational Linguistics
(Cicling'2000), Mexico. 2000.
-
Escudero G., Màrquez L. and Rigau G.,
Boosting
Applied to Word Sense Disambiguation. Proceedings of the 11th
European Conference on Machine Learning, ECML 2000. Barcelona, Spain. 2000.
Lecture Notes in Artificial Intelligence 1810. R. L. de Mántaras
and E. Plaza (Eds.). Springer Verlag 2000. Also as a research report LSI-00-03-R.
Dept. de Llenguatges i Sistemes Informàtics. UPC. Barcelona. 2000.
-
Escudero G., Màrquez L. and Rigau G.,
Naive
Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited.
Proceedings of the 14th European Conference on Artificial Intelligence,
ECAI-2000. Berlin, Germany. 2000. Also as a research report LSI-00-05-R.
Dept. de Llenguatges i Sistemes Informàtics. UPC. Barcelona. 2000.
-
Escudero G., Màrquez L. and Rigau G.,
A
Comparison between Supervised Learning Algorithms for Word Sense Disambiguation.
Proceedings of Fourth Computational Natural Language Learning Workshop
(CoNLL-2000). Lisbon. Portugal. 2000.
-
Escudero G., Màrquez L. and Rigau G.,
An
Empirical Study of the Domain Dependence of Supervised Word Sense Disambiguation
Systems. Proceedings of Joint SIGDAT Conference on Empirical Methods
in Natural Language Processing and Very Large Corpora (EMNLP/VLC'00). Hong
Kong, China. 2000. Derived from On
the Portability and Tuning of Supervised Word Sense Disambiguation Systems.
Research Report LSI-00-30-R. Dept. de Llenguatges i Sistemes Informàtics,
UPC. 2000.
-
Daudé J., Padró L. and Rigau
G., Mapping
WordNets using Structural Information, Proceedings of the 38th
Annual Meeting of the Association for Computational Linguistics ACL'00.
Hong Kong, China. 2000.
-
Daudé J., Padró L. and Rigau G., A Complete WN1.5 to WN1.6 Mapping,
Proceedings of NAACL Workshop "WordNet and Other Lexical Resources:
Applications, Extensions and Customizations". Pittsburgh, PA, United States,
2001.
-
Montoyo A., Palomar M. and Rigau G., WordNet Enrichment with Classification
Systems. Proceedings of NAACL Workshop "WordNet and Other Lexical Resources:
Applications, Extensions and Customizations". Pittsburgh, PA, United States,
2001.
-
Montoyo A., Palomar M. and Rigau G.,
Lexical
Enrichment of WordNet with Classification Systems using Specification Marks
Method. 6th International Workshop on Applications of Natural Language
for Information Systems NLDB'01. Madrid, Spain. 2001. Published in Lecture
Notes in Informatics, P-3, German Informatics Society GI-Edition, 2001.
-
Basili R., Catizone R., Padró L., Pazienza
M.T., Rigau G., Setzer A., Webb N., Wilks Y., Zanzotto F.,
Multilingual
Authoring: the NAMIC Approach. Proceedings of the ACL Workshop
"Human Language Technology and Knowledge Management". Toulouse, France.
2001.
-
Montoyo A., Palomar M. and Rigau G.,
Method
and Interface for WordNet Enrichment with Classification Systems.
Proceedings of 12th International Conference on Database and Expert Systems
Applications DEXA-2001. Munich, Germany. 2001. Published in Lecture Notes
in Computer Science 2113, Springer-Verlag, 2001.
-
Atserias J., Padró L. and Rigau G., Integrating Multiple Knowledge Sources for
Robust Semantic Parsing. Proceedings of the International Conference "Recent
Advances in Natural Language Processing" RANLP'01. Tzigov Chark, Bulgaria.
2001.
-
Montoyo A., Palomar M. and Rigau G., Method for WordNet Enrichment Using WSD,
Proceedings of 4th International Conference on Text Speech and Dialogue
TSD'2001. Železná Ruda - Špičák, Czech Republic. 2001. Published in Lecture
Notes in Artificial Intelligence 2166, Springer-Verlag, 2001.
-
Rigau G., Taulé M., Fernandez A. and Gonzalo J., Framework and Results for the
Spanish SENSEVAL. Proceedings of 2nd International Workshop "Evaluating Word
Sense Disambiguation Systems", SENSEVAL-2. Toulouse, France. 2001.
-
Escudero G., Màrquez L. and Rigau G., Using LazyBoosting for Word Sense
Disambiguation. Proceedings of the 2nd International Workshop "Evaluating Word
Sense Disambiguation Systems", SENSEVAL-2. Toulouse, France. 2001.
-
Basili R., Catizone R., Padró L., Pazienza
M.T., Rigau G., Setzer A., Webb N. and Zanzotto F., Knowledge-Based
Multilingual Document Analysis. Proceedings of COLING Workshop
"SemaNet'02: Building and Using Semantic Networks" Taipei, Taiwan. 2002.
-
Rigau G., Magnini B., Agirre E., Vossen P.
and Carroll J., MEANING:
A Roadmap to Knowledge Technologies. Proceedings of COLING Workshop
"A Roadmap for Computational Linguistics". Taipei, Taiwan. 2002.
Last updated: July 9, 2002