
My research interests relate to Machine Learning applied to Natural
Language Processing.
My work is mostly focused on Empirical Machine Translation and its
Evaluation.
Birthdate: 03/23/1977
Work address: Jordi Girona 1-3, edifici Omega, despatx
S107. 08034. Barcelona
Work phone number: (+34) 93 4137950
STUDIES:
EXPERIENCE:
- May 2008 - ... technological project leader at Lingo Research Labs. I work
on the incorporation of current NLP technology into the company's
products and services.
- June 2007 - ... as a specialized research
assistant, in the AI section NLP
group, "Technical University of Catalonia". OpenMT project, on
the development of open source machine translation software using hybrid methods.
- June 2003 - May 2007 as a PhD student funded by the
Spanish Ministry of Science and Technology, in the AI section NLP
group, "Technical University of Catalonia". ALIADO project, on
the study and development of language and speech technology for mobile
personal digital assistants (PDAs) in a multilingual environment.
- April 2002 - May 2003 as a research intern, in
the AI section NLP group, "Technical University of Catalonia". LC-STAR project, on the
generation of resources, namely speech signal, lexica and corpora for
speech to speech translation components.
- February 2001 - November 2001 as a research intern,
Panasonic Speech
Technology Laboratories, Santa Barbara, CA, USA. Speech
recognition group. End of Degree project, "A database architecture for
efficient design of acoustic models", describing the construction of an
information system designed to efficiently support the requirements
of the software training algorithms.
- January 2000 - February 2001. JEDI (Young
Computer Science
Students) as a junior analyst and programmer, responsible for the
construction of a stock management information system for a small
company (Visual Basic - SQL Server 7.0).
TEACHING:
- 2005-2006
Introduction to Logics. Facultat d'Informàtica de Barcelona (FIB). Universitat Politècnica de Catalunya (UPC).
PUBLICATIONS:
Doctoral Dissertation
- Jesús
Giménez.
Empirical Machine Translation and its Evaluation. Ph.D. Thesis,
Universitat Politècnica de Catalunya (defended July 2, 2008). [.ps] [.pdf] [slides]
2008
- Jesús
Giménez and Lluís Márquez.
Discriminative Phrase Selection for Statistical Machine Translation.
In Learning Machine Translation. NIPS Workshop Series. MIT
Press. [.ps] [.pdf]
- Jesús
Giménez and Lluís Márquez.
A Smorgasbord of Features for Automatic MT
Evaluation. Proceedings of the 3rd ACL Workshop on Statistical
Machine Translation (shared evaluation task). [.ps] [.pdf] [poster] [spot]
- Jesús
Giménez and Lluís Márquez.
Towards Heterogeneous Automatic MT Error Analysis.
Proceedings of the 6th
International Conference on Language Resources and Evaluation
(LREC'08). [.ps] [.pdf] [slides]
- Cristina España-Bonet, Jesús
Giménez and Lluís Márquez.
The UPC-lsi Discriminative Phrase Selection System: NIST MT Evaluation 2008.
Proceedings of the 2008 NIST Open Machine Translation Evaluation Workshop. [.ps] [.pdf]
[slides]
[mt-eval slides]
- Jesús
Giménez.
Towards Heterogeneous Automatic MT Evaluation.
Talk at the TALP NLP group seminar. [slides]
- Jesús
Giménez and Lluís Márquez.
Heterogeneous Automatic MT Evaluation Through Non-Parametric Metric Combinations.
To appear in Proceedings of IJCNLP 2008. [.ps] [.pdf]
[slides]
2007
- Jesús
Giménez and Lluís Márquez.
Linguistic Features for Automatic Evaluation of Heterogeneous MT Systems.
Proceedings of WMT 2007 (ACL'07). [.ps] [.pdf]
[slides]
- Jesús
Giménez and Lluís Márquez.
Context-aware Discriminative Phrase Selection for Statistical Machine Translation.
Proceedings of WMT 2007 (ACL'07). [.ps] [.pdf]
[spot]
- Jesús
Giménez.
IQMT: A Framework for Automatic Machine Translation Evaluation based on Human Likeness. Technical Report LSI-07-29-R. [.ps] [.pdf]
- David Farwell, Jesús Giménez, Edgar González, Reda Halkoum, Horacio Rodríguez and Mihai Surdeanu.
The UPC System for Arabic-to-English Entity Translation.
Proceedings of ACE 2007. [.ps] [.pdf]
2006
- Patrik Lambert, Jesús
Giménez, Marta R. Costa-jussá, Enrique Amigó, Rafael E. Banchs, Lluís Márquez and J.A.R. Fonollosa.
Machine Translation System Development based on Human Likeness. Proceedings of IEEE/ACL 2006 Workshop on Spoken Language Technology. [.pdf]
- Jesús
Giménez and Lluís Márquez. Low-cost Enrichment of Spanish WordNet with
Automatically Translated Glosses: Combining General and Specialized
Models. Proceedings of COLING-ACL 2006. [.ps] [.pdf]
- Enrique Amigó, Jesús
Giménez, Julio Gonzalo and Lluís Márquez. MT Evaluation: Human-like vs. Human
Acceptable. Proceedings of COLING-ACL 2006. [.ps] [.pdf]
- Jesús Giménez and Lluís Màrquez. The LDV-COMBO system for SMT.
Proceedings of the NAACL 2006 Workshop on Statistical Machine
Translation. [.ps]
[.pdf]
- Jesús Giménez and Enrique Amigó.
IQMT: A Framework for Automatic Machine
Translation Evaluation. Proceedings of the 5th
International Conference on Language Resources and Evaluation
(LREC'06). Genoa, Italy, 22-28 May. 2006. [.ps] [.pdf]
[slides]
2005
- Jesús Giménez, Enrique Amigó and Chiori
Hori. Machine Translation Evaluation
Inside QARLA. In Proceedings of the International Workshop on
Spoken Language Technology (IWSLT'05). Pittsburgh, PA, USA, October
24-25, 2005. [.ps] [.pdf] [slides]
- Jesús
Giménez, Lluís Márquez and German Rigau. Automatic Translation of WordNet Glosses.
Eurolan Cross-Language Knowledge Induction Workshop. Cluj-Napoca,
Romania, July 25 - August 5, 2005. [.ps] [.pdf] [slides]
- Jesús Giménez. Rich
Linguistic Knowledge for Empirical Machine Translation. PhD
Thesis Project. LSI
Department. Technical University of Catalonia, 2005. [.ps] [.pdf] [slides]
- Jesús
Giménez and Lluís Márquez. Combining Linguistic Data Views for
Phrase-based SMT. ACL Workshop on ``Building and Using Parallel
Texts: Data-Driven Machine Translation and Beyond''. Ann Arbor,
Michigan, USA, June 29-30, 2005. [.ps]
[.pdf] [slides]
- Lluís Màrquez, Pere Comas,
Jesús Giménez and Neus Català.
Semantic Role Labeling as
Sequential Tagging. Ninth
Conference on Computational Natural Language Learning (CONLL'05).
Ann Arbor, Michigan, USA, June 29-30, 2005. [.pdf]
2004
- Victoria Arranz, Núria Castell and Jesús
Giménez. Creació de
recursos lingüístics per a la traducció
automàtica. 2n Congrés d'Enginyeria en Llengua
Catalana. (CELC'04). Andorra, 2004. [.pdf]
[slides]
- Victoria Arranz, Núria Castell and Jesús
Giménez. Creación de
recursos lingüísticos para la traducción
automática. III Jornadas en Tecnología del Habla.
Valencia, Spain. 2004. [.ps] [.pdf] [slides]
- Folkert de Vriend, Núria
Castell, Jesús Giménez and Giulio Maltese.
LC-STAR: XML-coded Phonetic Lexica and Bilingual Corpora for
Speech-to-Speech Translation. In
Proceedings of the Papillon Workshop on Multilingual
Lexical Databases. Grenoble, France.
2004. [.pdf]
[slides]
- Jesús Giménez and Lluís Márquez.
SVMTool: A general POS tagger generator based on Support Vector
Machines. In Proceedings of the 4th
International Conference on Language Resources and Evaluation (LREC'04), vol. I,
pages 43 - 46. Lisbon,
Portugal, 2004. (ISBN 2-9517408-1-6)
[.ps]
[.pdf]
[slides]
SVMTool
[free
download] Department Research
Report (LSI-04-34-R), Technical University of Catalonia,
2004. [.ps] [.pdf]
- Victoria Arranz,
Núria
Castell, Josep Maria Crego, Jesús Giménez,
Adrià
de Gispert and Patrik Lambert.
Bilingual Connections for Trilingual Corpora: An XML Approach.
In Proceedings of the 4th International Conference on Language Resources
and Evaluation (LREC'04), vol. IV, pages 1459 - 1462. Lisbon, Portugal. 2004.
(ISBN 2-9517408-1-6) [.ps]
[.pdf]
[poster]
2003
- Jesús Giménez and Lluís
Márquez. Fast and Accurate Part-of-Speech Tagging: The SVM
Approach Revisited. In Proceedings of the International
Conference RANLP - 2003 (Recent Advances in Natural Language Processing),
pages 158 - 165. September 10-12, 2003.
Borovets, Bulgaria. (ISBN 954-90906-6-3)
[.ps]
[.pdf]
[slides]. Selected as
a chapter in RANLP 2003
volume in CILT series (Current Issues in Linguistic Theory). John
Benjamins Publishers, Amsterdam.
- Victoria Arranz, Núria Castell and
Jesús Giménez. Development
of Language Resources for Speech-to-Speech Translation.
In Proceedings of the International Conference RANLP - 2003 (Recent
Advances in Natural
Language Processing), pages 26 - 30. September 10-12, 2003. Borovets,
Bulgaria.
[.ps] [.pdf]
[poster]
- David Conejero, Jesús Giménez,
Victoria Arranz, Antonio Bonafonte, Neus Pascual, Núria Castell
and Asunción Moreno. Lexica and Corpora for Speech-to-Speech
translation: A Trilingual Approach.
In Proceedings of the 8th European
Conference on Speech Communication and Technology
(EuroSpeech 2003).
September 1-4,
2003. Geneva, Switzerland.
(ISSN 1018-4074)
[.ps]
[.pdf]
- Victoria Arranz,
Núria Castell, Jesús Giménez, Hermann Ney and
Nicola Ueffing.
Description of language resources used for experiments.
Technical Report Deliverable
D4.2, LC-STAR project by the European Community (IST project ref. No.
2001-32216), 2003.
- Victoria Arranz, Núria Castell,
Jesús Giménez and Asunción Moreno. Description of raw corpora.
Technical Report Deliverable 5.3, LC-STAR
project by the European Community (IST project ref. No. 2001-32216),
2003.
- Victoria Arranz, Núria Castell and Jesús
Giménez. Speech Corpora Creation for Tourist Domain.
LSI Department Technical Report (LSI-03-2-T),
Technical University of Catalonia,
2003.
SOFTWARE:
PARTICIPATION IN CONFERENCE/JOURNAL PROGRAM COMMITTEES:
- North American Chapter of the Association for Computational Linguistics - Human Language Technologies. NAACL HLT 2009.
- 13th Annual Conference of the European Association for Machine Translation. EAMT 2009.
- 12th Conference of the European Chapter of the Association for Computational Linguistics. EACL 2009.
- Student Research Workshop at the 8th Conference of the
Association for Machine Translation in the Americas. AMTA 2008.
- Workshop on Human Judgements in Computational
Linguistics at the 22nd International Conference on Computational Linguistics. COLING 2008.
- The 11th Conference on Theoretical and Methodological Issues in Machine Translation. TMI 2007.
- 45th Annual Meeting of the Association for Computational
Linguistics. ACL 2007.
- Twenty-Second National Conference on Artificial Intelligence. AAAI 2007.
- 2006 Conference on Empirical Methods in Natural Language Processing. EMNLP 2006.
- Twenty-First National Conference on Artificial Intelligence. AAAI 2006.
- 11th Conference of the European Chapter of the Association for Computational Linguistics. EACL 2006.
- España for Natural Language Processing. ESTAL 2004.
COURSES/CONFERENCES:
- "Mixing Approaches to Machine
Translation" workshop (MATMT 2008). Donostia, Spain. February 14, 2008.
- The Third International Joint Conference
on Natural Language Processing (IJCNLP 2008). Hyderabad, India. January 7-12, 2008.
- 45th Annual Meeting of the Association for
Computational Linguistics (ACL 2007). Prague, Czech Republic, June 23-30, 2007.
- 44th Annual Meeting of the Association for
Computational Linguistics (ACL 2006). Sydney, Australia, July
17-21, 2006.
- 5th International Conference on Language Resources
and Evaluation (LREC'06). Genoa, Italy. May 24-26, 2006.
- International Workshop on Spoken Language
Technology (IWSLT'05). Pittsburgh, PA, USA, October 24-25, 2005.
- Eurolan Cross-Language Knowledge Induction
Workshop. Cluj-Napoca,
Romania, July 25 - August 5, 2005.
- Eurolan Summer School 2005. Cluj-Napoca,
Romania, July 25 - August 5, 2005.
- ACL Workshop on ``Building and Using Parallel
Texts: Data-Driven Machine Translation and Beyond''. Ann Arbor,
Michigan, USA, June 29-30, 2005.
- Ninth Conference on Computational Natural Language Learning
(CONLL'05). Ann Arbor, Michigan, USA, June 29-30, 2005.
- 43rd Annual Meeting of the Association for
Computational Linguistics (ACL 2005). Ann Arbor, Michigan, USA, June
29-30, 2005.
- 2n Congrés d'Enginyeria en Llengua
Catalana. (CELC'04). Andorra, 2004.
- 2004 Papillon Workshop on Multilingual Lexical
Databases. Grenoble, France, August 30 - September 1, 2004.
- 42nd Annual Meeting of the Association for
Computational Linguistics (ACL 2004). Barcelona, Spain, July 22-24, 2004.
- 2004 Conference on Empirical Methods in Natural
Language Processing (EMNLP 2004). Barcelona, Spain, July 25-26, 2004.
- 4th International Conference on Language Resources
and Evaluation (LREC'04). Lisbon, Portugal. May 26-28, 2004.
- NLP Workshop by the IXA Research Group.
Hondarribia, Gipuzkoa, Spain. February 5-6, 2004.
- International Conference on Recent Advances in
Natural Language Processing (RANLP '03). September 10-12, 2003.
Borovets, Bulgaria.
- Statistical
Processing of Natural Language.
Hermann Ney. March 2003. Universidad Politécnica de
Cataluña, Barcelona, Spain.
- Introduction to
Microsoft C#. Microsoft Iberica,
S.R.L. January-February, 2003. Barcelona, Spain.
- Language
Technology Course: Machine Translation.
July 15-19, 2002. Universidad Internacional Menéndez Pelayo,
Barcelona, Spain.
- Business
Integration Methodology. Accenture.
March 2002. Barcelona, Spain.
- 8th ELSNET Summer
School, Text And Speech
Triggered Information Access (TeSTIA'2000). July 15-30, 2000. Chios,
Greece.
- 9th Conference on
Advanced Techniques in Computer
Science. (JETAI'98). March 25-27, 1998. Zaragoza, Spain.
LANGUAGES:
(last updated December 10, 2008)