
My research interests relate to Machine Learning applied to Natural
Language Processing and to the study of Evaluation Methods.
Initially, I worked on the problem of sequential tagging. I
implemented SVMTool, which is still
in use by the research community worldwide. Then, for almost a decade,
I studied the application of empirical methods to Machine Translation,
which was great because it gave me the freedom to explore different
subproblems. I ended up focusing, however, on the lexical selection
problem, building discriminative models. Moreover, through my
experience with MT I realised how important evaluation methods are throughout
the system development cycle in NLP tasks, and decided to study manual
and automatic evaluation methods more closely. I designed and built
novel automatic similarity measures based on linguistic processing of
texts and integrated them inside the Asiya Open
Toolkit. Finally, in 2011, Google happened to be looking for
somebody exactly like me. I was offered a job opportunity, and I
decided to join them!
LinkedIn: http://www.linkedin.com/in/jesusgimenez
STUDIES:
EXPERIENCE:
- March 2011 - ... - as Analytical Linguist at Google.
- March 2010 - March 2011 - as post-doc researcher in the AI section NLP
group, "Technical University of Catalonia". FAUST project
(FP7-ICT-2009-4-247762, EC, 2010-2012), on
feedback analysis for user adaptive statistical translation.
- October 2009 - January 2010 as senior visiting researcher
at ExperienceOn
Ventures. I worked on the development of Natural Language
Processing Technology for domain-oriented Information Retrieval.
- June 2007 - October 2009 as specialized research
assistant, in the AI section NLP
group, "Technical University of Catalonia". OpenMT project (TIN2006-15307-C03-02), on
the development of open source machine translation software using hybrid methods.
- May 2008 - September 2009 as technological project leader at Semantix Group (formerly, Lingo Research Labs). I worked
on the incorporation of current NLP technology into the company's
products and services.
- June 2003 - May 2007 as a PhD student funded by the
Spanish Ministry of Science and Technology, in the AI section NLP
group, "Technical University of Catalonia". ALIADO project (TIC2002-04447-C02), on
the study and development of language and speech technology for mobile
personal digital assistants (PDAs) in a multilingual environment.
- April 2002 - May 2003 as a research intern, in
the AI section NLP group, "Technical University of Catalonia". LC-STAR project, on the
generation of resources, namely speech signal, lexica and corpora for
speech to speech translation components.
- February 2001 - November 2001 as a research intern,
Panasonic Speech
Technology Laboratories, Santa Barbara, CA, USA. Speech
recognition group. End of Degree project, "A database architecture for
efficient design of acoustic models", describing the construction of an
information system designed to efficiently support the requirements
of the software training algorithms.
- January 2000 - February 2001. JEDI (Young
Computer Science
Students) as a junior analyst and programmer, responsible for the
construction of a stock management information system for a small
company (Visual Basic, SQL Server 7.0).
TEACHING:
- 2005-2006
Introduction to Logics. Facultat d'Informàtica de Barcelona (FIB). Universitat Politècnica de Catalunya (UPC).
PUBLICATIONS:
Doctoral Dissertation
2011
- Jesús
Giménez and Lluís Màrquez. Linguistic Measures for Automatic Machine Translation Evaluation.
To Appear in Machine Translation, Springer Netherlands, 2011. [.pdf]
2010
- Jesús
Giménez and Lluís Márquez. Asiya: An Open Toolkit for Automatic
Machine Translation (Meta-)Evaluation. The Prague Bulletin of
Mathematical Linguistics, No. 94, 2010. [.pdf] [slides]
- Elisabet Comelles, Jesús
Giménez, Lluís Márquez, Irene Castellón
and Victoria Arranz. Document-level Automatic MT Evaluation
based on Discourse Representations. Proceedings of
the 5th Workshop on Statistical Machine Translation (at ACL'10). [.ps] [.pdf] [spot] [poster]
2009
- Cristina España-Bonet, Jesús
Giménez and Lluís Márquez.
Discriminative Phrase-Based Models for Arabic Machine
Translation. ACM Transactions on Asian Language Information
Processing Journal (Special Issue on Arabic Natural Language
Processing). ACM TALIP 2009. [.pdf]
- Enrique Amigó, Jesús
Giménez and Felisa Verdejo.
Procesamiento Lingüístico en Evaluación
Automática de Traducciones. Proceedings SEPLN, 2009.
[.pdf]
- Enrique Amigó, Jesús
Giménez, Julio Gonzalo and Felisa Verdejo.
The Contribution of Linguistic Features to Automatic Machine
Translation Evaluation. Proceedings of ACL-IJCNLP, 2009.
[.pdf]
- Jesús
Giménez. Empirical Machine Translation and its
Evaluation. Invited talk at the SMART Workshop (EAMT'09). [slides]
- Jesús
Giménez and Lluís Márquez. On the Robustness of Syntactic and
Semantic Features for Automatic MT Evaluation. Proceedings of
the 4th Workshop on Statistical Machine Translation (at EACL'09). [.ps] [.pdf] [slides]
- Miguel García, Jesús Giménez, and
Lluís Márquez.
Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Knowledge Base (Best Student Paper Award). Proceedings of the 10th International Conference on
Intelligent Text Processing and Computational Linguistics
(CICLing 2009). [.ps]
[.pdf]
- Jesús
Giménez and Lluís Márquez.
Discriminative Phrase Selection for Statistical Machine Translation. In Learning Machine Translation. NIPS Workshop Series. MIT
Press. [.ps] [.pdf] Talk at GLiCom.
[slides]
2008
- Jesús
Giménez and Lluís Márquez.
Discriminative Phrase Selection for Statistical Machine Translation.
In Learning Machine Translation. NIPS Workshop Series. MIT
Press. [.ps] [.pdf]
- Jesús
Giménez and Lluís Márquez.
A Smorgasbord of Features for Automatic MT
Evaluation. Proceedings of the 3rd ACL Workshop on Statistical
Machine Translation (shared evaluation task). [.ps] [.pdf] [poster] [spot]
- Jesús
Giménez and Lluís Márquez.
Towards Heterogeneous Automatic MT Error Analysis.
Proceedings of the 6th
International Conference on Language Resources and Evaluation
(LREC'08). [.ps] [.pdf] [slides]
- Cristina España-Bonet, Jesús
Giménez and Lluís Márquez.
The UPC-lsi Discriminative Phrase Selection System: NIST MT Evaluation 2008.
Proceedings of the 2008 NIST Open Machine Translation Evaluation Workshop. [.ps] [.pdf]
[slides]
[mt-eval slides]
- Jesús
Giménez.
Towards Heterogeneous Automatic MT Evaluation.
Talk at the TALP NLP group seminar. [slides]
- Jesús
Giménez and Lluís Márquez.
Heterogeneous Automatic MT Evaluation Through Non-Parametric Metric Combinations.
To appear in Proceedings of IJCNLP 2008. [.ps] [.pdf]
[slides]
2007
- Jesús
Giménez and Lluís Márquez.
Linguistic Features for Automatic Evaluation of Heterogeneous MT Systems.
Proceedings of WMT 2007 (ACL'07). [.ps] [.pdf]
[slides]
- Jesús
Giménez and Lluís Márquez.
Context-aware Discriminative Phrase Selection for Statistical Machine Translation.
Proceedings of WMT 2007 (ACL'07). [.ps] [.pdf]
[spot]
- Jesús
Giménez.
IQMT: A Framework for Automatic Machine Translation Evaluation based on Human Likeness. Technical Report LSI-07-29-R. [.ps] [.pdf]
- David Farwell, Jesús Giménez, Edgar González, Reda Halkoum, Horacio Rodríguez and Mihai Surdeanu.
The UPC System for Arabic-to-English Entity Translation.
Proceedings of ACE 2007. [.ps] [.pdf]
2006
- Patrik Lambert, Jesús
Giménez, Marta R. Costa-jussá, Enrique Amigó, Rafael E. Banchs, Lluís Márquez and J.A. R. Fonollosa.
Machine Translation System Development based on Human Likeness. Proceedings of IEEE/ACL 2006 Workshop on Spoken Language Technology. [.pdf]
- Jesús
Giménez and Lluís Márquez. Low-cost Enrichment of Spanish WordNet with
Automatically Translated Glosses: Combining General and Specialized
Models. Proceedings of COLING-ACL 2006. [.ps] [.pdf]
- Enrique Amigó, Jesús
Giménez, Julio Gonzalo and Lluís Márquez. MT Evaluation: Human-like vs. Human
Acceptable. Proceedings of COLING-ACL 2006. [.ps] [.pdf]
- Jesús Giménez and Lluís Màrquez. The LDV-COMBO system for SMT.
Proceedings of the NAACL 2006 Workshop on Statistical Machine
Translation. [.ps]
[.pdf]
- Jesús Giménez and Enrique Amigó.
IQMT: A Framework for Automatic Machine
Translation Evaluation. Proceedings of the 5th
International Conference on Language Resources and Evaluation
(LREC'06). Genoa, Italy, 22-28 May. 2006. [.ps] [.pdf]
[slides]
2005
- Jesús Giménez, Enrique Amigó and Chiori
Hori. Machine Translation Evaluation
Inside QARLA. In Proceedings of the International Workshop on
Spoken Language Technology (IWSLT'05). Pittsburgh, PA, USA October
24-25 2005. [.ps] [.pdf] [slides]
- Jesús
Giménez, Lluís Márquez and German Rigau. Automatic Translation of WordNet Glosses.
Eurolan Cross-Language Knowledge Induction Workshop. Cluj-Napoca,
Romania, July 25 - August 5, 2005. [.ps] [.pdf] [slides]
- Jesús Giménez. Rich
Linguistic Knowledge for Empirical Machine Translation. PhD
Thesis Project. LSI
Department. Technical University of Catalonia, 2005. [.ps] [.pdf] [slides]
- Jesús
Giménez and Lluís Márquez. Combining Linguistic Data Views for
Phrase-based SMT. ACL Workshop on ``Building and Using Parallel
Texts: Data-Driven Machine Translation and Beyond''. Ann Arbor,
Michigan, USA, June 29-30, 2005. [.ps]
[.pdf] [slides]
- Lluís Màrquez, Pere Comas,
Jesús Giménez and Neus Català.
Semantic Role Labeling as
Sequential Tagging. Ninth
Conference on Computational Natural Language Learning (CONLL'05).
Ann Arbor, Michigan, USA, June 29-30, 2005. [.pdf]
2004
- Victoria Arranz, Núria Castell and Jesús
Giménez. Creació de
recursos lingüístics per a la traducció
automàtica. 2n Congrés d'Enginyeria en Llengua
Catalana. (CELC'04). Andorra, 2004. [.pdf]
[slides]
- Victoria Arranz, Núria Castell and Jesús
Giménez. Creación de
recursos lingüísticos para la traducción
automática. III Jornadas en Tecnología del Habla.
Valencia, Spain. 2004. [.ps] [.pdf] [slides]
- Folkert de Vriend, Núria
Castell, Jesús Giménez and Giulio Maltese.
LC-STAR: XML-coded Phonetic Lexica and Bilingual Corpora for
Speech-to-Speech Translation. In
Proceedings of the Papillon Workshop on Multilingual
Lexical Databases. Grenoble, France.
2004. [.pdf]
[slides]
- Jesús Giménez and Lluís Márquez.
SVMTool: A general POS tagger generator based on Support Vector
Machines. In Proceedings of the 4th
International Conference on Language Resources and Evaluation (LREC'04), vol. I,
pages 43 - 46. Lisbon,
Portugal, 2004. (ISBN 2-9517408-1-6)
[.ps]
[.pdf]
[slides]
SVMTool
[free
download] Department Research
Report (LSI-04-34-R), Technical University of Catalonia,
2004. [.ps] [.pdf]
- Victoria Arranz,
Núria
Castell, Josep Maria Crego, Jesús Giménez,
Adrià
de Gispert and Patrik Lambert.
Bilingual Connections for Trilingual Corpora: An XML Approach.
In Proceedings of the 4th International Conference on Language Resources
and Evaluation (LREC'04), vol. IV, pages 1459 - 1462. Lisbon, Portugal. 2004.
(ISBN 2-9517408-1-6) [.ps]
[.pdf]
[poster]
2003
- Jesús Giménez and Lluís
Márquez. Fast and Accurate Part-of-Speech Tagging: The SVM
Approach Revisited. In Proceedings of the International
Conference RANLP - 2003 (Recent Advances in Natural Language Processing),
pages 158 - 165. September, 10-12, 2003.
Borovets, Bulgaria. (ISBN 954-90906-6-3)
[.ps]
[.pdf]
[slides]. Selected as
a chapter in RANLP 2003
volume in CILT series (Current Issues in Linguistic Theory). John
Benjamins Publishers, Amsterdam.
- Victoria Arranz, Núria Castell and
Jesús Giménez. Development
of Language Resources for Speech-to-Speech Translation.
In Proceedings of the International Conference RANLP - 2003 (Recent
Advances in Natural
Language Processing), pages 26 - 30. September, 10-12, 2003. Borovets,
Bulgaria.
[.ps] [.pdf]
[poster]
- David Conejero, Jesús Giménez,
Victoria Arranz, Antonio Bonafonte, Neus Pascual, Núria Castell
and Asunción Moreno. Lexica and Corpora for Speech-to-Speech
translation: A Trilingual Approach.
In Proceedings of the 8th European
Conference on Speech Communication and Technology
(EuroSpeech 2003).
September, 1-4,
2003. Geneva, Switzerland.
(ISSN 1018-4074)
[.ps]
[.pdf]
- Victoria Arranz,
Núria Castell, Jesús Giménez, Hermann Ney and
Nicola Ueffing.
Description of language resources used for experiments.
Technical Report Deliverable
D4.2, LC-STAR project by the European Community (IST project ref. No.
2001-32216), 2003.
- Victoria Arranz, Núria Castell,
Jesús Giménez and Asunción Moreno. Description of raw corpora.
Technical Report Deliverable 5.3, LC-STAR
project by the European Community (IST project ref. No. 2001-32216),
2003.
- Victoria Arranz, Núria Castell and Jesús
Giménez. Speech Corpora Creation for Tourist Domain.
LSI Department Technical Report (LSI-03-2-T),
Technical University of Catalonia,
2003.
PARTICIPATION IN CONFERENCE/JOURNAL PROGRAM COMMITTEES:
- The Prague Bulletin of Mathematical Linguistics (PBML). Special
Issue on "Open Source Tools for Machine Translation". 2010.
- International Conference on Empirical Methods in Natural Language Processing. EMNLP 2010.
- 23rd International Conference on Computational Linguistics. COLING 2010
- Machine Translation Journal. Special Issue on "Pushing the
frontier of Statistical Machine Translation". 2010.
- ACL 2010 Joint Fifth Workshop on Statistical Machine
Translation and Metrics MATR.
- 48th Annual Meeting of the Association for Computational
Linguistics. ACL 2010.
- Human Language Technologies: The 11th Annual Conference of the
North American Chapter of the Association for Computational
Linguistics. NAACL-HLT 2010.
- The seventh international conference on Language Resources and
Evaluation. LREC 2010.
- Machine Translation Summit XII. MT-SUMMIT 2009.
- International Conference on Empirical Methods in Natural Language Processing. EMNLP 2009.
- Joint conference of the 47th Annual Meeting of the Association
for Computational Linguistics and the 4th International Joint
Conference on Natural Language Processing of the Asian Federation of Natural Language Processing. ACL-IJCNLP 2009.
- Fourth Workshop on Statistical Machine Translation (at EACL). WMT 2009.
- North American Chapter of the Association for Computational Linguistics - Human Language Technologies. NAACL HLT 2009.
- 13th Annual Conference of the European Association for Machine Translation. EAMT 2009.
- 12th Conference of the European Chapter of the Association for Computational Linguistics. EACL 2009.
- Student Research Workshop at the 8th Conference of the
Association for Machine Translation in the Americas. AMTA 2008.
- Workshop on human judgements in Computational
Linguistics at the 22nd International Conference on Computational
Linguistics. HJCL'08 at COLING 2008.
- XXIV edición del Congreso Anual de la Sociedad Española para el
Procesamiento del Lenguaje Natural. SEPLN 2008.
- The 11th Conference on Theoretical and Methodological Issues in Machine Translation. TMI 2007.
- 45th Annual Meeting of the Association for Computational
Linguistics. ACL 2007.
- Twenty-Second National Conference on Artificial Intelligence. AAAI 2007.
- International Conference on Empirical Methods in Natural Language Processing. EMNLP 2006.
- Twenty-First National Conference on Artificial Intelligence. AAAI 2006.
- 11th Conference of the European Chapter of the Association for Computational Linguistics. EACL 2006.
- España for Natural Language Processing. ESTAL 2004.
COURSES/CONFERENCES:
- AMTA 2010. Denver, CO, USA. October 31 - November 4, 2010.
- MT Marathon 2010. Le Mans, France. 13-18 September, 2010.
- ACL 2010 Joint Fifth Workshop on Statistical Machine
Translation and Metrics MATR (WMT at ACL 2010). Uppsala, Sweden. July 15-16, 2010.
- 14th Annual Conference of the European Association for Machine
Translation (EAMT 2010). May 27-28, 2010.
- Fourth Workshop on Statistical Machine Translation (WMT at EACL
2009). Athens, Greece. March 30-31, 2009.
- 13th Annual Conference of the European Association for Machine
Translation. (EAMT 2009). Barcelona, Spain. May 14-15, 2009.
- NIST 2008 Metrics MATR Challenge at the Eighth Conference of
the Association for Machine Translation in the Americas (AMTA 2008). Waikiki, Hawai'i,
USA. October 25, 2008.
- "Mixing Approaches to Machine
Translation" workshop (MATMT 2008). Donostia, Spain. February 14, 2008.
- The Third International Joint Conference
on Natural Language Processing (IJCNLP 2008). Hyderabad, India. January 7-12, 2008.
- 45th Annual Meeting of the Association for
Computational Linguistics (ACL 2007). Prague, Czech Republic, June 23-30, 2007.
- 44th Annual Meeting of the Association for
Computational Linguistics (ACL 2006). Sydney, Australia, July
17-21, 2006.
- 5th International Conference on Language Resources
and
Evaluation (LREC'06). Genoa, Italy. May 24-26, 2006.
- International Workshop on Spoken Language
Technology (IWSLT'05). Pittsburgh, PA, USA October 24-25 2005.
- Eurolan Cross-Language Knowledge Induction
Workshop. Cluj-Napoca,
Romania, July 25 - August 5, 2005.
- Eurolan Summer School 2005. Cluj-Napoca,
Romania, July 25 - August 5, 2005.
- ACL Workshop on ``Building and Using Parallel
Texts: Data-Driven Machine Translation and Beyond''. Ann Arbor,
Michigan, USA, June 29-30, 2005.
- Ninth Conference on Computational Natural Language Learning
(CONLL'05). Ann Arbor, Michigan, USA, June 29-30, 2005.
- 43rd Annual Meeting of the Association for
Computational Linguistics (ACL 2005). Ann Arbor, Michigan, USA, June
29-30, 2005.
- 2n Congrés d'Enginyeria en Llengua
Catalana. (CELC'04). Andorra, 2004.
- 2004 Papillon Workshop on Multilingual Lexical
Databases. Grenoble, France, August 30 - September 1, 2004.
- 42nd Annual Meeting of the Association for
Computational Linguistics (ACL 2004). Barcelona, Spain, July 22-24, 2004.
- 2004 Conference on Empirical Methods in Natural
Language Processing (EMNLP 2004). Barcelona, Spain July 25-26, 2004.
- 4th International Conference on Language Resources
and Evaluation (LREC'04). Lisbon, Portugal. May 26-28, 2004.
- NLP Workshop by the IXA Research Group.
Hondarribia, Gipuzkoa, Spain. February 5-6, 2004.
- International Conference on Recent Advances in
Natural Language Processing (RANLP '03). September 10-12, 2003.
Borovets, Bulgaria.
- Statistical
Processing of Natural Language.
Hermann Ney. March 2003. Universidad Politécnica de
Cataluña, Barcelona, Spain.
- Introduction to
Microsoft C#. Microsoft Iberica,
S.R.L. January-February, 2003. Barcelona, Spain.
- Language
Technology Course: Machine Translation.
July 15-19, 2002. Universidad Internacional Menéndez Pelayo,
Barcelona, Spain.
- Business
Integration Methodology. Accenture.
March 2002. Barcelona, Spain.
- 8th ELSNET Summer
School, Text And Speech
Triggered Information Access (TeSTIA'2000). July 15-30, 2000. Chios,
Greece.
- 9th Conference on
Advanced Techniques in Computer
Science. (JETAI'98). March 25-27, 1998. Zaragoza, Spain.
SPECIAL MERITS:
SOFTWARE:
LANGUAGES:
PROGRAMMING SKILLS:
- Programming Languages (Python, Perl, Java, PHP, C / C++)
- Agile test-driven development
(last updated March 23, 2011)