
PhD Thesis

Announcements of the last step towards the PhD




Exploiting lexical information and discriminative alignment training in statistical machine translation

PhD Candidate: Patrik G. Lambert

Advisor: Dr. Rafael E. Banchs Martínez

Tutor: Dr. Núria Castell Ariño

Summary: The thesis work mainly focused on three aspects of statistical machine translation: the use of lexical information such as basic lexical models and multi-word expressions; minimum error training strategies; and word alignment models. These aspects were addressed within the n-gram-based machine translation framework. In this approach, the joint translation probability is modelled via a log-linear combination of a bilingual n-gram model and additional feature functions.
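
For readers less familiar with this framework, the log-linear combination mentioned above is usually written as follows. The notation is an assumption made for illustration and is not taken from the announcement: s is the source sentence, t a candidate translation, h_m the feature functions with weights lambda_m, and (s,t)_k the k-th bilingual unit in a sequence of K units, with N the n-gram order. Here h_1 is taken to be the bilingual n-gram model.

```latex
\hat{t} = \arg\max_{t} \sum_{m=1}^{M} \lambda_m \, h_m(s, t),
\qquad
h_1(s, t) = \log \prod_{k=1}^{K}
  p\bigl( (s,t)_k \mid (s,t)_{k-N+1}, \ldots, (s,t)_{k-1} \bigr)
```

The weights lambda_m are the parameters that minimum error training adjusts.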

First, a thorough study of word alignment evaluation is carried out. We stress the impact on the scores of the way alignment test data are built, and we give guidelines for manual alignment. The n-gram-based machine translation system is then described. After this, we evaluate the impact on alignment quality of linguistic classifications such as lemmatising, stemming or verb classification. Although these transformations have a large positive impact on word alignment, we report that this improvement has no effect on translation quality. We also examine the impact on word alignment quality and translation accuracy of grouping data-inferred multi-word expressions before alignment.

Another objective of this thesis was the improvement of minimum error training strategies. Two research lines were considered: the choice of the metric used as objective function and the improvement of the optimisation algorithm itself. In the first research line, parameters were successfully tuned with respect to the Queen score of the Qarla framework, a framework which combines different metrics with a stable and robust criterion. In the second line, the Simultaneous Perturbation Stochastic Approximation algorithm and the downhill simplex method were compared for this parameter optimisation task.
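
As a rough illustration of the Simultaneous Perturbation Stochastic Approximation (SPSA) idea, the sketch below minimises a toy function in Python. It is not the thesis implementation: `toy_loss` is a hypothetical stand-in for an error measure computed from translation output, the target weights are invented, and the gain constants are commonly recommended defaults rather than values from the thesis.

```python
import random


def spsa_minimize(loss, theta, iterations=2000, a=0.2, c=0.1,
                  A=100.0, alpha=0.602, gamma=0.101, seed=0):
    """Minimise `loss` with SPSA, starting from the weight vector `theta`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha  # decaying step size
        c_k = c / (k + 1) ** gamma      # decaying perturbation size
        # Perturb every coordinate at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss evaluations yield a gradient estimate for all coordinates.
        diff = (loss(plus) - loss(minus)) / (2.0 * c_k)
        theta = [t - a_k * diff / d for t, d in zip(theta, delta)]
    return theta


def toy_loss(weights):
    # Hypothetical objective with its minimum at (0.5, -1.0, 2.0).
    target = (0.5, -1.0, 2.0)
    return sum((w - t) ** 2 for w, t in zip(weights, target))


tuned = spsa_minimize(toy_loss, [0.0, 0.0, 0.0])
```

Each iteration needs only two evaluations of the objective regardless of the number of weights, which is the property that makes this family of methods attractive when every evaluation is expensive.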

Finally, we propose a novel framework for discriminative training of alignment models with automated translation metrics as the maximisation criterion. In order to evaluate this framework, we implemented an alignment system based on discriminative models adapted to the n-gram-based translation system, and we observed a clear improvement in automated translation scores on small corpora. We extended this framework to large corpora, tuning the alignment system parameters on a small part of the corpus and using them to align the whole corpus. The resulting parameters produced machine translation systems at least as good as those built with standard word alignment tools, but in a more flexible way and with lower computational resource requirements.

Date: 25th of April

Time: 11h

Place: Aula Teleensenyament, building B3,
           Campus Nord.

Press Contact
ilapuente@lsi.upc.edu
 
Last modified: April 2008
© UPC. Technical University of Catalonia
Departament de Llenguatges i Sistemes Informàtics