Algorithms for understanding humans

Lluís Màrquez is a researcher in the GRPLN research group. He has spent over fifteen years teaching machines to understand and to talk in human languages. Nowadays his research focuses on solving semantic processing problems and on applying natural language processing to machine translation.




In this issue of the Newsletter we have interviewed the founder of the Research Group in Natural Language Processing (GRPLN), Dr. Horacio Rodríguez. Now, in this third article, we talk to one of his former PhD students, Dr. Lluís Màrquez, now a professor at UPC and a researcher in the same group.

Lluís Màrquez
Tell us about your career path
I studied Computer Science at the Informatics School of Barcelona (FIB). I then went on to do my PhD under the supervision of Horacio Rodríguez, and I have been lecturing here at UPC since 1993.

What was your PhD Thesis about?
I studied topics related to machine learning. The main theme of my research was a part-of-speech tagger. Natural language has words that can play different grammatical roles within a sentence, and these different roles lead to different meanings, so it is essential to distinguish between them.
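The kind of ambiguity a part-of-speech tagger has to resolve can be sketched with a small statistical model. The example below is only an illustration under stated assumptions: it uses a bigram hidden Markov model with Viterbi decoding, which is a standard textbook technique and not necessarily the method used in the thesis, and the tiny tagged corpus, the tag names and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical hand-tagged toy corpus; "book" appears as NOUN and as VERB.
CORPUS = [
    [("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("good", "ADJ")],
    [("i", "PRON"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("they", "PRON"), ("book", "VERB"), ("the", "DET"), ("room", "NOUN")],
    [("the", "DET"), ("flight", "NOUN"), ("is", "VERB"), ("late", "ADJ")],
]


def train(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans, emit, context, tag_count = Counter(), Counter(), Counter(), Counter()
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            trans[(prev, tag)] += 1
            context[prev] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            prev = tag
    vocab = {word for sentence in corpus for word, _ in sentence}
    return trans, emit, context, tag_count, vocab


def tag_sentence(words, model):
    """Return the most probable tag sequence under the bigram HMM (Viterbi)."""
    trans, emit, context, tag_count, vocab = model
    tags = sorted(tag_count)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words

    def log_t(prev, tag):  # add-one smoothed transition log-probability
        return math.log((trans[(prev, tag)] + 1) / (context[prev] + n_tags))

    def log_e(tag, word):  # add-one smoothed emission log-probability
        return math.log((emit[(tag, word)] + 1) / (tag_count[tag] + n_words))

    # best[tag] = (score, path) of the best tag sequence ending in `tag`
    best = {t: (log_t("<s>", t) + log_e(t, words[0]), [t]) for t in tags}
    for word in words[1:]:
        step = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + log_t(p, t), best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            step[t] = (score + log_e(t, word), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


MODEL = train(CORPUS)
print(tag_sentence("i book the flight".split(), MODEL))
print(tag_sentence("the book is late".split(), MODEL))
```

With this toy data the word "book" is tagged as a verb in the first sentence and as a noun in the second, because the surrounding tags change which reading is more probable.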

How would you classify your research?
Our research group, GRPLN, belonged to the Artificial Intelligence section of the Software Department (LSI) at UPC. The department no longer has research sections, so there is no clearly defined Artificial Intelligence section; however, natural language processing is obviously a branch of Artificial Intelligence.

We develop computational models for language processing. In particular, we do research on the understanding of written language and its applications, such as machine translation.

In fact, Natural Language is a speciality within the Artificial Intelligence master's and PhD studies. There are two optional subjects:

  • Natural Language Processing for the massive treatment of textual information.
  • Natural Language Processing for machine/human communication.

I coordinate the first of these two subjects.

We are also part of a larger interdepartmental group, the Centre de Tecnologies i Aplicacions del Llenguatge i la Parla (TALP). TALP is formed by researchers from our department and from the Departament de Teoria del senyal i la Comunicació at UPC. Our group focuses more on the textual side of language: we work on language processing, understanding, reasoning about content and solving a variety of problems. The telecommunication engineers at TALP, on the other hand, work on the acoustic processing of the signal. Acoustics comes in twice: in speech recognition, i.e., converting an acoustic signal into text, and in speech generation, i.e., converting text into an acoustic signal (synthesized speech). Our job is to deal with the process in between, and we apply it in telephone company applications, open question services, information services, etc.

How do you apply Artificial Intelligence to your research?
I work with statistical machine learning. As you know, in our department there are other groups who also use Machine Learning in their research. What makes us different from the others is that we apply it to language issues.

For example, if we have a certain translation problem and a good bilingual corpus where we can find many examples of sentences in both languages, we can develop a general algorithm which is able to learn, from examples, the necessary knowledge to translate texts.
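As a rough illustration of learning translation knowledge from examples, the sketch below estimates word translation probabilities from a tiny bilingual corpus with expectation-maximisation, in the style of IBM Model 1. This is a classic textbook technique chosen here for illustration, not necessarily the algorithm the group uses; the three sentence pairs and the function names are hypothetical, and the NULL source word is left out for brevity.

```python
from collections import defaultdict

# Hypothetical toy bilingual corpus: (English, Spanish) sentence pairs.
CORPUS = [
    ("the house", "la casa"),
    ("the table", "la mesa"),
    ("a table", "una mesa"),
]


def train_ibm1(corpus, iterations=20):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM."""
    pairs = [(e.split(), f.split()) for e, f in corpus]
    target_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform distribution

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        # E-step: share each target word among the source words of its pair
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


TABLE = train_ibm1(CORPUS)
for word in ["the", "house", "table", "a"]:
    print(word, "->", best_translation(TABLE, word))
```

No dictionary is given to the program: the pairing of "house" with "casa" emerges only because "la" is better explained by "the", which co-occurs with it in two sentence pairs.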

Language seems to be something very complex. How did you start your research?
Linguistic analysis involves several important factors, each of them quite complex on its own. This is why text processing is divided into stages of increasing difficulty: segmentation, morphology, syntax, semantics, etc.

At a basic level, it is important to break text into words and to handle their morphology. The difficulty of this task depends on the language we are dealing with. The morphological analysis of highly inflected and agglutinative languages can be very hard. Written Arabic, for example, normally omits vowels; it is therefore highly ambiguous and a big challenge. By the way, although we mostly work with Catalan, Spanish and English, we also work with other languages such as Arabic and Chinese. A basic characteristic of Machine Learning is that we pursue and obtain general theories that are almost independent of the language, so we can apply our methods to a variety of languages.
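The ambiguity that vowel-less writing creates can be illustrated with a toy analogy in English. The sketch below is an assumption-laden illustration, not a model of Arabic orthography: it simply strips the vowels from a hypothetical word list and groups the words that become indistinguishable.

```python
from collections import defaultdict

VOWELS = set("aeiou")

# Hypothetical English word list used only for the analogy.
WORDS = ["bad", "bed", "bid", "bud", "cat", "cot", "cut", "dog"]


def strip_vowels(word):
    """Return the word with its vowels removed."""
    return "".join(ch for ch in word if ch not in VOWELS)


def ambiguity_classes(words):
    """Group words that share the same vowel-less written form."""
    groups = defaultdict(set)
    for word in words:
        groups[strip_vowels(word)].add(word)
    # Keep only the written forms that stand for more than one word
    return {form: sorted(ws) for form, ws in groups.items() if len(ws) > 1}


for form, candidates in ambiguity_classes(WORDS).items():
    print(form, "->", candidates)
```

A reader, or a program, seeing only the consonant skeleton has to use the surrounding context to decide which word was meant.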

Syntax, on the other hand, tells us how to group words into structures. This knowledge is useful for interpreting texts correctly.

There are two branches of semantics related to text interpretation. Lexical semantics studies the meaning of words. Propositional semantics studies the meaning of the predicates of sentences.

Finally, pragmatics studies meaning by taking discourse properties and world knowledge into account.

We build tools to solve problems that can be of help in any of these fields.

How do you organise yourselves as a research group?
In our group you can find people who devote their time to any of the following areas and applications:

  • Machine Learning
  • Information extraction
  • Question-answering Systems
  • Machine Translation
  • Summarization
  • Machine-person dialogue Systems
  • Development of basic linguistic processing tools.

In addition, our research group has three specialists in linguistics, and we collaborate with other research groups in computational linguistics.

Which of these specialities do you feel is most your own?
I have always worked by applying Machine Learning. I develop tools to solve semantic processing problems and work on applications of Machine Learning to Machine Translation.

In the translation field we are trying to create a hybrid system that combines Statistical Machine Translation, rule-based Machine Translation and example-based Machine Translation. Another of our objectives is to improve Statistical Machine Translation by introducing high-level linguistic information.

Have you been successful?
Well, there are international evaluation competitions in which the organisers propose a Natural Language Processing problem, with all the necessary data and a strictly experimental setting. Participating groups have some months to work on different systems and solutions. Finally, there is a conference where all the results are shared and the best ones receive a prize. We have participated many times and have obtained very good results. The "Shared Tasks" of the "Conference on Computational Natural Language Learning" (CoNLL) are among the most representative NLP competitions; we have participated in them and have even been involved in their organisation. These competitions, which started in 1999, are organised by SIGNLL (Special Interest Group on Natural Language Learning), a SIG of the ACL (Association for Computational Linguistics).

In my opinion the field has a long and very promising future. Nowadays, more and more researchers are passionate about Natural Language and its applications. Although there is still a long way to go, we are moving forward in Natural Language research. Our group is made up of 33 people: 14 professors, 3 researchers, 12 PhD students, 2 master's students and 2 developers, who participate in many national and European research projects.

I believe that our group is becoming a leading group in the area of Natural Language Processing Research.

Press Contact:
ilapuente@lsi.upc.edu

Last modified: May 2008
© UPC. Technical University of Catalonia
Departament de Llenguatges i Sistemes Informàtics