Project Summary

MEANING will be concerned with automatically collecting and analysing language data from the WWW on a large scale, and building more comprehensive multilingual lexical knowledge bases to support improved word sense disambiguation (WSD).

Current web access applications are based on words; MEANING will open the way for access to the Multilingual Web based on concepts, providing applications with capabilities that significantly exceed those currently available. MEANING will facilitate development of concept-based open domain Internet applications such as Question/Answering, Cross Lingual Information Retrieval, Summarisation, Text Categorisation, Event Tracking, Information Extraction and Machine Translation. Furthermore, MEANING will supply a common conceptual structure to Internet documents, thus facilitating knowledge management of web content.

Progress is being made in Human Language Technology (HLT), but there is still a long way to go towards Natural Language Understanding (NLU). An important step towards this goal is the development of technologies and resources that deal with concepts rather than words. MEANING will develop concept-based technologies and resources through large-scale knowledge processing over the web, robust and fast machine learning algorithms, very large lexical resources and novel strategies for combining them. Small-scale, isolated experiments with limited infrastructure (such as Internet access, processing power, and storage space) have no chance of bridging the gap to understanding. Advances in this area can only be expected in the context of large-scale, long-term research projects.

MEANING will treat the web as a (huge) corpus to learn from, since even the largest conventional corpora available (e.g. the Reuters corpus, the British National Corpus) are not large enough to yield reliable, sufficiently detailed information about language behaviour. Moreover, most European languages do not have large or diverse enough corpora available.