research projects

CROSSTEXT: Automatic Generation of Multilingual Semantic Processors

Automatic generation of multilingual semantic taggers
(2017 - 2019)

The lack of hand-curated data is a major impediment to developing statistical semantic processors for many of the world's languages, including the four official languages of Spain. Our project aims to bridge this gap by leveraging existing annotations and semantic processors from multiple source languages, projecting their annotations onto target languages via the statistical word alignments traditionally used in Machine Translation. Furthermore, we will investigate and propose semi- and weakly-supervised techniques to induce robust semantic processors from the (potentially noisy) data automatically generated by the annotation transfer. In addition to addressing a novel and scarcely researched problem in NLP, the semantic processors automatically generated in CrossText could then be deployed by public and private institutions to meet their technology needs.
Organization: Ministerio de Economía, Industria y Competitividad (Explora)
Main researcher: German Rigau
Rodrigo Agerri, Izaskun Aldezabal, Iñaki Alegria, Oier Lopez de Lacalle, German Rigau, Kepa Sarasola, Ruben Urizar
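
As an illustration of the annotation projection mentioned in the project summary, the following is a minimal sketch of how token-level tags might be transferred from a source sentence to a target sentence through word alignments. It is not the CrossText implementation: the function name `project_tags`, the flat tag set (`ORG`, `O`), the alignment format (pairs of source and target token indices) and the example sentence are all assumptions made for this example.

```python
from collections import Counter


def project_tags(source_tags, alignment, target_len, outside="O"):
    """Project token-level tags from a source sentence onto a target sentence.

    source_tags: one tag per source token (hypothetical flat tag set).
    alignment:   (source_index, target_index) pairs, as a word aligner
                 might produce them (assumed format).
    target_len:  number of tokens in the target sentence.
    """
    # Collect one vote per alignment link that carries a non-"outside" tag.
    votes = [Counter() for _ in range(target_len)]
    for src_idx, tgt_idx in alignment:
        tag = source_tags[src_idx]
        if tag != outside:
            votes[tgt_idx][tag] += 1

    # Each target token takes its most frequent projected tag;
    # tokens that receive no votes stay "outside".
    return [v.most_common(1)[0][0] if v else outside for v in votes]


# English "the Basque Government" -> Spanish "el Gobierno Vasco"
source_tags = ["O", "ORG", "ORG"]
alignment = [(0, 0), (1, 2), (2, 1)]  # "Basque" ~ "Vasco", "Government" ~ "Gobierno"
print(project_tags(source_tags, alignment, target_len=3))
# prints ['O', 'ORG', 'ORG']
```

Target tokens with no alignment link default to the outside tag, and alignment errors propagate directly into the projected tags. This is one source of the noise that the semi- and weakly-supervised techniques in the summary are meant to tolerate.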

