Information Extraction and Information Retrieval

The ever-increasing availability of unstructured textual resources on the Web, and their potential use in applications for automatic knowledge acquisition, has caused a dramatic rise in research on Information Extraction (IE) and Information Retrieval (IR). Traditionally, the required textual content was produced through manual annotation by human experts on the task at hand, which is too costly in both economic and human resources. In the last decade, new t...



Demos

Demo of the NewsReader NLP pipeline (English)

Paste in any English text to see which entities, events, and other annotations are added automatically. The result is represented in NAF format.

Demo of the NewsReader NLP pipeline (Spanish)

Paste in any Spanish text to see which entities and other annotations are added automatically. The result is represented in NAF format.

Eihera

Basque named entity recognizer and classifier

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Contracts

Publications

Language independent sequence labelling for Opinion Target Extraction (2019)

Rodrigo Agerri, German Rigau

Artificial Intelligence, 268 (2019) 85-95

Word n-gram attention models for sentence similarity and inference (2019)

Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre

Expert Systems with Applications, Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054. Pre-print available at arXiv:1612.04868

Building Named Entity Recognition Taggers via Parallel Corpora (2018)

Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.

Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)

Ander Barrena, Aitor Soroa, Eneko Agirre

The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2018)

Word Sense Disambiguation (2018)

Mark Stevenson, Eneko Agirre

The Oxford Handbook of Computational Linguistics, 2nd edition. Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28

Multi-lingual and Cross-lingual timeline extraction (2017)

Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau

Knowledge-Based Systems, 133, 77-89

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (2016)

Rodrigo Agerri, German Rigau

Artificial Intelligence, 238 (2016) pages 63-82. http://dx.doi.org/10.1016/j.artint.2016.05.003


Projects

Patents

EUSLEM

EUSLEM: lemmatizer for Basque

UKB

Word sense disambiguation and similarity.

KYBOT

Knowledge Yielding Robot

Resources

  • EIEC
    Basque Named Entity Recognition corpus.
  • EDIEC
    Basque corpus annotated for Named Entity Disambiguation.
  • MCR: Multilingual Central Repository
    Multilingual lexical database with wordnets for several European languages, including Basque.
  • EPEC-EuSemcor
    Corpus tagged with Basque WordNet senses.
