Information Extraction and Information Retrieval

The ever-increasing availability of unstructured textual resources on the Web, and their potential use in applications for the automatic acquisition of knowledge, have caused a dramatic rise in research related to Information Extraction (IE) and Information Retrieval (IR). Traditionally, the required textual content was produced through manual annotation by human experts on the task at hand, which is too costly in terms of both economic and human resources. In the last decade, new t...


Demos

Demo of the NewsReader NLP pipeline (English)

Paste in any English text to see the entities, events, and other annotations that are added automatically. The result is represented in the NAF format.

Demo of the NewsReader NLP pipeline (Spanish)

Paste in any Spanish text to see the entities and other annotations that are added automatically. The result is represented in the NAF format.

Eihera

Basque named entity recognizer/classifier

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Contracts

Projects

Patents

EUSLEM

Lemmatizer for Basque

UKB

Word sense disambiguation and similarity.

KYBOT

Knowledge Yielding Robot

Resources

  • EIEC
    Basque Named Entity Recognition corpus.
  • EDIEC
    Basque corpus annotated for Named Entity Disambiguation.
  • MCR: Multilingual Central Repository
    Multilingual lexical database with wordnets for several European languages, including Basque.
  • EPEC-EuSemcor
    Corpus tagged with Basque WordNet senses.

Publications

Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre

Word n-gram attention models for sentence similarity and inference (2019)

Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)

Data in Brief. DOI: https://doi.org/10.1016/j.dib.2019.104432

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)

Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665. DOI: https://doi.org/10.1016/j.engappai.2019.07.010

Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre

A comparison of representation models in a non-conventional semantic similarity scenario (2019)

Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.

Mark Stevenson, Eneko Agirre

Word Sense Disambiguation (2018)

The Oxford Handbook of Computational Linguistics, 2nd edition. Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28

Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Building Named Entity Recognition Taggers via Parallel Corpora (2018)

In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.

Ander Barrena, Aitor Soroa, Eneko Agirre

Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)

The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2018)

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2018)

Artificial Intelligence, 268 (2018) 85-95

Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau

Multi-lingual and Cross-lingual timeline extraction (2017)

Knowledge-Based Systems, 133, 77-89

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Rodrigo Agerri, German Rigau

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (2016)

Artificial Intelligence, 238 (2016) pages 63-82. http://dx.doi.org/10.1016/j.artint.2016.05.003

