Information Extraction and Information Retrieval

The ever-increasing availability of unstructured textual resources on the Web, and their potential for use in applications that acquire knowledge automatically, have caused a dramatic rise in research on Information Extraction (IE) and Information Retrieval (IR). Traditionally, the required textual content was produced through manual annotation by human experts on the task at hand, which is too costly in both economic and human resources. In the last decade, new t...


Demos

Demo of the NewsReader NLP pipeline

 

Paste in any English text to see which entities, events, and other annotations are added automatically. The result is represented in the NAF format.

Demo of the NewsReader NLP pipeline

 

Paste in any Spanish text to see which entities and other annotations are added automatically. The result is represented in the NAF format.

 

Eihera

Basque named entities recognizer/classifier

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Contracts



  • (2024 - 2025)


  • (2023 - 2024)
  • Generative artificial intelligence webinar (Adimen artifizial sortzailea web mintegia).
    Online course for Gipuzkoa Provincial Council employees
    (2024 - 2024)

  • Data Privacy in Artificial Intelligence for Health Applications: A QA system to extract specific information from medical reports that can be used for better decision making
    (2020 - 2021)

  • Pre-training cross-lingual language models
    (2020 - 2020)


  • (2019 - 2020)
All HiTZ projects.

Projects

Patents

EUSLEM

EUSLEM: lemmatizer for Basque

UKB

Word sense disambiguation and similarity.

KYBOT

Knowledge Yielding Robot

Resources

  • EIEC
    Basque Named Entity Recognition corpus.
  • EDIEC
    Basque corpus annotated for Named Entity Disambiguation.
  • MCR: Multilingual Central Repository
    Multilingual lexical database with wordnets for several European languages, including Basque.
  • EPEC-EuSemcor
    Corpus tagged with Basque WordNet senses.

Publications

Olia Toporkov, Rodrigo Agerri

On the Role of Morphological Information for Contextual Lemmatization (2024)

Computational Linguistics (MIT Press).

Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)

The Twelfth International Conference on Learning Representations

Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre

Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Turin, Italy

Eneko Agirre, Itziar Aldabe, Xabier Arregi, Mikel Artetxe, Unai Atutxa, Ekhi Azurmendi, Iker De la Iglesia, Julen Etxaniz, Victor García-Romillo, Inma Hernaez-Rioja, Asier Herranz, Mikel Iruskieta, Oier López de Lacalle, Eva Navas, Paula Ontalvilla, Aitor Ormazabal, Naiara Perez, German Rigau, Oscar Sainz, Jon Sanchez, Ibon Saratxaga, Aitor Soroa, Christoforos Souganidis, Jon Vadillo and Aimar Zabala

IKER-GAITU: research on language technology for Basque and other low-resource languages (2024)

-

Eneko Agirre, Olatz Arbelaitz, Olatz Arregi, Gorka Azkune, Arantza Casillas, Inma Hernaez, Mikel Iruskieta, Elena Lazkano, Eva Navas, German Rigau, Roberto Santana, Aitor Soroa and Rabih Zbib

ENIA Chair in Artificial Intelligence and Language Technology (2024)

-

Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.

A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)

Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand

Ahmed Elhady, Khaled Elsayed, Eneko Agirre, and Mikel Artetxe

Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 755–761, Mexico City, Mexico. Association for Computational Linguistics.

Iñigo Alonso, Eneko Agirre, Mirella Lapata

PixT3: Pixel-based Table To Text generation (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)

Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea

EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)

Procesamiento del Lenguaje Natural, Revista no 73, septiembre de 2024, pp. 125-137

Itziar Gonzalez-Dios, Javier Alvez, and German Rigau

Exploiting Metonymy from Available Knowledge Resources. (2023)

20th International Conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, Revised Selected Papers, Part I. Lecture Notes in Computer Science book series (LNCS, volume 13451), pp 34-43

Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

Image captioning for effective use of language models in knowledge-based visual question answering (2023)

Expert Systems with Applications, 2023, vol. 212, p. 118669. Preprint: https://arxiv.org/abs/2109.08029

Nayla Escribano, German Rigau, Rodrigo Agerri

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Nayla Escribano, German Rigau, Rodrigo Agerri, A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods, Knowledge-Based Systems, Volume 273, 2023, 110612, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2023.110612. (https://www.sciencedirect.com/science/article/pii/S0950705123003623) Abstract: Detecting and normalizing temporal expressions is an essential step for many NLP tasks. While a variety of methods have been proposed for detection, best normalization approaches rely on hand-crafted rules. Furthermore, most of them have been designed only for English. In this paper we present a modular multilingual temporal processing system combining a fine-tuned Masked Language Model for detection, and a grammar-based normalizer. We experiment in Spanish and English and compare with HeidelTime, the state-of-the-art in multilingual temporal processing. We obtain best results in gold timex normalization, timex detection and type recognition, and competitive performance in the combined TempEval-3 relaxed value metric. A detailed error analysis shows that detecting only those timexes for which it is feasible to provide a normalization is highly beneficial in this last metric. This raises the question of which is the best strategy for timex processing, namely, leaving undetected those timexes for which is not easy to provide normalization rules or aiming for high coverage. Keywords: Temporal processing; Multilingualism; Sequence labeling; Grammar-based approaches; Deep learning; Natural language processing

Murali Kondragunta, Olatz Perez-de-Viñaspre, Maite Oronoz

Improving and Simplifying Template-Based Named Entity Recognition (2023)

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 79–86, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics.

Rodrigo Agerri, Eneko Agirre

Lessons learned from the evaluation of Spanish Language Models (2023)

Procesamiento del Lenguaje Natural (70), pp 157-170

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Scaling Laws for BERT in Low-Resource Settings (2023)

Findings of the Association for Computational Linguistics: ACL 2023

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

ACM Computing Surveys. 27 June 2023

Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison

Identifying Token-Level Dialectal Features in Social Media (2023)

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

Procesamiento del Lenguaje Natural. v. 71, p. 165-177, sep. 2023

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Iñigo Alonso, Eneko Agirre

Automatic Logical Forms improve fidelity in Table-to-Text generation (2023)

Expert Systems with Applications, Volume 238, Part D, 15 March 2024, 121869 https://arxiv.org/abs/2310.17279

Begoña Altuna, Rodrigo Agerri, Lidia Salas-Espejo, José Javier Saiz, Roberto Zanoli, Manuela Speranza, Bernardo Magnini, Alberto Lavelli, Goutham Karunakaran

Overview of TESTLINK at IberLEF 2023: Linking Results to Clinical Laboratory Tests and Measurements (2023)

Procesamiento del Lenguaje Natural, Revista nº 71, 313-320, septiembre de 2023.

Begoña Altuna, Goutham Karunakaran, Alberto Lavelli, Bernardo Magnini, Manuela Speranza, Roberto Zanoli

CLinkaRT at EVALITA 2023: Overview of the Task on Linking a Lab Result to its Test Event in the Clinical Domain (2023)

Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma 2023.

Roberto Centeno, Rodrigo Agerri

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

Roberto Centeno and Rodrigo Agerri (2023). Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation. In Proceedings of the Workshop on NLP applied to Misinformation, co-located with the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023).

Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova (2023). HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine. In SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing.

Joseba Fernandez de Landa, Rodrigo Agerri

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. (2023)

Joseba Fernandez de Landa, Rodrigo Agerri (2023). HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 2023.

Arantxa Otegi, Iñaki San Vicente, Xabier Saralegi, Anselmo Peñas, Borja Lozano, Eneko Agirre

Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)

Knowledge-Based Systems, Volume 240.

Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre

Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (2022)

In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, Washington. Association for Computational Linguistics.

Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min

ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (2022)

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, Seattle, Washington. Association for Computational Linguistics.

Eneko Agirre

Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

E Agirre, M Apidianaki, I Vulić

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. Association for Computational Linguistics, Dublin, Ireland

David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal

Direct Parsing to Sentiment Graphs (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.

Iker Garcia-Ferrero, Rodrigo Agerri, German Rigau

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)

Findings of the Association for Computational Linguistics: EMNLP 2022

Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

SemEval 2022 Task 10: Structured Sentiment Analysis (2022)

In SemEval 2022

Blanca Calvo Figueras, Montse Cuadros, Rodrigo Agerri

A Semantics-Aware Approach to Automated Claim Verification (2022)

In Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER), pages 37–48, Dublin, Ireland. Association for Computational Linguistics

Cristina Aceta, Johan Kildal, Izaskun Fernández, Aitor Soroa

Towards an optimal design of natural human interaction mechanisms for a service robot with ancillary way-finding capabilities in industrial environments (2021)

Production & Manufacturing Research, 9:1, 1-32

Ainhoa Serna, Aitor Soroa, Rodrigo Agerri

Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport (2021)

Sustainability 13, no. 4: 2397.

Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre

Inferring spatial relations from textual descriptions of images (2021)

Pattern Recognition, Volume 113, 107847. Pre-print: https://arxiv.org/abs/2102.00997

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2021)

In conjunction with NAACL. Association for Computational Linguistics

Elena Zotova, Rodrigo Agerri, German Rigau

Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)

Expert Systems with Applications, 170 (2021).

Jon Alkorta

Hacia el análisis de sentimientos en euskera (2021)

J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.

Joseba Fernandez de Landa, Iker García, Ander Salaberria, Jon Ander Campos

Twitterreko Euskal Komunitatearen Eduki Azterketa Pandemia Garaian (2021)

IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura

Ander Barrena, Aitor Soroa, Eneko Agirre

Towards Zero-Shot Cross-Lingual Named Entity Disambiguation (2021)

Expert Systems With Applications ESWA 2021

Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre

Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction (2021)

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernández de Landa, Álvaro Rodrigo

VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)

Procesamiento del Lenguaje Natural, 67, pp 173-181

Iker García-Ferrero, Rodrigo Agerri, German Rigau

Benchmarking Meta-embeddings: What Works and What Does Not (2021)

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021

Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Multilingual Counter Narrative Type Classification (2021)

Proceedings of Argument Mining 2021

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: European Clinical Case Corpus (2021)

Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf

Eneko Agirre

Cross-Lingual Word Embeddings (Book Review) (2020)

Computational Linguistics 46 (1), 245-248. (https://doi.org/10.1162/COLI_r_00372)

Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune and Eneko Agirre

Evaluating Multimodal Representations on Visual Semantic Textual Similarity (2020)

Proceedings of the Twenty-third European Conference on Artificial Intelligence, ECAI 2020, June 8-12, 2020, Santiago de Compostela, Spain

Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar

Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction (2020)

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC. Also available at arxiv https://arxiv.org/pdf/2004.00033.pdf

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2020)

International Joint Conference on Artificial Intelligence (IJCAI 2020)

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning (2020)

Frontiers in Artificial Intelligence and Applications. Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, Michela Milano, Senén Barro, Alberto Bugarín, Jérôme Lang (eds.). Volume 325: ECAI 2020. Pages 585 - 592. IOS Press Ebooks

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana Garcia-Serrano, Mohamed Ben Aouicha, Eneko Agirre, David Sánchez

A large reproducible benchmark of ontology-based methods and word embeddings for word similarity (2020)

Information Systems. Online first.

Iker de la Iglesia, Mikel Martinez-Puente, Alexander Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola

MEDIA team at the CLEF-2020 Multilingual Information Extraction Task (2020)

Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum Thessaloniki, Greece, September 22-25, 2020.

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2020)

In conjunction with EMNLP. Association for Computational Linguistics

Rodrigo Agerri, German Rigau

Projecting Heterogeneous Annotations for Named Entity Recognition (2020)

In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Winner of the CAPITEL@IberLEF task on Spanish NER.

María Espinosa, Rodrigo Agerri, Roberto Centeno, Alvaro Rodrigo

DeepReading@SardiStance: Combining Textual, Social and Emotional Features. (2020)

Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). Winners of the SardiStance@Evalita 2020 shared task.

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2019)

Artificial Intelligence, 268 (2019) 85-95

Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre

Word n-gram attention models for sentence similarity and inference (2019)

Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)

Data in Brief, Volume 26.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)

Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665.

Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre

A comparison of representation models in a non-conventional semantic similarity scenario (2019)

Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.

Rodrigo Agerri

Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features (2019)

SemEval@NAACL-HLT 2019: 944-948. https://www.aclweb.org/anthology/S19-2161.pdf

Joseba Fernandez de Landa, Rodrigo Agerri, Iñaki Alegria

Euskaldun gazte eta helduen harremanak Twitterren (2019)

III. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Gizarte Zientziak eta Zuzenbidea. 2, pp. 83 - 90

Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau

Automatic white-box testing of first-order logic ontologies (2019)

Journal of Logic and Computation, Volume 29, Issue 5, September 2019, Pages 723–751

Alvez, J.; Lucio, P.; Rigau, G.

A Framework for the Evaluation of SUMO-Based Ontologies Using WordNet (2019)

IEEE Access, 7, 36075-36093. 2019

Mark Stevenson, Eneko Agirre

Word Sense Disambiguation (2018)

The Oxford Handbook of Computational Linguistics 2nd edition (2 ed.) Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28

Josu Goikoetxea, Aitor Soroa and Eneko Agirre

Knowledge-Based Systems (KNOSYS). Volume 150, 15 June 2018, Pages 218-230. ISSN: 0950-7051. DOI https://doi.org/10.1016/j.knosys.2018.03.017 Preprint at https://arxiv.org/pdf/1804.08316.pdf

Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Building Named Entity Recognition Taggers via Parallel Corpora (2018)

In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.

Ander Barrena, Aitor Soroa, Eneko Agirre

Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)

The SIGNLL Conference on Computational Natural Language Learning CONLL 2018

Rodrigo Agerri, German Rigau

Simple Language Independent Sequence Labelling for the Annotation of Disabilities in Medical Texts (2018)

Proceedings of the Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), Diann Track, Sevilla, Spain.

Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau

Multi-lingual and Cross-lingual timeline extraction (2017)

Knowledge-Based Systems, 133, 77-89

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Goikoetxea J., Agirre E., Soroa A.

Single or Multiple. Combining Word Representations Independently Learned from Text and WordNet (2016)

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. pp. 2608-2614. ISBN: 978-1-57735-760-5. Phoenix (USA).

Rodrigo Agerri, German Rigau

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (2016)

Artificial Intelligence, 238 (2016) pages 63-82. http://dx.doi.org/10.1016/j.artint.2016.05.003

Hugo Manguinhas, Nuno Freire, Antoine Isaac, Juliane Stiller, Valentine Charles, Aitor Soroa, Rainer Simon, Vladimir Alexiev

Exploring Comparative Evaluation of Semantic Enrichment Tools for Cultural Heritage Metadata (2016)

Proceedings of the 20th International Conference on Theory and Practice of Digital Libraries, TPDL 2016, Vol 9818, pp 266-278

Ander Intxaurrondo, Eneko Agirre, Oier Lopez de Lacalle, Mihai Surdeanu

Diamonds in the Rough: Event Extraction from Imperfect Microblog Data (2015)

Proceedings of the North American chapter of the Association for Computational Linguistics (NAACL HLT), pages: 641-650. ISBN: 978-1-941643-49-5.

Goikoetxea J., Agirre E., Soroa A.

Random Walks and Neural Network Language Models on Knowledge Bases (2015)

Proceedings of the Annual Meeting of the North American chapter of the Association of Computational Linguistics (NAACL HLT 2015), pages 1434-1439. ISBN: 978-1-937284-73-2. Denver (USA).

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan (2013)

Euskalingua aldizkari digitala, 23, 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf

Mark Hall, Eneko Agirre, Nikolas Aletras, Runar Bergheim, Kostas Chandrinos, Paul Clough, Samuel Fernando, Kate Fernie, Paula Goodale, Jill Griffiths, Oier Lopez de Lacalle, Andrea de Polo, Aitor Soroa, Mark Stevenson

PATHS - Exploring Digital Cultural Heritage Spaces (2012)

Theory and Practice of Digital Libraries 2012. ISBN 9783642332906 ISSN 0302-9743

Arantxa Otegi

Hedapena informazioaren berreskurapenean: hitzen adiera-desanbiguazioaren eta antzekotasun semantikoaren ekarpenak (2012)

Lengoaia eta Sistema Informatikoak Saila, EHU/UPV. Informatika Fakultatea. 2012/03/16

Iñaki Alegria, Bertol Arrieta, Arantza Diaz de Ilarraza, Elixabete Izagirre, Montse Maritxalar

Using Machine Learning Techniques to Build a Comma Checker for Basque (2006)

Proceedings of Coling-ACL 2006. Sydney, Australia. ISBN: 1-932432-69-8. pp. 1-8. https://aclanthology.org/P06-4000/

A. Casillas, V. Fresno, R. Martínez, S. Montalvo

Evaluación del clustering de páginas web mediante funciones de peso y combinación heurística de criterios (2005)

Revista Española para el Procesamiento del Lenguaje Natural, 35, 417-424. https://1library.co/document/yn4mkjpz-evaluacion-clustering-paginas-mediante-funciones-combinacion-heuristica-criterios.html

All HiTZ publications

ie_ir_tabs_full

Demo of the NewsReader NLP pipeline

 

Just copy in any English text and see what entities and events and other annotations are added automatically. The result is represented in the NAF format.

Demo of the NewsReader NLP pipeline

 

Just copy in any Spanish text and see what entities and other annotations are added automatically. The result is represented in the NAF format

 

Eihera

Basque named entities recognizer/classifier

Eustagger

Basque lemmatizer and morphosyntactic analyzer



  • (2024 - 2025)


  • (2023 - 2024)
  • Adimen artifizial sortzailea web mintegia (webinar).
    Online course for Gipuzkoa Provincial Council employees
    (2024 - 2024)

  • Data Privacy in Artificial Intelligence for Health Applications: A QA system to extract specific information from medical reports that can be used for better decision making
    (2020 - 2021)

  • Pre-training cross-lingual language models
    (2020 - 2020)


  • (2019 - 2020)
All HiTZ projects.

EUSLEM

EUSLEM: lemmatizer for Basque

UKB

Word sense disambiguation and similarity.

KYBOT

Knowledge Yielding Robot

  • EIEC
    Basque Named Entity Recognition corpus.
  • EDIEC
    Basque corpus annotated for Named Entity Disambiguation.
  • MCR: Multilingual Central Repository
    Multilingual lexical database with wordnets for several European languages, including Basque.
  • EPEC-EuSemcor
    Corpus tagged with Basque WordNet senses.

Olia Toporkov, Rodrigo Agerri

On the Role of Morphological Information for Contextual Lemmatization (2024)

Computational Linguistics (MIT Press).

Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)

The Twelfth International Conference on Learning Representations

Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre

Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (2024)

Proceeding of The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Turin, Italy

Eneko Agirre, Itziar Aldabe, Xabier Arregi, Mikel Artetxe, Unai Atutxa, Ekhi Azurmendi, Iker De la Iglesia, Julen Etxaniz, Victor García-Romillo, Inma Hernaez-Rioja, Asier Herranz, Mikel Iruskieta, Oier López de Lacalle, Eva Navas, Paula Ontalvilla, Aitor Ormazabal, Naiara Perez, German Rigau1 Oscar Sainz, Jon Sanchez, Ibon Saratxaga, Aitor Soroa, Christoforos Souganidis, Jon Vadillo and Aimar Zabala

IKER-GAITU: research on language technology for Basque and other low-resource languages (2024)

-

Eneko Agirre, Olatz Arbelaitz, Olatz Arregi, Gorka Azkune, Arantza Casillas, Inma Hernaez, Mikel Iruskieta, Elena Lazkano, Eva Navas, German Rigau, Roberto Santana, Aitor Soroa and Rabih Zbib

ENIA Chair in Artificial Intelligence and Language Technology (2024)

-

Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.

A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)

Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand

Ahmed Elhady, Khaled Elsayed, Eneko Agirre, and Mikel Artetxe

Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 755–761, Mexico City, Mexico. Association for Computational Linguistics.

Iñigo Alonso, Eneko Agirre, Mirella Lapata

PixT3: Pixel-based Table To Text generation (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)

Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea

EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)

Procesamiento del Lenguaje Natural, Revista nº 73, septiembre de 2024, pp. 125-137

Itziar Gonzalez-Dios, Javier Alvez, and German Rigau

Exploiting Metonymy from Available Knowledge Resources (2023)

20th International Conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, Revised Selected Papers, Part I. Lecture Notes in Computer Science book series (LNCS, volume 13451), pp 34-43

Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

Image captioning for effective use of language models in knowledge-based visual question answering (2023)

Expert Systems with Applications, 2023, vol. 212, p. 118669. Preprint: https://arxiv.org/abs/2109.08029

Nayla Escribano, German Rigau, Rodrigo Agerri

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Knowledge-Based Systems, Volume 273, 2023, 110612, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2023.110612 (https://www.sciencedirect.com/science/article/pii/S0950705123003623)
Abstract: Detecting and normalizing temporal expressions is an essential step for many NLP tasks. While a variety of methods have been proposed for detection, the best normalization approaches rely on hand-crafted rules. Furthermore, most of them have been designed only for English. In this paper we present a modular multilingual temporal processing system combining a fine-tuned Masked Language Model for detection and a grammar-based normalizer. We experiment in Spanish and English and compare with HeidelTime, the state of the art in multilingual temporal processing. We obtain the best results in gold timex normalization, timex detection and type recognition, and competitive performance in the combined TempEval-3 relaxed value metric. A detailed error analysis shows that detecting only those timexes for which it is feasible to provide a normalization is highly beneficial in this last metric. This raises the question of which is the best strategy for timex processing: leaving undetected those timexes for which it is not easy to provide normalization rules, or aiming for high coverage.
Keywords: Temporal processing; Multilingualism; Sequence labeling; Grammar-based approaches; Deep learning; Natural language processing

Murali Kondragunta, Olatz Perez-de-Viñaspre, Maite Oronoz

Improving and Simplifying Template-Based Named Entity Recognition (2023)

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 79–86, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics.

Rodrigo Agerri, Eneko Agirre

Lessons learned from the evaluation of Spanish Language Models (2023)

Procesamiento del Lenguaje Natural (70), pp 157-170

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Scaling Laws for BERT in Low-Resource Settings (2023)

Findings of the Association for Computational Linguistics: ACL 2023

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

ACM Computing Surveys. 27 June 2023

Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison

Identifying Token-Level Dialectal Features in Social Media (2023)

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

Procesamiento del Lenguaje Natural. v. 71, p. 165-177, sep. 2023

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Iñigo Alonso, Eneko Agirre

Automatic Logical Forms improve fidelity in Table-to-Text generation (2023)

Expert Systems with Applications, Volume 238, Part D, 15 March 2024, 121869 https://arxiv.org/abs/2310.17279

Begoña Altuna, Rodrigo Agerri, Lidia Salas-Espejo, José Javier Saiz, Roberto Zanoli, Manuela Speranza, Bernardo Magnini, Alberto Lavelli, Goutham Karunakaran

Overview of TESTLINK at IberLEF 2023: Linking Results to Clinical Laboratory Tests and Measurements (2023)

Procesamiento del Lenguaje Natural, Revista nº 71, 313-320, septiembre de 2023.

Begoña Altuna, Goutham Karunakaran, Alberto Lavelli, Bernardo Magnini, Manuela Speranza, Roberto Zanoli

CLinkaRT at EVALITA 2023: Overview of the Task on Linking a Lab Result to its Test Event in the Clinical Domain (2023)

Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma 2023.

Roberto Centeno, Rodrigo Agerri

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

In Proceedings of the Workshop on NLP applied to Misinformation, co-located with the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023).

Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

In SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing.

Joseba Fernandez de Landa, Rodrigo Agerri

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection (2023)

In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 2023.

Arantxa Otegi, Iñaki San Vicente, Xabier Saralegi, Anselmo Peñas, Borja Lozano, Eneko Agirre

Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)

Knowledge-Based Systems, Volume 240.

Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre

Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (2022)

In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, Washington. Association for Computational Linguistics.

Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min

ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (2022)

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, Seattle, Washington. Association for Computational Linguistics.

Eneko Agirre

Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. Association for Computational Linguistics, Dublin, Ireland

David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal

Direct Parsing to Sentiment Graphs (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.

Iker Garcia-Ferrero, Rodrigo Agerri, German Rigau

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)

Findings of the Association for Computational Linguistics: EMNLP 2022

Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

SemEval 2022 Task 10: Structured Sentiment Analysis (2022)

In SemEval 2022

Blanca Calvo Figueras, Montse Cuadros, Rodrigo Agerri

A Semantics-Aware Approach to Automated Claim Verification (2022)

In Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER), pages 37–48, Dublin, Ireland. Association for Computational Linguistics

Cristina Aceta, Johan Kildal, Izaskun Fernández, Aitor Soroa

Towards an optimal design of natural human interaction mechanisms for a service robot with ancillary way-finding capabilities in industrial environments (2021)

Production & Manufacturing Research, 9:1, 1-32

Ainhoa Serna, Aitor Soroa, Rodrigo Agerri

Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport (2021)

Sustainability 13, no. 4: 2397.

Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre

Inferring spatial relations from textual descriptions of images (2021)

Pattern Recognition, Volume 113, 107847. Pre-print: https://arxiv.org/abs/2102.00997

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2021)

In conjunction with NAACL. Association for Computational Linguistics

Elena Zotova, Rodrigo Agerri, German Rigau

Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)

Expert Systems with Applications, 170 (2021).

Jon Alkorta

Hacia el análisis de sentimientos en euskera (2021)

Procesamiento del Lenguaje Natural, 66, 201-204.

Joseba Fernandez de Landa, Iker García, Ander Salaberria, Jon Ander Campos

Twitterreko Euskal Komunitatearen Eduki Azterketa Pandemia Garaian (2021)

IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura

Ander Barrena, Aitor Soroa, Eneko Agirre

Towards Zero-Shot Cross-Lingual Named Entity Disambiguation (2021)

Expert Systems With Applications ESWA 2021

Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre

Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction (2021)

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernández de Landa, Álvaro Rodrigo

VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)

Procesamiento del Lenguaje Natural, 67, pp 173-181

Iker García-Ferrero, Rodrigo Agerri, German Rigau

Benchmarking Meta-embeddings: What Works and What Does Not (2021)

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021

Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Multilingual Counter Narrative Type Classification (2021)

Proceedings of Argument Mining 2021

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: European Clinical Case Corpus (2021)

Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf

Eneko Agirre

Cross-Lingual Word Embeddings (Book Review) (2020)

Computational Linguistics 46 (1), 245-248. (https://doi.org/10.1162/COLI_r_00372)

Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune and Eneko Agirre

Evaluating Multimodal Representations on Visual Semantic Textual Similarity (2020)

Proceedings of the Twenty-third European Conference on Artificial Intelligence, ECAI 2020, June 8-12, 2020, Santiago de Compostela, Spain

Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar

Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction (2020)

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC. Also available on arXiv: https://arxiv.org/pdf/2004.00033.pdf

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2020)

International Joint Conference on Artificial Intelligence (IJCAI 2020)

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning (2020)

Frontiers in Artificial Intelligence and Applications. Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, Michela Milano, Senén Barro, Alberto Bugarín, Jérôme Lang (eds.). Volume 325: ECAI 2020. Pages 585 - 592. IOS Press Ebooks

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana Garcia-Serrano, Mohamed Ben Aouicha, Eneko Agirre, David Sánchez

A large reproducible benchmark of ontology-based methods and word embeddings for word similarity (2020)

Information Systems. Online first.

Iker de la Iglesia, Mikel Martinez-Puente, Alexander Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola

MEDIA team at the CLEF-2020 Multilingual Information Extraction Task (2020)

Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum Thessaloniki, Greece, September 22-25, 2020.

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2020)

In conjunction with EMNLP. Association for Computational Linguistics

Rodrigo Agerri, German Rigau

Projecting Heterogeneous Annotations for Named Entity Recognition (2020)

In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Winner of the CAPITEL@IberLEF task on Spanish NER.

María Espinosa, Rodrigo Agerri, Roberto Centeno, Alvaro Rodrigo

DeepReading@SardiStance: Combining Textual, Social and Emotional Features (2020)

Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). Winners of the SardiStance@Evalita 2020 shared task.

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2019)

Artificial Intelligence, 268 (2019) 85-95

Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre

Word n-gram attention models for sentence similarity and inference (2019)

Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)

Data in Brief, Volume 26.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)

Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665.

Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre

A comparison of representation models in a non-conventional semantic similarity scenario (2019)

Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.

Rodrigo Agerri

Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features (2019)

SemEval@NAACL-HLT 2019: 944-948. https://www.aclweb.org/anthology/S19-2161.pdf

Joseba Fernandez de Landa, Rodrigo Agerri, Iñaki Alegria

Euskaldun gazte eta helduen harremanak Twitterren (2019)

III. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Gizarte Zientziak eta Zuzenbidea. 2, pp. 83 - 90

Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau

Automatic white-box testing of first-order logic ontologies (2019)

Journal of Logic and Computation, Volume 29, Issue 5, September 2019, Pages 723–751

Javier Álvez, Paqui Lucio, German Rigau

A Framework for the Evaluation of SUMO-Based Ontologies Using WordNet (2019)

IEEE Access, 7, 36075-36093. 2019

Mark Stevenson, Eneko Agirre

Word Sense Disambiguation (2018)

The Oxford Handbook of Computational Linguistics, 2nd edition. Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28

Josu Goikoetxea, Aitor Soroa and Eneko Agirre

Knowledge-Based Systems (KNOSYS). Volume 150, 15 June 2018, Pages 218-230. ISSN: 0950-7051. DOI: https://doi.org/10.1016/j.knosys.2018.03.017. Preprint at https://arxiv.org/pdf/1804.08316.pdf

Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Building Named Entity Recognition Taggers via Parallel Corpora (2018)

In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.

Ander Barrena, Aitor Soroa, Eneko Agirre

Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)

The SIGNLL Conference on Computational Natural Language Learning CONLL 2018

Rodrigo Agerri, German Rigau

Simple Language Independent Sequence Labelling for the Annotation of Disabilities in Medical Texts (2018)

Proceedings of the Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), Diann Track, Sevilla, Spain.

Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau

Multi-lingual and Cross-lingual timeline extraction (2017)

Knowledge-Based Systems, 133, 77-89

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Goikoetxea J., Agirre E., Soroa A.

Single or Multiple. Combining Word Representations Independently Learned from Text and WordNet (2016)

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. pp. 2608-2614. ISBN: 978-1-57735-760-5. Phoenix (USA).

Rodrigo Agerri, German Rigau

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (2016)

Artificial Intelligence, 238 (2016) pages 63-82. http://dx.doi.org/10.1016/j.artint.2016.05.003

Hugo Manguinhas, Nuno Freire, Antoine Isaac, Juliane Stiller, Valentine Charles, Aitor Soroa, Rainer Simon, Vladimir Alexiev

Exploring Comparative Evaluation of Semantic Enrichment Tools for Cultural Heritage Metadata (2016)

Proceedings of the 20th International Conference on Theory and Practice of Digital Libraries, TPDL 2016, Vol 9818, pp 266-278

Ander Intxaurrondo, Eneko Agirre, Oier Lopez de Lacalle, Mihai Surdeanu

Diamonds in the Rough: Event Extraction from Imperfect Microblog Data (2015)

Proceedings of the North American chapter of the Association for Computational Linguistics (NAACL HLT), pages: 641-650. ISBN: 978-1-941643-49-5.

Goikoetxea J., Agirre E., Soroa A.

Random Walks and Neural Network Language Models on Knowledge Bases (2015)

Proceedings of the Annual Meeting of the North American chapter of the Association of Computational Linguistics (NAACL HLT 2015), pages 1434-1439. ISBN: 978-1-937284-73-2. Denver (USA).

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan (2013)

Euskalingua aldizkari digitala, 23, 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf

Mark Hall, Eneko Agirre, Nikolas Aletras, Runar Bergheim, Kostas Chandrinos, Paul Clough, Samuel Fernando, Kate Fernie, Paula Goodale, Jill Griffiths, Oier Lopez de Lacalle, Andrea de Polo, Aitor Soroa, Mark Stevenson

PATHS - Exploring Digital Cultural Heritage Spaces (2012)

Theory and Practice of Digital Libraries 2012. ISBN 9783642332906 ISSN 0302-9743

Arantxa Otegi

Hedapena informazioaren berreskurapenean: hitzen adiera-desanbiguazioaren eta antzekotasun semantikoaren ekarpenak (2012)

Lengoaia eta Sistema Informatikoak Saila, EHU/UPV. Informatika Fakultatea. 2012/03/16

Iñaki Alegria, Bertol Arrieta, Arantza Diaz de Ilarraza, Elixabete Izagirre, Montse Maritxalar

Using Machine Learning Techniques to Build a Comma Checker for Basque (2006)

Proceedings of Coling-ACL 2006. Sydney, Australia. ISBN: 1-932432-69-8, pp. 1-8. https://aclanthology.org/P06-4000/

A. Casillas, V. Fresno, R. Martínez, S. Montalvo

Evaluación del clustering de páginas web mediante funciones de peso y combinación heurística de criterios (2005)

Revista Española para el Procesamiento del Lenguaje Natural, 35, 417-424. https://1library.co/document/yn4mkjpz-evaluacion-clustering-paginas-mediante-funciones-combinacion-heuristica-criterios.html

All HiTZ publications