Information Extraction and Information Retrieval
The ever-increasing availability of unstructured textual resources on the Web, and their potential use in applications that automatically acquire knowledge, have caused a dramatic rise in research on Information Extraction (IE) and Information Retrieval (IR). Traditionally, the required textual content was produced through manual annotation by human experts in the task at hand, which is too costly in terms of both economic and human resources. In the last decade, new t...
Demos
Demo of the NewsReader NLP pipeline (English)
Paste any English text to see which entities, events, and other annotations are added automatically. The result is represented in the NAF format.
Demo of the NewsReader NLP pipeline (Spanish)
Paste any Spanish text to see which entities and other annotations are added automatically. The result is represented in the NAF format.
Eihera
Basque named entity recognizer and classifier
Eustagger
Basque lemmatizer and morphosyntactic analyzer
Contracts
(2024 - 2025)
(2023 - 2024)
- Generative artificial intelligence webinar (Adimen artifizial sortzailea web mintegia)
Online course for Gipuzkoa Provincial Council employees
(2024 - 2024)
Data Privacy in Artificial Intelligence for Health Applications: A QA system to extract specific information from medical reports that can be used for better decision-making
(2020 - 2021)
Pre-training cross-lingual language models
(2020 - 2020)
(2019 - 2020)
Projects
- CLARIAH-EUS-gArA
(2024-09-01 - 2025-09-01)
- #neural2speech - Decoding speech and language from the human brain
Deep learning for speech generation from the brain.
(2023 - 2026)
- DeepMinor: Language Models for Multilingual and Multidomain Text Processing in Low Resource Scenarios
Language Models for Multilingual and Multidomain Text Processing in Low Resource Scenarios
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
DeepKnowledge (PID2021-127777OB-C21) project funded by MCIN/AEI/10.13039/501100011033 and by FEDER
(2022 - 2025)
- ICL4LANG: In-context learning as a new paradigm for research into scalable, high-precision language technologies adapted to the industrial needs of the Basque Country
(2023 - 2025)
Use of computational resources on EuroHPC supercomputers to scale up experiments and build very large models for low-resource European languages
(2024 - 2025)
Antidote (PCI2020-120717-2) funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2021 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR.
(2022 - 2024)
- Disargue: Few-shot Learning and Argumentation to Detect and Fight Misinformation in Social Media
Disargue (TED2021-130810B-C21) funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2022 - 2024)
Use of computational resources on EuroHPC supercomputers to scale up experiments and build very large models for low-resource European languages
(2023 - 2024)
Better Extraction from Text Towards Enhanced Retrieval
(2019 - 2023)
Tools for the analysis of parliamentary discourses: polarization, subjectivity and affectivity in the post-truth era
(2020 - 2022)
DeepReading: Mining, Understanding, and Reasoning with Multilingual Content.
(2019 - 2021)
Deep learning, Big Data and knowledge for multilingual text processing.
(2019 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021)
Automated surveillance of key questions on COVID-19 in scientific publications
(2020 - 2021)
Learning to Interact with Humans by Lifelong Interaction with Humans
(2017 - 2020)
- CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019)
TUNER: Automatic domain adaptation for semantic processing.
(2016 - 2018)
- MUSTER: Multimodal processing of Spatial and TEmporal expRessions: Toward Understanding Space and Time in Language Enhanced by Vision.
Multimodal processing of Spatial and TEmporal expRessions: Toward Understanding Space and Time in Language Enhanced by Vision.
(2016 - 2018)
- Openminted: Sharing IXA pipes in the OpenMinTeD platform.
Openminted: Sharing IXA pipes in the OpenMinTeD platform.
(2018 - 2018)
Patents
Resources
- EIEC
Basque Named Entity Recognition corpus.
- EDIEC
Basque corpus annotated for Named Entity Disambiguation.
- MCR: Multilingual Central Repository
Multilingual lexical database with wordnets for several European languages, including Basque.
- EPEC-EuSemcor
Corpus tagged with Basque WordNet senses.
Publications
Olia Toporkov, Rodrigo Agerri
On the Role of Morphological Information for Contextual Lemmatization (2024)
Computational Linguistics (MIT Press).
Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)
The Twelfth International Conference on Learning Representations
Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre
Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy
Eneko Agirre, Itziar Aldabe, Xabier Arregi, Mikel Artetxe, Unai Atutxa, Ekhi Azurmendi, Iker De la Iglesia, Julen Etxaniz, Victor García-Romillo, Inma Hernaez-Rioja, Asier Herranz, Mikel Iruskieta, Oier López de Lacalle, Eva Navas, Paula Ontalvilla, Aitor Ormazabal, Naiara Perez, German Rigau, Oscar Sainz, Jon Sanchez, Ibon Saratxaga, Aitor Soroa, Christoforos Souganidis, Jon Vadillo and Aimar Zabala
IKER-GAITU: research on language technology for Basque and other low-resource languages (2024)
Eneko Agirre, Olatz Arbelaitz, Olatz Arregi, Gorka Azkune, Arantza Casillas, Inma Hernaez, Mikel Iruskieta, Elena Lazkano, Eva Navas, German Rigau, Roberto Santana, Aitor Soroa and Rabih Zbib
ENIA Chair in Artificial Intelligence and Language Technology (2024)
Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios
A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, pp. 819-831.
Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri
Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)
Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand
Ahmed Elhady, Khaled Elsayed, Eneko Agirre, and Mikel Artetxe
Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 755–761, Mexico City, Mexico. Association for Computational Linguistics.
Iñigo Alonso, Eneko Agirre, Mirella Lapata
PixT3: Pixel-based Table-To-Text Generation (2024)
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024, Volume 1: Long Papers). https://aclanthology.org/2024.acl-long.364
Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea
EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)
Procesamiento del Lenguaje Natural, no. 73, September 2024, pp. 125-137
Iakes Goenaga, Aitziber Atutxa, Koldo Gojenola, Maite Oronoz, Rodrigo Agerri
Explanatory argument extraction of correct answers in resident medical exams (2024)
Artificial Intelligence in Medicine, Volume 157, November 2024, 102985
Alain García Olea, Ane García Domingo-Aldama, Marcos Merino Prado, Koldo Gojenola Galletebeitia, Aitziber Atutxa Salazar, Mikel Maeztu Rada, Iván García Díaz, Adrián Costa, Iván Cano, Fernando Díaz, Irene Hernández, Uxue Millet, Ainhoa Etxenike, José Miguel Ormaetxe Merodio
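Rendimiento de las expresiones regulares en el análisis de informes de alta presentes en la historia clínica electrónica: exprimiendo los datos secundarios [Performance of regular expressions in analyzing discharge reports in the electronic health record: squeezing secondary data] (2024)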
Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 33
Alain García Olea, Ane García Domingo-Aldama, Marcos Merino Prado, Ignacio Díez González, Aitziber Atutxa Salazar, Josu Goikoetxea Salutregi, Koldo Gojenola Galletebeitia, Mikel Maeztu Rada, Iván Cano González, Adrián Costa Santos, Iván García Díaz, Fernando Díaz González, Irene Hernández Pérez, Uxue Millet Oyarzabal and José Miguel Ormaetxe Merodio
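Rendimiento de sistemas de chat alimentados con artículos de investigación en un entorno clínico específico: la enfermedad valvular cardiaca [Performance of chat systems fed with research articles in a specific clinical setting: valvular heart disease] (2024)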
Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 1161
Itziar Gonzalez-Dios, Javier Alvez, and German Rigau
Exploiting Metonymy from Available Knowledge Resources (2023)
20th International Conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, Revised Selected Papers, Part I. Lecture Notes in Computer Science book series (LNCS, volume 13451), pp 34-43
Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
Image captioning for effective use of language models in knowledge-based visual question answering (2023)
Expert Systems with Applications, 2023, vol. 212, p. 118669. Preprint: https://arxiv.org/abs/2109.08029
Nayla Escribano, German Rigau, Rodrigo Agerri
A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)
Knowledge-Based Systems, Volume 273, 2023, 110612. ISSN 0950-7051. https://doi.org/10.1016/j.knosys.2023.110612 (https://www.sciencedirect.com/science/article/pii/S0950705123003623)
Abstract: Detecting and normalizing temporal expressions is an essential step for many NLP tasks. While a variety of methods have been proposed for detection, the best normalization approaches rely on hand-crafted rules. Furthermore, most of them have been designed only for English. In this paper we present a modular multilingual temporal processing system combining a fine-tuned Masked Language Model for detection and a grammar-based normalizer. We experiment in Spanish and English and compare with HeidelTime, the state of the art in multilingual temporal processing. We obtain the best results in gold timex normalization, timex detection and type recognition, and competitive performance in the combined TempEval-3 relaxed value metric. A detailed error analysis shows that detecting only those timexes for which it is feasible to provide a normalization is highly beneficial in this last metric. This raises the question of which is the best strategy for timex processing: leaving undetected those timexes for which it is not easy to provide normalization rules, or aiming for high coverage.
Keywords: Temporal processing; Multilingualism; Sequence labeling; Grammar-based approaches; Deep learning; Natural language processing
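The abstract above describes a two-stage design: a detector proposes temporal expressions (timexes) and a rule-based normalizer maps each one to a TIMEX3-style value. The toy Python sketch below illustrates only that split and the precision-versus-coverage trade-off the abstract raises; it is not the authors' system. A regular expression stands in for the fine-tuned masked language model, and the handful of English rules, the `process` and `normalize` names, and the `keep_unnormalizable` flag are all hypothetical.

```python
import re
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Hypothetical toy detector: a regex stands in for the fine-tuned masked
# language model that the paper uses for detection.
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

DETECTOR = re.compile(
    r"\b(?:(?:" + "|".join(MONTHS) + r")\s+\d{1,2},\s+\d{4}"
    r"|yesterday|today|tomorrow|recently)\b",
    re.IGNORECASE,
)

RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalize(timex: str, dct: date) -> Optional[str]:
    """Hypothetical rule-based normalizer.

    Returns an ISO date string (a TIMEX3-style value) resolved against the
    document creation time `dct`, or None when no rule covers the expression.
    """
    t = timex.lower()
    if t in RELATIVE_DAYS:
        return (dct + timedelta(days=RELATIVE_DAYS[t])).isoformat()
    m = re.fullmatch(r"([a-z]+)\s+(\d{1,2}),\s+(\d{4})", t)
    if m and m.group(1) in MONTHS:
        try:
            return date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2))).isoformat()
        except ValueError:  # impossible dates such as "February 30, 2023"
            return None
    return None  # e.g. "recently": detected, but no normalization rule applies


def process(text: str, dct: date,
            keep_unnormalizable: bool = True) -> List[Tuple[str, Optional[str]]]:
    """Detect, then normalize. With keep_unnormalizable=False, timexes the
    normalizer cannot handle are dropped (precision over coverage)."""
    results = []
    for match in DETECTOR.finditer(text):
        value = normalize(match.group(0), dct)
        if value is None and not keep_unnormalizable:
            continue
        results.append((match.group(0), value))
    return results
```

With a document creation time of 10 March 2023, running `process` on "The report was filed yesterday, after the March 5, 2023 meeting, and was recently updated." keeps "recently" with no value by default, while `keep_unnormalizable=False` drops it. In miniature, this is the choice the abstract raises between leaving hard-to-normalize timexes undetected and aiming for high coverage.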
Murali Kondragunta, Olatz Perez-de-Viñaspre, Maite Oronoz
Improving and Simplifying Template-Based Named Entity Recognition (2023)
In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 79–86, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics.
Rodrigo Agerri, Eneko Agirre
Lessons learned from the evaluation of Spanish Language Models (2023)
Procesamiento del Lenguaje Natural (70), pp 157-170
Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa
Scaling Laws for BERT in Low-Resource Settings (2023)
Findings of the Association for Computational Linguistics: ACL 2023
Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)
ACM Computing Surveys. 27 June 2023
Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison
Identifying Token-Level Dialectal Features in Social Media (2023)
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle
Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)
Procesamiento del Lenguaje Natural, no. 71, pp. 165-177, September 2023.
Iker García, Rodrigo Agerri, German Rigau
T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Iñigo Alonso, Eneko Agirre
Automatic Logical Forms improve fidelity in Table-to-Text generation (2023)
Expert Systems with Applications, Volume 238, Part D, 15 March 2024, 121869 https://arxiv.org/abs/2310.17279
Begoña Altuna, Rodrigo Agerri, Lidia Salas-Espejo, José Javier Saiz, Roberto Zanoli, Manuela Speranza, Bernardo Magnini, Alberto Lavelli, Goutham Karunakaran
Overview of TESTLINK at IberLEF 2023: Linking Results to Clinical Laboratory Tests and Measurements (2023)
Procesamiento del Lenguaje Natural, no. 71, pp. 313-320, September 2023.
Begoña Altuna, Goutham Karunakaran, Alberto Lavelli, Bernardo Magnini, Manuela Speranza, Roberto Zanoli
CLinkaRT at EVALITA 2023: Overview of the Task on Linking a Lab Result to its Test Event in the Clinical Domain (2023)
Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma 2023.
Roberto Centeno, Rodrigo Agerri
Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)
In Proceedings of the Workshop on NLP applied to Misinformation, co-located with the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023).
Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova
HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)
In SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing.
Joseba Fernandez de Landa, Rodrigo Agerri
HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection (2023)
In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 2023.
Arantxa Otegi, Iñaki San Vicente, Xabier Saralegi, Anselmo Peñas, Borja Lozano, Eneko Agirre
Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)
Knowledge-Based Systems, Volume 240.
Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre
Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (2022)
In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, Washington. Association for Computational Linguistics.
Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min
ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (2022)
In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, Seattle, Washington. Association for Computational Linguistics.
Eneko Agirre
Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)
Association for Computational Linguistics, Dublin, Ireland
David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal
Direct Parsing to Sentiment Graphs (2022)
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478
Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri
BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)
Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.
Iker Garcia-Ferrero, Rodrigo Agerri, German Rigau
Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)
Findings of the Association for Computational Linguistics: EMNLP 2022
Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal
SemEval 2022 Task 10: Structured Sentiment Analysis (2022)
In SemEval 2022
Blanca Calvo Figueras, Montse Cuadros, Rodrigo Agerri
A Semantics-Aware Approach to Automated Claim Verification (2022)
In Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER), pages 37–48, Dublin, Ireland. Association for Computational Linguistics
Cristina Aceta, Johan Kildal, Izaskun Fernández, Aitor Soroa
Towards an optimal design of natural human interaction mechanisms for a service robot with ancillary way-finding capabilities in industrial environments (2021)
Production & Manufacturing Research, 9:1, 1-32
Ainhoa Serna, Aitor Soroa, Rodrigo Agerri
Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport (2021)
Sustainability 13, no. 4: 2397.
Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre
Inferring spatial relations from textual descriptions of images (2021)
Pattern Recognition, Volume 113, 107847. Pre-print: https://arxiv.org/abs/2102.00997
Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2021)
In conjunction with NAACL. Association for Computational Linguistics
Elena Zotova, Rodrigo Agerri, German Rigau
Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)
Expert Systems with Applications, 170 (2021).
Joseba Fernandez de Landa, Rodrigo Agerri
Euskarazko on-line artikuluetan aipatutako izendun entitate nabarmenen identifikazioa denbora errealean [Real-time identification of salient named entities mentioned in online articles in Basque] (2021)
Ekaia
Jon Alkorta
Hacia el análisis de sentimientos en euskera [Towards sentiment analysis in Basque] (2021)
Procesamiento del Lenguaje Natural, 66, pp. 201-204.
Joseba Fernandez de Landa, Iker García, Ander Salaberria, Jon Ander Campos
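Twitterreko Euskal Komunitatearen Eduki Azterketa Pandemia Garaian [Content analysis of the Basque community on Twitter during the pandemic] (2021)
IV Ikergazte: Nazioarteko ikerketa euskaraz [International research in Basque]. Conference proceedings: Engineering and Architecture.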
Ander Barrena, Aitor Soroa, Eneko Agirre
Towards Zero-Shot Cross-Lingual Named Entity Disambiguation (2021)
Expert Systems with Applications (ESWA), 2021
Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre
Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction (2021)
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernández de Landa, Álvaro Rodrigo
VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)
Procesamiento del Lenguaje Natural, 67, pp 173-181
Iker García-Ferrero, Rodrigo Agerri, German Rigau
Benchmarking Meta-embeddings: What Works and What Does Not (2021)
Findings of the Association for Computational Linguistics: EMNLP 2021
Yi-Ling Chung, Marco Guerini, Rodrigo Agerri
Multilingual Counter Narrative Type Classification (2021)
Proceedings of Argument Mining 2021
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli
The E3C Project: European Clinical Case Corpus (2021)
Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf
Eneko Agirre
Cross-Lingual Word Embeddings (Book Review) (2020)
Computational Linguistics 46 (1), 245-248. (https://doi.org/10.1162/COLI_r_00372)
Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune and Eneko Agirre
Evaluating Multimodal Representations on Visual Semantic Textual Similarity (2020)
Proceedings of the Twenty-third European Conference on Artificial Intelligence, ECAI 2020, June 8-12, 2020, Santiago de Compostela, Spain
Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar
Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction (2020)
Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), Marseille, France
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Towards Word Sense Disambiguation by Reasoning (2020)
Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340
Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre
Give your Text Representation Models some Love: the Case for Basque (2020)
Proceedings of LREC. Also available on arXiv: https://arxiv.org/pdf/2004.00033.pdf
Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza
EusTimeML: A mark-up language for temporal information in Basque (2020)
Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06
Rodrigo Agerri, German Rigau
Language independent sequence labelling for Opinion Target Extraction (2020)
International Joint Conference on Artificial Intelligence (IJCAI 2020)
Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau
Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)
Language Resources and Evaluation Conference (LREC 2020)
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning (2020)
Frontiers in Artificial Intelligence and Applications. Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, Michela Milano, Senén Barro, Alberto Bugarín, Jérôme Lang (eds.). Volume 325: ECAI 2020. Pages 585 - 592. IOS Press Ebooks
Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana Garcia-Serrano, Mohamed Ben Aouicha, Eneko Agirre, David Sánchez
A large reproducible benchmark of ontology-based methods and word embeddings for word similarity (2020)
Information Systems. Online first.
Iker de la Iglesia, Mikel Martinez-Puente, Alexander Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola
MEDIA team at the CLEF-2020 Multilingual Information Extraction Task (2020)
Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum Thessaloniki, Greece, September 22-25, 2020.
Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2020)
In conjunction with EMNLP. Association for Computational Linguistics
Rodrigo Agerri, German Rigau
Projecting Heterogeneous Annotations for Named Entity Recognition (2020)
In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Winner of the CAPITEL@IberLEF task on Spanish NER.
María Espinosa, Rodrigo Agerri, Roberto Centeno, Alvaro Rodrigo
DeepReading@SardiStance: Combining Textual, Social and Emotional Features (2020)
Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). Winners of the SardiStance@Evalita2020 shared task.
Rodrigo Agerri, German Rigau
Language independent sequence labelling for Opinion Target Extraction (2019)
Artificial Intelligence, 268 (2019) 85-95
Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre
Word n-gram attention models for sentence similarity and inference (2019)
Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.
Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre
Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.
Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre
Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)
Data in Brief, Volume 26.
Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre
A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)
Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665.
Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre
A comparison of representation models in a non-conventional semantic similarity scenario (2019)
Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.
Rodrigo Agerri
Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features (2019)
SemEval@NAACL-HLT2019: 944-948 https://www.aclweb.org/anthology/S19-2161.pdf
Joseba Fernandez de Landa, Rodrigo Agerri, Iñaki Alegria
Euskaldun gazte eta helduen harremanak Twitterren [Relationships between young and adult Basque speakers on Twitter] (2019)
III Ikergazte: Nazioarteko ikerketa euskaraz [International research in Basque]. Conference proceedings: Social Sciences and Law, 2, pp. 83-90.
Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau
Automatic white-box testing of first-order logic ontologies (2019)
Journal of Logic and Computation, Volume 29, Issue 5, September 2019, Pages 723–751
Javier Álvez, Paqui Lucio, German Rigau
A Framework for the Evaluation of SUMO-Based Ontologies Using WordNet (2019)
IEEE Access, 7, 36075-36093. 2019
Mark Stevenson, Eneko Agirre
Word Sense Disambiguation (2018)
The Oxford Handbook of Computational Linguistics 2nd edition (2 ed.) Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28
Josu Goikoetxea, Aitor Soroa and Eneko Agirre
Knowledge-Based Systems (KNOSYS). Volume 150, 15 June 2018, Pages 218-230. ISSN: 0950-7051. DOI https://doi.org/10.1016/j.knosys.2018.03.017 Preprint at https://arxiv.org/pdf/1804.08316.pdf
Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau
Building Named Entity Recognition Taggers via Parallel Corpora (2018)
In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.
Ander Barrena, Aitor Soroa, Eneko Agirre
Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)
The SIGNLL Conference on Computational Natural Language Learning CONLL 2018
Rodrigo Agerri, German Rigau
Simple Language Independent Sequence Labelling for the Annotation of Disabilities in Medical Texts (2018)
Proceedings of the Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), Diann Track, Sevilla, Spain.
Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau
Multi-lingual and Cross-lingual timeline extraction (2017)
Knowledge-Based Systems, 133, 77-89
Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola
Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… [Language technology in the era of “Big Data”: search programs, translators…] (2017)
Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak
Josu Goikoetxea, Eneko Agirre, Aitor Soroa
Single or Multiple. Combining Word Representations Independently Learned from Text and WordNet (2016)
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 2608-2614. ISBN: 978-1-57735-760-5. Phoenix (USA).
Rodrigo Agerri, German Rigau
Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (2016)
Artificial Intelligence, 238 (2016) pages 63-82. http://dx.doi.org/10.1016/j.artint.2016.05.003
Hugo Manguinhas, Nuno Freire, Antoine Isaac, Juliane Stiller, Valentine Charles, Aitor Soroa, Rainer Simon, Vladimir Alexiev
Exploring Comparative Evaluation of Semantic Enrichment Tools for Cultural Heritage Metadata (2016)
Proceedings of the 20th International Conference on Theory and Practice of Digital Libraries, TPDL 2016, Vol 9818, pp 266-278
Ander Intxaurrondo, Eneko Agirre, Oier Lopez de Lacalle, Mihai Surdeanu
Diamonds in the Rough: Event Extraction from Imperfect Microblog Data (2015)
Proceedings of the North American chapter of the Association for Computational Linguistics (NAACL HLT), pages: 641-650. ISBN: 978-1-941643-49-5.
Josu Goikoetxea, Eneko Agirre, Aitor Soroa
Random Walks and Neural Network Language Models on Knowledge Bases (2015)
Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2015), pages 1434-1439. ISBN: 978-1-937284-73-2. Denver (USA).
Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe
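BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan [BerbaTek: development of language technologies for Basque in the fields of translation, content management and teaching] (2013)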
Euskalingua aldizkari digitala, 23, 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf
Mark Hall, Eneko Agirre, Nikolas Aletras, Runar Bergheim, Kostas Chandrinos, Paul Clough, Samuel Fernando, Kate Fernie, Paula Goodale, Jill Griffiths, Oier Lopez de Lacalle, Andrea de Polo, Aitor Soroa, Mark Stevenson
PATHS - Exploring Digital Cultural Heritage Spaces (2012)
Theory and Practice of Digital Libraries 2012. ISBN 9783642332906 ISSN 0302-9743
Arantxa Otegi
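Hedapena informazioaren berreskurapenean: hitzen adiera-desanbiguazioaren eta antzekotasun semantikoaren ekarpenak [Expansion in information retrieval: contributions of word sense disambiguation and semantic similarity] (2012)
Department of Computer Languages and Systems (Lengoaia eta Sistema Informatikoak Saila), EHU/UPV. Faculty of Computer Science. 2012/03/16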
Iñaki Alegria, Bertol Arrieta, Arantza Diaz de Ilarraza, Elixabete Izagirre, Montse Maritxalar
Using Machine Learning Techniques to Build a Comma Checker for Basque (2006)
Proceedings of Coling-ACL 2006. Sydney, Australia. ISBN: 1-932432-69-8, pp. 1-8. https://aclanthology.org/P06-4000/
A. Casillas, V. Fresno, R. Martínez, S. Montalvo
Evaluación del clustering de páginas web mediante funciones de peso y combinación heurística de criterios [Evaluation of web page clustering using weighting functions and a heuristic combination of criteria] (2005)
Revista Española para el Procesamiento del Lenguaje Natural, 35, 417-424. https://1library.co/document/yn4mkjpz-evaluacion-clustering-paginas-mediante-funciones-combinacion-heuristica-criterios.html
(2024 - 2026)
DeepKnowledge (PID2021-127777OB-C21) project funded by MCIN/AEI/10.13039/501100011033 and by FEDER
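(2022 - 2025)
- ICL4LANG: Aprendizaje en contexto como nuevo paradigma para investigar tecnologías del lenguaje escalables y de alta precisión adaptadas a las necesidades industriales del País Vasco [In-context learning as a new paradigm for researching scalable, high-precision language technologies adapted to the industrial needs of the Basque Country]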
(2023 - 2025)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2024 - 2025)
Antidote (PCI2020-120717-2) funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2021 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR.
(2022 - 2024)
- Disargue: Few-shot Learning and Argumentation to Detect and Fight Misinformation in Social Media
Disargue (TED2021-130810B-C21) funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2022 - 2024)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2023 - 2024)
Better Extraction from Text Towards Enhanced Retrieval
(2019 - 2023)
Tools for the analysis of parliamentary discourses: polarization, subjectivity and affectivity in the post-truth era
(2020 - 2022)
DeepReading: Mining, Understanding, and Reasoning with Multilingual Content.
(2019 - 2021)
Deep learning, Big Data and knowledge for multilingual text processing.
(2019 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021)
Automated surveillance of key questions on COVID-19 in scientific publications
(2020 - 2021)
Learning to Interact with Humans by Lifelong Interaction with Humans
(2017 - 2020)
- CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019)
TUNER: Automatic domain adaptation for semantic processing.
(2016 - 2018)
- MUSTER: Multimodal processing of Spatial and TEmporal expRessions: Toward Understanding Space and Time in Language Enhanced by Vision.
Multimodal processing of Spatial and TEmporal expRessions: Toward Understanding Space and Time in Language Enhanced by Vision.
(2016 - 2018)
- Openminted: Sharing IXA pipes in the OpenMinTeD platform.
Openminted: Sharing IXA pipes in the OpenMinTeD platform.
(2018 - 2018)
- EIEC
Basque Named Entity Recognition corpus.
- EDIEC
Basque corpus annotated for Named Entity Disambiguation.
- MCR: Multilingual Central Repository
Multilingual lexical database with wordnets for several European languages, including Basque.
- EPEC-EuSemcor
Corpus tagged with Basque WordNet senses.
Olia Toporkov, Rodrigo Agerri
On the Role of Morphological Information for Contextual Lemmatization (2024)
Computational Linguistics (MIT Press).
Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)
The Twelfth International Conference on Learning Representations
Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre
Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy
Eneko Agirre, Itziar Aldabe, Xabier Arregi, Mikel Artetxe, Unai Atutxa, Ekhi Azurmendi, Iker De la Iglesia, Julen Etxaniz, Victor García-Romillo, Inma Hernaez-Rioja, Asier Herranz, Mikel Iruskieta, Oier López de Lacalle, Eva Navas, Paula Ontalvilla, Aitor Ormazabal, Naiara Perez, German Rigau, Oscar Sainz, Jon Sanchez, Ibon Saratxaga, Aitor Soroa, Christoforos Souganidis, Jon Vadillo and Aimar Zabala
IKER-GAITU: research on language technology for Basque and other low-resource languages (2024)
Eneko Agirre, Olatz Arbelaitz, Olatz Arregi, Gorka Azkune, Arantza Casillas, Inma Hernaez, Mikel Iruskieta, Elena Lazkano, Eva Navas, German Rigau, Roberto Santana, Aitor Soroa and Rabih Zbib
ENIA Chair in Artificial Intelligence and Language Technology (2024)
Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.
A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)
Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).
Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri
Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)
Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand
Ahmed Elhady, Khaled Elsayed, Eneko Agirre, and Mikel Artetxe
Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 755–761, Mexico City, Mexico. Association for Computational Linguistics.
Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea
EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)
Procesamiento del Lenguaje Natural, Issue 73, September 2024, pp. 125-137
Iakes Goenaga, Aitziber Atutxa, Koldo Gojenola, Maite Oronoz, Rodrigo Agerri
Explanatory argument extraction of correct answers in resident medical exams (2024)
Artificial Intelligence in Medicine Volume 157, November 2024, 102985
Alain García Olea, Ane García Domingo-Aldama, Marcos Merino Prado, Koldo Gojenola Galletebeitia, Aitziber Atutxa Salazar, Mikel Maeztu Rada, Iván García Díaz, Adrián Costa, Iván Cano, Fernando Díaz, Irene Hernández, Uxue Millet, Ainhoa Etxenike, José Miguel Ormaetxe Merodio
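Rendimiento de las expresiones regulares en el análisis de informes de alta presentes en la historia clínica electrónica: exprimiendo los datos secundarios [Performance of regular expressions in the analysis of discharge reports in the electronic health record: squeezing the secondary data] (2024)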
Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 33
Alain García Olea, Ane García Domingo-Aldama, Marcos Merino Prado, Ignacio Díez González, Aitziber Atutxa Salazar, Josu Goikoetxea Salutregi, Koldo Gojenola Galletebeitia, Mikel Maeztu Rada, Iván Cano González, Adrián Costa Santos, Iván García Díaz, Fernando Díaz González, Irene Hernández Pérez, Uxue Millet Oyarzabal and José Miguel Ormaetxe Merodio
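Rendimiento de sistemas de chat alimentados con artículos de investigación en un entorno clínico específico: la enfermedad valvular cardiaca [Performance of chat systems fed with research articles in a specific clinical setting: valvular heart disease] (2024)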
Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 1161
Iñigo Alonso, Eneko Agirre, Mirella Lapata
PixT3: Pixel-based Table-To-Text Generation (2024)
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) https://aclanthology.org/2024.acl-long.364
Itziar Gonzalez-Dios, Javier Alvez, and German Rigau
Exploiting Metonymy from Available Knowledge Resources (2023)
20th International Conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, Revised Selected Papers, Part I. Lecture Notes in Computer Science book series (LNCS, volume 13451), pp 34-43
Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
Image captioning for effective use of language models in knowledge-based visual question answering (2023)
Expert Systems with Applications, 2023, vol. 212, p. 118669. Preprint: https://arxiv.org/abs/2109.08029
Nayla Escribano, German Rigau, Rodrigo Agerri
A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)
Nayla Escribano, German Rigau, Rodrigo Agerri, A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods, Knowledge-Based Systems, Volume 273, 2023, 110612, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2023.110612. (https://www.sciencedirect.com/science/article/pii/S0950705123003623)
Abstract: Detecting and normalizing temporal expressions is an essential step for many NLP tasks. While a variety of methods have been proposed for detection, the best normalization approaches rely on hand-crafted rules. Furthermore, most of them have been designed only for English. In this paper we present a modular multilingual temporal processing system combining a fine-tuned Masked Language Model for detection, and a grammar-based normalizer. We experiment in Spanish and English and compare with HeidelTime, the state of the art in multilingual temporal processing. We obtain the best results in gold timex normalization, timex detection and type recognition, and competitive performance in the combined TempEval-3 relaxed value metric. A detailed error analysis shows that detecting only those timexes for which it is feasible to provide a normalization is highly beneficial in this last metric. This raises the question of which is the best strategy for timex processing, namely, leaving undetected those timexes for which it is not easy to provide normalization rules or aiming for high coverage.
Keywords: Temporal processing; Multilingualism; Sequence labeling; Grammar-based approaches; Deep learning; Natural language processing
Murali Kondragunta, Olatz Perez-de-Viñaspre, Maite Oronoz
Improving and Simplifying Template-Based Named Entity Recognition (2023)
In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 79–86, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics.
Rodrigo Agerri, Eneko Agirre
Lessons learned from the evaluation of Spanish Language Models (2023)
Procesamiento del Lenguaje Natural (70), pp 157-170
Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa
Scaling Laws for BERT in Low-Resource Settings (2023)
Findings of the Association for Computational Linguistics: ACL 2023
Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)
ACM Computing Surveys. 27 June 2023
Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison
Identifying Token-Level Dialectal Features in Social Media (2023)
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle
Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)
Procesamiento del Lenguaje Natural, vol. 71, pp. 165-177, September 2023
Iker García, Rodrigo Agerri, German Rigau
T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Iñigo Alonso, Eneko Agirre
Automatic Logical Forms improve fidelity in Table-to-Text generation (2023)
Expert Systems with Applications, Volume 238, Part D, 15 March 2024, 121869 https://arxiv.org/abs/2310.17279
Begoña Altuna, Rodrigo Agerri, Lidia Salas-Espejo, José Javier Saiz, Roberto Zanoli, Manuela Speranza, Bernardo Magnini, Alberto Lavelli, Goutham Karunakaran
Overview of TESTLINK at IberLEF 2023: Linking Results to Clinical Laboratory Tests and Measurements (2023)
Procesamiento del Lenguaje Natural, Issue 71, pp. 313-320, September 2023.
Begoña Altuna, Goutham Karunakaran, Alberto Lavelli, Bernardo Magnini, Manuela Speranza, Roberto Zanoli
CLinkaRT at EVALITA 2023: Overview of the Task on Linking a Lab Result to its Test Event in the Clinical Domain (2023)
Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma 2023.
Roberto Centeno, Rodrigo Agerri
Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)
Roberto Centeno and Rodrigo Agerri (2023). Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation. In Proceedings of the Workshop on NLP applied to Misinformation, co-located with the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023).
Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova
HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)
Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova (2023). HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine. In SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing.
Joseba Fernandez de Landa, Rodrigo Agerri
HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection (2023)
Joseba Fernandez de Landa, Rodrigo Agerri (2023). HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 2023.
Arantxa Otegi, Iñaki San Vicente, Xabier Saralegi, Anselmo Peñas, Borja Lozano, Eneko Agirre
Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)
Knowledge-Based Systems, Volume 240.
Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre
Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (2022)
In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, Washington. Association for Computational Linguistics.
Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min
ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (2022)
In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, Seattle, Washington. Association for Computational Linguistics.
Eneko Agirre
Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. Association for Computational Linguistics, Dublin, Ireland
David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal
Direct Parsing to Sentiment Graphs (2022)
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478
Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri
BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)
Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.
Iker Garcia-Ferrero, Rodrigo Agerri, German Rigau
Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)
Findings of the Association for Computational Linguistics: EMNLP 2022
Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal
SemEval 2022 Task 10: Structured Sentiment Analysis (2022)
In SemEval 2022
Blanca Calvo Figueras, Montse Cuadros, Rodrigo Agerri
A Semantics-Aware Approach to Automated Claim Verification (2022)
In Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER), pages 37–48, Dublin, Ireland. Association for Computational Linguistics
Cristina Aceta, Johan Kildal, Izaskun Fernández, Aitor Soroa
Towards an optimal design of natural human interaction mechanisms for a service robot with ancillary way-finding capabilities in industrial environments (2021)
Production & Manufacturing Research, 9:1, 1-32
Ainhoa Serna, Aitor Soroa, Rodrigo Agerri
Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport (2021)
Sustainability 13, no. 4: 2397.
Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre
Inferring spatial relations from textual descriptions of images (2021)
Pattern Recognition, Volume 113, 107847. Pre-print: https://arxiv.org/abs/2102.00997
Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2021)
In conjunction with NAACL. Association for Computational Linguistics
Elena Zotova, Rodrigo Agerri, German Rigau
Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)
Expert Systems with Applications, 170 (2021).
Joseba Fernandez de Landa, Rodrigo Agerri
Euskarazko on-line artikuluetan aipatutako izendun entitate nabarmenen identifikazioa denbora errealean [Real-time identification of salient named entities mentioned in online Basque articles] (2021)
Ekaia
Jon Alkorta
Hacia el análisis de sentimientos en euskera [Towards sentiment analysis in Basque] (2021)
J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.
Joseba Fernandez de Landa, Iker García, Ander Salaberria, Jon Ander Campos
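Twitterreko Euskal Komunitatearen Eduki Azterketa Pandemia Garaian [Content analysis of the Basque Twitter community during the pandemic] (2021)
IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura [4th Ikergazte: International research in Basque. Conference proceedings. Engineering and Architecture]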
Ander Barrena, Aitor Soroa, Eneko Agirre
Towards Zero-Shot Cross-Lingual Named Entity Disambiguation (2021)
Expert Systems with Applications (ESWA), 2021
Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre
Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction (2021)
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernández de Landa, Álvaro Rodrigo
VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)
Procesamiento del Lenguaje Natural, 67, pp 173-181
Iker García-Ferrero, Rodrigo Agerri, German Rigau
Benchmarking Meta-embeddings: What Works and What Does Not (2021)
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021
Yi-Ling Chung, Marco Guerini, Rodrigo Agerri
Multilingual Counter Narrative Type Classification (2021)
Proceedings of Argument Mining 2021
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli
The E3C Project: European Clinical Case Corpus (2021)
Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf
Eneko Agirre
Cross-Lingual Word Embeddings (Book Review) (2020)
Computational Linguistics 46 (1), 245-248. (https://doi.org/10.1162/COLI_r_00372)
Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune and Eneko Agirre
Evaluating Multimodal Representations on Visual Semantic Textual Similarity (2020)
Proceedings of the Twenty-third European Conference on Artificial Intelligence, ECAI 2020, June 8-12, 2020, Santiago de Compostela, Spain
Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar
Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction (2020)
Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Towards Word Sense Disambiguation by Reasoning (2020)
Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340
Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre
Give your Text Representation Models some Love: the Case for Basque (2020)
Proceedings of LREC. Also available on arXiv: https://arxiv.org/pdf/2004.00033.pdf
Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza
EusTimeML: A mark-up language for temporal information in Basque (2020)
Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06
Rodrigo Agerri, German Rigau
Language independent sequence labelling for Opinion Target Extraction (2020)
International Joint Conference on Artificial Intelligence (IJCAI 2020)
Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau
Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)
Language Resources and Evaluation Conference (LREC 2020)
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning (2020)
Frontiers in Artificial Intelligence and Applications. Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, Michela Milano, Senén Barro, Alberto Bugarín, Jérôme Lang (eds.). Volume 325: ECAI 2020. Pages 585 - 592. IOS Press Ebooks
Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana Garcia-Serrano, Mohamed Ben Aouicha, Eneko Agirre, David Sánchez
A large reproducible benchmark of ontology-based methods and word embeddings for word similarity (2020)
Information Systems. Online first.
Iker de la Iglesia, Mikel Martinez-Puente, Alexander Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola
MEDIA team at the CLEF-2020 Multilingual Information Extraction Task (2020)
Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020.
Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2020)
In conjunction with EMNLP. Association for Computational Linguistics
Rodrigo Agerri, German Rigau
Projecting Heterogeneous Annotations for Named Entity Recognition (2020)
In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Winner of the CAPITEL@IberLEF task on Spanish NER.
María Espinosa, Rodrigo Agerri, Roberto Centeno, Alvaro Rodrigo
DeepReading@SardiStance: Combining Textual, Social and Emotional Features (2020)
Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). Winners of the SardiStance@Evalita2020 shared task.
Rodrigo Agerri, German Rigau
Language independent sequence labelling for Opinion Target Extraction (2019)
Artificial Intelligence, 268 (2019) 85-95
Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre
Word n-gram attention models for sentence similarity and inference (2019)
Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.
Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre
Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.
Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre
Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)
Data in Brief, Volume 26.
Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre
A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)
Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665.
Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre
A comparison of representation models in a non-conventional semantic similarity scenario (2019)
Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.
Rodrigo Agerri
Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features (2019)
SemEval@NAACL-HLT2019: 944-948 https://www.aclweb.org/anthology/S19-2161.pdf
Joseba Fernandez de Landa, Rodrigo Agerri, Iñaki Alegria
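Euskaldun gazte eta helduen harremanak Twitterren [Relationships between young and adult Basque speakers on Twitter] (2019)
III. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Gizarte Zientziak eta Zuzenbidea [3rd Ikergazte: International research in Basque. Conference proceedings. Social Sciences and Law]. 2, pp. 83-90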
Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau
Automatic white-box testing of first-order logic ontologies (2019)
Journal of Logic and Computation, Volume 29, Issue 5, September 2019, Pages 723–751
Javier Álvez, Paqui Lucio, German Rigau
A Framework for the Evaluation of SUMO-Based Ontologies Using WordNet (2019)
IEEE Access, 7, 36075-36093. 2019
Mark Stevenson, Eneko Agirre
Word Sense Disambiguation (2018)
The Oxford Handbook of Computational Linguistics 2nd edition (2 ed.) Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28
Josu Goikoetxea, Aitor Soroa and Eneko Agirre
Knowledge-Based Systems (KNOSYS). Volume 150, 15 June 2018, Pages 218-230. ISSN: 0950-7051. DOI https://doi.org/10.1016/j.knosys.2018.03.017 Preprint at https://arxiv.org/pdf/1804.08316.pdf
Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau
Building Named Entity Recognition Taggers via Parallel Corpora (2018)
In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.
Ander Barrena, Aitor Soroa, Eneko Agirre
Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)
The SIGNLL Conference on Computational Natural Language Learning CONLL 2018
Rodrigo Agerri, German Rigau
Simple Language Independent Sequence Labelling for the Annotation of Disabilities in Medical Texts (2018)
Proceedings of the Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), Diann Track, Sevilla, Spain.
Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau
Multi-lingual and Cross-lingual timeline extraction (2017)
Knowledge-Based Systems, 133, 77-89
Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola
Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… [Language technology in the “Big Data” era: search engines, translators…] (2017)
Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak
Goikoetxea J., Agirre E., Soroa A.
Single or Multiple. Combining Word Representations Independently Learned from Text and WordNet (2016)
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. pp. 2608-2614. ISBN: 978-1-57735-760-5. Phoenix (USA).