Text Analysis

Natural Language Analysis Tools are software modules that perform linguistic analysis on texts at different levels. These tools are essential components of any Natural Language Processing (NLP) software that analyzes text, and text mining software is typically built by combining basic linguistic modules into complex pipelines.
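
As a rough illustration of how basic modules can be chained into a pipeline, the following minimal Python sketch passes a document through two toy modules in sequence. The names `tokenize`, `tag_capitalized` and `run_pipeline` are hypothetical and chosen only for this example; they are not part of any HiTZ tool, and the "tagger" is a deliberately trivial stand-in for a real linguistic processor.

```python
import re
from typing import Callable, Dict, List

# A document is a plain dict that each module enriches with a new annotation layer.
Document = Dict[str, object]
Module = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    """Split the raw text into word and punctuation tokens."""
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", str(doc["text"]))
    return doc


def tag_capitalized(doc: Document) -> Document:
    """Toy stand-in for a tagger: keep tokens that start with an uppercase letter."""
    tokens: List[str] = doc["tokens"]  # type: ignore[assignment]
    doc["capitalized"] = [tok for tok in tokens if tok[:1].isupper()]
    return doc


def run_pipeline(text: str, modules: List[Module]) -> Document:
    """Pass the document through each module in order."""
    doc: Document = {"text": text}
    for module in modules:
        doc = module(doc)
    return doc


result = run_pipeline("HiTZ builds tools for Basque.", [tokenize, tag_capitalized])
print(result["tokens"])
print(result["capitalized"])
```

Each module reads the layers it needs and adds its own, so modules can be reordered or replaced as long as their inputs are present.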

The HiTZ center has a long tradition of building analysis tools for many languages, ranging from basic linguistic processors such as tokenizers, Part-of-Speech taggers and Named Entity Recognizers to complex modules that perform sentiment analysis or event detection on news feeds. It has also developed distributed architectures to deploy complex pipelines on clusters of machines, allowing the processing of the vast amount of textual information that is produced every day through diverse channels such as traditional newspapers and social media sites.

HiTZ has developed the IXA-pipes tools, a set of ready-to-use NLP tools that provide easy access to NLP technology for several languages. They offer robust and efficient linguistic annotation, with the aim of lowering the barriers to using NLP technology for research purposes and for small industrial developers and SMEs.

The Basque language is of great interest for HiTZ, and building robust and scalable processing tools for Basque is one of the strategic goals of the center. HiTZ has developed the largest set of Basque linguistic processors available to date, which enables automatic analysis and facilitates building text mining tools for Basque.

Main Researcher:

Rodrigo Agerri

Researchers:

Arantza Díaz de Ilarraza

Oier Lopez de Lacalle

Olatz Perez de Viñaspre

Demos

Coming soon.

Contracts

Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. TECNALIA.
(2024 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. MULTIVERSE.
(2025 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. ELHUYAR.
(2025 - 2028)
EFICIENCIA DE MODELOS LLM PARA INDUSTRIAS ESTRATÉGICAS (EMIE) (2025.0737)
Project to evaluate the use of LLMs in industry. RVCTI subcontracting within a strategic HAZITEK project.
(2025 - 2026)
EFICIENCIA DE MODELOS LLM PARA INDUSTRIAS ESTRATÉGICAS (EMIE) (2025.0736)
Strategic HAZITEK. EMIE. RVCTI subcontracting.
(2025 - 2025)
(2020 - 2020)
Language Technology: diagnosis of the current situation and SWOT analysis.

(2019 - 2019)
Proposals to promote Basque in the field of language technologies.

(2019 - 2019)
Cross-cutting projects to promote language technologies.

(2019 - 2019)

All HiTZ projects.

Projects

Research on Language Technology to foster the presence of Basque in the digital landscape.
(2026 - 2028)
Humanizing AI with language technology (HumanAIze)
(2025 - 2028)
Grant DeepThought (PID2024-159202OB-C21) funded by MICIU/AEI /10.13039/501100011033 and by ERDF, EU
(2025 - 2028)
Project CRITICS (PCI2025-167239-2) funded by MICIU/AEI /10.13039/501100011033 and co-funded by the European Union
(2025 - 2028)
SARETU - New LLM-based multi-agent coordination approaches for Advanced Manufacturing in the Basque Country

(2026 - 2027)
DeepKnowledge (PID2021-127777OB-C21) project funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU
(2022 - 2026)
Extending reasoning to multiple languages with cross-lingual consistency
(2026 - 2026)
LINGUATEC IA, a project to advance the digitalization of Aragonese, Basque, Catalan and Occitan through artificial intelligence
(2024 - 2026)
Project CNS2023-144375 funded by MTDFP/ and by European Union Next GenerationEU/ PRTR.
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
(2025 - 2026)
ICL4LANG: In-context learning as a new paradigm for researching scalable, high-precision language technologies adapted to the industrial needs of the Basque Country

(2023 - 2025)
Research on Language Technology to foster the presence of Basque in the digital landscape.
(2023 - 2025)
(2023 - 2025)
CLARIAH-EUS-gArA

(2024 - 2025)
Disargue (TED2021-130810B-C21) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/ PRTR
(2022 - 2025)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2024 - 2025)
Antidote (PCI2020-120717-2) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2021 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR.
(2022 - 2024)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2023 - 2024)
Trustworthy AI - Integrating Learning, Optimisation and Reasoning
(2020 - 2023)
Tools for the analysis of parliamentary discourses: polarization, subjectivity and affectivity in the post-truth era
(2020 - 2022)
European Language Equality
(2021 - 2022)
DeepReading: Mining, Understanding, and Reasoning with Multilingual Content.
(2019 - 2021)
Deep learning, Big Data and knowledge for multilingual text processing.
(2019 - 2021)
Strategic network for the promotion of language technology infrastructures in e-humanities and social sciences
(2020 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021)
CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019)
BERBAOLA: Language and Speech technologies within the Basque RIS3 strategy.
(2018 - 2019)
DL4NLP: Deep Learning applied to Natural Language Processing in support of the RIS3 areas

(2019 - 2019)
TUNER: Automatic domain adaptation for semantic processing.
(2016 - 2018)
Openminted: Sharing IXA pipes in the OpenMinTeD platform.
(2018 - 2018)

All HiTZ projects

Patents

MALTIXA

Resources

Coming soon.

Publications

Mikel Zubillaga, Naiara Perez, Oscar Sainz, German Rigau

SemBench: A Universal Semantic Framework for LLM Evaluation (2026)

Zubillaga, M., Perez, N., Sainz, O., & Rigau, G. (2026). SemBench: A Universal Semantic Framework for LLM Evaluation. arXiv [Cs.CL]. Retrieved from http://arxiv.org/abs/2603.11687

Xabier Irastortza-Urbieta, José M. García-Miguel, Marcos Garcia

Language Mixture to Develop Accurate Galician Dependency Parsers: An Exploration of Its Effects (2026)

Xabier Irastortza-Urbieta, José M. García-Miguel, and Marcos Garcia. 2026. Language Mixture to Develop Accurate Galician Dependency Parsers: An Exploration of Its Effects. In Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 58–69, Rabat, Morocco. Association for Computational Linguistics.

Iker De la Iglesia, Iakes Goenaga, Johanna Ramirez-Romero, Jose Maria Villa-Gonzalez, Josu Goikoetxea, Ander Barrena

Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments (2025)

COLING 2025

Maria Jesus Aranzabe, Igone Zabala, Izaskun Aldezabal

Goi-mailako testu akademikoak lantzeko baliabideak eta tresnak (2025)

In B. Altuna, J.Alkorta, X. Arregi, J.M. Arriola, A. Estarrona, A. Farwell, J. Fernandez de Landa, X. Goenaga, M. Iruskieta (ed.), CLARIAH-EUS:Zientzia Sozialak eta Humanitate Digitalak gaur egun, 39–56. UPV/EHUko Argitalpen Zerbitzua.

María Grandury, Javier Aula-Blasco, Júlia Falcão, Clémentine Fourrier, Miguel González, Gonzalo Martínez, Gonzalo Santamaría, Rodrigo Agerri, Nuria Aldama, Luis Chiruzzo, Javier Conde, Helena Gómez, Marta Guerrero, Guido Ivetta, Natalia López, Flor Miriam Plaza-del-Arco, María Teresa Martín-Valdivia, Helena Montoro, Carmen Muñoz, Pedro Reviriego, Leire Rosado, Alejandro Vaca, María Estrella Vallecillo-Rodríguez, Jorge Vallego, Irune Zubiaga

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America (2025)

In ACL 2025.

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read German: A statistical analysis (2025)

trans-kom 18 [1]: 330–354 ISSN 1867-4844

Elisa Sanchez-Bayona, Rodrigo Agerri

Meta4XNLI: A Cross-lingual Parallel Corpus for Metaphor Detection and Interpretation (2025)

Computational Linguistics 2025, 1–44.

Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria de-Dios-Flores, and Rodrigo Agerri

Truth Knows No Language: Evaluating Truthfulness Beyond English (2025)

Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria de-Dios-Flores, and Rodrigo Agerri. 2025. Truth Knows No Language: Evaluating Truthfulness Beyond English. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31204–31218, Vienna, Austria. Association for Computational Linguistics.

Beatriz Botella-Gil, Isabel Espinosa-Zaragoza, Alba Bonet-Jover, Margot Madina, Lucas Molino Piñar, Paloma Moreda, Itziar Gonzalez-Dios, M.Teresa Martín-Valdivia, L.Alfonso Ureña-López

Overview of CLEARS at IberLEF 2025: Challenge for Plain Language and Easy-to-Read Adaptation for Spanish texts (2025)

Overview of CLEARS at IberLEF 2025: Challenge for Plain Language and Easy-to-Read Adaptation for Spanish texts. Procesamiento del Lenguaje Natural, 75, 393-400.

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models (2025)

Elisa Sanchez-Bayona, Rodrigo Agerri

Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding (2025)

In Findings of the Association for Computational Linguistics: ACL 2025, pages 17462–17477, Vienna, Austria. Association for Computational Linguistics.

Olia Toporkov, Alan Akbik, Rodrigo Agerri

Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data (2025)

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18219–18232, Suzhou, China. Association for Computational Linguistics.

Blanca Calvo Figueras, Rodrigo Agerri

Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models (2025)

EMNLP Findings 2025

Orbegozo-Terradillos, Julen, Ainara Larrondo-Ureta, Nayla Escribano, Simón Peña Fernández, Rodrigo Agerri

BasqueParl: descifrar la huella retórica en el Parlamento Vasco con el procesamiento del lenguaje natural. (2025)

Palabra Clave 28, no. 1 (2025): e2813-e2813.

Masson, Maxime, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, and Philippe Roose

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain (2025)

Knowledge-Based Systems (2025): 114001 (Elsevier).

Jaione Bengoetxea, Itziar Gonzalez-Dios, Rodrigo Agerri

Euskara eta gaztelaniazko kontra-narratiben sorkuntza: datuen sorrera eta ebaluazioa (2025)

IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 133–140.

Amaia Solaun, Nora Aranberri

Exploring lexical diversity in Basque news: original vs. machine-translated texts (2025)

Fontes Linguae Vasconum. Number 140. December 2025, 273-315.

Ekhi Azurmendi, Xabier Arregi, Oier Lopez de Lacalle

Euskarazko lehen C1 ebaluatzaile automatikoa (2025)

IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 125–132

Elhady, Ahmed, Eneko Agirre, and Mikel Artetxe

Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation (2025)

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32174–32186, Vienna, Austria. Association for Computational Linguistics.

Igone Zabala, María Jesús Aranzabe

HARTA/TAILA: Herramienta de ayuda a la enseñanza-aprendizaje de la fraseología académica del euskera basada en un corpus de trabajos académicos (2024)

In Perez-Llantada, Carmen; Carciu, Oana; Villares, Rosana (ed.), “Book of abstracts Joint 21st AELFE LSPPC7 Conference 2023”, Mendeley Data, V1.

Olia Toporkov, Rodrigo Agerri

On the Role of Morphological Information for Contextual Lemmatization (2024)

Computational Linguistics (MIT Press).

Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)

The Twelfth International Conference on Learning Representations

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

A Preliminary Study of ChatGPT for Spanish E2R Text Adaptation (2024)

Madina, M., Gonzalez-Dios, I., & Siegel, M. (2024, May). A Preliminary Study of ChatGPT for Spanish E2R Text Adaptation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 1422-1434).

Margot Madina, Itziar Gonzalez-Dios, and Melanie Siegel.

LanguageTool as a CAT tool for Easy-to-Read in Spanish (2024)

Madina, M., Gonzalez-Dios, I., & Siegel, M. (2024, May). LanguageTool as a CAT tool for Easy-to-Read in Spanish. In Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI)@ LREC-COLING 2024 (pp. 93-101).

Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa Salazar, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata, Andrea Zaninello

MedMT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2132–2141

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Towards Reliable E2R Texts: A Proposal for Standardized Evaluation Practices (2024)

Madina, M., Gonzalez-Dios, I., & Siegel, M. (2024, July). Towards reliable E2R texts: a proposal for standardized evaluation practices. In International Conference on Computers Helping People with Special Needs (pp. 224-231). Cham: Springer Nature Switzerland.

Irune Zubiaga, Aitor Soroa, Rodrigo Agerri

A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation (2024)

Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9572–9585

Irune Zubiaga, Aitor Soroa, Rodrigo Agerri

Ixa at RefutES 2024: Leveraging Language Models for Counter Narrative Generation (2024)

IberLEF 2024 - Proceedings of the Iberian Languages Evaluation Forum, co-located with the Conference of the Spanish Society for Natural Language Processing, SEPLN 2024, vol. 3756

Josu Goikoetxea, Markel Etxabe, Marcos García, Eleonora Guzzi, Margarita Alonso

Multi-label Discourse Function Classification of Lexical Bundles in Basque and Spanish via transformer-based models (2024)

Procesamiento del Lenguaje Natural, Revista no 73, septiembre de 2024, pp. 29-41

Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering (2024)

Artificial Intelligence in Medicine Volume 155, September 2024, 102938 https://www.sciencedirect.com/science/article/pii/S0933365724001805

Blanca Calvo Figueras and Rodrigo Agerri

Critical Questions Generation: Motivation and Challenges (2024)

In Proceedings of the 28th Conference on Computational Natural Language Learning, pages 105–116, Miami, FL, USA. Association for Computational Linguistics.

Patrick Bareiß, Roman Klinger, Jeremy Barnes

English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts (2024)

In Companion Proceedings of the ACM Web Conference 2024 (WWW ’24 Companion), May 13–17, 2024, Singapore, Singapore. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3589335.3651902

Orphée De Clercq, Valentin Barriere, Jeremy Barnes, Roman Klinger, João Sedoc, and Shabnam Tafreshi

The 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis (2024)

Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis. Association for Computational Linguistics, Bangkok, Thailand, edition.

Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

BertaQA: How Much Do Language Models Know About Local Culture? (2024)

38th Conference on Neural Information Processing Systems (NeurIPS 2024) https://doi.org/10.48550/arXiv.2406.07302

Ekaterina Sviridova, Anar Yeginbergen, Ainara Estarrona, Elena Cabrio, Serena Villata, Rodrigo Agerri

CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures (2024)

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18463-18475.

Cristian Cardellino, Theo Collias, Benjamin Molinet, Erwan Hain, Wei Sun, Rodrigo Agerri, Serena Villata, Elena Cabrio

ANTIDOTE: ArgumeNtaTIon-Driven explainable artificial intelligence fOr digiTal mEdicine. (2024)

ECAI 2024 (demos)

Maxime Masson, Philippe Roose, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Rodrigo Agerri

ProxMetrics: modular proxemic similarity toolkit to generate domain-adaptable indicators from social media (2024)

Social Network Analysis and Mining, 14(1), pp.1-23

Anar Yeginbergen, Rodrigo Agerri

Crosslingual Argument Mining in the Medical Domain (2024)

Procesamiento del Lenguaje Natural, Nº. 73, págs. 296-312.

Rodrigo Agerri, Eneko Agirre, Gorka Azkune, Roberto Centeno, Anselmo Peñas, German Rigau, Álvaro Rodrigo, Aitor Soroa

DeepKnowledge: Deep Multilingual Language Model Technology for Language Understanding. (2024)

In SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations, June 19-20, 2024, A Coruña, Spain.

Rodrigo Agerri, Jeremy Barnes, Jaione Bengoetxea, Blanca Calvo Figueras, Joseba Fernandez de Landa, Iker García-Ferrero, Olia Toporkov, Irune Zubiaga

HiTZ@Disargue: Few-shot Learning and Argumentation to Detect and Fight Misinformation in Social Media. (2024)

In SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations, June 19-20, 2024, A Coruña, Spain.

Maxime Masson, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose, Rodrigo Agerri

TextBI: An Interactive Dashboard for Visualizing Multidimensional NLP Annotations in Social Media Data. (2024)

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024)

Olia Toporkov, Rodrigo Agerri

Evaluating Shortest Edit Script Methods for Contextual Lemmatization (2024)

In Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).

Joseba Fernandez de Landa

Adimen Artifizialeko metodoak gizarte ikerkuntzarako: analisi demografikoa, jarreren detekzioa eta joera politikoen identifikazioa (2024)

Aitor Ormazabal

Towards general attribute controllability in NLP models (2024)

Oscar Sainz, Oier Lopez de Lacalle, Eneko Agirre, German Rigau

What do Language Models know about word senses? Zero-Shot WSD with Language Models and Domain Inventories (2023)

In Proceedings of the 12th Global Wordnet Conference, pages 331–342, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.

Ainara Estarrona, Izaskun Etxeberria, Manuel Padilla-Moyano, Ander Soraluze

Measuring language distance for historical texts in Basque (2023)

Procesamiento del Lenguaje Natural, Revista no 70, marzo del 2023, pp. 53-61

Nayla Escribano, German Rigau, Rodrigo Agerri

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Nayla Escribano, German Rigau, Rodrigo Agerri, A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods, Knowledge-Based Systems, Volume 273, 2023, 110612, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2023.110612. (https://www.sciencedirect.com/science/article/pii/S0950705123003623) Abstract: Detecting and normalizing temporal expressions is an essential step for many NLP tasks. While a variety of methods have been proposed for detection, best normalization approaches rely on hand-crafted rules. Furthermore, most of them have been designed only for English. In this paper we present a modular multilingual temporal processing system combining a fine-tuned Masked Language Model for detection, and a grammar-based normalizer. We experiment in Spanish and English and compare with HeidelTime, the state-of-the-art in multilingual temporal processing. We obtain best results in gold timex normalization, timex detection and type recognition, and competitive performance in the combined TempEval-3 relaxed value metric. A detailed error analysis shows that detecting only those timexes for which it is feasible to provide a normalization is highly beneficial in this last metric. This raises the question of which is the best strategy for timex processing, namely, leaving undetected those timexes for which is not easy to provide normalization rules or aiming for high coverage. Keywords: Temporal processing; Multilingualism; Sequence labeling; Grammar-based approaches; Deep learning; Natural language processing

Itziar Aduriz, Manex Agirrezabal, Eneko Agirre, Iñaki Alegria, Xabier Arregi, Jose Mari Arriola, Xabier Artola, Arantza Díaz de Ilarraza, Ainara Estarrona, Izaskun Etxeberria, Nerea Ezeiza, Kepa Sarasola

Morfologia Konputazionala Euskaraz, 35 urte (2023)

Lindemann, D. (arg.). Miren Azkarateri esker onez, 15-30. UPV/EHU Argitalpen zerbitzua. Bilbo.

Rodrigo Agerri, Eneko Agirre

Lessons learned from the evaluation of Spanish Language Models (2023)

Procesamiento del Lenguaje Natural (70), pp 157-170

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Scaling Laws for BERT in Low-Resource Settings (2023)

Findings of the Association for Computational Linguistics: ACL 2023

Masson, M., Roose, P., Sallaberry, C., Agerri, R., Bessagnet, MN., Lacayrelle, A.L.P

APs: A Proxemic Framework for Social Media Interactions Modeling and Analysis (2023)

In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham.

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

ACM Computing Surveys. 27 June 2023

Iker De la Iglesia, María Vivó, Paula Chocrón, Gabriel de Maeztu, Koldo Gojenola, Aitziber Atutxa

Overview of ClinAIS at IberLEF 2023: Automatic Identification of Sections in Clinical Documents in Spanish (2023)

Procesamiento del Lenguaje Natural, Revista nº 71, septiembre de 2023

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language Resources and Tools for three European Languages (2023)

Madina, M., Gonzalez-Dios, I., & Siegel, M. (2023, July). Easy-to-Read Language Resources and Tools for three European Languages. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments (pp. 693-699).

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena (2023)

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel (2023) Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena. V. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma: Giza Zientziak eta Artea, 35-42.

Margot Madina, Itziar Gonzalez-Dios and Melanie Siegel

Easy-to-Read in Germany: a Survey on its Current State and Available Resources (2023)

Margot Madina, Itziar Gonzalez-Dios and Melanie Siegel (2023) Easy-to-Read in Germany: a Survey on its Current State and Available Resources. To appear in proceedings of 10th Language & Technology Conference

Jeremy Barnes

Sentiment and Emotion Classification in Low-resource Settings (2023)

Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

Procesamiento del Lenguaje Natural. v. 71, p. 165-177, sep. 2023

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (2023)

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Effective Correction Methods Using WordNet Meronymy Relations (2023)

Álvez, J., Gonzalez-Dios, I., & Rigau, G. (2023, January). Towards Effective Correction Methods Using WordNet Meronymy Relations. In Proceedings of the 12th Global Wordnet Conference (pp. 31-40).

Roberto Centeno, Rodrigo Agerri

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

Roberto Centeno and Rodrigo Agerri (2023). Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation. In Proceedings of the Workshop on NLP applied to Misinformation, co-located with the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023).

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova (2023). HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine. In SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing.

Joseba Fernandez de Landa, Rodrigo Agerri

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. (2023)

Joseba Fernandez de Landa, Rodrigo Agerri (2023). HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 2023.

Blanca Calvo Figueras, Irene Baucells, Tommaso Caselli

Dynamic Stance: Modeling Discussions by Labeling the Interactions (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal

Direct Parsing to Sentiment Graphs (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478

Nora Hollenstein, Itziar Gonzalez-Dios, Lisa Beinborn, and Lena Jäger

Patterns of text readability in human and predicted eye movements (2022)

Nora Hollenstein, Itziar Gonzalez-Dios, Lisa Beinborn, and Lena Jäger. 2022. Patterns of Text Readability in Human and Predicted Eye Movements. In Proceedings of the Workshop on Cognitive Aspects of the Lexicon, pages 1–15, Taipei, Taiwan. Association for Computational Linguistics.

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Soroa, A., Gonzalez-Dios, I,... & Manica, M.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022)

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., ... & Manica, M. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.

Itziar Glez Dios, Aitor Soroa, Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco de Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, et al.

The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset (2022)

2022. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

Oscar Cumbicus-Pineda, Iker Gutiérrez-Fandiño, Itziar Gonzalez-Dios, Aitor Soroa

Noisy Channel for Automatic Text Simplification (2022)

Cumbicus-Pineda, O. M., Gutiérrez-Fandiño, I., Gonzalez-Dios, I., & Soroa, A. (2022). Noisy Channel for Automatic Text Simplification. arXiv preprint arXiv:2211.03152.

Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

Does Corpus Quality Really Matter for Low-Resource Languages? (2022)

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7383–7390.

Iker Garcia-Ferrero, Rodrigo Agerri, German Rigau

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)

Findings of the Association for Computational Linguistics: EMNLP 2022

Maxime Masson, Christian Sallaberry, Rodrigo Agerri, Marie-Noelle Bessagnet, Philippe Roose, Annig Le Parc Lacayrelle

A Domain-Independent Method for Thematic Dataset Building from Social Media: The Case of Tourism on Twitter (2022)

In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham.

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri and Aitor Soroa

BasqueGLUE: A Natural Language Understanding Benchmark for Basque (2022)

LREC 2022

Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

SemEval 2022 Task 10: Structured Sentiment Analysis (2022)

In SemEval 2022

Blanca Calvo Figueras, Montse Cuadros, Rodrigo Agerri

A Semantics-Aware Approach to Automated Claim Verification (2022)

In Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER), pages 37–48, Dublin, Ireland. Association for Computational Linguistics

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Pérez-de-Viñaspre, Rodrigo Agerri

Euskararen erabilera Eusko Legebiltzarreko debateetan (2012-2020) (2022)

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Pérez-de-Viñaspre, Rodrigo Agerri (2022). Euskararen erabilera Eusko Legebiltzarreko debateetan (2012-2020). In Mediatika, 19, 163-178.

Amaia Aguirregoitia Martinez, Kepa Bengoetxea Kortazar, Itziar Gonzalez-Dios

Journal of Immersion and Content-Based Language Education, Volume 9, Issue 1, May 2021, p. 4 - 30

Ionut-Teodor Sorodoc, Madhumita Sushil, Ece Takmaz, Eneko Agirre (Editors)

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop (2021)

In conjunction with EACL. Association for Computational Linguistics

Elena Zotova, Rodrigo Agerri, German Rigau

Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)

Expert Systems with Applications, 170 (2021).

Joseba Fernandez de Landa, Rodrigo Agerri

Euskarazko on-line artikuluetan aipatutako izendun entitate nabarmenen identifikazioa denbora errealean (2021)

Ekaia

Joseba Fernandez de Landa, Iker García, Ander Salaberria, Jon Ander Campos

Twitterreko Euskal Komunitatearen Eduki Azterketa Pandemia Garaian (2021)

IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura

Oscar Sainz, German Rigau

Ask2Transformers: Zero-Shot Domain labelling with Pretrained Language Models (2021)

Proceedings of the 11th Global WordNet Conference pages 44–52, University of South Africa (UNISA). Global Wordnet Association.

Iakes Goenaga, Xabier Lahuerta, Aitziber Atutxa, Koldo Gojenola

A Section Identification Tool: towards HL7 CDA/CCR Standardization in Spanish Discharge Summaries (2021)

Journal of Biomedical Informatics

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Language and Technology in Wales: Volume I (2021)

Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3

Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)

Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6

Oscar Cumbicus, Itziar Gonzalez-Dios, Aitor Soroa

A Syntax-Aware Edit-based System for Text Simplification (2021)

Cumbicus, Oscar, Gonzalez-Dios, Itziar and Soroa, Aitor (2021). A Syntax-Aware Edit-based System for Text Simplification. In: Proceedings of Recent Advances in Natural Language Processing, pages 329–339. https://aclanthology.org/2021.ranlp-1.38/

Kepa Bengoetxea, Itziar Gonzalez-Dios

MultiAzterTest: a Multilingual Analyzer on Multiple Levels of Language for Readability Assessment (2021)

arXiv:2109.04870

Kepa Bengoetxea and Itziar Gonzalez-Dios

MultiAzterTest@Exist-IberLEF 2021: Linguistically Motivated Sexism Identification (2021)

Kepa Bengoetxea and Itziar Gonzalez-Dios (2021). MultiAzterTest@Exist-IberLEF 2021: Linguistically Motivated Sexism Identification. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), pp. 449-457. http://ceur-ws.org/Vol-2943/

Itziar Gonzalez-Dios, Kepa Bengoetxea

MultiAzterTest@VaxxStance-IberLEF 2021: Identifying Stances with Language Models and Linguistic Features (2021)

Itziar Gonzalez-Dios and Kepa Bengoetxea (2021) Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021). pp. 192-201. http://ceur-ws.org/Vol-2943/

Oscar M. Cumbicus-Pineda, Itziar Gonzalez-Dios, Aitor Soroa

Linguistic Capabilities for a Checklist-based evaluation in Automatic Text Simplification (2021)

Oscar M. Cumbicus-Pineda, Itziar Gonzalez-Dios, Aitor Soroa. (2021). Linguistic Capabilities for a Checklist-based evaluation in Automatic Text Simplification. Proceedings of the First Workshop on Current Trends in Text Simplification (CTTS 2021) co-located with the 37th Conference of the Spanish Society for Natural Language Processing (SEPLN2021) Online (initially located in Málaga, Spain), September 21st, 2021. Edited by: Horacio Saggion, Sanja Štajner, Daniel Ferrés, Kim Cheng Sheang, pages 70-83. ISSN 1613-0073

Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernández de Landa, Álvaro Rodrigo

VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)

Procesamiento del Lenguaje Natural, 67, pp 173-181

Amir Zeldes, Yang Janet Liu, Mikel Iruskieta, Philippe Muller, Chloé Braud, Sonia Badene

Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021) (2021)

Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021). URL: https://aclanthology.org/volumes/2021.disrpt-1/

Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Multilingual Counter Narrative Type Classification (2021)

Proceedings of Argument Mining 2021

Beatriz Pereda-Goikoetxea, María Isabel Elorza-Puyadena, Mikel Lersundi-Ayestaran, Joseba Xabier Huitzi-Egilegor, María José Uranga-Iturrioz, Blanca Marín-Fernández

Emakumeen emozio-zurrunbiloa erditzean (2021)

Ekaia, 2021, 41, 31-48

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: European Clinical Case Corpus (2021)

Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf

Eneko Agirre

Cross-Lingual Word Embeddings (Book Review) (2020)

Computational Linguistics 46 (1), 245-248. (https://doi.org/10.1162/COLI_r_00372)

Jose R. Pichel, Pablo Gamallo, Iñaki Alegria, Marco Neves

A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity (2020)

Journal of Quantitative Linguistics. DOI 10.1080/09296174.2020.1732177

All HiTZ publications


Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. - TECNALIA
(2024 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. MULTIVERSE.
(2025 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. ELHUYAR.
(2025 - 2028)
EFICIENCIA DE MODELOS LLM PARA INDUSTRIAS ESTRATÉGICAS (EMIE) (2025.0737)
Project to evaluate the use of LLMs in industry. RVCTI subcontract within a strategic HAZITEK project.
(2025 - 2026)
EFICIENCIA DE MODELOS LLM PARA INDUSTRIAS ESTRATÉGICAS (EMIE) (2025.0736)
Strategic HAZITEK project. EMIE. RVCTI subcontract.
(2025 - 2025)
(2020 - 2020)
Language Technology: diagnosis of the current situation and SWOT analysis.

(2019 - 2019)
Proposals to promote Basque in the field of language technologies.

(2019 - 2019)
Cross-cutting projects to promote language technologies.

(2019 - 2019)

All HiTZ projects.

Research on Language Technology to foster the presence of Basque in the digital landscape.
(2026 - 2028)
Humanizing AI with language technology (HumanAIze)
(2025 - 2028)
Grant DeepThought (PID2024-159202OB-C21) funded by MICIU/AEI /10.13039/501100011033 and by ERDF, EU
(2025 - 2028)
Project CRITICS (PCI2025-167239-2) funded by MICIU/AEI /10.13039/501100011033 and co-funded by the European Union
(2025 - 2028)
SARETU - New LLM-based multi-agent coordination approaches for Advanced Manufacturing in the Basque Country

(2026 - 2027)
DeepKnowledge (PID2021-127777OB-C21) project funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU
(2022 - 2026)
Extending reasoning to multiple languages with cross-lingual consistency
(2026 - 2026)
LINGUATEC IA, a project to advance the digitalization of Aragonese, Basque, Catalan and Occitan through artificial intelligence
(2024 - 2026)
Project CNS2023-144375 funded by MTDFP/ and by European Union Next GenerationEU/ PRTR.
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
(2025 - 2026)
ICL4LANG: In-context learning as a new paradigm for researching scalable, high-precision language technologies adapted to the industrial needs of the Basque Country

(2023 - 2025)
Research on Language Technology to foster the presence of Basque in the digital landscape.
(2023 - 2025)
(2023 - 2025)
CLARIAH-EUS-gArA

(2024 - 2025)
Disargue (TED2021-130810B-C21) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/ PRTR
(2022 - 2025)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2024 - 2025)
Antidote (PCI2020-120717-2) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2021 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGeneration EU/PRTR.
(2022 - 2024)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2023 - 2024)
Trustworthy AI - Integrating Learning, Optimisation and Reasoning
(2020 - 2023)
Tools for the analysis of parliamentary discourses: polarization, subjectivity and affectivity in the post-truth era
(2020 - 2022)
European Language Equality
(2021 - 2022)
DeepReading: Mining, Understanding, and Reasoning with Multilingual Content.
(2019 - 2021)
Deep learning, Big Data and knowledge for multilingual text processing.
(2019 - 2021)
Strategic network for the promotion of language technology infrastructures in e-humanities and social sciences
(2020 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021)
CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019)
BERBAOLA: Language and Speech technologies within the Basque RIS3 strategy.
(2018 - 2019)
DL4NLP: Deep Learning applied to Natural Language Processing in support of the RIS3 areas

(2019 - 2019)
TUNER: Automatic domain adaptation for semantic processing.
(2016 - 2018)
Openminted: Sharing IXA pipes in the OpenMinTeD platform.
(2018 - 2018)

All HiTZ projects

MALTIXA

Coming soon.

Mikel Zubillaga, Naiara Perez, Oscar Sainz, German Rigau

SemBench: A Universal Semantic Framework for LLM Evaluation (2026)

Zubillaga, M., Perez, N., Sainz, O., & Rigau, G. (2026). SemBench: A Universal Semantic Framework for LLM Evaluation. arXiv [Cs.CL]. Retrieved from http://arxiv.org/abs/2603.11687

Xabier Irastortza-Urbieta, José M. García-Miguel, Marcos Garcia

Language Mixture to Develop Accurate Galician Dependency Parsers: An Exploration of Its Effects (2026)

Iker De la Iglesia, Iakes Goenaga, Johanna Ramirez-Romero, Jose Maria Villa-Gonzalez, Josu Goikoetxea, Ander Barrena

Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments (2025)

COLING 2025

Maria Jesus Aranzabe, Igone Zabala, Izaskun Aldezabal

Goi-mailako testu akademikoak lantzeko baliabideak eta tresnak (2025)

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America (2025)

In ACL 2025.

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read German: A statistical analysis (2025)

trans-kom 18 [1]: 330–354 ISSN 1867-4844

Elisa Sanchez-Bayona, Rodrigo Agerri

Meta4XNLI: A Cross-lingual Parallel Corpus for Metaphor Detection and Interpretation (2025)

Computational Linguistics 2025, 1–44.

Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria de-Dios-Flores, and Rodrigo Agerri

Truth Knows No Language: Evaluating Truthfulness Beyond English (2025)

Beatriz Botella-Gil, Isabel Espinosa-Zaragoza, Alba Bonet-Jover, Margot Madina, Lucas Molino Piñar, Paloma Moreda, Itziar Gonzalez-Dios, M.Teresa Martín-Valdivia, L.Alfonso Ureña-López

Overview of CLEARS at IberLEF 2025: Challenge for Plain Language and Easy-to-Read Adaptation for Spanish texts (2025)

Overview of CLEARS at IberLEF 2025: Challenge for Plain Language and Easy-to-Read Adaptation for Spanish texts. Procesamiento del Lenguaje Natural, 75, 393-400.

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models (2025)

Elisa Sanchez-Bayona, Rodrigo Agerri

Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding (2025)

In Findings of the Association for Computational Linguistics: ACL 2025, pages 17462–17477, Vienna, Austria. Association for Computational Linguistics.

Olia Toporkov, Alan Akbik, Rodrigo Agerri

Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data (2025)

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18219–18232, Suzhou, China. Association for Computational Linguistics.

Blanca Calvo Figueras, Rodrigo Agerri

Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models (2025)

EMNLP Findings 2025

Orbegozo-Terradillos, Julen, Ainara Larrondo-Ureta, Nayla Escribano, Simón Peña Fernández, Rodrigo Agerri

BasqueParl: descifrar la huella retórica en el Parlamento Vasco con el procesamiento del lenguaje natural. (2025)

Palabra Clave 28, no. 1 (2025): e2813-e2813.

Masson, Maxime, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, and Philippe Roose

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain (2025)

Knowledge-Based Systems (2025): 114001 (Elsevier).

Jaione Bengoetxea, Itziar Gonzalez-Dios, Rodrigo Agerri

Euskara eta gaztelaniazko kontra-narratiben sorkuntza: datuen sorrera eta ebaluazioa (2025)

IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 133–140.

Amaia Solaun, Nora Aranberri

Exploring lexical diversity in Basque news: original vs. machine-translated texts (2025)

Fontes Linguae Vasconum. Number 140. December 2025, 273-315.

Ekhi Azurmendi, Xabier Arregi, Oier Lopez de Lacalle

Euskarazko lehen C1 ebaluatzaile automatikoa (2025)

IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 125–132

Elhady, Ahmed, Eneko Agirre, and Mikel Artetxe

Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation (2025)

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32174–32186, Vienna, Austria. Association for Computational Linguistics.

Igone Zabala, María Jesús Aranzabe

HARTA/TAILA: Herramienta de ayuda a la enseñanza-aprendizaje de la fraseología académica del euskera basada en un corpus de trabajos académicos (2024)

In Perez-Llantada, Carmen; Carciu, Oana; Villares, Rosana (ed.), “Book of abstracts Joint 21st AELFE LSPPC7 Conference 2023”, Mendeley Data, V1.

Olia Toporkov, Rodrigo Agerri

On the Role of Morphological Information for Contextual Lemmatization (2024)

Computational Linguistics (MIT Press).

Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)

The Twelfth International Conference on Learning Representations

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

A Preliminary Study of ChatGPT for Spanish E2R Text Adaptation (2024)

Margot Madina, Itziar Gonzalez-Dios, and Melanie Siegel.

LanguageTool as a CAT tool for Easy-to-Read in Spanish (2024)

MedMT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2132–2141

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Towards Reliable E2R Texts: A Proposal for Standardized Evaluation Practices (2024)

Irune Zubiaga, Aitor Soroa, Rodrigo Agerri

A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation (2024)

Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9572–9585

Irune Zubiaga, Aitor Soroa, Rodrigo Agerri

Ixa at RefutES 2024: Leveraging Language Models for Counter Narrative Generation (2024)

IberLEF 2024 - Proceedings of the Iberian Languages Evaluation Forum, co-located with the Conference of the Spanish Society for Natural Language Processing, SEPLN 2024, vol. 3756

Josu Goikoetxea, Markel Etxabe, Marcos García, Eleonora Guzzi, Margarita Alonso

Multi-label Discourse Function Classification of Lexical Bundles in Basque and Spanish via transformer-based models (2024)

Procesamiento del Lenguaje Natural, no. 73, September 2024, pp. 29-41

Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering (2024)

Artificial Intelligence in Medicine Volume 155, September 2024, 102938 https://www.sciencedirect.com/science/article/pii/S0933365724001805

Blanca Calvo Figueras and Rodrigo Agerri

Critical Questions Generation: Motivation and Challenges (2024)

In Proceedings of the 28th Conference on Computational Natural Language Learning, pages 105–116, Miami, FL, USA. Association for Computational Linguistics.

Patrick Bareiß, Roman Klinger, Jeremy Barnes

English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts (2024)

In Companion Proceedings of the ACM Web Conference 2024 (WWW ’24 Companion), May 13–17, 2024, Singapore, Singapore. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3589335.3651902

Orphée De Clercq, Valentin Barriere, Jeremy Barnes, Roman Klinger, João Sedoc, and Shabnam Tafreshi

The 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis (2024)

Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis. Association for Computational Linguistics, Bangkok, Thailand.

Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

BertaQA: How Much Do Language Models Know About Local Culture? (2024)

38th Conference on Neural Information Processing Systems (NeurIPS 2024) https://doi.org/10.48550/arXiv.2406.07302

Ekaterina Sviridova, Anar Yeginbergen, Ainara Estarrona, Elena Cabrio, Serena Villata, Rodrigo Agerri

CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures (2024)

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18463-18475.

Cristian Cardellino, Theo Collias, Benjamin Molinet, Erwan Hain, Wei Sun, Rodrigo Agerri, Serena Villata, Elena Cabrio

ANTIDOTE: ArgumeNtaTIon-Driven explainable artificial intelligence fOr digiTal mEdicine. (2024)

ECAI 2024 (demos)

Maxime Masson, Philippe Roose, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Rodrigo Agerri

ProxMetrics: modular proxemic similarity toolkit to generate domain-adaptable indicators from social media (2024)

Social Network Analysis and Mining, 14(1), pp.1-23

Anar Yeginbergen, Rodrigo Agerri

Crosslingual Argument Mining in the Medical Domain (2024)

Procesamiento del Lenguaje Natural, no. 73, pp. 296-312.

Rodrigo Agerri, Eneko Agirre, Gorka Azkune, Roberto Centeno, Anselmo Peñas, German Rigau, Álvaro Rodrigo, Aitor Soroa

DeepKnowledge: Deep Multilingual Language Model Technology for Language Understanding. (2024)

In SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations, June 19-20, 2024, A Coruña, Spain.

Rodrigo Agerri, Jeremy Barnes, Jaione Bengoetxea, Blanca Calvo Figueras, Joseba Fernandez de Landa, Iker García-Ferrero, Olia Toporkov, Irune Zubiaga

HiTZ@Disargue: Few-shot Learning and Argumentation to Detect and Fight Misinformation in Social Media. (2024)

In SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations, June 19-20, 2024, A Coruña, Spain.

Maxime Masson, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose, Rodrigo Agerri

TextBI: An Interactive Dashboard for Visualizing Multidimensional NLP Annotations in Social Media Data. (2024)

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024)

Olia Toporkov, Rodrigo Agerri

Evaluating Shortest Edit Script Methods for Contextual Lemmatization (2024)

In Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).

Joseba Fernandez de Landa

Adimen Artifizialeko metodoak gizarte ikerkuntzarako: analisi demografikoa, jarreren detekzioa eta joera politikoen identifikazioa (2024)

Aitor Ormazabal

Towards general attribute controllability in NLP models (2024)

Oscar Sainz, Oier Lopez de Lacalle, Eneko Agirre, German Rigau

What do Language Models know about word senses? Zero-Shot WSD with Language Models and Domain Inventories (2023)

In Proceedings of the 12th Global Wordnet Conference, pages 331–342, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.

Ainara Estarrona, Izaskun Etxeberria, Manuel Padilla-Moyano, Ander Soraluze

Measuring language distance for historical texts in Basque (2023)

Procesamiento del Lenguaje Natural, no. 70, March 2023, pp. 53-61

Nayla Escribano, German Rigau, Rodrigo Agerri

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Morfologia Konputazionala Euskaraz, 35 urte (2023)

Lindemann, D. (arg.). Miren Azkarateri esker onez, 15-30. UPV/EHU Argitalpen zerbitzua. Bilbo.

Rodrigo Agerri, Eneko Agirre

Lessons learned from the evaluation of Spanish Language Models (2023)

Procesamiento del Lenguaje Natural (70), pp 157-170

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Scaling Laws for BERT in Low-Resource Settings (2023)

Findings of the Association for Computational Linguistics: ACL 2023

Masson, M., Roose, P., Sallaberry, C., Agerri, R., Bessagnet, MN., Lacayrelle, A.L.P

APs: A Proxemic Framework for Social Media Interactions Modeling and Analysis (2023)

In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham.

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

ACM Computing Surveys. 27 June 2023

Iker De la Iglesia, María Vivó, Paula Chocrón, Gabriel de Maeztu, Koldo Gojenola, Aitziber Atutxa

Overview of ClinAIS at IberLEF 2023: Automatic Identification of Sections in Clinical Documents in Spanish (2023)

Procesamiento del Lenguaje Natural, no. 71, September 2023

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language Resources and Tools for three European Languages (2023)

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena (2023)

Margot Madina, Itziar Gonzalez-Dios and Melanie Siegel

Easy-to-Read in Germany: a Survey on its Current State and Available Resources (2023)

Jeremy Barnes

Sentiment and Emotion Classification in Low-resource Settings (2023)

Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

Procesamiento del Lenguaje Natural. v. 71, p. 165-177, sep. 2023

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (2023)

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Effective Correction Methods Using WordNet Meronymy Relations (2023)

Álvez, J., Gonzalez-Dios, I., & Rigau, G. (2023, January). Towards Effective Correction Methods Using WordNet Meronymy Relations. In Proceedings of the 12th Global Wordnet Conference (pp. 31-40).

Roberto Centeno, Rodrigo Agerri

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

Joseba Fernandez de Landa, Rodrigo Agerri

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. (2023)

Blanca Calvo Figueras, Irene Baucells, Tommaso Caselli

Dynamic Stance: Modeling Discussions by Labeling the Interactions (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal

Direct Parsing to Sentiment Graphs (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478

Nora Hollenstein, Itziar Gonzalez-Dios, Lisa Beinborn, and Lena Jäger

Patterns of text readability in human and predicted eye movements (2022)

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Soroa, A., Gonzalez-Dios, I., ... & Manica, M.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022)

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., ... & Manica, M. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.

The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset (2022)

2022. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

Oscar Cumbicus-Pineda, Iker Gutiérrez-Fandiño, Itziar Gonzalez-Dios, Aitor Soroa

Noisy Channel for Automatic Text Simplification (2022)

Cumbicus-Pineda, O. M., Gutiérrez-Fandiño, I., Gonzalez-Dios, I., & Soroa, A. (2022). Noisy Channel for Automatic Text Simplification. arXiv preprint arXiv:2211.03152.


Coming soon.

Languages

You are here

Text Analysis

Text_analysis_tabs

MALTIXA

SemBench: A Universal Semantic Framework for LLM Evaluation (2026)

Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments (2025)

Goi-mailako testu akademikoak lantzeko baliabideak eta tresnak (2025)

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America (2025)

Meta4XNLI: A Cross-lingual Parallel Corpus for Metaphor Detection and Interpretation (2025)

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models (2025)

Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding (2025)

BasqueParl: descifrar la huella retórica en el Parlamento Vasco con el procesamiento del lenguaje natural (2025)

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain (2025)

Euskara eta gaztelaniazko kontra-narratiben sorkuntza: datuen sorrera eta ebaluazioa (2025)

Exploring lexical diversity in Basque news: original vs. machine-translated texts (2025)

Euskarazko lehen C1 ebaluatzaile automatikoa (2025)

Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation (2025)

HARTA/TAILA: Herramienta de ayuda a la enseñanza-aprendizaje de la fraseología académica del euskera basada en un corpus de trabajos académicos (2024)

On the Role of Morphological Information for Contextual Lemmatization (2024)

MedMT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain (2024)

Towards Reliable E2R Texts: A Proposal for Standardized Evaluation Practices (2024)

A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation (2024)

Ixa at RefutES 2024: Leveraging Language Models for Counter Narrative Generation (2024)

The 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis (2024)

ANTIDOTE: ArgumeNtaTIon-Driven explainable artificial intelligence fOr digiTal mEdicine (2024)

ProxMetrics: modular proxemic similarity toolkit to generate domain-adaptable indicators from social media (2024)

Crosslingual Argument Mining in the Medical Domain (2024)

DeepKnowledge: Deep Multilingual Language Model Technology for Language Understanding (2024)

HiTZ@Disargue: Few-shot Learning and Argumentation to Detect and Fight Misinformation in Social Media (2024)

TextBI: An Interactive Dashboard for Visualizing Multidimensional NLP Annotations in Social Media Data (2024)

Evaluating Shortest Edit Script Methods for Contextual Lemmatization (2024)

Measuring language distance for historical texts in Basque (2023)

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Morfologia Konputazionala Euskaraz, 35 urte (2023)

Lessons learned from the evaluation of Spanish Language Models (2023)

Scaling Laws for BERT in Low-Resource Settings (2023)

APs: A Proxemic Framework for Social Media Interactions Modeling and Analysis (2023)

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

Overview of ClinAIS at IberLEF 2023: Automatic Identification of Sections in Clinical Documents in Spanish (2023)

Easy-to-Read in Germany: a Survey on its Current State and Available Resources (2023)

Sentiment and Emotion Classification in Low-resource Settings (2023)

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (2023)

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection (2023)

Does Corpus Quality Really Matter for Low-Resource Languages? (2022)

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)

A Domain-Independent Method for Thematic Dataset Building from Social Media: The Case of Tourism on Twitter (2022)

BasqueGLUE: A Natural Language Understanding Benchmark for Basque (2022)

SemEval 2022 Task 10: Structured Sentiment Analysis (2022)

A Semantics-Aware Approach to Automated Claim Verification (2022)

Euskararen erabilera Eusko Legebiltzarreko debateetan (2012-2020) (2022)

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop (2021)

Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)

Euskarazko on-line artikuluetan aipatutako izendun entitate nabarmenen identifikazioa denbora errealean (2021)

A Section Identification Tool: towards HL7 CDA/CCR Standardization in Spanish Discharge Summaries (2021)

Language and Technology in Wales: Volume I (2021)

Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)

VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)

Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021) (2021)

Multilingual Counter Narrative Type Classification (2021)

Cross-Lingual Word Embeddings (Book Review) (2020)

A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity (2020)
