publications
Angelina McMillan-Major, Francesco De Toni, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Daniel van Strien, Zeerak Talat, Yacine Jernite
Documenting Geographically and Contextually Diverse Language Data Sources (2024)
@article{mcmillan2024, author = {McMillan-Major, Angelina and De Toni, Francesco and Alyafeai, Zaid and Biderman, Stella and Chen, Kimbo and Dupont, G\'{e}rard and Elsahar, Hady and Emezue, Chris and Fikri Aji, Alham and Ili\'{c}, Suzana and Khamis, Nurulaqilla and Leong, Colin and Masoud, Maraim and Soroa, Aitor and Ortiz Suarez, Pedro and van Strien, Daniel and Talat, Zeerak and Jernite, Yacine}, title = "{Documenting Geographically and Contextually Diverse Language Data Sources}", journal = {Northern European Journal of Language Technology (NEJLT)}, volume = {10}, number = {1}, year = {2024}, issn = {2000-1533}, doi = {10.3384/nejlt.2000-1533.2024.5217}, url = {https://doi.org/10.3384/nejlt.2000-1533.2024.5217} }
Xabier Larrayoz, Arantza Casillas, Maite Oronoz, Alicia Pérez
Mental Disorder Detection in Spanish: Hands on Skewed Class Distribution to Leverage Training (2024)
Accepted. MentalRiskES at IberLEF 2023: Early Detection of Mental Disorders Risk in Spanish
Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea
EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)
Procesamiento del Lenguaje Natural, no. 73, September 2024, pp. 125-137
Unai Atutxa Barrenetxea, Iker de la Iglesia, Mikel Iruskieta
IGARRITZ: el predictor de palabras para el euskera basado en la inteligencia artificial y su evaluación en el entorno escolar (2024)
III Congreso Internacional de Nuevas Tecnologías y Tendencias en la Educación, pp. 181-194. Dykinson.
Mikel Iruskieta, Iker de la Iglesia, Unai Atutxa, Lierni Ortiz
IGARRITZ: euskarazko testu iragarpenerako web ingurune egokitua (2024)
Ekaia
Nuria Lebeña, Alicia Pérez, Arantza Casillas
Quantifying decision support level of explainable automatic classification of diagnoses in Spanish medical records (2024)
Nuria Lebeña, Alicia Pérez, Arantza Casillas, Quantifying decision support level of explainable automatic classification of diagnoses in Spanish medical records, Computers in Biology and Medicine, Volume 182, 2024, 109127, ISSN 0010-4825, https://doi.org/10.1016/j.compbiomed.2024.109127. (https://www.sciencedirect.com/science/article/pii/S0010482524012125)
Iñigo Alonso, Maite Oronoz, Rodrigo Agerri
MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering (2024)
Artificial Intelligence in Medicine, 2024.
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel
Towards Reliable E2R Texts: A Proposal for Standardized Evaluation Practices (2024)
Madina, M., Gonzalez-Dios, I., & Siegel, M. (2024, July). Towards reliable E2R texts: a proposal for standardized evaluation practices. In International Conference on Computers Helping People with Special Needs (pp. 224-231). Cham: Springer Nature Switzerland.
Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero
A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models (2024)
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024), pages 35–44
Iñigo Alonso, Eneko Agirre, Mirella Lapata
PixT3: Pixel-based Table To Text generation (2024)
Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)
Ahmed Elhady, Khaled Elsayed, Eneko Agirre, and Mikel Artetxe
Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 755–761, Mexico City, Mexico. Association for Computational Linguistics.
Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre
Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy
Eneko Agirre, Itziar Aldabe, Xabier Arregi, Mikel Artetxe, Unai Atutxa, Ekhi Azurmendi, Iker De la Iglesia, Julen Etxaniz, Victor García-Romillo, Inma Hernaez-Rioja, Asier Herranz, Mikel Iruskieta, Oier López de Lacalle, Eva Navas, Paula Ontalvilla, Aitor Ormazabal, Naiara Perez, German Rigau, Oscar Sainz, Jon Sanchez, Ibon Saratxaga, Aitor Soroa, Christoforos Souganidis, Jon Vadillo and Aimar Zabala
IKER-GAITU: research on language technology for Basque and other low-resource languages (2024)
-
Olia Toporkov, Rodrigo Agerri
On the Role of Morphological Information for Contextual Lemmatization (2024)
Computational Linguistics (MIT Press).
Adrián Núñez-Marcos, Ignacio Arganda-Carreras
Transformer-based fall detection in videos (2024)
Núñez-Marcos, A., & Arganda-Carreras, I. (2024). Transformer-based fall detection in videos. Engineering Applications of Artificial Intelligence, 132, 107937.
Júlia Falcão, Claudia Borg, Nora Aranberri, and Kurt Abela
COMET for Low-Resource Machine Translation Evaluation: A Case Study of English-Maltese and Spanish-Basque (2024)
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3553–3565, Torino, Italia. ELRA and ICCL.
Nora Aranberri
Analysis of the Annotations from a Crowd MT Evaluation Initiative: Case Study for the Spanish-Basque Pair (2024)
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 548–559 June 24-27.
Maite Oronoz, Sara Gracia, Jose Mari González, Alicia Pérez
Suizidio-zantzuak sare sozialetan: ingelesez eta gaztelaniaz hizkuntza-ezaugarriak berdinak al dira? (2024)
EKAIA: Zientzia eta Teknologia aldizkaria, 2024, issue XX.
Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri
Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)
Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand
Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
Do Multilingual Language Models Think Better in English? (2024)
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 550–564, Mexico City, Mexico. Association for Computational Linguistics.
Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa
XNLIeu: a dataset for cross-lingual NLI in Basque (2024)
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4177–4188, Mexico City, Mexico. Association for Computational Linguistics.
Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri
Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2132–2141
Julen Etxaniz, Oscar Sainz, Naiara Perez Miguel, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
Latxa: An Open Language Model and Evaluation Suite for Basque (2024)
Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)
Jordan Koontz, Maite Oronoz, Alicia Pérez
Ixa-Med at Discharge Me! Retrieval-Assisted Generation for Streamlining Discharge Documentation (2024)
BioNLP Discharge-Me Shared Task @ ACL
Tomaž Erjavec, Matyáš Kopp, Nikola Ljubešić, Taja Kuzman, Paul Rayson, Petya Osenova, Maciej Ogrodniczuk, Çağrı Çöltekin, Danijel Koržinek, Katja Meden, Jure Skubic, Peter Rupnik, Tommaso Agnoloni, José Aires, Starkaður Barkarson, Roberto Bartolini, Núria Bel, María Calzada, Roberts Darģis, Sascha Diwersy, Maria Gavriilidou, Ruben van Heusden, Mikel Iruskieta, Neeme Kahusk, Anna Kryvenko, Noémi Ligeti-Nagy, Carmen Magariños, Martin Mölder, Costanza Navarretta, Kiril Simov et al.
ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe (2024)
PREPRINT (Version 1) available at Research Square
Iker de la Iglesia, Unai Atutxa Barrenetxea, Lierni Ortiz Elorza, Mikel Iruskieta
IGARRITZ: prediccion de textos en euskera para la escritura con la mirada (2024)
-
Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa Salazar, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata, Andrea Zaninello
MedMT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Unai Atutxa Barrenetxea, Iker de la Iglesia, Lierni Ortiz Elorza, Mikel Iruskieta
Impacto de IGARRITZ en la producción de textos en euskera para personas con parálisis cerebral: Un estudio en entorno real (2024)
-
Janire Arana, Mikel Idoyaga, Maitane Urruela, Elisa Espina, Aitziber Atutxa, Koldo Gojenola
A Virtual Patient Dialogue System Based on Question-Answering on Clinical Records (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino
Giulia Pensa, Begoña Altuna, Itziar Gonzalez-Dios
A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)
Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel
A Preliminary Study of ChatGPT for Spanish E2R Text Adaptation (2024)
Madina, M., Gonzalez-Dios, I., & Siegel, M. (2024, May). A Preliminary Study of ChatGPT for Spanish E2R Text Adaptation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 1422-1434).
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel
LanguageTool as a CAT tool for Easy-to-Read in Spanish (2024)
Madina, M., Gonzalez-Dios, I., & Siegel, M. (2024, May). LanguageTool as a CAT tool for Easy-to-Read in Spanish. In Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI)@ LREC-COLING 2024 (pp. 93-101).
Irune Ibarra, Asunción Martínez-Arbelaiz, Jose Mari Arriola, Mikel Iruskieta
Medidas de proceso de escritura en alumnado de 2.º de Primaria en dos momentos de la vida escolar (2024)
In: Educación en transición: experiencias y propuestas para un mundo cambiante. Nahia Idoiaga, Noemi Serrano-Díaz and Eva Palasí (eds.). Publisher: Octaedro.
Eneko Agirre, Olatz Arbelaitz, Olatz Arregi, Gorka Azkune, Arantza Casillas, Inma Hernaez, Mikel Iruskieta, Elena Lazkano, Eva Navas, German Rigau, Roberto Santana, Aitor Soroa and Rabih Zbib
ENIA Chair in Artificial Intelligence and Language Technology (2024)
-
Maria Sierro, Begoña Altuna, Itziar Gonzalez-Dios
Automatic Detection and Labelling of Personal Data in Case Reports from the ECHR in Spanish: Evaluation of Two Different Annotation Approaches (2024)
Sierro, M., Altuna, B., & Gonzalez-Dios, I. (2024, March). Automatic Detection and Labelling of Personal Data in Case Reports from the ECHR in Spanish: Evaluation of Two Different Annotation Approaches. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024) (pp. 18-24).
Irune Ibarra, Asunción Martínez, Jose Maria Arriola
Análisis de las revisiones espontáneas a nivel de palabra para la mejora de la escritura a mano en segundo de Educación Primaria (2024)
Educación en la era digital. Propuestas innovadoras para los desafíos educativos del presente y del futuro. Tirant lo Blanch, ch. 3, pp. 65-77.
Nuria Lebeña, Arantza Casillas, Alicia Pérez
Temporal Name Entity Recognition and Relation Extraction in Clinical Electronic Health Records with Span-based Entity and Relation Transformer (2024)
ICBBB '24: Proceedings of the 2024 14th International Conference on Bioscience, Biochemistry and Bioinformatics, January 2024, pages 48–54
Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)
The Twelfth International Conference on Learning Representations
Suna Şeyma Uçar, Itziar Aldabe, Nora Aranberri, Ana Arruarte
Exploring Automatic Readability Assessment for Science Documents within a Multilingual Educational Context (2024)
Uçar, SŞ., Aldabe, I., Aranberri, N. et al. Exploring Automatic Readability Assessment for Science Documents within a Multilingual Educational Context. Int J Artif Intell Educ (2024). https://doi.org/10.1007/s40593-024-00393-2
Nora Aranberri, Uxoa Iñurrieta
When minoritized languages encounter MT: perceptions and expectations of the Basque community (2024)
Aranberri, N., & Iñurrieta, U. (2024). When minoritized languages encounter MT: perceptions and expectations of the Basque community. The Journal of Specialised Translation, (41), 179-205. Available at: https://www.jostrans.org/article/view/4718/4237
Gorka Azkune, Ander Salaberria, Eneko Agirre
Grounding spatial relations in text-only language models (2024)
Neural Networks. Volume 170, February 2024, Pages 215-226