Language Resources

Developing language technology products and applications requires basic linguistic resources (text and speech corpora, lexicons and knowledge bases) and development tools (morphological and syntactic analysers, word sense disambiguators, corpus processing tools, lemmatisers, integrated tool environments, etc.).

We have more than 25 years of experience in creating basic linguistic resources of this kind, and we have various reference corpora, lexicons ...


Demos

Konbitzul

Database of translations of noun+verb combinations

e-ROLda

A tool for looking up verb entries in the BVI lexicon and examples in the EPEC-RolSem corpus

Universal Dependencies treebank for Basque

This treebank has 121K words annotated following the guidelines proposed in the Universal Dependencies project.

 

Contracts

All HiTZ projects.

Projects

Patents

Eusemcor

Corpus tagged with Basque WordNet senses.

Basque WordNet / Euskal WordNet

Basque WordNet

EDBL

Basque lexical database.

EPEC-ROLSEM

Corpus tagged with semantic roles.

EPEC-DEP (BDT)

A syntactic corpus annotated according to Dependency Grammar.

Resources

Publications

Janire Arana, Mikel Idoyaga, Maitane Urruela, Elisa Espina, Aitziber Atutxa, Koldo Gojenola

A Virtual Patient Dialogue System Based on Question-Answering on Clinical Records (2024)

The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino

Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.

A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)

Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).

Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa

XNLIeu: a dataset for cross-lingual NLI in Basque (2024)

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4177–4188, Mexico City, Mexico. Association for Computational Linguistics.

Julen Etxaniz, Oscar Sainz, Naiara Perez Miguel, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

Latxa: An Open Language Model and Evaluation Suite for Basque (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)

Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2132–2141

Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero

A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models (2024)

Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024), pages 35–44

Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering (2024)

Artificial Intelligence in Medicine, 2024.

Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea

EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)

Procesamiento del Lenguaje Natural, Revista no 73, septiembre de 2024, pp. 125-137

Angelina McMillan-Major, Francesco De Toni, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Daniel van Strien, Zeerak Talat, Yacine Jernite

Documenting Geographically and Contextually Diverse Language Data Sources (2024)

Northern European Journal of Language Technology (NEJLT), 10(1), 2024. ISSN: 2000-1533. https://doi.org/10.3384/nejlt.2000-1533.2024.5217

Ainara Estarrona, Izaskun Etxeberria, Manuel Padilla-Moyano, Ander Soraluze

Measuring language distance for historical texts in Basque (2023)

Procesamiento del Lenguaje Natural, Revista no 70, marzo del 2023, pp. 53-61

Igone Zabala

Euskararen erregistro akademikoen garapenaz: hiztegia eta fraseologia (2023)

Lindemann David (ed.) Miren Azkarateri esker onez. Bilbo: UPV/EHUko Argitalpen Zerbitzua: 313-332

Itziar Aduriz, Manex Agirrezabal, Eneko Agirre, Iñaki Alegria, Xabier Arregi, Jose Mari Arriola, Xabier Artola, Arantza Díaz de Ilarraza, Ainara Estarrona, Izaskun Etxeberria, Nerea Ezeiza, Kepa Sarasola

Morfologia Konputazionala Euskaraz, 35 urte (2023)

Lindemann, D. (arg.). Miren Azkarateri esker onez, 15-30. UPV/EHU Argitalpen zerbitzua. Bilbo.

Izaskun Aldezabal, María Jesús Aranzabe

Euskararen eredutik hizkuntza-ereduen euskarara (2023)

David Lindemann (arg.), Miren Azkarateri esker onez, 57-75. Bilbo: UPV/EHUko Argitalpen Zerbitzua

Kepa Sarasola, Itziar Aldabe, Nora Aranberri

Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)

Special issue of 'De Europa' journal "Linguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence" pp. 93-107. Turin, 2023.

Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernáez, Eva Navas

Language Report Basque (2023)

Sarasola, K., I. Aldabe, A. Diaz de Ilarraza, A. Estarrona, A. Farwell, I. Hernáez, E. Navas (2023). Language Report Basque. In: Rehm, G., Way, A. (eds) European Language Equality. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-28819-7_5

Igone Zabala, María Jesús Aranzabe

HARTA/TAILA: Herramienta de ayuda a la enseñanza-aprendizaje de la fraseología académica del euskera basada en un corpus de trabajos académicos (2023)

Genres and Languages in Digital Environments: Trends and New Directions (Book of abstracts), page 75. Joint 21st AELFE-LSPPC7 Conference. Zaragoza, 28-30th June 2023.

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language Resources and Tools for three European Languages (2023)

Madina, M., Gonzalez-Dios, I., & Siegel, M. (2023, July). Easy-to-Read Language Resources and Tools for three European Languages. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments (pp. 693-699).

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena (2023)

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel (2023) Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena. V. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma: Giza Zientziak eta Artea, 35-42.

Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison

Identifying Token-Level Dialectal Features in Social Media (2023)

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Iker García, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models (2023)

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Oscar Sainz, Jon Ander Campos, Iker García, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre

NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (2023)

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Effective Correction Methods Using WordNet Meronymy Relations (2023)

Álvez, J., Gonzalez-Dios, I., & Rigau, G. (2023, January). Towards Effective Correction Methods Using WordNet Meronymy Relations. In Proceedings of the 12th Global Wordnet Conference (pp. 31-40).

Kuzman, Taja ; Ljubešić, Nikola ; Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Rayson, Paul ; Vidler, John ; Agerri, Rodrigo ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Rober

Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0 (2023)

Slovenian language resource repository CLARIN.SI

María Jesús Aranzabe, Igone Zabala, Izaskun Aldezabal

Goi-mailako testu akademikoak lantzeko baliabideak eta tresnak (2023)

2nd CLARIAH-EUS workshop: building a research infrastructure for Basque linked to European research infrastructures. Donostia, 23 November 2023. (Poster presented at the workshop)

Blanca Calvo Figueras, Irene Bausells, Tommaso Caselli

Dynamic Stance: Modeling Discussions by Labeling the Interactions (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Izaskun Aldezabal, Jose Mari Arriola, Arantxa Otegi

TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively (2022)

Proceedings of the 13th Language Resources and Evaluation Conference. Editors: Nicoletta Calzolari (Conference chair), Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis

Gonzalez-Dios, Itziar and Altuna, Begoña

Natural Language Processing and Language Technologies for the Basque Language (2022)

Gonzalez-Dios, Itziar and Altuna, Begoña (2022). Natural Language Processing and Language Technologies for the Basque Language. In Cuadernos Europeos de Deusto. NÚMERO ESPECIAL. Linguas minoritarias e futuro de Europa. Minority Languages and the Future of Europe 26, 203-230. https://doi.org/10.18543/ced.2477 https://ced.revistas.deusto.es/issue/view/285

María Jesús Aranzabe, Antton Gurrutxaga, Igone Zabala

Compilación del corpus académico de noveles en euskera HARTAeus y su explotación para el estudio de la fraseología académica (2022)

Procesamiento del Lenguaje Natural, Revista no 69, septiembre de 2022, pp. 95-103

María Jesús Aranzabe, Izaskun Aldezabal, Igone Zabala

Recursos y Herramientas de Lingüística de Corpus y PLN para la Monitorización e Investigación de los Usos Académicos del Euskera (2022)

3rd INTELE workshop (Infraestructura de Tecnologías del Lenguaje). Madrid, 13-14 September. (Poster presented at the workshop)

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli

European Clinical Case Corpus (2022)

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli (2022). European Clinical Case Corpus. Georg Rehm ed. European Language Grid, A Language Technology Platform for Multilingual Europe. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-031-17258-8

Petter Mæhlum, Andre Kåsen, Samia Touileb, and Jeremy Barnes.

Annotating Norwegian language varieties on Twitter for Part-of-speech. (2022)

Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects

Itziar Gonzalez-Dios, Aitor Soroa, Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco de Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, et al.

The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset (2022)

2022. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Soroa, A., Gonzalez-Dios, I., ... & Manica, M.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022)

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., ... & Manica, M. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.

Margarita Alonso Ramos, Igone Zabala

HARTAes-vas: Lexical combinations for an academic writing aid tool in Spanish and Basque (2022)

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, España.

Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

Does Corpus Quality Really Matter for Low-Resource Languages? (2022)

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7383–7390.

Elisa Sanchez-Bayona, Rodrigo Agerri

Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection (2022)

Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 228--240, Abu Dhabi, United Arab Emirates, Association for Computational Linguistics.

Elisa Sanchez-Bayona, Rodrigo Agerri

From Automatic Metaphor Processing in Spanish to a Multilingual Perspective: Annotation, Systems, and Evaluation (2022)

Doctoral Symposium on Natural Language Processing from the PLN.net network 2022 (RED2018-102418-T), 21-23 September 2022, A Coruña, Spain.

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri and Aitor Soroa

BasqueGLUE: A Natural Language Understanding Benchmark for Basque (2022)

LREC 2022

Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, Aitor Soroa

IrekiaLF_es: a new open benchmark and baseline systems for Spanish Automatic Text Simplification (2022)

Gonzalez-Dios, I., Gutiérrez-Fandiño, I., Cumbicus-Pineda, O. M., & Soroa, A. (2022, December). IrekiaLFes: a new open benchmark and baseline systems for Spanish automatic text simplification. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) (pp. 86-97).

Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation (2022)

Findings of the Association for Computational Linguistics: EMNLP 2022

Cecilia Domingo, Tatiana Gonzalez-Ferrero, Itziar Gonzalez-Dios

What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus (2021)

Domingo, C., Gonzalez-Ferrero, T., & Gonzalez-Dios, I. (2021, January). What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus. In Proceedings of the 11th Global Wordnet Conference (pp. 234-242).

Itziar Gonzalez-Dios, Uxoa Iñurrieta, Igone Zabala

General and Specialised Corpora to Raise Linguistic Awareness in a Language Undergoing the Normalisation Process: Academic Writing in Basque (2021)

Gonzalez Dios, I.; Iñurrieta, U.; Zabala, I. General and specialised corpora to raise linguistic awareness in a language undergoing the normalisation process: academic writing in Basque. A: AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). "Multilingual academic and professional communication in a networked world. Proceedings of AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). Vilanova i la Geltrú (Barcelona), 7-9 July 2021". Vilanova i la Geltrú: Universitat Politècnica de Catalunya, 2021, ISBN 978-84-9880-943-5.

Igone Zabala

Euskararen lantze funtzionala esparru akademiko eta profesionaletan (2021)

In Grenoble, Lenore / Lane, Pia / Røyneland, Unn (eds.) Ivan Igartua & Lourdes Oñederra (Basque eds.) Linguistic Minorities in Europe Online. A Born-Digital, Multimodal, Peer-Reviewed Online Reference Resource. De Gruyter Mouton

Igone Zabala

Euskaltzaindiaren Hiztegiaren ekarpena lexiko espezializatuaren eta ez-espezializatuaren harmonizazioan (2021)

In Andres Urrutia (ed.) Arantzazutik mundu zabalera. Euskararen normatibizazioa: 1968-2018. IKER 40. Euskaltzaindia-Iberoamericana Vervuert: 285-299

Igone Zabala, Izaskun Aldezabal, Maria Jesus Aranzabe

Academic Research Works and Domain Dynamics: Resources and Tools for Basque Academic Writing (2021)

18th International Conference on Minority Languages (Bilbao, 2021/03/24-26)

Jon Alkorta

Hacia el análisis de sentimientos en euskera (2021)

J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.

Jon Alkorta, Koldo Gojenola, Mikel Iruskieta

Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan (2021)

Alkorta, J., Gojenola, K. and Iruskieta, M. 2021. Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan. IV. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma. Editors: Olatz Arbelaitz, Ainhoa Latatu, Miren Josu Omaetxebarria, Blanca Urgell. Bilbo: UEU, pp. 169-176.

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Language and Technology in Wales: Volume I (2021)

Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)

Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6

Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau

Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets (2021)

Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau (2021) Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets. arXiv:2107.00333

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze

The First Annotated Corpus of Historical Basque (2021)

Digital Scholarship in the Humanities, vol. 37(2), pp. 391-404

Xabier Arregi

Hizkuntza-teknologiak euskaldunon artean (2021)

Hermes aldizkaria, 69, 2021, pp. 78-82

Igone Zabala, María Jesús Aranzabe, Izaskun Aldezabal

Retos actuales del desarrollo y aprendizaje de los registros académicos orales y escritos del euskera (2021)

Círculo de Lingüística Aplicada a la Comunicación 88, pp. 31-50

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: European Clinical Case Corpus (2021)

Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf

Ainara Estarrona, Izaskun Aldezabal, Arantza Díaz de Ilarraza

How the corpus-based Basque Verb Index lexicon was built (2020)

Language Resources and Evaluation. First Online 05 December 2018. DOI: https://doi.org/10.1007/s10579-018-9440-0. Springer Netherlands

Piroska Lendvai, Sándor Darányi, Christian Geng, Moniek Kuijpers, Oier Lopez de Lacalle, Jean-Christophe Mensonides, Simone Rebora and Uwe Reichel

Detection of Reading Absorption in User-Generated Book Reviews: Resources Creation and Evaluation (2020)

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France

Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque (2020)

Proceedings of The 12th Language Resources and Evaluation Conference, pp. 429–435. European Language Resources Association. ISBN: 979-10-95546-34-4

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Uxoa Iñurrieta

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)

Procesamiento del Lenguaje Natural, 64, pp. 123-126.

Kepa Bengoetxea, Itziar Gonzalez-Dios, Amaia Aguirregoitia

AzterTest: Open source linguistic and stylistic analysis tool (2020)

Procesamiento del Lenguaje Natural, 64, 61-68. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6196

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC. Also available on arXiv: https://arxiv.org/pdf/2004.00033.pdf

Itziar Gonzalez-Dios, Javier Álvez, German Rigau

Towards modeling SUMO attributes through WordNet adjectives: a Case Study on Qualities. (2020)

Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 1–6. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf

Jon Alkorta, Itziar Gonzalez-Dios

Exploring the Enrichment of Basque WordNet with a Sentiment Lexicon (2020)

Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 20–24. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf

Thierry Declerck, Itziar Gonzalez-Dios, German Rigau (editors)

Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMWN-2020) (2020)

European Language Resources Association (ELRA), Paris. https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf ISBN: 979-10-95546-41-2 EAN: 9791095546412

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06

Begoña Altuna

Análisis de estructuras temporales en euskera y creación de un corpus (2020)

Procesamiento del Lenguaje Natural, Revista no 64, marzo de 2020, pp. 131-134 URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6206 ISSN: 1989-7553

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)

Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze

Sintaktikoki etiketatutako euskarazko corpus historikoa eraikitzen (2020)

Fontes Linguae Vasconum 50 urte. Ekarpen berriak euskararen ikerketari. Nuevas aportaciones al estudio de la lengua vasca

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze

Dealing with dialectal variation in the construction of the Basque historical corpus (2020)

Proceedings of the 7th Workshop on NLP for similar languages, varieties and dialects (VarDial2020 at COLING 2020).

Gorka Urbizu, Ander Soraluze, Olatz Arregi

Sequence to Sequence Coreference Resolution (2020)

Proceedings of the 3rd Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2020), pages 39–46, Barcelona, Spain (online), December 12, 2020.

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

DoQA - Accessing Domain-Specific FAQs via Conversational QA (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7302–7314

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases (2020)

In Johanna Monti, Felice Dell'Orletta and Fabio Tamburini (eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics. Associazione Italiana di Linguistica Computazionale. Bologna, Italy, 2020.

Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille

LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)

Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71

Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernaez, Eva Navas; Reviewers: Annika Grützner-Zahn, Maria Giagkou; Editors: Maria Giagkou, Stelios Piperidis, Georg Rehm, Jane Dunne

Report on the Basque Language. European Language Equality (2020)

Deliverables of the Project ELE (European Language Equality). D1.4 Report on the Basque Language, https://european-language-equality.eu/deliverables/

Jon Alkorta, Koldo Gojenola, Mikel Iruskieta

SentiTegi: building a semantic oriented Basque lexicon (2019)

Computación y Sistemas, 22 (4)

Igone Zabala

The elaboration of Basque in academic and professional domains. (2019)

In Grenoble, Lenore; Lane, Pia & Røyneland, Unn (eds.) Linguistic Minorities in Europe Online. De Gruyter Mouton. ISSN 2510-5361

Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta

Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool (2019)

PLoS ONE 14(9): e0221639

Ander Soraluze, Olatz Arregi, Xabier Arregi, Arantza Diaz de Ilarraza

EUSKOR: End-to-end coreference resolution system for Basque (2019)

PLoS ONE 14(9): e0221801. https://doi.org/10.1371/journal.pone.0221801

Ainara Estarrona, Izaskun Etxeberria, Ander Soraluze, Manuel Padilla-Moyano

Spelling Normalisation of Basque Historical Texts (2019)

Procesamiento del Lenguaje Natural, vol. 63, pp. 59-66

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis (2019)

Proceedings of the Tenth Global Wordnet Conference, pp 197--205. ISBN 978-83-7493-108-3

Itziar Gonzalez-Dios, German Rigau

Textual genre based approach to use wordnets in language-for-specific-purpose classroom as dictionary (2019)

Proceedings of the Tenth Global Wordnet Conference, pp 222--227. ISBN 978-83-7493-108-3

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Conversational QA for FAQs (2019)

NeurIPS 3rd Conversational AI Workshop: “Today's Practice and Tomorrow's Potential”

Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga

Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic? (2019)

Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga (2019) 'Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic?' Wikimedia+Education Conference, Donostia 2019

Begoña Altuna, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza

Adapting TimeML to Basque: Event Annotation (2018)

In Gelbukh A. (eds.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science (LNCS, vol 9624), 565-577. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-75487-1_43 ; Print ISBN 978-3-319-75486-4; Online ISBN 978-3-319-75487-1

Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Konbitzul: an MWE-specific Database for Spanish-Basque (2018)

Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan, pages 2500-2504.

Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria

Verbal Multiword Expressions in Basque corpora (2018)

In the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (at COLING 2018)

Igone Zabala

Euskararen lantze funtzionala eta profesionalen komunikazio-gaitasunen garapena osasun-alorrean (2018)

BAT Soziolinguistika Aldizkaria 108, 2018 (3): 11-34

Igone Zabala

Euskararen terminologiaren garapena Terminologiaren Teoria Komunikatiboaren argitan (2018)

In Ruben Urizar and Itziar Aduriz (eds.) Hizkuntzalari Euskaldunen III Topaketa. Zer berri?. 349-358.

Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza and Ines Garzia-Azkoaga

Coreferential Relations in Basque: The Annotation Process (2018)

J Psycholinguist Res (2018) 47, Issue 2. Pages 325-342. https://doi.org/10.1007/s10936-018-9559-6. ISSN 0090-6905. Online ISSN 1573-6555.

Izaskun Aldezabal, Xabier Artola, Arantza Diaz De Ilarraza, Itziar Gonzalez-Dios, Gorka Labaka, German Rigau and Ruben Urizar

Basque e-lexicographic resources: linguistic basis, development, and future perspectives (2018)

Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence. https://lexdhai.insight-centre.org/Lex_DH__AI_2018_paper_5.pdf

Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Arantza Díaz de Ilarraza, Itziar Gonzalez-Dios, Ruben Urizar

Building the Gold Standard for the Surface Syntax of Basque (2017)

Procesamiento del Lenguaje Natural, 58, 125-132. Retrieved from http://ixa.si.ehu.es/sites/default/files/dokumentuak/8825/5421-4766-1-PB.pdf (print ISSN: 1135-5948) (online ISSN: 1989-7553)

I. Zabala

Light Nouns and Term Creation in Basque (2017)

Terminàlia Nº 15 (2017): 17-37

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Zabala I., San Martin I., Lersundi M.

Learning terminology in order to become an active agent in the development of Basque biomedical registers (2016)

Language Learning in Higher Education. Journal of CercleS (European Confederation of Language Centres in Higher Education). De Gruyter Mouton. Volume 6, Issue 1 (May 2016). Special issue: Teaching Medical Discourse in Higher Education. ISSN (Online) 2191-6128, ISSN (Print) 2191-611X, DOI: 10.1515/cercles-2016-0007 URL: http://www.degruyter.com/view/j/cercles.2016.6.issue-1/cercles-2016-0007/cercles-2016-0007.xml

Arantxa Otegi, Nora Aranberri, António Branco, Jan Hajic, Steven Neale, Petya Osenova, Rita Pereira, Martin Popel, Joao Silva, Kiril Simov, Eneko Agirre

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (2016)

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1

Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa

XNLIeu: a dataset for cross-lingual NLI in Basque (2024)

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4177–4188, Mexico City, Mexico. Association for Computational Linguistics.

Julen Etxaniz, Oscar Sainz, Naiara Perez Miguel, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

Latxa: An Open Language Model and Evaluation Suite for Basque (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)

Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2132–2141

Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero

A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models (2024)

Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024), pages 35–44

Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering (2024)

Artificial Intelligence in Medicine, 2024.

Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea

EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)

Procesamiento del Lenguaje Natural, Revista no 73, septiembre de 2024, pp. 125-137

Angelina McMillan-Major, Francesco De Toni, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Daniel van Strien, Zeerak Talat, Yacine Jernite

Documenting Geographically and Contextually Diverse Language Data Sources (2024)

Northern European Journal of Language Technology (NEJLT), volume 10, number 1, 2024. ISSN 2000-1533. https://doi.org/10.3384/nejlt.2000-1533.2024.5217

Ainara Estarrona, Izaskun Etxeberria, Manuel Padilla-Moyano, Ander Soraluze

Measuring language distance for historical texts in Basque (2023)

Procesamiento del Lenguaje Natural, Revista no 70, marzo del 2023, pp. 53-61

Igone Zabala

Euskararen erregistro akademikoen garapenaz: hiztegia eta fraseologia (2023)

Lindemann David (ed.) Miren Azkarateri esker onez. Bilbo: UPV/EHUko Argitalpen Zerbitzua: 313-332

Itziar Aduriz, Manex Agirrezabal, Eneko Agirre, Iñaki Alegria, Xabier Arregi, Jose Mari Arriola, Xabier Artola, Arantza Díaz de Ilarraza, Ainara Estarrona, Izaskun Etxeberria, Nerea Ezeiza, Kepa Sarasola

Morfologia Konputazionala Euskaraz, 35 urte (2023)

Lindemann, D. (ed.). Miren Azkarateri esker onez, 15-30. UPV/EHU Argitalpen zerbitzua. Bilbo.

Izaskun Aldezabal, María Jesús Aranzabe

Euskararen eredutik hizkuntza-ereduen euskarara (2023)

David Lindemann (ed.), Miren Azkarateri esker onez, 57-75. Bilbo: UPV/EHUko Argitalpen Zerbitzua

Kepa Sarasola, Itziar Aldabe, Nora Aranberri

Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)

Special issue of 'De Europa' journal "Linguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence", pp. 93-107. Turin, 2023.

Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernáez, Eva Navas

Language Report Basque (2023)

Sarasola, K., I. Aldabe, A. Diaz de Ilarraza, A. Estarrona, A. Farwell, I. Hernáez, E. Navas (2023). Language Report Basque. In: Rehm, G., Way, A. (eds) European Language Equality. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-28819-7_5

Igone Zabala, María Jesús Aranzabe

HARTA/TAILA: Herramienta de ayuda a la enseñanza-aprendizaje de la fraseología académica del euskera basada en un corpus de trabajos académicos (2023)

Genres and Languages in Digital Environments: Trends and New Directions (Book of abstracts), page 75. Joint 21st AELFE-LSPPC7 Conference. Zaragoza, 28-30th June 2023.

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language Resources and Tools for three European Languages (2023)

Madina, M., Gonzalez-Dios, I., & Siegel, M. (2023, July). Easy-to-Read Language Resources and Tools for three European Languages. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments (pp. 693-699).

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena (2023)

Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel (2023) Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena. V. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma: Giza Zientziak eta Artea, 35-42.

Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison

Identifying Token-Level Dialectal Features in Social Media (2023)

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Iker García, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models (2023)

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Oscar Sainz, Jon Ander Campos, Iker García, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre

NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (2023)

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Effective Correction Methods Using WordNet Meronymy Relations (2023)

Álvez, J., Gonzalez-Dios, I., & Rigau, G. (2023, January). Towards Effective Correction Methods Using WordNet Meronymy Relations. In Proceedings of the 12th Global Wordnet Conference (pp. 31-40).

Kuzman, Taja ; Ljubešić, Nikola ; Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Rayson, Paul ; Vidler, John ; Agerri, Rodrigo ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Rober

Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0 (2023)

Slovenian language resource repository CLARIN.SI

María Jesús Aranzabe, Igone Zabala, Izaskun Aldezabal

Goi-mailako testu akademikoak lantzeko baliabideak eta tresnak (2023)

II. CLARIAH-EUS workshop: Europako ikerketa azpiegiturekin lotuta egongo den euskararako ikerketa azpiegitura eraikitzen. Donostia, 23 November 2023. (Poster presented at the workshop)

Blanca Calvo Figueras, Irene Bausells, Tommaso Caselli

Dynamic Stance: Modeling Discussions by Labeling the Interactions (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Izaskun Aldezabal, Jose Mari Arriola, Arantxa Otegi

TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively (2022)

Proceedings of the 13th Language Resources and Evaluation Conference. Editors: Nicoletta Calzolari (Conference chair), Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis

Gonzalez-Dios, Itziar and Altuna, Begoña

Natural Language Processing and Language Technologies for the Basque Language (2022)

Gonzalez-Dios, Itziar and Altuna, Begoña (2022). Natural Language Processing and Language Technologies for the Basque Language. In Cuadernos Europeos de Deusto. NÚMERO ESPECIAL. Linguas minoritarias e futuro de Europa. Minority Languages and the Future of Europe 26, 203-230. https://doi.org/10.18543/ced.2477 https://ced.revistas.deusto.es/issue/view/285

María Jesús Aranzabe, Antton Gurrutxaga, Igone Zabala

Compilación del corpus académico de noveles en euskera HARTAeus y su explotación para el estudio de la fraseología académica (2022)

Procesamiento del Lenguaje Natural, Revista no 69, septiembre de 2022, pp. 95-103

María Jesús Aranzabe, Izaskun Aldezabal, Igone Zabala

Recursos y Herramientas de Lingüística de Corpus y PLN para la Monitorización e Investigación de los Usos Académicos del Euskera (2022)

3rd INTELE workshop (Infraestructura de Tecnologías del Lenguaje). Madrid, 13-14 September. (Poster presented at the workshop)

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli

European Clinical Case Corpus (2022)

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli (2022). European Clinical Case Corpus. Georg Rehm ed. European Language Grid, A Language Technology Platform for Multilingual Europe. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-031-17258-8

Petter Mæhlum, Andre Kåsen, Samia Touileb, and Jeremy Barnes.

Annotating Norwegian language varieties on Twitter for Part-of-speech. (2022)

Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects

Itziar Gonzalez-Dios, Aitor Soroa, Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco de Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, et al.

The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset (2022)

2022. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Soroa, A., Gonzalez-Dios, I., ... & Manica, M.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022)

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., ... & Manica, M. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.

Margarita Alonso Ramos, Igone Zabala

HARTAes-vas: Lexical combinations for an academic writing aid tool in Spanish and Basque (2022)

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, España.

Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

Does Corpus Quality Really Matter for Low-Resource Languages? (2022)

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7383–7390.

Elisa Sanchez-Bayona, Rodrigo Agerri

Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection (2022)

Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 228–240, Abu Dhabi, United Arab Emirates, Association for Computational Linguistics.

Elisa Sanchez-Bayona, Rodrigo Agerri

From Automatic Metaphor Processing in Spanish to a Multilingual Perspective: Annotation, Systems, and Evaluation (2022)

Doctoral Symposium on Natural Language Processing from the PLN.net network 2022 (RED2018-102418-T), 21-23 September 2022, A Coruña, Spain.

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri and Aitor Soroa

BasqueGLUE: A Natural Language Understanding Benchmark for Basque (2022)

LREC 2022

Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, Aitor Soroa

IrekiaLF_es: a new open benchmark and baseline systems for Spanish Automatic Text Simplification (2022)

Gonzalez-Dios, I., Gutiérrez-Fandiño, I., Cumbicus-Pineda, O. M., & Soroa, A. (2022, December). IrekiaLF_es: a new open benchmark and baseline systems for Spanish automatic text simplification. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) (pp. 86-97).

Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation (2022)

Findings of the Association for Computational Linguistics: EMNLP 2022

Cecilia Domingo, Tatiana Gonzalez-Ferrero, Itziar Gonzalez-Dios

What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus (2021)

Domingo, C., Gonzalez-Ferrero, T., & Gonzalez-Dios, I. (2021, January). What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus. In Proceedings of the 11th Global Wordnet Conference (pp. 234-242).

Itziar Gonzalez-Dios, Uxoa Iñurrieta, Igone Zabala

General and Specialised Corpora to Raise Linguistic Awareness in a Language Undergoing the Normalisation Process: Academic Writing in Basque (2021)

Gonzalez Dios, I.; Iñurrieta, U.; Zabala, I. General and specialised corpora to raise linguistic awareness in a language undergoing the normalisation process: academic writing in Basque. In: AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). "Multilingual academic and professional communication in a networked world. Proceedings of AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). Vilanova i la Geltrú (Barcelona), 7-9 July 2021". Vilanova i la Geltrú: Universitat Politècnica de Catalunya, 2021, ISBN 978-84-9880-943-5.

Igone Zabala

Euskararen lantze funtzionala esparru akademiko eta profesionaletan (2021)

In Grenoble, Lenore / Lane, Pia / Røyneland, Unn (eds.), Ivan Igartua & Lourdes Oñederra (Basque eds.) Linguistic Minorities in Europe Online. A Born-Digital, Multimodal, Peer-Reviewed Online Reference Resource. De Gruyter Mouton

Igone Zabala

Euskaltzaindiaren Hiztegiaren ekarpena lexiko espezializatuaren eta ez-espezializatuaren harmonizazioan (2021)

In Andres Urrutia (ed.) Arantzazutik mundu zabalera. Euskararen normatibizazioa: 1968-2018. IKER 40. Euskaltzaindia-Iberoamericana Vervuert: 285-299

Igone Zabala, Izaskun Aldezabal, Maria Jesus Aranzabe

Academic Research Works and Domain Dynamics: Resources and Tools for Basque Academic Writing (2021)

18th International Conference on Minority Languages (Bilbao, 2021/03/24-26)

Jon Alkorta

Hacia el análisis de sentimientos en euskera (2021)

J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.

Jon Alkorta, Koldo Gojenola, Mikel Iruskieta

Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan (2021)

Alkorta, J., Gojenola, K. and Iruskieta, M. 2021. Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan. IV. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma. Editors: Olatz Arbelaitz, Ainhoa Latatu, Miren Josu Omaetxebarria, Blanca Urgell. Bilbo: UEU, pp. 169-176.

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Language and Technology in Wales: Volume I (2021)

Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)

Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6

Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau

Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets (2021)

Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau (2021) Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets. arXiv:2107.00333

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze

The First Annotated Corpus of Historical Basque (2021)

Digital Scholarship in the Humanities, vol. 37(2), pp. 391-404

Xabier Arregi

Hizkuntza-teknologiak euskaldunon artean (2021)

Hermes aldizkaria, 69, 2021, pp. 78-82

Igone Zabala, María Jesús Aranzabe, Izaskun Aldezabal

Retos actuales del desarrollo y aprendizaje de los registros académicos orales y escritos del euskera (2021)

Círculo de Lingüística Aplicada a la Comunicación 88, pp. 31-50

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: European Clinical Case Corpus (2021)

Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf

Ainara Estarrona, Izaskun Aldezabal, Arantza Díaz de Ilarraza

How the corpus-based Basque Verb Index lexicon was built (2020)

Language Resources and Evaluation. First Online 05 December 2018. DOI: https://doi.org/10.1007/s10579-018-9440-0. Springer Netherlands

Piroska Lendvai, Sándor Darányi, Christian Geng, Moniek Kuijpers, Oier Lopez de Lacalle, Jean-Christophe Mensonides, Simone Rebora and Uwe Reichel

Detection of Reading Absorption in User-Generated Book Reviews: Resources Creation and Evaluation (2020)

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France

Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque (2020)

Proceedings of The 12th Language Resources and Evaluation Conference, pp. 429–435. European Language Resources Association. ISBN: 979-10-95546-34-4

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Uxoa Iñurrieta

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)

Procesamiento del Lenguaje Natural, 64, pp. 123-126.

Kepa Bengoetxea, Itziar Gonzalez-Dios, Amaia Aguirregoitia

AzterTest: Open source linguistic and stylistic analysis tool (2020)

Procesamiento del Lenguaje Natural, 64, 61-68. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6196

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC. Also available at arxiv https://arxiv.org/pdf/2004.00033.pdf

Itziar Gonzalez-Dios, Javier Álvez, German Rigau

Towards modeling SUMO attributes through WordNet adjectives: a Case Study on Qualities. (2020)

Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 1–6. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf

Jon Alkorta, Itziar Gonzalez-Dios

Exploring the Enrichment of Basque WordNet with a Sentiment Lexicon (2020)

Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 20–24. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf

Thierry Declerck, Itziar Gonzalez-Dios, German Rigau (editors)

Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMWN-2020) (2020)

European Language Resources Association (ELRA), Paris. https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf ISBN: 979-10-95546-41-2 EAN: 9791095546412

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06

Begoña Altuna

Análisis de estructuras temporales en euskera y creación de un corpus (2020)

Procesamiento del Lenguaje Natural, Revista no 64, marzo de 2020, pp. 131-134 URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6206 ISSN: 1989-7553

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)

Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze

Sintaktikoki etiketatutako euskarazko corpus historikoa eraikitzen (2020)

Fontes Linguae Vasconum 50 urte. Ekarpen berriak euskararen ikerketari. Nuevas aportaciones al estudio de la lengua vasca

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze

Dealing with dialectal variation in the construction of the Basque historical corpus (2020)

Proceedings of the 7th Workshop on NLP for similar languages, varieties and dialects (VarDial2020 at COLING 2020).

Gorka Urbizu, Ander Soraluze, Olatz Arregi

Sequence to Sequence Coreference Resolution (2020)

Proceedings of the 3rd Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2020), pages 39–46, Barcelona, Spain (online), December 12, 2020.

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

DoQA - Accessing Domain-Specific FAQs via Conversational QA (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7302–7314

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases (2020)

In Johanna Monti, Felice Dell'Orletta and Fabio Tamburini (eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics. Associazione Italiana di Linguistica Computazionale. Bologna, Italy, 2020.

Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille

LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)

Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71

Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernaez, Eva Navas; Reviewers: Annika Grützner-Zahn, Maria Giagkou; Editors: Maria Giagkou, Stelios Piperidis, Georg Rehm, Jane Dunne

Report on the Basque Language. European Language Equality (2020)

Deliverables of the Project ELE (European Language Equality). D1.4 Report on the Basque Language, https://european-language-equality.eu/deliverables/

Jon Alkorta, Koldo Gojenola, Mikel Iruskieta

SentiTegi: building a semantic oriented Basque lexicon (2019)

Computación y Sistemas, 22 (4)

Igone Zabala

The elaboration of Basque in academic and professional domains. (2019)

In Grenoble, Lenore; Lane, Pia & Røyneland, Unn (eds.) Linguistic Minorities in Europe Online. De Gruyter Mouton. ISSN 2510-5361

Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta

Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool (2019)

PLoS ONE 14(9): e0221639

Ander Soraluze, Olatz Arregi, Xabier Arregi, Arantza Diaz de Ilarraza

EUSKOR: End-to-end coreference resolution system for Basque (2019)

PLoS ONE 14(9): e0221801. https://doi.org/10.1371/journal.pone.0221801

Ainara Estarrona, Izaskun Etxeberria, Ander Soraluze, Manuel Padilla-Moyano

Spelling Normalisation of Basque Historical Texts (2019)

Procesamiento del Lenguaje Natural, vol. 63, pp. 59-66

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis (2019)

Proceedings of the Tenth Global Wordnet Conference, pp. 197-205. ISBN 978-83-7493-108-3

Itziar Gonzalez-Dios, German Rigau

Textual genre based approach to use wordnets in language-for-specific-purpose classroom as dictionary (2019)

Proceedings of the Tenth Global Wordnet Conference, pp. 222-227. ISBN 978-83-7493-108-3

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Conversational QA for FAQs (2019)

NeurIPS 3rd Conversational AI Workshop: “Today's Practice and Tomorrow's Potential”

Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga

Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic? (2019)

Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga (2019) 'Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic?' Wikimedia+Education Conference, Donostia 2019

Begoña Altuna, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza

Adapting TimeML to Basque: Event Annotation (2018)

In Gelbukh A. (eds.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science (LNCS, vol 9624), 565-577. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-75487-1_43 ; Print ISBN 978-3-319-75486-4; Online ISBN 978-3-319-75487-1

Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Konbitzul: an MWE-specific Database for Spanish-Basque (2018)

Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan. orrialdeak: pages 2500-2504.

Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria

Verbal Multiword Expressions in Basque corpora (2018)

In the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (at COLING 2018)

Igone Zabala

Euskararen lantze funtzionala eta profesionalen komunikazio-gaitasunen garapena osasun-alorrean (2018)

BAT Soziolinguistika Aldizkaria 108, 2018 (3): 11-34

Igone Zabala

Euskararen terminologiaren garapena Terminologiaren Teoria Komunikatiboaren argitan (2018)

In Ruben Urizar eta Itizar Aduriz (ed.) Hizkuntzalari Euskaldunen III Topaketa. Zer berri?. 349-358.

Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza and Ines Garzia-Azkoaga

Coreferential Relations in Basque: The Annotation Process (2018)

J Psycholinguist Res (2018) 47, Issue 2. Pages 325-342. https://doi.org/10.1007/s10936-018-9559-6. ISSN 0090-6905. Online ISSN 1573-6555.

Izaskun Aldezabal, Xabier Artola, Arantza Diaz De Ilarraza, Itziar Gonzalez-Dios, Gorka Labaka, German Rigau and Ruben Urizar

Basque e-lexicographic resources: linguistic basis, development, and future perspectives (2018)

Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence. https://lexdhai.insight-centre.org/Lex_DH__AI_2018_paper_5.pdf

Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Arantza Díaz de Ilarraza, Itziar Gonzalez-Dios, Ruben Urizar

Building the Gold Standard for the Surface Syntax of Basque (2017)

Procesamiento del Lenguaje Natural, 58, 125-132. Retrieved from http://ixa.si.ehu.es/sites/default/files/dokumentuak/8825/5421-4766-1-PB.pdf (print ISSN: 1135-5948) (online ISSN: 1989-7553)

I. Zabala

Light Nouns and Term Creation in Basque (2017)

Terminàlia Nº 15 (2017): 17-37

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Zabala I., San Martin I., Lersundi M.

Learning terminology in order to become an active agent in the development of Basque biomedical registers (2016)

Language Learning in Higher Education. Journal of CercleS (European Confederation of Language Centres in Higher Education). De Gruyter Mouton. Volume 6, Issue 1 (May 2016). Special issue: Teaching Medical Discourse in Higher Education. ISSN (Online) 2191-6128, ISSN (Print) 2191-611X, DOI: 10.1515/cercles-2016-0007 URL: http://www.degruyter.com/view/j/cercles.2016.6.issue-1/cercles-2016-0007/cercles-2016-0007.xml

Arantxa Otegi, Nora Aranberri, António Branco, Jan Hajic, Steven Neale, Petya Osenova, Rita Pereira, Martin Popel, Joao Silva, Kiril Simov, Eneko Agirre

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (2016)

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1
