Language Resources
Developing products and applications in Language Technology requires basic linguistic resources (text and speech corpora, lexicons and knowledge bases) and development tools (morphological and syntactic analysers, word sense disambiguators, corpus processing tools, lemmatisers, integrated tool environments, etc.).
We have more than 25 years of experience in creating basic linguistic resources of this type, and we have various reference corpora, lexicons ...
Demos
Konbitzul
A database of translations of noun+verb combinations
e-ROLda
A tool for looking up verb entries in the BVI lexicon and examples in the EPEC-RolSem corpus
Universal Dependencies treebank for Basque
This treebank has 121 K words annotated following the guidelines proposed in the Universal Dependencies project.
Contracts
(2020 - 2021)
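(2019 - 2019) - Hizkuntza Teknologia: Egoeraren diagnostikoa eta AMIA egitea.
Language Technology: diagnosis of the current situation and SWOT analysis.
(2019 - 2019) - Euskara HTen arloan sustatzeko proposamenak.
Proposals for promoting Basque in the field of language technologies.
(2019 - 2019) - Hizkuntza-teknologiak sustatzeko proiektu transbertsalak
Cross-cutting projects to promote language technologies.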
(2019 - 2019) - Orotariko Euskal Hiztegia corpus bihurtzea: bigarren urratsa, B fasea.
Phase B of the second stage in converting the Orotariko Euskal Hiztegia dictionary into a corpus.
(2017 - 2017) - Orotariko Euskal Hiztegia corpus bihurtzea: bigarren urratsa.
Second stage in converting the Orotariko Euskal Hiztegia dictionary into a corpus.
(2016 - 2016)
Projects
LINGUATEC IA, a project to advance the digitalisation of Aragonese, Basque, Catalan and Occitan by means of artificial intelligence
(2024 - 2026) - DeepMinor: Language Models for Multilingual and Multidomain Text Processing in Low Resource Scenarios
Language Models for Multilingual and Multidomain Text Processing in Low Resource Scenarios
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
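(2023 - 2025) - ICL4LANG: Aprendizaje en contexto como nuevo paradigma para investigar tecnologías del lenguaje escalables y de alta precisión adaptadas a las necesidades industriales del País Vasco
In-context learning as a new paradigm for researching scalable, high-precision language technologies adapted to the industrial needs of the Basque Country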
(2023 - 2025)
Research on Language Technology to foster the presence of Basque in the digital landscape.
(2023 - 2025)
(2023 - 2025)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2024 - 2025)
Language In The Human-Machine Era (LITHME). COST Action number: CA19102.
(2020 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGeneration EU/PRTR.
(2022 - 2024)
Strategic network for the integration into the European research infrastructures in Social Sciences and Humanities.
(2023 - 2024)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2023 - 2024) - LUTEST: Language Understanding Test Sets
(2020 - 2023)
Study of lexical combinations in Basque based on a novice academic corpus for an Academic Texts Writing Aid
(2020 - 2023)
Trustworthy AI - Integrating Learning, Optimisation and Reasoning
(2020 - 2023)
(2023 - 2023)
European Language Equality
(2021 - 2022)
enetCollect: A New European Network for combining Language Learning with Crowdsourcing Techniques
(2017 - 2021)
Strategic network for the promotion of language technology infrastructures in e-humanities and social sciences
(2020 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021) - CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019) - DL4NLP: Deep Learning aplicado al Procesamiento del Lenguaje Natural como apoyo a los ámbitos del RIS3
Deep Learning applied to Natural Language Processing in support of the RIS3 areas
(2019 - 2019)
(2011 - 2011)
Patents
Resources
Publications
Janire Arana, Mikel Idoyaga, Maitane Urruela, Elisa Espina, Aitziber Atutxa, Koldo Gojenola
A Virtual Patient Dialogue System Based on Question-Answering on Clinical Records (2024)
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino
Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.
A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)
Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).
Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa
XNLIeu: a dataset for cross-lingual NLI in Basque (2024)
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4177–4188, Mexico City, Mexico. Association for Computational Linguistics.
Julen Etxaniz, Oscar Sainz, Naiara Perez Miguel, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
Latxa: An Open Language Model and Evaluation Suite for Basque (2024)
Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)
Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri
Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2132–2141
Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero
A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models (2024)
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024), pages 35–44
Iñigo Alonso, Maite Oronoz, Rodrigo Agerri
MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering (2024)
Artificial Intelligence in Medicine, 2024.
Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea
EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)
Procesamiento del Lenguaje Natural, no. 73, September 2024, pp. 125-137
Angelina McMillan-Major, Francesco De Toni, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Daniel van Strien, Zeerak Talat, Yacine Jernite
Documenting Geographically and Contextually Diverse Language Data Sources (2024)
@article{mcmillan2024, author = {McMillan-Major, Angelina and De Toni, Francesco and Alyafeai, Zaid and Biderman, Stella and Chen, Kimbo and Dupont, G\'{e}rard and Elsahar, Hady and Emezue, Chris and Fikri Aji, Alham and Ili\'{c}, Suzana and Khamis, Nurulaqilla and Leong, Colin and Masoud, Maraim and Soroa, Aitor and Ortiz Suarez, Pedro and van Strien, Daniel and Talat, Zeerak and Jernite, Yacine}, title = "{Documenting Geographically and Contextually Diverse Language Data Sources}", journal = {Northern European Journal of Language Technology (NEJLT)}, volume = {10}, number = {1}, year = {2024}, issn = {2000-1533}, doi = {https://doi.org/10.3384/nejlt.2000-1533.2024.5217}, url = {https://doi.org/10.3384/nejlt.2000-1533.2024.5217} }
Ainara Estarrona, Izaskun Etxeberria, Manuel Padilla-Moyano, Ander Soraluze
Measuring language distance for historical texts in Basque (2023)
Procesamiento del Lenguaje Natural, no. 70, March 2023, pp. 53-61
Igone Zabala
Euskararen erregistro akademikoen garapenaz: hiztegia eta fraseologia (2023)
Lindemann David (ed.) Miren Azkarateri esker onez. Bilbo: UPV/EHUko Argitalpen Zerbitzua: 313-332
Itziar Aduriz, Manex Agirrezabal, Eneko Agirre, Iñaki Alegria, Xabier Arregi, Jose Mari Arriola, Xabier Artola, Arantza Díaz de Ilarraza, Ainara Estarrona, Izaskun Etxeberria, Nerea Ezeiza, Kepa Sarasola
Morfologia Konputazionala Euskaraz, 35 urte (2023)
Lindemann, D. (ed.). Miren Azkarateri esker onez, 15-30. UPV/EHU Argitalpen zerbitzua. Bilbo.
Izaskun Aldezabal, María Jesús Aranzabe
Euskararen eredutik hizkuntza-ereduen euskarara (2023)
David Lindemann (ed.), Miren Azkarateri esker onez, 57-75. Bilbo: UPV/EHUko Argitalpen Zerbitzua
Kepa Sarasola, Itziar Aldabe, Nora Aranberri
Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)
Special issue of 'De Europa' journal "Linguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence", pp. 93-107. Turin, 2023.
Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernáez, Eva Navas
Language Report Basque (2023)
Sarasola, K., I. Aldabe, A. Diaz de Ilarraza, A. Estarrona, A. Farwell, I. Hernáez, E. Navas (2023). Language Report Basque. In: Rehm, G., Way, A. (eds) European Language Equality. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-28819-7_5
Igone Zabala, María Jesús Aranzabe
HARTA/TAILA: Herramienta de ayuda a la enseñanza-aprendizaje de la fraseología académica del euskera basada en un corpus de trabajos académicos (2023)
Genres and Languages in Digital Environments: Trends and New Directions (Book of abstracts), page 75. Joint 21st AELFE-LSPPC7 Conference. Zaragoza, 28-30th June 2023.
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel
Easy-to-Read Language Resources and Tools for three European Languages (2023)
Madina, M., Gonzalez-Dios, I., & Siegel, M. (2023, July). Easy-to-Read Language Resources and Tools for three European Languages. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments (pp. 693-699).
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel
Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena (2023)
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel (2023) Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena. V. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma: Giza Zientziak eta Artea, 35-42.
Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison
Identifying Token-Level Dialectal Features in Social Media (2023)
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Iker García, Rodrigo Agerri, German Rigau
T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Iker García, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models (2023)
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Oscar Sainz, Jon Ander Campos, Iker García, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (2023)
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Towards Effective Correction Methods Using WordNet Meronymy Relations (2023)
Álvez, J., Gonzalez-Dios, I., & Rigau, G. (2023, January). Towards Effective Correction Methods Using WordNet Meronymy Relations. In Proceedings of the 12th Global Wordnet Conference (pp. 31-40).
Kuzman, Taja ; Ljubešić, Nikola ; Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Rayson, Paul ; Vidler, John ; Agerri, Rodrigo ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Rober
Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0 (2023)
Slovenian language resource repository CLARIN.SI
María Jesús Aranzabe, Igone Zabala, Izaskun Aldezabal
Goi-mailako testu akademikoak lantzeko baliabideak eta tresnak (2023)
2nd CLARIAH-EUS workshop: Building a research infrastructure for Basque linked to the European research infrastructures. Donostia, 23 November 2023. (Poster presented at the workshop)
Blanca Calvo Figueras, Irene Bausells, Tommaso Caselli
Dynamic Stance: Modeling Discussions by Labeling the Interactions (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Izaskun Aldezabal, Jose Mari Arriola, Arantxa Otegi
TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively (2022)
Proceedings of the 13th Language Resources and Evaluation Conference. Editors: Nicoletta Calzolari (Conference chair), Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Gonzalez-Dios, Itziar and Altuna, Begoña
Natural Language Processing and Language Technologies for the Basque Language (2022)
Gonzalez-Dios, Itziar and Altuna, Begoña (2022). Natural Language Processing and Language Technologies for the Basque Language. In Cuadernos Europeos de Deusto. NÚMERO ESPECIAL. Linguas minoritarias e futuro de Europa. Minority Languages and the Future of Europe 26, 203-230. https://doi.org/10.18543/ced.2477 https://ced.revistas.deusto.es/issue/view/285
María Jesús Aranzabe, Antton Gurrutxaga, Igone Zabala
Compilación del corpus académico de noveles en euskera HARTAeus y su explotación para el estudio de la fraseología académica (2022)
Procesamiento del Lenguaje Natural, no. 69, September 2022, pp. 95-103
María Jesús Aranzabe, Izaskun Aldezabal, Igone Zabala
Recursos y Herramientas de Lingüística de Corpus y PLN para la Monitorización e Investigación de los Usos Académicos del Euskera (2022)
3rd INTELE workshop (Infraestructura de Tecnologías del Lenguaje). Madrid, 13-14 September. (Poster presented at the workshop)
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli
European Clinical Case Corpus (2022)
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli (2022). European Clinical Case Corpus. Georg Rehm ed. European Language Grid, A Language Technology Platform for Multilingual Europe. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-031-17258-8
Petter Mæhlum, Andre Kåsen, Samia Touileb, and Jeremy Barnes.
Annotating Norwegian language varieties on Twitter for Part-of-speech. (2022)
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
Itziar Glez Dios, Aitor Soroa, Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco de Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, et al.
The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset (2022)
2022. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Soroa, A., Gonzalez-Dios, I,... & Manica, M.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022)
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., ... & Manica, M. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.
Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri
BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)
Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.
Margarita Alonso Ramos, Igone Zabala
HARTAes-vas: Lexical combinations for an academic writing aid tool in Spanish and Basque (2022)
SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, España.
Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa
Does Corpus Quality Really Matter for Low-Resource Languages? (2022)
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7383–7390.
Elisa Sanchez-Bayona, Rodrigo Agerri
Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection (2022)
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 228--240, Abu Dhabi, United Arab Emirates, Association for Computational Linguistics.
Elisa Sanchez-Bayona, Rodrigo Agerri
From Automatic Metaphor Processing in Spanish to a Multilingual Perspective: Annotation, Systems, and Evaluation (2022)
Doctoral Symposium on Natural Language Processing from the PLN.net network 2022 (RED2018-102418-T), 21-23 September 2022, A Coruña, Spain.
Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri and Aitor Soroa
BasqueGLUE: A Natural Language Understanding Benchmark for Basque (2022)
LREC 2022
Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, Aitor Soroa
IrekiaLF_es: a new open benchmark and baseline systems for Spanish Automatic Text Simplification (2022)
Gonzalez-Dios, I., Gutiérrez-Fandiño, I., Cumbicus-Pineda, O. M., & Soroa, A. (2022, December). IrekiaLF_es: a new open benchmark and baseline systems for Spanish automatic text simplification. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) (pp. 86-97).
Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre
PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation (2022)
Findings of the Association for Computational Linguistics: EMNLP 2022
Cecilia Domingo, Tatiana Gonzalez-Ferrero, Itziar Gonzalez-Dios
What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus (2021)
Domingo, C., Gonzalez-Ferrero, T., & Gonzalez-Dios, I. (2021, January). What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus. In Proceedings of the 11th Global Wordnet Conference (pp. 234-242).
Itziar Gonzalez-Dios, Uxoa Iñurrieta, Igone Zabala
General and Specialised Corpora to Raise Linguistic Awareness in a Language Undergoing the Normalisation Process: Academic Writing in Basque (2021)
Gonzalez Dios, I.; Iñurrieta, U.; Zabala, I. General and specialised corpora to raise linguistic awareness in a language undergoing the normalisation process: academic writing in Basque. In: AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). "Multilingual academic and professional communication in a networked world. Proceedings of AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). Vilanova i la Geltrú (Barcelona), 7-9 July 2021". Vilanova i la Geltrú: Universitat Politècnica de Catalunya, 2021, ISBN 978-84-9880-943-5.
Igone Zabala
Euskararen lantze funtzionala esparru akademiko eta profesionaletan (2021)
In Grenoble, Lenore / Lane, Pia / Røyneland, Unn (eds.) Ivan Igartua & Lourdes Oñederra (Basque eds.) Linguistic Minorities in Europe Online. A Born-Digital, Multimodal, Peer-Reviewed Online Reference Resource. De Gruyter Mouton
Igone Zabala
Euskaltzaindiaren Hiztegiaren ekarpena lexiko espezializatuaren eta ez-espezializatuaren harmonizazioan (2021)
In Andres Urrutia (ed.) Arantzazutik mundu zabalera. Euskararen normatibizazioa: 1968-2018. IKER 40. Euskaltzaindia-Iberoamericana Vervuert: 285-299
Igone Zabala, Izaskun Aldezabal, Maria Jesus Aranzabe
Academic Research Works and Domain Dynamics: Resources and Tools for Basque Academic Writing (2021)
18th International Conference on Minority Languages (Bilbao, 2021/03/24-26)
Jon Alkorta
Hacia el análisis de sentimientos en euskera (2021)
J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.
Jon Alkorta, Koldo Gojenola, Mikel Iruskieta
Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan (2021)
Alkorta, J., Gojenola, K. and Iruskieta, M. 2021. Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan. IV. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma, Editors: Olatz Arbelaitz, Ainhoa Latatu, Miren Josu Omaetxebarria, Blanca Urgell. Bilbo: UEU, pp. 169-176.
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Language and Technology in Wales: Volume I (2021)
Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)
Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6
Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau
Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets (2021)
Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau (2021) Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets. arXiv:2107.00333
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze
The First Annotated Corpus of Historical Basque (2021)
Digital Scholarship in the Humanities, vol. 37(2), pp. 391-404
Igone Zabala, María Jesús Aranzabe, Izaskun Aldezabal
Retos actuales del desarrollo y aprendizaje de los registros académicos orales y escritos del euskera (2021)
Círculo de Lingüística Aplicada a la Comunicación 88, pp. 31-50
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli
The E3C Project: European Clinical Case Corpus (2021)
Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf
Ainara Estarrona, Izaskun Aldezabal, Arantza Díaz de Ilarraza
How the corpus-based Basque Verb Index lexicon was built (2020)
Language Resources and Evaluation. First Online 05 December 2018. DOI: https://doi.org/10.1007/s10579-018-9440-0. Springer Netherlands
Piroska Lendvai, Sándor Darányi, Christian Geng, Moniek Kuijpers, Oier Lopez de Lacalle, Jean-Christophe Mensonides, Simone Rebora and Uwe Reichel
Detection of Reading Absorption in User-Generated Book Reviews: Resources Creation and Evaluation (2020)
Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France
Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre
Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque (2020)
Proceedings of The 12th Language Resources and Evaluation Conference, pp. 429–435. European Language Resources Association. ISBN: 979-10-95546-34-4
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Towards Word Sense Disambiguation by Reasoning (2020)
Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340
Uxoa Iñurrieta
Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)
Procesamiento del Lenguaje Natural, 64, pp. 123-126.
Kepa Bengoetxea, Itziar Gonzalez-Dios, Amaia Aguirregoitia
AzterTest: Open source linguistic and stylistic analysis tool (2020)
Procesamiento del Lenguaje Natural, 64, 61-68. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6196
Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre
Give your Text Representation Models some Love: the Case for Basque (2020)
Proceedings of LREC. Also available on arXiv: https://arxiv.org/pdf/2004.00033.pdf
Itziar Gonzalez-Dios, Javier Álvez, German Rigau
Towards modeling SUMO attributes through WordNet adjectives: a Case Study on Qualities. (2020)
Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 1–6. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf
Jon Alkorta, Itziar Gonzalez-Dios
Exploring the Enrichment of Basque WordNet with a Sentiment Lexicon (2020)
Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 20–24. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf
Thierry Declerck, Itziar Gonzalez-Dios, German Rigau (editors)
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMWN-2020) (2020)
European Language Resources Association (ELRA), Paris. https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf ISBN: 979-10-95546-41-2 EAN: 9791095546412
Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza
EusTimeML: A mark-up language for temporal information in Basque (2020)
Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06
Begoña Altuna
Análisis de estructuras temporales en euskera y creación de un corpus (2020)
Procesamiento del Lenguaje Natural, no. 64, March 2020, pp. 131-134 URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6206 ISSN: 1989-7553
Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau
Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)
Language Resources and Evaluation Conference (LREC 2020)
Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)
Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze
Sintaktikoki etiketatutako euskarazko corpus historikoa eraikitzen (2020)
Fontes Linguae Vasconum 50 urte. Ekarpen berriak euskararen ikerketari. Nuevas aportaciones al estudio de la lengua vasca
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze
Dealing with dialectal variation in the construction of the Basque historical corpus (2020)
Proceedings of the 7th Workshop on NLP for similar languages, varieties and dialects (VarDial2020 at COLING 2020).
Gorka Urbizu, Ander Soraluze, Olatz Arregi
Sequence to Sequence Coreference Resolution (2020)
Proceedings of the 3rd Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2020), pages 39–46, Barcelona, Spain (online), December 12, 2020.
Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
DoQA - Accessing Domain-Specific FAQs via Conversational QA (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7302–7314
Itziar Gonzalez-Dios
Data statement of the Corpus of Basque Simplified Texts (2020)
Data Statements workshop
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli
The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases (2020)
In Johanna Monti, Felice Dell'Orletta and Fabio Tamburini (eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics. Associazione Italiana di Linguistica Computazionale. Bologna, Italy, 2020.
Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille
LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)
Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71
Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernaez, Eva Navas; Reviewers: Annika Grützner-Zahn, Maria Giagkou; Editors: Maria Giagkou, Stelios Piperidis, Georg Rehm, Jane Dunne
Report on the Basque Language. European Language Equality (2020)
Deliverables of the Project ELE (European Language Equality). D1.4 Report on the Basque Language, https://european-language-equality.eu/deliverables/
Jon Alkorta, Koldo Gojenola, Mikel Iruskieta
SentiTegi: building a semantic oriented Basque lexicon (2019)
Computación y Sistemas, 22 (4)
Igone Zabala
The elaboration of Basque in academic and professional domains. (2019)
In Grenoble, Lenore; Lane, Pia & Røyneland, Unn (eds.) Linguistic Minorities in Europe Online. De Gruyter Mouton. ISSN 2510-5361
Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta
Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool (2019)
PLoS ONE 14(9): e0221639
Ander Soraluze, Olatz Arregi, Xabier Arregi, Arantza Diaz de Ilarraza
EUSKOR: End-to-end coreference resolution system for Basque (2019)
PLoS ONE 14(9): e0221801. https://doi.org/10.1371/journal.pone.0221801
Ainara Estarrona, Izaskun Etxeberria, Ander Soraluze, Manuel Padilla-Moyano
Spelling Normalisation of Basque Historical Texts (2019)
Procesamiento del Lenguaje Natural, vol. 63, pp. 59-66
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis (2019)
Proceedings of the Tenth Global Wordnet Conference, pp 197--205. ISBN 978-83-7493-108-3
Itziar Gonzalez-Dios, German Rigau
Textual genre based approach to use wordnets in language-for-specific-purpose classroom as dictionary (2019)
Proceedings of the Tenth Global Wordnet Conference, pp 222--227. ISBN 978-83-7493-108-3
Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
Conversational QA for FAQs (2019)
NeurIPS 3rd Conversational AI Workshop: “Today's Practice and Tomorrow's Potential”
Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga
Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic? (2019)
Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga (2019) 'Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic?' Wikimedia+Education Conference, Donostia 2019
Begoña Altuna, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza
Adapting TimeML to Basque: Event Annotation (2018)
In Gelbukh A. (eds.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science (LNCS, vol 9624), 565-577. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-75487-1_43 ; Print ISBN 978-3-319-75486-4; Online ISBN 978-3-319-75487-1
Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Konbitzul: an MWE-specific Database for Spanish-Basque (2018)
Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan, pages 2500-2504.
Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria
Verbal Multiword Expressions in Basque corpora (2018)
In the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (at COLING 2018)
Igone Zabala
Euskararen lantze funtzionala eta profesionalen komunikazio-gaitasunen garapena osasun-alorrean (2018)
BAT Soziolinguistika Aldizkaria 108, 2018 (3): 11-34
Igone Zabala
Euskararen terminologiaren garapena Terminologiaren Teoria Komunikatiboaren argitan (2018)
In Ruben Urizar and Itziar Aduriz (eds.) Hizkuntzalari Euskaldunen III Topaketa. Zer berri?. 349-358.
Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza and Ines Garzia-Azkoaga
Coreferential Relations in Basque: The Annotation Process (2018)
J Psycholinguist Res (2018) 47, Issue 2. Pages 325-342. https://doi.org/10.1007/s10936-018-9559-6. ISSN 0090-6905. Online ISSN 1573-6555.
Izaskun Aldezabal, Xabier Artola, Arantza Diaz De Ilarraza, Itziar Gonzalez-Dios, Gorka Labaka, German Rigau and Ruben Urizar
Basque e-lexicographic resources: linguistic basis, development, and future perspectives (2018)
Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence. https://lexdhai.insight-centre.org/Lex_DH__AI_2018_paper_5.pdf
Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Arantza Díaz de Ilarraza, Itziar Gonzalez-Dios, Ruben Urizar
Building the Gold Standard for the Surface Syntax of Basque (2017)
Procesamiento del Lenguaje Natural, 58, 125-132. Retrieved from http://ixa.si.ehu.es/sites/default/files/dokumentuak/8825/5421-4766-1-PB.pdf (print ISSN: 1135-5948) (online ISSN: 1989-7553)
Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola
Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)
Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak
Zabala I., San Martin I., Lersundi M.
Learning terminology in order to become an active agent in the development of Basque biomedical registers (2016)
Language Learning in Higher Education. Journal of CercleS (European Confederation of Language Centres in Higher Education). De Gruyter Mouton. Volume 6, Issue 1 (May 2016). Special issue: Teaching Medical Discourse in Higher Education. ISSN (Online) 2191-6128, ISSN (Print) 2191-611X, DOI: 10.1515/cercles-2016-0007 URL: http://www.degruyter.com/view/j/cercles.2016.6.issue-1/cercles-2016-0007/cercles-2016-0007.xml
Arantxa Otegi, Nora Aranberri, António Branco, Jan Hajic, Steven Neale, Petya Osenova, Rita Pereira, Martin Popel, Joao Silva, Kiril Simov, Eneko Agirre
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (2016)
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1
LINGUATEC IA, a project to advance the digitalisation of Aragonese, Basque, Catalan and Occitan through artificial intelligence
(2024 - 2026)- DeepMinor: Language Models for Multilingual and Multidomain Text Processing in Low Resource Scenarios
Language Models for Multilingual and Multidomain Text Processing in Low Resource Scenarios
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
(2023 - 2025)- ICL4LANG: In-context learning as a new paradigm for researching scalable, high-precision language technologies adapted to the industrial needs of the Basque Country
(2023 - 2025)
Research on Language Technology to foster the presence of Basque in the digital landscape.
(2023 - 2025)
(2023 - 2025)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2024 - 2025)
Language In The Human-Machine Era (LITHME). COST Action number: CA19102.
(2020 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGeneration EU/PRTR.
(2022 - 2024)
Strategic network for the integration into the European research infrastructures in Social Sciences and Humanities.
(2023 - 2024)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2023 - 2024)- LUTEST: LANGUAGE UNDERSTANDING TEST SETS
(2020 - 2023)
Study of lexical combinations in Basque based on a novice academic corpus for an Academic Texts Writing Aid
(2020 - 2023)
Trustworthy AI - Integrating Learning, Optimisation and Reasoning
(2020 - 2023)
(2023 - 2023)
European Language Equality
(2021 - 2022)
enetCollect: A New European Network for combining Language Learning with Crowdsourcing Techniques
(2017 - 2021)
Strategic network for the promotion of language technology infrastructures in e-humanities and social sciences
(2020 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021)- CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019) - DL4NLP: Deep Learning applied to Natural Language Processing in support of the RIS3 areas
(2019 - 2019)
(2011 - 2011)
Janire Arana, Mikel Idoyaga, Maitane Urruela, Elisa Espina, Aitziber Atutxa, Koldo Gojenola
A Virtual Patient Dialogue System Based on Question-Answering on Clinical Records (2024)
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino
Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.
A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)
Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).
Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa
XNLIeu: a dataset for cross-lingual NLI in Basque (2024)
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4177–4188, Mexico City, Mexico. Association for Computational Linguistics.
Julen Etxaniz, Oscar Sainz, Naiara Perez Miguel, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
Latxa: An Open Language Model and Evaluation Suite for Basque (2024)
Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024)
Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri
Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2024)
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2132–2141
Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero
A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models (2024)
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024), pages 35–44
Iñigo Alonso, Maite Oronoz, Rodrigo Agerri
MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering (2024)
Artificial Intelligence in Medicine, 2024.
Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea
EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)
Procesamiento del Lenguaje Natural, Revista no 73, septiembre de 2024, pp. 125-137
Angelina McMillan-Major, Francesco De Toni, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Daniel van Strien, Zeerak Talat, Yacine Jernite
Documenting Geographically and Contextually Diverse Language Data Sources (2024)
Northern European Journal of Language Technology, 10(1), 2024. ISSN 2000-1533. https://doi.org/10.3384/nejlt.2000-1533.2024.5217
Ainara Estarrona, Izaskun Etxeberria, Manuel Padilla-Moyano, Ander Soraluze
Measuring language distance for historical texts in Basque (2023)
Procesamiento del Lenguaje Natural, Revista no 70, marzo del 2023, pp. 53-61
Igone Zabala
Euskararen erregistro akademikoen garapenaz: hiztegia eta fraseologia (2023)
Lindemann David (ed.) Miren Azkarateri esker onez. Bilbo: UPV/EHUko Argitalpen Zerbitzua: 313-332
Itziar Aduriz, Manex Agirrezabal, Eneko Agirre, Iñaki Alegria, Xabier Arregi, Jose Mari Arriola, Xabier Artola, Arantza Díaz de Ilarraza, Ainara Estarrona, Izaskun Etxeberria, Nerea Ezeiza, Kepa Sarasola
Morfologia Konputazionala Euskaraz, 35 urte (2023)
Lindemann, D. (ed.). Miren Azkarateri esker onez, 15-30. UPV/EHU Argitalpen zerbitzua. Bilbo.
Izaskun Aldezabal, María Jesús Aranzabe
Euskararen eredutik hizkuntza-ereduen euskarara (2023)
David Lindemann (ed.), Miren Azkarateri esker onez, 57-75. Bilbo: UPV/EHUko Argitalpen Zerbitzua
Kepa Sarasola, Itziar Aldabe, Nora Aranberri
Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)
Special issue of 'De Europa' journal "Linguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence" pp. 93-107. Turin, 2023.
Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernáez, Eva Navas
Language Report Basque (2023)
Sarasola, K., I. Aldabe, A. Diaz de Ilarraza, A. Estarrona, A. Farwell, I. Hernáez, E. Navas (2023). Language Report Basque. In: Rehm, G., Way, A. (eds) European Language Equality. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-28819-7_5
Igone Zabala, María Jesús Aranzabe
HARTA/TAILA: Herramienta de ayuda a la enseñanza-aprendizaje de la fraseología académica del euskera basada en un corpus de trabajos académicos (2023)
Genres and Languages in Digital Environments: Trends and New Directions (Book of abstracts), page 75. Joint 21st AELFE-LSPPC7 Conference. Zaragoza, 28-30th June 2023.
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel
Easy-to-Read Language Resources and Tools for three European Languages (2023)
Madina, M., Gonzalez-Dios, I., & Siegel, M. (2023, July). Easy-to-Read Language Resources and Tools for three European Languages. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments (pp. 693-699).
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel
Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena (2023)
Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel (2023) Easy-to-Read Language: baliabide linguistikoen eta testuen egokitzapena eta tresna automatikoen garapena. V. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma: Giza Zientziak eta Artea, 35-42.
Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison
Identifying Token-Level Dialectal Features in Social Media (2023)
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Iker García, Rodrigo Agerri, German Rigau
T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Iker García, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models (2023)
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Oscar Sainz, Jon Ander Campos, Iker García, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (2023)
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Towards Effective Correction Methods Using WordNet Meronymy Relations (2023)
Álvez, J., Gonzalez-Dios, I., & Rigau, G. (2023, January). Towards Effective Correction Methods Using WordNet Meronymy Relations. In Proceedings of the 12th Global Wordnet Conference (pp. 31-40).
Kuzman, Taja ; Ljubešić, Nikola ; Erjavec, Tomaž ; Kopp, Matyáš ; Ogrodniczuk, Maciej ; Osenova, Petya ; Rayson, Paul ; Vidler, John ; Agerri, Rodrigo ; Agirrezabal, Manex ; Agnoloni, Tommaso ; Aires, José ; Albini, Monica ; Alkorta, Jon ; Antiba-Cartazo, Iván ; Arrieta, Ekain ; Barcala, Mario ; Bardanca, Daniel ; Barkarson, Starkaður ; Bartolini, Roberto ; Battistoni, Roberto ; Bel, Nuria ; Bonet Ramos, Maria del Mar ; Calzada Pérez, María ; Cardoso, Aida ; Çöltekin, Çağrı ; Coole, Matthew ; Darģis, Rober
Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0 (2023)
Slovenian language resource repository CLARIN.SI
María Jesús Aranzabe, Igone Zabala, Izaskun Aldezabal
Goi-mailako testu akademikoak lantzeko baliabideak eta tresnak (2023)
2nd CLARIAH-EUS workshop: Europako ikerketa azpiegiturekin lotuta egongo den euskararako ikerketa azpiegitura eraikitzen. Donostia, 23 November 2023. (Poster presented at the workshop)
Blanca Calvo Figueras, Irene Bausells, Tommaso Caselli
Dynamic Stance: Modeling Discussions by Labeling the Interactions (2023)
Findings of the Association for Computational Linguistics: EMNLP 2023
Izaskun Aldezabal, Jose Mari Arriola, Arantxa Otegi
TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively (2022)
Proceedings of the 13th Language Resources and Evaluation Conference. Editors: Nicoletta Calzolari (Conference chair), Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Gonzalez-Dios, Itziar and Altuna, Begoña
Natural Language Processing and Language Technologies for the Basque Language (2022)
Gonzalez-Dios, Itziar and Altuna, Begoña (2022). Natural Language Processing and Language Technologies for the Basque Language. In Cuadernos Europeos de Deusto. NÚMERO ESPECIAL. Linguas minoritarias e futuro de Europa. Minority Languages and the Future of Europe 26, 203-230. https://doi.org/10.18543/ced.2477 https://ced.revistas.deusto.es/issue/view/285
María Jesús Aranzabe, Antton Gurrutxaga, Igone Zabala
Compilación del corpus académico de noveles en euskera HARTAeus y su explotación para el estudio de la fraseología académica (2022)
Procesamiento del Lenguaje Natural, Revista no 69, septiembre de 2022, pp. 95-103
María Jesús Aranzabe, Izaskun Aldezabal, Igone Zabala
Recursos y Herramientas de Lingüística de Corpus y PLN para la Monitorización e Investigación de los Usos Académicos del Euskera (2022)
3rd INTELE workshop (Infraestructura de Tecnologías del Lenguaje). Madrid, 13-14 September. (Poster presented at the workshop)
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli
European Clinical Case Corpus (2022)
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli (2022). European Clinical Case Corpus. Georg Rehm ed. European Language Grid, A Language Technology Platform for Multilingual Europe. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-031-17258-8
Petter Mæhlum, Andre Kåsen, Samia Touileb, and Jeremy Barnes.
Annotating Norwegian language varieties on Twitter for Part-of-speech. (2022)
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
Itziar Glez Dios, Aitor Soroa, Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco de Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, et al.
The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset (2022)
2022. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Soroa, A., Gonzalez-Dios, I,... & Manica, M.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022)
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., ... & Manica, M. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.
Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri
BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)
Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.
Margarita Alonso Ramos, Igone Zabala
HARTAes-vas: Lexical combinations for an academic writing aid tool in Spanish and Basque (2022)
SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain.
Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa
Does Corpus Quality Really Matter for Low-Resource Languages? (2022)
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7383–7390.
Elisa Sanchez-Bayona, Rodrigo Agerri
Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection (2022)
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 228--240, Abu Dhabi, United Arab Emirates, Association for Computational Linguistics.
Elisa Sanchez-Bayona, Rodrigo Agerri
From Automatic Metaphor Processing in Spanish to a Multilingual Perspective: Annotation, Systems, and Evaluation (2022)
Doctoral Symposium on Natural Language Processing from the PLN.net network 2022 (RED2018-102418-T), 21-23 September 2022, A Coruña, Spain.
Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri and Aitor Soroa
BasqueGLUE: A Natural Language Understanding Benchmark for Basque (2022)
LREC 2022
Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, Aitor Soroa
IrekiaLF_es: a new open benchmark and baseline systems for Spanish Automatic Text Simplification (2022)
Gonzalez-Dios, I., Gutiérrez-Fandiño, I., Cumbicus-Pineda, O. M., & Soroa, A. (2022, December). IrekiaLFes: a new open benchmark and baseline systems for Spanish automatic text simplification. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) (pp. 86-97).
Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre
PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation (2022)
Findings of the Association for Computational Linguistics: EMNLP 2022
Cecilia Domingo, Tatiana Gonzalez-Ferrero, Itziar Gonzalez-Dios
What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus (2021)
Domingo, C., Gonzalez-Ferrero, T., & Gonzalez-Dios, I. (2021, January). What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus. In Proceedings of the 11th Global Wordnet Conference (pp. 234-242).
Itziar Gonzalez-Dios, Uxoa Iñurrieta, Igone Zabala
General and Specialised Corpora to Raise Linguistic Awareness in a Language Undergoing the Normalisation Process: Academic Writing in Basque (2021)
Gonzalez Dios, I.; Iñurrieta, U.; Zabala, I. General and specialised corpora to raise linguistic awareness in a language undergoing the normalisation process: academic writing in Basque. A: AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). "Multilingual academic and professional communication in a networked world. Proceedings of AELFE-TAPP 2021 (19th AELFE Conference, 2nd TAPP Conference). Vilanova i la Geltrú (Barcelona), 7-9 July 2021". Vilanova i la Geltrú: Universitat Politècnica de Catalunya, 2021, ISBN 978-84-9880-943-5.
Igone Zabala
Euskararen lantze funtzionala esparru akademiko eta profesionaletan (2021)
In Grenoble, Lenore / Lane, Pia / Røyneland, Unn (eds.) Ivan Igartua & Lourdes Oñederra (Basque eds.) Linguistic Minorities in Europe Online. A Born-Digital, Multimodal, Peer-Reviewed Online Reference Resource. De Gruyter Mouton
Igone Zabala
Euskaltzaindiaren Hiztegiaren ekarpena lexiko espezializatuaren eta ez-espezializatuaren harmonizazioan (2021)
In Andres Urrutia (ed.) Arantzazutik mundu zabalera. Euskararen normatibizazioa: 1968-2018. IKER 40. Euskaltzaindia-Iberoamericana Vervuert: 285-299
Igone Zabala, Izaskun Aldezabal, Maria Jesus Aranzabe
Academic Research Works and Domain Dinamics: Resources and Tools for Basque Academic Writing (2021)
18th International Conference on Minority Languages (Bilbao, 2021/03/24-26)
Jon Alkorta
Hacia el análisis de sentimientos en euskera (2021)
J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.
Jon Alkorta, Koldo Gojenola, Mikel Iruskieta
Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan (2021)
Alkorta, J., Gojenola, K. and Iruskieta, M. 2021. Ezeztapena identifikatzeko Murriztapen Gramatikako erregelak sentimenduen analisiaren testuinguruan. IV. IKERGAZTE NAZIOARTEKO IKERKETA EUSKARAZ Kongresuko artikulu-bilduma. Editors: Olatz Arbelaitz, Ainhoa Latatu, Miren Josu Omaetxebarria, Blanca Urgell. Bilbo: UEU, pp. 169-176.
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Language and Technology in Wales: Volume I (2021)
Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)
Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6
Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau
Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets (2021)
Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau (2021) Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets. arXiv:2107.00333
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze
The First Annotated Corpus of Historical Basque (2021)
Digital Scholarship in the Humanities, vol. 37(2), pp. 391-404
Igone Zabala, María Jesús Aranzabe, Izaskun Aldezabal
Retos actuales del desarrollo y aprendizaje de los registros académicos orales y escritos del euskera (2021)
Círculo de Lingüística Aplicada a la Comunicación 88, pp. 31-50
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli
The E3C Project: European Clinical Case Corpus (2021)
Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf
Ainara Estarrona, Izaskun Aldezabal, Arantza Díaz de Ilarraza
How the corpus-based Basque Verb Index lexicon was built (2020)
Language Resources and Evaluation. First Online 05 December 2018. DOI: https://doi.org/10.1007/s10579-018-9440-0. Springer Netherlands
Piroska Lendvai, Sándor Darányi, Christian Geng, Moniek Kuijpers, Oier Lopez de Lacalle, Jean-Christophe Mensonides, Simone Rebora and Uwe Reichel
Detection of Reading Absorption in User-Generated Book Reviews: Resources Creation and Evaluation (2020)
Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France
Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre
Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque (2020)
Proceedings of The 12th Language Resources and Evaluation Conference, pp. 429–435. European Language Resources Association. ISBN: 979-10-95546-34-4
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Towards Word Sense Disambiguation by Reasoning (2020)
Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340
Uxoa Iñurrieta
Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)
Procesamiento del Lenguaje Natural, 64, pp. 123-126.
Kepa Bengoetxea, Itziar Gonzalez-Dios, Amaia Aguirregoitia
AzterTest: Open source linguistic and stylistic analysis tool (2020)
Procesamiento del Lenguaje Natural, 64, 61-68. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6196
Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre
Give your Text Representation Models some Love: the Case for Basque (2020)
Proceedings of LREC. Also available at arxiv https://arxiv.org/pdf/2004.00033.pdf
Itziar Gonzalez-Dios, Javier Álvez, German Rigau
Towards modeling SUMO attributes through WordNet adjectives: a Case Study on Qualities. (2020)
Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 1–6. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf
Jon Alkorta, Itziar Gonzalez-Dios
Exploring the Enrichment of Basque WordNet with a Sentiment Lexicon (2020)
Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 20–24. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf
Thierry Declerck, Itziar Gonzalez-Dios, German Rigau (editors)
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMWN-2020) (2020)
European Language Resources Association (ELRA), Paris. https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf ISBN: 979-10-95546-41-2 EAN: 9791095546412
Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza
EusTimeML: A mark-up language for temporal information in Basque (2020)
Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06
Begoña Altuna
Análisis de estructuras temporales en euskera y creación de un corpus (2020)
Procesamiento del Lenguaje Natural, Revista no 64, marzo de 2020, pp. 131-134 URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6206 ISSN: 1989-7553
Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau
Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)
Language Resources and Evaluation Conference (LREC 2020)
Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)
Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze
Sintaktikoki etiketatutako euskarazko corpus historikoa eraikitzen (2020)
Fontes Linguae Vasconum 50 urte. Ekarpen berriak euskararen ikerketari. Nuevas aportaciones al estudio de la lengua vasca
Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze
Dealing with dialectal variation in the construction of the Basque historical corpus (2020)
Proceedings of the 7th Workshop on NLP for similar languages, varieties and dialects (VarDial2020 at COLING 2020).
Gorka Urbizu, Ander Soraluze, Olatz Arregi
Sequence to Sequence Coreference Resolution (2020)
Proceedings of the 3rd Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2020), pages 39–46, Barcelona, Spain (online), December 12, 2020.
Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
DoQA - Accessing Domain-Specific FAQs via Conversational QA (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7302–7314
Itziar Gonzalez-Dios
Data statement of the Corpus of Basque Simplified Texts (2020)
Data Statements workshop
Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli
The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases (2020)
In Johanna Monti, Felice Dell'Orletta and Fabio Tamburini (eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics. Associazione Italiana di Linguistica Computazionale. Bologna, Italy, 2020.
Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille
LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)
Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71
Kepa Sarasola, Itziar Aldabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Aritz Farwell, Inma Hernaez, Eva Navas; Reviewers: Annika Grützner-Zahn, Maria Giagkou; Editors: Maria Giagkou, Stelios Piperidis, Georg Rehm, Jane Dunne
Report on the Basque Language. European Language Equality (2020)
Deliverables of the Project ELE (European Language Equality). D1.4 Report on the Basque Language, https://european-language-equality.eu/deliverables/
Jon Alkorta, Koldo Gojenola, Mikel Iruskieta
SentiTegi: building a semantic oriented Basque lexicon (2019)
Computación y Sistemas, 22 (4)
Igone Zabala
The elaboration of Basque in academic and professional domains. (2019)
In Grenoble, Lenore; Lane, Pia & Røyneland, Unn (eds.) Linguistic Minorities in Europe Online. De Gruyter Mouton. ISSN 2510-5361
Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta
Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool (2019)
PLoS ONE 14(9): e0221639
Ander Soraluze, Olatz Arregi, Xabier Arregi, Arantza Diaz de Ilarraza
EUSKOR: End-to-end coreference resolution system for Basque (2019)
PLoS ONE 14(9): e0221801. https://doi.org/10.1371/journal.pone.0221801
Ainara Estarrona, Izaskun Etxeberria, Ander Soraluze, Manuel Padilla-Moyano
Spelling Normalisation of Basque Historical Texts (2019)
Procesamiento del Lenguaje Natural, vol. 63, pp. 59-66
Javier Álvez, Itziar Gonzalez-Dios, German Rigau
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis (2019)
Proceedings of the Tenth Global Wordnet Conference, pp. 197-205. ISBN 978-83-7493-108-3
Itziar Gonzalez-Dios, German Rigau
Textual genre based approach to use wordnets in language-for-specific-purpose classroom as dictionary (2019)
Proceedings of the Tenth Global Wordnet Conference, pp. 222-227. ISBN 978-83-7493-108-3
Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
Conversational QA for FAQs (2019)
NeurIPS 3rd Conversational AI Workshop: “Today's Practice and Tomorrow's Potential”
Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga
Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic? (2019)
Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga (2019) 'Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic?' Wikimedia+Education Conference, Donostia 2019
Begoña Altuna, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza
Adapting TimeML to Basque: Event Annotation (2018)
In Gelbukh A. (eds.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science (LNCS, vol 9624), 565-577. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-75487-1_43 ; Print ISBN 978-3-319-75486-4; Online ISBN 978-3-319-75487-1
Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Konbitzul: an MWE-specific Database for Spanish-Basque (2018)
Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan, pages 2500-2504.
Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria
Verbal Multiword Expressions in Basque corpora (2018)
In the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (at COLING 2018)
Igone Zabala
Euskararen lantze funtzionala eta profesionalen komunikazio-gaitasunen garapena osasun-alorrean (2018)
BAT Soziolinguistika Aldizkaria 108, 2018 (3): 11-34
Igone Zabala
Euskararen terminologiaren garapena Terminologiaren Teoria Komunikatiboaren argitan (2018)
In Ruben Urizar and Itziar Aduriz (eds.) Hizkuntzalari Euskaldunen III Topaketa. Zer berri?. 349-358.
Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza and Ines Garzia-Azkoaga
Coreferential Relations in Basque: The Annotation Process (2018)
J Psycholinguist Res (2018) 47, Issue 2. Pages 325-342. https://doi.org/10.1007/s10936-018-9559-6. ISSN 0090-6905. Online ISSN 1573-6555.
Izaskun Aldezabal, Xabier Artola, Arantza Diaz De Ilarraza, Itziar Gonzalez-Dios, Gorka Labaka, German Rigau and Ruben Urizar
Basque e-lexicographic resources: linguistic basis, development, and future perspectives (2018)
Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence. https://lexdhai.insight-centre.org/Lex_DH__AI_2018_paper_5.pdf
Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Arantza Díaz de Ilarraza, Itziar Gonzalez-Dios, Ruben Urizar
Building the Gold Standard for the Surface Syntax of Basque (2017)
Procesamiento del Lenguaje Natural, 58, 125-132. Retrieved from http://ixa.si.ehu.es/sites/default/files/dokumentuak/8825/5421-4766-1-PB.pdf (print ISSN: 1135-5948) (online ISSN: 1989-7553)
Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola
Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)
Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak
Zabala I., San Martin I., Lersundi M.
Learning terminology in order to become an active agent in the development of Basque biomedical registers (2016)
Language Learning in Higher Education. Journal of CercleS (European Confederation of Language Centres in Higher Education). De Gruyter Mouton. Volume 6, Issue 1 (May 2016). Special issue: Teaching Medical Discourse in Higher Education. ISSN (Online) 2191-6128, ISSN (Print) 2191-611X, DOI: 10.1515/cercles-2016-0007 URL: http://www.degruyter.com/view/j/cercles.2016.6.issue-1/cercles-2016-0007/cercles-2016-0007.xml
Arantxa Otegi, Nora Aranberri, António Branco, Jan Hajic, Steven Neale, Petya Osenova, Rita Pereira, Martin Popel, Joao Silva, Kiril Simov, Eneko Agirre
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (2016)
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1