Machine Translation
We started researching Machine Translation in 2000 and have followed the successive paradigms in the field: first rule-based MT (RBMT), then statistical MT (SMT), and currently neural MT (NMT). We have focused mainly on translation from and into Basque since, besides its commercial interest in our country, it poses an important challenge for several reasons: the complexity of Basque morphology, the free order of sentence constituents, and the scarcity of resources. The results have been very good, and the tools developed are being...
Demos
Elia
Neural machine translation for Basque, Spanish and English (developed in collaboration with Elhuyar)
NMT itzultzailea (2018)
Spanish-Basque neural machine translation (developed in the TADEEP project)
Matxin
First machine translation system for Basque (outdated)
Contracts
Development and deployment of a machine translation system for adapting clinical texts from and into Basque
(2019 - 2021)
MultiNMT: Customer-oriented multidirectional neural machine translation.
(2019 - 2019)
Projects
(2023 - 2026)
LINGUATEC IA: a project to advance the digitalization of Aragonese, Basque, Catalan and Occitan through artificial intelligence
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
TRAIN (PID2021-123988OB-C31) project funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”
(2022 - 2025)
(2023 - 2025)
SignON - Sign Language Translation Mobile Application and Open Communications Framework
(2021 - 2023)
DOMINO: Neural Machine Translation, in DOMaIn, and NO supervised
(2019 - 2021)
MT4All: Unsupervised MT for low-resourced language pairs
(2020 - 2021)
Building Neural Machine Translation methods and systems to improve coherence at paragraph and document level
(2020 - 2021)
LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies.
(2018 - 2020)
UnsupNMT: Unsupervised Neural Machine Translation: a new paradigm based only on monolingual text
(2018 - 2020)
MODENA: Advanced neural modeling for high-quality translation.
(2018 - 2019)
TADEEP: Deep Machine Translation
(2016 - 2018)
MODELA: Statistical Modeling and Deep Learning for High Quality Machine Translation
(2016 - 2017)
QTLeap: Quality Translation by Deep Language Engineering Approaches
(2013 - 2016)
Patents
Publications
Nora Aranberri, Uxoa Iñurrieta
When minoritized languages encounter MT: perceptions and expectations of the Basque community (2024)
Aranberri, N., & Iñurrieta, U. (2024). When minoritized languages encounter MT: perceptions and expectations of the Basque community. The Journal of Specialised Translation, (41), 179-205. Available at: https://www.jostrans.org/article/view/4718/4237
Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lacalle, Mikel Artetxe
Do Multilingual Language Models Think Better in English? (2024)
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 550–564, Mexico City, Mexico. Association for Computational Linguistics.
Nora Aranberri
Analysis of the Annotations from a Crowd MT Evaluation Initiative: Case Study for the Spanish-Basque Pair (2024)
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 548–559, June 24–27, 2024.
Júlia Falcão, Claudia Borg, Nora Aranberri, and Kurt Abela
COMET for Low-Resource Machine Translation Evaluation: A Case Study of English-Maltese and Spanish-Basque (2024)
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3553–3565, Torino, Italia. ELRA and ICCL.
Adrián Núñez-Marcos, Olatz Perez-de-Viñaspre, Gorka Labaka
A survey on Sign Language machine translation (2023)
Expert Systems with Applications, Volume 213, part B. URL: https://doi.org/10.1016/j.eswa.2022.118993 ISSN: 0957-4174
Celia Soler Uguet, Nora Aranberri
Exploring politeness control in NMT: fine-tuned vs. multi-register models in Castilian Spanish (2023)
Revista Procesamiento del Lenguaje Natural, 70, pp. 199-212.
Kepa Sarasola, Itziar Aldabe, Nora Aranberri
Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)
Special issue of 'De Europa' journal "Linguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence", pp. 93-107. Turin, 2023.
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka
Targeted Data Augmentation Improves Context-aware Neural Machine Translation (2023)
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. Targeted Data Augmentation Improves Context-aware Neural Machine Translation. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 298–312, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka
What Works When in Context-aware Neural Machine Translation? (2023)
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. What Works When in Context-aware Neural Machine Translation?. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 147–156, Tampere, Finland. European Association for Machine Translation.
Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núñez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion
SignON: Sign Language Translation. Progress and challenges (2023)
Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núñez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion. 2023. SignON: Sign Language Translation. Progress and challenges. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 501–502, Tampere, Finland. European Association for Machine Translation.
Xabier Soto, Olatz Pérez-de-Viñaspre, Maite Oronoz, Gorka Labaka
Development of a Machine Translation system for promoting the use of a low resource language in the clinical domain: the case of Basque (2022)
Chapter 7 in Natural Language Processing in Healthcare: A Special Focus on Low Resource Languages. Routledge, Taylor & Francis Group.
Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Principled Paraphrase Generation with Parallel Corpora (2022)
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1621-1638
Xabier Soto, Olatz Perez-De-Viñaspre, Gorka Labaka, Maite Oronoz
Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario (2022)
In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 31–40, Ghent, Belgium. European Association for Machine Translation.
Andoni Azpeitia
Datuen Ustiapena Itzulpen Automatikorako [Data Exploitation for Machine Translation] (2022)
-
Ander Salaberria, Jon Ander Campos, Iker García, Joseba Fernandez de Landa
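Itzulpen Automatikoko Sistemen Analisia: Genero Alborapenaren Kasua [Analysis of Machine Translation Systems: The Case of Gender Bias] (2021)
IV. Ikergazte. Nazioarteko ikerketa euskaraz [International research in Basque]. Kongresuko artikulu bilduma [Collection of conference papers]. Ingeniaritza eta Arkitektura [Engineering and Architecture].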
Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring (2021)
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6479–6489
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Language and Technology in Wales: Volume I (2021)
Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)
Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6
Cristina Cumbreño, Nora Aranberri
What Do You Say? Comparison of Metrics for Post-editing Effort (2021)
In: Carl M. (eds) Explorations in Empirical Translation Process Research. Machine Translation: Technologies and Applications, vol 3. Springer, Cham. pp 57-79.
Horacio Saggion, Dimitar Shterionov, Gorka Labaka, Tim Van de Cruys, Vincent Vandeghinste, Josep Blat
SignON: Bridging the gap between Sign and Oral Languages (2021)
Procesamiento del Lenguaje Natural, no. 67, September 2021
Lana Yeganova, Dina Wiemann, Mariana Neves, Federica Vezzani, Amy Siu, Inigo Jauregi Unanue, Maite Oronoz, Nancy Mah, Aurélie Névéol, David Martinez, Rachel Bawden, Giorgio Maria Di Nunzio, Roland Roller, Philippe Thomas, Cristian Grozea, Olatz Perez-de-Viñaspre, Maika Vicente Navarro, and Antonio Jimeno Yepes
Findings of the WMT 2021 Biomedical Translation Shared Task: Summaries of Animal Experiments as New Test Set (2021)
In Proceedings of the Sixth Conference on Machine Translation, pages 664–683, Online. Association for Computational Linguistics.
Uxoa Iñurrieta
Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)
Procesamiento del Lenguaje Natural, 64, pp. 123-126.
Nora Aranberri
Can translationese features help users select an MT system for post-editing? (2020)
Revista Procesamiento del Lenguaje Natural, 64, 93-100.
Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way
Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3898–3908.
Mikel Artetxe, Sebastian Ruder, Dani Yogatama
On the cross-lingual transferability of monolingual representations (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre
A Call for More Rigor in Unsupervised Cross-lingual Learning (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Nora Aranberri
With or without you? Effects of using machine translation to write flash fiction in the foreign language (2020)
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, p. 165–174, Lisboa, Portugal, November 2020.
Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Pages 255-262
Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification (2020)
Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767
Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
Do all roads lead to Rome? Understanding the role of initialization in iterative back-translation (2020)
Knowledge-Based Systems, Volume 206 (online first). Pre-print https://arxiv.org/abs/2002.12867
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Translation Artifacts in Cross-lingual Transfer Learning (2020)
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). (Pages 7674–7684).
Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz
Ixamed's submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation (2020)
Proceedings of the Fifth Conference on Machine Translation, pages 873–878.
Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de-Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann and Lana Yeganova
Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages (2020)
Fifth Conference on Machine Translation (WMT20). Shared Task: Biomedical Translation Task
Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille
LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)
Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71
Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria
Adapting NMT to caption translation in Wikimedia Commons for low-resource languages (2019)
Procesamiento del Lenguaje Natural, no. 63, September 2019, pp. 33-40
Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre
Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.
Xabier Soto, Olatz Perez de Viñaspre, Gorka Labaka, Maite Oronoz
Neural Machine Translation of clinical texts between long distance languages (2019)
JAMIA (Journal of the American Medical Informatics Association), Volume 26, Issue 12, December 2019, Pages 1478–1487, https://doi.org/10.1093/jamia/ocz110
Xabier Soto, Olatz Perez de Viñaspre, Maite Oronoz, Gorka Labaka
Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish (2019)
Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation
Mikel Artetxe, Holger Schwenk
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3197-3203.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
An Effective Approach to Unsupervised Machine Translation (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 194-203.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Bilingual Lexicon Induction through Unsupervised Machine Translation (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5002-5007.
Mikel Artetxe, Holger Schwenk
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (2019)
Transactions of the Association for Computational Linguistics 7 (2019): 597-610.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text (2019)
Procesamiento del Lenguaje Natural 63 (2019): 151-154.
Ona de Gibert, Nora Aranberri
Estrategia multidimensional para la selección de candidatos de traducción automática para posedición [A multidimensional strategy for selecting machine translation candidates for post-editing] (2019)
Linguamática, 11(2), 3-16.
Pablo Gamallo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe
Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora (2019)
Computational Linguistics. First online. DOI: 10.1162/COLI_a_00353. ISSN: 0891-2017.
Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Gorka Labaka, Iñaki Alegria, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Maite Martin and Eusebi Calonge
Neural Machine Translation of Basque (2018)
EAMT 2018, Alicante.
Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Iñaki Alegria, Gorka Labaka, Arantxa Otegi, Kepa Sarasola, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Eusebi Calonge, Maite Martin
QUALES: Estimación Automática de Calidad de Traducción Mediante Aprendizaje Automático Supervisado y No-Supervisado [Automatic translation quality estimation using supervised and unsupervised machine learning] (2018)
Procesamiento del Lenguaje Natural, vol. 61, pp. 143-146. ISSN: 1135-5948
Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola
Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… [Language technology in the era of “Big Data”: search engines, translators…] (2017)
Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak
Nora Aranberri, Gorka Labaka
Euskarazko Itzulpen Automatikoa - IXA Taldea [Machine Translation for Basque - The IXA Group] (2017)
Senez, 48 (2017)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance (2016)
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294. Austin, Texas. ISBN: 978-1-945626-25-8.
Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe
The BerbaTek project for Basque: Promoting a less-resourced language via language technology for translation, content management and learning (2013)
Translation: Computation, Corpora, Cognition (TC3) journal. Vol 3, No 1, pp: 119-135 (2013). Special Issue on Language Technologies for a Multilingual Europe, ISSN: 2193-6986, https://www.researchgate.net/publication/250927257_The_BerbaTek_project_for_Basque_Promoting_a_less-resourced_language_via_language_technology_for_translation_content_management_and_learning
Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe
BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan [BerbaTek: development of language technologies for Basque in translation, content management and teaching] (2013)
Euskalingua (digital journal), 23, 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf
Aingeru Mayor, Iñaki Alegria, Arantza Díaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Kepa Sarasola
Matxin, an open-source rule-based machine translation system for Basque (2011)
Machine Translation Journal: Volume 25, Issue 1 (2011), Page 53-82. ISSN: 0922-6567. DOI: 10.1007/s10590-011-9092-y. http://link.springer.com/content/pdf/10.1007%2Fs10590-011-9092-y.pdf
Alicia Pérez, M. Inés Torres, Francisco Casascuberta
Potential scope of a fully-integrated architecture for speech translation (2010)
Proceedings of the 14th Annual Meeting of the European Association for Machine Translation.
Gorka Labaka, Nicolas Stroppa, Andy Way, Kepa Sarasola
Comparing Rule-Based and Data-Driven Approaches to Spanish-to-Basque Machine Translation (2007)
MT-Summit XI, Copenhagen ISBN: 978-87-90708-16-0; pp.297-304
MT_tabs_full
Elia
Neural Machine translation for Basque, Spanish and English (developed in collaboration with Elhuyar)
NMT itzultzailea (2018)
Spanish Basque Neural Machine translation (developed in the TADEEP project)
Matxin
First machine translation system for Basque (out-dated)
Testu klinikoak euskaratik eta euskarara egokitzeko itzultzaile automatiko baten garapena eta ezartzea
(2019 - 2021)- MultiNMT: Traducción automática neuronal mulltidireccional orientada al cliente.
(2019 - 2019)
(2023 - 2026)
LINGUATEC IA, adimen artifizialaren bidez aragoiera, euskara, katalana eta okzitaniera digitalizatzen aurrera egiteko proiektua
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
TRAIN (PID2021-123988OB-C31) project funded by MCIN/AEI/ 10.13039/501100011033 and by “ERDF A way of making Europe”
(2022 - 2025)
(2023 - 2025)
SignON - Sign Language Translation Mobile Application and Open Communications Framework
(2021 - 2023)
DOMINO: Neural Machine Translation, in DOMaIn, and NO supervised
(2019 - 2021)
MT4All: Unsupervised MT for low-resourced language pairs
(2020 - 2021)
Building Neuronal Machine Translation methods and systems to improve coherence at paragraph and document level
(2020 - 2021)
LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies.
(2018 - 2020)- UnsupNMT: Traducción Automática Neuronal no Supervisada: un nuevo paradigma basado solo en textos monolingües.
UnsupNMT: Unsupervised Neuronal Machine Translation: a new paradigm based only on monolingual text
(2018 - 2020)
MODENA: Advanced neural modeling for high-quality translation.
(2018 - 2019)
TADEEP: Deep Machine Translation
(2016 - 2018)
MODELA: Statistical Modeling and Deep Learning for High Quality Machine Translation
(2016 - 2017)
QTLeap: Quality Translation by Deep Language Engineering Approaches
(2013 - 2016) All HiTZ projects
Nora Aranberri, Uxoa Iñurrieta
When minoritized languages encounter MT: perceptions and expectations of the Basque community (2024)
Aranberri, N., & Iñurrieta, U. (2024). When minoritized languages encounter MT: perceptions and expectations of the Basque community. The Journal of Specialised Translation, (41), 179-205. Available at: https://www.jostrans.org/article/view/4718/4237
Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lacalle, Mikel Artetxe
Do Multilingual Language Models Think Better in English? (2024)
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 550–564, Mexico City, Mexico. Association for Computational Linguistics.
Nora Aranberri
Analysis of the Annotations from a Crowd MT Evaluation Initiative: Case Study for the Spanish-Basque Pair (2024)
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 548–559 June 24-27.
Júlia Falcão, Claudia Borg, Nora Aranberri, and Kurt Abela
COMET for Low-Resource Machine Translation Evaluation: A Case Study of English-Maltese and Spanish-Basque (2024)
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3553–3565, Torino, Italia. ELRA and ICCL.
Adrián Núñez-Marcos, Olatz Perez-de-Viñaspre, Gorka Labaka
A survey on Sign Language machine translation (2023)
Expert Systems with Applications, Volume 213, part B. URL: https://doi.org/10.1016/j.eswa.2022.118993 ISSN: 0957-4174
Celia Soler Uguet, Nora Aranberri
Exploring politeness control in NMT: fine-tuned vs. multi-register models in Castilian Spanish (2023)
Revista Procesamiento del Lenguaje Natural, 70, pp. 199-212.
Kepa Sarasola, Itziar Aldabe, Nora Aranberri
Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)
Special issue of 'De Europa' journal "Llinguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence" pp.93-107. Turin, 2023.
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka
Targeted Data Augmentation Improves Context-aware Neural Machine (2023)
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. Targeted Data Augmentation Improves Context-aware Neural Machine Translation. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 298–312, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka
What Works When in Context-aware Neural Machine Translation? (2023)
Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. What Works When in Context-aware Neural Machine Translation?. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 147–156, Tampere, Finland. European Association for Machine Translation.
Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núnez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion
SignON: Sign Language Translation. Progress and challenges (2023)
Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núnez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion. 2023. SignON: Sign Language Translation. Progress and challenges. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 501–502, Tampere, Finland. European Association for Machine Translation.
Xabier Soto, Olatz Pérez-de-Viñaspre, Maite Oronoz, Gorka Labaka
Development of a Machine Translation system for promoting the use of a low resource language in the clinical domain: the case of Basque. (2022)
Chapter 7 In Natural Language Processing In Healthcare A Special Focus on Low Resource Languages. Routledge, Taylor & Francis Group.
Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Principled Paraphrase Generation with Parallel Corpora (2022)
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1621-1638
Xabier Soto, Olatz Perez-De-Viñaspre, Gorka Labaka, Maite Oronoz
Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario (2022)
In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 31–40, Ghent, Belgium. European Association for Machine Translation.
Andoni Azpeitia
Datuen Ustiapena Itzulpen Automatikorako (2022)
-
Ander Salaberria, Jon Ander Campos, Iker García, Joseba Fernandez de Landa
Itzulpen Automatikoko Sistemen Analisia: Genero Alborapenaren Kasua (2021)
IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura
Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring (2021)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6479–6489
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn ,Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Language and Technology in Wales: Volume I (2021)
Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3
Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn ,Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.
Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)
Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6
Cristina Cumbreño, Nora Aranberri
What Do You Say? Comparison of Metrics for Post-editing Effort (2021)
In: Carl M. (eds) Explorations in Empirical Translation Process Research. Machine Translation: Technologies and Applications, vol 3. Springer, Cham. pp 57-79.
Horacio Saggion, Dimitar Shterionov, Gorka Labaka, Tim Van de Cruys, Vincent Vandeghinste, Josep Blat
SignON: Bidging the gap between and Sign and Oral Languages (2021)
Procesamiento del Lenguaje Natural, Revista no 67, septiembre de 2021
Lana Yeganova, Dina Wiemann, Mariana Neves, Federica Vezzani, Amy Siu, Inigo Jauregi Unanue, Maite Oronoz, Nancy Mah, Aurélie Névéol, David Martinez, Rachel Bawden, Giorgio Maria Di Nunzio, Roland Roller, Philippe Thomas, Cristian Grozea, Olatz Perez-de-Viñaspre, Maika Vicente Navarro, and Antonio Jimeno Yepes
Findings of the WMT 2021 Biomedical Translation Shared Task: Summaries of Animal Experiments as New Test Set (2021)
In Proceedings of the Sixth Conference on Machine Translation, pages 664–683, Online. Association for Computational Linguistics.
Uxoa Iñurrieta
Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)
Procesamiento del Lenguaje Natural, 64, pp. 123-126.
Nora Aranberri
Can translationese features help users select an MT system for post-editing? (2020)
Revista Procesamiento del Lenguaje Natural, 64, 93-100.
Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way
Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp: 3898–3908.
Mikel Artetxe, Sebastian Ruder, Dani Yogatama
On the cross-lingual transferability of monolingual representations (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre
A Call for More Rigor in Unsupervised Cross-lingual Learning (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Nora Aranberri
With or without you? Effects of using machine translation to write flash fiction in the foreign language (2020)
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, p. 165–174, Lisboa, Portugal, November 2020.
Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Pages 255-262
Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)
Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767
Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
Do all roads lead to Rome? Understanding the role of initialization in iterative back-translation (2020)
Knowledge-Based Systems, Volume 206 (online first). Pre-print https://arxiv.org/abs/2002.12867
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Translation Artifacts in Cross-lingual Transfer Learning (2020)
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). (Pages 7674–7684).
Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz
Ixamed's submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation (2020)
Proceedings of the Fifth Conference on Machine Translation, pp: 873–878.
Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de-Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann and Lana Yeganova
Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages (2020)
Fifth Conference on Machine Translation (WMT20). Shared Task: Biomedical Translation Task
Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille
LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)
Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71
Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria
Adapting NMT to caption translation in Wikimedia Commons for low-resource languages (2019)
Procesamiento del Lenguaje Natural, no. 63, September 2019, pp. 33-40
Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre
Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.
Xabier Soto, Olatz Perez de Viñaspre, Gorka Labaka, Maite Oronoz
Neural Machine Translation of clinical texts between long distance languages (2019)
JAMIA (Journal of the American Medical Informatics Association), Volume 26, Issue 12, December 2019, Pages 1478–1487, https://doi.org/10.1093/jamia/ocz110
Xabier Soto, Olatz Perez de Viñaspre, Maite Oronoz, Gorka Labaka
Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish (2019)
Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation
Mikel Artetxe, Holger Schwenk
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3197-3203.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
An Effective Approach to Unsupervised Machine Translation (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 194-203.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Bilingual Lexicon Induction through Unsupervised Machine Translation (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5002-5007.
Mikel Artetxe, Holger Schwenk
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (2019)
Transactions of the Association for Computational Linguistics 7 (2019): 597-610.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text (2019)
Procesamiento del Lenguaje Natural 63 (2019): 151-154.
Ona de Gibert, Nora Aranberri
Estrategia multidimensional para la selección de candidatos de traducción automática para posedición (2019)
Linguamática, 11(2), 3-16.
Pablo Gamallo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe
Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora (2019)
Computational Linguistics. First online. DOI: 10.1162/COLI_a_00353. ISSN: 0891-2017.
Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Gorka Labaka, Iñaki Alegria, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Maite Martin and Eusebi Calonge
Neural Machine Translation of Basque (2018)
EAMT 2018. Alicante.
Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Iñaki Alegria, Gorka Labaka, Arantxa Otegi, Kepa Sarasola, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Eusebi Calonge, Maite Martin
QUALES: Estimación Automática de Calidad de Traducción Mediante Aprendizaje Automático Supervisado y No-Supervisado (2018)
Procesamiento del Lenguaje Natural, vol. 61, pp. 143-146. ISSN: 1135-5948
Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola
Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)
Senez, 48 (2017), pp. 191-200. ISSN: 1132-2152. https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak
Nora Aranberri, Gorka Labaka
Euskarazko Itzulpen Automatikoa - IXA Taldea (2017)
Senez, 48 (2017)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance (2016)
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294. Austin, Texas. ISBN: 978-1-945626-25-8.
Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe
The BerbaTek project for Basque: Promoting a less-resourced language via language technology for translation, content management and learning (2013)
Translation: Computation, Corpora, Cognition (TC3) journal. Vol 3, No 1, pp: 119-135 (2013). Special Issue on Language Technologies for a Multilingual Europe, ISSN: 2193-6986, https://www.researchgate.net/publication/250927257_The_BerbaTek_project_for_Basque_Promoting_a_less-resourced_language_via_language_technology_for_translation_content_management_and_learning
Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe
BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan (2013)
Euskalingua aldizkari digitala, 23, 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf
Aingeru Mayor, Iñaki Alegria, Arantza Díaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Kepa Sarasola
Matxin, an open-source rule-based machine translation system for Basque (2011)
Machine Translation Journal: Volume 25, Issue 1 (2011), Page 53-82. ISSN: 0922-6567. DOI: 10.1007/s10590-011-9092-y. http://link.springer.com/content/pdf/10.1007%2Fs10590-011-9092-y.pdf
Alicia Pérez, M. Inés Torres, Francisco Casascuberta
Potential scope of a fully-integrated architecture for speech translation (2010)
Proceedings of the 14th Annual Meeting of the European Association for Machine Translation.
Gorka Labaka, Nicolas Stroppa, Andy Way, Kepa Sarasola
Comparing Rule-Based and Data-Driven Approaches to Spanish-to-Basque Machine Translation (2007)
MT-Summit XI, Copenhagen. ISBN: 978-87-90708-16-0; pp. 297-304