Machine Translation

We started researching Machine Translation (MT) in 2000 and have followed the paradigms developed in the field: first rule-based MT (RBMT), then statistical MT (SMT), and currently neural MT (NMT). We have focused mainly on translation from and into Basque because, besides its commercial interest in our country, it poses an important challenge for several reasons: the complexity of Basque morphology, the free order of sentence constituents, and the scarcity of resources. The results have been very good, and the tools developed are being...


Demos

Elia

Neural machine translation for Basque, Spanish and English (developed in collaboration with Elhuyar)

NMT itzultzailea (2018)

Spanish-Basque neural machine translation (developed in the TADEEP project)

Matxin

First machine translation system for Basque (outdated)

Contracts

All HiTZ projects.

Projects



  • (2023 - 2026)

  • LINGUATEC IA: project to advance the digitalization of Aragonese, Basque, Catalan and Occitan through artificial intelligence
    (2024 - 2026)

  • The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
    (2024 - 2026)

  • TRAIN (PID2021-123988OB-C31) project funded by MCIN/AEI/ 10.13039/501100011033 and by “ERDF A way of making Europe”
    (2022 - 2025)


  • (2023 - 2025)

  • CLARIAH-EUS-gArA
    (2024 - 2025)

  • SignON - Sign Language Translation Mobile Application and Open Communications Framework
    (2021 - 2023)

  • DOMINO: Neural Machine Translation, in DOMaIn, and NO supervised
    (2019 - 2021)

  • MT4All: Unsupervised MT for low-resourced language pairs
    (2020 - 2021)

  • Building Neural Machine Translation methods and systems to improve coherence at paragraph and document level
    (2020 - 2021)

  • LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies.
    (2018 - 2020)

  • UnsupNMT: Unsupervised Neural Machine Translation: a new paradigm based only on monolingual text
    (2018 - 2020)

  • MODENA: Advanced neural modeling for high-quality translation.
    (2018 - 2019)

  • TADEEP: Deep Machine Translation
    (2016 - 2018)

  • MODELA: Statistical Modeling and Deep Learning for High Quality Machine Translation
    (2016 - 2017)

  • QTLeap: Quality Translation by Deep Language Engineering Approaches
    (2013 - 2016)
  • All HiTZ projects

Patents

Matxin

Machine translation from Spanish to Basque.

EUSMT

Statistical Machine Translation from Spanish

TADEEP

Neural machine translation system for Spanish-English and Spanish-Basque

Publications

Nora Aranberri, Uxoa Iñurrieta

When minoritized languages encounter MT: perceptions and expectations of the Basque community (2024)

Aranberri, N., & Iñurrieta, U. (2024). When minoritized languages encounter MT: perceptions and expectations of the Basque community. The Journal of Specialised Translation, (41), 179-205. Available at: https://www.jostrans.org/article/view/4718/4237

Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lacalle, Mikel Artetxe

Do Multilingual Language Models Think Better in English? (2024)

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 550–564, Mexico City, Mexico. Association for Computational Linguistics.

Nora Aranberri

Analysis of the Annotations from a Crowd MT Evaluation Initiative: Case Study for the Spanish-Basque Pair (2024)

Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 548–559, June 24–27.

Júlia Falcão, Claudia Borg, Nora Aranberri, and Kurt Abela

COMET for Low-Resource Machine Translation Evaluation: A Case Study of English-Maltese and Spanish-Basque (2024)

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3553–3565, Torino, Italia. ELRA and ICCL.

Adrián Núñez-Marcos, Olatz Perez-de-Viñaspre, Gorka Labaka

A survey on Sign Language machine translation (2023)

Expert Systems with Applications, Volume 213, part B. URL: https://doi.org/10.1016/j.eswa.2022.118993 ISSN: 0957-4174

Celia Soler Uguet, Nora Aranberri

Exploring politeness control in NMT: fine-tuned vs. multi-register models in Castilian Spanish (2023)

Revista Procesamiento del Lenguaje Natural, 70, pp. 199-212.

Kepa Sarasola, Itziar Aldabe, Nora Aranberri

Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)

Special issue of the 'De Europa' journal, "Linguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence", pp. 93-107. Turin, 2023.

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka

Targeted Data Augmentation Improves Context-aware Neural Machine Translation (2023)

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. Targeted Data Augmentation Improves Context-aware Neural Machine Translation. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 298–312, Macau SAR, China. Asia-Pacific Association for Machine Translation.

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka

What Works When in Context-aware Neural Machine Translation? (2023)

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. What Works When in Context-aware Neural Machine Translation? In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 147–156, Tampere, Finland. European Association for Machine Translation.

Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núñez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion

SignON: Sign Language Translation. Progress and challenges (2023)

Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núñez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion. 2023. SignON: Sign Language Translation. Progress and challenges. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 501–502, Tampere, Finland. European Association for Machine Translation.

Xabier Soto, Olatz Pérez-de-Viñaspre, Maite Oronoz, Gorka Labaka

Development of a Machine Translation system for promoting the use of a low resource language in the clinical domain: the case of Basque. (2022)

Chapter 7 in Natural Language Processing in Healthcare: A Special Focus on Low Resource Languages. Routledge, Taylor & Francis Group.

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Principled Paraphrase Generation with Parallel Corpora (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1621-1638

Xabier Soto, Olatz Perez-De-Viñaspre, Gorka Labaka, Maite Oronoz

Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario (2022)

In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 31–40, Ghent, Belgium. European Association for Machine Translation.

Andoni Azpeitia

Datuen Ustiapena Itzulpen Automatikorako [Data Exploitation for Machine Translation] (2022)

-

Ander Salaberria, Jon Ander Campos, Iker García, Joseba Fernandez de Landa

Itzulpen Automatikoko Sistemen Analisia: Genero Alborapenaren Kasua [Analysis of Machine Translation Systems: The Case of Gender Bias] (2021)

IV Ikergazte: International research in Basque. Collection of conference papers: Engineering and Architecture.

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring (2021)

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, pages 6479–6489

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Language and Technology in Wales: Volume I (2021)

Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Iaith a Thechnoleg yng Nghymru: Cyfrol 1 [Language and Technology in Wales: Volume 1] (2021)

Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6

Cristina Cumbreño, Nora Aranberri

What Do You Say? Comparison of Metrics for Post-editing Effort (2021)

In: Carl M. (eds) Explorations in Empirical Translation Process Research. Machine Translation: Technologies and Applications, vol 3. Springer, Cham. pp 57-79.

Horacio Saggion, Dimitar Shterionov, Gorka Labaka, Tim Van de Cruys, Vincent Vandeghinste, Josep Blat

SignON: Bridging the Gap between Sign and Oral Languages (2021)

Procesamiento del Lenguaje Natural, Issue 67, September 2021

Lana Yeganova, Dina Wiemann, Mariana Neves, Federica Vezzani, Amy Siu, Inigo Jauregi Unanue, Maite Oronoz, Nancy Mah, Aurélie Névéol, David Martinez, Rachel Bawden, Giorgio Maria Di Nunzio, Roland Roller, Philippe Thomas, Cristian Grozea, Olatz Perez-de-Viñaspre, Maika Vicente Navarro, and Antonio Jimeno Yepes

Findings of the WMT 2021 Biomedical Translation Shared Task: Summaries of Animal Experiments as New Test Set (2021)

In Proceedings of the Sixth Conference on Machine Translation, pages 664–683, Online. Association for Computational Linguistics.

Uxoa Iñurrieta

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)

Procesamiento del Lenguaje Natural, 64, pp. 123-126.

Nora Aranberri

Can translationese features help users select an MT system for post-editing? (2020)

Revista Procesamiento del Lenguaje Natural, 64, 93-100.

Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way

Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp: 3898–3908.

Mikel Artetxe, Sebastian Ruder, Dani Yogatama

On the cross-lingual transferability of monolingual representations (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

A Call for More Rigor in Unsupervised Cross-lingual Learning (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Nora Aranberri

With or without you? Effects of using machine translation to write flash fiction in the foreign language (2020)

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, p. 165–174, Lisboa, Portugal, November 2020.

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Pages 255-262

Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)

Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767

Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre

Do all roads lead to Rome? Understanding the role of initialization in iterative back-translation (2020)

Knowledge-Based Systems, Volume 206 (online first). Pre-print https://arxiv.org/abs/2002.12867

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Translation Artifacts in Cross-lingual Transfer Learning (2020)

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). (Pages 7674–7684).

Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz

Ixamed's submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation (2020)

Proceedings of the Fifth Conference on Machine Translation, pages 873–878.

Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de-Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann and Lana Yeganova

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages (2020)

Fifth Conference on Machine Translation (WMT20). Shared Task: Biomedical Translation Task

Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille

LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)

Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71

Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria

Adapting NMT to caption translation in Wikimedia Commons for low-resource languages (2019)

Procesamiento del Lenguaje Natural, Issue 63, September 2019, pp. 33-40

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Xabier Soto, Olatz Perez de Viñaspre, Gorka Labaka, Maite Oronoz

Neural Machine Translation of clinical texts between long distance languages (2019)

JAMIA (Journal of the American Medical Informatics Association), Volume 26, Issue 12, December 2019, Pages 1478–1487, https://doi.org/10.1093/jamia/ocz110

Xabier Soto, Olatz Perez de Viñaspre, Maite Oronoz, Gorka Labaka

Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish (2019)

Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation

Mikel Artetxe, Holger Schwenk

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3197-3203.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

An Effective Approach to Unsupervised Machine Translation (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 194-203.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Bilingual Lexicon Induction through Unsupervised Machine Translation (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5002-5007.

Mikel Artetxe, Holger Schwenk

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (2019)

Transactions of the Association for Computational Linguistics 7 (2019): 597-610.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text (2019)

Procesamiento del Lenguaje Natural 63 (2019): 151-154.

Pablo Gamallo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe

Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora (2019)

Computational Linguistics. First online. DOI: 10.1162/COLI_a_00353. ISSN: 0891-2017.

Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Gorka Labaka, Iñaki Alegria, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Maite Martin and Eusebi Calonge

Neural Machine Translation of Basque (2018)

EAMT 2018. Alicante.

Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Iñaki Alegria, Gorka Labaka, Arantxa Otegi, Kepa Sarasola, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Eusebi Calonge, Maite Martin

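QUALES: Estimación Automática de Calidad de Traducción Mediante Aprendizaje Automático Supervisado y No-Supervisado [QUALES: Automatic Translation Quality Estimation through Supervised and Unsupervised Machine Learning] (2018)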

Procesamiento del Lenguaje Natural, vol. 61, pp. 143-146. ISSN: 1135-5948

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… [Language technology in the age of “Big Data”: search engines, translators…] (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Nora Aranberri, Gorka Labaka

Euskarazko Itzulpen Automatikoa - IXA Taldea [Machine Translation for Basque - IXA Group] (2017)

Senez, 48 (2017)

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance (2016)

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289--2294. Austin, Texas. ISBN: 978-1-945626-25-8.

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

The BerbaTek project for Basque: Promoting a less-resourced language via language technology for translation, content management and learning (2013)

Translation: Computation, Corpora, Cognition (TC3) journal. Vol 3, No 1, pp: 119-135 (2013). Special Issue on Language Technologies for a Multilingual Europe, ISSN: 2193-6986, https://www.researchgate.net/publication/250927257_The_BerbaTek_project_for_Basque_Promoting_a_less-resourced_language_via_language_technology_for_translation_content_management_and_learning

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan [BerbaTek: development of language technologies for Basque in translation, content management and teaching] (2013)

Euskalingua (digital journal), 23, 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf

Aingeru Mayor, Iñaki Alegria, Arantza Díaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Kepa Sarasola

Matxin, an open-source rule-based machine translation system for Basque. (2011)

Machine Translation Journal: Volume 25, Issue 1 (2011), pages 53-82. ISSN: 0922-6567. DOI: 10.1007/s10590-011-9092-y. http://link.springer.com/content/pdf/10.1007%2Fs10590-011-9092-y.pdf

Alicia Pérez, M. Inés Torres, Francisco Casacuberta

Potential scope of a fully-integrated architecture for speech translation (2010)

Proceedings of the 14th Annual Meeting of the European Association for Machine Translation.

Gorka Labaka, Nicolas Stroppa, Andy Way, Kepa Sarasola

Comparing Rule-Based and Data-Driven Approaches to Spanish-to-Basque Machine Translation (2007)

MT-Summit XI, Copenhagen ISBN: 978-87-90708-16-0; pp.297-304

All HiTZ publications

MT_tabs_full

Elia

Neural Machine translation for Basque, Spanish and English (developed in collaboration with Elhuyar)

NMT itzultzailea (2018)

Spanish Basque Neural Machine translation (developed in the TADEEP project)

Matxin

First machine translation system for Basque (out-dated)

All HiTZ projects.


  • (2023 - 2026)

  • LINGUATEC IA, adimen artifizialaren bidez aragoiera, euskara, katalana eta okzitaniera digitalizatzen aurrera egiteko proiektua
    (2024 - 2026)

  • The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
    (2024 - 2026)

  • TRAIN (PID2021-123988OB-C31) project funded by MCIN/AEI/ 10.13039/501100011033 and by “ERDF A way of making Europe”
    (2022 - 2025)


  • (2023 - 2025)
  • CLARIAH-EUS-gArA

    (2024 - 2025)

  • SignON - Sign Language Translation Mobile Application and Open Communications Framework
    (2021 - 2023)

  • DOMINO: Neural Machine Translation, in DOMaIn, and NO supervised
    (2019 - 2021)

  • MT4All: Unsupervised MT for low-resourced language pairs
    (2020 - 2021)

  • Building Neuronal Machine Translation methods and systems to improve coherence at paragraph and document level
    (2020 - 2021)

  • LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies.
    (2018 - 2020)
  • UnsupNMT: Traducción Automática Neuronal no Supervisada: un nuevo paradigma basado solo en textos monolingües.
    UnsupNMT: Unsupervised Neuronal Machine Translation: a new paradigm based only on monolingual text
    (2018 - 2020)

  • MODENA: Advanced neural modeling for high-quality translation.
    (2018 - 2019)

  • TADEEP: Deep Machine Translation
    (2016 - 2018)

  • MODELA: Statistical Modeling and Deep Learning for High Quality Machine Translation
    (2016 - 2017)

  • QTLeap: Quality Translation by Deep Language Engineering Approaches
    (2013 - 2016)
  • All HiTZ projects

Matxin

Machine translation from Spanish to Basque.

EUSMT

Statistical Machine Translation from Spanish

TADEEP:

Sistema traducción automática neuronal para español -inglés y español-euskera

Nora Aranberri, Uxoa Iñurrieta

When minoritized languages encounter MT: perceptions and expectations of the Basque community (2024)

Aranberri, N., & Iñurrieta, U. (2024). When minoritized languages encounter MT: perceptions and expectations of the Basque community. The Journal of Specialised Translation, (41), 179-205. Available at: https://www.jostrans.org/article/view/4718/4237

Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lacalle, Mikel Artetxe

Do Multilingual Language Models Think Better in English? (2024)

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 550–564, Mexico City, Mexico. Association for Computational Linguistics.

Nora Aranberri

Analysis of the Annotations from a Crowd MT Evaluation Initiative: Case Study for the Spanish-Basque Pair (2024)

Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 548–559 June 24-27.

Júlia Falcão, Claudia Borg, Nora Aranberri, and Kurt Abela

COMET for Low-Resource Machine Translation Evaluation: A Case Study of English-Maltese and Spanish-Basque (2024)

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3553–3565, Torino, Italia. ELRA and ICCL.

Adrián Núñez-Marcos, Olatz Perez-de-Viñaspre, Gorka Labaka

A survey on Sign Language machine translation (2023)

Expert Systems with Applications, Volume 213, part B. URL: https://doi.org/10.1016/j.eswa.2022.118993 ISSN: 0957-4174

Celia Soler Uguet, Nora Aranberri

Exploring politeness control in NMT: fine-tuned vs. multi-register models in Castilian Spanish (2023)

Revista Procesamiento del Lenguaje Natural, 70, pp. 199-212.

Kepa Sarasola, Itziar Aldabe, Nora Aranberri

Enabling additional official languages in the EU for 2025 with language-centred Artificial Intelligence (2023)

Special issue of 'De Europa' journal "Llinguistic rights, multilingualism and language varieties in Europe in the age of artificial intelligence" pp.93-107. Turin, 2023.

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka

Targeted Data Augmentation Improves Context-aware Neural Machine (2023)

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. Targeted Data Augmentation Improves Context-aware Neural Machine Translation. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 298–312, Macau SAR, China. Asia-Pacific Association for Machine Translation.

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka

What Works When in Context-aware Neural Machine Translation? (2023)

Harritxu Gete, Thierry Etchegoyhen, and Gorka Labaka. 2023. What Works When in Context-aware Neural Machine Translation?. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 147–156, Tampere, Finland. European Association for Machine Translation.

Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núnez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion

SignON: Sign Language Translation. Progress and challenges (2023)

Vincent Vandeghinste, Dimitar Shterionov, Mirella De Sisto, Aoife Brady, Mathieu De Coster, Lorraine Leeson, Josep Blat, Frankie Picron, Marcello Paolo Scipioni, Aditya Parikh, Louis ten Bosch, John O’Flaherty, Joni Dambre, Jorn Rijckaert, Bram Vanroy, Victor Ubieto Nogales, Santiago Egea Gomez, Ineke Schuurman, Gorka Labaka, Adrián Núnez-Marcos, Irene Murtagh, Euan McGill, Horacio Saggion. 2023. SignON: Sign Language Translation. Progress and challenges. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 501–502, Tampere, Finland. European Association for Machine Translation.

Xabier Soto, Olatz Pérez-de-Viñaspre, Maite Oronoz, Gorka Labaka

Development of a Machine Translation system for promoting the use of a low resource language in the clinical domain: the case of Basque. (2022)

Chapter 7 In Natural Language Processing In Healthcare A Special Focus on Low Resource Languages. Routledge, Taylor & Francis Group.

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Principled Paraphrase Generation with Parallel Corpora (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1621-1638

Xabier Soto, Olatz Perez-De-Viñaspre, Gorka Labaka, Maite Oronoz

Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario (2022)

In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 31–40, Ghent, Belgium. European Association for Machine Translation.

Andoni Azpeitia

Datuen Ustiapena Itzulpen Automatikorako (2022)

-

Ander Salaberria, Jon Ander Campos, Iker García, Joseba Fernandez de Landa

Itzulpen Automatikoko Sistemen Analisia: Genero Alborapenaren Kasua (2021)

IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring (2021)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6479–6489

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn ,Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Language and Technology in Wales: Volume I (2021)

Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn ,Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)

Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6

Cristina Cumbreño, Nora Aranberri

What Do You Say? Comparison of Metrics for Post-editing Effort (2021)

In: Carl M. (eds) Explorations in Empirical Translation Process Research. Machine Translation: Technologies and Applications, vol 3. Springer, Cham. pp 57-79.

Horacio Saggion, Dimitar Shterionov, Gorka Labaka, Tim Van de Cruys, Vincent Vandeghinste, Josep Blat

SignON: Bidging the gap between and Sign and Oral Languages (2021)

Procesamiento del Lenguaje Natural, Revista no 67, septiembre de 2021

Lana Yeganova, Dina Wiemann, Mariana Neves, Federica Vezzani, Amy Siu, Inigo Jauregi Unanue, Maite Oronoz, Nancy Mah, Aurélie Névéol, David Martinez, Rachel Bawden, Giorgio Maria Di Nunzio, Roland Roller, Philippe Thomas, Cristian Grozea, Olatz Perez-de-Viñaspre, Maika Vicente Navarro, and Antonio Jimeno Yepes

Findings of the WMT 2021 Biomedical Translation Shared Task: Summaries of Animal Experiments as New Test Set (2021)

In Proceedings of the Sixth Conference on Machine Translation, pages 664–683, Online. Association for Computational Linguistics.

Uxoa Iñurrieta

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)

Procesamiento del Lenguaje Natural, 64, pp. 123-126.

Nora Aranberri

Can translationese features help users select an MT system for post-editing? (2020)

Revista Procesamiento del Lenguaje Natural, 64, 93-100.

Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way

Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp: 3898–3908.

Mikel Artetxe, Sebastian Ruder, Dani Yogatama

On the cross-lingual transferability of monolingual representations (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

A Call for More Rigor in Unsupervised Cross-lingual Learning (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Nora Aranberri

With or without you? Effects of using machine translation to write flash fiction in the foreign language (2020)

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, p. 165–174, Lisboa, Portugal, November 2020.

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Pages 255-262

Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)

PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767

Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre

Do all roads lead to Rome? Understanding the role of initialization in iterative back-translation (2020)

Knowledge-Based Systems, Volume 206 (online first). Preprint: https://arxiv.org/abs/2002.12867

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Translation Artifacts in Cross-lingual Transfer Learning (2020)

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7674–7684.

Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz

Ixamed's submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation (2020)

Proceedings of the Fifth Conference on Machine Translation, pp. 873–878.

Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de-Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann and Lana Yeganova

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages (2020)

Fifth Conference on Machine Translation (WMT20). Shared Task: Biomedical Translation Task

Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille

LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)

Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71

Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria

Adapting NMT to caption translation in Wikimedia Commons for low-resource languages (2019)

Procesamiento del Lenguaje Natural, no. 63, September 2019, pp. 33-40

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Xabier Soto, Olatz Perez de Viñaspre, Gorka Labaka, Maite Oronoz

Neural Machine Translation of clinical texts between long distance languages (2019)

JAMIA (Journal of the American Medical Informatics Association), Volume 26, Issue 12, December 2019, Pages 1478–1487, https://doi.org/10.1093/jamia/ocz110

Xabier Soto, Olatz Perez de Viñaspre, Maite Oronoz, Gorka Labaka

Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish (2019)

Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation

Mikel Artetxe, Holger Schwenk

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3197-3203.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

An Effective Approach to Unsupervised Machine Translation (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 194-203.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Bilingual Lexicon Induction through Unsupervised Machine Translation (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5002-5007.

Mikel Artetxe, Holger Schwenk

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (2019)

Transactions of the Association for Computational Linguistics 7 (2019): 597-610.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text (2019)

Procesamiento del Lenguaje Natural 63 (2019): 151-154.

Pablo Gamallo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe

Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora (2019)

Computational Linguistics. First online. DOI: 10.1162/COLI_a_00353. ISSN: 0891-2017.

Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Gorka Labaka, Iñaki Alegria, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Maite Martin and Eusebi Calonge

Neural Machine Translation of Basque (2018)

EAMT 2018. Alicante.

Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Iñaki Alegria, Gorka Labaka, Arantxa Otegi, Kepa Sarasola, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Eusebi Calonge, Maite Martin

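QUALES: Estimación Automática de Calidad de Traducción Mediante Aprendizaje Automático Supervisado y No-Supervisado [QUALES: Automatic Translation Quality Estimation through Supervised and Unsupervised Machine Learning] (2018)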

Procesamiento del Lenguaje Natural, vol. 61, pp. 143-146. ISSN: 1135-5948

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… [Language technology in the era of “Big Data”: search engines, translators…] (2017)

Senez, 48 (2017), pp. 191-200. ISSN: 1132-2152. https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Nora Aranberri, Gorka Labaka

Euskarazko Itzulpen Automatikoa - IXA Taldea [Machine Translation for Basque: the IXA Group] (2017)

Senez, 48 (2017)

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance (2016)

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294. Austin, Texas. ISBN: 978-1-945626-25-8.

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

The BerbaTek project for Basque: Promoting a less-resourced language via language technology for translation, content management and learning (2013)

Translation: Computation, Corpora, Cognition (TC3) journal. Vol. 3, No. 1, pp. 119-135 (2013). Special Issue on Language Technologies for a Multilingual Europe, ISSN: 2193-6986, https://www.researchgate.net/publication/250927257_The_BerbaTek_project_for_Basque_Promoting_a_less-resourced_language_via_language_technology_for_translation_content_management_and_learning

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

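BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan [BerbaTek: development of language technologies for Basque in the fields of translation, content management and teaching] (2013)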

Euskalingua aldizkari digitala [Euskalingua digital magazine], 23, pp. 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf

Aingeru Mayor, Iñaki Alegria, Arantza Díaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Kepa Sarasola

Matxin, an open-source rule-based machine translation system for Basque (2011)

Machine Translation Journal, Volume 25, Issue 1 (2011), pages 53-82. ISSN: 0922-6567. DOI: 10.1007/s10590-011-9092-y. http://link.springer.com/content/pdf/10.1007%2Fs10590-011-9092-y.pdf

Alicia Pérez, M. Inés Torres, Francisco Casascuberta

Potential scope of a fully-integrated architecture for speech translation (2010)

Proceedings of the 14th Annual Meeting of the European Association for Machine Translation.

Gorka Labaka, Nicolas Stroppa, Andy Way, Kepa Sarasola

Comparing Rule-Based and Data-Driven Approaches to Spanish-to-Basque Machine Translation (2007)

MT Summit XI, Copenhagen. ISBN: 978-87-90708-16-0, pp. 297-304
