Speech and Language Resources

Developing products and applications in language technology requires basic linguistic resources (text and speech corpora, lexicons, and knowledge bases) and development tools (morphological and syntactic analysers, word sense disambiguators, corpus processing tools, lemmatisers, integrated tool environments, etc.).

We have more than 25 years of experience in creating basic linguistic resources of this type, and we have several reference corpora, lexicons ...

Demos

Konbitzul

Database of translations of noun+verb combinations

e-ROLda

A tool for looking up verb entries in the BVI lexicon and examples in the EPEC-RolSem corpus

Universal Dependencies treebank for Basque

This treebank contains 121K words annotated following the guidelines of the Universal Dependencies project.

 

Resources

Eusemcor

Corpus tagged with Basque WordNet senses.

Basque WordNet / Euskal WordNet

Basque WordNet

EDBL

Basque lexical database.

EPEC-ROLSEM

Corpus tagged with semantic roles.

EPEC-DEP (BDT)

A syntactic corpus annotated following Dependency Grammar.

Publications

Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta

Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool (2019)

PLoS ONE 14(9): e0221639

Ander Soraluze, Olatz Arregi, Xabier Arregi, Arantza Diaz de Ilarraza

EUSKOR: End-to-end coreference resolution system for Basque (2019)

PLoS ONE 14(9): e0221801. https://doi.org/10.1371/journal.pone.0221801

Ainara Estarrona, Izaskun Etxeberria, Ander Soraluze, Manuel Padilla-Moyano

Spelling Normalisation of Basque Historical Texts (2019)

Procesamiento del Lenguaje Natural, vol. 63, pp. 59-66

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis (2019)

Proceedings of the Tenth Global Wordnet Conference, pp. 197-205. ISBN 978-83-7493-108-3

Itziar Gonzalez-Dios, German Rigau

Textual genre based approach to use wordnets in language-for-specific-purpose classroom as dictionary (2019)

Proceedings of the Tenth Global Wordnet Conference, pp. 222-227. ISBN 978-83-7493-108-3

Begoña Altuna, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza

Adapting TimeML to Basque: Event Annotation (2018)

In Gelbukh A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science (LNCS, vol. 9624), pp. 565-577. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-75487-1_43; Print ISBN 978-3-319-75486-4; Online ISBN 978-3-319-75487-1

Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Konbitzul: an MWE-specific Database for Spanish-Basque (2018)

Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan, pp. 2500-2504.

Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria

Verbal Multiword Expressions in Basque corpora (2018)

In the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (at COLING 2018)

Igone Zabala

Euskararen terminologiaren garapena Terminologiaren Teoria Komunikatiboaren argitan (2018)

In Ruben Urizar and Itziar Aduriz (eds.) Hizkuntzalari Euskaldunen III Topaketa. Zer berri?, pp. 349-358.

Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza and Ines Garzia-Azkoaga

Coreferential Relations in Basque: The Annotation Process (2018)

J Psycholinguist Res (2018) 47, Issue 2. Pages 325-342. https://doi.org/10.1007/s10936-018-9559-6. ISSN 0090-6905. Online ISSN 1573-6555.

Izaskun Aldezabal, Xabier Artola, Arantza Diaz De Ilarraza, Itziar Gonzalez-Dios, Gorka Labaka, German Rigau and Ruben Urizar

Basque e-lexicographic resources: linguistic basis, development, and future perspectives (2018)

Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence. https://lexdhai.insight-centre.org/Lex_DH__AI_2018_paper_5.pdf

Ainara Estarrona, Izaskun Aldezabal, Arantza Díaz de Ilarraza

How the corpus-based Basque Verb Index lexicon was built (2018)

Language Resources and Evaluation. First Online 05 December 2018. DOI: https://doi.org/10.1007/s10579-018-9440-0. Springer Netherlands

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Arantxa Otegi, Nora Aranberri, António Branco, Jan Hajic, Steven Neale, Petya Osenova, Rita Pereira, Martin Popel, Joao Silva, Kiril Simov, Eneko Agirre

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (2016)

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1

Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J.

A Methodology for the Semiautomatic Annotation of EPEC-RolSem, a Basque Corpus Labeled at Predicate Level following the PropBank/Verbnet Model (2016)

Edward Vanhoutte (ed.) Digital Scholarship in the Humanities (2016) 31 (3): 470-492. DOI: http://dx.doi.org/10.1093/llc/fqv001 First published online: 17 June 2015 (23 pages). Published by Oxford University Press on behalf of EADH: The European Association for Digital Humanities (Online ISSN 2055-768X - Print ISSN 2055-7671)

Maria Jesús Aranzabe, Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Iakes Goenaga, Koldo Gojenola, Larraitz Uria

Automatic Conversion of the Basque Dependency Treebank to Universal Dependencies (2015)

Markus Dickinson, Erhard Hinrichs, Agnieszka Patejuk, Adam Przepiórkowski (eds), Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 233-241. Institute of Computer Science of the Polish Academy of Sciences, Warszawa, Poland. ISBN: 978-83-63159-18-4

Iruskieta M., Aranzabe M., Diaz de Ilarraza A., Gonzalez I., Lersundi I., Lopez de Lacalle O.

The RST Basque TreeBank: an online search interface to check rhetorical relations (2013)

4th Workshop RST and Discourse Studies, 40-49, Sociedad Brasileira de Computacao, Fortaleza, CE, Brasil. October 20-24 (http://encontrorst2013.wix.com/encontro-rst-2013)

Pociello E., Agirre E. and Aldezabal I.

Methodology and construction of the Basque WordNet (2011)

Language Resources and Evaluation. Springer. Volume 45, Issue 2, pp. 121-142. ISSN 1574-020X. DOI 10.1007/s10579-010-9131-y.

Izaskun Aldezabal, Maria Jesús Aranzabe, Jose Maria Arriola, Arantza Diaz de Ilarraza

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues (2009)

Corpus Linguistics and Linguistic Theory 5-2 (2009), 241-269. Mouton de Gruyter. Berlin-New York. Print ISSN: 1613-7027 Online ISSN: 1613-7035

Itziar Aduriz, Maria Jesús Aranzabe, Jose Maria Arriola, Aitziber Atutxa, Arantza Diaz de Ilarraza, Nerea Ezeiza, Koldo Gojenola, Maite Oronoz, Aitor Soroa, Ruben Urizar

Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing (2006)

Corpus Linguistics Around the World. Book series: Language and Computers. Vol. 56 (pp. 1-15). ISBN 90-420-1836-4. Ed. Andrew Wilson, Paul Rayson, and Dawn Archer. Rodopi. Netherlands.

Eneko Agirre, Izaskun Aldezabal, Jone Etxeberria, Mikel Iruskieta, Elixabete Izagirre, Karmele Mendizabal, Eli Pociello

Improving the Basque WordNet by corpus annotation (2006)

Proceedings of Third International WordNet Conference. pp. 287-290. ISBN 80-210-3915-9. Jeju Island (Korea).

Izaskun Aldezabal, Olatz Ansa, Bertol Arrieta, Xabier Artola, Aitzol Ezeiza, Gregorio Hernández, Mikel Lersundi

EDBL: a General Lexical Basis for the Automatic Processing of Basque (2001)

IRCS Workshop on linguistic databases. Philadelphia (USA).

