Text Analysis

Natural language analysis tools are software modules that perform linguistic analysis of text at different levels. They are essential components of any Natural Language Processing (NLP) software that analyzes text, and text mining software is typically built by combining basic linguistic modules into complex pipelines.
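As a rough illustration of how basic modules can be combined into a pipeline, here is a minimal Python sketch. It is not the HiTZ implementation: the names `tokenize`, `tag_capitalized`, and `run_pipeline` are hypothetical, and the two modules are toy stand-ins for real linguistic processors. Each module takes a document, adds one layer of annotation, and passes it on.

```python
import re
from typing import Callable, Dict, List

# A document is a plain dict that accumulates annotation layers (assumed design).
Document = Dict[str, object]
Module = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    """Toy tokenizer: splits text into word and punctuation tokens."""
    out = dict(doc)
    out["tokens"] = re.findall(r"\w+|[^\w\s]", str(doc["text"]))
    return out


def tag_capitalized(doc: Document) -> Document:
    """Toy stand-in for a later module: collects capitalized tokens."""
    out = dict(doc)
    tokens: List[str] = list(out["tokens"])  # relies on the tokenizer layer
    out["capitalized"] = [t for t in tokens if t[:1].isupper()]
    return out


def run_pipeline(text: str, modules: List[Module]) -> Document:
    """Feed the output of each module into the next one."""
    doc: Document = {"text": text}
    for module in modules:
        doc = module(doc)
    return doc


result = run_pipeline("HiTZ builds tools for Basque.", [tokenize, tag_capitalized])
print(result["tokens"])
print(result["capitalized"])
```

Because every module shares the same input and output type, modules can be reordered, replaced, or extended without changing the rest of the chain, as long as each one finds the layers it depends on.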

The HiTZ center has a long tradition of building analysis tools for many languages, ranging from basic linguistic processors such as tokenizers, Part-...

Demos

Demo of the English NLP pipeline

Paste in any English text to see which entities, events, and other annotations are added automatically. The result is represented in the NAF format.

Demo of the Spanish NLP pipeline

Paste in any Spanish text to see which entities and other annotations are added automatically. The result is represented in the NAF format.

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Xuxen

Basque spelling corrector on-line

Contracts

Projects

Patents

MALTIXA

Resources

Publications

Y Yaghoobzadeh, K Kann, TJ Hazen, E Agirre, H Schütze

Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.

Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta

Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool (2019)

PLoS ONE 14(9): e0221639

José Ramom Pichel, Pablo Gamallo, Iñaki Alegria

Cross-lingual Diachronic Distance: Application to Portuguese and Spanish (2019)

SEPLN, 2019

José Ramom Pichel, Pablo Gamallo, Iñaki Alegria

Measuring diachronic language distance using perplexity. Application to English, Portuguese and Spanish. (2019)

Natural Language Engineering

Ainara Estarrona, Izaskun Etxeberria, Ander Soraluze, Manuel Padilla-Moyano

Spelling Normalisation of Basque Historical Texts (2019)

Procesamiento del Lenguaje Natural, vol. 63, pp. 59-66

Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation (2018)

Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 282–291. Brussels, Belgium, October 31 - November 1, 2018. Best paper award

Zuhaitz Beloki, Xabier Artola, Aitor Soroa

A scalable architecture for data-intensive natural language processing (2017)

Natural Language Engineering, 1-23. doi:10.1017/S1351324917000092.

Arantxa Otegi, Nerea Ezeiza, Iakes Goenaga, Gorka Labaka

A Modular Chain of NLP Tools for Basque (2016)

Proceedings of the 19th International Conference on Text, Speech and Dialogue, TSD 2016, Brno, Czech Republic, Lecture Notes in Computer Science, vol. 9924, pp. 93-100, Springer. ISBN 978-3-319-45509-9. DOI 10.1007/978-3-319-45510-5_11

Rodrigo Agerri, Xabier Artola, Zuhaitz Beloki, German Rigau, Aitor Soroa

Big data for Natural Language Processing: A streaming approach (2015)

Knowledge-Based Systems. http://dx.doi.org/10.1016/j.knosys.2014.11.007. Vol.79, pages 36-42.

Xabier Artola, Zuhaitz Beloki, Aitor Soroa

A stream computing approach towards scalable NLP (2014)

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland. ISBN: 978-2-9517408-8-4

Rodrigo Agerri, Josu Bermudez, German Rigau

IXA pipeline: Efficient and Ready to Use Multilingual NLP tools. (2014)

LREC 2014: 3823-3828. ISBN 978-2-9517408-8-4
