Cross Lingual Word Embeddings For Low Resource And Morphologically Rich Languages


Download Cross Lingual Word Embeddings For Low Resource And Morphologically Rich Languages PDF/ePub or read online books in Mobi eBooks. Click Download or Read Online button to get Cross Lingual Word Embeddings For Low Resource And Morphologically Rich Languages book now. This website allows unlimited access to, at the time of writing, more than 1.5 million titles, including hundreds of thousands of titles in various foreign languages.

Download

Cross-lingual Word Embeddings for Low-resource and Morphologically-rich Languages


Cross-lingual Word Embeddings for Low-resource and Morphologically-rich Languages

Author: Ali Hakimi Parizi

language: en

Publisher:

Release Date: 2021


DOWNLOAD





Despite recent advances in natural language processing, there is still a gap in state-of-the-art methods to address problems related to low-resource and morphologically-rich languages. These methods are data-hungry, and due to the scarcity of training data for low-resource and morphologically-rich languages, developing NLP tools for them is a challenging task. Approaches for forming cross-lingual embeddings and transferring knowledge from a rich- to a low-resource language have emerged to overcome the lack of training data. Although in recent years we have seen major improvements in cross-lingual methods, these methods still have some limitations that have not been addressed properly. An important problem is the out-of-vocabulary word (OOV) problem, i.e., words that occur in a document being processed, but that the model did not observe during training. The OOV problem is more significant in the case of low-resource languages, since there is relatively little training data available for them, and also in the case of morphologically-rich languages, since it is very likely that we do not observe a considerable number of their word forms in the training data. Approaches to learning sub-word embeddings have been proposed to address the OOV problem in monolingual models, but most prior work has not considered sub-word embeddings in cross-lingual models. The hypothesis of this thesis is that it is possible to leverage sub-word information to overcome the OOV problem in low-resource and morphologically-rich languages. This thesis presents a novel bilingual lexicon induction task to demonstrate the effectiveness of sub-word information in the cross-lingual space and how it can be employed to overcome the OOV problem. Moreover, this thesis presents a novel cross-lingual word representation method that incorporates sub-word information during the training process to learn a better cross-lingual shared space and also better represent OOVs in the shared space. This method is particularly suitable for low-resource scenarios and this claim is proven through a series of experiments on bilingual lexicon induction, monolingual word similarity, and a downstream task, document classification. More specifically, it is shown that this method is suitable for low-resource languages by conducting bilingual lexicon induction on twelve low-resource and morphologically-rich languages.

Cross-Lingual Word Embeddings


Cross-Lingual Word Embeddings

Author: Anders Søgaard

language: en

Publisher: Springer Nature

Release Date: 2022-05-31


DOWNLOAD





The majority of natural language processing (NLP) is English language processing, and while there is good language technology support for (standard varieties of) English, support for Albanian, Burmese, or Cebuano--and most other languages--remains limited. Being able to bridge this digital divide is important for scientific and democratic reasons but also represents an enormous growth potential. A key challenge for this to happen is learning to align basic meaning-bearing units of different languages. In this book, the authors survey and discuss recent and historical work on supervised and unsupervised learning of such alignments. Specifically, the book focuses on so-called cross-lingual word embeddings. The survey is intended to be systematic, using consistent notation and putting the available methods on comparable form, making it easy to compare wildly different approaches. In so doing, the authors establish previously unreported relations between these methods and are able to present a fast-growing literature in a very compact way. Furthermore, the authors discuss how best to evaluate cross-lingual word embedding methods and survey the resources available for students and researchers interested in this topic.

Speech and Language Technologies for Low-Resource Languages


Speech and Language Technologies for Low-Resource Languages

Author: Bharathi Raja Chakravarthi

language: en

Publisher: Springer Nature

Release Date: 2024-04-23


DOWNLOAD





This book constitutes the refereed conference proceedings of the second International Conference on Speech and Language Technologies for Low-Resource Languages, SPELLL 2023, held in Perundurai, Erode, India, during December 6–8, 2023. The 27 full papers and 6 short papers presented in this book were carefully reviewed and selected from 94 submissions. The papers are divided into the following topical sections: language resources; language technologies; speech technologies; and workshops - regional fake, MMLOW, LC4.