Probabilistic Topic Models for Information Retrieval and Concept Modeling


Statistical topic models are a class of probabilistic latent variable models for textual data that represent text documents as distributions over topics. These models have been shown to produce interpretable summaries of documents in the form of topics. In this dissertation, we investigate how the statistical topic modeling framework can be used for information retrieval tasks and for the integration of background knowledge in the form of semantic concepts. We first describe the special-words topic models, in which a document is represented as a mixture of (i) a set of shared topics, (ii) a special-words distribution specific to the document, and (iii) a corpus-level background distribution. We describe the utility of the special-words topic models for information retrieval tasks and illustrate a variation of the model for metadata enhancement of digital libraries with multiple corpora. We next investigate the problem of integrating background knowledge in the form of semantic concepts into the topic modeling framework. To combine data-driven topics and semantic concepts, we propose the concept-topic model, which represents a document as a distribution over both data-driven topics and semantic concepts. We extend this model to the hierarchical concept-topic model to incorporate concept hierarchies into the modeling framework. For all these models, we develop learning algorithms and demonstrate their utility with experiments conducted on real-world data sets.
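The three-part document representation described above can be illustrated with a minimal sketch. It only evaluates the word distribution of one document under such a mixture; it is not the dissertation's learning algorithm. All names (`theta`, `phi`, `psi_doc`, `omega`, `lam`) and the toy numbers are assumptions chosen for illustration.

```python
import numpy as np

def special_words_word_probs(theta, phi, psi_doc, omega, lam):
    """Return p(w | d) for one document under a three-way mixture.

    theta   : (T,)   the document's proportions over T shared topics
    phi     : (T, V) word distribution of each shared topic
    psi_doc : (V,)   document-specific "special words" distribution
    omega   : (V,)   corpus-level background distribution
    lam     : (3,)   mixing weights over [topics, special words, background]
    (All symbol names are hypothetical, not taken from the source.)
    """
    topic_part = theta @ phi  # word distribution induced by the shared topics
    return lam[0] * topic_part + lam[1] * psi_doc + lam[2] * omega

# Toy example: 2 topics, vocabulary of 4 words.
theta = np.array([0.7, 0.3])
phi = np.array([[0.5, 0.3, 0.1, 0.1],
                [0.1, 0.1, 0.4, 0.4]])
psi_doc = np.array([0.0, 0.0, 0.0, 1.0])    # this document favours word 3
omega = np.array([0.25, 0.25, 0.25, 0.25])  # uniform background
lam = np.array([0.6, 0.3, 0.1])

p = special_words_word_probs(theta, phi, psi_doc, omega, lam)
```

Because every component is a proper distribution and the mixing weights sum to one, `p` is itself a distribution over the vocabulary.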
Advances in Information Retrieval

This book constitutes the proceedings of the 35th European Conference on IR Research, ECIR 2013, held in Moscow, Russia, in March 2013. The 55 full papers, 38 poster papers and 10 demonstrations presented in this volume were carefully reviewed and selected from 287 submissions. The papers are organized in the following topical sections: user aspects; multimedia and cross-media IR; data mining; IR theory and formal models; IR system architectures; classification; Web; event detection; temporal IR; and microblog search. Also included are 4 tutorial and 2 workshop presentations.
Statistical Language Models for Information Retrieval

As online information grows dramatically, search engines such as Google are playing an increasingly important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central research problem in information retrieval for several decades. In the past ten years, a new generation of retrieval models, often referred to as statistical language models, has been successfully applied to many different information retrieval problems. Compared with traditional models such as the vector space model, these new models have a sounder statistical foundation and can leverage statistical estimation to optimize retrieval parameters. They can also be more easily adapted to model non-traditional and complex retrieval problems. Empirically, they tend to achieve comparable or better performance than a traditional model with less effort on parameter tuning. This book systematically reviews the large body of literature on applying statistical language models to information retrieval, with an emphasis on the underlying principles, empirically effective language models, and language models developed for non-traditional retrieval tasks. All the relevant literature has been synthesized to make it easy for a reader to digest the research progress achieved so far and see the frontier of research in this area. The book also offers practitioners an informative introduction to a set of practically useful language models that can effectively solve a variety of retrieval problems. No prior knowledge of information retrieval is required, but some basic knowledge of probability and statistics would be useful for fully digesting all the details.
Table of Contents: Introduction / Overview of Information Retrieval Models / Simple Query Likelihood Retrieval Model / Complex Query Likelihood Model / Probabilistic Distance Retrieval Model / Language Models for Special Retrieval Tasks / Language Models for Latent Topic Analysis / Conclusions
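As a rough illustration of the query likelihood idea named in the table of contents, the sketch below ranks documents by the log-probability that each document's language model generates the query. Dirichlet prior smoothing and the value of `mu` are assumptions made here for the example; the description above does not say which smoothing method the book uses. The function name and toy documents are likewise hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log p(query | doc) with Dirichlet-smoothed unigram estimates.

    p(w | doc) = (count(w, doc) + mu * p(w | collection)) / (len(doc) + mu)
    """
    doc_counts = Counter(doc)
    score = 0.0
    for w in query:
        p_coll = collection_counts[w] / collection_len
        if p_coll == 0.0:
            continue  # ignore query terms that never occur in the collection
        score += math.log((doc_counts[w] + mu * p_coll) / (len(doc) + mu))
    return score

# Toy collection of two tokenized documents.
docs = {
    "d1": "topic models for text retrieval".split(),
    "d2": "cooking recipes for pasta".split(),
}
collection_counts = Counter(w for d in docs.values() for w in d)
collection_len = sum(collection_counts.values())

query = "topic retrieval".split()
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(
        query, docs[name], collection_counts, collection_len, mu=10.0
    ),
    reverse=True,
)
```

Here `ranking` lists document names from most to least likely to have generated the query. Smoothing with collection statistics keeps a document from receiving zero probability when it lacks a query term.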