Sparse Model Building From Genome Wide Variation With Graphical Models


Download Sparse Model Building From Genome Wide Variation With Graphical Models PDF/ePub or read online books in Mobi eBooks. Click Download or Read Online button to get Sparse Model Building From Genome Wide Variation With Graphical Models book now. This website allows unlimited access to, at the time of writing, more than 1.5 million titles, including hundreds of thousands of titles in various foreign languages.

Download

Sparse Model Building from Genome-wide Variation with Graphical Models


Sparse Model Building from Genome-wide Variation with Graphical Models

Author: Benjamin Alexander Logsdon

language: en

Publisher:

Release Date: 2011


DOWNLOAD





High throughput sequencing and expression characterization have lead to an explosion of phenotypic and genotypic molecular data underlying both experimental studies and outbred populations. We develop a novel class of algorithms to reconstruct sparse models among these molecular phenotypes (e.g. expression products) and genotypes (e.g. single nucleotide polymorphisms), via both a Bayesian hierarchical model, when the sample size is much smaller than the model dimension (i.e. p n) and the well characterized adaptive lasso algo- rithm. Specifically, we propose novel approaches to the problems of increasing power to detect additional loci in genome-wide association studies using our variational algorithm, efficiently learning directed cyclic graphs from expression and genotype data using the adaptive lasso, and constructing genomewide undirected graphs among genotype, expression and downstream phenotype data using an extension of the variational feature selection algorithm. The Bayesian hierarchical model is derived for a parametric multiple regression model with a mixture prior of a point mass and normal distribution for each regression coefficient, and appropriate priors for the set of hyperparameters. When combined with a probabilistic consistency bound on the model dimension, this approach leads to very sparse solutions without the need for cross validation. We use a variational Bayes approximate inference approach in our algorithm, where we impose a complete factorization across all parameters for the approximate posterior distribution, and then minimize the KullbackLeibler divergence between the approximate and true posterior distributions. Since the prior distribution is non-convex, we restart the algorithm many times to find multiple posterior modes, and combine information across all discovered modes in an approximate Bayesian model averaging framework, to reduce the variance of the posterior probability estimates. We perform analysis of three major publicly available data-sets: the HapMap 2 genotype and expression data collected on immortalized lymphoblastoid cell lines, the genome-wide gene expression and genetic marker data collected for a yeast intercross, and genomewide gene expression, genetic marker, and downstream phenotypes related to weight in a mouse F2 intercross. Based on both simulations and data analysis we show that our algorithms can outperform other state of the art model selection procedures when including thousands to hundreds of thousands of genotypes and expression traits, in terms of aggressively controlling false discovery rate, and generating rich simultaneous statistical models.

Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics


Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics

Author: Christine Sinoquet

language: en

Publisher: OUP Oxford

Release Date: 2014-09-18


DOWNLOAD





Nowadays bioinformaticians and geneticists are faced with myriad high-throughput data usually presenting the characteristics of uncertainty, high dimensionality and large complexity. These data will only allow insights into this wealth of so-called 'omics' data if represented by flexible and scalable models, prior to any further analysis. At the interface between statistics and machine learning, probabilistic graphical models (PGMs) represent a powerful formalism to discover complex networks of relations. These models are also amenable to incorporating a priori biological information. Network reconstruction from gene expression data represents perhaps the most emblematic area of research where PGMs have been successfully applied. However these models have also created renewed interest in genetics in the broad sense, in particular regarding association genetics, causality discovery, prediction of outcomes, detection of copy number variations, and epigenetics. This book provides an overview of the applications of PGMs to genetics, genomics and postgenomics to meet this increased interest. A salient feature of bioinformatics, interdisciplinarity, reaches its limit when an intricate cooperation between domain specialists is requested. Currently, few people are specialists in the design of advanced methods using probabilistic graphical models for postgenomics or genetics. This book deciphers such models so that their perceived difficulty no longer hinders their use and focuses on fifteen illustrations showing the mechanisms behind the models. Probabilistic Graphical Models for Genetics, Genomics and Postgenomics covers six main themes: (1) Gene network inference (2) Causality discovery (3) Association genetics (4) Epigenetics (5) Detection of copy number variations (6) Prediction of outcomes from high-dimensional genomic data. Written by leading international experts, this is a collection of the most advanced work at the crossroads of probabilistic graphical models and genetics, genomics, and postgenomics. The self-contained chapters provide an enlightened account of the pros and cons of applying these powerful techniques.

Statistical Analysis for High-Dimensional Data


Statistical Analysis for High-Dimensional Data

Author: Arnoldo Frigessi

language: en

Publisher: Springer

Release Date: 2016-02-16


DOWNLOAD





This book features research contributions from The Abel Symposium on Statistical Analysis for High Dimensional Data, held in Nyvågar, Lofoten, Norway, in May 2014. The focus of the symposium was on statistical and machine learning methodologies specifically developed for inference in “big data” situations, with particular reference to genomic applications. The contributors, who are among the most prominent researchers on the theory of statistics for high dimensional inference, present new theories and methods, as well as challenging applications and computational solutions. Specific themes include, among others, variable selection and screening, penalised regression, sparsity, thresholding, low dimensional structures, computational challenges, non-convex situations, learning graphical models, sparse covariance and precision matrices, semi- and non-parametric formulations, multiple testing, classification, factor models, clustering, and preselection. Highlighting cutting-edge research and casting light on future research directions, the contributions will benefit graduate students and researchers in computational biology, statistics and the machine learning community.