Information Theoretic Learning Methods For Markov Decision Processes With Parametric Uncertainty


Information Theoretic Learning Methods for Markov Decision Processes with Parametric Uncertainty


Author: Peeyush Kumar

Language: en

Publisher:

Release Date: 2018


Markov decision processes (MDPs) model a class of stochastic sequential decision problems with applications in engineering, medicine, and business analytics. There is considerable interest in the literature in MDPs with imperfect information, where the search for well-performing policies faces several challenges. There is no rigorous, universally accepted optimality criterion. The search space explodes, and the decision-maker suffers from the curse of dimensionality. Finding good policies requires carefully balancing the trade-off between exploration to acquire information and exploitation of that information to earn high rewards. This dissertation contributes to this area by building a rigorous framework, rooted in information theory, for solving MDPs with model uncertainty.

In the first chapter, the value of a parameter that characterizes the transition probabilities is unknown to the decision-maker, who updates its Bayesian belief about this parameter using state observations induced by the policies it chooses. Information Directed Policy Sampling (IDPS) is proposed to manage the exploration-exploitation trade-off. At each time-stage, the decision-maker solves a convex problem to sample a policy from a distribution that minimizes a particular ratio. The numerator of this ratio equals the square of the expected regret of distributions over policy trajectories (exploitation). The denominator equals the expected mutual information between the resulting system-state trajectory and the parameter's posterior (exploration). A generalization of Hoeffding's inequality is employed to bound regret. The bound grows at a square-root rate with the planning horizon and at a square-root log-linear rate with the parameter-set cardinality; it is insensitive to the state- and action-space cardinalities. The regret per stage converges to zero as the planning horizon increases, so IDPS is asymptotically optimal. Numerical results on a stylized example, an auction-design problem, and a response-guided dosing problem demonstrate its benefits. (A minimal illustrative sketch of the per-stage sampling step follows the abstract.)

In the second chapter, uncertainty in the transition probabilities arises at two levels. The top level corresponds to ambiguity about the system model; the bottom-level uncertainty is rooted in the unknown parameter values for each possible model. Prior-update formulas using a hierarchical Bayesian framework are derived and incorporated into two learning algorithms: Thompson Sampling and a hierarchical extension of IDPS. Analytical performance bounds for these algorithms are developed, and numerical results are presented for the response-guided dosing problem, which is amenable to hierarchical modeling. (A sketch of the hierarchical update and sampling step also appears below.)

The third chapter extends the above to partially observable Markov decision processes (POMDPs). In a POMDP, the decision-maker cannot observe the actual state of the system; instead, it can take a measurement that provides probabilistic information about the true state. Such POMDPs are equivalent to the Bayesian adaptive MDPs (BAMDPs) from the first two chapters. This connection is exploited to devise algorithms and provide analytical performance guarantees for POMDPs in three separate cases: (a) uncertainty in the transition probabilities; (b) uncertainty in the measurement-outcome probabilities; and (c) uncertainty in both. Numerical results on partially observed response-guided dosing are included.

The fourth chapter proposes a formal information-theoretic framework inspired by stochastic thermodynamics, built on the idea that information is physical. An explicit link between information entropy and the stochastic dynamics of a system coupled to an environment is developed from fundamental principles. Unlike the heuristic approach of defining an information ratio, this yields an optimization program built from the system dynamics, the problem objective, and the feedback from observations. To the best of my knowledge, this is the first comprehensive work on MDPs with model uncertainty that builds a problem formulation entirely grounded in system and information dynamics, without the use of ad hoc heuristics.
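The abstract specifies the shape of the per-stage IDPS problem without writing it out: choose a distribution over policies that minimizes squared expected regret divided by expected information gain. Below is a minimal, illustrative Python sketch of that step for a finite parameter set, a finite menu of policies, and a finite outcome alphabet. The names (idps_step, mutual_information), the likelihood tensor liks, and the two-point mixture search are assumptions made for this example; the dissertation's actual convex program is over distributions on policy trajectories.

```python
import numpy as np

def mutual_information(post, lik):
    """I(theta; Y) under the current posterior.

    post : (K,) posterior over the finite parameter set
    lik  : (K, Y) P(Y = y | policy, theta_k) over a finite outcome set
    """
    marg = post @ lik                                  # P(Y = y), shape (Y,)
    safe = np.where(lik > 0, lik / np.maximum(marg, 1e-300), 1.0)
    return float(post @ (lik * np.log(safe)).sum(axis=1))

def idps_step(post, rewards, liks, grid=101):
    """One information-directed sampling stage (illustrative sketch).

    post    : (K,) posterior over parameters
    rewards : (P, K) expected cumulative reward of policy p under theta_k
    liks    : (P, K, Y) outcome likelihoods for each policy
    Returns an approximate minimizer, over two-point policy mixtures, of
    (expected regret)^2 / (expected information gain).
    """
    P = rewards.shape[0]
    regret = (rewards.max(axis=0) - rewards) @ post    # (P,) expected regret
    gain = np.array([mutual_information(post, liks[p]) for p in range(P)])
    best_ratio, best_dist = np.inf, None
    for i in range(P):
        for j in range(P):
            for w in np.linspace(0.0, 1.0, grid):
                d = w * regret[i] + (1 - w) * regret[j]
                g = w * gain[i] + (1 - w) * gain[j]
                if g <= 1e-12:
                    continue                           # mixture yields no information
                if d * d / g < best_ratio:
                    dist = np.zeros(P)
                    dist[i] += w
                    dist[j] += 1 - w
                    best_ratio, best_dist = d * d / g, dist
    if best_dist is None:                              # nothing informative left:
        best_dist = np.zeros(P)                        # play the low-regret policy
        best_dist[np.argmin(regret)] = 1.0
    return best_dist
```

Sampling a policy index with np.random.choice(len(dist), p=dist), executing it for the stage, and Bayes-updating post closes the loop. The pairwise search is motivated by a known property of information-directed sampling (Russo and Van Roy): a minimizer of the information ratio can be supported on at most two actions.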
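The hierarchical setting of the second chapter also admits a compact discrete sketch. Assuming a finite set of candidate models and a finite parameter grid per model (the shapes and function names below are invented for illustration, not the dissertation's derived formulas), the two-level prior update and the Thompson-sampling draw look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def hierarchical_update(prior_m, prior_theta, lik):
    """One hierarchical Bayes update (illustrative sketch).

    prior_m     : (M,) prior over candidate models (top level)
    prior_theta : (M, K) per-model prior over parameter values (bottom level)
    lik         : (M, K) likelihood of the new observation under (m, theta_k);
                  assumed positive for at least one (m, theta_k) pair
    """
    joint = prior_m[:, None] * prior_theta * lik       # unnormalized p(m, theta_k)
    post_m = joint.sum(axis=1)
    post_m /= post_m.sum()                             # p(m | data)
    post_theta = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-300)
    return post_m, post_theta                          # p(theta_k | m, data)

def hierarchical_thompson(post_m, post_theta):
    """Sample a (model, parameter) pair from the hierarchical posterior."""
    m = rng.choice(len(post_m), p=post_m)
    k = rng.choice(post_theta.shape[1], p=post_theta[m])
    return m, k
```

A Thompson-sampling agent would solve the MDP induced by the sampled (model, parameter) pair and follow its optimal policy until the next update; the hierarchical extension of IDPS replaces the posterior draw with the information-ratio minimization sketched above.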

Advances in Service Science


Author: Hui Yang

Language: en

Publisher: Springer

Release Date: 2018-12-28


This volume offers state-of-the-art research and developments in service science and related research, education, and practice areas. It showcases emerging technology and applications in fields including healthcare, information technology, transportation, sports, logistics, and public services. Regardless of its size and the service it provides, a service organization is a service system. Because of the socio-technical nature of a service system, a systems approach must be adopted to design, develop, and deliver services aimed at meeting both the utilitarian and socio-psychological needs of end users. Effective understanding of service and service systems often requires combining multiple methods to consider how interactions of people, technology, organizations, and information create value under various conditions. The papers in this volume highlight ways to approach such technical challenges in service science and are based on submissions from the 2018 INFORMS International Conference on Service Science.

Neural Information Processing


Author: Haiqin Yang

Language: en

Publisher: Springer Nature

Release Date: 2020-11-19


The three-volume set of LNCS 12532, 12533, and 12534 constitutes the proceedings of the 27th International Conference on Neural Information Processing, ICONIP 2020, held in Bangkok, Thailand, in November 2020. Due to the COVID-19 pandemic, the conference was held virtually. The 187 full papers presented were carefully reviewed and selected from 618 submissions. The papers address emerging topics in theoretical research, empirical studies, and applications of neural information processing techniques across different domains. The second volume, LNCS 12533, is organized in topical sections on computational intelligence; machine learning; and robotics and control.