Approximate Dynamic Programming with Adaptive Critics and the Algebraic Perceptron as a Fast Neural Network Related to Support Vector Machines


This thesis treats two aspects of intelligent control. The first part is about long-term optimization by approximate dynamic programming; the second part considers a specific class of fast neural network related to support vector machines (SVMs).

The first part relates to approximate dynamic programming, especially in the framework of adaptive critic designs (ACDs). Dynamic programming can be used to find an optimal decision or control policy over a long-term period. In practice, however, it is difficult, and often impossible, to calculate a dynamic programming solution, due to the 'curse of dimensionality'. The adaptive critic design framework addresses this issue and tries to find a good solution by approximating the dynamic programming process for a stationary environment. An adaptive critic design has three modules: the plant or environment to be controlled, a critic to estimate the long-term cost, and an action or controller module to produce the decision or control strategy. Even though there have been many publications on the subject over the past two decades, some points have received less attention. While most publications address the training of the critic, one point that has not received systematic attention is the training of the action module.¹ Normally, training starts with an arbitrary, hopefully stable, decision policy whose long-term cost is then estimated by the critic. Often the critic is a neural network that has to be trained using temporal differences and Bellman's principle of optimality. Once the critic network has converged, a policy-improvement step is carried out by gradient descent to adjust the parameters of the controller network. The critic is then retrained to give the new long-term cost estimate.

However, it would be preferable to focus on extremal policies earlier in the training. Therefore, the calculus of variations is investigated, but the idea of using the Euler equations to train the actor is discarded. Instead, an adaptive critic formulation is made for a continuous plant whose short-term cost is given as an integral cost density, and the chain rule is applied to calculate the total derivative of the short-term cost with respect to the actor weights. This differs from the discrete systems usually used in adaptive critics, which work with total ordered derivatives. The idea is then extended to second-order derivatives so that Newton's method can be applied to speed up convergence. Based on this, an almost concurrent actor and critic training is proposed. The equations are developed for an arbitrary non-linear system and short-term cost density function and are tested on a linear quadratic regulator (LQR) setup. With this approach, the solution for the actor and critic weights can be reached in only a few actor-critic training cycles.
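The alternating critic/actor cycle described above is easy to see on an LQR setup like the one the abstract mentions. The following is a minimal, self-contained sketch, not the thesis's Newton-accelerated scheme; the plant matrices and discount factor are invented for illustration. The critic is the quadratic cost matrix P of the current linear policy, and the actor update minimizes the one-step Bellman right-hand side:

```python
import numpy as np

# Minimal critic/actor cycle on a discounted LQR problem.
# Plant x_{k+1} = A x + B u, short-term cost x'Qx + u'Ru, policy u = -K x.
# All numbers below are illustrative, not taken from the thesis.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 0.95  # discounting factor in the Bellman equation

K = np.zeros((1, 2))  # initial linear policy
for cycle in range(10):
    # Critic training: for a fixed linear policy the long-term cost is
    # quadratic, J(x) = x'Px, and P is the fixed point of the Bellman
    # equation  P = Q + K'RK + gamma * (A - BK)' P (A - BK).
    Acl = A - B @ K
    P = np.zeros((2, 2))
    for _ in range(1000):
        P = Q + K.T @ R @ K + gamma * Acl.T @ P @ Acl
    # Policy improvement: minimise u'Ru + gamma (Ax + Bu)'P(Ax + Bu)
    # over u, which for the LQR case is available in closed form.
    K = np.linalg.solve(R + gamma * B.T @ P @ B, gamma * B.T @ P @ A)

print("actor gain K:\n", K)
print("critic matrix P:\n", P)
```

On this toy problem the gain matrix settles after only a few critic/actor cycles, which is the kind of behaviour the abstract describes for the LQR test.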
Some other, more minor, issues in the adaptive critic framework are also investigated, such as the influence of the discounting factor in the Bellman equation on total ordered derivatives, the interpretation of targets in backpropagation through time as moving and fixed targets, the relation between simultaneous recurrent networks and dynamic programming, and a reinterpretation of the recurrent generalized multilayer perceptron (GMLP) as a recurrent generalized finite impulse response MLP (GFIR-MLP). Another subject investigated in this area is that of a hybrid dynamical system, characterized as a continuous plant and a set of basic feedback controllers, which are used to control the plant by finding a switching sequence that selects one basic controller at a time. The special but important case is considered where the plant is linear but has some uncertainty in the state space and in the observation vector, with a quadratic cost function. This is a form of robust control for which a dynamic programming solution has to be calculated.

¹ Werbos comments that most treatments of action nets or policies assume either enumerative maximization, which is good only for small problems (the games of backgammon and Go [1] being notable exceptions), or gradient-based training. The latter is prone to difficulties with local minima due to the non-convex nature of the cost-to-go function. With incremental methods, such as backpropagation through time, calculus of variations and model-predictive control, the danger of non-convexity of the cost-to-go function with respect to the control is much smaller than with respect to the critic parameters when the sampling times are small. Therefore, getting the critic right has priority. With larger sampling times, however, when the control represents a more complex plan, non-convexity becomes more serious.
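To make the switching-sequence idea concrete, here is a small hypothetical sketch; the plant, gains, weights and horizon are invented for illustration and are not taken from the thesis. Over a short horizon the optimal sequence of basic controllers can be found by plain enumeration, which, as the footnote above notes, is feasible only for small problems:

```python
import itertools
import numpy as np

# Hypothetical switched linear system: one plant, two basic feedback gains.
# All matrices and the horizon are invented for illustration.
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.2]])
Q = np.eye(2)
R = np.array([[0.5]])
gains = [np.array([[0.5, 0.8]]),   # basic controller 0
         np.array([[2.0, 1.5]])]   # basic controller 1

def sequence_cost(x0, seq):
    """Quadratic cost of applying controller gains[i] at step k for i in seq."""
    x, cost = x0, 0.0
    for i in seq:
        u = -gains[i] @ x
        cost += (x.T @ Q @ x + u.T @ R @ u).item()
        x = A @ x + B @ u
    return cost

x0 = np.array([[1.0], [0.0]])
horizon = 6  # 2**6 = 64 candidate switching sequences
best = min(itertools.product(range(len(gains)), repeat=horizon),
           key=lambda seq: sequence_cost(x0, seq))
print("best switching sequence:", best)
```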
Reinforcement Learning

Author: Richard S. Sutton
Language: en
Publisher: Springer Science & Business Media
Release Date: 2012-12-06
Reinforcement learning is the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal. The learner is not told which action to take, as in most forms of machine learning, but instead must discover which actions yield the highest reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation, and through that all subsequent rewards. These two characteristics -- trial-and-error search and delayed reward -- are the most important distinguishing features of reinforcement learning. Reinforcement learning is both a new and a very old topic in AI. The term appears to have been coined by Minsky (1961), and independently in control theory by Waltz and Fu (1965). The earliest machine learning research now viewed as directly relevant was Samuel's (1959) checkers player, which used temporal-difference learning to manage delayed reward much as it is used today. Of course learning and reinforcement have been studied in psychology for almost a century, and that work has had a very strong impact on the AI/engineering work. One could in fact consider all of reinforcement learning to be simply the reverse engineering of certain psychological learning processes (e.g. operant conditioning and secondary reinforcement). Reinforcement Learning is an edited volume of original research, comprising seven invited contributions by leading researchers.
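The temporal-difference idea mentioned above, bootstrapping an estimate of delayed reward from the estimate at the next state, fits in a few lines of code. The following toy sketch is not taken from the book; it runs tabular Q-learning, a temporal-difference method, with epsilon-greedy trial-and-error on a corridor task where reward arrives only at the far end:

```python
import random

# Toy sketch (not from the book): tabular Q-learning, a temporal-difference
# method, on a five-state corridor where reward arrives only at the far end.
# The agent must discover by trial and error that moving right pays off later.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                  # step left or right
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = 0
    while s != GOAL:
        # trial-and-error search: sometimes explore instead of exploiting
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0          # delayed reward
        # temporal-difference update: bootstrap from the next state's estimate
        td_target = r + gamma * max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (td_target - Q[(s, a)])
        s = s_next

print({s: round(max(Q[(s, act)] for act in ACTIONS), 3) for s in range(N_STATES)})
```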
Understanding Machine Learning

Author: Shai Shalev-Shwartz
Language: en
Publisher: Cambridge University Press
Release Date: 2014-05-19
Introduces machine learning and its algorithmic paradigms, explaining the principles behind automated learning approaches and the considerations underlying their usage.