Lessons from AlphaZero for Optimal Model Predictive and Adaptive Control

Lessons from AlphaZero for Optimal Model Predictive and Adaptive Control, written by Dimitri Bertsekas, was released on 19 March 2022 and is available in PDF and EPUB formats. This page includes a summary of the book and detailed information about it. The book is listed under the Computers genre.

Summary of Lessons from AlphaZero for Optimal Model Predictive and Adaptive Control by Dimitri Bertsekas

The purpose of this book is to propose and develop a new conceptual framework for approximate Dynamic Programming (DP) and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call these the off-line training and the on-line play algorithms; the names are borrowed from some of the major successes of RL involving games. Primary examples are the recent (2017) AlphaZero program (which plays chess), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents.

Both AlphaZero and TD-Gammon were trained off-line extensively using neural networks and an approximate version of the fundamental DP algorithm of policy iteration. Yet the AlphaZero player that was obtained off-line is not used directly during on-line play (it is too inaccurate due to approximation errors that are inherent in off-line neural network training). Instead a separate on-line player is used to select moves, based on multistep lookahead minimization and a terminal position evaluator that was trained using experience with the off-line player. The on-line player performs a form of policy improvement, which is not degraded by neural network approximations. As a result, it greatly improves the performance of the off-line player. Similarly, TD-Gammon performs on-line a policy improvement step using one-step or two-step lookahead minimization, which is not degraded by neural network approximations. To this end it uses an off-line neural network-trained terminal position evaluator, and importantly it also extends its on-line lookahead by rollout (simulation with the one-step lookahead player that is based on the position evaluator).

Significantly, the synergy between off-line training and on-line play also underlies Model Predictive Control (MPC), a major control system design methodology that has been extensively developed since the 1980s. This synergy can be understood in terms of abstract models of infinite horizon DP and simple geometrical constructions, and helps to explain the all-important stability issues within the MPC context.

An additional benefit of policy improvement by approximation in value space, not observed in the context of games (which have stable rules and environment), is that it works well with changing problem parameters and on-line replanning, similar to indirect adaptive control. Here the Bellman equation is perturbed due to the parameter changes, but approximation in value space still operates as a Newton step. An essential requirement here is that a system model is estimated on-line through some identification method, and is used during the one-step or multistep lookahead minimization process.

In this monograph we aim to provide insights (often based on visualization), which explain the beneficial effects of on-line decision making on top of off-line training. In the process, we will bring out the strong connections between the artificial intelligence view of RL, and the control theory views of MPC and adaptive control. Moreover, we will show that in addition to MPC and adaptive control, our conceptual framework can be effectively integrated with other important methodologies such as multiagent systems and decentralized control, discrete and Bayesian optimization, and heuristic algorithms for discrete optimization.

One of our principal aims is to show, through the algorithmic ideas of Newton's method and the unifying principles of abstract DP, that the AlphaZero/TD-Gammon methodology of approximation in value space and rollout applies very broadly to deterministic and stochastic optimal control problems. Newton's method here is used for the solution of Bellman's equation, an operator equation that applies universally within DP with both discrete and continuous state and control spaces, as well as finite and infinite horizon.
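
The claim that one-step lookahead acts as a Newton step on Bellman's equation can be checked numerically on a toy problem. The sketch below is not taken from the summary: it assumes a scalar linear-quadratic problem (system x' = a x + b u, stage cost q x^2 + r u^2), for which Bellman's equation reduces to the scalar Riccati equation K = F(K). The parameter values, the terminal cost guess `K_tilde`, and all function names are illustrative assumptions. Under those assumptions, the cost of the one-step lookahead policy built from `K_tilde` should coincide with one Newton iteration on K = F(K) started at `K_tilde`.

```python
"""Scalar linear-quadratic sketch: one-step lookahead viewed as a Newton step.

Assumed setup (illustrative, not from the source text):
system x' = a*x + b*u, stage cost q*x^2 + r*u^2, cost functions of the form K*x^2.
"""


def riccati(K, a, b, q, r):
    """Bellman (Riccati) operator F(K) for the assumed scalar problem."""
    return q + a * a * r * K / (r + b * b * K)


def riccati_slope(K, a, b, r):
    """Derivative F'(K) of the Riccati operator."""
    return (a * r / (r + b * b * K)) ** 2


def lookahead_gain(K, a, b, r):
    """Gain L of the one-step lookahead policy u = L*x with terminal cost K*x^2."""
    return -a * b * K / (r + b * b * K)


def policy_cost(L, a, b, q, r):
    """Cost coefficient of the linear policy u = L*x; needs |a + b*L| < 1."""
    closed_loop = a + b * L
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy is unstable, its cost is infinite")
    return (q + r * L * L) / (1.0 - closed_loop ** 2)


def newton_step(K, a, b, q, r):
    """One Newton iteration for solving K = F(K), linearizing F at K."""
    slope = riccati_slope(K, a, b, r)
    return (riccati(K, a, b, q, r) - slope * K) / (1.0 - slope)


def solve_riccati(a, b, q, r, iters=200):
    """Approximate the optimal K* by value iteration K <- F(K) from K = 0."""
    K = 0.0
    for _ in range(iters):
        K = riccati(K, a, b, q, r)
    return K


if __name__ == "__main__":
    a, b, q, r = 1.0, 2.0, 1.0, 0.5   # assumed problem data
    K_tilde = 0.3                     # assumed (poor) off-line cost approximation

    K_star = solve_riccati(a, b, q, r)
    K_policy = policy_cost(lookahead_gain(K_tilde, a, b, r), a, b, q, r)
    K_newton = newton_step(K_tilde, a, b, q, r)

    print("optimal cost coefficient   :", K_star)
    print("off-line approximation     :", K_tilde)
    print("cost of lookahead policy   :", K_policy)
    print("Newton step from K_tilde   :", K_newton)
```

In this sketch the last two printed values agree up to rounding, and both are much closer to the optimal coefficient than `K_tilde` is, which is the kind of improvement the summary attributes to on-line play on top of off-line training.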


Details of Lessons from AlphaZero for Optimal Model Predictive and Adaptive Control

  • Author : Dimitri Bertsekas
  • Publisher : Athena Scientific
  • Genre : Computers
  • Total Pages : 229 pages
  • ISBN : 1886529175
  • PDF File Size : 17.7 MB
  • Language : English
  • Rating : 4/5 from 21 reviews

A Course in Reinforcement Learning
  • Publisher : Athena Scientific
  • File Size : 20.5 MB
  • Release Date : 21 June 2023

These lecture notes were prepared for use in the 2023 ASU research-oriented course on Reinforcement Learning (RL) that I have offered in each of the last five years. Their purpose is

Reinforcement Learning and Optimal Control
  • Publisher : Athena Scientific
  • File Size : 44.6 MB
  • Release Date : 01 July 2019

This book considers large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP), but their exact solution is computationally intractable. We discuss solution methods

Reinforcement Learning
  • Publisher : Springer Nature
  • File Size : 48.5 MB
  • Release Date : 24 July 2023

This book offers a thorough introduction to the basics and scientific and technological innovations involved in the modern study of reinforcement-learning-based feedback control. The authors address a wide variety of

Reinforcement Learning for Optimal Feedback Control
  • Publisher : Springer
  • File Size : 43.9 MB
  • Release Date : 10 May 2018

Reinforcement Learning for Optimal Feedback Control develops model-based and data-driven reinforcement learning methods for solving optimal control problems in nonlinear deterministic dynamical systems. In order to achieve learning under uncertainty,

Recent Advances in Model Predictive Control
  • Publisher : Springer Nature
  • File Size : 38.7 MB
  • Release Date : 17 April 2021

This book focuses on distributed and economic Model Predictive Control (MPC) with applications in different fields. MPC is one of the most successful advanced control methodologies due to the simplicity

Adaptive Optimal Control
  • Publisher : Unknown Publisher
  • File Size : 36.8 MB
  • Release Date : 03 June 1990

Exploring connections between adaptive control theory and practice, this book treats the techniques of linear quadratic optimal control and estimation (Kalman filtering), recursive identification, linear systems theory and robust arguments.

Convex Optimization Theory
  • Publisher : Athena Scientific
  • File Size : 44.7 MB
  • Release Date : 01 June 2009

An insightful, concise, and rigorous treatment of the basic theory of convex sets and functions in finite dimensions, and the analytical/geometrical foundations of convex optimization and duality theory. Convexity