## Data-Based: the Past and Future of Control?

## Organizers:

Suman Chakravorty, *Texas A&M University, College Station* (schakrav[at]tamu[dot]edu)

Raman Goyal, *Palo Alto Research Center, SRI International*

Mohamed Naveed Gul, *Texas A&M University, College Station* (naveed[at]tamu[dot]edu)

## Abstract:

Data-based control has a long history in the Control community, tracing back to seminal work in adaptive control and system identification. However, much of this past work concentrated, for good reason, on linear time-invariant (LTI) problems. With the rapid advances of Reinforcement Learning (RL) in the past decade, owing partly to the vast increase in computing power, data-based control is enjoying a renaissance and seems poised to advance control synthesis to a slew of new applications that are non-LTI. The fundamental construct underpinning RL is optimal control, and thus, in this workshop, we propose a study of the fundamental structure of optimal nonlinear feedback control that may enable the design of scalable, accurate and reliable data-based algorithms for nonlinear, partially observed and uncertain systems. Part of the workshop will address basic questions about “what constitutes feedback” since vast increases in available computation necessitate a careful reassessment of classic results. We conjecture that RL/data-based control methods are the correct approach to scale classical feedback control synthesis methods to complex, high dimensional, partially observed and analytically intractable nonlinear problems, thereby making a vastly increased array of applications amenable to control theoretic techniques. However, it is critical that we design these techniques to be accurate and reliable in addition to being scalable. In this regard, we hope that this workshop will be a first step towards the principled synthesis of such data-based techniques.

## Expected Outcomes:

This workshop will bring together researchers with a diverse set of ideas regarding the approach required for the data-based design of scalable, reliable and accurate feedback control synthesis techniques that address complex, nonlinear, partially observed problems and are robust to model uncertainty. In this context, we expect the following outcomes. In the following, data-based control and RL are used synonymously.

**Best Practices for the Design and Testing of Data-based Control Techniques.** Current approaches for RL typically rely on approximating a global solution of the Dynamic Programming problem with an appropriate approximation architecture such as Deep Neural Networks. However, there are alternate approaches (see the talk abstracts of Russell, Mesbahi, and Chakravorty) that rely on a simple parametrization of the feedback law, typically a nominal open loop sequence allied with a linear feedback law, that can be shown to lead to very accurate and reliable solutions, both theoretically and empirically. Thus, an objective of the workshop will be to take a critical look at the relationship between these local and global approaches, and their connection to the structure of the underlying nonlinear optimal control problem. We would also like to shed light on the true capabilities of these approaches and, in the process, understand the connections to traditional computational nonlinear optimal control techniques such as Differential Dynamic Programming (DDP). A related outcome will be how to empirically test these RL algorithms. Currently, they are evaluated solely in terms of their learning/training efficiency; their reliability, accuracy, and closed-loop performance go untested. Thus, a principled set of approaches for testing that can interrogate the reliability, accuracy, and closed-loop performance will be a goal.

**Data-based Partially Observed Control.** Most RL approaches currently require access to the full state of the system; however, the full state is rarely available in applications. In this context, the notion of an information state, wherein a finite past history of the system is substituted for the unobservable state (see the talk abstracts of Mahajan, Mehta and Goyal), seems to be a natural approach to generalize the control synthesis to partially observed problems, both in discrete and continuous state/control spaces. A deeper look at how we can construct such approximate information states based on past data, the necessary length of such data, and their utilization for control synthesis will be one of the primary goals of the workshop. In this context, a goal will be to tie the information state approach to traditional output feedback control, and to examine whether and how such data-based approaches can offer a solution to nonlinear output feedback problems. Synergies between the discrete and continuous state/control space variants will be explored.

**Data-based Robust Control.** Most RL approaches use simulation models for the control synthesis; however, there is always a “sim to reality” gap. Thus, in this context, we shall look at both Bayesian and worst-case approaches for the design of robust techniques to account for such model uncertainty (see the talk abstracts of Kalathil and Subramanian). In a sense, this topic is perhaps the most salient, as any real application will require that we address this challenge satisfactorily. Thus, a primary objective of the workshop will be to “start to address” the fundamental issue of model uncertainty in nonlinear problems, and the constraints that it may impose on control synthesis.

**Taxonomy of Data-based Control Approaches.** One of the challenges a new entrant to the field faces is the wide array of algorithms that are available to solve the RL problem. Perhaps even more fundamentally, what does solving an RL problem actually mean? What is the connection of these techniques to traditional fields of control such as optimal control, nonlinear control and adaptive control? A goal of this workshop will be to “begin” to address these questions and generate a classification/taxonomy of these approaches and how they relate to traditional control synthesis techniques, i.e., to see the “forest for the trees”.

## Expected Attendance:

We expect this workshop to be of broad interest to the Control community, both academic and industrial. Given the increasing interest in learning and data-based control in the past several years, and given the need to scale control synthesis beyond traditional applications, we believe that this workshop can play a valuable role in shedding light on the capabilities and limitations of data-based control approaches while providing a venue to discuss future directions for research in the field. All talks will also have a tutorial component, and can be of value to new researchers and practitioners who would like to contribute to this research field.

## Speakers:

**Ryan Russell, University of Texas, Austin.** *Nonlinear Optimal Control with Differential Dynamic Programming*:

An overview of a Differential Dynamic Programming (DDP) algorithm will be presented, along with applications ranging from spacecraft to articulated bodies. The algorithm exploits full second order models of the constraints and dynamics, and benefits from several new advances including dynamic safeguards, quasi-Newton approximations and a formulation that is parallelizable and allows for multi-shooting. Second order DDP algorithms are known to be robust, yet are expensive to implement and compute. However, these shortcomings can be mitigated with improved formulations, especially for applications where second order partial derivatives can be efficiently computed. In this talk, we will explore several such applications where DDP methods are attractive for solving highly nonlinear optimal control problems, both for off-line trajectory planning, and potentially for on-line autonomous control.

BIO: Ryan P. Russell is a Professor in the Department of Aerospace Engineering and Engineering Mechanics at The University of Texas at Austin. His research areas of interest include orbit mechanics, numerical optimization, trajectory design, and spacecraft dynamics. He served on the Georgia Institute of Technology faculty before joining the UT ASE/EM faculty in 2012. He is a Fellow of the American Astronautical Society (AAS), an Associate Fellow of AIAA, and is the former chair of the AIAA Astrodynamics Technical Committee. He is an associate editor for the Journal of Optimization Theory and Applications; AIAA’s Journal of Guidance Control and Dynamics; and Celestial Mechanics and Dynamical Astronomy.

**Mehran Mesbahi, University of Washington, Seattle.** *Calculus and Geometry of First Order Methods for Direct Policy Optimization*:

First order methods have been widely adopted in the design and optimization of large-scale dynamic systems. In recent years, there has been a surge of renewed research activity in adopting first order methods and their variants for direct policy optimization in feedback synthesis, providing an effective bridge to explore how model and data fidelity affect the stability and performance of synthesized dynamic systems. Such a perspective has also opened up a host of theoretical and computational questions at the heart of control theory. In this talk, I will provide an overview of this active area of research, followed by a host of related analytic and geometric insights and questions, as well as challenges and opportunities at the intersection of learning and control inspired by this line of work.

BIO: Mehran Mesbahi is the J. Ray Bowen Endowed Professor of Aeronautics and Astronautics, Adjunct Professor of Electrical and Computer Engineering and Mathematics at the University of Washington in Seattle, member of the Washington State Academy of Sciences, and Executive Director of Joint Center for Aerospace Technology Innovation. He is a Fellow of IEEE and recipient of NASA Space Act Award, University of Washington Distinguished Teaching Award, and University of Washington College of Engineering Innovator Award. He is the co-author of the book “Graph Theoretic Methods in Multiagent Networks” published by Princeton University Press. His research interests are distributed and networked aerospace systems, autonomy, control theory, and learning.

**Prashant Mehta, University of Illinois, Urbana Champaign**

This talk is concerned with optimal control problems for control systems in continuous time, and interacting particle system methods designed to construct approximate control solutions. Particular attention is given to the linear quadratic (LQ) control problem. There is a growing interest in revisiting this classical problem, in part due to the successes of model-based RL. The main question of this body of research (and also of our work) is to approximate the optimal control law without explicitly solving the Riccati equation. In this talk, a novel simulation-based algorithm, namely a dual ensemble Kalman filter (EnKF), is introduced. The algorithm is used to obtain formulae for optimal control, expressed entirely in terms of the EnKF particles. An extension to the nonlinear case is also presented. The theoretical results and algorithms are illustrated with numerical experiments against the state-of-the-art.
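For readers new to the problem, the classical LQ setting that the abstract refers to can be written in its standard infinite-horizon form as below. This is textbook background rather than the speaker's formulation: the matrices $A$, $B$, $Q$, $R$ and the choice of an infinite horizon are assumptions made here for illustration, and the talk may use a finite-horizon or stochastic variant.

$$
\dot{x} = A x + B u, \qquad J = \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt,
$$

$$
u^* = -R^{-1} B^\top P x, \qquad A^\top P + P A - P B R^{-1} B^\top P + Q = 0 .
$$

The second line is the algebraic Riccati equation for $P$; simulation-based methods of the kind described in the abstract aim to approximate $u^*$ without solving it explicitly.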

BIO: Prashant Mehta is a Professor in the Coordinated Science Laboratory and the Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign (UIUC). He received his Ph.D. in Applied Mathematics from Cornell University in 2004. He was the co-founder and the Chief Science Officer of the startup Rithmio whose gesture recognition technology was acquired by Bosch Sensortec in 2017. His students have received the Best Student Paper Awards at the IEEE Conference on Decision and Control in 2007, 2009 and most recently in 2019; and have been finalists for these awards in 2010 and 2012. He serves as an Associate Editor for the IEEE Transactions on Automatic Control (2019-), the Systems and Control Letters (2011-14), and the ASME Journal of Dynamic Systems, Measurement and Control (2012-16).

**Aditya Mahajan, McGill University, Montreal, Canada.** *Approximation and learning for partially observed systems*:

Reinforcement learning (RL) provides a conceptual framework for designing agents which learn to act optimally in unknown environments. RL has been successfully used in various applications ranging from robotics and industrial automation to finance, healthcare, and natural language processing. The success of RL is based on a solid foundation of combining the theory of exact and approximate Markov decision processes (MDPs) with iterative algorithms that are guaranteed to learn an exact or approximate action-value function and/or an approximately optimal policy. However, for the most part, the research on RL theory is focused on systems with full state observations.

In various applications including robotics, finance, and healthcare, the agent only gets a partial observation of the state of the environment. In this talk, I will present a new framework for approximate planning and learning for partially observed systems based on the notion of approximate information state. The talk will highlight the strong theoretical foundations of this framework, illustrate how many of the existing approximation results can be viewed as a special case of approximate information state, and provide empirical evidence which suggests that this approach works well in practice.

BIO: Aditya Mahajan is an Associate Professor of Electrical and Computer Engineering at McGill University, Montreal, Canada. He received the B.Tech degree in Electrical Engineering from the Indian Institute of Technology, Kanpur, India, and the MS and PhD degrees in Electrical Engineering and Computer Science from the University of Michigan, Ann Arbor, USA.

He is a senior member of the IEEE and a member of Professional Engineers Ontario. He currently serves as Associate Editor of IEEE Transactions on Automatic Control and Mathematics of Control, Signals, and Systems (Springer).

He is the recipient of the 2015 George Axelby Outstanding Paper Award, the 2016 NSERC Discovery Accelerator Award, the 2014 CDC Best Student Paper Award (as supervisor), and the 2016 NecSys Best Student Paper Award (as supervisor). His principal research interests include decentralized stochastic control, team theory, reinforcement learning, multi-armed bandits and information theory.

**Dileep Kalathil, Texas A&M University, College Station, TX.**

ABSTRACT: Reinforcement Learning (RL) is the class of machine learning that addresses the problem of learning to control unknown dynamical systems. RL has achieved remarkable success recently in applications like playing games and robotics. However, most of these successes are limited to very structured or simulated environments. When applied to real-world systems, RL algorithms face two fundamental sources of fragility. First, the real-world system parameters can be very different from the nominal values used for training RL algorithms. Second, the control policy for any real-world system is required to maintain some necessary safety criteria to avoid undesirable outcomes. Most deep RL algorithms overlook these fundamental challenges, which often results in learned policies that perform poorly in real-world settings. In this talk, I will present two approaches to overcome these challenges. First, I will present an RL algorithm that is robust against parameter mismatches between the simulation system and the real-world system. Second, I will discuss a safe RL algorithm to learn policies such that the frequency of visiting undesirable states and taking expensive actions satisfies the safety constraints. I will also briefly discuss some practical challenges due to sparse reward feedback and the need for rapid real-time adaptation in real-world systems, and the approaches to overcome these challenges.

BIO: Dileep Kalathil is an Assistant Professor in the Department of Electrical and Computer Engineering at Texas A&M University (TAMU). His main research area is reinforcement learning theory and algorithms, and their applications in communication networks and power systems. Before joining TAMU, he was a postdoctoral researcher in the EECS department at UC Berkeley. He received his Ph.D. from University of Southern California (USC) in 2014, where he won the best Ph.D. Dissertation Prize in the Department of Electrical Engineering. He received his M. Tech. from IIT Madras, where he won the award for the best academic performance in the Electrical Engineering Department. He received the NSF CRII Award in 2019 and the NSF CAREER award in 2021. He is a senior member of IEEE.

**Vijay Subramanian, University of Michigan, Ann Arbor.** *Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space*:

Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies mainly focus on finite state settings, and do not directly apply to these models. To overcome this lacuna, in this work we study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes (MDPs) governed by an unknown parameter $\theta\in\Theta$, and defined on a countably-infinite state space $X=\mathbb{Z}_+^d$, with finite action space $A$, and an unbounded cost function. We take a Bayesian perspective with the random unknown parameter ${\theta}^*$ generated via a given fixed prior distribution on $\Theta$. To optimally control the unknown MDP, we propose an algorithm based on Thompson sampling with dynamically-sized episodes: at the beginning of each episode, the posterior distribution formed via Bayes’ rule is used to produce a parameter estimate, which then decides the policy applied during the episode. To ensure the stability of the Markov chain obtained by following the policy chosen for each parameter, we impose ergodicity assumptions. From this condition and using the solution of the average cost Bellman equation, we establish an $\tilde O(\sqrt{|\mathcal A|T})$ upper bound on the Bayesian regret of our algorithm, where $T$ is the time-horizon. Finally, to elucidate the applicability of our algorithm, we consider two different queuing models with unknown dynamics, and show that our algorithm can be applied to develop approximately optimal control algorithms. This is joint work with Saghar Adler from the University of Michigan.
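The episodic structure described above (sample a parameter from the posterior at the start of an episode, then hold the corresponding policy fixed) can be illustrated with a toy sketch. This is not the algorithm from the talk: the two-server setting, the finite candidate set, the known rate of 0.5, and the doubling episode lengths are all assumptions chosen only to keep the example small.

```python
import random

def thompson_episodes(theta_true, candidates, prior, num_episodes, seed=0):
    """Toy episodic Thompson sampling for one unknown service rate."""
    rng = random.Random(seed)
    posterior = list(prior)
    known_rate = 0.5  # hypothetical: success rate of a fully known server
    served = 0
    for k in range(num_episodes):
        # Start of episode: sample a parameter from the current posterior.
        theta_hat = rng.choices(candidates, weights=posterior)[0]
        # The policy chosen for the sampled parameter is fixed for the episode.
        use_unknown = theta_hat > known_rate
        for _ in range(2 ** k):  # assumed schedule: episode k lasts 2**k steps
            if use_unknown:
                success = rng.random() < theta_true
                # Bayes' rule with a Bernoulli likelihood for each candidate.
                likes = [c if success else 1.0 - c for c in candidates]
                weights = [p * l for p, l in zip(posterior, likes)]
                total = sum(weights)
                posterior = [w / total for w in weights]
            else:
                # The known server reveals nothing about the unknown rate.
                success = rng.random() < known_rate
            served += success
    return posterior, served

posterior, served = thompson_episodes(0.7, [0.3, 0.7], [0.5, 0.5], num_episodes=8)
```

The posterior is updated inside an episode but only consulted when the next episode begins, which mirrors the "estimate, then commit" pattern in the abstract.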

BIO: Vijay Subramanian received the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 1999. He worked at Motorola Inc., at the Hamilton Institute, Maynooth, Ireland, for many years, and also in the EECS Department, Northwestern University, Evanston, IL, USA. In Fall 2014, he started in his current position as an Associate Professor with the EECS Department at the University of Michigan, Ann Arbor. His research interests are in stochastic analysis, random graphs, multi-agent systems, and game theory (mechanism and information design) with applications to social, economic and technological networks.

**Raman Goyal (Co-Organizer), Palo Alto Research Center, SRI International.** *An Information-state based Approach to the Optimal Output Feedback Control of Nonlinear Systems*:

This work develops a data-based approach to the closed-loop output feedback control of nonlinear dynamical systems with a partial nonlinear observation model. We propose an “information-state” based approach to rigorously transform the partially observed problem into a fully observed problem where the information-state consists of the past several observations and control inputs. We further show the equivalence of the transformed and the initial partially observed optimal control problems and provide the conditions to solve for the deterministic optimal solution. We develop a data-based generalization of the iterative Linear Quadratic Regulator (iLQR) to partially-observed systems using a local linear time-varying model of the information-state dynamics approximated by an Autoregressive–moving-average (ARMA) model, that is generated using only the input-output data. This open-loop trajectory optimization solution is then used to design a local feedback control law, and the composite law then provides an optimum solution to the partially observed feedback design problem. The efficacy of the developed method is shown by controlling complex high dimensional nonlinear dynamical systems in the presence of model and sensing uncertainty.
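As a rough illustration of the "information-state" idea, the sketch below stacks the most recent observations and control inputs into a single vector that stands in for the unobserved state. The function name, the window length `q`, the ordering of the entries, and the use of scalar signals are assumptions made here, not details taken from the talk.

```python
def information_state(observations, controls, t, q):
    """Stack the last q observations and the last q-1 controls at time t."""
    if t < q - 1:
        raise ValueError("need at least q observations up to time t")
    # z_t, z_{t-1}, ..., z_{t-q+1}
    past_obs = [observations[t - i] for i in range(q)]
    # u_{t-1}, ..., u_{t-q+1}
    past_ctrl = [controls[t - i] for i in range(1, q)]
    return past_obs + past_ctrl

z = [10, 11, 12, 13]  # hypothetical scalar observations
u = [0, 1, 2, 3]      # hypothetical scalar control inputs
state = information_state(z, u, t=3, q=3)  # [13, 12, 11, 2, 1]
```

A model fitted on such stacked vectors from input-output data alone is what allows a fully observed design method to be applied to the partially observed problem.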

BIO: Raman Goyal earned his Ph.D. in Aerospace Engineering from Texas A&M University, College Station, and his B.Tech. degree in Mechanical Engineering from IIT Roorkee, India. He is currently employed as a Research Scientist at Palo Alto Research Center (PARC), part of SRI International.

Raman is interested in intelligent learning approaches for optimal control of stochastic nonlinear systems. He has also worked on modeling, design, and control of deployable and deformable tensegrity-based robots.

**Suman Chakravorty (Co-Organizer), Texas A&M University, College Station.** *The Structure of Optimal Nonlinear Feedback Control and its Implications*:

ABSTRACT: The optimal nonlinear feedback law can be synthesized via the solution of an associated Dynamic Programming (DP) problem. Unfortunately, the DP problem is thought to be fundamentally intractable owing to Bellman’s infamous “Curse of Dimensionality”. We show that the deterministic problem has a perturbation structure in that higher order terms in the feedback expansion do not affect the lower order terms, implying we can get accurate local solutions. We also show that the deterministic feedback law is near-optimal, to fourth order in a small noise parameter, with respect to the true global stochastic optimal policy. We show that satisfying the Minimum Principle is sufficient to obtain the globally optimal open loop solution for deterministic nonlinear control problems, under mild assumptions, which then determines all the higher order feedback terms, the equations for which are derived. Furthermore, we show that the perturbation structure is lost in the stochastic problem, and empirical results show that solving the stochastic DP problem is highly susceptible to errors, even for very low dimensional problems, and in practice, the deterministic feedback law offers superior performance. Next, we consider the partially observed optimal control problem and transform it into an equivalent fully observed problem via the construction of an “information state”. We derive a “Minimum Principle” for the partially observed optimal control in the information state. We develop a linear time varying (LTV) system identification approach in terms of the information state and use it to develop an Iterative LQR (ILQR) based solution to the partially observed optimal control problem that is globally optimum.

We consider the implications for the problem of Reinforcement Learning (RL). Most RL techniques search over a complex global nonlinear feedback parametrization, making them suffer from high training times as well as solution variance. Instead, we advocate searching over a local feedback representation consisting of an open-loop sequence, and an associated optimal linear feedback law completely determined by the open-loop. We show that this alternate approach, termed decoupled data-based control (D2C), results in highly efficient training, the answers obtained are globally optimum locally, have negligible variance, and the resulting closed loop performance is superior to global state of the art RL techniques. If we replan whenever required, similar to Model Predictive Control (MPC), which is feasible due to the fast and reliable local solution, it allows us to recover the global feedback law. We show the generalization of the D2C approach to partially observed problems via the information state construct. We present applications to complex Robotic Control problems including Swimming Robots and Tensegrity Robots as well as Material Microstructure control.
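The local feedback representation mentioned above, a nominal open-loop sequence plus a linear feedback on the deviation from the nominal trajectory, can be sketched on a scalar toy system. Everything specific here is an assumption for illustration: the dynamics `x_next = a*x + b*u`, the nominal inputs, and the hand-picked gains, which are not computed by D2C or any optimal design.

```python
def nominal_trajectory(x0, nominal_u, a=1.0, b=1.0):
    """Roll out the nominal open-loop sequence from x0."""
    xs = [x0]
    for u in nominal_u:
        xs.append(a * xs[-1] + b * u)
    return xs

def closed_loop_rollout(x0, nominal_u, gains, nominal_x, a=1.0, b=1.0):
    """Apply u_t = u_bar_t + K_t * (x_t - x_bar_t) and return the final state."""
    x = x0
    for u_bar, k, x_bar in zip(nominal_u, gains, nominal_x):
        u = u_bar + k * (x - x_bar)  # open-loop term plus linear feedback
        x = a * x + b * u
    return x

u_bar = [0.1, 0.1, 0.1]                  # hypothetical nominal inputs
x_bar = nominal_trajectory(0.0, u_bar)   # nominal states
K = [-0.5, -0.5, -0.5]                   # hypothetical feedback gains
# Starting 1.0 away from the nominal, the deviation halves at every step.
final_deviation = closed_loop_rollout(1.0, u_bar, K, x_bar) - x_bar[-1]
```

With these values the closed-loop deviation obeys `e_next = (a + b*K) * e = 0.5 * e`, so a start that is off the nominal trajectory is pulled back toward it while the open-loop sequence is left unchanged.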

BIO: Suman Chakravorty obtained his B.Tech in Mechanical Engineering in 1997 from the Indian Institute of Technology, Madras and his PhD in Aerospace Engineering from the University of Michigan, Ann Arbor in 2004. He joined the Aerospace Engineering Department at Texas A&M University, College Station, in August 2004 as an Assistant Professor, where he is currently a Professor. Dr. Chakravorty’s research interests lie in the estimation and control of stochastic dynamical systems with application to robotic planning and control, materials processing and space situational awareness (SSA) problems. He has served as an Associate Editor for the ASME Journal of Dynamic Systems, Measurement and Control and the IEEE Robotics and Automation Letters. He has also served on the Member Activities Board (MAB) of the IEEE Robotics and Automation Society (RAS).

## Schedule:

The workshop will be organized along the “Data-based”, “Nonlinear”, “Partially Observed” and “Robust” themes. We will have two or three speakers for each topic. Each talk will be for one hour with 45 minutes for the speaker and 15 minutes for Q&A. The tentative schedule will be as follows.

| Time | Speaker (Theme) |
| --- | --- |
| 8 AM – 9 AM | Ryan Russell, U. Texas (Nonlinear) |
| 9 AM – 10 AM | Mehran Mesbahi, U. Washington (Nonlinear) |
| 10 AM – 11 AM | Suman Chakravorty, TAMU (Nonlinear) |
| 11 AM – 12 PM | Aditya Mahajan, McGill (Partially Observed) |
| 12 PM – 1 PM | Lunch Break |
| 1 PM – 2 PM | Raman Goyal, PARC (Partially Observed) |
| 2 PM – 3 PM | Dileep Kalathil, TAMU (Robust) |
| 3 PM – 4 PM | Vijay Subramanian, U. Michigan (Robust) |
| 4 PM – 5 PM | Prashant Mehta, U. Illinois (Partially Observed) |
| 5 PM – 6:30 PM | Panel Discussion |

**Registration details:** https://acc2024.a2c2.org/registration/registration