Proceedings of Machine Learning Research vol 144:1–12, 2021
3rd Annual Conference on Learning for Dynamics and Control
Stable Online Control of Linear Time-Varying Systems
Guannan Qu GQU@CALTECH.EDU
Yuanyuan Shi YSHI7@CALTECH.EDU
Sahin Lale ALALE@CALTECH.EDU
Anima Anandkumar ANIMA@CALTECH.EDU
Adam Wierman ADAMW@CALTECH.EDU
California Institute of Technology, Pasadena, CA
Equal contribution
Editors:
A. Jadbabaie, J. Lygeros, G. J. Pappas, P. A. Parrilo, B. Recht, C. J. Tomlin, M. N. Zeilinger
Abstract
Linear time-varying (LTV) systems are widely used for modeling real-world dynamical systems
due to their generality and simplicity. Providing stability guarantees for LTV systems is one of the
central problems in control theory. However, existing approaches that guarantee stability typically
lead to significantly sub-optimal cumulative control cost in online settings where only current or
short-term system information is available. In this work, we propose an efficient online control
algorithm, COvariance Constrained Online Linear Quadratic (COCO-LQ) control, that guarantees
input-to-state stability for a large class of LTV systems while also minimizing the control cost. The
proposed method incorporates a state covariance constraint into the semi-definite programming
(SDP) formulation of the LQ optimal controller. We empirically demonstrate the performance of
COCO-LQ in both synthetic experiments and a power system frequency control example.
Keywords:
Time-varying systems, online linear quadratic control, stability guarantee
1. Introduction
Time-invariant systems have traditionally been the main focus of study for the linear dynamical
systems community. However, real-world systems are often time-varying. For example, consider a
power system that includes renewable generation (e.g., solar/wind). Due to the intermittency of
renewable energy, the system dynamics for frequency regulation in the power system are time-varying.
Applying a time-invariant controller in this setting may lead to frequency instability and line
failures (Ulbig et al., 2014). Time-varying systems are also crucial for many other applications, such as
autonomous vehicles and aircraft control (Falcone et al., 2008). While not all time-varying systems
have linear dynamics, many applications can be approximated by linear time-varying (LTV) systems
via a local linear approximation at each time step (Todorov and Li, 2005), e.g., the frequency control
example described above. As a result, LTV systems are widely used and there is a large literature
focused on designing controllers for LTV systems (Amato et al., 2010; Ouyang et al., 2017).
Perhaps the most fundamental challenge in dynamical systems is stability. While the design
of stable linear time-invariant (LTI) systems is well understood, the same cannot be said for LTV
systems. To this point, several notions of stability have received attention, e.g., input-to-state
stability (ISS), mean-square stability and Lyapunov stability. ISS is the most widely adopted notion and
aims to guarantee the boundedness of the state given bounded initial conditions (Hong et al., 2010).
© 2021 G. Qu, Y. Shi, S. Lale, A. Anandkumar & A. Wierman.
In most applications of LTV systems, it is crucial to guarantee ISS in order to avoid saturation
and to maintain the robustness and validity of the linearization (Tarbouriech et al., 2006; Khalil, 2002).
While there is considerable prior work focused on stability in LTV systems, most prior work
studies stability in the offline setting where either the sequence of system parameters is known,
e.g., (Amato et al., 2010; Li et al., 2019a), or the system parameters have a particular variation
pattern, e.g., (Garcia et al., 2009). Maintaining ISS guarantees becomes significantly harder in
the online setting where the system parameters are observed in real time and may have arbitrary
variations. This online setting is the most relevant to many applications, e.g., frequency regulation.
Though stability is crucial, it is not enough for a controller to be stable. A controller must also
have low cost. For instance, in order to stabilize the dynamics, a controller may use arbitrarily large
control inputs, which may result in sub-optimal cost. In classical optimal control problems, e.g., the
time-varying linear quadratic (LQ) control setting, the goal is to design a stabilizing controller that
minimizes the cost over a particular finite horizon while assuming access to the whole trajectory for
that duration. It is possible to characterize the optimal policy in such settings (Bertsekas et al., 1995);
however, in the online setting, when only current or short-term system information is available,
these methods may not guarantee stability, e.g., see Section 3. There have been recent efforts to
provide sub-optimality guarantees on the incurred cost in the online LTV setting, e.g., (Gradu et al.,
2020), but it is unclear whether the proposed controllers maintain stability for all time-steps, since the main
focus is on minimizing the cumulative cost.
Thus, despite considerable recent work, much remains to be understood about the design of
online LTV controllers. In particular, this paper is motivated by the following question:
Is it possible for an online controller to guarantee stability and maintain low cost in LTV systems?
Contributions. In this work, we answer the question above affirmatively. Specifically, we propose
COvariance Constrained Online Linear Quadratic (COCO-LQ) control, a novel online control
algorithm that aims to minimize the control cost while ensuring provable stability guarantees in LTV
systems without restricting how slow or fast the underlying system changes. Further, we demonstrate
the performance of the proposed method in various synthetic LTV systems and in the power
system frequency control example that motivated our study.
The main technical contribution of the paper is a stability guarantee for COCO-LQ in LTV systems.
Specifically, we show that COCO-LQ guarantees ISS in online time-varying systems. The
key technique that underpins the proposed algorithm is the addition of a novel semi-definiteness
constraint on the state covariance matrix into the standard online semi-definite programming (SDP)
formulation of linear quadratic optimal control. We show that this constraint promotes the sequential
strong stability of the controllers (Cohen et al., 2018), which in turn guarantees ISS with a proper
choice of an algorithm hyperparameter. Adding this additional constraint is simple and does not result
in a significant increase of computational complexity compared to the standard LQ formulation.
Moreover, we prove that if the proposed SDP is not directly feasible, short-term predictions on the
future system parameters are necessary and can be used in COCO-LQ in order to ensure ISS.
Related work. The work in this paper builds on the design of linear time-invariant (LTI)
controllers to provide a new approach for the design of stable controllers for linear time-varying (LTV)
systems. As such, we describe related work on both LTI and LTV systems below.
LTI Systems. In the study of control of LTI systems, the linear quadratic regulator (LQR) has been
considered in detail. In the classical setting where the underlying system is known, the optimal control
law is given by a linear feedback controller obtained by solving Riccati equations (Bertsekas et al.,
1995). Alternatively, the optimal control problem can also be posed via semi-definite programming
(SDP) (Vandenberghe and Boyd, 1996), which is the approach we build on in the current paper.
Recently, there has been growing interest in online control of these linear systems when the
underlying dynamics are unknown. Most of these works study the problem from a regret minimization
perspective, e.g., (Abbasi-Yadkori and Szepesvári, 2011; Dean et al., 2018; Lale et al., 2020a,b).
However, these methods have so far only been applied in LTI systems with time-varying costs and
disturbances. Extensions to LTV dynamics, which are the focus of this paper, are not known.
LTV Systems. As in the case of LTI systems, the optimal control of LTV systems where the
sequence of system parameters is known can be obtained by solving backward Riccati equations (Bertsekas
et al., 1995). However, in the online case, when the sequence of systems is unknown, the design of
controllers is challenging. There are several lines of work in adaptive control and model-predictive
control (MPC) that have been studied to this point. In adaptive control of LTV systems, the
underlying systems are unknown and the results generally assume slow and bounded or fixed systematic
variation of dynamics with bounded disturbances (Middleton and Goodwin, 1988; Marino and
Tomei, 2000; Ouyang et al., 2017). In MPC of LTV systems, a finite horizon of the sequence of systems
(predictions) is known and the system is again assumed to be slowly varying or open-loop stable,
e.g., (Zheng and Morari, 1994; Falcone et al., 2007). Different from prior works, in the current work
we consider the online problem and make no assumptions about how the system varies over time.
As in the LTI setting, the study of regret minimization in LTV systems has recently received
attention. Goel and Hassibi (2020) and Gradu et al. (2020) are most related to the current paper. Goel
and Hassibi (2020) consider the setting where the sequence of systems is known and provide a
regret-optimal controller framework. Gradu et al. (2020) study the adaptive regret of online control
in LTV systems with bounded cost. Note that when the cost is bounded, a finite regret need not
guarantee stability. In contrast, we use a quadratic (unbounded) cost and we can guarantee stability.
Notation. We denote the Euclidean norm of a vector $x$ as $\|x\|$. For a matrix $A$, $\|A\|$ is its spectral
norm, $A^\top$ is its transpose, and $\mathrm{Tr}(A)$ is its trace. $\mathcal{N}(\mu, \Sigma)$ denotes the normal distribution with mean $\mu$
and covariance $\Sigma$. $A \succ B$ and $A \succeq B$ denote that $A - B$ is positive definite and positive
semi-definite, respectively. $A \bullet B$ denotes the element-wise inner product of $A$ and $B$, i.e., $\mathrm{Tr}(A^\top B)$.
2. Model & Background
We consider the following linear time-varying (LTV) system,
$$x_{t+1} = A_t x_t + B_t u_t + w_t, \qquad (1)$$
where $x_t \in \mathbb{R}^d$ is the system state, $u_t \in \mathbb{R}^p$ is the control input and $w_t \in \mathbb{R}^d$ is the disturbance at
time $t$. The system is stochastic, i.e., $w_t \sim \mathcal{N}(0, W)$ for $W \succ 0$. The cost at each time-step is a
quadratic function of the state and control, $x_t^\top Q x_t + u_t^\top R u_t$, where $Q, R \succ 0$.
The decision maker operates in an online setting. That is, at each time-step $t$, the learner
observes the state $x_t$ and system matrix $(A_t, B_t)$ before choosing action $u_t$ and suffering cost
$x_t^\top Q x_t + u_t^\top R u_t$. We assume that the cost matrices $(Q, R)$ are time-invariant and known to the
learner. However, future system matrices $(A_{t+1}, \ldots, A_T)$ and $(B_{t+1}, \ldots, B_T)$ are unknown to the
learner and are chosen by the environment, potentially stochastically or adversarially.
Stability. One of the most central goals for controller design is to ensure stability. In this work,
we focus on the notion of input-to-state stability (ISS) and strive to design controllers that provide
ISS. ISS has been the main notion of stability considered in designing stabilizing controllers both
in linear and nonlinear systems (Hong et al., 2010; Sontag, 2008; Jiang and Wang, 2001). To
formally define ISS, let $\mathcal{K}$ be the set of functions from nonnegative reals to nonnegative reals that
are continuous, strictly increasing, and bijective. Then, ISS is defined as follows.
Definition 1 (ISS) An LTV system with deterministic policy $\mathcal{A}$ is said to be input-to-state stable if
there exist functions $\beta_1 : [0, \infty) \times \mathbb{N} \to [0, \infty)$ and $\beta_2 \in \mathcal{K}$ that satisfy $\beta_1(\cdot, t) \in \mathcal{K}$ for any $t \in \mathbb{N}$,
$\lim_{t \to \infty} \beta_1(a, t) = 0$ for any $a \ge 0$ such that, for any disturbance sequence $\{w_t\}_{t=0}^{\infty}$, any initial
time $t_0$, any initial state $x_{t_0}$, and any $t \ge t_0$, we have
$$\|x_t\| \le \beta_1(\|x_{t_0}\|, t - t_0) + \beta_2\Big(\sup_{t \in \mathbb{N}} \|w_t\|\Big).$$
Cost. In addition to stability, another important objective for controller design is maintaining a
small, near-optimal control cost. Here we adopt the standard linear quadratic (LQ) cost model, i.e.,
$$J_T(\mathcal{A}) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\Big[\sum_{t=1}^{T} x_t^\top Q x_t + u_t^\top R u_t\Big], \qquad (2)$$
where $u_1, \ldots, u_t$ are chosen according to policy $\mathcal{A}$, and the expectation is taken with respect to the
randomness of the noise sequence $w_t$.
In this work, our goal is to ensure both stability and near-optimal cost. It should be noted that
there is a trade-off between these two goals. On the one hand, a stabilizing controller without
cost-awareness may produce arbitrarily large control inputs and induce high cost, which is impractical
to implement. On the other hand, a greedy approach that merely focuses on cost minimization may
lead to instability, as we highlight in Section 3 below.
Though our focus is on LTV systems, our approach builds on the SDP formulation of the optimal
controller for LTI systems in (Vandenberghe and Boyd, 1996).
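Proposition 2 (Vandenberghe and Boyd, 1996) When $A_t = A$, $B_t = B$ and $(A, B)$ is controllable,
the optimal $K = \mathrm{LQR}(A, B, Q, R)$, where $u_t = K x_t$, can be obtained by the following SDP
$$\min_{\Sigma \succeq 0} \begin{bmatrix} Q & 0 \\ 0 & R \end{bmatrix} \bullet \Sigma \quad \text{s.t.} \quad \Sigma_{xx} = \begin{bmatrix} A_t & B_t \end{bmatrix} \Sigma \begin{bmatrix} A_t & B_t \end{bmatrix}^\top + W,$$
which has a unique symmetric solution $\Sigma$ that decomposes into the following blocks
$$\Sigma = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xu} \\ \Sigma_{xu}^\top & \Sigma_{uu} \end{bmatrix},$$
where $\Sigma_{xx} \in \mathbb{R}^{d \times d}$, $\Sigma_{xu} \in \mathbb{R}^{d \times p}$ and $\Sigma_{uu} \in \mathbb{R}^{p \times p}$. Then, the optimal controller is $K = \Sigma_{xu}^\top (\Sigma_{xx})^{-1}$.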
The optimal LQR controller described above both stabilizes the system and achieves the minimum
cost. The current paper makes a step toward understanding if it is possible to extend this formulation
to the case of LTV systems.
3. A Naive Approach
How to achieve stable, cost-optimal control of LTI systems is well known; however, this is not the
case for LTV systems. To illustrate the challenge of online control of LTV systems, we start by
studying the performance of a naive "plug in" approach where, upon receiving $(A_t, B_t)$, an optimal
controller for $(A_t, B_t)$ is computed under the assumption that the system is time-invariant. Due to its
simplicity, this approach has been employed in many contexts, e.g., Li et al. (2019b) for a Markov
decision process setting. In this section we provide an example which shows that such a myopic
approach based on the optimal LTI control described above fails to stabilize the system even in simple
settings where $A_t$ can only switch between two possible choices and $B_t$ is fixed. This highlights that
one cannot naively apply LTI design approaches in LTV systems and expect to maintain stability.
Example 1 Consider a system with $Q = \epsilon I$, $R = I$, $w_t = 0$, and
$$A = \begin{bmatrix} \rho & 0 \\ a & \rho \end{bmatrix}, \qquad A' = \begin{bmatrix} \rho & a \\ 0 & \rho \end{bmatrix},$$
where $0 < \rho < 1$ and $a > \sqrt{2}$. Suppose $A_t$ alternates between $A$ and $A'$ and $B_t = B = I$. Define the
optimal LTI controllers for $A$ and $A'$ as $K := \mathrm{LQR}(A, B, Q, R)$ and $K' := \mathrm{LQR}(A', B, Q, R)$.
To show that the optimal LTI controllers will not stabilize the system, we consider a case where
$\epsilon \to 0$. In this case, one can check that $K, K' \to 0$. Since $A_t$ alternates between $A$ and $A'$, $K_t$ also
alternates between $K$ and $K'$ under the myopic design we are considering. Thus, the system state
follows
$$x_{t+2} = (A + K)(A' + K')\, x_t.$$
Notice that as $\epsilon \to 0$,
$$(A + K)(A' + K') \to AA' = \begin{bmatrix} \rho^2 & a\rho \\ a\rho & a^2 + \rho^2 \end{bmatrix}.$$
Here, $AA'$ is unstable since its largest eigenvalue is greater than
$\frac{1}{2}\mathrm{Tr}(AA') = \rho^2 + \frac{a^2}{2} > 1$. Thus, for small enough $\epsilon$, the naive strategy that uses the LTI controller at each
time-step leads to instability.
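The instability argument in Example 1 can be checked numerically. The sketch below is illustrative only: it works in the limit where the myopic gains vanish, so the two-step map is simply the product of the lower-triangular and upper-triangular matrices, and it borrows the values $\rho = 0.99$, $a = 1.5$ from Section 5.1 as an assumption. The helper names (`matmul`, `matvec`, `max_eig_sym`) are ours, not part of the paper.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(X, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [X[0][0] * v[0] + X[0][1] * v[1],
            X[1][0] * v[0] + X[1][1] * v[1]]

def max_eig_sym(S):
    """Largest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return 0.5 * (tr + math.sqrt(tr * tr - 4.0 * det))

# Assumed illustrative values (taken from the switching experiment).
rho, a = 0.99, 1.5
A = [[rho, 0.0], [a, rho]]        # lower-triangular mode
A_prime = [[rho, a], [0.0, rho]]  # upper-triangular mode

# Two-step map in the vanishing-gain limit: x_{t+2} = A A' x_t.
M = matmul(A, A_prime)
lam_max = max_eig_sym(M)
half_trace = 0.5 * (M[0][0] + M[1][1])  # equals rho^2 + a^2 / 2

# Roll the limiting dynamics forward and record the state norm.
x = [1.0, 1.0]
norms = []
for _ in range(5):
    x = matvec(M, x)
    norms.append(math.hypot(x[0], x[1]))

print(f"largest eigenvalue of AA': {lam_max:.3f} (half trace {half_trace:.3f})")
print("state norms every two steps:", [round(n, 2) for n in norms])
```

Each mode on its own is stable (both have spectral radius $\rho < 1$), yet the product has an eigenvalue above one, which is why switching between individually optimal LTI designs can diverge.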
4. Main Result
The previous section highlights that a naive application of LTI control cannot guarantee stability for
LTV systems. We now propose a new approach, COvariance Constrained Online LQ (COCO-LQ)
control (Section 4.1). Our main technical result shows that COCO-LQ provably guarantees stability
in LTV systems (Section 4.2) when the SDP is feasible. In Section 4.3, we discuss how to handle
the situation when the SDP is infeasible, and Section 4.4 discusses the effect of model estimation error.
Detailed proofs can be found in the Appendix of our online report Qu et al. (2021).
4.1. COvariance Constrained Online LQ (COCO-LQ)
The naive approach discussed in Section 3 seeks to solve the LTI problem at every time step, which
is equivalent to solving the SDP in Proposition 2 for every $(A_t, B_t)$. The reason this method fails
is that it only considers cost minimization without explicitly considering stability. The main idea of
COCO-LQ is to enforce stability via a state covariance constraint embedded into the SDP framework.
The proposed algorithm is stated formally in Algorithm 1. COCO-LQ solves an SDP (3) at
each time step that is similar to that in Proposition 2. The crucial difference is the new constraint
(3d), which involves parameter $\alpha$. Plugging (3b) into constraint (3d) yields the following:
$$\Sigma_{xx} \preceq \frac{1}{1 - \alpha} W.$$
This highlights that constraint (3d) can be interpreted as an upper bound on the state covariance
matrix $\Sigma_{xx}$. When $\alpha = 0$, the controller essentially cancels out the dynamics, without taking into
account the cost of doing so. This ensures stability but can lead to large cost. At another extreme,
when $\alpha \to 1$, the SDP solved at each time step is the same as for the LTI setting, and so COCO-LQ
matches the naive approach in Section 3. Thus, $\alpha$ trades off between stability and cost. In the
following section, we show that this novel state covariance constraint promotes sequential strong
stability (Cohen et al., 2018), which in turn guarantees ISS with a proper choice of $\alpha$.
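For completeness, the covariance bound obtained by plugging (3b) into (3d) can be written out in a few lines. The shorthand $\Gamma_t$ is introduced here only for this derivation and does not appear elsewhere in the paper.

```latex
% Shorthand (ours): \Gamma_t := [A_t \; B_t] \, \Sigma \, [A_t \; B_t]^\top \succeq 0.
\begin{align*}
\Sigma_{xx} &= \Gamma_t + W && \text{by (3b)} \\
            &\preceq \alpha \Sigma_{xx} + W && \text{by (3d)} \\
\Longrightarrow \quad (1 - \alpha)\, \Sigma_{xx} &\preceq W \\
\Longrightarrow \quad \Sigma_{xx} &\preceq \tfrac{1}{1 - \alpha}\, W && \text{since } 0 \le \alpha < 1 .
\end{align*}
% In the other direction, \Gamma_t \succeq 0 and (3b) give \Sigma_{xx} \succeq W,
% so any feasible \Sigma_{xx} is sandwiched: W \preceq \Sigma_{xx} \preceq W / (1 - \alpha).
```

The sandwich makes the role of $\alpha$ concrete: at $\alpha = 0$ the state covariance is pinned to $W$, and the upper bound loosens without limit as $\alpha$ approaches one.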
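Algorithm 1: COCO-LQ: COvariance Constrained Online LQ
Parameters: $\alpha \in [0, 1)$
Input: $Q, R, W \succ 0$
for $t = 1, 2, \ldots$ do
    Receive state $x_t$ and system parameters $A_t, B_t$
    Compute policy: Let $\Sigma_t \in \mathbb{R}^{n \times n}$ be an optimal solution to the SDP program:
        minimize    $\begin{bmatrix} Q & 0 \\ 0 & R \end{bmatrix} \bullet \Sigma$    (3a)
        subject to  $\Sigma_{xx} = \begin{bmatrix} A_t & B_t \end{bmatrix} \Sigma \begin{bmatrix} A_t & B_t \end{bmatrix}^\top + W$    (3b)
                    $\Sigma \succeq 0$    (3c)
                    $\begin{bmatrix} A_t & B_t \end{bmatrix} \Sigma \begin{bmatrix} A_t & B_t \end{bmatrix}^\top \preceq \alpha \Sigma_{xx}$    (3d)
    and set $K_t = \Sigma_{xu}^\top \Sigma_{xx}^{-1}$
    Play $u_t = K_t x_t$
    Update $x_{t+1} = A_t x_t + B_t u_t + w_t$, $w_t \sim \mathcal{N}(0, W)$
end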
4.2. Stability
We now state our main technical result, which provides a formal stability guarantee for COCO-LQ.
Theorem 3 Let $0 \le \alpha < 1/2$, and suppose (3) is feasible for all $t$. Then the resulting dynamical
system satisfies ISS in the sense that for any disturbance sequence $\{w_t\}_{t=0}^{\infty}$ and for any $t \ge t_0$,
$$\|x_t\| \le \rho^{t - t_0} \|x_{t_0}\| + \frac{\kappa \rho}{1 - \rho} \sup_{t_0 \le k < t} \|w_k\|$$
for $\rho = \sqrt{\frac{\alpha}{1 - \alpha}} \in [0, 1)$ and $\kappa = \sqrt{\frac{\kappa_W}{1 - \alpha}}$, where $\kappa_W = \|W\| \|W^{-1}\|$ is the condition number of $W$.
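To get a feel for how the constants in Theorem 3 scale with $\alpha$, the following sketch evaluates them and the resulting ISS bound. It assumes the reading $\rho = \sqrt{\alpha/(1-\alpha)}$ and $\kappa = \sqrt{\kappa_W/(1-\alpha)}$, and that the caller supplies the spectral norms of $W$ and $W^{-1}$ directly; the function names are hypothetical.

```python
import math

def coco_lq_constants(alpha, w_norm, w_inv_norm):
    """Return (rho, kappa) for a given alpha, assuming
    rho = sqrt(alpha / (1 - alpha)) and kappa = sqrt(kappa_W / (1 - alpha)),
    with kappa_W = ||W|| * ||W^{-1}|| the condition number of W."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("the stability guarantee is stated for 0 <= alpha < 1/2")
    kappa_w = w_norm * w_inv_norm
    rho = math.sqrt(alpha / (1.0 - alpha))
    kappa = math.sqrt(kappa_w / (1.0 - alpha))
    return rho, kappa

def iss_bound(alpha, w_norm, w_inv_norm, x0_norm, w_sup, steps):
    """Evaluate rho^steps * ||x_{t0}|| + kappa * rho / (1 - rho) * sup ||w_k||."""
    rho, kappa = coco_lq_constants(alpha, w_norm, w_inv_norm)
    return rho ** steps * x0_norm + kappa * rho / (1.0 - rho) * w_sup

if __name__ == "__main__":
    # Example: W = 0.01 * I, so ||W|| = 0.01 and ||W^{-1}|| = 100.
    for alpha in (0.0, 0.1, 0.2, 0.4, 0.49):
        rho, kappa = coco_lq_constants(alpha, 0.01, 100.0)
        print(f"alpha={alpha:.2f}  rho={rho:.3f}  kappa={kappa:.3f}")
```

The contraction factor $\rho$ reaches one exactly at $\alpha = 1/2$, which is where the sufficient condition of the theorem stops applying.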
The key intuition underlying this result is that the additional state covariance constraint (3d)
implicitly enforces sequential strong stability (Cohen et al., 2018), which in turn ensures ISS. More
formally, sequential strong stability is defined as follows.
Definition 4 (Sequential Strong Stability) A sequence of policies $K_1, K_2, \ldots$, such that $u_t = K_t x_t$, is
$(\kappa, \gamma, \rho)$-sequential strongly stable (for $\kappa > 0$, $0 < \gamma \le 1$ and $0 \le \rho < 1$) if there exist matrices
$H_1, H_2, \ldots$ and $L_1, L_2, \ldots$ such that $A_t + B_t K_t = H_t L_t H_t^{-1}$ for all $t$, with the following properties:
(a) $\|L_t\| \le 1 - \gamma$; (b) $\|H_t\| \le \beta_1$ and $\|H_t^{-1}\| \le 1/\beta_2$ with $\kappa = \beta_1/\beta_2$; (c) $\|H_{t+1}^{-1} H_t\| \le \frac{\rho}{1 - \gamma}$.
The following lemma formalizes the connection between (3d) and sequential strong stability.
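Lemma 5 Under the conditions in Theorem 3, the policies designed by COCO-LQ are $(\kappa, \gamma, \rho)$-sequential
strongly stable for $\kappa = \sqrt{\frac{\kappa_W}{1 - \alpha}}$, $\gamma = 1 - \sqrt{\alpha}$, $\rho = \sqrt{\frac{\alpha}{1 - \alpha}}$, where $\kappa_W = \|W\| \|W^{-1}\|$.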
With Lemma 5, proving the result in Theorem 3 only requires showing that sequential strong
stability implies ISS. The complete proofs of Lemma 5 and Theorem 3 are given in Appendix A of our
online report Qu et al. (2021). A critical assumption in Theorem 3 is that the SDP given in (3) is
feasible for $0 \le \alpha < 1/2$. The following result shows that when $B_t$ is full row rank, the problem is
always feasible. The proof of Lemma 6 is postponed to Appendix B in Qu et al. (2021).
Algorithm 2: COCO-LQ-Prediction: COvariance Constrained Online LQ with Predictions
Parameters: $\alpha \in [0, 1)$
Input: $Q, R, W \succ 0$
for $t = 1, 2, \ldots$ do
    if $t \equiv 1 \pmod{H}$ then
        Receive state $x_t$ and system parameters $(A_t, B_t), \ldots, (A_{t+H-1}, B_{t+H-1})$
        Compute policy: Let $\Sigma_t \in \mathbb{R}^{n \times n}$ be a solution to the constrained SDP in (3) with
            $(R, A_t, B_t)$ replaced by $(\tilde{R}, \tilde{A}_t, \tilde{B}_t)$, where
            $\tilde{R} = \mathrm{diag}(R, \ldots, R)$ ($H$ repeating blocks),
            $\tilde{A}_t := A_{t+H-1} \cdots A_t$,
            $\tilde{B}_t := [B_{t+H-1},\ A_{t+H-1} B_{t+H-2},\ \cdots,\ A_{t+H-1} \cdots A_{t+1} B_t]$
        Set $K_t = \Sigma_{xu}^\top \Sigma_{xx}^{-1}$ and $[u_{t+H-1}^\top, \ldots, u_t^\top]^\top = K_t x_t$
    end
    Play: implement the planned control action $u_t$
    Update $x_{t+1} = A_t x_t + B_t u_t + w_t$, $w_t \sim \mathcal{N}(0, W)$
end
Lemma 6 When $B_t$ is full row rank, the SDP (3) is always feasible.
Note that having $B_t$ full row rank is a sufficient but not necessary condition for feasibility of (3) of
COCO-LQ. When $B_t$ is not full row rank, the feasibility assumption may still hold, and therefore
our assumption is weaker than the invertibility assumption used in the literature, e.g., Lai (1986).
More broadly, in Theorem 3, $\alpha < 0.5$ is a sufficient condition for stability. For $\alpha \ge 0.5$, stability
may still hold for some problem instances $(A_t, B_t)$, as will be shown in the simulations in Section 5.
How to provide a more refined instance-dependent threshold on $\alpha$ is an interesting future direction.
4.3. Infeasibility and the Role of Predictions
We now turn our attention to the case when the SDP given in (3) is infeasible. In this case it is
necessary for the controller to use additional information in order to stabilize the system. In particular,
we provide an example that shows the necessity of predictions when $B_t$ is not full row rank
in Appendix C of our online report Qu et al. (2021). This example shows that when $B_t$ is not full
row rank, for any (deterministic) online control algorithm that has causal access to system matrices,
there exists a future sequence of $(A_t, B_t)$ in which the algorithm cannot stabilize the system. In this
section, we show that using $(A_t, B_t)$ together with short-term predictions of future system matrices
is enough to stabilize the system under standard controllability assumptions. Specifically, we extend
COCO-LQ to include future $H$ steps of predictions in Algorithm 2. The key idea is to rewrite the
dynamics as
$$x_{t+H} = \tilde{A}_t x_t + \tilde{B}_t \bar{u}_t + [I,\ A_{t+H-1},\ \cdots,\ A_{t+H-1} \cdots A_{t+1}]\, \bar{w}_t \qquad (4)$$
where we define $\tilde{A}_t := A_{t+H-1} \cdots A_t$, $\tilde{B}_t := [B_{t+H-1},\ A_{t+H-1} B_{t+H-2},\ \cdots,\ A_{t+H-1} \cdots A_{t+1} B_t]$,
$\bar{u}_t := [u_{t+H-1}^\top, u_{t+H-2}^\top, \ldots, u_t^\top]^\top$ and $\bar{w}_t := [w_{t+H-1}^\top, w_{t+H-2}^\top, \ldots, w_t^\top]^\top$. When $H$ is long
enough such that $\tilde{B}_t$ is full row rank, we can use Algorithm 1 on $\tilde{A}_t$ and $\tilde{B}_t$ and avoid the
infeasibility issue, and our stability guarantee is provided below. The proof of Theorem 7 can be
found in Appendix D of our online report Qu et al. (2021).
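The lifted matrices used with predictions are mechanical to build from the predicted $(A_s, B_s)$ pairs. The sketch below is an illustration in plain Python (nested lists, no external libraries) of one way to form the $H$-step state matrix and the stacked input matrix, followed by a noise-free check that the lifted one-shot update agrees with stepping the original dynamics $H$ times. The function names `lift`, `matmul`, `identity`, and `hstack` are ours.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def hstack(blocks):
    """Concatenate matrices with equal row counts side by side."""
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]

def lift(As, Bs):
    """Given As = [A_t, ..., A_{t+H-1}] and Bs = [B_t, ..., B_{t+H-1}],
    return (A_tilde, B_tilde) with
      A_tilde = A_{t+H-1} ... A_t
      B_tilde = [B_{t+H-1}, A_{t+H-1} B_{t+H-2}, ..., A_{t+H-1} ... A_{t+1} B_t].
    """
    H, d = len(As), len(As[0])
    prefix = identity(d)  # running product A_{t+H-1} ... A_{t+j+1}
    blocks = []
    for j in range(H - 1, -1, -1):
        blocks.append(matmul(prefix, Bs[j]))
        prefix = matmul(prefix, As[j])
    return prefix, hstack(blocks)

# Noise-free consistency check with H = 2, d = 2, p = 1 (made-up numbers).
As = [[[1.0, 1.0], [0.0, 1.0]], [[0.5, 0.0], [1.0, 2.0]]]
Bs = [[[0.0], [1.0]], [[1.0], [0.0]]]
x0 = [[1.0], [2.0]]
us = [[[3.0]], [[-1.0]]]  # u_t, u_{t+1}

# Step the original dynamics twice.
x = x0
for A, B, u in zip(As, Bs, us):
    Ax, Bu = matmul(A, x), matmul(B, u)
    x = [[Ax[i][0] + Bu[i][0]] for i in range(2)]
stepwise = x

# One lifted step; the stacked input lists the latest control first.
A_til, B_til = lift(As, Bs)
u_bar = [us[1][0], us[0][0]]
Ax, Bu = matmul(A_til, x0), matmul(B_til, u_bar)
lifted = [[Ax[i][0] + Bu[i][0]] for i in range(2)]
print(stepwise, lifted)
```

In this toy instance each individual input matrix has rank one, while the stacked matrix over two steps has full row rank, which is the situation where predictions restore feasibility.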
Theorem 7 Suppose for each $t$, matrix $\tilde{B}_t = [B_{t+H-1},\ A_{t+H-1} B_{t+H-2},\ \cdots,\ A_{t+H-1} \cdots A_{t+1} B_t]$
satisfies $\tilde{B}_t \tilde{B}_t^\top \succeq \sigma I$ for some $\sigma > 0$, and $\|A_t\| \le a$, $\|B_t\| \le b$ for some $a, b > 0$. Then, the SDP
in Algorithm 2 is always feasible. Further, when $\alpha < 1/2$, the closed-loop system is ISS: for any $t$,
$$\|x_t\| \le \kappa_A'\, \rho^{\frac{t}{H} - 1} \|x_1\| + \kappa_A \kappa_A' \kappa \max\Big(1, \frac{\rho}{1 - \rho}\Big) \sup_{1 \le s < t} \|w_s\|,$$
where, the same as in Theorem 3, $\rho = \sqrt{\frac{\alpha}{1 - \alpha}} \in [0, 1)$ and $\kappa = \sqrt{\frac{\kappa_W}{1 - \alpha}}$ with $\kappa_W = \|W\| \|W^{-1}\|$ being
the condition number of $W$; further, $\kappa_A = 1 + a + \ldots + a^{H-1}$, and
$\kappa_A' = a^{H-1} + b^2 (1 + a + \cdots + a^{H-1})^2 \kappa_R \frac{\kappa + a^H}{\sigma}$ with $\kappa_R$ being the condition number of $R$.
4.4. Estimation Error
In both Algorithm 1 and Algorithm 2, exact knowledge of the state-transition matrices $(A_t, B_t)$ or
the extended state-transition matrices $(\tilde{A}_t, \tilde{B}_t)$ is needed when deriving the control actions. In
this section, we show that COCO-LQ can still obtain a stabilizing controller in the case where only
approximations are known, if the estimation error is controlled. Our main result is the following.
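Theorem 8 Let $(\hat{A}_t, \hat{B}_t)$ be an estimate of $(A_t, B_t)$. Given $\alpha \in [0, \frac{1}{2})$, let $\rho = \sqrt{\frac{\alpha}{1 - \alpha}}$, $\kappa = \sqrt{\frac{\|W\| \|W^{-1}\|}{1 - \alpha}}$
and $\gamma = 1 - \sqrt{\alpha}$. Let $K_1, K_2, \ldots$ be the policies designed by COCO-LQ for $(\hat{A}_t, \hat{B}_t)$
with parameter $\alpha$. Suppose the estimation error satisfies
$$\max\{\|\hat{A}_t - A_t\|_2,\ \|\hat{B}_t - B_t\|_2\} \le \frac{\delta \gamma}{\kappa (1 + K_{\max})} \qquad (5)$$
where $\delta$ can be any number in $\Big(0, \frac{\sqrt{1 - \alpha} - \sqrt{\alpha}}{1 - \sqrt{\alpha}}\Big)$, and $K_{\max}$ is any uniform upper bound on $\|K_t\|$.
Then, the policies $K_t$ are ISS when applied to the system $(A_t, B_t)$,
$$\|x_t\| \le (\rho')^{t - t_0} \|x_{t_0}\| + \frac{\kappa \rho'}{1 - \rho'} \sup_{t_0 \le k < t} \|w_k\|,$$
where $\rho' = \frac{1 - (1 - \delta)\gamma}{1 - \gamma} \rho \in (0, 1)$. Finally, when $\|\hat{A}_t\| \le \bar{\sigma}_A$, $\|\hat{B}_t\| \le \bar{\sigma}_B$ and $\hat{B}_t \hat{B}_t^\top \succeq \sigma_B^2 I$, one uniform
upper bound for $\|K_t\|$ is $K_{\max} = \frac{\kappa_R \bar{\sigma}_B}{\sigma_B^2} (\kappa (1 - \gamma) + \bar{\sigma}_A)$ with $\kappa_R = \|R\| \|R^{-1}\|$.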
A proof of Theorem 8 is provided in Appendix E of our online report Qu et al. (2021). This result
highlights the tradeoff between the estimation error and the algorithm performance. If we choose
a small $\alpha$, the algorithm can tolerate a larger estimation error (i.e., a larger right-hand side of (5) can
be obtained) but may incur high control cost due to the tight state covariance constraint. If we
choose a larger $\alpha$, the algorithm tolerates a smaller estimation error while its performance improves
due to the less strict state covariance constraint.
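The tradeoff described above can be made concrete by evaluating the right-hand side of (5) for a few values of $\alpha$. The sketch below assumes the reading $\gamma = 1 - \sqrt{\alpha}$, $\rho = \sqrt{\alpha/(1-\alpha)}$, $\kappa = \sqrt{\kappa_W/(1-\alpha)}$, an admissible range for $\delta$ of $(0, (\sqrt{1-\alpha} - \sqrt{\alpha})/(1 - \sqrt{\alpha}))$, and treats the gain bound `k_max` as a user-supplied constant. All function names are hypothetical.

```python
import math

def max_delta(alpha):
    """Upper end of the admissible range for delta (assumed reading)."""
    return (math.sqrt(1.0 - alpha) - math.sqrt(alpha)) / (1.0 - math.sqrt(alpha))

def error_tolerance(alpha, kappa_w, k_max, delta):
    """Right-hand side of (5): delta * gamma / (kappa * (1 + K_max))."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("this sketch expects 0 < alpha < 1/2")
    if not 0.0 < delta < max_delta(alpha):
        raise ValueError("delta outside the admissible range for this alpha")
    gamma = 1.0 - math.sqrt(alpha)
    kappa = math.sqrt(kappa_w / (1.0 - alpha))
    return delta * gamma / (kappa * (1.0 + k_max))

def perturbed_rho(alpha, delta):
    """Contraction factor rho' = (1 - (1 - delta) * gamma) / (1 - gamma) * rho."""
    gamma = 1.0 - math.sqrt(alpha)
    rho = math.sqrt(alpha / (1.0 - alpha))
    return (1.0 - (1.0 - delta) * gamma) / (1.0 - gamma) * rho

if __name__ == "__main__":
    # Use half of the admissible delta range at each alpha; kappa_W = 1, K_max = 1.
    for alpha in (0.05, 0.1, 0.2, 0.3, 0.4):
        delta = 0.5 * max_delta(alpha)
        tol = error_tolerance(alpha, 1.0, 1.0, delta)
        print(f"alpha={alpha:.2f}  tolerance={tol:.4f}  rho'={perturbed_rho(alpha, delta):.3f}")
```

With these illustrative settings the tolerated error shrinks as $\alpha$ grows, while the perturbed contraction factor stays below one throughout the admissible range of $\delta$.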
5. Experiments
The results in the previous section focus on the stability of the COCO-LQ approach. Here, we use
experimental results to highlight that COCO-LQ also performs near-optimally in terms of cost while also
stabilizing systems that the naive approach based on LTI control cannot. In Section 5.1, we test
our method on random, synthetic linear time-varying systems, and in Section 5.2 we demonstrate
the algorithm performance in real-world power system frequency control settings. Due to space
limits, additional experimental results on nonlinear systems via local linear approximation can be found in
Appendix F of our online report Qu et al. (2021).
5.1. Synthetic Time-Varying Systems
We first consider the control of switching and time-variant systems. The cost function is set as
$Q = 0.2 I$, $R = I$, and the system is subject to Gaussian disturbance $w_t \sim \mathcal{N}(0, 0.1^2)$. We average the
simulation results over 5 runs and visualize the mean performance and standard deviation.
a. Switching systems. We consider a switching system following Example 1 in Section 3, where
$A_t$ alternates between $A = [[0.99, 1.5], [0, 0.99]]$ and $A' = [[0.99, 0], [1.5, 0.99]]$, and $B_t = I$.
b. Time-variant systems. We consider a system $A_t = [[0.99,\ |\sin(\frac{\pi t}{2})|\, e^{t/60}],\ [|\cos(\frac{\pi t}{2})|\, e^{t/60},\ 0.99]]$
that is continually changing over time, and $B_t = I$.
Figure 1: Performance comparison of COCO-LQ and LQ on synthetic time-varying systems. The
left two figures show the state evolution, and the right two figures show the normalized cost
(cost of COCO-LQ divided by cost of the offline optimal) under different $\alpha$.
As we can see in Figure 1, COCO-LQ is able to quickly and effectively stabilize the system under
various time-varying scenarios, which validates our theoretical findings. As $\alpha$ increases, the incurred
cost of COCO-LQ first decreases and then increases (explosion of state), highlighting that $\alpha$
can explicitly control the tradeoff between cost and stability. With proper selection of $\alpha$, COCO-LQ
achieves near-optimal cost (within 30% of the offline optimal for both systems a and b).
5.2. Frequency Control with Renewable Generation
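We now consider a power system frequency control problem on the standard IEEE WECC 3-machine
9-bus system (Figure 2(a)), which is widely used in frequency stability studies.
The state space model of power system frequency dynamics follows Hidalgo-Gonzalez et al. (2019),
$$\underbrace{\begin{bmatrix} \dot{\theta} \\ \dot{\omega} \end{bmatrix}}_{\dot{x}} = \underbrace{\begin{bmatrix} 0 & I \\ -M_t^{-1} L & -M_t^{-1} D \end{bmatrix}}_{A_t} \begin{bmatrix} \theta \\ \omega \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ M_t^{-1} \end{bmatrix}}_{B_t} \underbrace{p_{in}}_{u_t} \qquad (6)$$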
where the state variable is defined as the stacked vector of the voltage angle $\theta$ and frequency $\omega$.
$M_t = \mathrm{diag}(m_{t,i})$ is the inertia matrix, where $m_{t,i}$ represents the equivalent rotational inertia at
bus $i$ and time $t$. $M_t$ is time-varying and depends on the mix of online generators, since only
thermal generators provide rotational inertia and renewable generation does not (Ulbig et al., 2014).
$D = \mathrm{diag}(d_i)$ is the damping matrix, where $d_i$ is the generator damping coefficient. $L$ is the
network susceptance matrix. The control variable $p_{in}$ corresponds to the electric power generation.
Figure 2: (a) IEEE WECC 3-machine 9-bus system schematic, where the generators at buses 1, 5, and 9 are a
mixture of thermal and renewable generation. (b) Frequency dynamics under the offline optimal,
baseline H-horizon control, and COCO-LQ. The dotted grey lines ($\pm 0.05$ Hz) are
the safety margin of power system frequency variation.
We assume the system is changing between two states: a high renewable generation scenario
where $m_{t,i} = 2$ (i.e., 80 percent renewable with zero inertia and 20 percent thermal generation
with 10s inertia), and a low renewable generation scenario where $m_{t,i} = 8$ (i.e., 20 percent renewable
and 80 percent thermal generation), with additional random fluctuations between $[0, 0.2]$. This
setup represents the real-world situation where we have high solar output during the daytime and
low output in the morning/evening, with intra-day variations due to clouds and weather changes.
Notice that $B_t$ is not full rank; thus we need to leverage predictions, i.e., $A_{t+1}$ and $B_{t+1}$. For a fair
comparison, we compete against the $H$-horizon optimal control in Bertsekas et al. (1995), which
is the extension of the naive LTI controller to use $H$-step predictions. In both cases, we assume the
prediction is accurate and use the exact values of $A_{t+1}$ and $B_{t+1}$ for computing control actions.
Figure 2(b) visualizes the power system frequency dynamics under three controllers: the offline
optimal control, the baseline $H$-horizon optimal controller and the proposed COCO-LQ-Prediction
method. We ideally desire a controller that is able to maintain the frequency variation within
$\pm 0.05$ Hz and eventually stabilize the system. It can be observed that our algorithm succeeds at
maintaining frequency stability under random, time-varying renewable generation. Furthermore,
the performance of COCO-LQ is very close to the offline optimal, while the system frequency
diverges under the baseline $H$-horizon optimal control.
6. Conclusion
In this paper, we study the stability of LTV systems. Our results demonstrate the challenge of
ensuring stability for LTV systems compared to LTI systems. Motivated by this challenge, we
propose a COCO-LQ/COCO-LQ-Prediction policy that can guarantee stability for LTV systems
under certain assumptions. There are many interesting open questions that remain. For example,
the bound $\alpha < 1/2$ in Theorem 3 is a sufficient condition, and studying how to relax the bound
and how to derive instance-dependent bounds is an interesting future question. Another important
direction is to analyze the performance (e.g., the regret) of the proposed approach in order to quantify
the tradeoff between stability and performance.
References
Yasin Abbasi-Yadkori and Csaba Szepesv
́
ari. Regret bounds for the adaptive control of linear
quadratic systems. In
Proceedings of the 24th Annual Conference on Learning Theory
, pages
1–26, 2011.
Francesco Amato, Marco Ariola, and Carlo Cosentino. Finite-time control of discrete-time linear
systems: analysis and design conditions.
Automatica
, 46(5):919–924, 2010.
Dimitri P Bertsekas. Dynamic programming and optimal control, volume 1. Athena Scientific,
Belmont, MA, 1995.
Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar.
Online linear quadratic control. arXiv preprint arXiv:1806.07104, 2018.
Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for
robust adaptive control of the linear quadratic regulator. In Advances in Neural Information
Processing Systems, pages 4188–4197, 2018.
Paolo Falcone, Manuela Tufo, Francesco Borrelli, Jahan Asgari, and H Eric Tseng. A linear time
varying model predictive control approach to the integrated vehicle dynamics control problem in
autonomous systems. In 2007 46th IEEE Conference on Decision and Control, pages 2980–2985.
IEEE, 2007.
Paolo Falcone, Francesco Borrelli, H Eric Tseng, Jahan Asgari, and Davor Hrovat. Linear
time-varying model predictive control and its application to active steering systems: Stability
analysis and experimental validation. International Journal of Robust and Nonlinear Control:
IFAC-Affiliated Journal, 18(8):862–875, 2008.
Germain Garcia, Sophie Tarbouriech, and Jacques Bernussou. Finite-time stabilization of linear
time-varying continuous systems. IEEE Transactions on Automatic Control, 54(2):364–369,
2009.
Gautam Goel and Babak Hassibi. Regret-optimal control in dynamic environments. arXiv preprint
arXiv:2010.10473, 2020.
Paula Gradu, Elad Hazan, and Edgar Minasyan. Adaptive regret for control of time-varying
dynamics. arXiv preprint arXiv:2007.04393, 2020.
Patricia Hidalgo-Gonzalez, Rodrigo Henriquez-Auba, Duncan S Callaway, and Claire J Tomlin.
Frequency regulation using data-driven controllers in power grids with variable inertia due to
renewable energy. In 2019 IEEE Power & Energy Society General Meeting (PESGM), pages
1–5. IEEE, 2019.
Yiguang Hong, Zhong-Ping Jiang, and Gang Feng. Finite-time input-to-state stability and
applications to finite-time control design. SIAM Journal on Control and Optimization,
48(7):4395–4418, 2010.
Zhong-Ping Jiang and Yuan Wang. Input-to-state stability for discrete-time nonlinear systems.
Automatica, 37(6):857–869, 2001.
Hassan K Khalil. Nonlinear systems, volume 3. 2002.
T. L. Lai. Asymptotically efficient adaptive control in stochastic regression models. Advances in
Applied Mathematics, 7(1):23–45, 1986.
Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, and Anima Anandkumar. Adaptive
control and regret minimization in linear quadratic Gaussian (LQG) setting. arXiv preprint
arXiv:2003.05999, 2020a.
Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, and Anima Anandkumar. Explore more and
improve regret in linear quadratic regulators. arXiv preprint arXiv:2007.12291, 2020b.
Xiaodi Li, Xueyan Yang, and Shiji Song. Lyapunov conditions for finite-time stability of
time-varying time-delay systems. Automatica, 103:135–140, 2019a.
Yingying Li, Aoxiao Zhong, Guannan Qu, and Na Li. Online Markov decision processes with
time-varying transition probabilities and rewards. In ICML Real-world Sequential Decision Making
workshop, 2019b.
Riccardo Marino and Patrizio Tomei. Robust adaptive regulation of linear time-varying systems.
IEEE Transactions on Automatic Control, 45(7):1301–1311, 2000.
Richard H Middleton and Graham C Goodwin. Adaptive control of time-varying linear systems.
IEEE Transactions on Automatic Control, 33(2):150–155, 1988.
Yi Ouyang, Mukul Gagrani, and Rahul Jain. Learning-based control of unknown linear systems
with Thompson sampling. arXiv preprint arXiv:1709.04047, 2017.
Guannan Qu, Yuanyuan Shi, Sahin Lale, Anima Anandkumar, and Adam Wierman. Stable online
control of linear time-varying systems. arXiv preprint arXiv:2104.14134, 2021.
Eduardo D Sontag. Input to state stability: Basic concepts and results. In Nonlinear and optimal
control theory, pages 163–220. Springer, 2008.
Sophie Tarbouriech, Germain Garcia, and Adolf H Glattfelder. Advanced Strategies in Control
Systems with Input and Output Constraints, volume 346. Springer Science & Business Media,
2006.
Emanuel Todorov and Weiwei Li. A generalized iterative LQG method for locally-optimal
feedback control of constrained nonlinear stochastic systems. In Proceedings of the 2005 American
Control Conference, pages 300–306. IEEE, 2005.
Andreas Ulbig, Theodor S Borsche, and Göran Andersson. Impact of low rotational inertia on
power system stability and operation. IFAC Proceedings Volumes, 47(3):7290–7297, 2014.
Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38(1):49–95,
1996.
Alex Zheng and Manfred Morari. Robust control of linear time-varying systems with constraints. In
Proceedings of 1994 American Control Conference-ACC’94, volume 3, pages 2416–2420. IEEE,
1994.