An Integrated Approach for Failure Mitigation & Localization in Power Systems
Chen Liang∗, Linqi Guo∗, Alessandro Zocca†, Shuyue Yu∗, Steven H. Low∗ and Adam Wierman∗
∗Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
{cliang2, lguo, syu5, slow, adamw}@caltech.edu
†Department of Mathematics, Vrije Universiteit, Amsterdam, Netherlands
a.zocca@vu.nl
Abstract—The transmission grid is often comprised of several
control areas that are connected by multiple tie lines in a mesh
structure for reliability. It is also well-known that line failures can
propagate non-locally and redundancy can exacerbate cascading.
In this paper, we propose an integrated approach to grid
reliability that (i) judiciously switches off a small number of tie
lines so that the control areas are connected in a tree structure;
and (ii) leverages a unified frequency control paradigm to provide
congestion management in real time. Even though the proposed
topology reduces redundancy, the integration of tree structure at
regional level and real-time congestion management can provide
stronger guarantees on failure localization and mitigation. We
illustrate our approach on the IEEE 39-bus network and evaluate
its performance on the IEEE 118-bus, 179-bus, 200-bus and
240-bus networks with various network congestion conditions.
Simulations show that, compared with the traditional approach,
our approach not only prevents load shedding in more failure
scenarios, but also incurs smaller amounts of load loss in
scenarios where load shedding is inevitable. Moreover, generators
under our approach adjust their operations more actively and
efficiently in a local manner.
Index Terms—cascading failure, failure mitigation, frequency
control, power system reliability, topology design
I. INTRODUCTION
Reliability is critical in power systems. Tremendous efforts
from both the industry and academia have been made to
analyze cascading failures. Current industry practice is typically
simulation-based, where contingencies are studied using
extensive simulations [1]. Such approaches are often limited
by computational power.
To provide tractable analysis, pure topological models have
been proposed, where failures propagate locally to neighboring
components with high probability [2]–[4]. However, these
epidemic models are not realistic as non-local failure propagation
is observed in both real-world and simulated cascade
data [5]. More realistic models use linearized DC power flow
to characterize power redistribution after transmission line
failures [6]–[9]. These DC models indeed exhibit both local
and non-local propagation of failures. See [10] for an extensive
list of cascading failure models. It is observed in [9] that
successive failures can be quite far away from initial failures
under the DC model, which aligns with real cascade data.
This work has been supported by Resnick Fellowship, Linde Institute
Research Award, NWO Rubicon grant 680.50.1529, NSF through awards ECCS
1619352, CNS 1545096, CCF 1637598, ECCS 1739355, CNS 1518941, CPS
154471, ARPA-E through award de-ar0000699 (NODES), and DTRA through
award HDTRA 1-15-1-0003.
Such non-local failure propagation comes from the interconnectivity
of the power network. The transmission grid is
usually comprised of several control areas, which are operated
relatively independently with prescribed power exchanges
determined by economic dispatch and maintained by automatic
generation control (AGC) [11]. Traditionally, control areas are
interconnected by multiple tie lines in a mesh structure to
provide multiple alternative routes for power, as redundancy
often improves reliability [12]. Surprisingly, it is shown in [13]
that tree partitioning of the grid insulates the impact of failures
and precisely captures the boundaries of failure propagation.
This means that, while providing redundancy, multiple tie lines
also exacerbate non-local failure propagation.
Failure models based on DC power flow often assume power
injections remain unchanged after a failure as long as the
network remains connected. This is a reasonable assumption
under the traditional frequency control dynamics that operate
at a fast timescale (see Section II), although this connection
has not been widely mentioned in the literature. An important
feature of our work is that we explicitly model the interaction
between frequency control dynamics at a fast timescale and the
DC power flow at a slow timescale in the cascading process,
and leverage this interaction for failure mitigation.
In this paper, we propose an integrated approach to grid
reliability consisting of two main components: topology design
and real-time response. For topology design, we propose to
judiciously switch off a small number of tie lines so that the
control areas are interconnected in a tree structure. In real
time, we leverage the recently proposed unified frequency
control [14] to provide congestion management at a fast timescale.
Even though the proposed approach reduces redundancy in
network topology, the integrated design of tree-connected
control areas and real-time congestion management provides
stronger guarantees on failure localization and mitigation,
leading to a higher overall reliability. This new framework
builds on our earlier work [13], [15]. Here, we extend the
approach and evaluate its performance in IEEE test networks.
We show in Tables III and IV of Section VI using simulations
over the IEEE 118-bus, 179-bus, 200-bus and 240-bus
networks that our proposed approach not only prevents load
shedding in more failure scenarios, but also incurs smaller
21st Power Systems Computation Conference
PSCC 2020
Porto, Portugal — June 29 – July 3, 2020
arXiv:2004.10401v1 [eess.SY] 22 Apr 2020
amounts of load loss in scenarios where load shedding is
inevitable. Moreover, generators under our approach respond
more actively and efficiently to terminate cascading failures
with only local adjustments.
The rest of the paper is organized as follows. We review
in Section II the DC failure model and propose an integrated
failure model that incorporates the effect of frequency control
on failure propagation. Our proposed approach is presented
in Section III with implementation details on topology design
and real-time response. We then demonstrate the theoretical
guarantees and benefits of our approach in Section IV. Lastly,
we illustrate and evaluate our approach over the IEEE test
networks in Sections V and VI. All proofs are omitted and
can be found in a detailed 3-part paper under preparation.
II. FAILURE MODELS
To begin, we review a widely used DC failure propagation
model, and then propose an integrated model that extends the
DC model to incorporate frequency control dynamics.
A. DC Failure Model
Power grids are usually modeled by a set of non-linear and
non-convex AC power flow equations [11]. Solving these equations is,
however, computationally expensive for large-scale power networks. In this paper,
we adopt the linearized DC power flow for tractability, which
is widely used in contingency analysis [6]–[9], [16]. In particular,
we represent the power transmission network by a
directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, where
$\mathcal{N} = \{1, 2, \ldots, n\}$ and $\mathcal{E} = \{e_1, \ldots, e_m\}$
are the sets of buses and transmission
lines, respectively. The terms bus/node and line/edge are used
interchangeably in this paper. An edge $e$ in $\mathcal{E}$ between nodes
$i$ and $j$ is denoted as either $e$ or $(i,j)$, and an arbitrary direction
is assigned to each edge. We assume lines are purely reactive,
and each line $e$ is characterized by its susceptance $b_e$.
The cascading process is described in stages indexed by
$k \in \mathbb{N}$. The topology at stage $k$ is denoted as
$\mathcal{G}(k) = (\mathcal{N}, \mathcal{E}(k))$ and $\mathcal{G}(0)$
denotes the original network. Given the
power injections and phase angles $p(k), \theta(k) \in \mathbb{R}^n$, the line
flows $f(k) \in \mathbb{R}^{m(k)}$ are the solution to the following DC
power flow equations:
$$p(k) = C(k) f(k), \tag{1a}$$
$$f(k) = B(k) C(k)^T \theta(k), \tag{1b}$$
where $B(k) := \mathrm{diag}(b_1, \ldots, b_{m(k)})$ is the diagonal
susceptance matrix, and $C(k) \in \mathbb{R}^{n \times m(k)}$ is the
node-edge incidence matrix. Rows and columns of $C(k)$ correspond to
nodes and edges. For every edge $e = (i,j) \in \mathcal{E}(k)$, we set
$C_{i,e}(k) = 1$ and $C_{j,e}(k) = -1$, while all other entries are set to zero.
In order for the linear system (1) to have a solution, the
power must be balanced on each island of the network, i.e.,
$\sum_{i \in \mathcal{N}_l} p_i(k) = 0$ for each connected component
$\mathcal{N}_l$ of $\mathcal{G}(k)$.
Under this condition, line flows are uniquely determined by
$$f(k) = B(k) C(k)^T \left( C(k) B(k) C(k)^T \right)^{\dagger} p(k),$$
where $(\cdot)^{\dagger}$ denotes the Moore-Penrose inverse.
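To make the DC power flow computation concrete, the following minimal sketch evaluates the line flows of (1) on a hypothetical three-bus triangle with unit susceptances. It is an illustration under stated assumptions, not the implementation used in this paper: the network is assumed connected, and instead of the Moore-Penrose inverse the sketch grounds bus 0 (sets its phase angle to zero) and solves the reduced Laplacian system. For a connected network with balanced injections both routes give the same flows, since flows depend only on angle differences. The function names `solve_linear` and `dc_line_flows` are introduced here for illustration only.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A assumed square and nonsingular)."""
    n = len(A)
    M = [list(map(Fraction, row)) + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick a nonzero pivot and move it onto the diagonal.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def dc_line_flows(n, lines, p):
    """lines: list of (i, j, b_e) with 0-based buses; p: balanced injections.
    Returns the flows f_e = b_e * (theta_i - theta_j), one per line."""
    # Weighted Laplacian C B C^T of the network.
    L = [[Fraction(0)] * n for _ in range(n)]
    for i, j, b in lines:
        L[i][i] += b
        L[j][j] += b
        L[i][j] -= b
        L[j][i] -= b
    # Ground bus 0 (theta_0 = 0) and solve the reduced system for the other angles.
    reduced = [row[1:] for row in L[1:]]
    theta = [Fraction(0)] + solve_linear(reduced, p[1:])
    return [b * (theta[i] - theta[j]) for i, j, b in lines]

# Hypothetical example: inject 1 pu at bus 0 and withdraw 1 pu at bus 2.
lines = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
p = [1, 0, -1]
flows = dc_line_flows(3, lines, p)
print(flows)  # the direct line (0, 2) carries 2/3, the two-hop path carries 1/3
```

Exact rational arithmetic is used only to keep the toy example free of rounding; a practical implementation would rely on a sparse numerical solver.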
We now formally describe the DC failure model. The
cascade starts from an initial set of line failures and propagates
in stages. At each stage $k \in \mathbb{N}_+$, assume a set
$E(k) \subset \mathcal{E}(k-1)$ of lines fail. The power injections
$p(k)$ first adjust based on a certain balancing rule $\mathcal{R}$
for each island, then the line flows
redistribute over the new topology
$\mathcal{G}(k) = (\mathcal{N}, \mathcal{E}(k))$, where
$\mathcal{E}(k) = \mathcal{E}(k-1) \setminus E(k)$. We adopt the following
deterministic outage rule: every line $e$ with power flow exceeding its line
limit $\pi_e$ is tripped at the next stage, i.e.,
$E(k+1) = \{e : |f_e(k)| > \pi_e, e \in \mathcal{E}(k)\}$.
If all lines are within their limits ($E(k+1) = \emptyset$), the cascade
is terminated; otherwise the process repeats for stage $k+1$.
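The staged outage rule can be sketched on the simplest possible network: two buses joined by parallel lines, where the DC flow on each surviving line is the total transfer split in proportion to susceptance. This special case is chosen only so that no general power flow solver is needed; the function name, the line data and the transfer value are hypothetical and are not taken from the test networks in this paper.

```python
def cascade_parallel_lines(transfer, lines, initial_failures):
    """Deterministic outage rule on a two-bus network with parallel lines.

    lines: dict name -> (susceptance b_e, limit pi_e); every line joins the same two buses.
    Returns (history, alive): the failure set of each stage and the surviving lines.
    """
    alive = dict(lines)
    failed = set(initial_failures)
    history = []
    while failed:
        history.append(sorted(failed))
        for e in failed:
            del alive[e]
        if not alive:
            break  # the two buses are islanded; a balancing rule would apply here
        # DC flows on parallel lines: the transfer splits proportionally to susceptance.
        total_b = sum(b for b, _ in alive.values())
        flows = {e: transfer * b / total_b for e, (b, _) in alive.items()}
        # Lines whose flow exceeds the limit are tripped at the next stage.
        failed = {e for e, f in flows.items() if abs(f) > alive[e][1]}
    return history, alive

# Hypothetical data: 3 pu transferred over three identical lines with different limits.
lines = {"a": (1, 1.2), "b": (1, 1.4), "c": (1, 2.0)}
history, alive = cascade_parallel_lines(3, lines, ["a"])
print(history, alive)
```

In this example the loss of line a overloads line b, whose loss in turn overloads line c, so the cascade ends with the two buses disconnected. Raising the limit of line c to 3.0 stops the cascade after the second stage.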
The evolution of the cascading failure critically depends on
the power balancing rule $\mathcal{R}$. A commonly used rule
$\mathcal{R}_c$ in the cascading failure literature can be described
as follows [9]: (i) If the network remains connected after the failure
$E(k)$, then $p(k)$ remains the same as in the previous stage,
$p(k-1)$; otherwise
(ii) all the nodes proportionally adjust their injections to
compensate for the imbalance on each island. As explained
in the next subsection, this rule can be interpreted as a special
case of the integrated failure model that we now present.
B. Integrated Failure Model
From this subsection on, we consider the failure dynamics
for a single stage and drop the index $k$ for presentation clarity.
We use superscript $(\cdot)^0$ to denote the pre-contingency nominal
steady-state values, and symbols followed by $(t)$ to indicate
the dynamic process.
Assume the pre-contingency grid
$\mathcal{G}^0 = (\mathcal{N}, \mathcal{E}^0)$ operates
at a nominal steady-state $(f^0, p^0, \theta^0)$ which satisfies the DC
power flow equation (1), i.e.
$$f^0 = B^0 (C^0)^T \left( C^0 B^0 (C^0)^T \right)^{\dagger} p^0.$$
Given a set $E$ of line failures, we use
$\mathcal{E} := \mathcal{E}^0 \setminus E$ to denote the
surviving lines. Let $B, C$ denote the susceptance and incidence
matrices for the surviving network
$\mathcal{G} = (\mathcal{N}, \mathcal{E})$. The dynamics
for bus frequency deviations $\omega(t) \in \mathbb{R}^{|\mathcal{N}|}$
and line flows $f(t) \in \mathbb{R}^{|\mathcal{E}|}$
over the remaining lines can be described by
the following linear swing and power flow equations [11]:
$$M \dot{\omega} = p^0 - d(t) - D\omega(t) - C f(t), \tag{2a}$$
$$\dot{f} = B C^T \omega(t), \tag{2b}$$
where $M \in \mathbb{R}^{|\mathcal{N}| \times |\mathcal{N}|}$ is the
diagonal matrix containing information about the system inertia,
$d(t) \in \mathbb{R}^{|\mathcal{N}|}$ is the deviation of the
controllable power injections (such as droop
control, generator ramping in response to power imbalance,
and load-side participation), and
$D\omega(t) \in \mathbb{R}^{|\mathcal{N}|}$ denotes the
deviation of system damping as well as load dynamics. The
initial values for dynamics (2) are $\omega(0) = 0$ and
$f(0) = f^0_{\mathcal{E}}$.
Note here that the pre-contingency injection $p^0$ should be
interpreted as the sum of nominal values of generator injection,
controllable injection, load, and system damping. As such, $d(t)$
represents the system response in terms of the controllable load
deviation after a failure event.
A state $x^* := (d^*, \omega^*, f^*) \in \mathbb{R}^{|\mathcal{N}|} \times \mathbb{R}^{|\mathcal{N}|} \times \mathbb{R}^{|\mathcal{E}|}$
is said to be an equilibrium¹ if the right-hand sides of (2) are zero at $x^*$.
¹We do not explicitly model the phase angle dynamics $\dot{\theta} = \omega$ and,
hence, do not enforce zero frequency deviation at equilibrium in our setup.
This allows (2) to model both primary and secondary frequency control,
demonstrating their close connections in a common framework. See [17] for
more discussions.
It is clear that, at equilibrium, the post-contingency line flows
$f^*$ satisfy the DC power flow equations with post-contingency
injections $p^* = p^0 - d^* - D\omega^*$ over the post-contingency
network $\mathcal{G}$. This fact suggests that the power balancing rule mentioned
in the previous subsection can be interpreted as a result of
the linear frequency dynamics. Indeed, we show that when
classical droop control is adopted, i.e.
$d_j(t) = \alpha_j \omega_j(t)$, the
balancing rule $\mathcal{R}_c$ is recovered.
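As shown in [18], the equilibrium of (2) under droop control
is the optimal solution of the following optimization²:
$$\min_{\omega, d, f, \theta} \ \sum_{j \in \mathcal{N}} \left( \frac{d_j^2}{2\alpha_j} + \frac{D_j \omega_j^2}{2} \right) \tag{3a}$$
$$\text{s.t.} \quad p^0 - d - D\omega = Cf, \tag{3b}$$
$$f - BC^T \theta = 0. \tag{3c}$$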
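If the grid becomes disconnected with several islands
$\{\mathcal{N}_1, \ldots, \mathcal{N}_l\}$, then the optimal solution of (3) is:
$$d_j^* = \frac{\alpha_j}{\sum_{i \in \mathcal{N}_k} (\alpha_i + D_i)} \sum_{i \in \mathcal{N}_k} p_i^0, \quad \text{for } j \in \mathcal{N}_k, \tag{4a}$$
$$D_j \omega_j^* = \frac{D_j}{\sum_{i \in \mathcal{N}_k} (\alpha_i + D_i)} \sum_{i \in \mathcal{N}_k} p_i^0, \quad \text{for } j \in \mathcal{N}_k. \tag{4b}$$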
Thus, in the equilibrium state the power injections adjust
linearly in the power imbalance
$\sum_{i \in \mathcal{N}_k} p_i^0$ on each island $\mathcal{N}_k$
of the post-contingency network $\mathcal{G}$, precisely as prescribed by
the power balancing rule $\mathcal{R}_c$.
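As a small numerical check of (4a) and (4b), the sketch below computes the droop equilibrium of a single island. All island frequency deviations coincide at equilibrium, so a single value of $\omega^*$ is computed and then distributed. The function name and the three-bus data are hypothetical.

```python
def droop_equilibrium(p0, alpha, D):
    """Evaluate (4a)-(4b) for one island.

    p0: pre-contingency injections of the island's buses,
    alpha: droop coefficients, D: damping coefficients.
    Returns (omega, d, damping) with d_j = alpha_j * omega and damping_j = D_j * omega.
    """
    imbalance = sum(p0)
    omega = imbalance / sum(a + dj for a, dj in zip(alpha, D))
    d = [a * omega for a in alpha]
    damping = [dj * omega for dj in D]
    return omega, d, damping

# Hypothetical island with a generation surplus of 0.5 pu.
p0 = [2.0, -1.0, -0.5]
alpha = [1.0, 2.0, 1.0]
D = [0.5, 0.25, 0.25]
omega, d, damping = droop_equilibrium(p0, alpha, D)

# Post-contingency injections p* = p0 - d - D * omega are balanced on the island.
p_star = [p - dj - dm for p, dj, dm in zip(p0, d, damping)]
print(omega, d, damping, sum(p_star))
```

Each bus absorbs a share of the imbalance proportional to its coefficient, which is the proportional adjustment prescribed by the balancing rule.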
We now describe our integrated failure model, which extends
DC power flow at a slow timescale by incorporating
frequency dynamics at a fast timescale. After line failures,
instead of solving DC power flow equations with balancing rule
$\mathcal{R}$, the system evolves according to the frequency dynamics
(2) and converges to an equilibrium. As before, a line trips in
the next round if and only if its steady-state flow exceeds its
capacity, while an overload during transient does not cause a
line failure. This is a reasonable assumption, as line outages
normally require time for thermal accumulation.
Compared with the DC model, this integrated failure model
has several benefits. First, it provides a clear explanation
of the power balancing rules already introduced in
the literature. Indeed, the validity of various balancing rules
can be justified in a similar manner from a frequency dynamics
perspective. Second, the equilibrium of (2) can usually
be efficiently obtained from optimization problems like (3).
Tractable analysis can thus be performed without simulating
transient dynamics. More importantly, the integrated failure
model offers a systematic method to analyze network evolution
under various control actions, allowing us to reverse engineer
the controller $d(t)$ in order to find a potentially better system
response to failures. Our proposal of integrating the unified
controller, to be presented in the next section, in the context of
failure mitigation is an example of how we can leverage this
model to improve system reliability while achieving desirable
control properties.
²For simplicity, we assume the constraints on control actions $d$ are inactive.
III. AN INTEGRATED APPROACH TO FAILURE MITIGATION
In this section, we first give a high-level description of our
proposed approach and then provide technical and implementation
details on the topology design and real-time response.
A. Overview
Our proposal to improve grid reliability consists of two main
components: topology design and real-time response. It aims
to achieve the following desirable properties:
• Optimal mitigation: Cascading failures should be stopped,
and system adjustments for generations and loads (including
load shedding) should be minimized.
• Local impact: Disturbances in a control area should not
impact other areas if at all possible. Control areas can
thus be operated relatively independently.
• Autonomous response: The control actions should be implemented
in real time in an autonomous and distributed
manner. Current approaches to failure mitigation often
involve a human in the loop, rendering them slower, less
optimal and possibly more error-prone.
For the topology design, we propose a tree structure at
the control area level, contrary to the conventional design
where areas are connected in a mesh structure through multiple
tie lines for grid reliability. In real time, we adopt a
unified controller for frequency regulation that also manages
congestion at a fast timescale. A distributed detection algorithm
is implemented in parallel to assess the severity of failures and
adjust the controller to stabilize the grid when necessary.
B. Topology Design
Redundancy has been the key mechanism for grid reliability,
e.g., the $N-1$ security standard [11], [19]–[21]. Different
control areas in current transmission grids are thus mesh-
connected by multiple tie lines in order to provide multiple
alternative routes for power to flow through.
It has been shown recently that such a redundancy-based
design allows the impact of failures to propagate more broadly,
while tree partitioning of the network guarantees control area
independence, i.e., line failures are constrained within their
own control area [13]. We thus propose to judiciously switch
off a small number of tie lines so that the resulting control
areas are connected in a tree structure, aiming to improve grid
reliability through better failure localization.
More specifically, consider a grid
$\mathcal{G} = (\mathcal{N}, \mathcal{E})$ with control
areas described by a partition
$\mathcal{P} := \{\mathcal{N}_1, \ldots, \mathcal{N}_l\}$, where
$\mathcal{N}_i \cap \mathcal{N}_j = \emptyset$ for $i \neq j$ and
$\bigcup_{i=1}^{l} \mathcal{N}_i = \mathcal{N}$. We denote
$\mathcal{T}(\mathcal{P}) := \{(s,t) \in \mathcal{E} \mid s \in \mathcal{N}_i, t \in \mathcal{N}_j, i \neq j\}$
as the set of all
tie lines connecting different areas. The reduced graph
$\mathcal{G}_{\mathcal{P}}(\mathcal{E})$ under partition $\mathcal{P}$
is a graph obtained from $\mathcal{G}$ by collapsing
each area $\mathcal{N}_i$ into a “super node” and adding an edge between
super nodes $\mathcal{N}_i$ and $\mathcal{N}_j$ for each tie line
connecting them. As
mentioned earlier, the redundancy-based design usually leads
to a non-simple (i.e., there may be multiple lines between two
super nodes) or cyclic reduced graph. Our method aims to
select a subset of tie lines $T \subset \mathcal{T}(\mathcal{P})$
to switch off, such that
the control areas of the remaining network are connected in a
tree topology, i.e. the reduced graph
$\mathcal{G}_{\mathcal{P}}(\mathcal{E} \setminus T)$ is a tree. This
implies that $|\mathcal{T}(\mathcal{P})| - l + 1$ tie lines will be switched off.
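The tree condition on the reduced graph can be checked by brute force on small instances. The sketch below enumerates every way of keeping $l - 1$ tie lines and retains those for which the reduced graph is a tree, so that each returned switch-off set has $|\mathcal{T}(\mathcal{P})| - l + 1$ lines. It assumes each area is internally connected, that areas are indexed $0, \ldots, l-1$, and that tie lines are distinct tuples. The bus names and the three-area layout are hypothetical, and this exhaustive search is only an illustration, not the approximate algorithm of [22].

```python
from itertools import combinations

def is_tree(num_areas, edges):
    """True if the multigraph on areas 0..num_areas-1 with the given edges is a tree."""
    if len(edges) != num_areas - 1:
        return False
    parent = list(range(num_areas))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # a cycle or a parallel edge between two super nodes
        parent[ra] = rb
    return True

def switch_off_candidates(area_of, tie_lines):
    """Return every set T of tie lines whose removal leaves a tree-connected reduced graph."""
    num_areas = len(set(area_of.values()))
    candidates = []
    for kept in combinations(tie_lines, num_areas - 1):
        reduced = [(area_of[s], area_of[t]) for s, t in kept]
        if is_tree(num_areas, reduced):
            candidates.append([line for line in tie_lines if line not in kept])
    return candidates

# Hypothetical layout: three areas and four tie lines.
area_of = {"a1": 0, "a2": 0, "b1": 1, "b2": 1, "c1": 2}
tie_lines = [("a1", "b1"), ("a2", "b2"), ("b1", "c1"), ("a2", "c1")]
candidates = switch_off_candidates(area_of, tie_lines)
print(len(candidates), candidates)
```

Here five of the six possible choices are valid; the excluded one keeps both parallel tie lines between areas 0 and 1 and leaves area 2 disconnected.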
Similarly to line failures, tie line switching actions change
the system operating point as power flows redistribute in the
new network topology. Let $p$ denote the average injections for
topology design purposes. We choose the set $T$ of candidate
lines to minimize the network congestion level $\gamma(T)$ defined as:
$$\gamma(T) = \max_{e \in \mathcal{E} \setminus T} |f_e(T)| / \pi_e,$$
where $f_e(T)$ is the line flow on line $e$ after the lines in
$T$ are switched off. We are therefore interested in solving the
optimization problem:
$$\min_{T \subset \mathcal{T}(\mathcal{P})} \ \gamma(T) \tag{5a}$$
$$\text{s.t.} \quad \mathcal{G}_{\mathcal{P}}(\mathcal{E} \setminus T) \text{ is a tree.} \tag{5b}$$
The complexity of the optimization (5) originates from
enumerating all possible subsets $T$ of $\mathcal{T}(\mathcal{P})$
to switch off. Solving
the above optimization problem often becomes intractable for
large-scale power grids. Nevertheless, such switching actions
are only implemented occasionally, rather than continuously
in real time. An approximate but faster algorithm is proposed
in [22] where tree-connected areas are created by recursively
splitting the existing ones, yielding very good results for most
application scenarios.
We remark that it is not guaranteed that $\gamma(T^*) \le 1$, where
$T^*$ is an optimal selection, implying that some transmission
lines may become overloaded after switching actions. This
may be alleviated if one has the flexibility to design the
control areas of the grid. We refer interested readers to [22]
for optimal partitioning of the grid using network modularity
clustering algorithms. However, $\gamma(T^*) < 1$ indeed holds for
most practical scenarios simulated in [22], especially when the
original grid is not heavily congested.
C. Real-time Response
Once a tree-connected control area structure is created, the
unified controller (UC) is implemented as a frequency regulation
method to autonomously respond to disturbances such
as loss of generation/load and line failures. The closed-loop
dynamics of UC are more elaborate than (2); see [14], [18]
for details. It is shown there that, under mild conditions, the
closed-loop equilibrium under UC is globally asymptotically
stable. Moreover, it is the optimal solution of the following
optimization:
$$\min_{f, d, \theta} \ \sum_{j \in \mathcal{N}} \frac{d_j^2}{2\alpha_j} \tag{6a}$$
$$\text{s.t.} \quad p^0 - d - Cf = 0, \tag{6b}$$
$$f = BC^T \theta, \tag{6c}$$
$$E_T C f = E_T C^0 f^0, \tag{6d}$$
$$\underline{f}_e \le f_e \le \overline{f}_e, \quad e \in \mathcal{E}, \tag{6e}$$
$$\underline{d}_j \le d_j \le \overline{d}_j, \quad j \in \mathcal{N}, \tag{6f}$$
where (6b) and (6c) describe the DC power flow equations,
(6d) ensures zero area control error, (6e) enforces
line flow limits, and (6f) enforces control limits. Matrix
$E_T$ describes the control areas
$\mathcal{P} = \{\mathcal{N}_1, \ldots, \mathcal{N}_l\}$, namely
$E_T \in \{0, 1\}^{l \times |\mathcal{N}|}$ is defined as
$E_T(i,j) = 1$ if node $j \in \mathcal{N}_i$ and
$E_T(i,j) = 0$ otherwise. Therefore, the $i$-th row of the
constraint $E_T C f = E_T C^0 f^0$ ensures that the net total power
flow on tie lines connected to area $\mathcal{N}_i$ is restored to the
pre-contingency value³.
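To see why each row of $E_T C f$ is the net tie line flow of one area, the sketch below builds the area indicator matrix and the incidence matrix for a hypothetical four-bus chain split into two areas, and evaluates $E_T C f$ for a given flow vector. Flows on lines inside an area cancel in the row sum, so only the tie line contributes. The function names and data are illustrative assumptions.

```python
def area_indicator(areas, n):
    """E_T as a list of rows: entry (i, j) is 1 if bus j belongs to area i, else 0."""
    return [[1 if j in area else 0 for j in range(n)] for area in areas]

def incidence(n, lines):
    """Node-edge incidence matrix C: +1 at the tail bus and -1 at the head bus of each line."""
    C = [[0] * len(lines) for _ in range(n)]
    for e, (i, j) in enumerate(lines):
        C[i][e] = 1
        C[j][e] = -1
    return C

def net_area_export(areas, n, lines, f):
    """Compute E_T C f, i.e. the net power leaving each area over its tie lines."""
    E = area_indicator(areas, n)
    C = incidence(n, lines)
    Cf = [sum(C[i][e] * f[e] for e in range(len(lines))) for i in range(n)]
    return [sum(E[a][i] * Cf[i] for i in range(n)) for a in range(len(areas))]

# Hypothetical chain 0 - 1 - 2 - 3 with areas {0, 1} and {2, 3}; line (1, 2) is the tie line.
areas = [{0, 1}, {2, 3}]
lines = [(0, 1), (1, 2), (2, 3)]
f = [0.5, 1.0, 0.25]
export = net_area_export(areas, 4, lines, f)
print(export)  # area 0 exports the tie line flow, area 1 imports it
```

Constraint (6d) then simply requires this vector to match its pre-contingency counterpart $E_T C^0 f^0$.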
Explaining the design of UC (which prescribes the dynamics
of controllable power injection $d(t)$) is beyond the scope of
this paper; interested readers are referred to [18]. Nevertheless,
it should be noted that the controller is derived from a variant
of the primal-dual algorithm to solve (6), where the primal updates
are carried out by the network dynamics itself, while the
dual variables $\lambda$ are updated with only local communications.
Under mild conditions, UC converges to the optimal solution
of (6), provided that the optimization is feasible.
If the disturbances are small, (6) is likely to be feasible and
UC is then guaranteed to drive the network to an equilibrium,
where the nominal frequency and the inter-area flows are
restored (zero area control error), and more importantly the
line limits are enforced. Such failures can thus be properly
mitigated, leveraging the congestion management of UC.
However, when a severe disturbance occurs, (6) may no
longer be feasible, which implies that the controller is not
capable of achieving all its control goals. If such failures
happen, UC is no longer stable and can potentially lead to a
large scale outage. It is thus crucial to promptly identify such
severe failures and respond accordingly. To do so, we propose
a distributed algorithm to assess the severity of failures in
real-time, and a proactive procedure to adjust the controller to
stabilize the system if necessary; see [15] for more details.
1) Severe Failure Detection: Severe failures can be detected
by monitoring the dual variables of UC in real-time. As shown
in the following proposition, after a severe failure at least one
dual variable grows arbitrarily large during the transient phase
and the system never reaches an equilibrium point. Therefore,
a warning can be raised whenever a dual variable exceeds a
predefined threshold.
Proposition III.1. If the optimization problem (6) is infeasible
and a primal-dual update algorithm is implemented to solve (6),
then there exists a dual variable $\lambda_i$ such that
$\limsup_{t \to \infty} |\lambda_i(t)| = \infty$.
This approach is guaranteed to detect all severe failures, but
it may yield false alarms as some dual variables can possibly
become very large during transient even when (6) is feasible.
There is thus a trade-off between the speed and accuracy of
detection: a larger threshold reduces the false alarm
rate, but also requires a longer time to detect severe failures.
Therefore, the threshold should be selected carefully based on
system parameters and application scenarios.
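The threshold test can be illustrated on a scalar toy problem that is unrelated to the actual UC dynamics. The sketch below runs a discrete-time dual ascent on $\min x^2/2$ subject to affine constraints $a x + c \le 0$, with the primal variable set to the exact minimizer of the Lagrangian at every step. When the constraints are contradictory the multipliers grow without bound and cross any threshold, while for a feasible problem they converge and no alarm is raised. The function name, step size and threshold are hypothetical choices.

```python
def first_alarm(constraints, threshold, step=0.1, iterations=500):
    """Dual ascent on min x^2/2 subject to a*x + c <= 0 for each (a, c) in constraints.

    Returns the first iteration at which some multiplier exceeds `threshold`,
    or None if no alarm is raised within `iterations` steps.
    """
    lam = [0.0] * len(constraints)
    for it in range(iterations):
        # Primal step: exact minimizer of the Lagrangian in x.
        x = -sum(l * a for l, (a, _) in zip(lam, constraints))
        # Dual step: projected gradient ascent on each multiplier.
        lam = [max(0.0, l + step * (a * x + c)) for l, (a, c) in zip(lam, constraints)]
        if max(lam) > threshold:
            return it
    return None

# Feasible toy problem: x >= 1, written as -x + 1 <= 0. The multiplier converges to 1.
feasible_alarm = first_alarm([(-1.0, 1.0)], threshold=5.0)

# Infeasible toy problem: x <= 0 together with x >= 1. Both multipliers keep growing.
infeasible_alarm = first_alarm([(1.0, 0.0), (-1.0, 1.0)], threshold=5.0)
print(feasible_alarm, infeasible_alarm)
```

Lowering the threshold below the feasible problem's limiting multiplier value of 1 would trigger a false alarm in the first case, which mirrors the trade-off between detection speed and accuracy described above.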
³If a control area is disconnected from the grid, the corresponding row of
(6d) will be lifted to balance the injections.
2) Response to Severe Failures: When severe failures happen,
it is impossible for UC to enforce all the constraints
in (6). To alleviate the impact of such failures and stabilize
the system, we propose to adjust the controller by proactively
lifting constraints in (6) until it becomes feasible again.
However, not all constraints can be relaxed, as some of
them are essential to system stability. In particular, (6b) and
(6c) represent the physics of power flow, while (6e) ensures
post-contingency flows are safe. The other two constraints can
be lifted without compromising stability as follows.
First, some of the zero area control error constraints
$E_T C f = E_T C^0 f^0$ can be lifted by setting the corresponding
dual variables to $0$. This action allows more control areas to
adjust their operations in the mitigation of the failure. This may
be undesirable as it is counter to our goal of failure localization.
Second, the range $[\underline{d}_j, \overline{d}_j]$ for the
controllable injections
can be relaxed by enlarging its width. This action usually
corresponds to shedding loads, which is clearly undesirable.
By relaxing some rows of the $E_T C f = E_T C^0 f^0$ constraints
and allowing controlled load shedding, problem (6)
can always be made feasible with all line flows satisfying
their limits (more details below). Hence any severe failure
can be terminated and will not cascade out of control, though
with possible degradation on localization performance and
potential loss of load. Therefore, such a constraint lifting
procedure should be implemented carefully to prioritize the
desired objectives.
IV. DISCUSSION
In this section, we discuss our proposed approach and
demonstrate how it achieves all our design goals.
A. Guaranteed Mitigation Performance
Our proposed approach provably terminates line failures
and optimally drives the system to a desired equilibrium.
For these guarantees the unified frequency control is of key
importance. Assume the pre-contingency flows are within safe
limits. It can be shown that the relaxed optimization problem,
in which (6d) is lifted and (6f) is enlarged by allowing all
loads to be shed, admits a trivial feasible point obtained by reducing
all generations and loads to zero. Therefore, regardless of
whether the failure is severe or not, we can always relax (6)
progressively to a feasible optimization, at the cost of more
affected areas and/or potential load loss. Nevertheless, given
the global stability of UC, the system is guaranteed to converge
to an equilibrium such that post-contingency flows are below
line limits, i.e., successive failures are prevented.
The objective (6a) should be interpreted as the penalty
for the control actions $d$. The optimal solution $d^*$ is exactly
the adjustment of power injections at the equilibrium
under the unified frequency control in response to a failure.
Therefore, UC manages to mitigate the failure optimally in
terms of minimizing the injection adjustments. We remark that
the control actions can be further separated into generations
and loads $d = d^G - d^L$, and the objective (6a) can be
designed to implement different priorities of generation/load
adjustments. For instance, if the objective (6a) is modified to
$\sum_{j \in \mathcal{N}} \left( \frac{(d_j^G)^2}{2\alpha_j^G} + \frac{(d_j^L)^2}{2\alpha_j^L} \right)$,
then smaller $\alpha_j^L$'s can be used to prefer
generation adjustments over load adjustments.
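The effect of the weights can be checked on a single bus in isolation. Ignoring all network and limit constraints (a simplifying assumption made only for this sketch), minimizing $(d^G)^2/(2\alpha^G) + (d^L)^2/(2\alpha^L)$ subject to $d^G - d^L = \Delta$ gives $d^G = \alpha^G \Delta / (\alpha^G + \alpha^L)$ and $d^L = -\alpha^L \Delta / (\alpha^G + \alpha^L)$, so the load share shrinks with $\alpha^L$. The function name and numbers below are hypothetical.

```python
def split_adjustment(delta, alpha_g, alpha_l):
    """Minimize dG^2/(2*alpha_g) + dL^2/(2*alpha_l) subject to dG - dL = delta.

    Stationarity gives dG / alpha_g = -dL / alpha_l, hence the proportional split below.
    """
    total = alpha_g + alpha_l
    d_g = alpha_g * delta / total
    d_l = -alpha_l * delta / total
    return d_g, d_l

# Required net adjustment of -1 pu at one bus, under two hypothetical weightings.
equal = split_adjustment(-1.0, 1.0, 1.0)        # generation and load share equally
gen_first = split_adjustment(-1.0, 1.0, 0.01)   # small alpha_l: load is barely touched
print(equal, gen_first)
```

With equal weights each side contributes half of the adjustment; with $\alpha^L = 0.01$ the load contributes about one percent of it.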
B. Guaranteed Localization Performance
The proposed approach is guaranteed to terminate failures
with only local impact at equilibrium if the failure is not
severe. The tree structure plays a crucial role in this aspect. To
demonstrate our localization result, we clarify the meaning of
“local impact” through the notion of associated areas. We say
an area $\mathcal{N}_k$ is associated with failures $E$ if there
exists an edge $e = (i,j) \in E$ such that either
$i \in \mathcal{N}_k$ or $j \in \mathcal{N}_k$. Moreover,
we only study the localization when the system converges to an
equilibrium, while the non-local response during the transient
stage is ignored.
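The definition of associated areas translates directly into a set intersection. In the sketch below, the area names, bus numbers and failed lines are hypothetical; the function returns every area that contains an endpoint of at least one failed line.

```python
def associated_areas(areas, failed_lines):
    """areas: dict area name -> set of buses; failed_lines: iterable of (i, j) pairs.

    An area is associated with the failures if it contains an endpoint of a failed line.
    """
    endpoints = {bus for line in failed_lines for bus in line}
    return {name for name, buses in areas.items() if buses & endpoints}

# Hypothetical partition into three areas.
areas = {"A": {1, 2, 3}, "B": {4, 5}, "C": {6, 7}}
internal = associated_areas(areas, [(2, 3)])   # failure inside area A
tie_line = associated_areas(areas, [(3, 4)])   # failure of a tie line between A and B
print(internal, tie_line)
```

An internal failure is associated with a single area, whereas a tie line failure is associated with both areas it connects.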
The following theorem characterizes the localization performance
of our proposed approach: after a non-severe failure
the system converges to an equilibrium where the failure is
mitigated with unchanged injections for non-associated areas.
This guarantee extends the results in [13] in two ways: (i) both
bridge⁴ and non-bridge failures are localized within associated
areas; (ii) failures are terminated so that successive failures
with potential global impact are prevented.
Theorem IV.1. Assume the control areas are tree-connected.
Given a set $E$ of failures, if (6) is feasible, then $d_j^* = 0$ for
all $j \in \mathcal{N}_k$ if $\mathcal{N}_k$ is not associated with $E$.
This strong localization guarantee stems from the combination
of tree-connected control areas and unified frequency
control. Without the tree structure, UC would only guarantee
the overall power imbalance is restored in each control area,
while the individual operating conditions may already vary.
C. Other Benefits
Another appealing feature of our proposed control is that it
can be implemented in an autonomous and distributed fashion,
allowing for real-time response to failures. The problem of
how to optimally shed load to mitigate failures has already
been considered in the literature, e.g., [8]. The authors of [8] solve
a similar optimization problem but using a central controller
that requires global information and failure information. In
contrast, our approach operates as a closed-loop distributed
controller and drives the system to a desired equilibrium
autonomously.
Our method also has an economic benefit. In practice,
$N-1$ preventive security-constrained optimal power flow (SCOPF)
is solved to ensure system reliability.
Our approach, however, can serve as a corrective method to
mitigate non-severe failures. Thus SCOPF needs to consider only
the severe failures, potentially leading to significant savings.
V. AN ILLUSTRATIVE EXAMPLE
In this section, we illustrate the dynamic response of our approach
for the IEEE 39-bus network with parameters adapted
from [18]. It consists of two control areas, which are connected
by three tie lines, namely $(1,2)$, $(2,3)$ and $(26,27)$.
⁴A bridge is a cut edge whose removal disconnects the network.
Figure 1. IEEE 39-bus network with two control areas from [18]. The two
blue tie lines are switched off to create a tree structure with $(2,3)$ as bridge.
We switch off the tie lines $(1,2)$ and $(26,27)$, chosen
heuristically as those with the smallest absolute line flow. The two
control areas are then connected in a tree structure as shown
in Figure 1. We implement the unified controller on all nodes
as follows. At first, the controller is only allowed to adjust
generations; if a severe failure is detected, it is then allowed to
reduce loads. As a last resort, the zero area control error
constraint can be lifted. The threshold is set to $0.5$ pu for dual variables.
As illustrated in Figure 2, in the case of the non-severe failure
$(4,14)$, the dual variables are always below the threshold
and the system quickly converges to a safe equilibrium. On
the other hand, the failure $(6,7)$ leads to unstable oscillations
of dual variables and a severe warning is raised at $10$ sec, as
depicted in Figure 3. The controller is then allowed to shed
loads, an action that quickly re-stabilizes the system. Note that
the flow on line $(25,26)$ remains unchanged at steady state for
both failures, as it belongs to a non-associated control area.
[Figure 2: (a) Dual variable dynamics, Dual Variable (pu) vs. Time; (b) Line flow dynamics, Flow Deviation (pu) vs. Time, for lines (4,5), (6,7) and (25,26).]
Figure 2. System dynamics after the non-severe failure of line $(4,14)$. The
controller is only allowed to reduce generations.
VI. SIMULATIONS
A. Simulation Setup
We evaluate our integrated approach on the IEEE 118-bus,
179-bus, 200-bus and 240-bus networks. We first partition each
network into two control areas using the network modularity
clustering algorithm proposed in [22]. The tie line
with the largest absolute flow is selected as the remaining bridge,
while all other tie lines are switched off. More information on
these test networks is provided in Table I.
[Figure 3 panels: (a) Dual variable dynamics, plotting Dual Variable (pu) against Time, annotated "Severe failure detected!"; (b) Line flow dynamics, plotting Flow Deviation (pu) against Time for lines (4,5), (4,14) and (25,26).]
Figure 3. System dynamics after the severe failure of line (6, 7). A warning
is raised at time 10 sec, at which point the controller is allowed to shed loads.
Table I
STATISTICS ON IEEE TEST NETWORKS.
Network   # of Edges   Control Area Sizes   # of Tie Lines
118       186          (83, 35)             4
179       263          (98, 81)             4
200       242          (103, 94)            9
240       448          (143, 97)            10
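The tie-line selection rule used in this setup (keep the tie line with the largest absolute flow as the bridge and switch off the rest) can be sketched as follows. The function name `tree_partition` is hypothetical, and the flow values in the example are invented for illustration; they are not the nominal flows of any test network.

```python
def tree_partition(tie_line_flows):
    """Split tie lines into (bridge, switched_off) by absolute flow.

    tie_line_flows maps a tie line (u, v) to its nominal flow in pu.
    The line with the largest |flow| stays in service as the bridge.
    """
    if not tie_line_flows:
        raise ValueError("at least one tie line is required")
    bridge = max(tie_line_flows, key=lambda line: abs(tie_line_flows[line]))
    switched_off = sorted(line for line in tie_line_flows if line != bridge)
    return bridge, switched_off


# Hypothetical flows (pu) on the three tie lines of the 39-bus example.
flows = {(1, 2): -0.4, (2, 3): 3.1, (26, 27): 0.9}
bridge, switched_off = tree_partition(flows)
```

With these made-up flows the sketch keeps (2, 3) as the bridge and switches off (1, 2) and (26, 27), which matches the topology of Figure 1.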
The simulated scenarios are created as follows. For each
network, we first adopt the nominal load profile from [23] and
obtain the generation profile through a DC OPF on the original
network. The power injections from this DC OPF problem are
used for both the original network and the network after line
switching, while the resulting nominal line flows will generally
be different. Next, we iterate over every transmission line as
the initial single-line failure (tie lines that have been switched
off are ignored) and simulate the cascading process under four
control approaches:
1) Unified controller on tree-connected areas;
2) Unified controller on mesh-connected areas;
3) Automatic generation control (AGC)⁵ on tree-connected areas;
4) Automatic generation control on mesh-connected areas.
For simplicity of presentation, we refer to these approaches as
UC+T, UC+M, AGC+T and AGC+M. Moreover, to test their
performance under various network congestion conditions, we
scale the line flow limits and generation limits by a common
factor α = 0.5, 1, 1.5. For a fair comparison, all strategies follow
the same relaxation procedure as described in the previous
section when the optimization is infeasible. We remark that,
since AGC does not enforce line limits, the cascading failure
may proceed for multiple stages as described in Section II,
while UC always terminates the failure at the first stage.
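As a rough sketch of how the scenario set above could be enumerated, the following Python snippet crosses the four control approaches with the three congestion factors and every candidate initial line failure. It assumes that switched-off tie lines are excluded as initial failures for all four approaches, so that scenarios can be compared pairwise; the helper names `scale_limits` and `enumerate_scenarios` are hypothetical.

```python
from itertools import product

APPROACHES = ("UC+T", "UC+M", "AGC+T", "AGC+M")
ALPHAS = (0.5, 1.0, 1.5)


def scale_limits(limits, alpha):
    """Scale line flow or generation limits by the common factor alpha."""
    return {key: alpha * value for key, value in limits.items()}


def enumerate_scenarios(lines, switched_off):
    """List every (approach, alpha, failed_line) combination to simulate.

    Assumption: switched-off tie lines are skipped for every approach.
    """
    excluded = set(switched_off)
    candidates = [line for line in lines if line not in excluded]
    return list(product(APPROACHES, ALPHAS, candidates))
```

Each returned tuple would then be passed to a cascading-failure simulation, which is outside the scope of this sketch.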
⁵ We model AGC as an algorithm that solves (6) without flow limits (6e).
We quantify the mitigation performance using the load loss
rate (LLR), defined as the ratio of the total loss of load to the
total demand. The localization performance is evaluated by
the adjusted generator rate (AGR), which is the ratio of the
number of generators whose operating points have changed
after the failures to the total number of generators. Both LLR
and AGR take values in [0, 100%], and a smaller
value indicates better mitigation/localization performance. The
complete results are summarized in Tables III and IV for
reference. In the following subsections, we highlight several
insights from these results.
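The two metrics defined above translate directly into code. The sketch below returns both as fractions in [0, 1] (multiply by 100 for percentages). The tolerance `tol` used to decide whether a generator's operating point has changed is our own assumption, as are the function names.

```python
def load_loss_rate(load_shed, total_demand):
    """LLR: total load lost divided by total demand."""
    if total_demand <= 0:
        raise ValueError("total demand must be positive")
    return sum(load_shed) / total_demand


def adjusted_generator_rate(pre, post, tol=1e-6):
    """AGR: share of generators whose operating point changed.

    pre and post list generator outputs before and after the failure;
    tol is an assumed numerical tolerance for detecting a change.
    """
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and equally long")
    changed = sum(1 for a, b in zip(pre, post) if abs(a - b) > tol)
    return changed / len(pre)
```

For instance, shedding 2 pu out of a total demand of 40 pu gives an LLR of 5%, and changing two of four generators gives an AGR of 50%.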
B. Performance of Our Approach
We first evaluate the performance of our integrated approach
UC+T, and compare it with the traditional approach AGC+M.
Table II shows the fraction of failure scenarios with load
loss (LL) and adjusted generations (AG), as well as the
average LLR and AGR of those scenarios for these two control
approaches. Each cell contains three values, representing the
results for α = 0.5, 1, 1.5, respectively.
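The two kinds of statistics reported in Table II (the share of scenarios that are affected, and the average rate over only those scenarios) could be computed from per-scenario rates roughly as follows. This is a minimal sketch: the function name `summarize` and the tolerance `tol` are assumptions, and the same routine would apply to LLR and AGR alike.

```python
def summarize(rates, tol=1e-9):
    """Return (percent of scenarios with a nonzero rate,
    average rate in percent over those scenarios only).

    rates holds one per-scenario rate (LLR or AGR) as a fraction in [0, 1].
    """
    if not rates:
        raise ValueError("at least one scenario is required")
    affected = [r for r in rates if r > tol]
    share = 100.0 * len(affected) / len(rates)
    average = 100.0 * sum(affected) / len(affected) if affected else 0.0
    return share, average
```

Averaging only over affected scenarios means a low average can coexist with a high share of affected scenarios, and vice versa, so the two numbers are best read together.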
In terms of failure mitigation, our proposal prevents load
shedding in more failure scenarios on the 118-bus, 179-bus,
and 240-bus networks under almost all network
congestion conditions. More importantly, our approach always
achieves higher grid reliability, in the sense that the average
load loss, for failures where load shedding is inevitable,
is uniformly lower than under the traditional AGC+M
approach. This improvement is particularly significant when
the grid becomes congested (smaller α). The average LLR,
taken over all failure scenarios with load loss, is 3% or
lower under UC+T but can be as high as 19% under AGC+M.
Moreover, large-scale outages with load loss larger than 10%
almost never happen under our proposed UC+T approach.
In terms of failure localization, we observe that on the 118-bus,
179-bus and 240-bus networks, generators under UC+T
are adjusted in more failure scenarios when the network is
less congested, while achieving less load loss than the
traditional approach. When the congestion becomes worse, the
traditional AGC+M approach not only leads to more scenarios
with adjusted generators, but also results in more load loss.
This indicates that generators under the UC+T approach
adjust their operations more actively in response to failures.
Moreover, generators react more efficiently and locally under
UC+T, as the average AGR is always lower.
It should be noted that, on the 200-bus network, our approach
yields more failure scenarios with load loss or generator
adjustment. Nevertheless, the LLR and AGR are
still uniformly lower on average under UC+T. One possible
explanation is that the 200-bus network is very sparse
(see Table I), with limited power transfer capacity across
control areas, and is hence more prone to potential failures. By
adjusting the generators and loads more proactively, UC+T
tries to make sure the post-contingency injections are more
robust in the new topology. This again confirms that UC+T
tends to adjust generators more actively in response to failures,
and that the adjustment is prescribed in a more efficient and local manner.
C. Effects of the Unified Controller
The benefits of our approach stem from both UC, which enforces
line limits at a fast timescale, and the tree-connected topology,
which localizes the impact of initial failures. In this subsection
and the next one, we disentangle the benefit of each.
Table II
STATISTICS ON UC+T (FIRST ROW) AND AGC+M (SECOND ROW)⁶.
Network                     118          179          200          240
% of Scen. w/ LL   UC+T     49, 8, 5     4, 2, 2      21, 19, 15   22, 9, 3
                   AGC+M    99, 26, 4    59, 4, 1     24, 8, 7     98, 95, 3
% of Scen. w/ AG   UC+T     94, 53, 8    71, 44, 37   56, 47, 31   95, 74, 22
                   AGC+M    99, 27, 7    84, 63, 20   37, 20, 21   98, 95, 16
Avg. LLR (%)       UC+T     2, 2, 1      2, 2, 2      3, 2, 1      1, 1, 2
                   AGC+M    19, 5, 4     5, 3, 2      14, 2, 1     10, 6, 3
Avg. AGR (%)       UC+T     10, 12, 16   27, 27, 37   30, 36, 51   24, 22, 52
                   AGC+M    21, 15, 18   74, 75, 39   50, 36, 51   60, 56, 63
⁶ We highlight in bold font the approach with better performance (two values
may look the same after rounding). See Tables III and IV for more precise
results.
We first demonstrate the load loss reduction capability
of UC by comparing UC-based and AGC-based approaches
(UC+T vs. AGC+T and UC+M vs. AGC+M). For each pair
of approaches, we compute the difference of LLR for every
failure scenario and plot the results in Figure 4. Note that the
failure index is re-ordered so that the difference of LLR is in
a non-decreasing order for better presentation.
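The per-scenario comparison and re-ordering described above can be sketched as follows. The sign convention (UC-based minus AGC-based, so that negative values mean less load loss under UC) is our assumption and is not stated explicitly in the text; the function name is hypothetical.

```python
def sorted_llr_difference(llr_uc, llr_agc):
    """Per-scenario LLR difference (UC minus AGC), in non-decreasing order.

    Both inputs list the LLR of the same failure scenarios, in the same order.
    """
    if len(llr_uc) != len(llr_agc):
        raise ValueError("both approaches must cover the same scenarios")
    return sorted(u - a for u, a in zip(llr_uc, llr_agc))
```

Sorting discards the original failure indices, so the resulting curve shows the distribution of differences rather than which failure produced which value.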
The main insight from Figure 4 is that UC-based approaches
(whether the control areas are tree-connected or not) almost
always reduce load loss, and such reductions are more pro-
nounced as network congestion increases. We remark that
there are indeed failure scenarios with less load loss under
AGC-based approaches. When this happens, however, the
failures always cascade through multiple stages under AGC
and, hence, the time to re-stabilize the system is significantly
longer. This suggests that UC+T prioritizes a shorter system
stabilization time over load loss rate in the relatively rare
cases where quickly stopping the cascading process leads to
increased load loss.
D. Effects of the Tree Partitioning
We now demonstrate the failure localization capability of
tree partitioning by comparing tree-based and mesh-based ap-
proaches (namely UC+T vs. UC+M and AGC+T vs. AGC+M).
Similarly, for each pair of approaches, we show in
Figure 5 the difference in AGR for every scenario.
For the 118-bus, 179-bus and 240-bus networks, UC+T
and UC+M result in a similar number of failure scenarios with
load loss, while the UC+T approach achieves slightly lower
LLR on average, as shown in Tables III and IV. However, in
terms of the localization performance, tree-based approaches
tend to adjust fewer generators as shown in Figure 5. More
importantly, generators under UC+T always try to mitigate the
failure with only local adjustments. Generation adjustments in
the control areas without any failures almost never happen.
As in the previous subsection, for the IEEE 200-bus net-
work, UC+T results in more failure scenarios with load loss or
adjusted generations than UC+M, possibly due to the sparsity
of the original network. Nevertheless, the difference in terms
of average load loss is negligible.
[Figure 4: eight panels, each plotted against Failure ID: (a) 118-bus network, (b) 179-bus network, (c) 200-bus network, (d) 240-bus network, (e) 118-bus network, (f) 179-bus network, (g) 200-bus network, (h) 240-bus network.]
Figure 4. Difference of LLR between UC-based and AGC-based approaches.
[Figure 5: eight panels, each plotted against Failure ID: (a) 118-bus network, (b) 179-bus network, (c) 200-bus network, (d) 240-bus network, (e) 118-bus network, (f) 179-bus network, (g) 200-bus network, (h) 240-bus network.]
Figure 5. Difference of AGR between tree structure based and mesh structure based approaches.
VII. CONCLUSIONS
In this paper we propose an integrated approach to failure
mitigation and localization consisting of topology design and
real-time response. Both analytical results and case studies
validate the potential and capability of our proposed control.
There are several research directions for further exploration,
including: (i) extending the approach to nonlinear systems and
dynamics, (ii) understanding the optimal trade-off between
localization and redundancy, and (iii) managing generation/line
reserves to avoid severe failures.
REFERENCES
[1] M. Vaiman, K. Bell, Y. Chen, B. Chowdhury, I. Dobson, P. Hines, M. Papic, S. Miller, and P. Zhang, “Risk assessment of cascading outages: Methodologies and challenges,” IEEE Transactions on Power Systems, vol. 27, no. 2, pp. 631–641, May 2012.
[2] C. D. Brummitt, R. M. D’Souza, and E. A. Leicht, “Suppressing cascades of load in interdependent networks,” Proceedings of the National Academy of Sciences, vol. 109, no. 12, pp. E680–E689, 2012.
[3] Z. Kong and E. M. Yeh, “Resilience to degree-dependent and cascading node failures in random geometric networks,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5533–5546, Nov. 2010.
[4] P. Crucitti, V. Latora, and M. Marchiori, “A topological analysis of the Italian electric power grid,” Physica A: Statistical Mechanics and its Applications, vol. 338, no. 1-2, pp. 92–97, 2004.
[5] P. Hines, I. Dobson, and P. Rezaei, “Cascading power outages propagate locally in an influence graph that is not the actual grid topology,” IEEE Transactions on Power Systems, vol. 32, no. 2, pp. 958–967, 2017.