An Integrated Approach for Failure Mitigation & Localization in Power Systems

Chen Liang∗, Linqi Guo∗, Alessandro Zocca†, Shuyue Yu∗, Steven H. Low∗ and Adam Wierman∗

∗Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
{cliang2, lguo, syu5, slow, adamw}@caltech.edu

†Department of Mathematics, Vrije Universiteit, Amsterdam, Netherlands
a.zocca@vu.nl

Abstract

—The transmission grid is often comprised of several

control areas that are connected by multiple tie lines in a mesh

structure for reliability. It is also well-known that line failures can

propagate non-locally and redundancy can exacerbate cascading.

In this paper, we propose an integrated approach to grid

reliability that (i) judiciously switches off a small number of tie

lines so that the control areas are connected in a tree structure;

and (ii) leverages a unified frequency control paradigm to provide

congestion management in real time. Even though the proposed

topology reduces redundancy, the integration of tree structure at

regional level and real-time congestion management can provide

stronger guarantees on failure localization and mitigation. We

illustrate our approach on the IEEE 39-bus network and evaluate

its performance on the IEEE 118-bus, 179-bus, 200-bus and

240-bus networks with various network congestion conditions.

Simulations show that, compared with the traditional approach,

our approach not only prevents load shedding in more failure

scenarios, but also incurs smaller amounts of load loss in

scenarios where load shedding is inevitable. Moreover, generators

under our approach adjust their operations more actively and

efficiently in a local manner.

Index Terms

—cascading failure, failure mitigation, frequency

control, power system reliability, topology design

I. INTRODUCTION

Reliability is critical in power systems. Tremendous efforts

from both the industry and academia have been made to

analyze cascading failures. Current industry practice is typi-

cally simulation-based, where contingencies are studied using

extensive simulations [1]. Such approaches are often limited

by computational power.

To provide tractable analysis, pure topological models have

been proposed, where failures propagate locally to neighboring

components with high probability [2]–[4]. However, these

epidemic models are not realistic as non-local failure prop-

agation is observed in both real-world and simulated cascade

data [5]. More realistic models use linearized DC power flow

to characterize power redistribution after transmission line

failures [6]–[9]. These DC models indeed exhibit both local

and non-local propagation of failures. See [10] for an extensive list of cascading failure models. It is observed in [9] that successive failures can be quite far away from initial failures under the DC model, which aligns with real cascade data.

This work has been supported by Resnick Fellowship, Linde Institute Research Award, NWO Rubicon grant 680.50.1529, NSF through awards ECCS 1619352, CNS 1545096, CCF 1637598, ECCS 1739355, CNS 1518941, CPS 154471, ARPA-E through award DE-AR0000699 (NODES), and DTRA through award HDTRA 1-15-1-0003.

Such non-local failure propagation comes from the inter-

connectivity of the power network. The transmission grid is

usually comprised of several control areas, which are operated

relatively independently with prescribed power exchanges de-

termined by economic dispatch and maintained by automatic

generation control (AGC) [11]. Traditionally, control areas are

interconnected by multiple tie lines in a mesh structure to
provide multiple alternative routes for power, as redundancy
often improves reliability [12]. Surprisingly, it is shown in [13]
that tree partitioning of the grid insulates the impact of failures

and precisely captures the boundaries of failure propagation.

This means that, while providing redundancy, multiple tie lines

also exacerbate non-local failure propagation.

Failure models based on DC power flow often assume power

injections remain unchanged after a failure as long as the

network remains connected. This is a reasonable assumption

under the traditional frequency control dynamics that operate

at a fast timescale (see Section II), although this connection

has not been widely mentioned in the literature. An important

feature of our work is that we explicitly model the interaction

between frequency control dynamics at a fast timescale and the

DC power flow at a slow timescale in the cascading process,

and leverage this interaction for failure mitigation.

In this paper, we propose an integrated approach to grid

reliability consisting of two main components: topology design and real-time response. For topology design, we propose to

judiciously switch off a small number of tie lines so that the

control areas are interconnected in a tree structure. At real-

time, we leverage the recently proposed unified frequency con-

trol [14] to provide congestion management at a fast timescale.

Even though the proposed approach reduces redundancy in

network topology, the integrated design of tree-connected

control areas and real-time congestion management provides

stronger guarantees on failure localization and mitigation,

leading to a higher overall reliability. This new framework

builds on our earlier work [13], [15]. Here, we extend the

approach and evaluate its performance in IEEE test networks.

We show in Tables III and IV of Section VI using simula-

tions over the IEEE 118-bus, 179-bus, 200-bus and 240-bus

networks that our proposed approach not only prevents load

shedding in more failure scenarios, but also incurs smaller

21st Power Systems Computation Conference

PSCC 2020

Porto, Portugal — June 29 – July 3, 2020

arXiv:2004.10401v1 [eess.SY] 22 Apr 2020

amounts of load loss in scenarios where load shedding is

inevitable. Moreover, generators under our approach respond

more actively and efficiently to terminate cascading failures

with only local adjustments.

The rest of the paper is organized as follows. We review

in Section II the DC failure model and propose an integrated

failure model that incorporates the effect of frequency control

on failure propagation. Our proposed approach is presented

in Section III with implementation details on topology design

and real-time response. We then demonstrate the theoretical

guarantees and benefits of our approach in Section IV. Lastly,

we illustrate and evaluate our approach over the IEEE test

networks in Sections V and VI. All proofs are omitted and

can be found in a detailed 3-part paper under preparation.

II. FAILURE MODELS

To begin, we review a widely used DC failure propagation

model, and then propose an integrated model that extends the

DC model to incorporate frequency control dynamics.

A. DC Failure Model

Power grids are usually modeled by a set of non-linear and

non-convex AC power flow equations [11]. It is, however,

less efficient for large-scale power networks. In this paper,

we adopt the linearized DC power flow for tractability, which

is widely used in contingency analysis [6]–[9], [16]. In particular, we represent the power transmission network by a directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, where $\mathcal{N} = \{1, \ldots, n\}$ and $\mathcal{E} = \{1, \ldots, e\}$ are the sets of buses and transmission lines, respectively. The terms bus/node and line/edge are used interchangeably in this paper. An edge $e$ between nodes $i$ and $j$ is denoted as either $e$ or $(i,j)$, and an arbitrary direction is assigned to each edge. We assume lines are purely reactive, and each line $e$ is characterized by its susceptance $b_e$.

The cascading process is described in stages indexed by $t \in \mathbb{N}$. The topology at stage $t$ is denoted as $\mathcal{G}(t) = (\mathcal{N}, \mathcal{E}(t))$ and $\mathcal{G}(0)$ denotes the original network. Given the power injections and phase angles $p(t), \theta(t) \in \mathbb{R}^{n}$, the line flows $f(t) \in \mathbb{R}^{|\mathcal{E}(t)|}$ are the solution to the following DC power flow equations:

$$p(t) = C(t) f(t), \tag{1a}$$
$$f(t) = B(t) C(t)^T \theta(t), \tag{1b}$$

where $B(t) := \mathrm{diag}(b_e : e \in \mathcal{E}(t))$ is the diagonal susceptance matrix, and $C(t) \in \mathbb{R}^{n \times |\mathcal{E}(t)|}$ is the node-edge incidence matrix. Rows and columns of $C(t)$ correspond to nodes and edges. For every edge $e = (i,j) \in \mathcal{E}(t)$, we set $C_{i,e}(t) = 1$ and $C_{j,e}(t) = -1$, while all other entries are set to zero.

In order for the linear system (1) to have a solution, the power must be balanced on each island of the network, i.e., $\sum_{i \in \mathcal{N}_k} p_i(t) = 0$ for each connected component $\mathcal{N}_k$ of $\mathcal{G}(t)$. Under this condition, line flows are uniquely determined by

$$f(t) = B(t) C(t)^T \left( C(t) B(t) C(t)^T \right)^{\dagger} p(t),$$

where $(\cdot)^{\dagger}$ denotes the Moore-Penrose inverse.
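The pseudo-inverse formula above can be checked numerically on a toy network. The following is a minimal sketch, assuming NumPy is available; the 3-bus triangle, its unit susceptances and its injections are hypothetical values chosen for illustration and do not come from the paper.

```python
import numpy as np

def dc_line_flows(n_buses, edges, susceptances, injections):
    """Return DC line flows f = B C^T (C B C^T)^+ p for a directed edge list."""
    C = np.zeros((n_buses, len(edges)))
    for e, (i, j) in enumerate(edges):
        C[i, e] = 1.0   # edge e leaves node i
        C[j, e] = -1.0  # edge e enters node j
    B = np.diag(susceptances)
    laplacian = C @ B @ C.T
    theta = np.linalg.pinv(laplacian) @ np.asarray(injections, dtype=float)
    return B @ C.T @ theta

# Hypothetical 3-bus triangle with unit susceptances; injections sum to zero.
edges = [(0, 1), (1, 2), (0, 2)]
flows = dc_line_flows(3, edges, [1.0, 1.0, 1.0], [1.0, 0.0, -1.0])
print(flows)
```

In this example the direct line (0, 2) carries twice the flow of each line on the two-hop path, as expected for equal susceptances.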

We now formally describe the DC failure model. The cascade starts from an initial set of line failures and propagates in stages. At each stage $t \in \mathbb{N}$, assume a set $F(t) \subset \mathcal{E}(t-1)$ of lines fail. The power injections $p(t)$ first adjust based on a certain balancing rule for each island, then the line flows redistribute over the new topology $\mathcal{G}(t) = (\mathcal{N}, \mathcal{E}(t))$, where $\mathcal{E}(t) = \mathcal{E}(t-1) \setminus F(t)$. We adopt the following deterministic outage rule: every line $e$ with power flow exceeding its line limit $\bar{f}_e$ is tripped at the next stage, i.e., $F(t+1) = \{ e : |f_e(t)| > \bar{f}_e,\ e \in \mathcal{E}(t) \}$. If all lines are within their limits ($F(t+1) = \emptyset$), the cascade is terminated; otherwise the process repeats for stage $t+1$.
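The staged cascade with the deterministic outage rule can be sketched as a short loop. This is an illustrative sketch only, assuming NumPy: it keeps injections unchanged while the network stays connected and simply stops when islanding occurs, since handling islands would require a balancing rule. The function names, the 3-bus network and the line limits are hypothetical.

```python
import numpy as np

def dc_flows(n, edges, b, p):
    """DC line flows f = B C^T (C B C^T)^+ p for a directed edge list."""
    C = np.zeros((n, len(edges)))
    for e, (i, j) in enumerate(edges):
        C[i, e], C[j, e] = 1.0, -1.0
    B = np.diag(b)
    return B @ C.T @ np.linalg.pinv(C @ B @ C.T) @ np.asarray(p, dtype=float)

def is_connected(n, edges):
    """Depth-first search from node 0 over the undirected version of the graph."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def simulate_cascade(n, lines, p, initial_failures):
    """lines maps (i, j) -> (susceptance, limit); returns the failure set of each stage."""
    alive = dict(lines)
    failed = set(initial_failures)
    history = []
    while failed:
        history.append(sorted(failed))
        for e in failed:
            del alive[e]
        edges = list(alive)
        if not is_connected(n, edges):
            break  # islanding needs a balancing rule; outside this sketch
        f = dc_flows(n, edges, [alive[e][0] for e in edges], p)
        # Outage rule: trip every surviving line whose flow exceeds its limit.
        failed = {e for e, fe in zip(edges, f) if abs(fe) > alive[e][1] + 1e-9}
    return history

lines = {(0, 1): (1.0, 0.5), (1, 2): (1.0, 1.5), (0, 2): (1.0, 1.0)}
history = simulate_cascade(3, lines, [1.0, 0.0, -1.0], {(0, 2)})
print(history)
```

Here the initial failure of line (0, 2) pushes the whole transfer onto the path through bus 1, which overloads line (0, 1) at the next stage.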

The evolution of the cascading failure critically depends on the power balancing rule. A commonly used rule in the cascading failure literature can be described as follows [9]: (i) If the network remains connected after the failure $F(t)$, then $p(t)$ remains the same as previous stage $p(t-1)$; otherwise (ii) all the nodes proportionally adjust their injections to compensate for the imbalance on each island. As explained in the next subsection, this rule can be interpreted as a special case of the integrated failure model that we now present.

B. Integrated Failure Model

From this subsection on, we consider the failure dynamics for a single stage and drop the index $t$ for presentation clarity. We use superscript $(\cdot)^{\mathrm{pre}}$ to denote the pre-contingency nominal steady-state values, and symbols followed by $(t)$ to indicate the dynamic process, with $t$ now denoting time.

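Assume the pre-contingency grid $\mathcal{G}^{\mathrm{pre}} = (\mathcal{N}, \mathcal{E}^{\mathrm{pre}})$ operates at a nominal steady-state $(p^{\mathrm{pre}}, \theta^{\mathrm{pre}})$ which satisfies the DC power flow equation (1), i.e. $f^{\mathrm{pre}} = B^{\mathrm{pre}} (C^{\mathrm{pre}})^T \left( C^{\mathrm{pre}} B^{\mathrm{pre}} (C^{\mathrm{pre}})^T \right)^{\dagger} p^{\mathrm{pre}}$. Given a set $F$ of line failures, we use $\mathcal{E} := \mathcal{E}^{\mathrm{pre}} \setminus F$ to denote the surviving lines. Let $B, C$ denote the susceptance and incidence matrices for the surviving network $\mathcal{G} = (\mathcal{N}, \mathcal{E})$. The dynamics for bus frequency deviations $\omega(t) \in \mathbb{R}^{|\mathcal{N}|}$ and line flows $f(t) \in \mathbb{R}^{|\mathcal{E}|}$ over the remaining lines can be described by the following linear swing and power flow equations [11]:

$$M \dot{\omega}(t) = p^{\mathrm{pre}} - d(t) - D\omega(t) - C f(t), \tag{2a}$$
$$\dot{f}(t) = B C^T \omega(t), \tag{2b}$$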

where $M \in \mathbb{R}^{|\mathcal{N}| \times |\mathcal{N}|}$ is the diagonal matrix containing information about the system inertia, $d(t) \in \mathbb{R}^{|\mathcal{N}|}$ is the deviation on controllable power injections (such as droop control, generator ramping in response to power imbalance, and load-side participation), and $D\omega(t) \in \mathbb{R}^{|\mathcal{N}|}$ denotes the deviation on system damping as well as load dynamics. The initial values for dynamics (2) are $\omega(0) = 0$ and $f(0) = f^{\mathrm{pre}}$ (restricted to the surviving lines). Note here that the pre-contingency injection $p^{\mathrm{pre}}$ should be interpreted as the sum of nominal values of generator injection, controllable injection, load, and system damping. As such, $d(t)$ represents the system response in terms of the controllable load deviation after a failure event.

A state $x^* := (f^*, \omega^*) \in \mathbb{R}^{|\mathcal{N}| + |\mathcal{E}|}$ is said to be an equilibrium if the right-hand sides of (2) are zero at $x^*$.

We do not explicitly model the phase angle dynamics and, hence, do not enforce zero frequency deviation at equilibrium in our setup. This allows (2) to model both primary and secondary frequency control, demonstrating their close connections in a common framework. See [17] for more discussions.

It is clear that, at equilibrium, the post-contingency line flows $f^*$ satisfy the DC power flow equations with post-contingency injections $p^{\mathrm{pre}} - d^* - D\omega^*$ over the post-contingency network $\mathcal{G}$. This fact suggests that the power balancing rule mentioned in the previous subsection can be interpreted as a result of the linear frequency dynamics. Indeed, we show that when classical droop control is adopted, i.e. $d(t) = K\omega(t)$ with a diagonal matrix $K$ of positive droop gains, the balancing rule is recovered.

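As shown in [18], the equilibrium of (2) under droop control is the optimal solution of the following optimization

$$\min_{\omega, d, f, \theta} \ \sum_{i \in \mathcal{N}} \left( \frac{d_i^2}{2K_i} + \frac{D_i \omega_i^2}{2} \right) \tag{3a}$$
$$\text{s.t.} \quad p^{\mathrm{pre}} - d - D\omega = Cf, \tag{3b}$$
$$f - B C^T \theta = 0. \tag{3c}$$

If the grid becomes disconnected with several islands $\{\mathcal{N}_1, \ldots, \mathcal{N}_m\}$, then the optimal solution of (3) is:

$$d_i^* = \frac{K_i \sum_{j \in \mathcal{N}_k} p_j^{\mathrm{pre}}}{\sum_{j \in \mathcal{N}_k} (K_j + D_j)} \quad \text{for } i \in \mathcal{N}_k, \tag{4a}$$
$$\omega_i^* = \frac{\sum_{j \in \mathcal{N}_k} p_j^{\mathrm{pre}}}{\sum_{j \in \mathcal{N}_k} (K_j + D_j)} \quad \text{for } i \in \mathcal{N}_k. \tag{4b}$$

Thus, in the equilibrium state the power injections adjust linearly in the power imbalance $\sum_{j \in \mathcal{N}_k} p_j^{\mathrm{pre}}$ on each island $\mathcal{N}_k$ of the post-contingency network $\mathcal{G}$, precisely as prescribed by the power balancing rule.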
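The island-wise equilibrium under droop control can be illustrated with a few lines of arithmetic. The sketch below assumes droop control of the form "control deviation equals a positive gain times the local frequency deviation"; the names `droop_gain` and `damping` and all numerical values are hypothetical. On each island, the frequency deviation settles at the island's power imbalance divided by the total gain plus damping, and each node's control action is proportional to its own gain.

```python
def droop_equilibrium(islands, p, droop_gain, damping):
    """Island-wise equilibrium: common frequency deviation and per-node control action."""
    omega, d = {}, {}
    for island in islands:
        imbalance = sum(p[i] for i in island)
        stiffness = sum(droop_gain[i] + damping[i] for i in island)
        w = imbalance / stiffness       # shared frequency deviation on this island
        for i in island:
            omega[i] = w
            d[i] = droop_gain[i] * w    # node's share is proportional to its gain
    return omega, d

# Hypothetical data: buses 0 and 1 form one island, bus 2 is isolated.
islands = [[0, 1], [2]]
p = {0: 1.0, 1: -0.4, 2: -0.6}
droop_gain = {0: 2.0, 1: 1.0, 2: 1.0}
damping = {0: 0.5, 1: 0.5, 2: 1.0}
omega, d = droop_equilibrium(islands, p, droop_gain, damping)

# Residual imbalance per island after control and damping respond (should vanish).
residuals = [sum(p[i] - d[i] - damping[i] * omega[i] for i in isl) for isl in islands]
print(omega, d, residuals)
```

The vanishing residuals confirm that the adjusted injections are balanced on every island, which is the condition the DC power flow equations need in order to have a solution.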

We now describe our integrated failure model, which ex-

tends DC power flow at a slow timescale by incorporating

frequency dynamics at a fast timescale. After line failures, instead of solving DC power flow equations with a prescribed balancing rule, the system evolves according to the frequency dynamics (2) and converges to an equilibrium. As before, a line trips in the next round if and only if its steady-state flow exceeds its capacity, while an overload during transient does not cause a

line failure. This is a reasonable assumption, as line outages

normally require time for thermal accumulation.

Compared with DC model, there are many benefits of

this integrated failure model. First, it provides a clear ex-

planation of the power balancing rules already introduced in

the literature. Indeed, the validity of various balancing rules

can be justified in a similar manner from a frequency dy-

namic perspective. Second, the equilibrium of (2) can usually

be efficiently obtained from optimization problems like (3).

Tractable analysis can thus be performed without simulating

transient dynamics. More importantly, the integrated failure

model offers a systematic method to analyze network evolution

under various control actions, allowing us to reverse engineer the controller $d(t)$ in order to find a potentially better system

response to failures. Our proposal of integrating the unified

controller, to be presented in next section, in the context of

failure mitigation is an example of how we can leverage this

model to improve the system reliability in achieving desirable

control properties.

For simplicity, we assume the constraints on control actions

are inactive.

III. AN INTEGRATED APPROACH TO FAILURE MITIGATION

In this section, we first give a high-level description of our

proposed approach and then provide technical and implemen-

tation details on the topology design and real-time response.

A. Overview

Our proposal to improve grid reliability consists of two main

components: topology design and real-time response. It aims to achieve the following desirable properties:
• Optimal mitigation: Cascading failures should be stopped, and system adjustments for generations and loads (including load shedding) should be minimized.
• Local impact: Disturbances in a control area should not impact other areas if at all possible. Control areas can thus be operated relatively independently.
• Autonomous response: The control actions should be implemented in real-time in an autonomous and distributed manner. Current approaches to failure mitigation often involve a human in the loop, rendering them slower, less optimal and possibly more error-prone.

For the topology design, we propose a tree structure at

the control area level, contrary to the conventional design

where areas are connected in a mesh structure through mul-

tiple tie lines for grid reliability. At real-time, we adopt a

unified controller for frequency regulation that also manages

congestion at fast timescale. A distributed detection algorithm

is implemented in parallel to assess the severity of failures and

adjust the controller to stabilize the grid when necessary.

B. Topology Design

Redundancy has been the key mechanism for grid reliability, e.g., the $N-1$ security standard [11], [19]–[21]. Different

control areas in current transmission grids are thus mesh-

connected by multiple tie lines in order to provide multiple

alternative routes for power to flow through.

It has been shown recently that such a redundancy-based

design allows the impact of failures to propagate more broadly,

while tree partitioning of the network guarantees control area

independence, i.e., line failures are constrained within their

own control area [13]. We thus propose to judiciously switch

off a small number of tie lines so that the resulting control

areas are connected in a tree structure, aiming to improve grid

reliability through better failure localization.

More specifically, consider a grid $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ with control areas described by a partition $\mathcal{P} = \{\mathcal{N}_1, \ldots, \mathcal{N}_k\}$, where $\mathcal{N}_i \cap \mathcal{N}_j = \emptyset$ for $i \neq j$ and $\bigcup_{i=1}^{k} \mathcal{N}_i = \mathcal{N}$. We denote $\mathcal{T}(\mathcal{P}) := \{(s,t) \in \mathcal{E} \mid s \in \mathcal{N}_i,\ t \in \mathcal{N}_j,\ i \neq j\}$ as the set of all tie lines connecting different areas. The reduced graph $\mathcal{G}_{\mathcal{P}}(\mathcal{E})$ under partition $\mathcal{P}$ is a graph obtained from $\mathcal{G}$ by collapsing each area $\mathcal{N}_i$ into a “super node” and adding an edge between super nodes $\mathcal{N}_i$ and $\mathcal{N}_j$ for each tie line connecting them. As mentioned earlier, the redundancy-based design usually leads to a non-simple (i.e., there may be multiple lines between two super nodes) or cyclic reduced graph. Our method aims to select a subset of tie lines $S \subset \mathcal{T}(\mathcal{P})$ to switch off, such that the control areas of the remaining network are connected in a tree topology, i.e. the reduced graph $\mathcal{G}_{\mathcal{P}}(\mathcal{E} \setminus S)$ is a tree. This implies that $|\mathcal{T}(\mathcal{P})| - k + 1$ tie lines will be switched off.
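The tree condition on the reduced graph is easy to check programmatically. The following sketch collapses each area into a super node and uses a union-find structure to test whether the remaining tie lines form a tree. The function names, the 6-bus network and the two areas "A" and "B" are hypothetical illustrations, not taken from the paper.

```python
def tie_lines(edges, area_of):
    """Lines whose endpoints lie in different control areas."""
    return [(s, t) for s, t in edges if area_of[s] != area_of[t]]

def reduced_graph_is_tree(edges, area_of, switched_off=()):
    """True if the super-node graph is connected with no parallel edges or cycles."""
    areas = set(area_of.values())
    off = set(switched_off)
    ties = [(area_of[s], area_of[t]) for s, t in edges
            if (s, t) not in off and area_of[s] != area_of[t]]
    if len(ties) != len(areas) - 1:
        return False
    # With exactly k - 1 edges, the graph is a tree iff no edge closes a cycle.
    parent = {a: a for a in areas}
    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a
    for a, b in ties:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True

# Hypothetical 6-bus network with two areas joined by three tie lines.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3), (1, 4), (0, 5)]
area_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
ties = tie_lines(edges, area_of)
n_switch_off = len(ties) - len(set(area_of.values())) + 1
before = reduced_graph_is_tree(edges, area_of)
after = reduced_graph_is_tree(edges, area_of, switched_off=[(1, 4), (0, 5)])
print(ties, n_switch_off, before, after)
```

With three tie lines between two areas, two of them must be switched off, leaving a single bridge between the super nodes.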

Similarly to line failures, tie line switching actions change the system operating point as power flows redistribute in the new network topology. Let $p$ denote the average injections for topology design purpose. We choose the set $S$ of candidate lines to minimize the network congestion level $\gamma(S)$ defined as:

$$\gamma(S) = \max_{e \in \mathcal{E} \setminus S} |f_e(S)| / \bar{f}_e,$$

where $f_e(S)$ is the line flow on line $e$ after the lines in $S$ are switched off. We are therefore interested in solving the optimization problem:

$$\min_{S \subset \mathcal{T}(\mathcal{P})} \ \gamma(S) \tag{5a}$$
$$\text{s.t.} \quad \mathcal{G}_{\mathcal{P}}(\mathcal{E} \setminus S) \text{ is a tree.} \tag{5b}$$

The complexity of the optimization (5) originates from enumerating all possible subsets $S \subset \mathcal{T}(\mathcal{P})$ to switch off. Solving

the above optimization problem often becomes intractable for

large-scale power grids. Nevertheless, such switching actions

are only implemented occasionally, rather than continuously

in real time. An approximate but faster algorithm is proposed

in [22] where tree-connected areas are created by recursively

splitting the existing ones, yielding very good results for most

application scenarios.
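For very small networks, the switching problem can be solved by exhaustive enumeration, which also makes the combinatorial cost visible. The sketch below is a brute-force illustration and not the approximate algorithm of [22]. It assumes NumPy, assumes every control area is internally connected, and uses a hypothetical 4-bus network with hypothetical susceptances and limits.

```python
from itertools import combinations
import numpy as np

def dc_flows(n, edges, b, p):
    """DC line flows f = B C^T (C B C^T)^+ p for a directed edge list."""
    C = np.zeros((n, len(edges)))
    for e, (i, j) in enumerate(edges):
        C[i, e], C[j, e] = 1.0, -1.0
    B = np.diag(b)
    return B @ C.T @ np.linalg.pinv(C @ B @ C.T) @ np.asarray(p, dtype=float)

def is_tree(nodes, edges):
    """True if the (multi)graph on `nodes` is connected and acyclic."""
    if len(edges) != len(nodes) - 1:
        return False
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True

def best_switching(n, lines, p, area_of):
    """Exhaustive search; lines maps (i, j) -> (susceptance, limit)."""
    areas = set(area_of.values())
    ties = [e for e in lines if area_of[e[0]] != area_of[e[1]]]
    n_off = len(ties) - len(areas) + 1
    best_gamma, best_set = float("inf"), None
    for S in combinations(ties, n_off):
        kept = [e for e in lines if e not in S]
        reduced = [(area_of[i], area_of[j]) for i, j in kept
                   if area_of[i] != area_of[j]]
        if not is_tree(areas, reduced):
            continue
        f = dc_flows(n, kept, [lines[e][0] for e in kept], p)
        # Congestion level: largest flow-to-limit ratio over the remaining lines.
        gamma = max(abs(fe) / lines[e][1] for e, fe in zip(kept, f))
        if gamma < best_gamma:
            best_gamma, best_set = gamma, S
    return best_gamma, best_set

lines = {(0, 1): (1.0, 2.0), (2, 3): (1.0, 2.0), (0, 2): (1.0, 1.0), (1, 3): (1.0, 2.0)}
area_of = {0: "A", 1: "A", 2: "B", 3: "B"}
gamma, switched = best_switching(4, lines, [1.0, 0.0, 0.0, -1.0], area_of)
print(gamma, switched)
```

In this toy case the search switches off the tie line with the tighter limit, so the remaining bridge carries the full transfer at half of its capacity.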

We remark that it is not guaranteed that $\gamma(S^*) \le 1$, where $S^*$ is an optimal selection, implying that some transmission lines may become overloaded after switching actions. This may be alleviated if one has the flexibility to design the control areas of the grid. We refer interested readers to [22] for optimal partitioning of the grid using network modularity clustering algorithms. However, $\gamma(S^*) \le 1$ indeed holds for most practical scenarios simulated in [22], especially when the original grid is not heavily congested.

C. Real-time Response

Once a tree-connected control area structure is created, the

unified controller (UC) is implemented as a frequency regu-

lation method to autonomously respond to disturbances such

as loss of generation/load and line failures. The closed-loop

dynamics of UC are more elaborate than (2); see [14], [18]

for details. It is shown there that, under mild conditions, the

closed-loop equilibrium under UC is globally asymptotically

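stable. Moreover it is the optimal solution of the following optimization:

$$\min_{f, d, \theta} \ \sum_{j \in \mathcal{N}} c_j d_j^2 \tag{6a}$$
$$\text{s.t.} \quad p^{\mathrm{pre}} - d - Cf = 0, \tag{6b}$$
$$f = B C^T \theta, \tag{6c}$$
$$ECf = EC^{\mathrm{pre}} f^{\mathrm{pre}}, \tag{6d}$$
$$\underline{f}_e \le f_e \le \bar{f}_e, \quad e \in \mathcal{E}, \tag{6e}$$
$$\underline{d}_j \le d_j \le \bar{d}_j, \quad j \in \mathcal{N}, \tag{6f}$$

where (6b) and (6c) describe the DC power flow equations, (6d) ensures zero area control error, (6e) enforces line flow limits, and (6f) enforces control limits. Matrix $E$ describes the control areas $\{\mathcal{N}_1, \ldots, \mathcal{N}_k\}$, namely $E \in \{0, 1\}^{k \times |\mathcal{N}|}$ is defined as $E_{(i,j)} = 1$ if node $j \in \mathcal{N}_i$ and $E_{(i,j)} = 0$ otherwise. Therefore, the $i$-th row of the constraint (6d) ensures that the net total power flow on tie lines connected to area $i$ is restored to the pre-contingency value.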

Explaining the design of UC (which prescribes the dynamics of controllable power injection $d(t)$) is beyond the scope of this paper; interested readers are referred to [18]. Nevertheless, it should be noted that the controller is derived from a variant of primal-dual algorithm to solve (6), where the primal updates are carried out by the network dynamics itself, while the dual variables are updated with only local communications.

Under mild conditions, UC converges to the optimal solution

of (6), provided that the optimization is feasible.

If the disturbances are small, (6) is likely to be feasible and

UC is then guaranteed to drive the network to an equilibrium,

where the nominal frequency and the inter-area flows are

restored (zero area control error), and more importantly the

line limits are enforced. Such failures can thus be properly

mitigated, leveraging the congestion management of UC.

However, when a severe disturbance occurs, (6) may no

longer be feasible, which implies that the controller is not

capable of achieving all its control goals. If such failures

happen, UC is no longer stable and can potentially lead to a

large scale outage. It is thus crucial to promptly identify such

severe failures and respond accordingly. To do so, we propose

a distributed algorithm to assess the severity of failures in

real-time, and a proactive procedure to adjust the controller to

stabilize the system if necessary; see [15] for more details.

1) Severe Failure Detection:

Severe failures can be detected

by monitoring the dual variables of UC in real-time. As shown

in the following proposition, after a severe failure at least one

dual variable grows arbitrarily large during the transient phase

and the system never reaches an equilibrium point. Therefore,

a warning can be raised whenever a dual variable exceeds a

predefined threshold.

Proposition III.1. If the optimization problem (6) is infeasible and a primal-dual update algorithm is implemented to solve (6), then there exists a dual variable $\lambda_i$ such that $\limsup_{t \to \infty} |\lambda_i(t)| = \infty$.

This approach is guaranteed to detect all severe failures, but

it may yield false alarms as some dual variables can possibly

become very large during transient even when (6) is feasible.

There is thus a trade-off between the speed and accuracy of

detection: a larger threshold would reduce the false alarm

rate, but also requires longer time to detect severe failures.

Therefore, the threshold should be selected carefully based on

system parameters and application scenarios.
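The threshold-based detection idea can be illustrated on a toy problem that is unrelated to the actual UC dynamics. The sketch below runs a plain primal-dual gradient iteration on a one-variable quadratic program with two inequality constraints. When the constraints contradict each other, a dual variable grows without bound and eventually crosses the alarm threshold; when they are compatible, the duals stay bounded. The problem, step size and threshold are hypothetical choices.

```python
def first_alarm_step(upper, threshold=10.0, step=0.05, max_steps=2000):
    """Primal-dual iteration for: min x^2  s.t.  x >= 1,  x <= upper.

    Returns the first step at which a dual variable exceeds `threshold`,
    or None if no alarm is raised within `max_steps`.
    """
    x, lam_lo, lam_up = 0.0, 0.0, 0.0
    for k in range(1, max_steps + 1):
        grad_x = 2.0 * x - lam_lo + lam_up
        x_next = x - step * grad_x
        lam_lo = max(0.0, lam_lo + step * (1.0 - x))    # dual of x >= 1
        lam_up = max(0.0, lam_up + step * (x - upper))  # dual of x <= upper
        x = x_next
        if max(lam_lo, lam_up) > threshold:
            return k
    return None

feasible_alarm = first_alarm_step(upper=2.0)    # constraints are compatible
infeasible_alarm = first_alarm_step(upper=0.0)  # constraints contradict each other
print(feasible_alarm, infeasible_alarm)
```

Raising `threshold` delays the alarm in the infeasible case, which mirrors the trade-off between detection speed and false alarms discussed above.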

If a control area is disconnected from the grid, the corresponding row of

(6d) will be lifted to balance the injections.

2) Response to Severe Failures:

When severe failures hap-

pen, it is impossible for UC to enforce all the constraints

in (6). To alleviate the impact of such failures and stabilize

the system, we propose to adjust the controller by proactively

lifting constraints in (6) until it becomes feasible again.

However, not all constraints can be relaxed, as some of

them are essential to system stability. In particular, (6b) and

(6c) represent the physics of power flow, while (6e) ensures

post-contingency flows are safe. The other two constraints can

be lifted without compromising stability as follows.

First, some of the zero area control error constraints (6d) can be lifted by setting the corresponding dual variables to zero. This action allows more control areas to adjust their operations in the mitigation of the failure. This may be undesirable as it is counter to our goal of failure localization. Second, the range $[\underline{d}_j, \bar{d}_j]$ for the controllable injections can be relaxed by enlarging its width. This action usually

corresponds to shedding loads, which is clearly undesirable.

By relaxing some rows of the zero area control error constraints (6d) and allowing controlled load shedding, problem (6)

can always be made feasible with all line flows satisfying

their limits (more details below). Hence any severe failure

can be terminated and will not cascade out of control, though

with possible degradation on localization performance and

potential loss of load. Therefore, such a constraint lifting

procedure should be implemented carefully to prioritize the

desired objectives.

IV. DISCUSSION

In this section, we discuss our proposed approach and

demonstrate how it achieves all our design goals.

A. Guaranteed Mitigation Performance

Our proposed approach provably terminates line failures

and optimally drives the system to a desired equilibrium.

For these guarantees the unified frequency control is of key

importance. Assume the pre-contingency flows are within safe

limits. It can be shown that the relaxed optimization problem

in which (6d) is lifted and (6f) is enlarged by allowing all

loads to be shed, yields a trivial feasible point by reducing

all generations and loads to zero. Therefore, regardless of

whether the failure is severe or not, we can always relax (6)

progressively to a feasible optimization, at the cost of more

affected areas and/or potential load loss. Nevertheless, given

the global stability of UC, the system is guaranteed to converge

to an equilibrium such that post-contingency flows are below

line limits, i.e., successive failures are prevented.

The objective (6a) should be interpreted as the penalty for the control actions $d$. The optimal solution $d^*$ is exactly the adjustment of power injections at the equilibrium under the unified frequency control in response to a failure. Therefore, UC manages to mitigate the failure optimally in terms of minimizing the injection adjustments. We remark that the control actions can be further separated into generation adjustments $d^g$ and load adjustments $d^l$, and the objective (6a) can be designed to implement different priorities of generation/load adjustments. For instance, if the objective (6a) is modified to $\sum_{j \in \mathcal{N}} \left( c_j^g (d_j^g)^2 + c_j^l (d_j^l)^2 \right)$, then smaller $c_j^g$'s can be used to prefer generation adjustments over load adjustments.

B. Guaranteed Localization Performance

The proposed approach is guaranteed to terminate failures

with only local impact at equilibrium if the failure is not

severe. The tree structure plays a crucial role in this aspect. To

demonstrate our localization result, we clarify the meaning of

“local impact” through the notion of associated areas. We say an area $\mathcal{N}_k$ is associated with failures $F$ if there exists an edge $e = (i,j) \in F$ such that either $i \in \mathcal{N}_k$ or $j \in \mathcal{N}_k$. Moreover,

we only study the localization when the system converges to an

equilibrium, while the non-local response during the transient

stage is ignored.

The following theorem characterizes the localization per-

formance of our proposed approach: after a non-severe failure

the system converges to an equilibrium where the failure is

mitigated with unchanged injections for non-associated areas.

This guarantee extends the results in [13] in two ways: (i) both bridge and non-bridge failures are localized within associated

areas; (ii) failures are terminated so that successive failures

with potential global impact are prevented.

Theorem IV.1. Assume the control areas are tree-connected. Given a set $F$ of failures, if (6) is feasible, then $d_i^* = 0$ for all $i \in \mathcal{N}_k$ whenever area $\mathcal{N}_k$ is not associated with $F$.

This strong localization guarantee stems from the combi-

nation of tree-connected control areas and unified frequency

control. Without the tree structure, UC would only guarantee

the overall power imbalance is restored in each control area,

while the individual operating conditions may already vary.

C. Other Benefits

Another appealing feature of our proposed control is that it

can be implemented in an autonomous and distributed fashion,

allowing for real-time response to failures. The problem of

how to optimally shed load to mitigate failures has already

been considered in the literature e.g. [8]. The authors solve

a similar optimization problem but using a central controller

that requires global information and failure information. In

contrast, our approach operates as a closed-loop distributed

controller and drives the system to a desired equilibrium

autonomously.

Our method also has an economic benefit. In practice, $N-1$ preventive SCOPF is solved to ensure system reliability. Our approach, however, can serve as a corrective method to mitigate non-severe failures. Thus SCOPF only needs to consider severe failures, potentially leading to significant savings.

V. AN ILLUSTRATIVE EXAMPLE

In this section, we illustrate the dynamic response of our ap-

proach for the IEEE 39-bus network with parameters adapted

from [18]. It consists of two control areas, which are connected by three tie lines, one of which is $(26, 27)$.

A bridge is a cut edge whose removal disconnects the network.

Figure 1. IEEE 39-bus network with two control areas from [18]. The two blue tie lines are switched off to create a tree structure with the remaining tie line as bridge.

We switch off two of the three tie lines, including $(26, 27)$, chosen heuristically as those with the smallest absolute line flow. Two control areas are then connected in a tree structure as shown in Figure 1. We implement the unified controller on all nodes as follows. We only allow generation adjustments at first, but, if a severe failure is detected, the controller is then allowed to reduce loads. As a last resort, zero area control error can be lifted.

lifted. The threshold is set to

pu for dual variables.

As illustrated in Figure 2, in the case of the non-severe failure $(4, 14)$, the dual variables are always below the threshold and the system quickly converges to a safe equilibrium. On the other hand, the severe failure leads to unstable oscillations of dual variables and a severe-failure warning is raised, as depicted in Figure 3. The controller is then allowed to shed loads, an action that quickly re-stabilizes the system. Note that the flow on line $(25, 26)$ remains unchanged at steady state for

both failures, as it belongs to a non-associated control area.

[Figure 2, panel (a): Dual variable dynamics; dual variable (pu) versus time. Panel (b): Line flow dynamics; flow deviation (pu) versus time for lines (4,5), (6,7) and (25,26).]

Figure 2. System dynamics after the non-severe failure of line $(4, 14)$. The controller is only allowed to reduce generations.

VI. SIMULATIONS

A. Simulation Setup

We evaluate our integrated approach on the IEEE 118-bus,

179-bus, 200-bus and 240-bus networks. For each network, we

first partition it into two control areas using network modu-

larity clustering algorithm as proposed in [22]. The tie line

with largest absolute flow is selected as the remaining bridge,

Time

-0.06

-0.04

-0.02

0.02

0.04

Dual Variable (pu)

Severe failure detected!

(a) Dual variable dynamics

Time

-2

-1

Flow Deviation (pu)

(4,5)

(4,14)

(25,26)

(b) Line flow dynamics

Figure 3. System dynamics after the severe failure of line

. A warning

is raised at time

sec, at which point the controller is allowed to shed loads.

Table I
STATISTICS ON IEEE TEST NETWORKS

Network   # of Edges   Control Area Sizes   # of Tie Lines
118       186          (83, 35)
179       263          (98, 81)
200       242          (103, 94)
240       448          (143, 97)

while all other tie lines are switched off. More information on these test networks is provided in Table I.
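The bridge selection heuristic described in this subsection can be sketched in a few lines. The snippet is illustrative only: the function name `choose_bridge`, the dictionary representation of tie-line flows, and the numerical values are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Line = Tuple[int, int]


def choose_bridge(tie_flows: Dict[Line, float]) -> Tuple[Line, List[Line]]:
    """Keep the tie line with the largest absolute flow as the bridge
    and return it together with the tie lines to switch off."""
    if not tie_flows:
        raise ValueError("at least one tie line is required")
    bridge = max(tie_flows, key=lambda line: abs(tie_flows[line]))
    switched_off = [line for line in tie_flows if line != bridge]
    return bridge, switched_off


# Hypothetical tie-line flows (pu) between two control areas.
flows = {(1, 5): -1.2, (2, 6): 0.3, (3, 7): 0.1}
bridge, off = choose_bridge(flows)
print(bridge)  # (1, 5)
print(off)     # [(2, 6), (3, 7)]
```

Keeping the most heavily loaded tie line as the bridge means that the switched-off lines are those carrying the least flow, so the nominal flows are perturbed as little as possible by the switching.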

The simulated scenarios are created as follows. For each

network, we first adopt the nominal load profile from [23] and

obtain the generation profile through a DC OPF on the original

network. The power injections from this DC OPF problem are

used for both the original network and the network after line

switching, while the resulting nominal line flows will generally

be different. Next, we iterate over every transmission line as

the initial single-line failure (tie lines that have been switched

off are ignored) and simulate the cascading process under four

control approaches:

1) Unified controller on tree-connected areas;

2) Unified controller on mesh-connected areas;

3) Automatic generation control (AGC)

on tree-connected

areas;

4) Automatic generation control on mesh-connected areas.

For simplicity of presentation, we refer to these approaches as

UC+T, UC+M, AGC+T and AGC+M. Moreover, to test their

performance under various network congestion conditions, we

scale the line flow limit and generation limit by a common factor

= 0

. For a fair comparison, all strategies follow

the same relaxation procedure as described in the previous

section when the optimization is infeasible. We remark that

since AGC does not enforce line limits, the cascading failure

may proceed for multiple stages as described in Section II,

while UC always terminates the failure at the first stage.
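The way the simulated scenarios are assembled can be sketched as a Cartesian product over scaling factors, initial failures and control approaches. This is a hypothetical sketch: the function name `enumerate_scenarios`, the tuple layout and the scaling factors in the usage lines are assumptions and do not reproduce the values used in the experiments.

```python
from itertools import product
from typing import Iterable, Iterator, List, Tuple

Line = Tuple[int, int]
APPROACHES = ("UC+T", "UC+M", "AGC+T", "AGC+M")


def enumerate_scenarios(
    lines: Iterable[Line],
    switched_off: Iterable[Line],
    factors: Iterable[float],
) -> Iterator[Tuple[float, Line, str]]:
    """Return an iterator over (scaling factor, failed line, approach).

    Tie lines that were switched off are skipped as initial failures,
    so every approach is evaluated on the same set of failures.
    """
    off = set(switched_off)
    candidates: List[Line] = [line for line in lines if line not in off]
    return product(factors, candidates, APPROACHES)


# Hypothetical 4-line network with one switched-off tie line.
lines = [(1, 2), (2, 3), (3, 4), (1, 4)]
scenarios = list(enumerate_scenarios(lines, [(1, 4)], [1.0, 0.5]))
print(len(scenarios))  # 2 factors x 3 lines x 4 approaches = 24
print(scenarios[0])    # (1.0, (1, 2), 'UC+T')
```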

We quantify the mitigation performance using the load loss rate (LLR), defined as the ratio of the total loss of load to the total demand. The localization performance is evaluated by the adjusted generator rate (AGR), which is the ratio of the number of generators whose operating points have changed after the failures to the total number of generators. The range

We model AGC as an algorithm that solves (6) without flow limits (6e).

21st Power Systems Computation Conference

PSCC 2020

Porto, Portugal — June 29 – July 3, 2020

for both LLR and AGR is [0%, 100%], and a smaller

value indicates better mitigation/localization performance. The

complete results are summarized in Tables III and IV for

reference. In the following subsections, we highlight several

insights from these results.
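The two metrics defined above can be computed as follows. This is a minimal sketch: the function names, the list-based inputs and the tolerance `tol` used to decide whether an operating point has changed are assumptions introduced for illustration.

```python
from typing import Sequence


def load_loss_rate(demand: Sequence[float], served: Sequence[float]) -> float:
    """LLR: total lost load divided by total demand, in percent."""
    total = sum(demand)
    lost = sum(d - s for d, s in zip(demand, served))
    return 100.0 * lost / total


def adjusted_generator_rate(
    before: Sequence[float], after: Sequence[float], tol: float = 1e-6
) -> float:
    """AGR: share of generators whose operating point changed, in percent."""
    changed = sum(1 for b, a in zip(before, after) if abs(a - b) > tol)
    return 100.0 * changed / len(before)


# Hypothetical pre- and post-failure values (pu).
demand = [2.0, 1.0, 1.0]
served = [2.0, 0.5, 1.0]
print(load_loss_rate(demand, served))  # 12.5

gen_before = [1.0, 2.0, 1.0, 0.0]
gen_after = [1.0, 1.5, 1.0, 0.0]
print(adjusted_generator_rate(gen_before, gen_after))  # 25.0
```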

B. Performance of Our Approach

We first evaluate the performance of our integrated approach

UC+T, and compare it with the traditional approach AGC+M.

Table II shows the fraction of failure scenarios with load

loss (LL) and adjusted generations (AG), as well as the

average LLR and AGR of those scenarios for these two control

approaches. Each cell contains three values, representing the

results for

= 0

respectively.
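The aggregate statistics reported in Table II (the fraction of scenarios with a nonzero rate, and the average rate over those scenarios only) can be sketched as below. The function name `summarize`, the tolerance `tol` and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def summarize(rates: Sequence[float], tol: float = 1e-9) -> Tuple[float, float]:
    """Return (percent of scenarios with a nonzero rate,
    average rate over those scenarios only)."""
    nonzero = [r for r in rates if r > tol]
    share = 100.0 * len(nonzero) / len(rates)
    average = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return share, average


# Hypothetical LLR values (%) for five failure scenarios.
llr = [0.0, 0.0, 4.0, 0.0, 2.0]
print(summarize(llr))  # (40.0, 3.0)
```

Note that the average is conditional on load loss (or generator adjustment) occurring, so a method can have more affected scenarios and still a lower average rate.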

In terms of failure mitigation, our proposal prevents load

shedding from happening in more failure scenarios over 118-

bus, 179-bus, and 240-bus networks under almost all network

congestion conditions. More importantly, our approach always

achieves higher grid reliability, in the sense that the average

load loss, for failures where load shedding is inevitable,

is uniformly lower compared with the traditional AGC+M

approach. This improvement is particularly significant when

the grid becomes congested (smaller

). The average LLR,

averaged over all failure scenarios with load loss, is 3% or

lower under UC+T but can be as high as 19% under AGC+M.

Moreover, large-scale outages with load loss larger than 10% almost never happen under our proposed UC+T approach.

In terms of failure localization, we observe that on 118-

bus, 179-bus and 240-bus networks, generators under UC+T

are adjusted for more failure scenarios when the network is

less congested, while achieving less load loss compared to the

traditional approach. When the congestion becomes worse, the

traditional AGC+M approach not only leads to more scenarios

with adjusted generators, but results in more load loss as

well. This fact indicates that generators under the UC+T approach

adjust their operations more actively in response to failures.

Moreover, generators react more efficiently and locally under

UC+T, as the average AGR is always lower.

It should be noted that there are more failure scenarios

with load loss or generator adjustment for our approach over

the 200-bus network. Nevertheless, the LLR and AGR are

still uniformly lower on average under UC+T. One possible

explanation might be that the 200-bus network is very sparse

(see Table I), with limited power transfer capacity across

control areas and, hence, more prone to potential failures. By

adjusting the generators and loads more proactively, UC+T

tries to make sure the post-contingency injections are more

robust in the new topology. This, again, confirms that UC+T

tends to adjust generators more actively to failures, and the

adjustment is prescribed in a more efficient and local manner.

C. Effects of the Unified Controller

The benefits of our approach stem from both the UC that enforces line limits at a fast timescale and the tree-connected topology

We highlight in bold font the approach with better performance (two values

may look the same after rounding). See Tables III and IV for more precise

results.

Table II
STATISTICS ON UC+T (FIRST ROW) AND AGC+M (SECOND ROW)

Network             118    179    200    240
% of Scen. w/ LL
% of Scen. w/ AG
Avg. LLR (%)
Avg. AGR (%)

that localizes the impact of initial failures. In this subsection

and the next one we disentangle the benefit of each.

We first demonstrate the load loss reduction capability

of UC by comparing UC-based and AGC-based approaches

(UC+T vs. AGC+T and UC+M vs. AGC+M). For each pair

of approaches, we compute the difference of LLR for every

failure scenario and plot the results in Figure 4. Note that the

failure index is re-ordered so that the difference of LLR is in

a non-decreasing order for better presentation.
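The per-scenario comparison behind Figure 4 can be sketched as follows. The sign convention (first approach minus second approach), the dictionary representation keyed by failure identifier and the sample values are assumptions made for this example.

```python
from typing import Dict, Hashable, List


def sorted_differences(
    first: Dict[Hashable, float], second: Dict[Hashable, float]
) -> List[float]:
    """Per-scenario difference first - second, in non-decreasing order.
    Only failures present in both result sets are compared."""
    common = first.keys() & second.keys()
    return sorted(first[k] - second[k] for k in common)


# Hypothetical LLR values (%) per failure for a UC-based and an
# AGC-based approach on the same topology.
uc = {"f1": 0.0, "f2": 1.0, "f3": 2.0}
agc = {"f1": 5.0, "f2": 1.0, "f3": 0.5}
print(sorted_differences(uc, agc))  # [-5.0, 0.0, 1.5]
```

With this convention, negative entries correspond to failures where the UC-based approach loses less load, and the re-ordered index on the horizontal axis is simply the position in the sorted list.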

The main insight from Figure 4 is that UC-based approaches

(whether the control areas are tree-connected or not) almost

always reduce load loss, and such reductions are more pro-

nounced as network congestion increases. We remark that

there are indeed failure scenarios with less load loss under

AGC-based approaches. When this happens, however, the

failures always cascade through multiple stages under AGC

and, hence, the time to re-stabilize the system is significantly

longer. This suggests that UC+T prioritizes a shorter system

stabilization time over load loss rate in the relatively rare

cases where quickly stopping the cascading process leads to

increased load loss.

D. Effects of the Tree Partitioning

We now demonstrate the failure localization capability of

tree partitioning by comparing tree-based and mesh-based ap-

proaches (namely UC+T vs. UC+M and AGC+T vs. AGC+M).

Similarly, for each pair of approaches, we plot in Figure 5 the difference of AGR for every scenario.

For the 118-bus, 179-bus and 240-bus networks, both UC+T and UC+M result in a similar number of failure scenarios with

load loss, while the UC+T approach achieves slightly lower

LLR on average, as shown in Tables III and IV. However, in

terms of the localization performance, tree-based approaches

tend to adjust fewer generators as shown in Figure 5. More

importantly, generators under UC+T always try to mitigate the

failure with only local adjustments. Generation adjustments in

the control areas without any failures almost never happen.

As in the previous subsection, for the IEEE 200-bus net-

work, UC+T results in more failure scenarios with load loss or

adjusted generations than UC+M, possibly due to the sparsity

of the original network. Nevertheless, the difference in terms

of average load loss is negligible.


[Figure 4 panels: (a) 118-bus network, (b) 179-bus network, (c) 200-bus network, (d) 240-bus network, (e) 118-bus network, (f) 179-bus network, (g) 200-bus network, (h) 240-bus network. Horizontal axis: Failure ID.]

Figure 4. Difference of LLR between UC-based and AGC-based approaches.

[Figure 5 panels: (a) 118-bus network, (b) 179-bus network, (c) 200-bus network, (d) 240-bus network, (e) 118-bus network, (f) 179-bus network, (g) 200-bus network, (h) 240-bus network. Horizontal axis: Failure ID.]

Figure 5. Difference of AGR between tree structure based and mesh structure based approaches.

VII. CONCLUSIONS

In this paper we propose an integrated approach to failure

mitigation and localization consisting of topology design and

real-time response. Both analytical results and case studies

validate the potential and capability of our proposed control.

There are several research directions for further exploration,

including: (i) extending the approach to non-linear systems and

dynamics, (ii) understanding the optimal trade-off between lo-

calization and redundancy, and (iii) managing generation/line

reserves to avoid severe failures.

REFERENCES

[1] M. Vaiman, K. Bell, Y. Chen, B. Chowdhury, I. Dobson, P. Hines, M. Papic, S. Miller, and P. Zhang, “Risk assessment of cascading outages: Methodologies and challenges,” IEEE Transactions on Power Systems, vol. 27, no. 2, pp. 631–641, May 2012.

[2] C. D. Brummitt, R. M. D'Souza, and E. A. Leicht, “Suppressing cascades of load in interdependent networks,” Proceedings of the National Academy of Sciences, vol. 109, no. 12, pp. E680–E689, 2012.

[3] Z. Kong and E. M. Yeh, “Resilience to degree-dependent and cascading node failures in random geometric networks,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5533–5546, Nov. 2010.

[4] P. Crucitti, V. Latora, and M. Marchiori, “A topological analysis of the Italian electric power grid,” Physica A: Statistical Mechanics and its Applications, vol. 338, no. 1-2, pp. 92–97, 2004.

[5] P. Hines, I. Dobson, and P. Rezaei, “Cascading power outages propagate locally in an influence graph that is not the actual grid topology,” IEEE Transactions on Power Systems, vol. 32, no. 2, pp. 958–967, 2017.
