Quantifying Near-Threshold CMOS Circuit
Robustness
Sean Keller, Siddharth S. Bhargav, Chris Moore, Alain J. Martin
Department of Computer Science
California Institute of Technology
Pasadena, CA 91125, USA
{sean,cc,alain}@async.caltech.edu
Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089, USA
ssbharga@usc.edu
Abstract: In order to build energy-efficient digital CMOS
circuits, the supply voltage must be reduced to near-threshold.
Problematically, due to random parameter variation, supply
scaling reduces circuit robustness to noise. Moreover, the effects
of parameter variation worsen as device dimensions diminish,
further reducing robustness, and making parameter variation
one of the most significant hurdles to continued CMOS scaling.
This paper presents a new metric to quantify circuit robustness
with respect to variation and noise along with an efficient method
of calculation. The method relies on the statistical analysis of
standard cells and memories, resulting in an extremely compact
representation of robustness data. With this metric and method of
calculation, circuit robustness can be included alongside energy,
delay, and area during circuit design and optimization.
I. INTRODUCTION
It is difficult to design efficient and robust modern binary
digital systems; the sheer complexity of utilizing upwards of
a billion devices [1] necessitates the use of numerous levels
of logical abstraction throughout the design flow. Errors intro-
duced at different levels of abstraction can result in circuits
that fail to function as expected for a number of reasons (e.g.,
timing, design, and functional failures) [2]. Understanding and
quantifying these different modes of failure is important, but
failures in the base digital assumption supersede all other
failures. If a gate cannot switch between logic values, then
it cannot perform computation, and assuring correctness with
respect to, e.g., timing, is moot. Functional failures of this sort
can be further divided into many classes [3]; the focus of this
paper is on active device parametric failures [4], i.e., failures
caused by one of the most significant hurdles for the future of
CMOS scaling [5]: parameter variation.
Parameter variation is caused by stochastic process variation
and intrinsic parameter fluctuations (IPF); it is the primary
reason why modern digital circuits that function at the process
nominal supply voltage ($V_{DD}$) eventually fail as the supply
is lowered [6]. More importantly, parameter variation makes
functional digital circuits less robust and hence less reliable
[6]–[14]. This reduction in robustness may be of little consequence
at the process nominal $V_{DD}$, but, as $V_{DD}$ is lowered,
it becomes a critical design concern. Problematically, in order
to minimize the power consumption and energy demands of
modern digital CMOS circuits, the supply voltage must be
scaled sub-threshold or near-threshold [2], [8], [15]–[20]. As
such, in order to build reliable low-power digital systems,
it is essential to quantify circuit robustness as a function of
parameter variation, which is the primary goal of this paper.
The prevailing trend is to perform a simple statistical
analysis of worst-case gates and to choose a minimum $V_{DD}$
above which most (or many) gates are likely to function
despite parameter variation [6]. The problem with this type
of analysis is that it may not be sufficient in real circuits due
to the presence of electrical noise. Noise can be mitigated but
is fundamentally unavoidable and has proven to be a limiting
effect in engineering digital systems for decades [21]. This
paper proposes a metric and method with which to quantify
circuit robustness in terms of parameter variation with respect
to noise. Moreover, the method presented is efficient and
scalable. The computationally expensive component is limited
to a small set of cells that make up modern standard cell
libraries and memories, and the calculation of robustness cost
is linear in the number of instances of these cells (typically in
the range of millions to billions).
The remainder of this paper is organized as follows. Section
II reviews background material on parameter variation and
circuit noise analysis. Section III introduces the notion of
circuit robustness and static noise margins. Section IV details
the method for calculating robustness for inverters, and Section
V extends the method to a larger set of CMOS gates. Section
VI discusses related works, and finally, Section VII concludes
the paper and discusses potential future research.
II. BACKGROUND
A. Parameter Variation
In modern CMOS technologies, device parameters such as
channel length, oxide thickness, dopant concentration, etc.
can have significant deviations from their nominal values due
to process-induced and intrinsic parameter fluctuations [22].
Process variability can be considered a global, predictable,
and gradual skew in device characteristics introduced by the
complexity of manufacturing chips [23] (e.g., from thermal
gradients during fabrication [24]). Intrinsic parameter fluctuations
are truly statistical in nature and cause significant
deviations from device to device within a chip. Intrinsic
variations can be attributed to atomistic effects (e.g., random
dopant fluctuation (RDF)) and device structure variations (e.g.,
line edge roughness (LER)) [22], [23], [25]. There are a
number of different ways to characterize and partition these
effects, and the approach used in this paper is to consider a
global component wherein all devices on a chip are affected
in the same way, and a local component wherein each device
on a chip has a number of statistical parameters drawn from
distributions with mean values set by the global skew. This
style of partitioning variation is not as accurate as a full
combined statistical model, but it is a good, albeit slightly
pessimistic, approximation [23].
Considering variation in terms of a global and a local com-
ponent simplifies statistical analysis and still permits the circuit
designer to choose, for example, a worst-case $3\sigma$ global corner,
wherein the die that fall outside of this range are assumed
not to yield and should not be optimized for. For circuits
operating subthreshold, the local component of variation is
dominated by RDF and is accurately modeled by normally
distributed, uncorrelated device threshold voltage ($V_t$) variation [26].
Near-threshold, local variation does exhibit some degree of
spatial correlation, and at the process-nominal $V_{DD}$, spatial
correlation is significant and cannot be ignored. This increase
in the spatial correlation of local variation as a function of
$V_{DD}$ can be attributed to the fact that channel-length variation
has little effect on devices operating subthreshold but becomes
the dominant effect at approximately twice the threshold
voltage [26]. Channel length variation is spatially correlated
between devices within some radius, and is straightforward
to model [23], [26], [27]. Given that the focus of this paper
is to quantify the robustness of low-power subthreshold and
near-threshold circuits, local parameter variation is treated as
random and uncorrelated; however, the effects of spatial corre-
lation can be included. Furthermore, SPICE simulations, along
with foundry-provided statistically-extracted BSIM4 models,
are used throughout this paper as a basis for correctness;
these models are considered accurate over the entire device
operating range [28].
B. Circuit Noise
Circuit noise can be partitioned into a physical component
(e.g., thermal noise) and a man-made digital switching
component [21]. The dominant sources of physical noise in
modern CMOS (which have significant impact on RF CMOS
circuits) are $1/f$ noise and thermal noise [29]. Switching
noise is caused by the rapid full-rail voltage swings typical in
digital systems, and includes cross-talk (due to capacitive and
inductive coupling), charge sharing, supply-rail and ground
noise, and substrate noise. These switching-noise sources dom-
inate physical noise by several orders of magnitude in digital
circuits, and they must be accounted for in the design margins
in order to build robust digital systems (even in the absence
of appreciable parameter variation) [30]. Accurate modeling of
each switching-noise source is possible, but highly impractical
for the simulation and analysis of large circuits (millions or bil-
lions of devices). It is, however, possible to lump all switching-
noise sources together into equivalent series voltage sources
between gates [30]. These noise voltage sources are most
accurately modeled as time-varying (i.e., AC) sources [31],
but using a static DC voltage is an acceptable approximation
[21].
C. Static DC Analysis
Logic-gates in modern technologies exhibit a number of
frequency-dependent effects, and incorporating these effects
greatly increases the complexity of analysis. Fortunately, static
DC analysis has proven to be an excellent basis for a wide
range of digital circuit characterizations. The first works to dis-
cuss the requirements for functional digital circuits [32]–[34]
exclusively perform DC analysis. Numerous modern works,
e.g., [14], [35], [36], also rely on the DC analysis of digital
circuits, because in the context of determining functionality,
noise resilience, and reliability, it is representative. Moreover,
as discussed in Section I, timing failures (which
cannot be quantified with DC analysis alone) fall outside of
the scope of this work. In this paper static DC conditions are
assumed throughout, and the corresponding canonical method
of analysis, voltage transfer characteristics (VTCs)—the static
output voltage of a gate as a function of input voltage—are
used extensively.
III. DEFINING CIRCUIT ROBUSTNESS
Parameter variation and noise have a significant impact on
circuit robustness, and the primary goal of this paper is to
quantify this impact. To that end, it is necessary to define the
notion of robustness with the intuition that increasing parameter
variation tends to reduce robustness to noise. Consider two
circuits, $C_1$ and $C_2$, operating at the same supply voltage;
$C_1$ is more robust than $C_2$ if and only if $C_1$ can tolerate more
noise than $C_2$. That is, as the circuit noise increases, $C_2$ fails
to function before $C_1$. With statistical parameter variation, the
notion of failure naturally becomes a probability. Robustness
can be defined such that $C_1$ is more robust than $C_2$ if and only
if, for the same quantity of noise in both circuits, the probability
that $C_1$ fails is less than the probability that $C_2$ fails.
As discussed in Section I, the failures of interest are active
device parametric failures, wherein a gate or memory erro-
neously changes state (between binary digital values) because
of parameter variation. Circuit noise acts to make these failures
more likely, and robust circuits need to function correctly
despite parameter variation and switching noise. In order to
quantify functional failures due to variation and noise it is nec-
essary to define what it means for a gate or memory to change
state. Toward this, consider the base digital assumption: the
abstraction of networks of transistors as logic-gates, and logic-gates
as Boolean functions over Boolean logic-values. This
abstraction relies on the definition of a mapping between
logic-values and a physical quantity: the electrical potential
of charge stored on capacitive gate nodes. In the simplest
mapping, nodes near the supply rail potential, $V_{DD}$, represent
a logic-1, and nodes near GND represent a logic-0; however,
it is surprisingly difficult to define "near". That is, it is difficult
to give an exact (necessary and sufficient) mapping between
node voltages and logic values for an arbitrary network of
logic-gates, because each logic-gate interprets input voltages
differently.
In a real CMOS circuit, no two gates are identical. They
differ in function, topology, and sizing; and distinct instances
of the same gate differ because of parameter variation. Consider
an inverter; if a 0 is applied to its input, then a 1 is
produced on its output. Similarly, a 1 at the input results in a
0 at the output. The problem is that it is possible, by way of
intentional construction or parameter variation, to have two
distinct inverters, $\mathrm{INV}_1$ and $\mathrm{INV}_2$, that behave differently.
Suppose that for input voltages near $V_{DD}$ or GND, $\mathrm{INV}_1$
and $\mathrm{INV}_2$ behave logically identically and correctly (i.e., they
invert), but for some input voltage, $V_X$, between $V_{DD}$ and
GND, $\mathrm{INV}_1$ produces a 0 on its output and $\mathrm{INV}_2$ produces a
1. In this situation, $\mathrm{INV}_1$ and $\mathrm{INV}_2$ interpret $V_X$ differently.
The situation is further complicated when the notion of the
output voltage level is considered. That is, the output of $\mathrm{INV}_1$
is really only a 0 when a subsequent gate interprets it as such,
and so on down a chain of gates.
Since different gates have different interpretations of input
voltages, the exact mapping between voltage levels and logic
values needs to be defined in terms of this interpretation (as
opposed to using a global bound). That is, suppose that worst-case
boundaries on voltages are defined by $V_H$ and $V_L$, where
it is known that all gates in a circuit interpret voltages above
$V_H$ as a 1 and all voltages below $V_L$ as a 0; then the mapping
of $V(G) > V_H \Rightarrow 1$ and $V(G) < V_L \Rightarrow 0$ is sufficient
for some notion of correct operation, but it is not necessary.
This distinction is important, because this sort of worst-case
definition is simple but not practical for the analysis of modern
low-voltage circuits.
Consider an example that demonstrates the trouble with
using the worst-case definitions for $V_H$ and $V_L$ in low-voltage
applications. Figure 1 depicts the VTCs for 100 instances of
a minimum-size inverter in a modern 40-nm low-power bulk
CMOS process with $V_{DD} = 200$ mV; the curves vary significantly
due to random parameter variation. These VTCs have
remarkably similar shapes and are nearly identical modulo
horizontal translation. As such, it is reasonable to consider
defining $V_H = 180$ mV and $V_L = 20$ mV as worst-case
output high and low voltages, respectively (these boundaries
are also depicted by blue and red lines, respectively, in Figure
1). The problem with this worst-case output mapping is that
the corresponding input voltages that yield a logical-1 on the
output then range from 25 mV to 150 mV; similarly, the input
voltages that yield a logical-0 on the output range from 65 mV
to 195 mV. These ranges overlap, so a worst-case mapping of
input voltages to logic values cannot be defined (the nonsensical
worst-case mapping would be $V(G) > 65\,\mathrm{mV} \Rightarrow 1$ and
$V(G) < 150\,\mathrm{mV} \Rightarrow 0$).
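The overlap argument can be checked mechanically. The following is a minimal Python sketch that uses only the two input ranges quoted in the text; the helper name `intervals_overlap` and the variable names are illustrative and do not come from the paper.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) intersect."""
    return max(a[0], b[0]) <= min(a[1], b[1])

# Input ranges (in mV) quoted in the text for the 100 Monte Carlo inverters.
inputs_giving_logic1_out = (25, 150)  # inputs that yield a logical-1 output
inputs_giving_logic0_out = (65, 195)  # inputs that yield a logical-0 output

overlap = intervals_overlap(inputs_giving_logic1_out, inputs_giving_logic0_out)

# Width of the region in which different instances disagree about the input.
overlap_width_mv = (min(inputs_giving_logic1_out[1], inputs_giving_logic0_out[1])
                    - max(inputs_giving_logic1_out[0], inputs_giving_logic0_out[0]))

print(overlap, overlap_width_mv)
```

Any input inside the shared region is read as a 0 by some instances and as a 1 by others, which is why a single global pair of thresholds cannot be chosen at this supply voltage.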
Fig. 1: Voltage transfer characteristics for 100 Monte Carlo
trials of a minimum-size inverter in a commercial 40-nm low-power
CMOS process utilizing foundry-provided statistical
models for local random parameter variation at the TT global
corner ($V_{DD} = 200$ mV at 25 °C, TT-Corner). Axes: $V_{in}$ (mV)
versus $V_{out}$ (mV), each from 0 to 200 mV; legend: VTC, $V_H$, $V_L$.
A. Static Noise Margin
Fig. 2: Cross-coupled inverter pair ($\mathrm{INV}_a$, $\mathrm{INV}_b$) and DC noise voltage
sources ($V_{noise}$).
A better approach to defining a local notion of interpretation
stems from static noise margin (SNM) analysis. The static
noise margin of cross-coupled inverters was first presented in
[33], [34] and later clarified in [37] and [38]. Consider Figure
2; the SNM of this cross-coupled pair represents the largest
DC noise voltage, $V_{noise}$, that can be applied between the
bistable pair before the inverters switch state (between logic-0
and logic-1). If the SNM of a cross-coupled pair is less than or
equal to zero (e.g., due to parametric variation), then the pair
is not bistable; i.e., it is unable to hold two distinct logic states
(a functional failure). If the SNM of the pair is infinitesimally
greater than zero, then the cell can hold two distinct logic
states, but a diminutive noise can act to switch these states, so
the cell is not robust. Given that noise is always present, all
cross-coupled pairs of inverters in a digital system must have
static noise margins in excess of the system noise in order to
maintain state.¹
Fig. 3: Voltage transfer characteristic for a minimum-size
inverter in a commercial 40-nm low-power CMOS process
($V_{DD} = 1.1$ V at 25 °C). The unity gain points, $(V_{IL}, V_{OH})$ and
$(V_{IH}, V_{OL})$, are used to define the VTC parameters:
$V_{OH}$, $V_{OL}$, $V_{IH}$, $V_{IL}$. Axes: $V_{in}$ (V) versus $V_{out}$ (V), each
from 0.0 to 1.1 V; legend: VTC, Unity Gain Points.
There are several mathematically equivalent methods used
to measure static noise margins [37]. One such method
involves analyzing the unity gain points ($|dV_{out}/dV_{in}| = 1$) of
the voltage transfer characteristic. Consider $\mathrm{INV}_a$ ($\mathrm{INV}_b$)
from Figure 2: a static CMOS inverter consisting of a single
NFET and PFET, with the VTC depicted in Figure 3.
Both the functionality of the inverter and the definition of
SNM rely on two properties of the VTC holding: (1) two
unity gain points exist and (2) the slope between the unity
gain points exceeds unity in absolute value [35]. From these
unity gain points, four properties of an inverter VTC can be
defined: $V_{OH}$, $V_{OL}$, $V_{IH}$, $V_{IL}$, as in Figure 3 (see [38] for
details). (These four points are referred to as VTC parameters
throughout.) The VTC parameters serve to demarcate definable
boundaries between the voltages that are interpreted as a logic-1
or logic-0, and the undefined region of high gain in between.
That is, $V_{IH}$ can be considered the lowest voltage that the
inverter correctly interprets as a 1 and $V_{IL}$ as the highest
voltage that it correctly interprets as a 0. Similarly, $V_{OH}$ can
be considered the lowest voltage that the inverter will output
as a 1, and $V_{OL}$ the highest voltage that the inverter will output
as a 0.
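As an illustration of the unity-gain construction, the sketch below extracts the four VTC parameters from a sampled transfer curve. It is not the procedure used in the paper, which relies on SPICE simulation with foundry models; the tanh-shaped `vtc` function, its parameters, and the helper `unity_gain_points` are all assumptions made here so that the example is self-contained.

```python
import math

def vtc(vin, vdd=1.1, vm=0.55, k=10.0):
    """Hypothetical smooth inverter VTC (tanh model), for illustration only."""
    return 0.5 * vdd * (1.0 - math.tanh(k * (vin - vm)))

def unity_gain_points(f, v_lo, v_hi, n=2001):
    """Locate inputs where |dVout/dVin| crosses 1 on a sampled VTC."""
    h = (v_hi - v_lo) / (n - 1)
    xs = [v_lo + i * h for i in range(n)]
    # Central-difference estimate of |slope| at every sample point.
    gain = [abs(f(x + h) - f(x - h)) / (2 * h) for x in xs]
    points = []
    for i in range(n - 1):
        g0, g1 = gain[i] - 1.0, gain[i + 1] - 1.0
        if g0 == 0.0 or g0 * g1 < 0.0:
            # Linear interpolation of the crossing between adjacent samples.
            t = 0.0 if g0 == 0.0 else g0 / (g0 - g1)
            points.append(xs[i] + t * h)
    return points

# The model VTC has exactly two unity gain points, so unpacking is safe here.
v_il, v_ih = unity_gain_points(vtc, 0.0, 1.1)
v_oh, v_ol = vtc(v_il), vtc(v_ih)
print(f"V_IL={v_il:.3f} V_IH={v_ih:.3f} V_OH={v_oh:.3f} V_OL={v_ol:.3f}")
```

A VTC for which fewer than two crossings are found violates property (1) above, so a real extraction routine would need to treat that case as a functional failure rather than unpack two values unconditionally.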
¹ In real memories, e.g., SRAM arrays, the SNM during both reading and
writing of cells needs to be considered [36]. Furthermore, ensuring an SNM
greater than zero is necessary, but it may not be sufficient for ensuring read
stability and write-ability [11].

In general, when one gate drives another gate, a static noise
margin can be defined. This static noise margin can be broken
into two components: a noise margin high ($\mathrm{NM}_H$) and a noise
margin low ($\mathrm{NM}_L$) (one for each logic value). Consider a pair
of inverters, with $\mathrm{INV}_x$ driving $\mathrm{INV}_y$. The two components
of the corresponding noise margin are defined as
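$$\mathrm{NM}_H(\mathrm{INV}_x, \mathrm{INV}_y) = V_{OH}(\mathrm{INV}_x) - V_{IH}(\mathrm{INV}_y), \qquad (1)$$
and
$$\mathrm{NM}_L(\mathrm{INV}_x, \mathrm{INV}_y) = V_{IL}(\mathrm{INV}_y) - V_{OL}(\mathrm{INV}_x). \qquad (2)$$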
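The static noise margin is defined as the smaller of $\mathrm{NM}_H$ and $\mathrm{NM}_L$:
$$\mathrm{SNM}(\mathrm{INV}_x, \mathrm{INV}_y) = \min\left(\mathrm{NM}_L(\mathrm{INV}_x, \mathrm{INV}_y),\ \mathrm{NM}_H(\mathrm{INV}_x, \mathrm{INV}_y)\right). \qquad (3)$$
These relations are implicit functions of $V_{DD}$.²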
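Equations 1 to 3 translate directly into code. The following Python sketch is one possible encoding; the `VTCParams` container and the numeric values are hypothetical and are chosen only to show that the margin depends on which inverter drives which.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VTCParams:
    """VTC parameters of one inverter, in volts."""
    v_oh: float
    v_ol: float
    v_ih: float
    v_il: float

def nm_high(driver, receiver):
    """Equation 1: NM_H = V_OH(driver) - V_IH(receiver)."""
    return driver.v_oh - receiver.v_ih

def nm_low(driver, receiver):
    """Equation 2: NM_L = V_IL(receiver) - V_OL(driver)."""
    return receiver.v_il - driver.v_ol

def snm(driver, receiver):
    """Equation 3: SNM is the smaller of NM_L and NM_H."""
    return min(nm_low(driver, receiver), nm_high(driver, receiver))

# Hypothetical values (not taken from the paper) for two mismatched inverters.
inv_x = VTCParams(v_oh=0.190, v_ol=0.010, v_ih=0.120, v_il=0.070)
inv_y = VTCParams(v_oh=0.188, v_ol=0.012, v_ih=0.135, v_il=0.085)

print(snm(inv_x, inv_y), snm(inv_y, inv_x))
```

Because the output parameters come from the driver and the input parameters from the receiver, `snm(inv_x, inv_y)` and `snm(inv_y, inv_x)` generally differ when the two inverters are mismatched.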
For cross-coupled inverters, as in Figure 2, $\mathrm{INV}_a$ drives
$\mathrm{INV}_b$, and $\mathrm{INV}_b$ drives $\mathrm{INV}_a$, so two different static
noise margins can be defined, $\mathrm{SNM}(\mathrm{INV}_a, \mathrm{INV}_b)$ and
$\mathrm{SNM}(\mathrm{INV}_b, \mathrm{INV}_a)$. With a few assumptions about the
VTCs,³ the condition that
$\mathrm{SNM}(\mathrm{INV}_a, \mathrm{INV}_b) > V_{noise} \land \mathrm{SNM}(\mathrm{INV}_b, \mathrm{INV}_a) > V_{noise}$
is a necessary and sufficient
condition for differentiation of binary logic-values by way of
the electrical potential stored on the output of each inverter
[33], [34], [37]. The static noise margin of cross-coupled
inverters plays an important role in quantifying circuit robustness,
but the notion must be extended to incorporate parametric
variability and generalized in order to apply it to arbitrary
networks of gates.
B. Statistical Robustness
This section defines a robustness metric for cross-coupled
inverters that includes parameter variation and noise by way
of a statistical noise margin constraint. When considering two
different circuits, $C_1$ and $C_2$, operating with the same supply
voltage, $C_1$ is more robust than $C_2$ if and only if, for the same
quantity of noise in both circuits, the probability that $C_1$ fails is
less than the probability that $C_2$ fails. That is, for two different
circuits $C_1$ and $C_2$,
$$\mathrm{ROB}(C_1) > \mathrm{ROB}(C_2) \iff P(\mathrm{FAIL}(C_1)) < P(\mathrm{FAIL}(C_2)), \qquad (4)$$
where ROB corresponds to circuit robustness and FAIL to
circuit failure.
Switching noise in digital circuits can be estimated with
known methods [21], [30], and, as with other common metrics,
e.g., power and cycle time, it can be reduced and optimized for
(typically at some cost; e.g., spreading wires reduces coupling
noise at the expense of area). As such, the circuit designer can
choose a noise margin target, $\mathrm{NM}_T$: a minimum noise margin
constraint for all gates.⁴ If any gate has a noise margin less
than or equal to $\mathrm{NM}_T$, then the gate is said to fail, as is
the entire circuit containing the failing gate. Consider a cross-coupled
inverter pair, $\mathrm{INV}_a$ and $\mathrm{INV}_b$ (as in Figure 2 with
$V_{noise} = 0$ V), operating at a particular $V_{DD}$. The probability
of failure for a pair can then be defined such that
$$P(\mathrm{FAIL}(\mathrm{INV}_a, \mathrm{NM}_T) \lor \mathrm{FAIL}(\mathrm{INV}_b, \mathrm{NM}_T)) = P(\mathrm{SNM}(\mathrm{INV}_a, \mathrm{INV}_b) \le \mathrm{NM}_T \lor \mathrm{SNM}(\mathrm{INV}_b, \mathrm{INV}_a) \le \mathrm{NM}_T). \qquad (5)$$

² Equations 1, 2, and 3 (and all dependent equations) are actually implicit
functions of all operating parameters, e.g., temperature, body potentials, etc.
³ The VTCs must be monotonic and have a single inflection point.
⁴ A unique noise margin target can be chosen for each gate (if desired). In
this way, noisy gates can be assigned larger targets than quiet gates.

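For a circuit, $C_a$, consisting of $n$ cross-coupled inverter pairs,
i.e., $C_a = (\mathrm{INV}_a^i, \mathrm{INV}_b^i)$ for $i \in \{1, 2, \ldots, n\}$,
$$P(\mathrm{FAIL}(C_a, \mathrm{NM}_T)) = P\left(\bigvee_{i \in \{1, 2, \ldots, n\}} \left(\mathrm{FAIL}(\mathrm{INV}_a^i, \mathrm{NM}_T) \lor \mathrm{FAIL}(\mathrm{INV}_b^i, \mathrm{NM}_T)\right)\right). \qquad (6)$$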
These two relations treat both the probability of failure and
SNM as random variables (RVs). In order to compute these
quantities, the corresponding distributions and the effects of
correlation are considered in Section IV. These two relations
are generalized for application to arbitrary networks of gates
in Section V.
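To make Equations 5 and 6 concrete, the sketch below estimates the pair failure probability by Monte Carlo sampling. It is an illustration under simplifying assumptions that are not the paper's method: $V_{IH}$ and $V_{IL}$ are drawn independently for each inverter from normal distributions, $V_{OH}$ and $V_{OL}$ are constants, pair failures are treated as independent when scaling to $n$ pairs, and every numeric value is hypothetical.

```python
import random

def pair_fails(rng, nm_t, v_oh, v_ol, mu_ih, mu_il, sigma):
    """Sample one cross-coupled pair and apply the failure test of Equation 5."""
    # Assumed model: each inverter draws its own V_IH and V_IL independently.
    ih_a, il_a = rng.gauss(mu_ih, sigma), rng.gauss(mu_il, sigma)
    ih_b, il_b = rng.gauss(mu_ih, sigma), rng.gauss(mu_il, sigma)
    snm_ab = min(il_b - v_ol, v_oh - ih_b)  # INV_a drives INV_b
    snm_ba = min(il_a - v_ol, v_oh - ih_a)  # INV_b drives INV_a
    return snm_ab <= nm_t or snm_ba <= nm_t

def p_fail_pair(nm_t, trials=20000, seed=0, **params):
    """Monte Carlo estimate of the pair failure probability."""
    rng = random.Random(seed)
    return sum(pair_fails(rng, nm_t, **params) for _ in range(trials)) / trials

def p_fail_circuit(p_pair, n):
    """Failure probability of n pairs, assuming independent pair failures."""
    return 1.0 - (1.0 - p_pair) ** n

# Hypothetical parameters in volts (not taken from the paper).
params = dict(v_oh=0.190, v_ol=0.010, mu_ih=0.125, mu_il=0.075, sigma=0.015)
p_small_target = p_fail_pair(0.010, **params)
p_large_target = p_fail_pair(0.030, **params)
print(p_small_target, p_large_target, p_fail_circuit(p_small_target, 1000))
```

Raising the noise margin target can only increase the estimated failure probability, and even a small per-pair probability compounds quickly over many pairs, which is the motivation for treating robustness as a first-class design metric.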
IV. CALCULATING ROBUSTNESS
One of the goals of this paper is to define a method for
calculating robustness in such a way that it can be feasibly
computed for large circuits (billions of gates), and which also
fits in with the most prevalent method of system design, i.e.,
standard-cell hierarchical digital circuit design. This necessitates
the construction of a new compact model for statistical
robustness with parameters that can be stored alongside timing
and energy data in standard cell libraries. Moreover, the model
must be defined such that the compact data is composable; i.e.,
the robustness of an arbitrary network of standard cells must
be computable by the composition of robustness data from
member cells. In this way, the robustness of a large circuit
(built out of standard cells) can be readily calculated.
A. Statistical VTC Parameters
Device parameter variation results in variation in the static
noise margins of gates; the precise relationship depends on
the type of parameter variation and the device operating
regime (for subthreshold, see [14], [36]; for above threshold,
see [39], [40]). The variation in SNM can be analyzed
in terms of $\mathrm{NM}_H$ and $\mathrm{NM}_L$ variation (see Equation 3).
Similarly, $\mathrm{NM}_H$ and $\mathrm{NM}_L$ can be considered in terms of the
corresponding constituent VTC parameters, $V_{OH}$, $V_{IH}$ and
$V_{OL}$, $V_{IL}$, respectively (see Equations 1 and 2). In modern
bulk CMOS technologies, the output VTC parameters of a
gate, $V_{OH}$ and $V_{OL}$, can be considered regular (not random)
variables.⁵ The input VTC parameters, $V_{IH}$ and $V_{IL}$, are
normal random variables [36]. Consider Figure 1 (in Section
III): for a particular gate (an inverter) operating at a particular
supply voltage (200 mV), the output VTC parameters, $V_{OH}$ and
$V_{OL}$, are nearly constant and close to $V_{DD}$ and GND,
respectively (consider the blue and red lines). The horizontal
translation between this family of VTC curves, due to random
parameter variation, corresponds to shifts in the input VTC
parameters, $V_{IH}$ and $V_{IL}$.

⁵ First-order analysis in [14] finds $V_{OH}$ and $V_{OL}$ to be global constants
dependent only on temperature when operating in the subthreshold regime.
Including second-order effects and near-threshold operation induces a dependence
on $V_{DD}$ and gate topology, so $V_{OH}$ and $V_{OL}$ are treated as regular
variables.
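If the output parameters are treated as constants and the input parameters as normal random variables, as described above, then each noise margin in Equations 1 and 2 is itself normally distributed and the probability of missing a target has a closed form. The Python sketch below works through that step; it is a derivation under these stated assumptions rather than the paper's own calculation, and the parameter values are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_nm_high_fails(nm_t, v_oh, mu_ih, sigma_ih):
    """P(NM_H <= NM_T) = P(V_IH >= V_OH - NM_T) for normal V_IH, constant V_OH."""
    return 1.0 - normal_cdf(v_oh - nm_t, mu_ih, sigma_ih)

def p_nm_low_fails(nm_t, v_ol, mu_il, sigma_il):
    """P(NM_L <= NM_T) = P(V_IL <= V_OL + NM_T) for normal V_IL, constant V_OL."""
    return normal_cdf(v_ol + nm_t, mu_il, sigma_il)

# Hypothetical parameters in volts (not taken from the paper).
p_high = p_nm_high_fails(nm_t=0.030, v_oh=0.190, mu_ih=0.125, sigma_ih=0.015)
p_low = p_nm_low_fails(nm_t=0.030, v_ol=0.010, mu_il=0.075, sigma_il=0.015)
print(p_high, p_low)
```

With these symmetric example values the two probabilities coincide; in general they differ because the means and standard deviations of $V_{IH}$ and $V_{IL}$ are not mirror images of each other.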
Fig. 4: $V_{IH}$ and $V_{IL}$ distributions for a minimum-size inverter
in a commercial 40-nm low-power CMOS process at the TT-Corner
($V_{DD} = 200$ mV at 25 °C). Axes: voltage, labeled $V_{DD}$ (mV),
versus probability density; legend: $V_{IL}$, $V_{IH}$.
Fig. 5: $V_{IH}$ and $V_{IL}$ distributions for a minimum-size inverter
in a commercial 40-nm low-power CMOS process at the TT-Corner
($V_{DD} = 600$ mV at 25 °C). Axes: voltage, labeled $V_{DD}$ (mV),
versus probability density; legend: $V_{IL}$, $V_{IH}$.
The input VTC parameters are normally distributed with
mean and standard deviation determined by the supply voltage,