
Quantifying Near-Threshold CMOS Circuit Robustness

Sean Keller∗, Siddharth S. Bhargav†, Chris Moore∗, Alain J. Martin∗

∗Department of Computer Science
California Institute of Technology
Pasadena, CA 91125, USA
{sean,cc,alain}@async.caltech.edu

†Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089, USA
ssbharga@usc.edu

Abstract—In order to build energy-efficient digital CMOS

circuits, the supply voltage must be reduced to near-threshold.

Problematically, due to random parameter variation, supply

scaling reduces circuit robustness to noise. Moreover, the effects

of parameter variation worsen as device dimensions diminish,

further reducing robustness, and making parameter variation

one of the most significant hurdles to continued CMOS scaling.

This paper presents a new metric to quantify circuit robustness

with respect to variation and noise along with an efficient method

of calculation. The method relies on the statistical analysis of

standard cells and memories, resulting in an extremely compact

representation of robustness data. With this metric and method of

calculation, circuit robustness can be included alongside energy,

delay, and area during circuit design and optimization.

I. INTRODUCTION

It is difficult to design efficient and robust modern binary

digital systems; the sheer complexity of utilizing upwards of

a billion devices [1] necessitates the use of numerous levels

of logical abstraction throughout the design flow. Errors intro-

duced at different levels of abstraction can result in circuits

that fail to function as expected for a number of reasons (e.g., timing, design, and functional failures) [2]. Understanding and quantifying these different modes of failure is important, but failures in the base digital assumption supersede all other failures. If a gate cannot switch between logic values, then it cannot perform computation, and assuring correctness with respect to, e.g., timing is moot. Functional failures of this sort can be further divided into many classes [3]; the focus of this paper is on active device parametric failures [4], i.e., failures caused by one of the most significant hurdles for the future of CMOS scaling [5]: parameter variation.

Parameter variation is caused by stochastic process variation

and intrinsic parameter fluctuations (IPF); it is the primary

reason why modern digital circuits that function at the process

nominal supply voltage ($V_{DD}$) eventually fail as the supply is lowered [6]. More importantly, parameter variation makes functional digital circuits less robust and hence less reliable [6]–[14]. This reduction in robustness may be of little consequence at the process-nominal $V_{DD}$, but, as $V_{DD}$ is lowered,

it becomes a critical design concern. Problematically, in order

to minimize the power consumption and energy demands of

modern digital CMOS circuits, the supply voltage must be

scaled sub-threshold or near-threshold [2], [8], [15]–[20]. As

such, in order to build reliable low-power digital systems,

it is essential to quantify circuit robustness as a function of

parameter variation, which is the primary goal of this paper.

The prevailing trend is to perform a simple statistical

analysis of worst-case gates and to choose a minimum $V_{DD}$ above which most (or many) gates are likely to function

despite parameter variation [6]. The problem with this type

of analysis is that it may not be sufficient in real circuits due

to the presence of electrical noise. Noise can be mitigated but

is fundamentally unavoidable and has proven to be a limiting

effect in engineering digital systems for decades [21]. This

paper proposes a metric and method with which to quantify

circuit robustness in terms of parameter variation with respect

to noise. Moreover, the method presented is efficient and

scalable. The computationally expensive component is limited

to a small set of cells that make up modern standard cell

libraries and memories, and the calculation of robustness cost

is linear in the number of instances of these cells (typically in

the range of millions to billions).

The remainder of this paper is organized as follows. Section

II reviews background material on parameter variation and

circuit noise analysis. Section III introduces the notion of

circuit robustness and static noise margins. Section IV details

the method for calculating robustness for inverters, and Section

V extends the method to a larger set of CMOS gates. Section

VI discusses related works, and finally, Section VII concludes

the paper and discusses potential future research.

II. BACKGROUND

A. Parameter Variation

In modern CMOS technologies, device parameters such as

channel length, oxide thickness, dopant concentration, etc.

can have significant deviations from their nominal values due

to process-induced and intrinsic parameter fluctuations [22].

Process variability can be considered a global, predictable,

and gradual skew in device characteristics introduced by the

complexity of manufacturing chips [23] (e.g., from thermal gradients during fabrication [24]). Intrinsic parameter fluctuations are truly statistical in nature and cause significant deviations from device to device within a chip. Intrinsic variations can be attributed to atomistic effects (e.g., random dopant fluctuation (RDF)) and device structure variations (e.g., line edge roughness (LER)) [22], [23], [25]. There are a

number of different ways to characterize and partition these

effects, and the approach used in this paper is to consider a

global component wherein all devices on a chip are affected

in the same way, and a local component wherein each device

on a chip has a number of statistical parameters drawn from

distributions with mean values set by the global skew. This

style of partitioning variation is not as accurate as a full

combined statistical model, but it is a good, albeit slightly

pessimistic approximation [23].

Considering variation in terms of a global and a local com-

ponent simplifies statistical analysis and still permits the circuit

designer to choose, for example, a worst-case global corner wherein the die that fall outside of this range are assumed not to yield and should not be optimized for. For circuits operating subthreshold, the local component of variation is dominated by RDF and is accurately modeled by normally distributed, uncorrelated device threshold ($V_{T}$) variation [26]. Near-threshold, local variation does exhibit some degree of spatial correlation, and at the process-nominal $V_{DD}$ spatial correlation is significant and cannot be ignored. This increase in the spatial correlation of local variation as a function of $V_{DD}$ can be attributed to the fact that channel-length variation

has little effect on devices operating subthreshold but becomes

the dominant effect at approximately twice the threshold

voltage [26]. Channel length variation is spatially correlated

between devices within some radius, and is straightforward

to model [23], [26], [27]. Given that the focus of this paper

is to quantify the robustness of low-power subthreshold and

near-threshold circuits, local parameter variation is treated as

random and uncorrelated; however, the effects of spatial corre-

lation can be included. Furthermore, SPICE simulations, along

with foundry-provided statistically-extracted BSIM4 models,

are used throughout this paper as a basis for correctness;

these models are considered accurate over the entire device

operating range [28].
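The global/local partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_device_thresholds` and every numeric value (nominal threshold, global shift, local sigma) are hypothetical and are not taken from the foundry models used in this paper.

```python
import random
import statistics

def sample_device_thresholds(n_devices, vt_nominal, global_shift, sigma_local, seed=0):
    """Draw one threshold voltage per device on a single die.

    Global component: one shift shared by every device on the die.
    Local component: an independent normal deviation per device,
    centered on the mean set by the global skew.
    """
    rng = random.Random(seed)
    mean = vt_nominal + global_shift
    return [rng.gauss(mean, sigma_local) for _ in range(n_devices)]

# Hypothetical values in volts, chosen only for illustration.
vts = sample_device_thresholds(n_devices=10_000, vt_nominal=0.45,
                               global_shift=0.02, sigma_local=0.03)
print(round(statistics.mean(vts), 3), round(statistics.stdev(vts), 3))
```

Because the local deviations are drawn independently, this sketch matches the uncorrelated treatment adopted for subthreshold and near-threshold operation; spatial correlation would require a joint distribution instead.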

B. Circuit Noise

Circuit noise can be partitioned into a physical component (e.g., thermal noise) and a man-made digital switching component [21]. The dominant sources of physical noise in modern CMOS (which have significant impact on RF CMOS circuits) are $1/f$ noise and thermal noise [29]. Switching

noise is caused by the rapid full-rail voltage swings typical in

digital systems, and includes cross-talk (due to capacitive and

inductive coupling), charge sharing, supply-rail and ground

noise, and substrate noise. These switching-noise sources dom-

inate physical noise by several orders of magnitude in digital

circuits, and they must be accounted for in the design margins

in order to build robust digital systems (even in the absence

of appreciable parameter variation) [30]. Accurate modeling of

each switching-noise source is possible, but highly impractical

for the simulation and analysis of large circuits (millions or bil-

lions of devices). It is, however, possible to lump all switching-

noise sources together into equivalent series voltage sources

between gates [30]. These noise voltage sources are most

accurately modeled as time-varying (i.e., AC) sources [31], but using a static DC voltage is an acceptable approximation [21].

C. Static DC Analysis

Logic-gates in modern technologies exhibit a number of

frequency-dependent effects, and incorporating these effects

greatly increases the complexity of analysis. Fortunately, static

DC analysis has proven to be an excellent basis for a wide

range of digital circuit characterizations. The first works to dis-

cuss the requirements for functional digital circuits [32]–[34]

exclusively perform DC analysis. Numerous modern works, e.g., [14], [35], [36], also rely on the DC analysis of digital

circuits, because in the context of determining functionality,

noise resilience, and reliability, it is representative. Moreover,

as discussed in Section I, timing failures (which probably

cannot be quantified with DC analysis alone) fall outside of

the scope of this work. In this paper static DC conditions are

assumed throughout, and the corresponding canonical method

of analysis, voltage transfer characteristics (VTCs)—the static

output voltage of a gate as a function of input voltage—are

used extensively.

III. DEFINING CIRCUIT ROBUSTNESS

Parameter variation and noise have a significant impact on

circuit robustness, and the primary goal of this paper is to

quantify this impact. To that end, it is necessary to define the

notion of robustness with the intuition that increasing parameter variation tends to reduce robustness to noise. Consider two circuits, $A$ and $B$, operating at the same supply voltage; $A$ is more robust than $B$ if and only if $A$ can tolerate more noise than $B$. That is, as the circuit noise increases, $B$ fails to function before $A$. With statistical parameter variation, the notion of failure naturally becomes a probability. Robustness can be defined such that $A$ is more robust than $B$ if and only if for the same quantity of noise in both circuits the probability that $A$ fails is less than the probability that $B$ fails.

As discussed in Section I, the failures of interest are active

device parametric failures, wherein a gate or memory erro-

neously changes state (between binary digital values) because

of parameter variation. Circuit noise acts to make these failures

more likely, and robust circuits need to function correctly

despite parameter variation and switching noise. In order to

quantify functional failures due to variation and noise it is nec-

essary to define what it means for a gate or memory to change

state. Toward this, consider the base digital assumption: the abstraction of networks of transistors as logic-gates, and logic-gates as Boolean functions over Boolean logic-values. This abstraction relies on the definition of a mapping between logic-values and a physical quantity: the electrical potential of charge stored on capacitive gate nodes. In the simplest mapping, nodes near the supply rail potential, $V_{DD}$, represent a logic-1, and nodes near GND represent a logic-0; however, it is surprisingly difficult to define "near". That is, it is difficult to give an exact (necessary and sufficient) mapping between node voltages and logic values for an arbitrary network of logic-gates, because each logic-gate interprets input voltages differently.

In a real CMOS circuit, no two gates are identical. They

differ in function, topology, and sizing; and distinct instances

of the same gate differ because of parameter variation. Consider an inverter; if a 0 is applied to its input, then a 1 is produced on its output. Similarly, a 1 at the input results in a 0 at the output. The problem is that it is possible (by way of intentional construction or parameter variation) to have two distinct inverters, $INV_1$ and $INV_2$, that behave differently. Suppose that for input voltages near GND, $INV_1$ and $INV_2$ behave logically identically and correctly (i.e., they invert), but for some input voltage, $V_x$, between $V_{DD}$ and GND, $INV_1$ produces a 0 on its output and $INV_2$ produces a 1. In this situation, $INV_1$ and $INV_2$ interpret $V_x$ differently. The situation is further complicated when the notion of the output voltage level is considered. That is, the output of $INV_1$ is really only a 0 when a subsequent gate interprets it as such, and so on down a chain of gates.

Since different gates have different interpretations of input voltages, the exact mapping between voltage levels and logic values needs to be defined in terms of this interpretation (as opposed to using a global bound). That is, suppose that worst-case boundaries on voltages are defined by $V_H$ and $V_L$, where it is known that all gates in a circuit interpret voltages above $V_H$ as a 1 and all voltages below $V_L$ as a 0; then the mapping $V(x) > V_H \leftrightarrow 1$ and $V(x) < V_L \leftrightarrow 0$ is sufficient for some notion of correct operation, but it is not necessary. This distinction is important, because this sort of worst-case definition is simple but not practical for the analysis of modern low-voltage circuits.

Consider an example that demonstrates the trouble with using the worst-case definitions for $V_H$ and $V_L$ in low-voltage applications. Figure 1 depicts the VTCs for 100 instances of a minimum-size inverter in a modern 40-nm low-power bulk CMOS process with $V_{DD} = 200$ mV; the curves vary significantly due to random parameter variation. These VTCs have remarkably similar shapes and are nearly identical modulo horizontal translation. As such, it is reasonable to consider defining $V_H = 180$ mV and $V_L = 20$ mV as worst-case output high and low voltages, respectively (these boundaries are also depicted by blue and red lines, respectively, in Figure 1). The problem with this worst-case output mapping is that the corresponding input voltages that yield a logical-1 on the output then range from […] 150 mV; similarly, the input voltages that yield a logical-0 on the output range from […] 195 mV. These ranges overlap, so a worst-case mapping of input voltages to logic values cannot be defined (the nonsensical worst-case mapping would be $V(x)$ […] $\leftrightarrow 1$ and $V(x)$ […] 150 mV $\leftrightarrow 0$).

[Figure: inverter VTCs, $V_{out}$ (mV) versus $V_{in}$ (mV)]

Fig. 1: Voltage transfer characteristics for 100 Monte Carlo trials of a minimum-size inverter in a commercial 40-nm low-power CMOS process utilizing foundry-provided statistical models for local random parameter variation at the TT global corner ($V_{DD} = 200$ mV, TT-Corner).

A. Static Noise Margin

[Figure: cross-coupled inverters $INV_1$ and $INV_2$ with a series noise source $V_{noise}$ at each inverter input]

Fig. 2: Cross-coupled inverter pair and DC noise voltage sources.

A better approach to defining a local notion of interpretation stems from static noise margin (SNM) analysis. The static

noise margin of cross-coupled inverters was first presented in

[33], [34] and later clarified in [37] and [38]. Consider Figure

2; the SNM of this cross-coupled pair represents the largest

DC noise voltage, $V_{noise}$, that can be applied between the bistable pair before the inverters switch state (between logic-0 and logic-1). If the SNM of a cross-coupled pair is less than or equal to zero (e.g., due to parametric variation), then the pair is not bistable; i.e., it is unable to hold two distinct logic states

(a functional failure). If the SNM of the pair is infinitesimally

greater than zero, then the cell can hold two distinct logic

states, but a diminutive noise can act to switch these states, so

the cell is not robust. Given that noise is always present, all

cross-coupled pairs of inverters in a digital system must have

static noise margins in excess of the system noise in order to

maintain state.¹

[Figure: inverter VTC, $V_{out}$ versus $V_{in}$, with the unity gain points marked]

Fig. 3: Voltage transfer characteristic for a minimum-size inverter in a commercial 40-nm low-power CMOS process ($V_{DD} = 1$ V). The unity gain points are used to define the VTC parameters: $V_{IL}$, $V_{IH}$, $V_{OL}$, and $V_{OH}$.

There are several mathematically equivalent methods used

to measure static noise margins [37]. One such method involves analyzing the unity gain points ($|\partial V_{out} / \partial V_{in}| = 1$) of the voltage transfer characteristic. Consider $INV_1$ ($INV_2$) from Figure 2: a static CMOS inverter consisting of a single NFET and PFET, with the VTC depicted in Figure 3. Both the functionality of the inverter and the definition of SNM rely on two properties of the VTC holding: (1) two unity gain points exist and (2) the slope between the unity gain points exceeds unity in absolute value [35]. From these unity gain points, four properties of an inverter VTC can be defined: $V_{IL}$, $V_{IH}$, $V_{OL}$, and $V_{OH}$, as in Figure 3 (see [38] for details). (These four points are referred to as VTC parameters throughout.) The VTC parameters serve to demark definable boundaries between the voltages that are interpreted as a logic-1 or logic-0, and the undefined region of high gain in between. That is, $V_{IH}$ can be considered the lowest voltage that the inverter correctly interprets as a 1 and $V_{IL}$ as the highest voltage that it correctly interprets as a 0. Similarly, $V_{OH}$ can be considered the lowest voltage that the inverter will output as a 1, and $V_{OL}$ the highest voltage that the inverter will output as a 0.
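As a rough illustration of the unity-gain method, the following Python sketch extracts the four VTC parameters from a sampled transfer curve. It is not the authors' SPICE flow: the helper `vtc_parameters` is hypothetical, and the tanh-shaped curve (with assumed `VDD`, `VM`, and `GAIN` values) is a synthetic stand-in for a simulated inverter VTC.

```python
import math

def vtc_parameters(vin, vout):
    """Return (V_IL, V_IH, V_OL, V_OH) from a sampled inverter VTC.

    The unity gain points are located where the numerical slope
    dVout/dVin crosses -1; linear interpolation refines each crossing.
    """
    # Finite-difference slope, evaluated at the midpoint of each interval.
    mids = [(a + b) / 2 for a, b in zip(vin, vin[1:])]
    slopes = [(vout[i + 1] - vout[i]) / (vin[i + 1] - vin[i])
              for i in range(len(vin) - 1)]
    crossings = []
    for i in range(len(slopes) - 1):
        s0, s1 = slopes[i] + 1.0, slopes[i + 1] + 1.0
        if s0 * s1 < 0:  # the slope passes through -1 between these midpoints
            t = s0 / (s0 - s1)
            crossings.append(mids[i] + t * (mids[i + 1] - mids[i]))
    if len(crossings) != 2:
        raise ValueError("expected exactly two unity gain points")
    v_il, v_ih = crossings

    def interp(x):
        # Linear interpolation of the sampled output voltage at input x.
        for i in range(len(vin) - 1):
            if vin[i] <= x <= vin[i + 1]:
                t = (x - vin[i]) / (vin[i + 1] - vin[i])
                return vout[i] + t * (vout[i + 1] - vout[i])
        raise ValueError("x outside sampled range")

    # V_OH is the output at V_IL; V_OL is the output at V_IH.
    return v_il, v_ih, interp(v_ih), interp(v_il)

# Synthetic VTC (assumption): peak gain magnitude GAIN / 2 at the midpoint VM.
VDD, VM, GAIN = 1.0, 0.5, 20.0
vin = [i * VDD / 1000 for i in range(1001)]
vout = [0.5 * VDD * (1 - math.tanh(GAIN * (v - VM) / VDD)) for v in vin]
v_il, v_ih, v_ol, v_oh = vtc_parameters(vin, vout)
print(v_il, v_ih, v_ol, v_oh)
```

The check for exactly two crossings mirrors property (1) above; a curve whose gain never exceeds unity would raise an error rather than return meaningless parameters.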

¹ In real memories, e.g., SRAM arrays, the SNM during both reading and writing of cells needs to be considered [36]. Furthermore, ensuring an SNM of greater than zero is necessary, but it may not be sufficient for ensuring read stability and write-ability [11].

In general, when one gate drives another gate, a static noise margin can be defined. This static noise margin can be broken into two components: a noise margin high ($NM_H$) and a noise margin low ($NM_L$) (one for each logic value). Consider a pair of inverters, with $INV_1$ driving $INV_2$. The two components of the corresponding noise margin are defined as

$$NM_H(INV_1, INV_2) = V_{OH}(INV_1) - V_{IH}(INV_2) \tag{1}$$

and,

$$NM_L(INV_1, INV_2) = V_{IL}(INV_2) - V_{OL}(INV_1) \tag{2}$$

The static noise margin is defined as the smaller of $NM_H$ and $NM_L$:

$$SNM(INV_1, INV_2) = \min\big(NM_H(INV_1, INV_2),\ NM_L(INV_1, INV_2)\big) \tag{3}$$

These relations are implicit functions of $V_{DD}$.²

For cross-coupled inverters, as in Figure 2, $INV_1$ drives $INV_2$, and $INV_2$ drives $INV_1$, so two different static noise margins can be defined, $SNM(INV_1, INV_2)$ and $SNM(INV_2, INV_1)$. With a few assumptions about the VTCs,³ the condition that $SNM(INV_1, INV_2) > V_{noise} \cap SNM(INV_2, INV_1) > V_{noise}$ is a necessary and sufficient condition for differentiation of binary logic-values by way of the electrical potential stored on the output of each inverter [33], [34], [37]. The static noise margin of cross-coupled inverters plays an important role in quantifying circuit robustness, but the notion must be extended to incorporate parametric variability and generalized in order to apply it to arbitrary networks of gates.
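The noise-margin relations of Equations 1 to 3 and the cross-coupled condition can be written out directly. The sketch below is illustrative only: the `VTCParams` container, the function names, and the parameter values (in volts, for an assumed 200 mV supply) are hypothetical and do not come from the paper's simulation data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VTCParams:
    v_il: float  # highest input voltage interpreted as a 0
    v_ih: float  # lowest input voltage interpreted as a 1
    v_ol: float  # highest voltage output as a 0
    v_oh: float  # lowest voltage output as a 1

def nm_high(driver, receiver):
    """Equation 1: driver output-high level minus receiver input-high level."""
    return driver.v_oh - receiver.v_ih

def nm_low(driver, receiver):
    """Equation 2: receiver input-low level minus driver output-low level."""
    return receiver.v_il - driver.v_ol

def snm(driver, receiver):
    """Equation 3: the smaller of the two noise margins."""
    return min(nm_high(driver, receiver), nm_low(driver, receiver))

def pair_tolerates(inv1, inv2, v_noise):
    """Cross-coupled condition: both directed SNMs must exceed the noise."""
    return snm(inv1, inv2) > v_noise and snm(inv2, inv1) > v_noise

# Hypothetical parameters; inv2's VTC is shifted 15 mV to the right of inv1's.
inv1 = VTCParams(v_il=0.085, v_ih=0.115, v_ol=0.020, v_oh=0.180)
inv2 = VTCParams(v_il=0.100, v_ih=0.130, v_ol=0.020, v_oh=0.180)
print(snm(inv1, inv2), snm(inv2, inv1))
print(pair_tolerates(inv1, inv2, 0.040), pair_tolerates(inv1, inv2, 0.060))
```

Note that the two directed margins differ even though both inverters share the same output levels, which is why the condition checks each direction separately.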

B. Statistical Robustness

This section defines a robustness metric for cross-coupled

inverters that includes parameter variation and noise by way

of a statistical noise margin constraint. When considering two different circuits, $A$ and $B$, operating with the same supply voltage, $A$ is more robust than $B$ if and only if for the same quantity of noise in both circuits the probability that $A$ fails is less than the probability that $B$ fails. That is, for two different circuits $A$ and $B$,

$$ROB(A) > ROB(B) \leftrightarrow P(FAIL(A)) < P(FAIL(B)) \tag{4}$$

where $ROB$ corresponds to circuit robustness and $FAIL$ to circuit failure.

Switching noise in digital circuits can be estimated with known methods [21], [30], and, as with other common metrics, e.g., power and cycle time, it can be reduced and optimized for (typically at some cost; e.g., spreading wires reduces coupling noise at the expense of area). As such, the circuit designer can choose a noise margin target, $NM_T$: a minimum noise margin constraint for all gates.⁴ If any gate has a noise margin less than or equal to $NM_T$, then the gate is said to fail, as is the entire circuit containing the failing gate. Consider a cross-coupled inverter-pair, $INV_1$ and $INV_2$ (as in Figure 2 with $V_{noise} = 0$), operating at a particular $V_{DD}$. The probability of failure for a pair can then be defined such that

$$P\big(FAIL(INV_1, NM_T) \cup FAIL(INV_2, NM_T)\big) = P\big(SNM(INV_1, INV_2) \le NM_T \,\cup\, SNM(INV_2, INV_1) \le NM_T\big) \tag{5}$$

² Equations 1, 2, and 3 (and all dependent equations) are actually implicit functions of all operating parameters, e.g., temperature, body potentials, etc.

³ The VTCs must be monotonic and have a single inflection point.

⁴ A unique noise margin target can be chosen for each gate (if desired). In this way, noisy gates can be assigned larger targets than quiet gates.

For a circuit, $C$, consisting of $n$ cross-coupled inverter-pairs, i.e., $C_i = (INV_{i,1}, INV_{i,2})$ for $i \in \{1, \ldots, n\}$,

$$P\big(FAIL(C, NM_T)\big) = P\Big(\bigcup_{i \in \{1, \ldots, n\}} FAIL(INV_{i,1}, NM_T) \cup FAIL(INV_{i,2}, NM_T)\Big) \tag{6}$$

These two relations treat both the probability of failure and

SNM as random variables (RVs). In order to compute these

quantities, the corresponding distributions and the effects of

correlation are considered in Section IV. These two relations

are generalized for application to arbitrary networks of gates

in Section V.
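To make the two failure relations concrete, the following Python sketch estimates the pair-failure probability by Monte Carlo sampling and then composes it over many pairs. Several things here are assumptions rather than the paper's method: each inverter's variation is modeled as a single normally distributed horizontal VTC shift, the output levels are held constant, all numeric values are hypothetical, and the closed form `1 - (1 - p)^n` holds only if pairs fail independently.

```python
import random

# Hypothetical nominal VTC parameters for a 200 mV supply (volts).
V_IL0, V_IH0, V_OL, V_OH = 0.085, 0.115, 0.020, 0.180

def snm_into(receiver_shift):
    """SNM seen by a receiver whose VTC is shifted horizontally by receiver_shift."""
    nm_h = V_OH - (V_IH0 + receiver_shift)  # noise margin high
    nm_l = (V_IL0 + receiver_shift) - V_OL  # noise margin low
    return min(nm_h, nm_l)

def estimate_pair_failure(nm_target, sigma, trials=100_000, seed=1):
    """Monte Carlo estimate of the failure probability of one cross-coupled pair."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        shift1, shift2 = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        # Each inverter drives the other, so either directed margin may fail.
        if snm_into(shift2) <= nm_target or snm_into(shift1) <= nm_target:
            fails += 1
    return fails / trials

def circuit_failure(p_pair, n_pairs):
    """Union over n pairs, under the added assumption of independent failures."""
    return 1.0 - (1.0 - p_pair) ** n_pairs

p_pair = estimate_pair_failure(nm_target=0.020, sigma=0.020)
p_circuit = circuit_failure(p_pair, n_pairs=1000)
print(p_pair, p_circuit)
```

Even a pair-failure probability of a few percent drives the circuit-level probability toward one for a thousand pairs, which illustrates why the noise margin target and supply voltage must be chosen against the full instance count.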

IV. CALCULATING ROBUSTNESS

One of the goals of this paper is to define a method for

calculating robustness in such a way that it can be feasibly

computed for large circuits (billions of gates), and which also

fits in with the most prevalent method of system design, i.e., standard-cell hierarchical digital circuit design. This necessitates the construction of a new compact model for statistical

robustness with parameters that can be stored alongside timing

and energy data in standard cell libraries. Moreover, the model

must be defined such that the compact data is composable; i.e., the robustness of an arbitrary network of standard cells must

be computable by the composition of robustness data from

member cells. In this way, the robustness of a large circuit

(built out of standard cells) can be readily calculated.

A. Statistical VTC Parameters

Device parameter variation results in variation in the static

noise margins of gates; the precise relationship depends on

the type of parameter variation and the device operating

regime (subthreshold see [14], [36], and above threshold

see [39], [40]). The variation in $SNM$ can be analyzed in terms of $NM_H$ and $NM_L$ variation (see Equation 3). Similarly, $NM_H$ and $NM_L$ can be considered in terms of the corresponding constituent VTC parameters, ($V_{OH}$, $V_{IH}$) and ($V_{IL}$, $V_{OL}$), respectively (see Equations 1 and 2). In modern bulk CMOS technologies, the output VTC parameters of a gate, $V_{OH}$ and $V_{OL}$, can be considered regular (not random) variables.⁵ The input VTC parameters, $V_{IL}$ and $V_{IH}$, are normal random variables [36]. Consider Figure 1 (in Section III): for a particular gate (an inverter) operating at a particular supply voltage (200 mV) the output VTC parameters, $V_{OH}$ and $V_{OL}$, are nearly constant and close to $V_{DD}$ and GND, respectively (consider the blue and red lines). The horizontal translation between this family of VTC curves, due to random parameter variation, corresponds to shifts in the input VTC parameters, $V_{IL}$ and $V_{IH}$.

⁵ First-order analysis in [14] finds $V_{OH}$ and $V_{OL}$ to be global constants dependent only on temperature when operating in the subthreshold regime. Including second-order effects and near-threshold operation induces a dependence on $V_{DD}$ and gate topology, so $V_{OH}$ and $V_{OL}$ are treated as regular variables.
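If the output parameters are treated as fixed and the input parameters as normal random variables, each noise margin is itself normal, so the probability of violating a target follows from the normal CDF. The sketch below illustrates this consequence with Python's standard library; the function names and all numeric values are hypothetical, and it covers a single noise margin in isolation rather than the full SNM.

```python
from statistics import NormalDist

def prob_nm_high_fails(v_oh, mu_vih, sigma_vih, nm_target):
    """P(NM_H <= NM_T) with NM_H = V_OH - V_IH, V_OH fixed, V_IH normal."""
    # Subtracting a normal variable from a constant flips the mean's sign
    # contribution and leaves the standard deviation unchanged.
    nm_h = NormalDist(mu=v_oh - mu_vih, sigma=sigma_vih)
    return nm_h.cdf(nm_target)

def prob_nm_low_fails(v_ol, mu_vil, sigma_vil, nm_target):
    """P(NM_L <= NM_T) with NM_L = V_IL - V_OL, V_OL fixed, V_IL normal."""
    nm_l = NormalDist(mu=mu_vil - v_ol, sigma=sigma_vil)
    return nm_l.cdf(nm_target)

# Hypothetical values in volts for a 200 mV supply.
p_high = prob_nm_high_fails(v_oh=0.180, mu_vih=0.115, sigma_vih=0.020, nm_target=0.020)
p_low = prob_nm_low_fails(v_ol=0.020, mu_vil=0.085, sigma_vil=0.020, nm_target=0.020)
print(p_high, p_low)
```

Only a mean and a standard deviation per input parameter are needed for this calculation, which is consistent with the goal of a compact, library-storable representation.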

[Figure: probability density versus voltage]

Fig. 4: $V_{IL}$ and $V_{IH}$ distributions for a minimum-size inverter in a commercial 40-nm low-power CMOS process at the TT-Corner ($V_{DD} = 200$ mV).

[Figure: probability density versus voltage]

Fig. 5: $V_{IL}$ and $V_{IH}$ distributions for a minimum-size inverter in a commercial 40-nm low-power CMOS process at the TT-Corner ($V_{DD} = 600$ mV).

The input VTC parameters are normally distributed with

mean and standard deviation determined by the supply voltage,