of 23
arXiv:1112.0371v1 [cs.IT] 2 Dec 2011
1
Zigzag Codes: MDS Array Codes with Optimal
Rebuilding
Itzhak Tamo
, Zhiying Wang
, and Jehoshua Bruck
Electrical Engineering Department, California Institute
of Technology, Pasadena, CA 91125, USA
Electrical and Computer Engineering, Ben-Gurion Universi
ty of the Negev, Beer Sheva 84105, Israel
{
tamo, zhiying, bruck
}
@caltech.edu
Abstract
—MDS array codes are widely used in storage systems
to protect data against erasures. We address the
rebuilding ratio
problem, namely, in the case of erasures, what is the fractio
n
of the remaining information that needs to be accessed in ord
er
to rebuild
exactly
the lost information? It is clear that when the
number of erasures equals the maximum number of erasures
that an MDS code can correct then the rebuilding ratio is
1
(access all the remaining information). However, the inter
esting
and more practical case is when the number of erasures is smal
ler
than the erasure correcting capability of the code. For exam
ple,
consider an MDS code that can correct two erasures: What is
the smallest amount of information that one needs to access i
n
order to correct a single erasure? Previous work showed that
the rebuilding ratio is bounded between
1
2
and
3
4
, however, the
exact value was left as an open problem. In this paper, we solv
e
this open problem and prove that for the case of a single erasu
re
with a
2
-erasure correcting code, the rebuilding ratio is
1
2
. In
general, we construct a new family of
r
-erasure correcting MDS
array codes that has optimal rebuilding ratio of
e
r
in the case of
e
erasures,
1
e
r
. Our array codes have efficient encoding
and decoding algorithms (for the case
r
=
2
they use a finite field
of size
3
) and an optimal update property.
I. I
NTRODUCTION
Erasure-correcting codes are the basis of the ubiquitous
RAID schemes for storage systems, where disks correspond
to symbols in the code. Specifically, RAID schemes are
based on MDS (maximum distance separable) array codes that
enable optimal storage and efficient encoding and decoding
algorithms. With
r
redundancy symbols, an MDS code is
able to reconstruct the original information if no more than
r
symbols are erased. An array code is a two dimensional array,
where each column corresponds to a symbol in the code and
is stored in a disk in the RAID scheme. We are going to refer
to a disk/symbol as a node or a column interchangeably, and
an entry in the array as an element. Examples of MDS array
codes are EVENODD [1], [2], B-code [3], X-code [4], RDP
[5], and STAR-code [6].
Suppose that some nodes are erased in a systematic MDS
array code, we will rebuild them by accessing (reading) some
information in the surviving nodes, all of which are assumed
to be accessible. The fraction of the accessed information i
n
the surviving nodes is called the
rebuilding ratio
. If
r
nodes
are erased, then the rebuilding ratio is
1
since we need to read
all the remaining information. Is it possible to lower this r
atio
for less than
r
erasures? Apparently, it is possible: Figure 1
shows an example of our new MDS code with
2
information
nodes and
2
redundancy nodes, every node has
2
elements,
and operations are over a finite field of size
3
. Consider the
rebuilding of the first information node, it requires access
to
3
elements out of
6
(a rebuilding ratio of
1
2
), because
a
= (
a
+
b
)
b
and
c
= (
c
+
b
)
b
. In practice, there is a difference
between erasures of the information (also called systemati
c)
and the parity nodes. An erasure of the former will affect
the information access time since part of the raw informatio
n
is missing, however erasure of the latter does not have such
effect, since the entire information is accessible. Moreov
er, in
most storage systems the number of parity nodes is negligibl
e
compared to the number of systematic nodes. Therefore our
constructions focus on the optimally of the rebuilding rati
o
related to the systematic nodes.
Figure 1
. Rebuilding of a
(
4, 2
)
MDS array code over
F
3
. Assume the first
node (column) is erased.
In [7], [8], a related problem is discussed: The nodes are
assumed to be distributed and fully connected in a network,
and the concept of a
repair bandwidth
is defined as the
minimum amount of data that needs to be transmitted in the
network in order to rebuild the erased nodes. In contrast to o
ur
concept of the
rebuilding ratio
a transmitted element of data
can be a function of a number of elements that are accessible
on the same node. In addition, in their general framework, an
acceptable rebuilding is one that retains the MDS property a
nd
not necessarily rebuilds the original erased node, whereas
, we
restrict our solutions to
exact
rebuilding. It is clear that our
framework is a special case of the general framework, hence,
the repair bandwidth is a lower bound on the rebuilding ratio
.
What is known about lower bounds on the repair bandwidth?
In [7] it was proved that a lower bound on the repair bandwidth
for an
(
n
,
k
)
MDS code is:
M
k
·
n
1
n
k
.
(1)
Here the code has a total of
n
nodes with
k
nodes of
information and
r
=
n
k
nodes of redundancy/parity, where
M
is the total amount of information. Also all the surviving
2
0
1
2
R
Z
0
1
2
3
Figure 2
. Permutations for zigzag sets in a
(
5, 3
)
code with
4
rows. Columns
0, 1, and 2 are systematic nodes and columns R, and Z are parity
nodes. Each
element in column R is a linear combination of the systematic
elements in the
same row. Each element in column Z is a linear combination of t
he systematic
elements with the same symbol. The shaded elements are acces
sed to rebuild
column 1.
nodes are assumed to be accessible. It can be verified that
Figure 1 matches this lower bound. Note that Equation (1)
represents the amount of information, it should be normaliz
ed
to reach the ratio. A number of researchers addressed the
repair bandwidth problem [7]–[16], however the constructe
d
code achieving the lower bound all have low code rate, i.e.,
k
/
n
<
1/2
. And it was shown by interference alignment in
[14], [15] that this bound is asymptotically achievable for
exact
repair.
Instead of trying to construct MDS codes that can be easily
rebuilt, a different approach [17], [18] was used by trying t
o
find ways to rebuild existing families of MDS array codes.
The ratio of rebuilding a single systematic node was shown
to be
3
4
+
o
(
1
)
for EVENODD or RDP codes [1], [5], both
of which have 2 parities. However, based on the lower bound
of (1) the ratio can be as small as
1/2
. Moreover, related
work on constructing codes with optimal rebuilding appeare
d
independently in [19], [20]. Their constructions are simil
ar to
this work, but only single erasure is considered.
Our main goal in this paper is to design
(
n
,
k
)
MDS array
codes with
optimal rebuilding ratio, for arbitrary number of
parities.
We first consider the case of
2
parities. We assume
that the code is systematic. In addition, we consider codes w
ith
optimal update
, namely, when an information element is writ-
ten, only the element itself and one element from each parity
node needs update, namely, there is optimal reading/writin
g
during writing of information. Hence, in the case of a code
with
2
parities only
3
elements are updated. Under such
assumptions, we will prove that every parity element is a lin
ear
combination of exactly one information element from each
systematic column. We call this set of information elements
a
parity set
. Moreover, the parity sets of a parity node form a
partition of the information array.
For example, Figure 2 shows a code with
3
systematic
nodes and
2
parity nodes. The parity sets corresponding to
parity node
R
are the sets of information elements in the same
row. The parity sets that correspond to the parity node
Z
are
the sets of information elements with the same symbol. For
instance the first element in column
R
is a linear combination
of the elements in the first row and in columns
0, 1
, and
2
.
And the
in column Z is a linear combination of all the
elements in columns
0, 1
, and
2
. We can see that each
systematic column corresponds to a permutation of the four
symbols. In general, we will show that each parity relates to
a
set of a permutations of the systematic columns. Without los
s
of generality, we assume that the first parity node correspon
ds
to identity permutations, namely, it is linear combination
of
rows. In the case of codes with
2
parities, we call the first
parity the
row parity
and the second parity the
zigzag parity
.
The corresponding sets of information elements for a parity
element are called
row
and
zigzag sets
, respectively.
It should be noted that in contrast to existing MDS array
codes such as EVENODD and X-code, the parity sets in our
codes are not limited to elements that correspond to straigh
t
lines in the array, but can also include elements that corres
pond
to zigzag lines. We will demonstrate that this property is
essential for achieving an optimal rebuilding ratio.
If a single systematic node is erased, we will rebuild each
element in the erased node either by its corresponding row
parity or zigzag parity, referred to as
rebuild by row (or
by zigzag)
. In particular, we access the row (zigzag) parity
element, and all the elements in this row (zigzag) set, excep
t
the erased element. For example, consider Figure 2, suppose
that the column labeled
1
is erased, one can access the
8
shaded elements and rebuild its first two elements by rows,
and the rest by zigzags. Namely, only half of the remaining
elements are accessed. It can be verified that for the code in
Figure 2, all the three systematic columns can be rebuilt by
accessing half of the remaining elements. Thus the rebuildi
ng
ratio is
1/2
, which is the lower bound expressed in (1).
The key idea in our construction is that for each erased
node, the row sets and the zigzag sets have a large inter-
section - resulting in a small number of accesses. So the
question is: How do we find permutations such that the row
sets and zigzag sets intersect as much as possible? In this
paper, we will present an optimal solution to this question
by constructing permutations that are derived from binary
vectors. This construction provides an optimal rebuilding
ratio
of
1/2
for any erasure of a systematic node. How do we define
permutations on integers from a binary vector? We simply add
to each integer the binary vector and use the sum as the image
of this integer. Here each integer is expressed as its binary
expansion. For example, in order to define the permutation
on integers
{
0, 1, 2, 3
}
from the binary vector
v
= (
1, 1
)
,
we express each integer in binary:
(
0, 0
)
,
(
0, 1
)
,
(
1, 0
)
,
(
1, 1
)
.
Then add (mod
2
) the vector
v
= (
1, 1
)
to each integer,
and get
(
1, 1
)
,
(
1, 0
)
,
(
0, 1
)
,
(
0, 0
)
. At last change each binary
expansion back to integer and define it as the image of the per-
mutation:
3, 2, 1, 0
. Hence,
(
0, 1, 2, 3
)
are mapped to
(
3, 2, 1, 0
)
in this permutation, respectively. This simple technique f
or
generating permutations is the key in our construction. We c
an
generalize our construction for arbitrary
r
(number of parity
nodes) by generating permutations using
r
-ary vectors. Our
constructions are optimal in the sense that we can construct
codes with
r
parities and a rebuilding ratio of
1/
r
.
So far we focused on the optimal rebuilding ratio, however,
a code with two parity nodes should be able to correct two
erasures, namely, it needs to be an MDS code. We will present
that for large enough field size the code can be made MDS.
In particular, another key result in this paper is that for th
e
case of a code with two parity nodes, the field size is
3
, and
this field size is optimal.
In addition, our codes have an optimal array size in the
sense that for a given number of rows, we have the maximum
3
number of columns among all systematic codes with optimal
ratio and update. However, the length of the array is expo-
nential in the width. We introduce techniques for making the
array wider while having a rebuilding ratio that is very clos
e
to
1/
r
.
We also considered the following generalization: Suppose
that we have an MDS code with three parity nodes, if we have
a single erasure, using our codes, we can rebuild the erasure
with rebuilding ratio of
1/3
. What happens if we have two
erasures? What is the rebuilding ratio in this case? Our code
s
can achieve the optimal rebuilding ratio of
2/3
. In general, if
we have
r
3
parity nodes and
e
erasures happen,
1
e
r
,
we will prove that the lower bound of repair bandwidth is
e
/
r
(normalized by the size of the remaining array), and so is the
rebuilding ratio. And the code we constructed achieves this
lower bound for any
e
.
In summary, the main contribution of this paper is the first
explicit construction of systematic
(
n
,
k
)
MDS array codes for
any constant
r
=
n
k
, which achieves optimal rebuilding
ratio of
1
r
. Moreover, our codes achieve optimal rebuilding
ratio of
e
r
when
e
systematic erasures occur,
1
e
r
.
The parity symbols are constructed by linear combinations
of a set of information symbols, such that each information
symbol is contained exactly once in each parity node. These
codes have a variety of advantages: 1) they are systematic
codes, hence it is easy to retrieve information; 2) they have
high code rate
k
/
n
, which is commonly required in storage
systems; 3) the encoding and decoding of the codes can be
easily implemented (for
r
=
2
, the code uses finite field of size
3); 4) they match the lower bound of the ratio when rebuilding
e
systematic nodes; 5) the rebuilding of a failed node require
s
simple computation and access to only
1/
r
of the data in each
node (no linear combination of data); 6) they have
optimal
update
, namely, when an information element is updated, only
r
+
1
elements in the array need update; and 7) they have
optimal array size.
The remainder of the paper is organized as follows. Section
II constructs
(
k
+
2,
k
)
MDS array codes with optimal rebuild-
ing ratio. Section III gives formal definitions and some gene
ral
observations on MDS array codes. Section IV introduces code
duplication and thus generates
(
k
+
2,
k
)
MDS array codes for
arbitrary number of columns. We discuss the size of the finite
field needed for these constructions in Section V. Decoding
algorithms for erasures and errors are discussed in Section
VI. Section VII generalizes the MDS code construction to
arbitrary number of parity columns. These generalized code
s
have properties that are similar to the
(
k
+
2,
k
)
MDS array
codes, likewise some of them has optimal rebuilding ratio.
Rebuilding of multiple erasures and generalization of the
rebuilding algorithms are presented in Section VIII. Final
ly
we provide concluding remarks in Section IX.
II.
(
k
+
2,
k
)
MDS
ARRAY CODE CONSTRUCTIONS
In the rest of the paper, we are going to use
[
i
,
j
]
to denote
{
i
,
i
+
1, . . . ,
j
}
and
[
i
]
to denote
{
1, 2, . . .,
i
}
, for integers
i
j
. And denote the complement of a subset
X
M
as
X
=
M
\
X
. For a matrix
A
,
A
T
denotes the transpose of
A
. For a binary vector
v
= (
v
1
, ...,
v
n
)
we denote by
v
=
(
v
1
+
1 mod 2, ...,
v
n
+
1 mod 2
)
its complement vector.
The standard vector basis of dimension
m
will be denoted
as
{
e
i
}
m
i
=
1
and the zero vector will be denoted as
e
0
.
For an
integer
n
denote by
S
n
the set of permutations over the integers
[
0,
n
1
]
, namely the symmetric group. For two functions
f
,
g
,
denote their composition by
f g
or
f
g
.
Recall that
rebuilding ratio
is the average fraction of ac-
cesses in the surviving systematic and parity nodes while
rebuilding one systematic node. More specific definition wil
l
be given in the next section. In this section we give the
construction of MDS array code with two parities and optimal
rebuilding ratio
1/2
for one erasure, which uses an optimal
finite field of only size
3
.
We mentioned in the introduction that each
(
k
+
2,
k
)
MDS
array code with optimal update can be constructed by defining
the row and zigzag parities (proofs are given in Section
III). More specifically, the row parity corresponds to ident
ity
permutation in each systematic column, and the zigzag parit
y
corresponds to a set of permutations
{
f
0
,
f
1
, . . . ,
f
k
1
}
for the
systematic columns
{
0, 1, . . . ,
k
}
. From the example in Figure
2, we know that in order to get low rebuilding ratio, we need
to find
f
0
, ...,
f
k
1
such that the row and zigzag sets used in
rebuilding intersect as much as possible. In addition, sinc
e
each parity element is a linear combination of elements in
its parity set, we need to define the coefficients of the linear
combination such that the code is MDS. Noticing that all
elements and all coefficients are from some finite field, we
would like to choose the coefficients such that the finite field
size is as small as possible. So our construction of the code
includes two steps:
1) Find zigzag permutations to minimize the ratio.
2) Assign the coefficients such that the code is MDS.
The following construction constructs a family of MDS
array codes with
2
parities using binary vectors. From any
set
T
F
m
2
,
|
T
|
=
k
, we construct a
(
k
+
2,
k
)
MDS array
code of size
2
m
×
(
k
+
2
)
.
We will show that some of these
codes have the optimal ratio of
1
2
.
In this section all the calculations are done over
F
2
. By
abuse of notation we use
x
[
0, 2
m
1
]
both to represent the
integer and its binary representation. It will be clear from
the
context which meaning is in use.
Construction 1
Let
A
= (
a
i
,
j
)
be an array of size
2
m
×
k
for
some integers
k
,
m
and
k
<
2
m
. Let
T
F
m
2
be a set of vectors
of size
k
which does not contain the zero vector. For
v
T
we define the permutation
f
v
:
[
0, 2
m
1
]
[
0, 2
m
1
]
by
f
v
(
x
) =
x
+
v
, where
x
is represented in its binary represen-
tation. One can check that this is actually a permutation. Fo
r
example when
m
=
2,
v
= (
1, 0
)
,
x
=
3
,
f
(
1,0
)
(
3
) =
3
+ (
1, 0
) = (
1, 1
) + (
1, 0
) = (
0, 1
) =
1.
One can check that the permutation
f
v
in vector notation is
[
2, 3, 0, 1
]
. In addition, we define
X
v
=
{
x
[
0, 2
m
1
]
:
x
·
v
=
0
}
as the set of integers orthogonal to
v
. For example,
X
(
1,0
)
=
{
0, 1
}
. The construction of the two parity columns
is as follows: The first parity is simply the row sums. The
second parity is the linear combination of elements in the
4
zigzag set. The zigzag sets
Z
0
, ...,
Z
2
m
1
are defined by the
permutations
{
f
v
j
:
v
j
T
}
as
a
i
,
j
Z
l
if
f
v
j
(
i
) =
l
.
We
will denote the permutation
f
v
j
as
f
j
and the set
X
v
j
as
X
j
.
Assume column
j
is erased, and define
S
r
=
{
a
i
,
j
:
i
X
j
}
and
S
z
=
{
a
i
,
j
:
i
/
X
j
}
. Rebuild the elements in
S
r
by rows
and the elements in
S
z
by zigzags.
Theorem 1
Construct
permutations
f
0
, ...,
f
m
and
sets
X
0
, ...,
X
m
by the vectors
{
e
i
}
m
i
=
0
as in Construction
1
, where
X
0
is modified to be
X
0
=
{
x
F
m
2
:
x
·
(
1, 1, ..., 1
) =
0
}
.
Then the corresponding
(
m
+
3,
m
+
1
)
code has
optimal
ratio
of
1
2
.
Before proving the theorem, we first give an example.
Actually, this example is the code in Figure 2 with more
details.
Example 1
Let
A
be an array of size
4
×
3
. We construct a
(
5, 3
)
MDS array code for
A
as in Theorem
1
that accesses
1
2
of the remaining information in the array to rebuild any
systematic node (see Figure
3
). For example,
X
1
=
{
0, 1
}
, and
for rebuilding of node
1
(column
C
1
) we access the elements
a
0,0
,
a
0,2
,
a
1,0
,
a
1,2
, and the following four parity elements
r
0
=
a
0,0
+
a
0,1
+
a
0,2
r
1
=
a
1,0
+
a
1,1
+
a
1,2
z
f
1
(
2
)
=
z
0
=
a
0,0
+
2
a
2,1
+
2
a
1,2
z
f
1
(
3
)
=
z
1
=
a
1,0
+
2
a
3,1
+
a
0,2
.
It is trivial to rebuild node
1
from the accessed information.
Note that each of the surviving node accesses exactly
1
2
of
its elements. It can be easily verified that the other systema
tic
nodes can be rebuilt the same way. Rebuilding a parity node is
easily done by accessing all the information elements.
In order to prove Theorem 1, we first prove the following
lemma. We use a binary vector to represent its corresponding
systematic node. And define
|
v
\
u
|
=
i
:
v
i
=
1,
u
i
=
0
1
as the
number of coordinates at which
v
has a
1
but
u
has a
0
.
Lemma 2
(i) For any
v
,
u
T
, to rebuild node
v
, the number
of accessed elements in node
u
is
2
m
1
+
|
f
v
(
X
v
)
f
u
(
X
v
)
|
.
(ii) For any
0
6
=
v
,
u
T
,
|
f
v
(
X
v
)
f
u
(
X
v
)
|
=
(
|
X
v
|
,
|
v
\
u
|
mod 2
=
0
0,
|
v
\
u
|
mod 2
=
1.
(2)
Proof:
(i) In rebuilding of node
v
we rebuild the elements
in rows
X
v
by rows, thus the row parity column accesses
the values of the sum of rows
X
v
. Moreover, the surviving
node
u
also accesses its elements in rows
X
v
. Hence, by now
|
X
v
|
=
2
m
1
elements are accessed. The elements of node
v
in
rows
X
v
are rebuilt by zigzags, thus the zigzag parity column
accesses the values of the zigzags sums
{
z
f
v
(
l
)
:
l
X
v
}
, and
each surviving systematic node accesses the elements of the
se
zigzags from its column, unless these elements are already
included in the rebuilding by rows. The zigzag elements in
{
Z
f
v
(
l
)
:
l
X
v
}
of node
u
are in rows
f
1
u
(
f
v
(
X
v
))
, where
f
1
u
is the inverse function of
f
u
. Thus the extra elements node
u
needs to access are in rows
f
1
u
(
f
v
(
X
v
))
\
X
v
.
But,
|
f
1
u
(
f
v
(
X
v
))
\
X
v
|
=
|
f
1
u
(
f
v
(
X
v
))
X
v
|
=
|
f
1
u
(
f
v
(
X
v
))
X
v
|
=
2
m
− |
f
1
u
(
f
v
(
X
v
))
X
v
|
=
2
m
(
|
f
1
u
(
f
v
(
X
v
))
|
+
|
X
v
| − |
f
1
u
(
f
v
(
X
v
))
X
v
|
)
=
|
f
1
u
(
f
v
(
X
v
))
X
v
|
=
|
f
v
(
X
v
)
f
u
(
X
v
)
|
,
where we used the fact that
f
v
,
f
u
are bijections, and
|
X
v
|
=
2
m
1
.
(ii) Consider the group
(
F
m
2
,
+)
. Recall that
f
v
(
X
) =
X
+
v
=
{
x
+
v
:
x
X
}
. The sets
f
v
(
X
v
) =
X
v
+
v
and
f
u
(
X
v
) =
X
v
+
u
are cosets of the subgroup
X
v
=
{
w
F
m
2
:
w
·
v
=
0
}
, and they are either identical or disjoint.
Moreover, they are identical iff
v
u
X
v
, namely
(
v
u
)
·
v
=
i
:
v
i
=
1,
u
i
=
0
1
0 mod 2.
However, by definition
|
v
\
u
| ≡
i
:
v
i
=
1,
u
i
=
0
1 mod 2
, and the result follows.
Let
{
f
0
, ...,
f
k
1
}
be a set of permutations over the set
[
0, 2
m
1
]
with associated subsets
X
0
, ...,
X
k
1
[
0, 2
m
1
]
,
where each
|
X
i
|
=
2
m
1
. We say that this set is a set of
orthogonal permutations
if for any
i
,
j
[
0,
k
1
]
,
|
f
i
(
X
i
)
f
j
(
X
i
)
|
2
m
1
=
δ
i
,
j
,
where
δ
i
,
j
is the Kronecker delta. For a set of orthogonal
permutations, in order to rebuild any systematic node, only
2
m
1
elements are accessed from each surviving systematic
node by Lemma 2. And only
2
m
1
elements are accessed from
each parity node, too. Hence codes generated by orthogonal
permutations has optimal rebuilding ratio
1/2
. Now we are
ready to prove Theorem 1.
Proof of Theorem
1
:
Since
|
e
i
\
e
j
|
=
1
for any
i
6
=
j
6
=
0
,
we get by lemma 2
f
i
(
X
i
)
f
j
(
X
i
) =
.
Now consider
e
i
and
e
0
, for
i
6
=
0
. Note that
f
i
(
X
i
) =
{
x
+
e
i
:
x
·
e
i
=
0
}
=
{
y
:
y
·
e
i
=
1
}
,
so
f
i
(
X
i
)
f
0
(
X
i
) =
{
y
:
y
·
e
i
=
1
} ∩ {
x
:
x
·
e
i
=
0
}
=
.
Similarly,
f
i
(
X
0
) =
{
x
+
e
i
:
x
·
(
1, 1, . . ., 1
) =
0
}
=
{
y
:
y
·
(
1, 1,
· · ·
, 1
) =
1
}
,
and
f
0
(
X
0
)
f
i
(
X
0
)
=
{
x
:
x
·
(
1,
· · ·
, 1
) =
0
} ∩ {
y
:
y
·
(
1,
· · ·
, 1
) =
1
}
=
.
Hence the permutations
f
0
, . . . ,
f
m
are orthogonal permuta-
tions, and the ratio is
1/2
.
Note that the optimal code can be shortened by removing
some systematic columns and still retain an optimal ratio, i
.e.,
for any
k
m
+
1
we have a code with optimal rebuilding.
Having found the set of orthogonal permutations, we need
to specify the coefficients in the parities such that the code
is
MDS.
5
Figure 3
.
(
a
)
The set of orthogonal permutations as in Theorem 1 with sets
X
0
=
{
0, 3
}
,
X
1
=
{
0, 1
}
,
X
2
=
{
0, 2
}
.
(
b
)
A
(
5, 3
)
MDS array code generated
by the orthogonal permutations. The first parity column
C
3
is the row sum and the second parity column
C
4
is generated by the zigzags. For example, zigzag
z
0
contains the elements
a
i
,
j
that satisfy
f
j
(
i
) =
0
.
Consider the
(
m
+
3,
m
+
1
)
code
C
constructed by Theo-
rem 1 and the vectors
{
e
i
}
m
i
=
0
. Let
F
be the finite field we use.
Let the information in row
i
, column
j
be
a
i
,
j
F
. Let its row
and zigzag coefficients be
α
i
,
j
,
β
i
,
j
F
. For a row set
R
u
=
{
a
u
,0
,
a
u
,1
, . . . ,
a
u
,
m
}
, the row parity is
r
u
=
m
j
=
0
α
u
,
j
a
u
,
j
.
For a zigzag set
Z
u
=
{
a
u
,0
,
a
u
+
e
1
,1
, . . . ,
a
u
+
e
m
,
m
}
, the zigzag
parity is
z
u
=
m
j
=
0
β
u
+
e
j
,
j
a
u
+
e
j
,
j
.
Recall that the
(
m
+
3,
m
+
1
)
code is MDS iff we can
recover the information from up to
2
columns erasures. It
is clear that none of the coefficients
α
i
,
j
,
β
i
,
j
can be zero.
Moreover, if we assign all the coefficients as
α
i
,
j
=
β
i
,
j
=
1
we get that in an erasure of two systematic columns the set
of equations derived from the parity columns are linearly
dependent and thus not solvable (the sum of the equations fro
m
the row parity and the sum of those from the zigzag parity will
both be the sum of the entire information array). Therefore
the coefficients need to be from a field with more than
1
nonzero element, thus a field of size at least
3
is necessary.
The construction below surprisingly shows that in fact
F
3
is
sufficient.
Construction 2
For the code
C
in Theorem
1
over
F
3
, define
u
j
=
j
l
=
0
e
l
for
0
j
m
. Assign row coefficients as
α
i
,
j
=
1
for all
i
,
j
, and zigzag coefficients as
β
i
,
j
=
2
i
·
u
j
where
i
= (
i
1
, . . . ,
i
m
)
is represented in binary and the calcula-
tion in the exponent is done over
F
2
.
The coefficients in Figure 3 are assigned by Construction
2. The following theorem shows that the code is MDS.
Theorem 3
Construction
2
is an
(
m
+
3,
m
+
1
)
MDS code
with optimal finite field size of
3
.
Proof:
It is easy to see that if at least one of the two
erased columns is a parity column then we can recover the
information. Hence we only need to show that we can recover
from any erasure of two systematic columns. In an erasure
of two systematic columns
i
,
j
[
0,
m
]
,
i
<
j
, we access the
entire remaining information in the array. For
r
[
0, 2
m
1
]
set
r
=
r
+
e
i
+
e
j
, and recall that
a
r
,
i
Z
l
iff
l
=
r
+
e
i
, thus
a
r
,
i
,
a
r
,
j
Z
r
+
e
i
and
a
r
,
j
,
a
r
,
i
Z
r
+
e
j
.
From the two parity
columns we need to solve the following equations (for some
y
1
,
y
2
,
y
3
,
y
4
F
3
)
1 1
0
0
0 0
1
1
β
r
,
i
0
0
β
r
,
j
0
β
r
,
j
β
r
,
i
0
a
r
,
i
a
r
,
j
a
r
,
i
a
r
,
j
=
y
1
y
2
y
3
y
4
.
This set of equations is solvable iff
β
r
,
i
β
r
,
i
6
=
β
r
,
j
β
r
,
j
.
(3)
Note that the multiplicative group of
F
3
\
0
is isomorphic to
the additive group of
F
2
, hence multiplying two elements in
F
3
\
0
is equivalent to summing up their exponent in
F
2
when
they are represented as a power of the primitive element of th
e
field
F
3
. For columns
0
i
<
j
m
and rows
r
,
r
defined
above, we have
β
r
,
i
β
r
,
i
=
2
r
·
u
i
+
r
·
u
i
=
2
(
r
+
r
)
·
u
i
=
2
(
e
i
+
e
j
)
·
i
l
=
0
e
l
=
2
e
2
i
=
2.
However in the same manner we derive that
β
r
,
j
β
r
,
j
=
2
(
r
+
r
)
·
u
j
=
2
(
e
i
+
e
j
)
·
j
l
=
0
e
l
=
2
e
2
i
+
e
2
j
=
2
0
=
1.
Hence (3) is satisfied and the code is MDS.
Remark:
The above proof shows that
β
r
,
i
6
=
β
r
,
i
, and
β
r
,
j
=
β
r
,
j
for
i
<
j
. And (3) is a necessary and sufficient
condition for a MDS code for vectors
v
i
6
=
v
j
.
In addition to optimal ratio and optimal field size, we will
show in the next section that the code in Theorem 1 is also
of optimal array size, namely, it has the maximum number of
columns, given the number of rows.
III. F
ORMAL
P
ROBLEM
S
ETTINGS AND
C
ONSTRUCTIONS
In this section, we first give some observations of an
arbitrary MDS array code with optimal update. Then we
prove some properties and give some examples of our code in
Construction 1.
Let us define an MDS array code with 2 parities. Let
A
= (
a
i
,
j
)
be an array of size
p
×
k
over a finite field
F
, where
i
[
0,
p
1
]
,
j
[
0,
k
1
]
, and each of its
entry is an information element. We add to the array two
parity columns and obtain an
(
n
=
k
+
2,
k
)
MDS code of
array size
p
×
n
. Each element in these parity columns is
a linear combination of elements from
A
. More specifically,
let the two parity columns be
C
k
= (
r
0
,
r
1
, ...,
r
p
1
)
T
and
6
C
k
+
1
= (
z
0
,
z
1
...,
z
p
1
)
T
. Let
R
=
{
R
0
,
R
1
, ...,
R
p
1
}
and
Z
=
{
Z
0
,
Z
1
, ...,
Z
p
1
}
be two sets such that
R
l
,
Z
l
are
subsets of elements in
A
for all
l
[
0,
p
1
]
. Then for all
l
[
0,
p
1
]
, define
r
l
=
a
R
l
α
a
a
and
z
l
=
a
Z
l
β
a
a
, for
some sets of coefficients
{
α
a
}
,
{
β
a
} ⊆
F
. We call
R
and
Z
as the sets that generate the parity columns.
We assume the code has optimal update, meaning that only
3
elements in the code are updated when an information element
is updated. Under this assumption, the following theorem
characterizes the sets
R
and
Z
.
Theorem 4
For an
(
k
+
2,
k
)
MDS code with optimal update,
the sets
R
and
Z
are partitions of
A
into
p
equally sized sets of
size
k
, where each set in
R
or
Z
contains exactly one element
from each column.
Proof:
Since the code is a
(
k
+
2,
k
)
MDS code, each
information element should appear at least once in each pari
ty
column
C
k
,
C
k
+
1
.
However, since the code has optimal update,
each element appears exactly once in each parity column.
Let
X
R
, note that if
X
contains two entries of
A
from the
systematic column
C
i
,
i
[
0,
k
1
]
, then rebuilding is impos-
sible if columns
C
i
and
C
k
+
1
are erased. Thus
X
contains at
most one entry from each column, therefore
|
X
| ≤
k
.
However
each element of
A
appears exactly once in each parity column,
thus if
|
X
|
<
k
,
X
R
, there is
Y
R
, with
|
Y
|
>
k
, which
leads to a contradiction. Therefore,
|
X
|
=
k
for all
X
R
.
As each information element appears exactly once in the first
parity column,
R
=
{
R
0
, . . . ,
R
p
1
}
is a partition of
A
into
p
equally sized sets of size
k
. Similar proof holds for the sets
Z
=
{
Z
0
, . . . ,
Z
p
1
}
.
By the above theorem, for the
j
-th systematic column
(
a
0
, . . . ,
a
p
1
)
T
, its
p
elements are contained in
p
distinct
sets
R
l
,
l
[
0,
p
1
]
. In other words, the membership of the
j
-th column’s elements in the sets
{
R
l
}
defines a permutation
g
j
:
[
0,
p
1
]
[
0,
p
1
]
, such that
g
j
(
i
) =
l
iff
a
i
R
l
.
Similarly, we can define a permutation
f
j
corresponding to
the second parity column, where
f
j
(
i
) =
l
iff
a
i
Z
l
. For
example, in Figure 2 each systematic column corresponds to
a permutation of the four symbols.
Observing that there is no importance of the elements’
ordering in each column, w.l.o.g. we can assume that the first
parity column contains the sum of each row of
A
and
g
j
’s
correspond to identity permutations, i.e.
r
i
=
k
1
j
=
0
α
i
,
j
a
i
,
j
for some coefficients
{
α
i
,
j
}
. We refer to the first and the
second parity columns as the row parity and the zigzag parity
respectively, likewise
R
l
and
Z
l
,
l
[
0,
p
1
]
, are referred
to as row sets and zigzag sets respectively. We will call
f
j
,
j
[
0,
k
1
]
, zigzag permutations. By assuming that the first
parity column contains the row sums, we want to (1) find
zigzag permutations to minimize the rebuilding ratio; and (
2)
assign the coefficients such that the code is MDS.
First we show that any set of zigzag sets
Z
=
{
Z
0
, ...,
Z
p
1
}
defines a
(
k
+
2,
k
)
MDS array code over a
field
F
large enough.
Theorem 5
Let
A
= (
a
i
,
j
)
be an array of size
p
×
k
and the
zigzag sets be
Z
=
{
Z
0
, ...,
Z
p
1
}
, then there exists a
(
k
+
2,
k
)
MDS array code for
A
with
Z
as its zigzag sets over the field
F
of size greater than
p
(
k
1
) +
1
.
The proof is shown in Appendix A. The above theorem
states that there exist coefficients such that the code is MDS
,
and thus we will focus first on finding proper zigzag per-
mutations
{
f
j
}
. The idea behind choosing the zigzag sets
is as follows: assume a systematic column
(
a
0
,
a
1
, ...,
a
p
1
)
T
is erased. Each element
a
i
is contained in exactly one row
set and one zigzag set. For rebuilding of element
a
i
, access
the parity of its row set or zigzag set. Moreover access the
values of the remaining elements in that set, except
a
i
. We
say that an element
a
i
is rebuilt by a row (zigzag) if the
parity of its row set (zigzag set) is accessed. For example,
in Figure 2 supposing column
1
is erased, one can access the
shaded elements and rebuild its first two elements by rows, an
d
the rest by zigzags. The set
S
=
{
S
0
,
S
1
, ...,
S
p
1
}
is called
a rebuilding set for column
(
a
0
,
a
1
, ...,
a
p
1
)
T
if for each
i
,
S
i
R
Z
and
a
i
S
i
. In order to minimize the number of
accesses to rebuild the erased column, we need to minimize
the size of
| ∪
p
1
i
=
0
S
i
|
,
(4)
which is equivalent to maximizing the number of intersectio
ns
between the sets
{
S
i
}
p
1
i
=
0
. More specifically, the intersections
between the row sets in
S
and the zigzag sets in
S
.
For a
(
k
+
2,
k
)
MDS code
C
with
p
rows define the
rebuilding ratio
R
(
C
)
as the average fraction of accesses in
the surviving systematic and parity nodes while rebuilding
one
systematic node, i.e.,
R
(
C
) =
j
min
S
0
,...,
S
p
1
rebuilds
j
| ∪
p
1
i
=
0
S
i
|
p
(
k
+
1
)
k
.
Notice that in the two parity nodes, we access
p
elements
because each erased element must be rebuilt either by row
or by zigzag. And
p
1
i
=
0
S
i
contains
p
elements in the erased
column. Thus the above expression is exactly the rebuilding
ratio. Define the
ratio function
for all
(
k
+
2,
k
)
MDS codes
with
p
rows as
R
(
k
) =
min
C
R
(
C
)
,
which is the minimal average portion of the array needed to
be accessed in order to rebuild one erased column.
Theorem 6
R
(
k
)
is no less than
1
2
and is a monotone nonde-
creasing function.
The proof is given in Appendix B. For example, the code in
Figure 3 achieves the lower bound of ratio
1/2
, and therefore
R
(
3
) =
1/2
. Moreover, We will see in Corollary 10 that
R
(
k
)
is almost
1/2
for all
k
and
p
=
2
m
, where
m
is large enough.
So far we have discussed the characteristics of an arbitrary
MDS array code with optimal update. Next, let us look at our
code in Construction 1.
Recall that by Theorem 5 this code can be an MDS code
over a field large enough. The ratio of the constructed code
will be proportional to the size of the union of the elements i
n
the rebuilding set in (4). The following theorem gives the ra
tio
for Construction 1 and can be easily derived from Lemma 2
part (i).
7
Theorem 7
The code described in Construction
1
and gener-
ated by the vectors
v
0
,
v
1
, ...,
v
k
1
is a
(
k
+
2,
k
)
MDS array
code with ratio
R
=
1
2
+
k
1
i
=
0
j
6
=
i
|
f
i
(
X
i
)
f
j
(
X
i
)
|
2
m
k
(
k
+
1
)
.
(5)
Next we show the optimal code in Theorem 1 is optimal in
size, namely, it has the maximum number of columns given
the number of rows.
Theorem 8
Let
F
be an orthogonal set of permutations over the
integers
[
0, 2
m
1
]
, then the size of
F
is at most
m
+
1.
Proof:
We will prove it by induction on
m
. For
m
=
0
there is nothing to prove. Let
F
=
{
f
0
,
f
1
, ...,
f
k
1
}
be a
set of orthogonal permutations over the set
[
0, 2
m
1
]
.
We
only need to show that
|
F
|
=
k
m
+
1.
It is trivial to
see that for any permutations
g
,
h
on
[
0, 2
m
1
]
, the set
hFg
=
{
h f
0
g
,
h f
1
g
, ...,
h f
k
1
g
}
is also a set of orthogo-
nal permutations with sets
g
1
(
X
0
)
,
g
1
(
X
1
)
, ...,
g
1
(
X
k
1
)
.
Thus w.l.o.g. we can assume that
f
0
is the identity permutation
and
X
0
= [
0, 2
m
1
1
]
. From the orthogonality we get that
k
1
i
=
1
f
i
(
X
0
) =
X
0
= [
2
m
1
, 2
m
1
]
.
We claim that for any
i
6
=
0,
|
X
i
X
0
|
=
|
X
0
|
2
=
2
m
2
.
Assume the contrary, thus w.l.o.g we can assume that
|
X
i
X
0
|
>
2
m
2
, otherwise
|
X
i
X
0
|
>
2
m
2
.
For any
j
6
=
i
6
=
0
we get that
f
j
(
X
i
X
0
)
,
f
i
(
X
i
X
0
)
X
0
,
(6)
|
f
j
(
X
i
X
0
)
|
=
|
f
i
(
X
i
X
0
)
|
>
2
m
2
=
|
X
0
|
2
.
(7)
From equations (6) and (7) we conclude that
f
j
(
X
i
X
0
)
f
i
(
X
i
X
0
)
6
=
, which contradicts the orthogonality prop-
erty. Define the set of permutations
F
=
{
f
i
}
k
1
i
=
1
over the
set of integers
[
0, 2
m
1
1
]
by
f
i
(
x
) =
f
i
(
x
)
2
m
1
, which
is a set of orthogonal permutations with sets
{
X
i
X
0
}
k
1
i
=
1
.
By induction
k
1
m
and the result follows.
The above theorem implies that the number of rows has to
be exponential in the number of columns in any systematic
code with optimal ratio and optimal update. Notice that the
code in Theorem 1 achieves the
maximum
possible number of
columns,
m
+
1
. Besides, an exponential number of rows is
still practical in storage systems, since they are composed
of
dozens of nodes (disks) each of which has size in an order
of gigabytes. In addition, increasing the number of columns
can be done using duplication (Theorem 9) or a larger set of
vectors (the following example) with a cost of a small increa
se
in the ratio.
Example 2
Let
T
=
{
v
F
m
2
:
k
v
k
1
=
3
}
be the set of
vectors with weight 3 and length
m
. Notice that
|
T
|
=
(
m
3
)
.
Construct the code
C
by
T
according to Construction
1
. Given
v
T
,
|{
u
T
:
|
v
\
u
|
=
3
}|
=
(
m
3
3
)
, which is the number
of vectors with 1’s in different positions as
v
. Similarly,
|{
u
T
:
|
v
\
u
|
=
2
}|
=
3
(
m
3
2
)
and
|{
u
T
:
|
v
\
u
|
=
1
}|
=
3
(
m
3
)
. By Theorem
7
and Lemma
2
, for large
m
the ratio is
1
2
+
2
m
1
(
m
3
)
3
(
m
3
2
)
2
m
(
m
3
)
(
(
m
3
)
+
1
)
1
2
+
9
2
m
.
Note that this code reaches the lower bound of the ratio as
m
tends to infinity, and has
O
(
m
3
)
columns.
IV. C
ODE
D
UPLICATION
In this section, we are going to duplicate the code to increas
e
the number of columns in the constructed
(
k
+
2,
k
)
MDS
codes, such that
k
does not depend on the number of rows,
and ratio is approximately
1
2
. Then we will show the optimality
of the duplication code based on the standard basis.
Let
C
be a
(
k
+
2,
k
)
array code with
p
rows, where the
zigzag sets
{
Z
l
}
p
1
l
=
0
are defined by the set of permutations
{
f
i
}
k
1
i
=
0
on
[
0,
p
1
]
. For an integer
s
, an
s
-
duplication code
C
is an
(
sk
+
2,
sk
)
MDS code with zigzag permutations
defined by duplicating the
k
permutations
s
times each, and
the first parity column is the row sums. In order to make the
code MDS, the coefficients in the parities may be different
from the code
C
. For an
s
-duplication code, denote the column
corresponding to the
t
-th
f
j
as column
j
(
t
)
,
0
t
s
1
.
Call the columns
{
j
(
t
)
:
j
[
0,
k
1
]
}
the
t
-th copy of the
original code. An example of a
2
-duplication of the code in
Figure 3 is illustrated in Figure 4.
Theorem 9
If a
(
k
+
2,
k
)
code
C
has ratio
R
(
C
)
, then its
s
-
duplication code
C
has ratio
R
(
C
)(
1
+
s
1
sk
+
1
)
.
Proof:
We propose a rebuilding algorithm for
C
with
ratio of
R
(
C
)(
1
+
s
1
sk
+
1
)
, which will be shown to be optimal.
Suppose in the optimal rebuilding algorithm of
C
, for
column
i
, elements of rows
J
=
{
j
1
,
j
2
, . . . ,
j
u
}
are rebuilt
by zigzags, and the rest by rows. In
C
, all the
s
columns
corresponding to
f
i
are rebuilt in the same way: the elements
in rows
J
are rebuilt by zigzags.
W.l.o.g. assume column
i
(
0
)
is erased. Since column
i
(
t
)
,
t
[
1,
s
1
]
corresponds to the same zigzag permutation
as the erased column, for the erased element in the
l
-th
row, no matter if it is rebuilt by row or by zigzag, we
have to access the element in the
l
-th row and column
i
(
t
)
(e.g. permutations
f
(
0
)
0
,
f
(
1
)
0
and the corresponding columns
0
(
0
)
, 0
(
1
)
in Figure 4). Hence all the elements in column
i
(
t
)
must be accessed. Moreover, the optimal way to access the
other surviving columns can not be better than the optimal
way to rebuild in the code
C
. Thus the proposed algorithm
has optimal rebuilding ratio.
When column
i
(
0
)
is erased, the average (over all
i
[
0,
k
1
]
) of the number of elements needed to be accessed
in columns
l
(
t
)
, for all
l
[
0,
k
1
]
,
l
6
=
i
and
t
[
0,
s
1
]
is
R
(
C
)
p
(
k
+
1
)
p
.
Here the term
p
corresponds to the access of the parity nodes
in
C
. Moreover, we need to access all the elements in columns
i
(
t
)
, 0
<
t
s
1
, and access
p
elements in the two parity
columns. Therefore, the rebuilding ratio is
R
(
C
) =
s
(
R
(
C
)
p
(
k
+
1
)
p
) + (
s
1
)
p
+
p
p
(
sk
+
1
)
=
R
(
C
)
s
(
k
+
1
)
sk
+
1
=
R
(
C
)(
1
+
s
1
sk
+
1
)
8
Figure 4
.
A
2
-duplication of the code in Figure 3. The code has
6
information nodes and
2
parity nodes. The ratio is
4/7
.
and the proof is completed.
Theorem 9 gives us the ratio of the
s
- duplication of a code
C
as a function of its ratio
R
(
C
)
. As a result, for the optimal-
ratio code in Theorem 1, the ratio of its duplication code is
slightly more than
1/2
, as the following corollary suggests.
Corollary 10
The
s
-duplication of the code in Theorem
1
has
ratio
1
2
(
1
+
s
1
s
(
m
+
1
)+
1
)
, which is
1
2
+
1
2
(
m
+
1
)
for large
s
.
For example, we can rebuild the column
1
(
0
)
in Figure
4 by accessing the elements in rows
{
0, 1
}
and in columns
0
(
0
)
, 2
(
0
)
, 0
(
1
)
, 2
(
1
)
,
R
,
Z
, and all the elements in column
1
(
1
)
.
The rebuilding ratio for this code is
4/7
.
Using duplication we can have
arbitrarily large number of
columns
, independent of the number of rows. Moreover the
above corollary shows that it also has an almost optimal rati
o.
Next we will show that if we restrict ourselves to codes con-
structed using Construction 1 and duplication, the code usi
ng
the standard basis and duplication has optimal asymptotic r
ate.
In order to show that, we define a related graph. Define the
directed graph
D
m
=
D
m
(
V
,
E
)
as
V
=
{
w
F
m
2
:
w
6
=
0
}
,
and
E
=
{
(
w
1
,
w
2
)
:
|
w
2
\
w
1
|
=
1 mod 2
}
. Hence the
vertices are the nonzero binary vectors of length
m
, and
there is a directed edge from
w
1
to
w
2
if
|
w
2
\
w
1
|
is odd
size. From any induced subgraph
H
of
D
m
, we construct
the code
C
(
H
)
from the vertices of
H
using Construction
1. By Lemma 2 we know that a directed edge from
w
1
to
w
2
in
H
means
f
w
2
(
X
w
2
)
f
w
1
(
X
w
2
) =
, so only half
of the information from the column corresponding to
w
1
is
accessed while rebuilding the column corresponding to
w
2
.
For a directed graph
D
=
D
(
V
,
E
)
, let
S
and
T
be two
disjoint subsets of its vertices. We define the density of the
set
S
to be
d
S
=
E
S
|
S
|
2
and the density between
S
and
T
to be
d
S
,
T
=
E
S
,
T
2
|
S
||
T
|
, where
E
S
is the number of edges with both of
its endpoints in
S
, and
E
S
,
T
is the number of edges incident
with a vertex in
S
and a vertex in
T
. The following theorem
shows that the asymptotic ratio of any code constructed usin
g
Construction 1 and duplication is a function of the density o
f
the corresponding graph
H
.
Theorem 11.
Let
H
be an induced subgraph of
D
m
. Let
C
s
(
H
)
be the
s
-duplication of the code constructed using the vertices
of
H
and Construction
1
. Then the asymptotic ratio of
C
s
(
H
)
is
lim
s
R
(
C
s
(
H
)) =
1
d
H
2
Proof:
Let the set of vertices and edges of
H
be
V
(
H
) =
{
v
i
}
and
E
(
H
)
respectively. Denote by
v
l
i
,
v
i
V
(
H
)
,
l
[
0,
s
1
]
, the
l
-th copy of the column corresponding to the
vector
v
i
. In the rebuilding of column
v
l
i
,
l
[
0,
s
1
]
each
remaining systematic column
v
k
j
,
k
[
0,
s
1
]
, needs to access
all of its
2
m
elements unless
|
v
i
\
v
j
|
is odd, and in that case
it only has to access
2
m
1
elements. Hence the total amount
of accessed information for rebuilding this column is
(
s
|
V
(
H
)
| −
1
)
2
m
deg
+
(
v
i
)
s
2
m
1
,
where
deg
+
is the indegree of
v
i
in the induced subgraph
H
.
Averaging over all the columns in
C
s
(
H
)
we get the ratio:
R
(
C
s
(
H
))
=
v
l
i
∈C
s
(
H
)
(
s
|
V
(
H
)
| −
1
)
2
m
deg
+
(
v
i
)
s
2
m
1
s
|
V
(
H
)
|
(
s
|
V
(
H
)
|
+
1
)
2
m
=
s
|
V
(
H
)
|
(
s
|
V
(
H
)
| −
1
)
2
m
s
2
v
i
V
(
H
)
deg
+
(
v
i
)
2
m
1
s
|
V
(
H
)
|
(
s
|
V
(
H
)
|
+
1
)
2
m
=
s
|
V
(
H
)
|
(
s
|
V
(
H
)
| −
1
)
2
m
s
2
|
E
(
H
)
|
2
m
1
s
|
V
(
H
)
|
(
s
|
V
(
H
)
|
+
1
)
2
m
.
Hence
lim
s
R
(
C
s
(
H
)) =
1
|
E
(
H
)
|
2
|
V
(
H
)
|
2
=
1
d
H
2
.
We conclude from Theorem 11 that the asymptotic ratio of
any code using duplication and a set of binary vectors
{
v
i
}
is a
function of the density of the corresponding induced subgra
ph
of
D
m
with
{
v
i
}
as its vertices. Hence the induced subgraph
of
D
m
with maximal density corresponds to the code with
optimal asymptotic ratio. It is easy to check that the induce
d
subgraph with its vertices as the standard basis
{
e
i
}
m
i
=
1
has
density
m
1
m
. In fact this is the maximal possible density
among all the induced subgraph as Theorem 13 suggests, but
in order to show it we will need the following technical lemma
.