SIAM J. COMPUT.                                             © 2008 Society for Industrial and Applied Mathematics
Vol. 37, No. 6, pp. 1842–1864

THE SYMMETRIC GROUP DEFIES STRONG FOURIER SAMPLING

CRISTOPHER MOORE, ALEXANDER RUSSELL, AND LEONARD J. SCHULMAN
Abstract. The dramatic exponential speedups of quantum algorithms over their best existing classical counterparts were ushered in by the technique of Fourier sampling, introduced by Bernstein and Vazirani and developed by Simon and Shor into an approach to the hidden subgroup problem. This approach has proved successful for abelian groups, leading to efficient algorithms for factoring, extracting discrete logarithms, and other number-theoretic problems. We show, however, that this method cannot resolve the hidden subgroup problem in the symmetric groups, even in the weakest, information-theoretic sense. In particular, we show that the Graph Isomorphism problem cannot be solved by this approach. Our work implies that any quantum approach based upon the measurement of coset states must depart from the original framework by using entangled measurements on multiple coset states.
Key words. quantum computing, hidden subgroup problem, graph isomorphism, Fourier sampling

AMS subject classifications. 68Q17, 81R05, 43A65

DOI. 10.1137/050644896

Received by the editors November 11, 2005; accepted for publication (in revised form) November 12, 2007; published electronically March 26, 2008. This work was supported by NSF grants CCR-0093065, PHY-0200909, PHY-0456720, EIA-0218443, EIA-0218563, CCR-0220070, CCR-0220264, CCF-0524828, and CCF-0524613, and ARO grants W911NF-04-R-0009 and W911NF-05-1-0294.
http://www.siam.org/journals/sicomp/37-6/64489.html

Author affiliations: Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, and the Santa Fe Institute, Santa Fe, NM 87501 (moore@cs.unm.edu); Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269 (acr@cse.uconn.edu); Computer Science Department, California Institute of Technology, Pasadena, CA 91125 (schulman@cs.caltech.edu).
1. Introduction: The hidden subgroup problem. Many problems of interest in quantum computing can be reduced to an instance of the hidden subgroup problem (HSP). We are given a group $G$ and a function $f$ with the promise that, for some subgroup $H \le G$, $f$ is invariant precisely under translation by $H$; that is, $f$ is constant on the left cosets of $H$ and takes distinct values on distinct cosets. We then wish to determine the subgroup $H$ by querying $f$. Most algorithms for the HSP use the following approach, referred to as the standard method or Fourier sampling [5].
Step 1. Prepare two registers, the first in a uniform superposition over the elements of $G$ and the second with the value zero, yielding the state
$$|\psi_1\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle \otimes |0\rangle.$$
Step 2. Query (or calculate) the function $f$ defined on $G$ and XOR it with the second register. This entangles the two registers and results in the state
$$|\psi_2\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle \otimes |f(g)\rangle.$$
Step 3. Measure the second register. This puts the first register in a uniform superposition over one of $f$'s level sets, i.e., one of the left cosets of $H$, and disentangles it from the second register. If we observe the value $f(c)$, we have the state $|\psi_3\rangle \otimes |f(c)\rangle$, where
$$|\psi_3\rangle = |cH\rangle = \frac{1}{\sqrt{|H|}} \sum_{h \in H} |ch\rangle.$$
Alternately, we can view the first register as being in a mixed state with density matrix
$$\rho = \frac{1}{|G|} \sum_{c \in G} |cH\rangle\langle cH|.$$

Step 4. Carry out the quantum Fourier transform on $|\psi_3\rangle$ and measure the result; that is, observe the "frequency" corresponding to one of the Fourier basis functions.
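To make the procedure concrete, here is a tiny worked instance (added purely for illustration; the choice of group, subgroup, and labels $a, b$ is ours): take $G = \mathbb{Z}_4$, hidden subgroup $H = \{0, 2\}$, and $f(0) = f(2) = a$, $f(1) = f(3) = b$ with $a \ne b$. Then
$$|\psi_1\rangle = \tfrac{1}{2}\left(|0\rangle + |1\rangle + |2\rangle + |3\rangle\right) \otimes |0\rangle, \qquad |\psi_2\rangle = \tfrac{1}{2}\left(|0\rangle|a\rangle + |1\rangle|b\rangle + |2\rangle|a\rangle + |3\rangle|b\rangle\right).$$
If the second register is observed to be $a$, then $|\psi_3\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |2\rangle)$, and the quantum Fourier transform over $\mathbb{Z}_4$ maps this state to $\tfrac{1}{\sqrt{2}}(|0\rangle + |2\rangle)$ again, so the observed frequency is $0$ or $2$ with equal probability; both outcomes lie in $H^\perp$.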
For example, in Simon's problem [35], $G = \mathbb{Z}_2^n$ and $f$ is an oracle such that, for some $y$, $f(x) = f(x + y)$ for all $x$; in this case $H = \{0, y\}$ and we wish to identify $y$. In Shor's factoring algorithm [34], $G$ is essentially the group $\mathbb{Z}_n^*$, where $n$ is the number we wish to factor, $f(x) = c^x \bmod n$ for a random $c < n$, and $H$ is the subgroup of $\mathbb{Z}_n^*$ whose index is the multiplicative order of $c$. (However, Shor's algorithm does not operate on $\mathbb{Z}_n^*$ directly; indeed, knowing $|\mathbb{Z}_n^*|$ would provide an efficient classical algorithm. Instead, it performs the quantum Fourier transform over $\mathbb{Z}_q$ for some $q = \mathrm{poly}(n)$; see [34] or [13, 14].)
In both Simon's and Shor's algorithms, the group $G$ is abelian and finite. It is not hard to see that, in this case, a polynomial number (i.e., polynomial in $\log |G|$) of experiments of this type determine $H$. In a cyclic group, for instance, the observed frequency is a random multiple of $|G|$ divided by the index of $H$, so we can determine this index with high probability by taking the greatest common divisor of these frequencies. More generally, each experiment yields a random element of the dual space $H^\perp$ perpendicular to $H$'s characteristic function. After $O(\log |G|)$ such experiments, with high probability these elements span $H^\perp$, and we can determine $H$ via linear algebra.
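The cyclic case can be simulated classically in a few lines. The sketch below is added for illustration only; the modulus $N = 60$, the generator $s = 12$ of the hidden subgroup, and the helper name sample_frequency are arbitrary choices, not anything from the text.

```python
import numpy as np
from math import gcd
from functools import reduce

# Classical simulation of Steps 1-4 for G = Z_N with hidden subgroup H = <s>,
# where s divides N, so H has |H| = N/s elements and index s.
rng = np.random.default_rng(0)
N, s = 60, 12          # G = Z_60, H = {0, 12, 24, 36, 48}, index [G : H] = 12

def sample_frequency():
    """One run: random coset state (Step 3), unitary DFT over Z_N, measurement (Step 4)."""
    c = int(rng.integers(N))                      # random coset representative
    state = np.zeros(N, dtype=complex)
    state[(c + np.arange(0, N, s)) % N] = 1.0     # uniform superposition over c + H
    state /= np.linalg.norm(state)
    amplitudes = np.fft.fft(state) / np.sqrt(N)   # QFT over Z_N
    probs = np.abs(amplitudes) ** 2
    return int(rng.choice(N, p=probs / probs.sum()))

freqs = [sample_frequency() for _ in range(10)]
g = reduce(gcd, freqs, N)                         # w.h.p. equals N / index = |H| = 5
print("frequencies:", freqs)
print("gcd =", g, " so the index of H is", N // g)
```

Every sampled frequency is a multiple of $N/s = |H| = 5$, so the greatest common divisor of a handful of samples recovers $|H|$ and hence the index $s = N/|H| = 12$.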
While the nonabelian HSP appears to be much more difficult, it has very attractive applications. In particular, solving the HSP for the symmetric group $S_n$ would provide an efficient quantum algorithm for the Graph Automorphism and Graph Isomorphism problems (see, e.g., Jozsa [21] for a review). Another important motivation is the relationship between the HSP over the dihedral group with hidden shift problems [7] and cryptographically important cases of the shortest lattice vector problem [29].
So far, algorithms for the HSP are known for only a few families of nonabelian groups, including groups whose commutator subgroup is of polynomial size [30, 20]; "smoothly solvable" groups [10]; and some semidirect products of abelian groups [28, 18, 3]. Ettinger and Høyer [8] provided another type of result by showing that Fourier sampling can solve the HSP for the dihedral groups $D_n$ in an information-theoretic sense. That is, a polynomial number of experiments gives enough information to reconstruct the subgroup, though it is unfortunately not known how to determine $H$ from this information in polynomial time.
Extending the notion of Fourier sampling to nonabelian groups requires that we define a nonabelian version of the Fourier transform. For abelian groups, the Fourier basis functions are simply the homomorphisms $\phi: G \to \mathbb{C}$ such as the familiar exponential function $\phi_k(x) = e^{2\pi i k x/n}$ for the cyclic group $\mathbb{Z}_n$. In the nonabelian case, there are not enough such homomorphisms to provide a basis for all $\mathbb{C}$-valued functions on $G$. To create such a basis, we generalize to the representations of the group, namely, homomorphisms $\rho: G \to \mathrm{U}(V)$, where $\mathrm{U}(V)$ is the group of unitary matrices acting on some $\mathbb{C}$-vector space $V$ of dimension $d_\rho$. It suffices to consider irreducible representations, namely, those for which no nontrivial subspace of $V$ is fixed by the various operators $\rho(g)$; Fourier analysis over abelian groups then corresponds to the special case where all irreducible representations have dimension one, the single entry in these $1 \times 1$ matrices being the values of the Fourier basis functions. In general, once a basis for each irreducible $\rho$ is chosen, the matrix elements $\rho_{ij}$ provide an orthogonal basis for the vector space of all $\mathbb{C}$-valued functions on $G$.
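As a concrete instance (an illustrative aside, not drawn from the text): the smallest nonabelian group $S_3$ has three irreducible representations, the trivial and sign representations of dimension one and a two-dimensional representation $\tau$, and the corresponding matrix-element functions supply
$$1^2 + 1^2 + 2^2 = 6 = |S_3|$$
orthogonal functions, exactly enough for a basis of the $\mathbb{C}$-valued functions on $S_3$.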
The quantum Fourier transform then consists of transforming (unit-length) vectors in $\mathbb{C}[G] = \{\sum_{g \in G} \alpha_g |g\rangle \mid \alpha_g \in \mathbb{C}\}$ from the basis $\{|g\rangle \mid g \in G\}$ to the basis $\{|\rho, i, j\rangle\}$, where $\rho$ is the name of an irreducible representation and $1 \le i, j \le d_\rho$ index a row and a column (in a chosen basis for $V$). Note, however, that a representation $\rho: G \to \mathrm{U}(V)$ does not intrinsically distinguish any specific basis for the underlying space $V$ and, for high-dimensional representations, this appears to require a rather dramatic choice on the part of the transform designer. For instance, in a group such as $S_n$, in most bases a typical representation $\rho(g)$ is a dense matrix of exponential size, but for a carefully chosen basis it is sparse and highly structured. Making such choices of bases allows us to efficiently carry out the quantum Fourier transform for a wide variety of groups [4, 17, 27].
Since the work of [15, 12], the most fundamental question concerning the HSP has been whether there is a basis for the irreducible representations of a given group such that measuring coset states in this basis provides enough information to determine $H$ and, if so, whether this information can be extracted by an efficient algorithm. This framework is known as strong Fourier sampling. In this article, we answer this question in the negative for the symmetric group $S_n$, showing that this process cannot distinguish relevant subgroups from each other, or from the trivial subgroup, even information-theoretically. Indeed, we show that no measurement whatsoever, including arbitrary positive operator-valued measurements (POVMs), on single coset states can succeed. We remark that the subgroups on which we focus are among the most important special cases of the HSP, as they are those to which Graph Isomorphism naturally reduces.
Related work. The terminology "strong Fourier sampling" [12] was invented to distinguish this approach from the natural variant, called weak Fourier sampling, where one only measures the name of the representation $\rho$ and ignores the row and column information. Weak Fourier sampling is basis-independent, making it attractive from the standpoint of analysis; however, it cannot distinguish conjugate subgroups from each other, and Hallgren, Russell, and Ta-Shma [15] showed that it cannot distinguish the trivial subgroup from an order-2 subgroup of $S_n$ consisting of $n/2$ disjoint transpositions. Specifically, they used character bounds to show that the probability distributions obtained on representation names for the trivial and order-2 subgroups are exponentially close in total variation distance: thus one needs an exponential number of such experiments to distinguish them. Kempe and Shalev [22] have generalized this result to other conjugacy classes and conjectured that one can do no better than classical computation with this approach.
In an effort to shed light on the power of strong Fourier sampling, Grigni et al. [12] showed that, for groups such as $S_n$, measuring in a random basis yields an exponentially small amount of information. This can be explained, roughly, by the fact that projecting a vector into a sufficiently high-dimensional random subspace results in tightly concentrated length. On the other hand, Moore et al. [28] showed that for the affine groups and some $q$-hedral groups, measuring in a well-chosen basis can solve the HSP in cases where random bases cannot.
Our contribution. In this paper we show that strong Fourier sampling, in an arbitrary basis of the algorithm designer's choice, cannot solve the HSP for $S_n$. As in [15] we focus on order-2 subgroups of the form $\{1, m\}$, where $m$ is an involution consisting of $n/2$ disjoint transpositions. We show that strong Fourier sampling (and, more generally, arbitrary measurements of single coset states) cannot distinguish most subgroups of this form from each other, or from the trivial subgroup, without an exponential number of experiments.
The motivation for looking at this case of the HSP is as follows. If we fix two rigid connected graphs of size $n$, then the automorphism group $H$ of their disjoint union is a subgroup of $S_{2n}$. If they are isomorphic, then $H$ is of the form $\{1, m\}$, where $m$ is the involution that swaps the two graphs, while if they are nonisomorphic, then $H$ is trivial. This yields a classical reduction from Graph Isomorphism to Graph Automorphism, and our results preclude a quantum algorithm for the latter problem that works by reducing to the HSP on the symmetric group.
However, the involutions $m$ which switch the two graphs are not generic elements of the conjugacy class in $S_{2n}$ consisting of $n$ disjoint transpositions, since they switch the first $n$ vertices with the last $n$ vertices. The set of such elements forms a conjugacy class in the wreath product $S_n \wr \mathbb{Z}_2 \subset S_{2n}$, and it is the HSP on this group, rather than all of $S_{2n}$, to which Graph Isomorphism naturally reduces. To address the possibility of a quantum algorithm that uses this reduction, we present an additional result showing that this case of the HSP also requires an exponential number of experiments.
We remark that our results do not preclude the existence of an efficient quantum algorithm for the HSP on $S_n$ or $S_n \wr \mathbb{Z}_2$. Rather, they force us to either abandon coset states or consider multiregister algorithms, in which we prepare multiple coset states and subject them to entangled measurements, rather than performing a product measurement where each coset state is measured independently. Some progress in this direction has been made: Ettinger, Høyer, and Knill [9] showed that the HSP on arbitrary groups can be solved information-theoretically with a polynomial number of coset states, and two of the present authors have shown how to carry out such a measurement in the Fourier basis [25]. Kuperberg [24] devised a subexponential ($2^{O(\sqrt{\log n})}$) algorithm for the HSP on the dihedral group $D_n$ that uses entangled measurements, and Alagic, Moore, and Russell [1] obtained a similar algorithm for the HSP on groups of the form $G^n$. Bacon, Childs, and van Dam determined the optimal multiregister measurement for the dihedral group [2] (see also [26]) and used this approach to construct an algorithm for a class of semidirect product groups [3]. Whether a similar approach can be applied to the symmetric group is a major open question. Hallgren et al. [16] have shown, however, that no family of measurements across $o(n \log n)$ coset states can distinguish $H = \{1, m\}$ from the trivial group in $S_n$ with a polynomial number of repetitions. We remark that in light of the upper bounds of [9, 25], $O(n \log n)$ coset states do, at least information-theoretically, determine the answer. Constructing such highly entangled measurements poses a major conceptual challenge, and it is far from clear in what cases they can be carried out efficiently. Of course, it is also possible that a completely different approach, one which does not use coset states, or which does not start by reducing to the HSP, will provide an efficient quantum algorithm for Graph Isomorphism.
The paper is organized as follows. In section 2 we give a brief introduction to representation theory and nonabelian Fourier analysis. In section 3 we discuss the general structure of quantum measurements on coset states and show that the optimal measurement takes the form of strong Fourier sampling. In section 4 we show how to bound the variance of the resulting probability distributions with respect to the choice of hidden subgroup. In section 5 we record some specific facts about the representations of the symmetric group, and in section 6 we use these facts to show that an exponential number of measurements are necessary. Finally, in section 7 we adapt the argument for the specific family of involutions relevant to Graph Isomorphism.
2. Fourier analysis over finite groups. We briefly discuss the elements of the representation theory of finite groups. Our treatment is primarily for the purposes of setting down notation; we refer the reader to [11, 33] for complete accounts.

Let $G$ be a finite group. A representation $\rho$ of $G$ is a homomorphism $\rho: G \to \mathrm{U}(V)$, where $V$ is a finite-dimensional Hilbert space and $\mathrm{U}(V)$ is the group of unitary operators on $V$. The dimension of $\rho$, denoted $d_\rho$, is the dimension of the vector space $V$. By choosing a basis for $V$, we can then identify $\rho(g)$ with a unitary $d_\rho \times d_\rho$ matrix so that for every $g, h \in G$, $\rho(gh) = \rho(g) \cdot \rho(h)$.
Fixing a representation $\rho: G \to \mathrm{U}(V)$, we say that a subspace $W \subseteq V$ is invariant if $\rho(g)W \subseteq W$ for all $g \in G$. We say $\rho$ is irreducible if it has no invariant subspaces other than the trivial space $\{0\}$ and $V$. If two representations $\rho$ and $\sigma$ are the same up to a unitary change of basis, we say that they are equivalent. It is a fact that any finite group $G$ has a finite number of distinct irreducible representations up to equivalence, and, for a group $G$, we let $\hat{G}$ denote a set of representations containing exactly one from each equivalence class. The irreducible representations of $G$ give rise to the Fourier transform. Specifically, for a function $f: G \to \mathbb{C}$ and an element $\rho \in \hat{G}$, define the Fourier transform of $f$ at $\rho$ to be
$$\hat{f}(\rho) = \sqrt{\frac{d_\rho}{|G|}} \sum_{g \in G} f(g)\, \rho(g).$$
The leading coefficients are chosen to make the transform unitary, so that it preserves inner products:
$$\langle f_1, f_2 \rangle = \sum_{g} \overline{f_1(g)}\, f_2(g) = \sum_{\rho \in \hat{G}} \operatorname{tr}\!\left( \hat{f}_1(\rho)^\dagger \cdot \hat{f}_2(\rho) \right).$$
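For instance, specializing to the cyclic group (an illustrative aside): for $G = \mathbb{Z}_n$ every irreducible representation is one-dimensional, $\rho_k(g) = e^{2\pi i k g/n}$, and the definition above becomes the familiar unitary discrete Fourier transform
$$\hat{f}(k) = \frac{1}{\sqrt{n}} \sum_{g=0}^{n-1} f(g)\, e^{2\pi i k g/n},$$
for which the displayed identity is the usual Parseval relation.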
Given a representation $\rho$ and pair of integers $1 \le i, j \le d_\rho$, we can associate a basis vector $|\rho, i, j\rangle$, which assigns the matrix entry $\rho(g)_{i,j}$ to each element $g$. As described above, these form an orthonormal basis for $\mathbb{C}[G]$, which implies
$$\sum_{\rho \in \hat{G}} d_\rho^2 = |G|.$$
In the case when $\rho$ is not irreducible, it can be decomposed into a direct sum of irreducible representations, each of which operates on an invariant subspace. We write $\rho = \sigma_1 \oplus \cdots \oplus \sigma_k$ and, for the $\sigma_i$ appearing at least once in this decomposition, $\sigma_i \prec \rho$. In general, a given $\sigma$ can appear multiple times, in the sense that $\rho$ can have an invariant subspace isomorphic to the direct sum of $a^\rho_\sigma$ copies of $\sigma$. In this case $a^\rho_\sigma$ is called the multiplicity of $\sigma$ in $\rho$, and we write
$$\rho = \bigoplus_{\sigma \prec \rho} a^\rho_\sigma\, \sigma.$$
For a representation $\rho$ we define its character as the trace $\chi_\rho(g) = \operatorname{tr} \rho(g)$. Since the trace is invariant under conjugation, characters are constant on the conjugacy classes, and if $\mathbf{m}$ is a conjugacy class, we write $\chi_\rho(\mathbf{m}) = \chi_\rho(m)$, where $m$ is any element of $\mathbf{m}$. Characters are a powerful tool for reasoning about the decomposition of reducible representations. In particular, for $\rho, \sigma \in \hat{G}$, we have the orthogonality conditions
$$\langle \chi_\rho, \chi_\sigma \rangle_G = \frac{1}{|G|} \sum_{g \in G} \overline{\chi_\rho(g)}\, \chi_\sigma(g) = \begin{cases} 1 & \rho = \sigma, \\ 0 & \rho \ne \sigma. \end{cases}$$
If $\rho$ is reducible, we have $\chi_\rho = \sum_{\sigma \prec \rho} a^\rho_\sigma \chi_\sigma$, and so the multiplicity $a^\rho_\sigma$ is given by $a^\rho_\sigma = \langle \chi_\rho, \chi_\sigma \rangle_G$.
If $\rho$ is irreducible, Schur's lemma asserts that the only matrices which commute with $\rho(g)$ for all $g$ are the scalars, $\{c\mathbb{1} \mid c \in \mathbb{C}\}$. Therefore, for any $A$ we have
$$(2.1) \qquad \frac{1}{|G|} \sum_{g \in G} \rho(g)\, A\, \rho(g)^\dagger = \frac{\operatorname{tr} A}{d_\rho}\, \mathbb{1}_{d_\rho},$$
since conjugating this sum by $\rho(g)$ simply permutes its terms. In particular, consider the average of $\rho$ over a conjugacy class $\mathbf{m}$, which we denote $\rho(\mathbf{m})$:
$$\rho(\mathbf{m}) = \operatorname{Exp}_{m \in \mathbf{m}} \rho(m) = \operatorname{Exp}_g\, \rho(g^{-1} m g) = \operatorname{Exp}_g\, \rho(g)^\dagger \rho(m)\, \rho(g).$$
Then since $\operatorname{tr} \rho(\mathbf{m}) = \chi_\rho(\mathbf{m})$, we have
$$(2.2) \qquad \rho(\mathbf{m}) = \frac{\chi_\rho(\mathbf{m})}{d_\rho}\, \mathbb{1}_{d_\rho}.$$
Similarly, if $\rho$ is reducible, $\rho(\mathbf{m})$ is scalar in each irreducible subspace, giving
$$(2.3) \qquad \rho(\mathbf{m}) = \sum_{\sigma \prec \rho} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma}\, \Pi^\rho_\sigma,$$
where $\Pi^\rho_\sigma$ projects onto the subspace $a^\rho_\sigma \sigma$ spanned by copies of $\sigma$. We use these facts below.
There is a natural product operation on representations: if $\rho: G \to \mathrm{U}(V)$ and $\sigma: G \to \mathrm{U}(W)$ are representations of $G$, we may define a new representation $\rho \otimes \sigma: G \to \mathrm{U}(V \otimes W)$ by extending the rule $(\rho \otimes \sigma)(g): u \otimes v \mapsto \rho(g)u \otimes \sigma(g)v$. In general, the representation $\rho \otimes \sigma$ is not irreducible, even when both $\rho$ and $\sigma$ are. This leads to the Clebsch–Gordan problem, that of decomposing $\rho \otimes \sigma$ into irreducibles. For example, since $\chi_{\rho \otimes \sigma}(g) = \chi_\rho(g) \cdot \chi_\sigma(g)$, the multiplicity of $\tau$ in $\rho \otimes \sigma$ is $\langle \chi_\tau, \chi_\rho \chi_\sigma \rangle_G$.
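As a small worked example (ours, for illustration): let $\tau$ be the two-dimensional irreducible representation of $S_3$, whose character takes the values $2, 0, -1$ on the identity, the three transpositions, and the two 3-cycles, respectively. Then $\chi_{\tau \otimes \tau} = \chi_\tau^2$ takes the values $4, 0, 1$, and
$$\langle \chi_{\mathrm{triv}}, \chi_\tau^2 \rangle_{S_3} = \frac{1 \cdot 4 + 3 \cdot 0 + 2 \cdot 1}{6} = 1, \qquad \langle \chi_{\mathrm{sign}}, \chi_\tau^2 \rangle_{S_3} = 1, \qquad \langle \chi_\tau, \chi_\tau^2 \rangle_{S_3} = 1,$$
so $\tau \otimes \tau \cong \mathrm{triv} \oplus \mathrm{sign} \oplus \tau$, consistent with the dimension count $2 \cdot 2 = 1 + 1 + 2$.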
Group elements can act on each other on the left or right. Thus we can consider subspaces of $\mathbb{C}[G]$ that are invariant under left multiplication, right multiplication, or both; these subspaces are called left-, right-, or bi-invariant, respectively. Each $\rho \in \hat{G}$ corresponds to a $d_\rho^2$-dimensional bi-invariant subspace of $\mathbb{C}[G]$. We can think of the bi-invariant subspace as a single $d_\rho^2$-dimensional representation, consisting of the space of $d_\rho \times d_\rho$ matrices $A$. If $\rho(g)$ acts on $A$ by left or right multiplication, the left- and right-invariant subspaces correspond to $A$'s columns and rows, respectively; for instance, each column of $A$ is acted on independently by left multiplication by $\rho(g)$, and the space of matrices $A$ which are nonzero only in this column form a $d_\rho$-dimensional left-invariant subspace. Thus, each bi-invariant subspace can be decomposed into $d_\rho$ $d_\rho$-dimensional left-invariant subspaces, or (transversely) $d_\rho$ $d_\rho$-dimensional right-invariant subspaces.

However, this decomposition is not unique. If we think of $A$ as the space of linear operators on the same $d_\rho$-dimensional vector space $V$ on which $\rho$ acts, changing the orthonormal basis for $V$ transforms the matrices $A$. Thus, each orthonormal basis $B$ of $V$ gives a way to divide the bi-invariant subspace into left-invariant columns and right-invariant rows, and each such subspace is associated with some basis vector $b \in B$.
3. The structure of the optimal measurement. In this section we show that starting with a single coset state, the optimal measurement for the HSP is precisely an instance of strong Fourier sampling (possibly in an overcomplete basis). This has been pointed out several times in the past, at varying levels of explicitness [19, 24]; we state it here for completeness. Everything we say in this section is true for the HSP in general. However, for simplicity we focus on the special case of the HSP called the hidden conjugate problem in [28]: there is a (nonnormal) subgroup $H$, and we are promised that the hidden subgroup is one of its conjugates, $H^g = g^{-1} H g$ for some $g \in G$.

We may treat the states arising after Step 3 of the procedure above as elements of the group algebra $\mathbb{C}[G]$. We use the notation $|g\rangle = 1 \cdot g \in \mathbb{C}[G]$ so that the vectors $|g\rangle$ form an orthonormal basis for $\mathbb{C}[G]$. Given a set $S \subseteq G$, $|S\rangle$ denotes a uniform superposition over the elements of $S$, $|S\rangle = (1/\sqrt{|S|}) \sum_{s \in S} |s\rangle$.
3.1. The optimal POVM consists of strong Fourier sampling. The most general type of measurement allowed in quantum mechanics is a POVM. A POVM with a set of possible outcomes $J$ consists of a set of positive operators $\{M_j \mid j \in J\}$ subject to the completeness condition
$$(3.1) \qquad \sum_j M_j = \mathbb{1}.$$
Since positive operators are self-adjoint, they can be orthogonally diagonalized, and since their eigenvalues are positive, they can be written as a positive linear combination of projection operators (see, e.g., [32, sect. 10]). Any POVM may thus be refined so that each $M_j = a_j \mu_j$, where $\mu_j$ is a projection operator and $a_j$ is positive and real.

The result of this refined measurement on the state $|\psi\rangle$ is a random variable, taking values in $J$, that is equal to $j \in J$ with probability $P_j = a_j \langle \psi | \mu_j | \psi \rangle$. Note that the outcomes $j$ need not correspond to subgroups directly; the algorithm designer is free to carry out $t$ of these experiments (where $t$ is, ideally, polynomial), observing outcomes $j_1, \ldots, j_t$, and then apply some additional computation to find the most likely subgroup given these observations.
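The refinement just described can be illustrated numerically; the sketch below is ours (plain linear algebra, with an arbitrarily chosen two-outcome POVM and state), not anything from the text.

```python
import numpy as np

# Refine a two-outcome POVM {M_1, M_2} on C^2 into weighted rank-one projectors
# a_j * mu_j, then compute the refined outcome probabilities a_j <psi|mu_j|psi>.
M1 = np.array([[0.9, 0.2], [0.2, 0.4]])
M2 = np.eye(2) - M1                         # completeness: M_1 + M_2 = identity
refined = []                                # list of (a_j, mu_j)
for M in (M1, M2):
    eigvals, eigvecs = np.linalg.eigh(M)    # positive operator -> orthogonal eigenbasis
    for a, v in zip(eigvals, eigvecs.T):
        if a > 1e-12:
            refined.append((a, np.outer(v, v)))   # M = sum_j a_j |v_j><v_j|

psi = np.array([0.6, 0.8])                  # a unit state |psi>
probs = [a * psi @ mu @ psi for a, mu in refined]
print(np.round(probs, 4), "sum =", round(sum(probs), 6))
```

The refined probabilities sum to $\langle \psi | (M_1 + M_2) | \psi \rangle = 1$, as they must.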
If $g$ is chosen from $G$ uniformly so that the hidden subgroup is a uniformly random conjugate of $H$, we wish to find a POVM that maximizes the probability of correctly identifying $g$ from the coset state $|H^g\rangle$. (Of course, to identify a conjugate $H^g$, we need only specify $g$ up to an element of the normalizer of $H$.) Since a random left coset of $H^g$ can be written $cgH^g = cHg$ for a random $c \in G$, the probability we observe outcome $j$ is
$$(3.2) \qquad P_j = a_j \frac{1}{|G|} \sum_{c \in G} \langle cHg |\, \mu_j\, | cHg \rangle.$$
Ip [19] observed that in the special case that each outcome $j$ corresponds to a subgroup, maximizing the probability that $j$ is correct subject to the constraint (3.1) gives a semidefinite program. Since such programs are convex, the optimum is unique and is a fixed point of any symmetries possessed by the problem.
However, our proof relies on an elementary "symmetrization" argument. Given a group element $x \in G$, let $L_x |g\rangle = |xg\rangle$ denote the unitary matrix corresponding to left group multiplication by $x$. In particular, applying $L_x$ maps one left coset onto another: $|cHg\rangle = L_c |Hg\rangle$. Writing
$$P_j = a_j \frac{1}{|G|} \sum_{c \in G} \langle cHg |\, \mu_j\, | cHg \rangle = a_j \left\langle Hg \,\middle|\, \frac{1}{|G|} \sum_{c \in G} L_c^\dagger\, \mu_j\, L_c \,\middle|\, Hg \right\rangle,$$
we conclude that replacing $\mu_j$ for each $j$ with the symmetrization
$$\overline{\mu}_j = \frac{1}{|G|} \sum_{g \in G} L_g^\dagger\, \mu_j\, L_g$$
does not change the resulting probability distribution $P_j$. Since $\overline{\mu}_j$ commutes with $L_x$ for every $x \in G$ and provides exactly the same information as the original $\mu_j$, we may assume without loss of generality that the optimal POVM commutes with $L_x$ for every $x \in G$.
It is easy to see that any projection operator that commutes with left multiplication projects onto a left-invariant subspace of $\mathbb{C}[G]$, and we can further refine the POVM so that each $\mu_j$ projects onto an irreducible left-invariant subspace. Each such space is contained in the bi-invariant subspace corresponding to some irreducible representation $\rho$, in which case we write $\operatorname{im} \mu_j \prec \rho$. As discussed in section 2, a given irreducible left-invariant subspace corresponds to some unit vector $b$ in the vector space $V$ on which $\rho$ acts. Thus we can write
$$\mu_j = |b_j\rangle\langle b_j| \otimes \mathbb{1}_{d_\rho},$$
where $\mathbb{1}_{d_\rho}$ is the identity operator on that left-invariant subspace. Let $B = \{b_j \mid \operatorname{im} \mu_j \prec \rho\}$; then (3.1) implies a completeness condition for each $\rho \in \hat{G}$,
$$(3.3) \qquad \sum_{b_j \in B} a_j |b_j\rangle\langle b_j| = \mathbb{1}_{d_\rho},$$
and so $B$ is a (possibly overcomplete) basis for $V$. In other words, the optimal POVM consists of first measuring the representation name $\rho$ and then performing a POVM on the vector space $V$ with the set of possible outcomes $B$. Another way to see this is to regard the choice of coset as a mixed state; then its density matrix is block-diagonal in the Fourier basis, and so as Kuperberg puts it [24], measuring the representation name "sacrifices no entropy."
We note that in the special case that this POVM is a von Neumann measurement, that is, when $B$ is an orthonormal basis for $V$, it corresponds to measuring the column of $\rho$ in that basis, which is how strong Fourier sampling is usually defined. (As pointed out in [12], nothing is gained by measuring the row, since we have a random left coset $cHg$, and left-multiplying by a random element $c$ in an irreducible representation completely mixes the probability across the rows in each column. Here this is reflected by the fact that each $\mu_j$ is a scalar in its left-invariant subspace.) However, in general the optimal measurement might consist of an overcomplete basis, or frame, in each $\rho$, consisting of vectors $b_j$ with weights $a_j$.
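A minimal example of such a frame (ours, for illustration): in a two-dimensional representation space $V = \mathbb{C}^2$, the three "trine" vectors $b_j = (\cos(2\pi j/3), \sin(2\pi j/3))$, $j = 0, 1, 2$, with equal weights $a_j = 2/3$ satisfy
$$\sum_{j=0}^{2} \frac{2}{3}\, |b_j\rangle\langle b_j| = \mathbb{1}_2,$$
so they form an overcomplete frame obeying the completeness condition (3.3).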
Now that we know $\mu_j$ takes this form, let us change notation. Given $\rho \in \hat{G}$ acting on a vector space $V$ and a unit vector $b \in V$, let $\Pi^\rho_b = |b\rangle\langle b| \otimes \mathbb{1}_{d_\rho}$ denote the projection operator onto the left-invariant subspace corresponding to $b$. Then $\mu_j = \Pi^\rho_{b_j}$, and (3.2) becomes
$$(3.4) \qquad P_j = a_j \frac{1}{|G|} \sum_{c \in G} \left\| \Pi^\rho_{b_j} |cHg\rangle \right\|^2 = a_j \left\| \Pi^\rho_{b_j} |Hg\rangle \right\|^2.$$
We can write this as the product of the probability $P(\rho)$ that we observe $\rho$, times the conditional probability $P(\rho, b_j)$ that we observe $b_j$. Note that by (3.3),
$$\Pi_\rho = \sum_{b_j \in B} a_j \Pi^\rho_{b_j}$$
is the projection operator onto the bi-invariant subspace corresponding to $\rho$. Then $P_j = P(\rho)\, P(\rho, b_j)$, where
$$(3.5) \qquad P(\rho) = \left\| \Pi_\rho |Hg\rangle \right\|^2,$$
$$(3.6) \qquad P(\rho, b_j) = a_j \left\| \Pi^\rho_{b_j} |Hg\rangle \right\|^2 \big/ P(\rho).$$
Note that $P(\rho, b_j)$ depends on $g$, but $P(\rho)$ does not, which is why weak sampling is incapable of distinguishing conjugate subgroups.
3.2. The probability distribution for a conjugate subgroup. Now let us use the fact that $|H\rangle$ is a superposition over a subgroup and calculate $P(\rho)$ and $P(\rho, b_j)$ as defined in (3.5) and (3.6). This will set the stage for asking whether we can distinguish different conjugates of $H$ from each other or from the trivial subgroup.

Fix an irreducible representation $\rho$ that acts on a vector space $V$. Then Fourier transforming the state
$$|H\rangle = \frac{1}{\sqrt{|H|}} \sum_{h \in H} |h\rangle$$
yields the coefficient
$$\hat{H}(\rho) = \sqrt{\frac{d_\rho}{|H|\,|G|}} \sum_{h \in H} \rho(h) = \sqrt{\frac{d_\rho |H|}{|G|}}\, \Pi_H,$$
where $\Pi_H = (1/|H|) \sum_{h \in H} \rho(h)$ is a projection operator onto a subspace of $V$. The probability that we observe $\rho$ is then the norm squared of this coefficient,
$$(3.7) \qquad P(\rho) = \left\| \hat{H}(\rho) \right\|^2 = \frac{d_\rho |H|}{|G|} \operatorname{rk} \Pi_H,$$
and, as stated above, this is the same for all conjugates $H^g$. The conditional probability that we observe the vector $b_j$, given that we observe $\rho$, is then
$$(3.8) \qquad P(\rho, b_j) = \frac{a_j \left\| \Pi^\rho_{b_j} |H\rangle \right\|^2}{P(\rho)} = \frac{a_j \left\| \hat{H}(\rho)\, b_j \right\|^2}{P(\rho)} = \frac{a_j \left\| \Pi_H\, b_j \right\|^2}{\operatorname{rk} \Pi_H}.$$
In the case where $H$ is the trivial subgroup, $\Pi_H = \mathbb{1}_{d_\rho}$ and $P(\rho, b_j)$ is given by
$$(3.9) \qquad P(\rho, b_j) = \frac{a_j}{d_\rho}.$$
We call this the natural distribution on the frame $B = \{b_j\}$. In the case that $B$ is an orthonormal basis, $a_j = 1$ and $P(\rho, b_j)$ is simply the uniform distribution on $B$.
This probability distribution over $B$ changes for a conjugate $H^g$ in the following way. The Fourier transform of $|Hg\rangle$ is
$$\widehat{Hg}(\rho) = \sqrt{\frac{d_\rho |H|}{|G|}}\, \Pi_H\, \rho(g),$$
and we have
$$(3.10) \qquad P(\rho, b_j) = \frac{a_j \left\| \Pi_H (g b_j) \right\|^2}{\operatorname{rk} \Pi_H},$$
where we write $g b$ for $\rho(g) b$.

Our goal is to understand, for each fixed $b$, to what extent $P(\rho, b)$ varies with $g$, and so to what extent measurements of this type can distinguish the conjugates $H^g$ from each other. Regarding this as a random variable over the choice of $g$, its expectation is easy to calculate: we have
$$\operatorname{Exp}_g \left\| \Pi_H (g b) \right\|^2 = \operatorname{Exp}_g \left\langle b,\, \rho(g)^\dagger\, \Pi_H\, \rho(g)\, b \right\rangle = \left\langle b,\, \left( \operatorname{Exp}_g\, \rho(g)^\dagger\, \Pi_H\, \rho(g) \right) b \right\rangle = \frac{\operatorname{rk} \Pi_H}{d_\rho},$$
where we used (2.1), $\|b\|^2 = 1$, and the fact that the trace of a projection operator is its rank. Combining this with (3.10), the expected probability is simply the natural distribution (3.9),
$$\operatorname{Exp}_g P(\rho, b_j) = \frac{a_j}{d_\rho}.$$
We wish to show that $\left\| \Pi_H (g b) \right\|^2$, and therefore $P(\rho, b_j)$, is in fact very close to its expectation for most conjugates. In the next section, we present our primary technical contribution, which is a method for establishing concentration results for this random variable.
4. The variance of projection through a random involution. In this section we focus on the case where $H = \{1, m\}$ for an element $m$ chosen uniformly at random from a fixed conjugacy class $\mathbf{m}$ of involutions. (Observe that order is preserved under conjugation so that if $m$ is an involution, then so are all elements of $\mathbf{m}$.) Given an irreducible representation $\rho: G \to \mathrm{U}(V)$ and a vector $b \in V$, we bound the variance, over the choice of $m \in \mathbf{m}$, of the probability $P_m(\rho, b)$ that $b$ is observed given that we observed $\rho$. Our key insight is that this variance depends on how the tensor product representation $\rho \otimes \overline{\rho}$ decomposes into irreducible representations $\sigma$, and how the vector $b \otimes \overline{b}$ projects into these constituent subspaces.
Recall that, if a representation $\rho$ is reducible, it can be written as an orthogonal direct sum of irreducibles, $\rho = \bigoplus_{\sigma \prec \rho} a^\rho_\sigma \sigma$, where $a^\rho_\sigma$ is the multiplicity of $\sigma$. We let $\Pi^\rho_\sigma$ denote the projection operator whose image is $a^\rho_\sigma \sigma$, that is, the span of all the irreducible subspaces isomorphic to $\sigma$.
Lemma 4.1. Let $\rho$ be a representation of a group $G$ acting on a space $V$ and let $b \in V$. Let $m$ be an element chosen uniformly from a conjugacy class $\mathbf{m}$ of involutions. If $\rho$ is irreducible, then
$$\operatorname{Exp}_{m \in \mathbf{m}} \langle b, m b \rangle = \frac{\chi_\rho(\mathbf{m})}{d_\rho} \|b\|^2.$$
If $\rho$ is reducible, then
$$\operatorname{Exp}_{m \in \mathbf{m}} \langle b, m b \rangle = \sum_{\sigma \prec \rho} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma} \left\| \Pi^\rho_\sigma b \right\|^2.$$
Proof. Let $\rho(\mathbf{m})$ denote the average of $\rho$ over the conjugacy class $\mathbf{m}$. Using (2.2), we have
$$\operatorname{Exp}_m \langle b, m b \rangle = \langle b, \rho(\mathbf{m})\, b \rangle = \frac{\chi_\rho(\mathbf{m})}{d_\rho} \|b\|^2.$$
Similarly, if $\rho$ is reducible, by (2.3) we have
$$\operatorname{Exp}_m \langle b, m b \rangle = \langle b, \rho(\mathbf{m})\, b \rangle = \sum_{\sigma \prec \rho} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma} \langle b, \Pi^\rho_\sigma b \rangle = \sum_{\sigma \prec \rho} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma} \left\| \Pi^\rho_\sigma b \right\|^2.$$
Turning now to the second moment of $\langle b, m b \rangle$, we observe that
$$|\langle b, m b \rangle|^2 = \langle b, m b \rangle\, \overline{\langle b, m b \rangle} = \langle b \otimes \overline{b},\, m b \otimes m \overline{b} \rangle = \langle b \otimes \overline{b},\, m (b \otimes \overline{b}) \rangle,$$
where the action of $m$ on the vector $b \otimes \overline{b}$ is precisely given by the action of $G$ in the representation $\rho \otimes \overline{\rho}$. This will allow us to express the second moment of the inner product $\langle b, m b \rangle$ in terms of the projections of $b \otimes \overline{b}$ into the irreducible constituents of the tensor product representation $\rho \otimes \overline{\rho}$.
Lemma 4.2. Let $\rho$ be a representation of a group $G$ acting on a space $V$ and let $b \in V$. Let $m$ be an element chosen uniformly at random from a conjugacy class $\mathbf{m}$ of involutions. Then
$$\operatorname{Exp}_{m \in \mathbf{m}} |\langle b, m b \rangle|^2 = \sum_{\sigma \prec \rho \otimes \overline{\rho}} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma} \left\| \Pi^{\rho \otimes \overline{\rho}}_\sigma (b \otimes \overline{b}) \right\|^2.$$
Proof. We write the second moment as a first moment over the product representation $\rho \otimes \overline{\rho}$: as above,
$$|\langle b, m b \rangle|^2 = \langle b \otimes \overline{b},\, m (b \otimes \overline{b}) \rangle,$$
so that
$$\operatorname{Exp}_m |\langle b, m b \rangle|^2 = \operatorname{Exp}_m \langle b \otimes \overline{b},\, m (b \otimes \overline{b}) \rangle,$$
and applying Lemma 4.1 completes the proof.
Now let $\Pi_m = \Pi_H$ (for $H = \{1, m\}$) denote the projection operator given by
$$\Pi_m v = \frac{v + m v}{2}.$$
For a given vector $b \in B$, we will focus on the expectation and variance of $\|\Pi_m b\|^2$. These are given by the following lemma.
Lemma 4.3. Let $\rho$ be an irreducible representation acting on a space $V$ and let $b \in V$. Let $m$ be an element chosen uniformly at random from a conjugacy class $\mathbf{m}$ of involutions. Then
$$(4.1) \qquad \operatorname{Exp}_{m \in \mathbf{m}} \|\Pi_m b\|^2 = \frac{1}{2} \|b\|^2 \left( 1 + \frac{\chi_\rho(\mathbf{m})}{d_\rho} \right),$$
$$(4.2) \qquad \operatorname{Var}_{m \in \mathbf{m}} \|\Pi_m b\|^2 \le \frac{1}{4} \sum_{\sigma \prec \rho \otimes \overline{\rho}} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma} \left\| \Pi^{\rho \otimes \overline{\rho}}_\sigma (b \otimes \overline{b}) \right\|^2.$$
Proof. For the expectation,
$$\operatorname{Exp}_m \|\Pi_m b\|^2 = \operatorname{Exp}_m \langle b, \Pi_m b \rangle = \frac{1}{2} \operatorname{Exp}_m \left( \langle b, b \rangle + \langle b, m b \rangle \right) = \frac{1}{2} \|b\|^2 \left( 1 + \frac{\chi_\rho(\mathbf{m})}{d_\rho} \right),$$
where the last equality follows from Lemma 4.1.

For the variance, we first calculate the second moment,
$$\begin{aligned}
\operatorname{Exp}_m \|\Pi_m b\|^4 &= \operatorname{Exp}_m |\langle b, \Pi_m b \rangle|^2 = \frac{1}{4} \operatorname{Exp}_m \left| \langle b, b \rangle + \langle b, m b \rangle \right|^2 \\
&= \frac{1}{4} \operatorname{Exp}_m \left( |\langle b, b \rangle|^2 + 2 \operatorname{Re} \langle b, b \rangle \langle b, m b \rangle + |\langle b, m b \rangle|^2 \right) \\
&= \frac{1}{4} \left( \|b\|^4 + 2 \|b\|^4 \frac{\chi_\rho(\mathbf{m})}{d_\rho} + \sum_{\sigma \prec \rho \otimes \overline{\rho}} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma} \left\| \Pi^{\rho \otimes \overline{\rho}}_\sigma (b \otimes \overline{b}) \right\|^2 \right),
\end{aligned}$$
where in the last line we applied Lemmas 4.1 and 4.2 and the fact that any character evaluated at an involution is real. Then
$$(4.3) \qquad \operatorname{Var}_m \|\Pi_m b\|^2 = \operatorname{Exp}_m \|\Pi_m b\|^4 - \left( \operatorname{Exp}_m \|\Pi_m b\|^2 \right)^2 = \frac{1}{4} \left[ \sum_{\sigma \prec \rho \otimes \overline{\rho}} \frac{\chi_\sigma(\mathbf{m})}{d_\sigma} \left\| \Pi^{\rho \otimes \overline{\rho}}_\sigma (b \otimes \overline{b}) \right\|^2 - \|b\|^4 \left( \frac{\chi_\rho(\mathbf{m})}{d_\rho} \right)^2 \right].$$
Ignoring the second term, which is negative, gives the stated result.
Finally, we point out that since
$$\operatorname{Exp}_m \|\Pi_m b\|^2 = \|b\|^2\, \frac{\operatorname{rk} \Pi_m}{d_\rho},$$
we have
$$(4.4) \qquad \frac{\operatorname{rk} \Pi_m}{d_\rho} = \frac{1}{2} \left( 1 + \frac{\chi_\rho(\mathbf{m})}{d_\rho} \right),$$
a fact which we will use below.
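As a quick numerical sanity check of (4.1), the following sketch (ours; the representation is assembled by hand from $4 \times 4$ permutation matrices, and all names are illustrative) averages $\|\Pi_m b\|^2$ over the conjugacy class of double transpositions in $S_4$, acting in the three-dimensional standard irreducible representation, for which $\chi_\rho(\mathbf{m}) = -1$ and $d_\rho = 3$.

```python
import numpy as np

# Check Exp_m ||Pi_m b||^2 = (1/2)(1 + chi_rho(m)/d_rho) for the standard irrep of S_4
# and the class of involutions consisting of n/2 = 2 disjoint transpositions.

def perm_matrix(p):
    P = np.zeros((4, 4))
    for i, j in enumerate(p):
        P[j, i] = 1.0          # maps e_i -> e_{p[i]}
    return P

# Orthonormal basis of the complement of the all-ones vector; the standard irrep lives here.
Q, _ = np.linalg.qr(np.array([[1, 0, 0], [-1, 1, 0], [0, -1, 1], [0, 0, -1]], dtype=float))

def rho(p):
    return Q.T @ perm_matrix(p) @ Q            # 3x3 orthogonal matrix, d_rho = 3

conj_class = [(1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]   # (12)(34), (13)(24), (14)(23)

rng = np.random.default_rng(0)
b = rng.normal(size=3)
b /= np.linalg.norm(b)                         # a fixed unit vector b

vals = []
for m in conj_class:
    Pi_m = 0.5 * (np.eye(3) + rho(m))          # projection Pi_m v = (v + m v) / 2
    vals.append(b @ Pi_m @ b)                  # = ||Pi_m b||^2, since Pi_m is a projection

chi = np.trace(rho(conj_class[0]))             # character on this class: -1
print("empirical mean:", round(float(np.mean(vals)), 6))
print("formula (4.1): ", round(0.5 * (1 + chi / 3), 6))
```

Both printed values equal $\tfrac{1}{2}(1 - \tfrac{1}{3}) = \tfrac{1}{3}$.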
5. The representation theory of the symmetric group. In this section we record the particular properties of $S_n$ and its representation theory which we apply in the proofs of our main results. The irreducible representations of $S_n$ are labeled by Young diagrams or, equivalently, by integer partitions of $n$,
$$\lambda = (\lambda_1, \ldots, \lambda_t),$$
where $\sum_i \lambda_i = n$ and $\lambda_i \ge \lambda_{i+1}$ for all $i$. The number of Young diagrams, equal to the number of conjugacy classes in $S_n$, is the partition number $p(n)$, which obeys
$$(5.1) \qquad p(n) = (1 + o(1))\, \frac{1}{4\sqrt{3}\, n}\, e^{\delta \sqrt{n}} < e^{\delta \sqrt{n}}, \qquad \text{where } \delta = \pi \sqrt{2/3}.$$
We identify each irreducible representation with its Young diagram $\lambda$, and denote its character $\chi_\lambda$ and its dimension $d_\lambda$. In particular, $\lambda$ is the trivial or parity representation if $\lambda$ is a single row $(n)$ or a single column $(1, \ldots, 1)$, respectively. Given $\lambda$, its conjugate $\lambda'$ is obtained by flipping $\lambda$ about the diagonal: $\lambda' = (\lambda'_1, \ldots, \lambda'_{\lambda_1})$, where $\lambda'_j = |\{i \mid \lambda_i \ge j\}|$. In particular, $\lambda'_1 = t$. The representation $\lambda'$ is the (tensor) product of $\lambda$ with the parity representation.
The dimension of $\lambda$ is given by the remarkable hook length formula:
$$d_\lambda = \frac{n!}{\prod_c \operatorname{hook}(c)},$$
where this product runs over all cells of the Young diagram associated with $\lambda$ and $\operatorname{hook}(c)$ is the number of cells appearing in either the same column or row as $c$, excluding those that are above or to the left of $c$.
For example, the partition $\lambda = (\lambda_1, \lambda_2, \lambda_3, \lambda_4) = (6, 5, 3, 2)$ is associated with the diagram shown in Figure 5.1 below. The hook associated with the cell $(2, 2)$ in this diagram appears in Figure 5.2; it has length 6.

[Fig. 5.1. The Young diagram for $\lambda = (6, 5, 3, 2)$.]

[Fig. 5.2. A hook of length 6.]
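For a smaller worked instance (ours, added for illustration): the partition $\lambda = (3, 2)$ of $n = 5$ has hook lengths $4, 3, 1$ in its first row and $2, 1$ in its second, so
$$d_{(3,2)} = \frac{5!}{4 \cdot 3 \cdot 1 \cdot 2 \cdot 1} = \frac{120}{24} = 5.$$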
The symmetric groups have the property that every representation $\lambda$ possesses a basis in which its matrix elements are real, and so all its characters are real. However, in a given basis $\lambda$ might be complex, so we will refer below to its complex conjugate, the representation $\overline{\lambda}$ (not to be confused with $\lambda'$).
The study of the asymptotic properties of the representations of $S_n$ typically focuses on the Plancherel distribution (see, e.g., Kerov's monograph [23]). For a general group $G$, this is the probability distribution obtained on $\hat{G}$ by assigning $\rho$ the probability density $d_\rho^2 / |G|$. One advantage of this distribution is that the density at $\rho$ is proportional to its contribution, dimensionwise, to the group algebra $\mathbb{C}[G]$. Note that in the context of the HSP, the Plancherel distribution is exactly the one obtained by performing weak Fourier sampling on the trivial hidden subgroup.
In the symmetric groups a fair amount is known about representations chosen according to the Plancherel distribution. In particular, Vershik and Kerov [36] have given the following result, showing that with high probability they have dimension equal to $e^{-\Theta(\sqrt{n})} \sqrt{n!}$.

Theorem 5.1 (see [36]). Let $\lambda$ be chosen from $\hat{S}_n$ according to the Plancherel distribution. Then there exist positive constants $c_1$ and $c_2$ for which
$$\lim_{n \to \infty} \Pr \left[ e^{-c_1 \sqrt{n}} \sqrt{n!} \le d_\lambda \le e^{-c_2 \sqrt{n}} \sqrt{n!} \right] = 1.$$
Vershik and Kerov have also obtained estimates for the maximum dimension of a representation in $\hat{S}_n$.

Theorem 5.2 (see [36]). There exist positive constants $\check{c}$ and $\hat{c}$ such that for all $n \ge 1$,
$$e^{-\check{c} \sqrt{n}} \sqrt{n!} \le \max_{\lambda \in \hat{S}_n} d_\lambda \le e^{-\hat{c} \sqrt{n}} \sqrt{n!}.$$
Along with these estimates, we will use the following (one-sided) large-deviation versions of Theorem 5.1.

Lemma 5.3. Let $\lambda$ be chosen according to the Plancherel distribution on $\hat{S}_n$.
1. Let $\delta = \pi \sqrt{2/3}$ as in (5.1). Then for sufficiently large $n$,
$$\Pr \left[ d_\lambda \le e^{-\delta \sqrt{n}} \sqrt{n!} \right] < e^{-\delta \sqrt{n}}.$$
2. Let $0 < c < 1/2$. Then there is a constant $\gamma > 0$ such that $\Pr[d_\lambda \le n^{cn}] < n^{-\gamma n}$.
Proof. For the first bound, setting $d = e^{-\delta \sqrt{n}} \sqrt{n!}$ and using (5.1), we have
$$\sum_{\lambda :\, d_\lambda \le d} \frac{d_\lambda^2}{n!} \le p(n)\, \frac{d^2}{n!} < e^{-\delta \sqrt{n}}.$$
For the second bound, recalling Stirling's approximation $n! > n^n e^{-n}$, we have
$$\sum_{\lambda :\, d_\lambda \le n^{cn}} \frac{d_\lambda^2}{n!} \le p(n)\, \frac{n^{2cn}}{n!} = n^{-(1 - 2c)n}\, e^{O(n)},$$
and setting $\gamma < 1 - 2c$ completes the proof.
Finally, we will also apply Roichman's estimates [31] for the characters of the symmetric group.

Definition 5.4. For a permutation $\pi \in S_n$, define the support of $\pi$, denoted $\operatorname{supp}(\pi)$, to be the cardinality of the set $\{k \in [n] \mid \pi(k) \ne k\}$.
Theorem 5.5 (see [31]). There exist constants $b > 0$ and $0 < q < 1$ so that for $n > 4$, for every conjugacy class $C$ of $S_n$, and for every irreducible representation $\lambda$ of $S_n$,
$$\left| \frac{\chi_\lambda(C)}{d_\lambda} \right| \le \left( \max \left( q,\ \frac{\lambda_1}{n},\ \frac{\lambda'_1}{n} \right) \right)^{b \cdot \operatorname{supp}(C)},$$
where $\operatorname{supp}(C) = \operatorname{supp}(\pi)$ for any $\pi \in C$.
In our application, we take $n$ to be even and consider involutions $m$ in the conjugacy class of elements consisting of $n/2$ disjoint transpositions,
$$\mathbf{m} = \mathbf{m}_n = \left\{ \sigma \left( (1\,2)(3\,4) \cdots (n{-}1\ n) \right) \sigma^{-1} \mid \sigma \in S_n \right\}.$$
Note that each $m \in \mathbf{m}_n$ is associated with one of the $(n-1)!! = (n-1)(n-3)(n-5) \cdots 1$ perfect matchings of $n$ things, and that $\operatorname{supp}(\mathbf{m}) = n$.
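For instance (a small case, added for illustration): for $n = 4$ the class $\mathbf{m}_4$ has $(4-1)!! = 3$ elements, $(1\,2)(3\,4)$, $(1\,3)(2\,4)$, and $(1\,4)(2\,3)$, one for each perfect matching of $\{1, 2, 3, 4\}$.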
6. Strong Fourier sampling over $S_n$. We consider the hidden subgroup $H = \{1, m\}$, where $m$ is chosen uniformly from $\mathbf{m} = \mathbf{m}_n \subset S_n$, the conjugacy class
$$\left\{ \pi^{-1} \left( (1\,2)(3\,4) \cdots (n{-}1\ n) \right) \pi \mid \pi \in S_n \right\};$$
we assume throughout that $n$ is even. We start by performing weak sampling, i.e., measuring the name of an irreducible representation $\lambda$; the resulting probability distribution on $\hat{S}_n$ is the same for all $m \in \mathbf{m}_n$, and Hallgren, Russell, and Ta-Shma [15] established that this probability distribution on $\lambda$ is exponentially close to the Plancherel distribution in total variation. We continue on to strong sampling, by allowing the algorithm designer to choose an arbitrary POVM with a frame $B = \{b_j\}$ and weights $\{a_j\}$ obeying the completeness condition (3.3). We will show that with high probability (over $m$ and $\lambda$), the conditional distribution induced on the vectors $B$ is exponentially close to the natural distribution (3.9) on $B$. It will follow by the triangle inequality that it requires an exponential number of experiments of this type to distinguish two involutions from each other or, in fact, to distinguish $H$ from the trivial subgroup.
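To spell out the triangle-inequality step (a standard argument, recorded here for the reader): if $\|P_m - U\|_1 \le \varepsilon$ and $\|P_{m'} - U\|_1 \le \varepsilon$, then
$$\|P_m - P_{m'}\|_1 \le \|P_m - U\|_1 + \|U - P_{m'}\|_1 \le 2\varepsilon,$$
and distinguishing two distributions whose $\ell_1$ distance is $O(\varepsilon)$ with constant confidence requires $\Omega(1/\varepsilon)$ samples; with $\varepsilon = e^{-\beta n}$ this is exponential in $n$.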
For simplicity, and to illustrate our techniques, we first prove this for a von Neumann measurement, i.e., where $B$ is an orthonormal basis for $\lambda$. In this case, we show that the probability distribution on $B$ is exponentially close to the uniform distribution.
6.1. Von Neumann measurements.

Theorem 6.1. Let $B$ be an orthonormal basis for an irreducible representation $\lambda$. Given the hidden subgroup $H = \{1, m\}$, where $m$ is chosen uniformly at random from $\mathbf{m}$, let $P_m(b) = P_m(\lambda, b)$ be the probability that we observe the vector $b \in B$ conditioned on having observed the representation name $\lambda$, and let $U(b) = U(\lambda, b)$ be the uniform distribution on $B$. Then there is a constant $\beta > 0$ such that for sufficiently large $n$, with probability at least $1 - e^{-\beta n}$ in $m$ and $\lambda$, we have
$$\| P_m - U \|_1 < e^{-\beta n}.$$
Proof. First, recall from (3.8) in section 3 that the conditional distribution on $B$ is given by (since $a_j = 1$)
$$(6.1) \qquad P_m(b) = P_m(\lambda, b) = \frac{\|\Pi_m b\|^2}{\operatorname{rk} \Pi_m}.$$
Our strategy will be to bound $\operatorname{Var}_m \|\Pi_m b\|^2$ using Lemma 4.3 and apply Chebyshev's inequality to conclude that $\|\Pi_m b\|^2$ is almost certainly close to its expectation (4.1). Recall, however, that our bounds on the variance of $\|\Pi_m b\|^2$ depend on the decomposition of $\lambda \otimes \overline{\lambda}$ into irreducibles and, furthermore, on the projection of $b \otimes \overline{b}$ into these irreducible subspaces. Matters are somewhat complicated by the fact that certain irreducibles $\mu$ appearing in $\lambda \otimes \overline{\lambda}$ may contribute more to the variance than others. Specifically, while Theorem 5.5 allows us to bound the contribution of those $\mu$