arXiv:2202.05339v2 [econ.TH] 23 May 2022
A Mathematical Theory of Classifiers;
Representations and Applications

Hamed Hamze Bajgiran
Caltech

Federico Echenique
Caltech

May 25, 2022

The first author gratefully acknowledges support from Beyond Limits (Learning Optimal Models) through CAST (The Caltech Center for Autonomous Systems and Technologies). The second author gratefully acknowledges support from the National Science Foundation under grant number SES-1558757. We thank Chris Chambers for many comments and suggestions. We would like to thank our reviewers for their constructive feedback.
Abstract

We study the complexity of closure operators, with applications to machine learning and decision theory. In machine learning, closure operators emerge naturally in data classification and clustering. In decision theory, they can model equivalence of choice menus, and therefore situations with a preference for flexibility. Our contribution is to formulate a notion of complexity of closure operators, which translates into the complexity of a classifier in ML, or of a utility function in decision theory.
1 Introduction

We study the complexity of closure operators, with applications to machine learning and decision theory in economics. Closure operators can be used to model a coarsening of reality. For example, the set of all 16 subsets of the set {a, b, c, d} may be viewed as a fine-grained description of some physical system. A coarser view could regard some subsets as equivalent, just as, in the usual topology on the real line, the rational numbers are "indistinguishable" from the real numbers in the sense that their closure equals the reals. For example, all subsets of {a, c} in {a, b, c, d} could be seen as equivalent, which a closure operator may capture by mapping the sets {a}, {c}, and {a, c} into {a, c}.
Such coarsening of reality emerges naturally in machine learning and in decision theory. In machine learning, a classifier will decide that some individual data points should be lumped together. For example, the classifier can be applied to images of household pets, and classify them according to species, without regard for color or size. An Irish Setter is indistinguishable from a Dachshund: both are mapped into the class of dogs; while a canary and a parrot may be seen as distinct from any dog, but grouped together as birds.
In economic decision theory, an agent may be choosing a menu, or a store, from which to make an ultimate choice. Someone who is in the market for a particular item, say a Canon 5D Mark III, will regard any photography store that carries this model as equivalent: one store is indistinguishable from another as long as they have what the consumer is looking for. An agent choosing items in {a, b, c, d}, with preferences a ≻ b ≻ c ≻ d, will consider all menus that contain a as indifferent; any menu that contains b but not a is also indifferent, and so on.
Closure operators are ideally suited to analyze these situations. In particular, we shall see that closure operators can capture a desire for flexibility stemming from agents who are uncertain about their preferences. If our agent is unsure of whether they rank a over b, or the other way around, they may have a strict preference for a larger menu containing both options. For example, they may prefer {a} over {b} (as menus), but strictly prefer {a, b} over {a}. A branch of decision theory (one that started with the seminal paper of Kreps, 1979) relies on closure operators as a modeling device. In particular, the work of Dekel, Lipman, and Rustichini, 2001 uses the operator that maps a set A ⊆ R^n to its convex hull; and their key axiom is that agents are indifferent between A and its convex hull.
Our starting point is a representation theorem for closure operators. We shall see that a closure operator may be built up from more basic, or simpler, operators. In machine learning, it is natural to think of a classifier as resulting from the composition of simple linear classifiers. We show that a similar representation is possible, even in discrete settings where there is no language for talking about linearity. In decision theory, some agents may have complex preferences over menus that result from a desire for flexibility in choice. We shall see that such complex preferences can be constructed from simpler operators that do not exhibit such a desire for flexibility.

Our representation theorem contains those of Richter et al., 2015 and Chambers et al., 2020 as special cases, and relies on similar ideas. The main novelty in our work lies in connecting the topologies arising from a closure operator to those arising from its simplest constituent operators, and in using this connection to formulate a notion of complexity. It is indeed natural in machine learning to identify the complexity of a classifier with the number of different simple classifiers needed to represent it: our paper formalizes this connection in the abstract model of discrete data.
For decision theory, we shall see that the complexity of an operator is tied to the size of the (subjective) state space needed to represent an agent's choices. As we wrote above, a preference for flexibility emerges from uncertainty about future preferences. Such uncertainty can be represented with a state space that is obtained (as in Kreps, 1979) from the relevant operator. Indeed, Dekel, Lipman, and Rustichini, 2001 write:

"the size of the subjective state space provides a measure of the agent's uncertainty about future contingencies."

We show how to extend the approach in Dekel et al., so that if their key axiom is applied to an arbitrary closure operator, even in the discrete setting of our paper, a representation with a preference for flexibility may be obtained. In this representation, we can bound the size of the state space through the representation of a closure operator in terms of simpler constituents.
2 Literature Review

Closure operators and their connections to topologies are a subject of study in mathematics (Ward, 1942), but the literature that is closest to our work pertains to applications in economics and computer science. Some of these applications are related to convex geometries and their representations.

The basic concepts of abstract convex geometry and the combinatorial convex hull operator are developed in Edelman and Jamison, 1985. Convex hull operators are a class of closure operators with an extra anti-exchange property.[1] In decision theory, Koshevoy, 1999 studies the connection between combinatorial convex hull operators and path-independent choice functions. The closest papers to ours are Richter et al., 2015 and Chambers et al., 2020. Richter et al., 2015 provide a characterization of a combinatorial convex hull operator through a set of primitive orderings. Using their representation, Richter et al., 2015 propose a notion of competitive equilibrium in an abstract environment. Chambers et al., 2020 extend the result of Richter et al., 2015 and relate it to the Kreps, 1979 model of preferences for flexibility in decision theory. They provide the main characterization of closure operators by weak orders.
[1] We say that f satisfies the anti-exchange property if, given any closed set A and two unequal points x, y ∈ X \ A, then x ∈ f(A ∪ {y}) implies that y ∉ f(A ∪ {x}).
In the context of dynamic choice and preferences for flexibility, following Kreps, 1979, Dekel et al., 2001, and Gul and Pesendorfer, 2001, a rather extensive literature in economics studies different aspects of choice over menus. The closest related papers are those of Nehring and Puppe, 1999, Kopylov, 2009, Kopylov, 2018, and Gorno, 2016. Nehring and Puppe, 1999 provide a more refined Pareto representation of the Krepsian model. They provide the connection to the representation by Aizerman and Malishevski, 1981 of choice correspondences satisfying the heritage and outcast properties, which is another route for proving Claim 1 in Richter et al., 2015: that every convex geometry is generated by a set of primitive orderings.[2] Kopylov, 2009 determines the number of positive and negative states in the setting of Dekel et al., 2001. Gorno, 2016 shows that any preference ordering in Kreps' setting has a Dekel-Lipman-Rustichini representation. Finally, Kopylov, 2018 proposes a combinatorial model of subjective states. By relaxing the axioms of Kreps, he presents a weaker model of coherent aggregation.
Our application to classification problems in machine learning is related to the link between closure operators and Galois connections, and their applications in formal concept theory. Generally, there are different applications of lattice theory to the study of general hierarchical concepts. For an overview of representations and applications of closure systems and formal concept theory, see Caspard and Monjardet, 2003, Domenach and Leclerc, 2004, Bertet, Demko, Viaud, and Guérin, 2018, and Davey, 2002. In our paper, we focus on a simple model with a new complexity notion, its decision-theoretic and conceptual interpretations, and how it may be computed efficiently.
[2] In a nutshell, the heritage property states that if A ⊆ B, then f(A) ⊇ f(B) ∩ A. When imposed on a choice function, this property is the α property of Sen, 1977. The outcast property states that if f(A) ⊆ B ⊆ A, then f(A) = f(B). This property is a weaker form of property β of Sen, 1977. Note, however, that our paper takes closure operators as primitive, not choice.
3 Model and preliminaries

3.1 Preliminary definitions and notational conventions

Let X be a set. We denote the set of all subsets of X by 2^X. A binary relation on X is 1) a weak order on X if it is complete and transitive (meaning that it orders all elements of X); 2) a partial order if it is reflexive, transitive, and anti-symmetric (for all x, y ∈ X, if x ⪰ y and y ⪰ x then x = y); 3) a total order if it is a complete partial order.
Let ⪰ be a partial order on X. We say that two elements x, y with x ⪰ y or y ⪰ x are ordered, or comparable, by ⪰. A subset Y of X is a chain if it is totally ordered by ⪰, and an anti-chain if no two of its elements are ordered by ⪰. An element x ∈ X is said to be an upper bound of Y if x ⪰ a for every a ∈ Y. We denote the least upper bound, or join, of Y (if it exists) by ⋁Y. Similarly, we denote the greatest lower bound, or meet, of Y by ⋀Y. The partially ordered set X is a lattice if, for all x, y ∈ X, the set {x, y} has a join and a meet, denoted by x ∨ y and x ∧ y respectively.
We introduce some concepts from discrete geometry that may be viewed as analogues of concepts in convex analysis: see Richter et al., 2015 for an earlier application of these ideas to economics. Denote the set of all weak orders on X by R. The support function of A ⊆ X is the function h_A : R → 2^X defined by h_A(⪰) = {x ∈ A | x ⪰ y for all y ∈ A}. Similarly, we may define the support half-space of A ≠ ∅ with respect to ⪰ ∈ R by H(⪰, h_A) = {x ∈ X | h_A(⪰) ⪰ x}. By definition, we set H(⪰, h_∅) = ∅.

Observe the analogy with convex analysis. Here the set R serves the same role as the dual of X (the set of continuous linear functionals) when X is a Euclidean space.
One final notational convention is that we denote the indicator function of a set C ⊆ X by 1_C. So 1_C(x) = 1 if x ∈ C and 1_C(x) = 0 otherwise.
3.2 Closure operators

The paper is a study of a special kind of function defined on the subsets of a finite set X: functions of the form f : 2^X → 2^X that have the properties of closure operators in topology (Kuratowski, 1955):

Definition 1. A closure operator on X is a map f : 2^X → 2^X that satisfies the following properties:

1. Extensivity: A ⊆ f(A) and f(∅) = ∅.

2. Idempotence: f(f(A)) = f(A).

3. Monotonicity: A ⊆ B implies f(A) ⊆ f(B).[3]
As applications of our theory, we shall discuss in detail two concrete interpretations of closure operators below, but in the abstract one may think of the operator as a model of imperfect perception. The closure operator describes a coarse perception of reality: one that does not distinguish between A and f(A). The use of closure operators in topology follows this interpretation, as f(A) consists of the points that are arbitrarily close to the elements of A.
The following are examples of closure operators:

1. The identity operator is the closure operator I : 2^X → 2^X such that I(A) = A for every A ∈ 2^X.

2. The trivial closure operator is defined as the closure operator f : 2^X → 2^X such that f(A) = X for every nonempty A ∈ 2^X.

3. A binary classifier is an operator f_C, associated to a set C ⊆ X, so that f_C(∅) = ∅, and for nonempty A, f_C(A) = C if A ⊆ C and f_C(A) = X otherwise.

4. Given a function u : X → R, f_u(A) = {x ∈ X : u(x) ≤ max{u(x') : x' ∈ A}} is the strategically rational operator defined from the function u.

[3] Kuratowski imposes f(A ∪ B) = f(A) ∪ f(B), a stronger property than monotonicity.
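As a concrete illustration (our own sketch, not code from the paper), the binary classifier and strategically rational operators, together with a brute-force check of the three axioms of Definition 1, can be written in Python with sets encoded as frozensets; all function names here are hypothetical:

```python
from itertools import combinations

def powerset(X):
    """All subsets of X, as frozensets."""
    xs = sorted(X)
    return [frozenset(s) for r in range(len(xs) + 1)
            for s in combinations(xs, r)]

def binary_classifier(C, X):
    """Example 3: f_C(A) = C if nonempty A is a subset of C, else X."""
    C, X = frozenset(C), frozenset(X)
    def f(A):
        A = frozenset(A)
        if not A:
            return frozenset()
        return C if A <= C else X
    return f

def strategically_rational(u, X):
    """Example 4: f_u(A) = {x in X : u(x) <= max of u over A}."""
    X = frozenset(X)
    def f(A):
        A = frozenset(A)
        if not A:
            return frozenset()
        best = max(u(x) for x in A)
        return frozenset(x for x in X if u(x) <= best)
    return f

def is_closure_operator(f, X):
    """Brute-force check of extensivity, idempotence, and monotonicity."""
    subs = powerset(X)
    if f(frozenset()) != frozenset():
        return False
    if not all(A <= f(A) and f(f(A)) == f(A) for A in subs):
        return False
    return all(f(A) <= f(B) for A in subs for B in subs if A <= B)
```

On a small ground set, both example operators pass the check.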
Of the preceding examples, binary classifiers (3) and strategically rational (4) operators will play an important role in the paper. It is worth discussing these two classes of closure operators in some detail before proceeding.

First note that a binary classifier gives rise to closed sets S(f_C) = {∅, C, X}, which form the smallest kind of non-trivial topology possible. In our paper, we shall think of these as simple classifiers. Binary classifiers are, moreover, a special case of strategically rational operators: indeed, for a given C ⊆ X we may define u = 1_{X\C}, and observe that f_C = f_u.
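The identity f_C = f_u for u = 1_{X\C} can be verified exhaustively on a small example (our own sketch; the names f_C and f_u follow the text):

```python
from itertools import combinations

def f_C(C, X, A):
    """Binary classifier applied to the menu A."""
    A, C, X = frozenset(A), frozenset(C), frozenset(X)
    if not A:
        return frozenset()
    return C if A <= C else X

def f_u(u, X, A):
    """Strategically rational operator applied to the menu A."""
    A = frozenset(A)
    if not A:
        return frozenset()
    best = max(u(x) for x in A)
    return frozenset(x for x in X if u(x) <= best)

X = {1, 2, 3, 4}
C = {1, 2}
u = lambda x: 0 if x in C else 1   # the indicator u = 1_{X\C}
subsets = [frozenset(s) for r in range(len(X) + 1)
           for s in combinations(sorted(X), r)]
```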
Turning to strategically rational operators, one may interpret u : X → R as a utility function, and max{u(x) : x ∈ A} as the best utility achievable from a set A ⊆ X of possible choices. Then f_u(A) is the largest set of choices that an agent with utility u would consider to be as good as choosing from A. In particular, if f_u(A) = f_u(B), then the agent is equally happy choosing an item from A as from B.
The "strategic rationality" terminology is borrowed from Kreps, 1979, and will be useful when we talk about applications to decision theory. It suggests an agent who is forward looking, and identifies a menu with the element they plan to choose from the menu. Alternatively, strategically rational operators can be seen as abstract counterparts to the simple linear classifiers used in machine learning: we shall also emphasize this interpretation when we talk about applications to machine learning.
Finally, note that all that matters about u in the definition of f_u is the weak order represented by u; that is, x ⪰ y iff u(x) ≥ u(y). In other words, if u = h ∘ v, for a strictly monotone increasing function h : R → R, then f_u = f_v. In the rest of the paper, we shall therefore write f_⪰ for strategically rational closure operators, where it is understood that ⪰ is a weak order over X. Observe that these coincide with the supporting half-spaces we introduced above.
Definition 2. A set A ⊆ X is closed with respect to a closure operator f : 2^X → 2^X if A = f(A). The set f(2^X) of all closed sets with respect to the closure operator f is the topology defined by f, and is denoted by S(f).
The terminology of closed sets and topology is justified by the following well-known result, which we state as Lemma 1.[4]

Lemma 1. Let f : 2^X → 2^X be a closure operator on X. Then the set of closed sets S(f) is closed under intersection and contains ∅ and X. Indeed, S(f) endowed with the meet and join operators A ∧ B = A ∩ B and A ∨ B = f(A ∪ B) is a lattice that has ∅ and X as its (respectively) smallest and largest elements.

Moreover, if S is any subset of 2^X that is closed under intersection and contains ∅ and X, then there is a unique closure operator f_S : 2^X → 2^X such that S(f_S) = S.
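The second part of Lemma 1 is constructive: f_S(A) is the smallest member of S containing A. A sketch (ours, with hypothetical names):

```python
def closure_from_family(S, X):
    """Given a family S of subsets of X that is closed under intersection
    and contains the empty set and X, return the closure operator f_S
    with S(f_S) = S: f_S(A) is the intersection of all members of S
    containing A (and f_S of the empty set is the empty set)."""
    S = [frozenset(s) for s in S]
    X = frozenset(X)
    def f(A):
        A = frozenset(A)
        if not A:
            return frozenset()
        out = X
        for s in S:
            if A <= s:
                out &= s
        return out
    return f
```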
3.3 Application 1: A theory of classifiers

Interpret the elements of X as data, and suppose given a finite set L of labels. A labeling correspondence is a set-valued function Φ : X → L that associates with each data point x ∈ X a subset of labels Φ(x).

Given a labeling correspondence Φ : X → L, we define a classifier as a function f : 2^X → 2^X with

f(A) = {x ∈ X | ⋂_{y ∈ A} Φ(y) ⊆ Φ(x)}

for every A ⊆ X.

[4] In the context of convex geometry, which is most relevant for our paper, the result was noted in Edelman et al., 1985. But the result is well known; see, for example, Ward, 1942 and Kuratowski, 1955.
The interpretation of a labeling correspondence is straightforward. It attaches a set of labels to each data point. We interpret each label as a single feature or property attached to each data point. Hence, attaching two different labels l_1, l_2 ∈ L to a data point x ∈ X, so that Φ(x) = {l_1, l_2}, is interpreted as saying that the data point x has both of those properties.
To understand the definition of a classifier, consider the classifier f associated with a labeling correspondence Φ. Given a data point x ∈ X, Φ(x) is the set of all labels associated with the point x. To find the set of data points in the same class (or category) as x, we need to consider all data points with at least all the labels of the data point x. This is precisely the definition of f({x}).

More generally, for a given set of data points A ⊆ X, f(A) is the set of all data points that have at least all the labels common to all points in A. The idea is that if a decision-maker wants to find all data points that are in the same class as the observed data points in A (without any other information), she should consider all points in f(A).
Remark. A classifier derived from a labeling correspondence is a closure operator. The extensivity and monotonicity properties are simple to verify. To show idempotence, notice that by monotonicity f(A) ⊆ f(f(A)). Hence, we only need to show that f(f(A)) ⊆ f(A). Assume that x ∈ f(f(A)). By definition, ⋂_{y ∈ f(A)} Φ(y) ⊆ Φ(x). We know that for every y ∈ f(A) we have ⋂_{z ∈ A} Φ(z) ⊆ Φ(y). Hence, ⋂_{z ∈ A} Φ(z) ⊆ ⋂_{y ∈ f(A)} Φ(y) ⊆ Φ(x). Thus, x ∈ f(A). As a result, f satisfies the idempotence property.
The ideas may be illustrated by means of an example.

Example 1. Consider a set of four data points X = {a, b, c, d}. The set of labels is defined as L = {dog, cat, black, white, female, male, car}. Assume that the labeling correspondence Φ : X → L is as follows:

Φ(a) = {dog, black, female}
Φ(b) = {dog, black, male}
Φ(c) = {cat, white, female}
Φ(d) = {car, black}
The classifier associated with the above labeling correspondence has eight classes: Class1 = {a}, associated with the labels {dog, black, female}; Class2 = {b}, associated with the labels {dog, black, male}; Class3 = {c}, associated with the labels {cat, white, female}; Class4 = {d}, associated with the labels {car, black}; Class5 = {a, b}, associated with the labels {dog, black}; Class6 = {a, b, d}, associated with the label {black}; Class7 = {a, c}, associated with the label {female}; and the last class, Class8 = {a, b, c, d}, consisting of all data points in the set X.

Figure 1 depicts the structure of the classifier and the associated lattice structure.
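The classifier of Example 1 can be computed mechanically from Φ (a sketch of ours; `classifier` is a hypothetical name):

```python
def classifier(phi, X):
    """f(A) = {x in X : every label shared by all points of A
    is also a label of x}."""
    def f(A):
        A = frozenset(A)
        if not A:
            return frozenset()
        common = frozenset.intersection(*(frozenset(phi[y]) for y in A))
        return frozenset(x for x in X if common <= frozenset(phi[x]))
    return f

# The labeling correspondence of Example 1
phi = {
    "a": {"dog", "black", "female"},
    "b": {"dog", "black", "male"},
    "c": {"cat", "white", "female"},
    "d": {"car", "black"},
}
f = classifier(phi, set(phi))
```

Querying f on subsets of the data recovers the classes listed above.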
The topology S(f) represents all the possible different classes of data points, and f(A) is the smallest class that contains A. By Lemma 1, we know that the set of classes is closed under intersections. Moreover, there is a lattice structure associated with the labeling correspondence. Indeed, we may interpret a classifier through the topology it induces: let x ∈ X be in two classes A, B ∈ S(f), which means that it has the properties of the classes A and B. Then there is a class C ∈ S(f), with the properties of the classes A and B, such that x ∈ C. The trivial class X represents the set of all data points, having every possible label.
These observations may be summed up in the following proposition.

Proposition 2. Let X be a set of data points. We have the following:

1. Let Φ : X → L be a labeling correspondence. The classifier f : 2^X → 2^X associated with Φ is a closure operator.

2. Let f : 2^X → 2^X be a closure operator on X. Then there exists a set of labels L and a labeling correspondence Φ : X → L such that the classifier associated with Φ is f. Moreover, one choice of L and Φ is achieved by defining L = S(f) and Φ : X → L with Φ(x) = {s ∈ S(f) | x ∈ s}.

[Figure 1: The lattice associated with the labeling correspondence Φ in Example 1.]

The representation in the second part of the above proposition is not unique. However, Section 5 provides the minimum number of labels needed to represent a given classifier.
3.4 Application 2: Decision theory

Consider a decision-maker who is going to ultimately choose an element x ∈ X, but who is first presented with a choice among possible menus A ⊆ X. After selecting the menu A, the consumer chooses an alternative from A.
We may think of a consumer, for example, who first decides between retail shops, or restaurants, and then chooses an item from the shop, or a meal from the menu. But the situation is quite general: most problems in dynamic choice involve first selecting an alternative that will determine what may be chosen in the future. Think, for example, of choosing how much to consume and how much to save, which will determine what is affordable in the future; or making a career choice, which may determine the set of geographical locations one may ultimately be able to choose among to live in. We capture the situation in the abstract as a problem of choosing a set in X. In particular, we shall focus on agents who have a preference for flexibility.
Consider a decision maker, say Alice, who chooses according to a preference relation ⪰ on X = {x, y, z}. If Alice can perfectly predict her choice, she will only care about the best option available in a menu. So if, for example, x ≻ y ≻ z are Alice's preferences, then she will be indifferent between any menus that contain x, and between any menus that do not contain x but contain y. Alice's situation may be described through a strategically rational operator, with

{x, y, z} = f_⪰({x}) = f_⪰({x, y}) = f_⪰({x, z}) = f_⪰({x, y, z}).
Bob, in contrast, may be unsure about his ultimate choice from a menu. He might have the same preference as Alice, or he might actually have the preference y ≻′ x ≻′ z that ranks y first. Observe that, as a consequence, Bob will strictly prefer the menu {x, y, z} to {x}, or to {x, z}, as the larger menu does not close any doors for Bob. If, in the end, Bob's preference is ⪰′, then {x, y, z} lets him get his favorite choice. Indeed, for Bob we need an operator with

{x, y, z} = f({x, y, z}) = f({x, y}) ≠ {x, z} = f({x}) = f({x, z}).
Bob's preferences cannot be captured through a single strategically rational operator. Actually, for him we have that f = f_⪰ ∩ f_⪰′, and thus Bob's operator f is derived from two strategically rational operators.

Now we see, as a preview of our results, that Bob's behavior is more complex than Alice's, because we need two basic strategically rational operators to capture Bob's choice, while one suffices for Alice.

In all, Bob has a "preference for flexibility," a model that was first proposed by Kreps, 1979.[5] In Section 7 below we briefly recap Kreps' model and show what our results have to say about the resulting representation.
4 A general representation of closure operators

A key motivation for the study in our paper is the next result, due to Chambers et al., 2020, which shows that a closure operator may be found as the intersection of supporting half-spaces.[6]

Theorem 3. A function f : 2^X → 2^X is a closure operator iff there exist weak orders ⪰_1, . . . , ⪰_k on X, such that

f(A) = ⋂_{i ∈ {1,...,k}} H(⪰_i, h_A).    (1)
Using the language introduced above, we may rephrase this result as follows:

Corollary 4. A function f : 2^X → 2^X is a closure operator iff there exist strategically rational operators g_{u_i}, 1 ≤ i ≤ k, so that

f(A) = ⋂_{i ∈ {1,...,k}} g_{u_i}(A).    (2)

[5] Kreps' model motivated a sizable literature in decision theory; see, for example, Dekel et al., 2001, Dekel et al., 2007, Gul et al., 2001, K. Nehring, 2001, Nehring and Puppe, 1999, Chateauneuf and Jaffray, 1989, and Chambers et al., 2020.

[6] Note also that Richter et al., 2015 contains a similar result for linear orders.
Our first result is an elaboration on Theorem 3. It serves to introduce the notion that operators generate more complex operators, and how the resulting topologies are connected.

Theorem 5. Let g_1, . . . , g_k be closure operators on X. Then f : 2^X → 2^X defined by

f(A) = ⋂_{i ∈ {1,...,k}} g_i(A)

is a closure operator. We say that f is generated from g_1, . . . , g_k.
Moreover, if f, g_1, . . . , g_k are closure operators on X, then f is generated from g_1, . . . , g_k iff

1. S(g_i) ⊆ S(f) for all i ∈ {1, . . . , k};[7]

2. and if A ∈ S(f) and x ∉ A, then there exists a closure operator g_i ∈ {g_1, . . . , g_k} such that x ∉ g_i(A).
Theorem 5 warrants some remarks. Assume that the classifier f is generated by g_1, . . . , g_k. By the first condition in the theorem, the topology S(f) is finer than that of any constituent operator S(g_i).

The second condition is similar to the separating hyperplane theorem in convex geometry. It means that if A is a closed set of f and x ∉ A, then there should be a separating classifier g_i ∈ {g_1, . . . , g_k} that detects that x is not in the closure of A with respect to g_i.

[7] This statement is Theorem 5.1 in Ward, 1942.
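On small finite examples, both conditions of Theorem 5 can be checked exhaustively. The sketch below is ours, not the paper's; `generates` tests the two conditions directly, and `closure_from_family` builds an operator from an intersection-closed family of closed sets:

```python
from itertools import combinations

def powerset(X):
    xs = sorted(X)
    return [frozenset(s) for r in range(len(xs) + 1)
            for s in combinations(xs, r)]

def closure_from_family(S, X):
    """f_S: the closure operator whose closed sets are the
    intersection-closed family S (with f_S of the empty set empty)."""
    S = [frozenset(s) for s in S]
    def f(A):
        A = frozenset(A)
        if not A:
            return frozenset()
        out = frozenset(X)
        for s in S:
            if A <= s:
                out &= s
        return out
    return f

def topology(f, X):
    """S(f): the closed sets of f."""
    return {A for A in powerset(X) if f(A) == A}

def generates(f, gs, X):
    """Check both conditions of Theorem 5 for candidate generators gs."""
    Sf = topology(f, X)
    # Condition 1: each S(g_i) is a subset of S(f).
    if not all(topology(g, X) <= Sf for g in gs):
        return False
    # Condition 2: each x outside a closed set A of f is
    # separated from A by some g_i.
    return all(any(x not in g(A) for g in gs)
               for A in Sf for x in frozenset(X) - A)
```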
5 Complexity of operators

In the machine learning literature, a neural network is built from a set of simple classifiers. Given a data set and a set of labels, a researcher can add many linear classifiers to build a large neural network that shatters the data points into the correct classes. More generally, a researcher can combine many different functions to form a complex function, with many parameters, that shatters the set of data points into the correct classes.
We proceed with an analogous motivation: in Theorem 5, complex operators are built up from simpler ones. In the abstract, discrete setting of our paper there is no useful notion of norm or approximation, but the representation in Theorem 5 serves as a starting point. When f = ⋂_i g_i, we may think of f as being more complex than any of its building blocks g_i, just as a neural network is more complex than its simpler constituents. In fact, using the theorem, we may associate complexity to the topology defined by each operator:
Definition 3. The operator f is more complex than the operator g if S(g) ⊆ S(f).

So a more complex operator induces a finer topology. The "more complex than" relation is, we think, natural, but it is an incomplete partial order on operators, and will render some pairs of operators incomparable. We consider two ways of completing the complexity binary relation.
• The minimum number of weak orders (MNWO) of an operator f is the smallest number n such that there exist weak orders ⪰_1, . . . , ⪰_n with f = ⋂_{i=1}^n g_{⪰_i}.

• The minimum number of binary classifiers (MNBC) of an operator f is the smallest number n such that there exist subsets C_1, . . . , C_n of X with f = ⋂_{i=1}^n g_{C_i}.
The MNWO has a natural interpretation. We may think of strategically rational operators as simple because they do not exhibit a preference for flexibility, or because they reflect a one-dimensional property (like size, color, or being closer to the entrance of a supermarket). Their topologies are chains, which are naturally represented using k − 1 labels when they are of length k.

The MNBC may be thought of as reflecting the length of the binary code needed to describe the topology of the closure operator. In fact, we can show that the minimum number of labels needed to describe a classifier, as a labeling correspondence, is exactly the same as the minimum number of binary classifiers needed.
Let P(f) be the set of elements of the topology S(f) that are not the intersection of other closed sets in S(f), and let B(f) be the set of elements in P(f) other than ∅ and X.

Proposition 6. Let f be a closure operator. Then the MNWO of f is equal to the width of P(f), and the MNBC is equal to the cardinality of B(f).
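Under our reading of Proposition 6, both quantities can be computed by brute force from the topology S(f) on small examples (a sketch with hypothetical names; `width` enumerates antichains and is exponential in |P(f)|):

```python
from itertools import combinations

def meet_irreducibles(S):
    """P(f): members of the topology S that are not intersections
    of other members (X itself is never such an intersection)."""
    P = set()
    for A in S:
        sups = [B for B in S if A < B]
        if not sups or frozenset.intersection(*sups) != A:
            P.add(A)
    return P

def width(P):
    """Size of a largest antichain of (P, set inclusion), brute force."""
    P = list(P)
    for r in range(len(P), 0, -1):
        for combo in combinations(P, r):
            if all(not (a < b or b < a)
                   for a, b in combinations(combo, 2)):
                return r
    return 0

def mnwo(S):
    """Minimum number of weak orders, per Proposition 6."""
    return width(meet_irreducibles(S))

def mnbc(S, X):
    """Minimum number of binary classifiers, per Proposition 6."""
    return len(meet_irreducibles(S) - {frozenset(), frozenset(X)})
```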
6 Application 1: complexity of classifiers

In the application to classifiers, we may think of binary classifiers g_C as capturing a single property that data points may or may not possess, which makes them good candidates for the simplest possible classifiers available, and thus suggests MNBC as a natural measure of complexity for a classifier.

In contrast, the strategically rational operators f_⪰ are analogous to the linear classifiers in machine learning. Indeed, f_⪰(A) = H(⪰, h_A), a "discrete half-space," and the topology associated with f_⪰ is S(f_⪰) = {H(⪰, h_A) | A ⊆ X}.
Observe that S(f_⪰) is a single chain with respect to set inclusion (in other words, the lattice associated with f_⪰ is a total order). Interestingly, the reverse is also true: if the lattice associated with a classifier f is a single chain (total order), then it is generated by a single, unique weak order.
Proposition 7. The lattice associated with a closure operator is a single chain if and only if a single weak order generates it. Moreover, the weak order generating the lattice is unique.

More generally, we say that f is a simple classifier with length k whenever the number of closed sets (classes) generated by f, |S(f)|, is k + 1.
Notice that S(f) always contains ∅ and X. Hence, a simple classifier with length k has k different nonempty classes.
Consider the classifier associated with Example 1. There are eight classes other than ∅. Let L = {Class1, . . . , Class8}. Using the result of Proposition 2, one choice of labeling correspondence is

Φ(a) = {Class1, Class5, Class6, Class7, Class8}
Φ(b) = {Class2, Class5, Class6, Class8}
Φ(c) = {Class3, Class7, Class8}
Φ(d) = {Class4, Class6, Class8}.

The above labeling correspondence is different from the original one in the example. However, both induce the same classifier, with the same set of classes.
Remark. Using the proof of Proposition 7, any simple classifier with length k is associated with a unique weak order with k different indifference classes. Let {A_1, . . . , A_k} be a partition of X into the indifference classes of a weak order ⪰, with x ≻ y whenever y ∈ A_i and x ∈ A_j with i < j. Then {A_1, (A_1 ∪ A_2), . . . , (A_1 ∪ A_2 ∪ . . . ∪ A_k)}, together with ∅, is the set of closed sets of a simple classifier with length k. The reverse construction works in the same way.
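The construction in the remark can be sketched directly (ours; `chain_topology` is a hypothetical name, taking the indifference classes worst class first):

```python
def chain_topology(indiff_classes):
    """Closed sets of the simple classifier of length k built from a
    weak order with indifference classes A_1, ..., A_k (worst first):
    the empty set, A_1, A_1 ∪ A_2, ..., A_1 ∪ ... ∪ A_k."""
    closed = [frozenset()]
    acc = frozenset()
    for block in indiff_classes:
        acc |= frozenset(block)
        closed.append(acc)
    return closed
```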
Remark. Consider a binary classifier f_C : 2^X → 2^X. Assume that the corresponding topology is S(f_C) = {∅, C, X}. The corresponding weak order ⪰_f has two indifference classes, X \ C and C. From the perspective of classification, there is no difference between f_C and another binary classifier g with S(g) = {∅, X \ C, X}. The weak order associated with g is the reverse order of ⪰_f. The classifier derived from the label "horse" achieves the same purposes as one derived from "not a horse."
Example 2. By Theorem 3, any given classifier $f$ may be generated by weak orders $\succeq_1, \ldots, \succeq_k$. Now we use Theorem 5 to define binary classifiers $\{g_1, \ldots, g_k\}$ that generate $f$. One simple way is to notice that for any binary classifier $g_i$, $S(g_i)$ should be a subset of $S(f)$. Moreover, for any class $\emptyset \neq A \in S(f)$ and $x \notin A$, one of the $g_i$ should separate $x$ and $A$. Hence, if we consider all binary classifiers $g_C$ for every closed set $C \in S(f)$, both requirements of Theorem 5 are satisfied.
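Example 2's construction can be sketched as follows (the code and names are ours; we keep one binary classifier per nonempty proper closed set, since the choices $C = \emptyset$ and $C = X$ add nothing), checking both requirements of Theorem 5 on a small instance:

```python
# For each nonempty proper closed set C of f, take the binary classifier g_C
# with closed sets {∅, C, X}.
X = frozenset("abc")
S_f = [frozenset(), frozenset("a"), frozenset("ab"), X]

binary = {C: {frozenset(), C, X} for C in S_f if C and C != X}

# Requirement 1: each S(g_C) is a subset of S(f).
assert all(sets <= set(S_f) for sets in binary.values())

# Requirement 2: every nonempty closed set A and point x ∉ A are separated by
# some g_C, i.e. some g_C has a closed set containing A but not x (g_A works).
for A in S_f:
    if not A:
        continue
    for x in X - A:
        assert any(A <= C and x not in C
                   for sets in binary.values() for C in sets)
print(len(binary))  # 2: one binary classifier per nonempty proper closed set
```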
In Example 2, the representation uses $|S(f)| - 1$ binary classifiers and is clearly not minimal. We shall see how to obtain a minimal representation using binary classifiers, but first we consider strategically rational classifiers of different lengths. One way to proceed (which we shall see is not optimal) is to decompose $S(f)$ into chains:
Example 3. Let $X = \{a, b, c\}$ be a set of data points and $f : 2^X \to 2^X$ be a closure operator with the set of closed sets $S(f) = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, b, c\}\}$. We consider the following three chains:
\begin{align*}
S(g_1) &= \{\emptyset, \{a\}, \{a, b\}, \{a, b, c\}\} \\
S(g_2) &= \{\emptyset, \{b\}, \{a, b\}, \{a, b, c\}\} \\
S(g_3) &= \{\emptyset, \{c\}, \{a, b, c\}\}
\end{align*}
Notice that, since both conditions of Theorem 5 are satisfied, $\{g_1, g_2, g_3\}$ generates $f$. Figure 2 illustrates the decomposition.
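One way to verify that the chains generate $f$ is to close the union of the three chain topologies under intersection and compare the result with $S(f)$; a sketch (our code, not the paper's):

```python
from itertools import combinations

# Close S(g_1) ∪ S(g_2) ∪ S(g_3) under pairwise intersection; the family of
# closed sets generated this way should coincide with S(f) from Example 3.
X = frozenset("abc")
S_f = {frozenset(), frozenset("a"), frozenset("b"), frozenset("c"),
       frozenset("ab"), X}
g1 = {frozenset(), frozenset("a"), frozenset("ab"), X}
g2 = {frozenset(), frozenset("b"), frozenset("ab"), X}
g3 = {frozenset(), frozenset("c"), X}

generated = g1 | g2 | g3
changed = True
while changed:
    changed = False
    for A, B in combinations(list(generated), 2):
        if A & B not in generated:
            generated.add(A & B)
            changed = True

assert generated == S_f
print(len(generated))  # 6 closed sets, exactly S(f)
```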
We take up a discussion of minimal representations in Section 8.
Figure 2: The lattice associated with the closure operator $f$ is at the top. The associated decomposition into $g_1, g_2, g_3$ is at the bottom.
7 Application 2: Choice over menus

Our second application is to the choice of menus, $A \subseteq X$. In decision theory, choice over menus is thought of as capturing the general problem of dynamic choice in an abstract setting. An agent makes choices today that may constrain her future choices. To this end, suppose that $\trianglerighteq$, a preference relation over $2^X$, captures choices made over menus. The indifference relation derived from $\trianglerighteq$ is denoted by $\approx$, so that $A \approx B$ when $A \trianglerighteq B$ and $B \trianglerighteq A$.
Kreps, 1979 introduced the idea that an agent may have a preference for flexibility due to uncertainty about her ultimate future choices. He proposed some simple axioms on $\trianglerighteq$, and proved that they give rise to a particular kind of utility representation: a representation that reflects the agent's uncertainty over a state space that guides her future choices.
The literature on menu choice in economics is substantial, but for our purposes we want to highlight the work of Dekel, Lipman, and Rustichini, 2001 because their theory of choice hinges crucially on adopting a particular closure operator. In a setting of choice over lotteries, they use the convex hull operator (which is a closure operator in the Euclidean setting of their paper). Kreps' result also uses a closure operator in constructing his state space, but his axioms do not explicitly reference the closure operator.
Following Kreps, 1979, we entertain the following axioms on $\trianglerighteq$:
1. Desire for flexibility: $B \subseteq A$ implies $A \trianglerighteq B$,
2. Ordinal submodularity: $A \approx A \cup B$ implies that for all $C$, $A \cup C \approx A \cup B \cup C$.
There are two possible approaches. The first is to define, as in Kreps, 1979, the function $f : 2^X \to 2^X$ from the preference $\trianglerighteq$ by
$$f(A) = \bigcup_{B \in 2^X,\; A \approx A \cup B} B. \qquad (3)$$
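Equation (3) is directly computable on small examples. In the sketch below the preference is induced by a max-of-utilities rule (a hypothetical choice of ours, purely for illustration), and $f(A)$ is computed as the union of all $B$ with $A \approx A \cup B$:

```python
from itertools import combinations

X = frozenset("abcd")
U = {"a": 1, "b": 2, "c": 3, "d": 4}  # hypothetical utilities inducing ⊵

def best(A):
    return max((U[x] for x in A), default=float("-inf"))

def indifferent(A, B):
    # A ≈ B under the max-utility preference: equal best elements
    return best(A) == best(B)

def powerset(S):
    s = list(S)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def f(A):
    # Equation (3): the union of all B with A ≈ A ∪ B
    out = frozenset()
    for B in powerset(X):
        if indifferent(A, A | B):
            out = out | B
    return out

print(sorted(f(frozenset("b"))))  # ['a', 'b']: all points weakly worse than b
assert f(f(frozenset("b"))) == f(frozenset("b"))  # idempotent, as a closure operator
```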
The second approach is proposed by Dekel, Lipman, and Rustichini, 2001, using a convex-hull operator over subsets (menus) of a Euclidean space. The key property in this approach is that a closure operator is given such that $A \approx f(A)$ for all menus $A$.
Definition 4. $\trianglerighteq$ respects $f$ if $A \approx f(A)$ for every menu $A \in 2^X$.
If we follow the first approach, then it is easy to show (see Kreps, 1979) that, under the two axioms,
1. $f$ is a closure operator,
2. $\trianglerighteq$ respects $f$,
3. $A \approx A \cup B$ if and only if $f(B) \subseteq f(A)$,
4. if $f(B) \subsetneq f(A)$, then $A \vartriangleright B$.
In consequence, we obtain the following version of Kreps' result:

Theorem 8. Let $\trianglerighteq$ be a preference relation over $2^X$ that satisfies desire for flexibility and ordinal submodularity, and let $f$ be defined using Equation (3). Then there is a function $U : X \times S(f) \to \mathbb{R}$, and a strictly increasing function $u : \mathbb{R}^S \to \mathbb{R}$ such that
$$u\big( [\max_{a \in A} U(a, s)]_{s \in S} \big) \qquad (4)$$
represents $\trianglerighteq$. The minimum number of states (the cardinality of $S(f)$) needed for the representation is precisely the MNWO of the associated operator $f$, which is $|P(f)|$.
If we instead adopt a given closure operator $f$, we obtain an analogue of the result in Dekel, Lipman, and Rustichini, 2001:
Theorem 9. Suppose given a closure operator $f$, and a preference $\trianglerighteq$ over $2^X$ that respects $f$. Then there exist a state space $S$, where $S = S^+ \cup S^-$ with $S^+ \cap S^- = \emptyset$ and cardinality at most $2(|S(f)| - 1)$, and a state-dependent utility $U : X \times S \to \mathbb{R}$, such that
$$U(A) = \sum_{s \in S^+} \max_{a \in A} U(a, s) - \sum_{s \in S^-} \max_{a \in A} U(a, s) \qquad (5)$$
represents $\trianglerighteq$.
Theorem 9 presumes a closure operator that is respected by the preference $\trianglerighteq$, but is otherwise arbitrary. Dekel, Lipman, and Rustichini, 2001, working in a space of lotteries, impose that a preference respects the convex-hull operator, mapping each set in a Euclidean space into its convex hull (this is indeed a closure operator in the Euclidean case). Our theorem shows that the convexity properties of lotteries are not needed: once we adopt the framework in our paper, the same ideas extend to a purely finite and discrete setting.
Now, any preference relation respects the identity operator, which gives rise to the following consequence of our result:

Corollary 10. For every preference ordering $\trianglerighteq$ over the set of menus, there exists a representation as in Equation (5) with at most $2 \times (2^{|X|} - 1)$ states.
Remark. The representation constructed in the proof of Theorem 9 is not a minimal additive representation. For example, consider a one-to-one utility function $U : X \to \mathbb{R}$, which induces a preference ordering $\trianglerighteq$ over the set of menus by means of $A \trianglerighteq B$ if and only if $\max\{U(x) : x \in A\} \geq \max\{U(x) : x \in B\}$. It should be clear that the associated strategically rational operator has a topology of size $|X|$. Our construction in Theorem 9, however, generates a representation with at least $2(|X| - 1)$ subjective states.
Figure 3: The lattices associated with the closure operators $f_1$ (on the left) and $f_2$.
8 Discussion

We discuss the ideas behind our definitions of complexity through a series of examples. This discussion will also lead up to a proof of the minimal numbers of weak orders, and binary classifiers, needed to represent a given closure operator.

We start with a simple example that illustrates the definitions in the paper:
Example 4. Let $X = \{a, b, c, d\}$ be the set of data points. Consider the classifiers $f_1, f_2$ defined by:
\begin{align*}
S(f_1) &= \{\emptyset, \{a\}, \{b\}, \{a, b, c, d\}\} \\
S(f_2) &= \{\emptyset, \{a\}, \{a, b\}, \{a, b, c, d\}\}
\end{align*}
Figure 3 illustrates the underlying lattice structures. The topology $S(f_1)$ has depth two and width two, while $S(f_2)$ has depth three and width one. The number of nonempty classes that $f_1$ or $f_2$ can detect is three. The MNWO of $f_1$ is two, while that of $f_2$ is one. The MNBC of both $f_1$ and $f_2$ is two.
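The depth and width counts in Example 4 can be checked mechanically. A sketch (our own reading of the definitions: depth as the longest $\subseteq$-chain of nonempty closed sets, width as the largest $\subseteq$-antichain of them):

```python
from itertools import combinations

X = frozenset("abcd")
S_f1 = [frozenset("a"), frozenset("b"), X]    # nonempty closed sets of f_1
S_f2 = [frozenset("a"), frozenset("ab"), X]   # nonempty closed sets of f_2

def depth(sets):
    # longest chain: recursively extend each set by its strict supersets
    def longest(C):
        return 1 + max((longest(D) for D in sets if C < D), default=0)
    return max(longest(C) for C in sets)

def width(sets):
    # largest antichain: brute force over subfamilies, fine at this scale
    best = 0
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            if all(not (A <= B or B <= A) for A, B in combinations(combo, 2)):
                best = r
    return best

print(depth(S_f1), width(S_f1))  # 2 2
print(depth(S_f2), width(S_f2))  # 3 1
```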