Risk-Sensitive Online Algorithms
Nicolas Christianson
Caltech
nchristi@caltech.edu
Bo Sun
University of Waterloo
bo.sun@uwaterloo.ca
Steven Low
Caltech
slow@caltech.edu
Adam Wierman
Caltech
adamw@caltech.edu
ABSTRACT
We study the design of
risk-sensitive online algorithms
, in
which risk measures are used in the competitive analysis of
randomized online algorithms. We introduce the CVaR
δ
-
competitive ratio (
δ
-CR) using the conditional value-at-risk
of an algorithm’s cost, which measures the expectation of the
(1
−
δ
)-fraction of worst outcomes against the o
✏
ine optimal
cost, and use this measure to study three online optimiza-
tion problems: continuous-time ski rental, discrete-time ski
rental, and one-max search. The structure of the optimal
δ
-CR and algorithm varies significantly between problems:
we prove that the optimal
δ
-CR for continuous-time ski
rental is 2
−
2
−
⇥
(
1
1
)
, obtained by an algorithm described
by a delay di
↵
erential equation. In contrast, in discrete-
time ski rental with buying cost
B
, there is an abrupt phase
transition at
δ
=1
−
⇥
(
1
log
B
), after which the classic de-
terministic strategy is optimal. Similarly, one-max search
exhibits a phase transition at
δ
=
1
2
, after which the classic
deterministic strategy is optimal; we also obtain an algo-
rithm that is asymptotically optimal as
δ
#
0 that arises as
the solution to a delay di
↵
erential equation.
1 Introduction
Randomness can improve decision-making performance in
many online problems; for instance, randomization improves
the competitive ratio of online ski rental from 2 to
e
e
−
1
[10]
and of online search from polynomial to logarithmic in the
fluctuation ratio [
6]. However, this improved performance
can only be obtained on average over multiple problem in-
stances, as a randomized algorithm can vary wildly in its
performance on any particular run, which may be undesir-
able if an agent is sensitive to risks of a particular size or
likelihood. Fields such as economics and finance have fielded
research on risk aversion and alternative
risk measures
that
enable modifying decision-making objectives to accommo-
date these risk preferences (e.g., [
11, 2]). One of the most
well-studied risk measures in recent years, due to its nice
properties (as a
coherent
risk measure) and computational
tractability, is the
conditional value-at-risk
(CVaR
δ
), which
measures the expectation of a random loss/reward on its
(1
−
δ
)-fraction of worst outcomes [
12]. CVaR
δ
and other
risk measures have seen wide application across domains and
have been studied as an objective in place of the expectation
in various online learning problems (e.g., [
7, 3, 13]).
Copyright is held by author/owner(s).
Despite the significant extent of literature on risk-sensitive
algorithms for online learning, there has been no work on the
design and analysis of
competitive
algorithms for online op-
timization problems like ski rental, online search, knapsack,
function chasing, or metrical task systems with risk-sensitive
objectives; the closest related work is a recent paper that
studies ski rental with risk
constraints
[5 ]. Thus we ask:
how can we design competitive online algorithms when we
care about the CVaR
δ
of the cost/reward, and what are the
optimal competitive ratios for di
↵
erent problems?
In this work, we begin to work toward answering this
question, studying risk sensitivity in competitive online al-
gorithms for online optimization. In particular, we focus on
two of the prototypical problems in online optimization:
ski
rental
, which encapsulates the fundamental “rent vs. buy”
tradeo
↵
inherent in online optimization with switching costs,
and
one-max search
, which exhibits a complementary “ac-
cept vs. wait” tradeo
↵
fundamental to constrained online
optimization. While these problems are both simple to pose,
they have applications to more complex problems including
TCP acknowledgement and bu
↵
ering problems [
9, 8], and
they reflect crucial components of the di
ffi
culty of more com-
plicated online optimization problems. They thus serve as
ideal analytic testbeds for investigating the design of risk-
sensitive algorithms in online optimization.
In this extended abstract, we give an overview of our
main results on risk-sensitive algorithms for continuous- and
discrete-time ski rental and one-max search, discussing al-
gorithms, optimality, and phase transitions exhibited by the
problems. We refer the reader to the full version of the paper
for further details and proofs [
4].
2 Preliminaries
In this section, we introduce the conditional value-at-risk
and the three online problems studied in this work.
2.1 The Conditional Value-at-Risk
A
risk measure
is a mapping from the set of
R
-valued ran-
dom variables to
R
that gives a deterministic valuation of
the
risk
associated with a particular random loss. One of the
most well-studied risk measures in recent years is the
condi-
tional value-at-risk
(CVaR), defined as the expectation of a
random variable
X
on its (1
−
δ
)-fraction of worst outcomes.
It is defined formally as follows.
Definition 2.1 (Conditional Value-at-Risk).
Let
X
be
a real-valued random loss variable with inverse CDF
F
−
1
X
.
6
Performance
Evaluation
Review,
Vol.
52,
No.
2, September
2024
Its conditional value-at-risk
CVaR
δ
[
X
]
is defined as [
1]:
CVaR
δ
[
X
]=
1
1
−
δ
Z
1
δ
F
−
1
X
(
p
)d
p.
This definition, which is one of several ways of expressing
CVaR
δ
, highlights the intuition that CVaR
δ
[
X
] computes
the expected loss of
X
on the largest (1
−
δ
)-fraction of out-
comes in its distribution. For random rewards, CVaR
δ
is
defined similarly as the expectation on the smallest, rather
than largest, (1
−
δ
)-fraction of outcomes. Note that CVaR
0
[
X
]=
E
[
X
] and CVaR
1
[
X
]
:
= lim
δ
"
1
CVaR
δ
[
X
] = ess sup
X
.
2.2 Competitive Analysis
In the study of online algorithms, algorithm performance is
typically measured via the
competitive ratio
, or the worst
case ratio in (expected) cost between an algorithm and the
o
✏
ine optimal strategy that knows all uncertainty in ad-
vance. In this work, we introduce a modified version of the
competitive ratio for randomized algorithms that goes be-
yond expected performance: instead, we penalize a random-
ized algorithm via the ratio between the conditional value-
at-risk of its cost and the o
✏
ine optimal algorithm’s cost,
terming this metric the CVaR
δ
-competitive ratio
(
δ
-CR).
Definition 2.2 (CVaR
-Competitive Ratio).
Consider
an online problem with uncertainty drawn adversarially from
a set of instances
I
. Let
Alg
be a randomized algorithm,
and let
Opt
betheo
✏
ine optimal algorithm. The
CVaR
-
Competitive Ratio (
-
CR
)
is defined as the worst-case
ratio between the
CVaR
δ
of
Alg
’s cost and the o
✏
ine opti-
mal cost:
δ
-
CR(
Alg
)
:
= sup
I
2
I
CVaR
δ
[
Alg
(
I
)]
Opt
(
I
)
,
where the
CVaR
δ
is taken over
Alg
’s randomness.
Note that the
δ
-CR can be viewed as an interpolation
between the classic randomized case where the adversary has
no power over
Alg
’s randomness and
Alg
pays its expected
cost (
δ
= 0), and the case where the adversary has full
control over
Alg
’s randomness, so
Alg
pays the worst cost
in its support and determinism is optimal (
δ
= 1).
2.3 Online Problems Studied
We now provide brief descriptions of the three problems
studied in this work.
Continuous-Time Ski Rental.
In the
continuous-time
ski rental (CSR)
problem, a player faces a ski season of
unknown and adversarially-chosen duration
s
2
R
++
, and
must choose how long to rent skis before purchasing them.
In particular, the player pays cost equal to the duration of
renting, and cost 1 (without loss of generality) for purchasing
the skis. Deterministic algorithms for ski rental are wholly
determined by the day
x
2
R
++
on which the player stops
renting and purchases the skis: an algorithm that rents until
day
x
and then purchases pays cost
s
·
1
x>s
+(
x
+ 1)
·
1
x
s
.
Randomized algorithms can be described by a random vari-
able
X
over purchase days, in which case the algorithm pays
(random) cost
s
·
1
X>s
+(
X
+1)
·
1
X
s
. Given knowledge of
the total number of skiing days
s
, the o
✏
ine optimal strat-
egy is to rent for the entire season if
s<
1, incurring cost
s
,
and to buy immediately otherwise, yielding cost 1. Defining
↵
CSR
,μ
δ
as the
δ
-CR of a strategy
X
⇠
μ
, we have
↵
CSR
,μ
δ
:
= sup
s
2
R
++
CVaR
δ
[
s
·
1
X>s
+(
X
+ 1)
·
1
X
s
]
min
{
s,
1
}
.
We denote by
↵
CSR
,
⇤
δ
the smallest
δ
-CR of any strategy. It is
well known that
↵
CSR
,
⇤
1
= 2, which is achieved by purchasing
skis deterministically at time 1, and
↵
CSR
,
⇤
0
=
e
e
−
1
[10].
Discrete-Time Ski Rental.
In the
discrete-time ski rental
(DSR)
problem, a player faces a ski season of unknown and
adversarially-chosen duration
s
2
N
and must choose an in-
teger number of days to rent skis before purchasing them;
renting for a day costs 1, and purchasing skis has an integer
cost
B
≥
2. The cost structure is essentially identical to
the continuous-time case, except the algorithm’s and adver-
sary’s decisions are restricted to lie in
N
: if a player buys
skis at the start of day
x
2
N
and the true season duration
is
s
2
N
, their cost will be
s
·
1
x>s
+(
B
+
x
−
1)
·
1
x
s
. Thus
for a random strategy
X
⇠
μ
with support on
N
, the
δ
-CR
is defined as
↵
DSR(
B
)
,μ
δ
:
= sup
s
2
N
CVaR
δ
[
s
·
1
X>s
+(
B
+
X
−
1)
·
1
X
s
]
min
{
s, B
}
.
As in the continuous-time setting, we denote by
↵
DSR(
B
)
,
⇤
δ
the smallest
δ
-CR of any strategy. It is well known that
↵
DSR(
B
)
,
⇤
1
=2
−
1
B
, achieved by deterministically purchasing
skis at the start of day
B
, and
↵
DSR(
B
)
,
⇤
0
=
1
1
−
(
1
−
B
1
)
B
,
which approaches
↵
CSR
,
⇤
0
=
e
e
−
1
as
B
!1
.
One-Max Search.
In the
one-max search (OMS)
problem,
a player faces a sequence of prices
v
t
2
[
L, U
] arriving online,
with
U
≥
L>
0 known upper and lower bounds on the price
sequence; the
fluctuation ratio
✓
=
U
L
is defined as the ratio
between these bounds. The player seeks to sell an indivisible
item for the greatest possible price; after observing a price
v
t
, they can choose to either accept the price and earn profit
v
t
, or to wait and observe the next price. The duration
T
2
N
of the sequence is
a priori
unknown to the player; if
T
elapses and the player has not yet sold the item, they sell
it for the smallest possible price
L
in a compulsory trade.
An algorithm that sells the item at the first price
≥
x
will
earn profit, in the worst case,
L
·
1
x>v
+
x
·
1
x
v
when the
true maximum price in the sequence is
v
. Thus the
δ
-CR of
a “random threshold” algorithm that accepts the first price
meeting or exceeding some random value
X
⇠
μ
is defined:
↵
OMS(
✓
)
,μ
δ
:
= sup
v
2
[
L,U
]
v
CVaR
δ
[
L
·
1
X>v
+
X
·
1
X
v
]
;
Note that in this definition, we take the ratio between the
o
✏
ine optimal cost and the CVaR
δ
of the algorithm’s cost,
since we are maximizing profit rather than minimizing cost.
Similarly, the CVaR
δ
definition employed here is the reward
form; see our full paper [
4] for full details. We denote by
↵
OMS(
✓
)
,
⇤
δ
the optimal
δ
-CR for the problem; it is known
that
↵
OMS(
✓
)
,
⇤
1
=
p
✓
and
↵
OMS(
✓
)
,
⇤
0
=1+
W
0
�
✓
−
1
e
�
, where
W
0
is the principal branch of the Lambert
W
function [
6].
3 Results
In this section, we describe our algorithmic upper bounds
and lower bounds for each of the three problems studied.
Continuous-Time Ski Rental.
For the CSR problem,
the optimal algorithm arises as the solution to a delay dif-
ferential equation.
Performance
Evaluation
Review,
Vol.
52,
No.
2, September
2024
7
Theorem 3.1.
For any
δ
2
[0
,
1)
, let
φ
: [0
,
1]
!
[0
,
1]
be
the solution to the following delay di
↵
erential equation:
φ
0
(
t
)=
1
↵
(1
−
δ
)
[
φ
(
t
)
−
φ
(
t
−
(1
−
δ
))]
for
t
2
[1
−
δ
,
1]
,
with initial condition
φ
(
t
) = log
⇣
1+
t
(
↵
−
1)(1
−
δ
)
⌘
on
t
2
[0
,
1
−
δ
]
. Then when
↵
=
↵
CSR
,
⇤
δ
,
φ
is the inverse CDF
of the unique optimal strategy for continuous-time ski rental
with the
δ
-
CR
metric. Moreover,
↵
CSR
,
⇤
δ
=2
−
2
−
⇥
(
1
1
)
as
δ
"
1
.
Proving this result requires the insight that the inverse CDF
is more tractably analyzed than the PDF when considering
the
δ
-CR, and in addition depends on a set of structural re-
sults characterizing the optimal algorithm’s indi
↵
erence to
the adversary’s decision.
Discrete-Time Ski Rental.
The DSR problem exhibits
a remarkably di
↵
erent structure than the CSR problem: in
particular, there is a phase transition at
δ
=1
−
⇥
(
1
log
B
)
marking a transition from a regime where randomness im-
proves performance to one in which determinism is optimal.
Theorem 3.2.
Let
↵
DSR
(
B
)
,
⇤
δ
be the optimal
δ
-
CR
for discrete-
time ski rental with buying cost
B
2
N
. Then
↵
DSR
(
B
)
,
⇤
δ
ex-
hibits a
phase transition
at
δ
=1
−
⇥
(
1
log
B
)
, whereby before
this transition,
↵
DSR
(
B
)
,
⇤
δ
strictly improves on the determin-
istic optimal
δ
-
CR
of
2
−
1
B
, whereas after this transition,
↵
DSR
(
B
)
,
⇤
δ
=2
−
1
B
.
Moreover, in the discrete setting, the optimal algorithm
for
δ
= 0 remains optimal for su
ffi
ciently small
δ
.
Theorem 3.3.
Suppose
δ
=
O
(
1
B
)
. Then the optimal
δ
-
CR
↵
DSR
(
B
)
,
⇤
δ
and strategy
p
B,
δ
,
⇤
for discrete-time ski rental
with buying cost
B
are
↵
B,
⇤
δ
=
C
−
δ
1
−
δ
and
p
B,
δ
,
⇤
i
=
C
B
✓
1
−
1
B
◆
B
−
i
for all
i
2
[
B
]
, where
C
=
1
1
−
(1
−
1
/B
)
B
is the optimal com-
petitive ratio for the
δ
=0
case. In particular,
p
B,
δ
,
⇤
is
identical to the optimal algorithm for the
δ
=0
setting.
One-Max Search
Finally, we show that OMS admits an
algorithm that arises as the solution to a delay di
↵
erential
equation, just like CSR. Like DSR, though, OMS also ex-
hibits a phase transition at
δ
=
1
2
, after which randomness
does not help and determinism is optimal.
Theorem 3.4.
Let
δ
2
[0
,
1]
, and let
φ
: [0
,
1]
!
[
L, U
]
be
the solution to the following delay di
↵
erential equation:
φ
0
(
t
)=
↵
1
−
δ
[
φ
(
t
−
δ
)
−
L
]
for
t
2
[
δ
,
1]
,
with initial condition
φ
(
t
)=
↵
L
on
t
2
[0
,
δ
]
, where
↵
is
chosen such that
φ
(1) =
U
when
δ
<
1
, and
↵
:
=
p
✓
when
δ
=1
. Then
φ
is the inverse CDF of a random threshold
algorithm for one-max search with
δ
-
CR
↵
. In particular,
↵
is upper bounded as:
↵
(
1+
W
0
�
✓
−
1
e
�
+
O
(
δ
)
as
δ
#
0
p
✓
when
δ
>
1
5
.
Moreover, there is an identical lower bound of
↵
OMS
(
✓
)
,
⇤
δ
≥
p
✓
=
↵
OMS
(
✓
)
,
⇤
1
when
δ
≥
1
2
and an asymptotically identical
lower bound of
↵
OMS
(
✓
)
,
⇤
δ
=1+
W
0
�
✓
−
1
e
�
+
⌦
(
δ
)
as
δ
#
0
.
4Acknowledgments
The authors acknowledge support from an NSF Graduate
Research Fellowship (DGE-2139433), NSF Grants CNS-2146814,
CPS-2136197, CNS-2106403, and NGSDI-2105648, the Resnick
Sustainability Institute, and a Caltech S2I Grant.
References
[1] C. Acerbi and D. Tasche. On the coherence of
expected shortfall.
Journal of Banking & Finance
,
26(7):1487–1503, July 2002.
[2] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath.
Coherent Measures of Risk.
Mathematical Finance
,
9(3):203–228, July 1999.
[3] Y. Chow and M. Ghavamzadeh. Algorithms for CVaR
Optimization in MDPs. In
Advances in Neural
Information Processing Systems
, volume 27. Curran
Associates, Inc., 2014.
[4] N. Christianson, B. Sun, S. Low, and A. Wierman.
Risk-Sensitive Online Algorithms, 2024.
arXiv:2405.09859.
[5] M. Dinitz, S. Im, T. Lavastida, B. Moseley, and
S. Vassilvitskii. Controlling Tail Risk in Online
Ski-Rental. In
Proceedings of the 2024 Annual
ACM-SIAM Symposium on Discrete Algorithms
(SODA)
, pages 4247–4263, 2024.
[6] R. El-Yaniv, A. Fiat, R. M. Karp, and G. Turpin.
Optimal Search and One-Way Trading Online
Algorithms.
Algorithmica
, 30(1):101–139, May 2001.
[7] E. Even-Dar, M. Kearns, and J. Wortman.
Risk-Sensitive Online Learning. In J. L. Balc ́azar,
P. M. Long, and F. Stephan, editors,
Algorithmic
Learning Theory
, pages 199–213, Berlin, Heidelberg,
2006. Springer.
[8] S. Geulen, B. Vocking, and M. Winkler. Regret
Minimization for Online Bu
↵
ering Problems Using the
Weighted Majority Algorithm. In
23rd Annual
Conference on Learning Theory
, 2010.
[9] A. R. Karlin, C. Kenyon, and D. Randall. Dynamic
TCP acknowledgement and other stories about
e/(e-1). In
Proceedings of the Thirty-Third Annual
ACM Symposium on Theory of Computing
, STOC ’01,
pages 502–509, New York, NY, USA, July 2001.
Association for Computing Machinery.
[10] A. R. Karlin, M. S. Manasse, L. A. McGeoch, and
S. Owicki. Competitive randomized algorithms for
nonuniform problems.
Algorithmica
, 11(6):542–571,
June 1994.
[11] H. M. Markowitz.
Portfolio Selection: E
ffi
cient
Diversification of Investments
. Yale University Press,
1959.
[12] R. T. Rockafellar and S. Uryasev. Conditional
value-at-risk for general loss distributions.
Journal of
Banking & Finance
, 26(7):1443–1471, July 2002.
[13] A. Tamkin, C. Dann, R. Keramati, and E. Brunskill.
Distributionally-Aware Exploration for CVaR
Bandits. In
NeurIPS 2019 Workshop on Safety and
Robustness on Decision Making
, 2019.
8
Performance
Evaluation
Review,
Vol.
52,
No.
2, September
2024