of 3
Risk-Sensitive Online Algorithms
Nicolas Christianson
Caltech
nchristi@caltech.edu
Bo Sun
University of Waterloo
bo.sun@uwaterloo.ca
Steven Low
Caltech
slow@caltech.edu
Adam Wierman
Caltech
adamw@caltech.edu
ABSTRACT
We study the design of
risk-sensitive online algorithms
, in
which risk measures are used in the competitive analysis of
randomized online algorithms. We introduce the CVaR
δ
-
competitive ratio (
δ
-CR) using the conditional value-at-risk
of an algorithm’s cost, which measures the expectation of the
(1
δ
)-fraction of worst outcomes against the o
ine optimal
cost, and use this measure to study three online optimiza-
tion problems: continuous-time ski rental, discrete-time ski
rental, and one-max search. The structure of the optimal
δ
-CR and algorithm varies significantly between problems:
we prove that the optimal
δ
-CR for continuous-time ski
rental is 2
2
(
1
1


)
, obtained by an algorithm described
by a delay di
erential equation. In contrast, in discrete-
time ski rental with buying cost
B
, there is an abrupt phase
transition at
δ
=1
(
1
log
B
), after which the classic de-
terministic strategy is optimal. Similarly, one-max search
exhibits a phase transition at
δ
=
1
2
, after which the classic
deterministic strategy is optimal; we also obtain an algo-
rithm that is asymptotically optimal as
δ
#
0 that arises as
the solution to a delay di
erential equation.
1 Introduction
Randomness can improve decision-making performance in
many online problems; for instance, randomization improves
the competitive ratio of online ski rental from 2 to
e
e
1
[10]
and of online search from polynomial to logarithmic in the
fluctuation ratio [
6]. However, this improved performance
can only be obtained on average over multiple problem in-
stances, as a randomized algorithm can vary wildly in its
performance on any particular run, which may be undesir-
able if an agent is sensitive to risks of a particular size or
likelihood. Fields such as economics and finance have fielded
research on risk aversion and alternative
risk measures
that
enable modifying decision-making objectives to accommo-
date these risk preferences (e.g., [
11, 2]). One of the most
well-studied risk measures in recent years, due to its nice
properties (as a
coherent
risk measure) and computational
tractability, is the
conditional value-at-risk
(CVaR
δ
), which
measures the expectation of a random loss/reward on its
(1
δ
)-fraction of worst outcomes [
12]. CVaR
δ
and other
risk measures have seen wide application across domains and
have been studied as an objective in place of the expectation
in various online learning problems (e.g., [
7, 3, 13]).
Copyright is held by author/owner(s).
Despite the significant extent of literature on risk-sensitive
algorithms for online learning, there has been no work on the
design and analysis of
competitive
algorithms for online op-
timization problems like ski rental, online search, knapsack,
function chasing, or metrical task systems with risk-sensitive
objectives; the closest related work is a recent paper that
studies ski rental with risk
constraints
[5 ]. Thus we ask:
how can we design competitive online algorithms when we
care about the CVaR
δ
of the cost/reward, and what are the
optimal competitive ratios for di
erent problems?
In this work, we begin to work toward answering this
question, studying risk sensitivity in competitive online al-
gorithms for online optimization. In particular, we focus on
two of the prototypical problems in online optimization:
ski
rental
, which encapsulates the fundamental “rent vs. buy”
tradeo
inherent in online optimization with switching costs,
and
one-max search
, which exhibits a complementary “ac-
cept vs. wait” tradeo
fundamental to constrained online
optimization. While these problems are both simple to pose,
they have applications to more complex problems including
TCP acknowledgement and bu
ering problems [
9, 8], and
they reflect crucial components of the di
ffi
culty of more com-
plicated online optimization problems. They thus serve as
ideal analytic testbeds for investigating the design of risk-
sensitive algorithms in online optimization.
In this extended abstract, we give an overview of our
main results on risk-sensitive algorithms for continuous- and
discrete-time ski rental and one-max search, discussing al-
gorithms, optimality, and phase transitions exhibited by the
problems. We refer the reader to the full version of the paper
for further details and proofs [
4].
2 Preliminaries
In this section, we introduce the conditional value-at-risk
and the three online problems studied in this work.
2.1 The Conditional Value-at-Risk
A
risk measure
is a mapping from the set of
R
-valued ran-
dom variables to
R
that gives a deterministic valuation of
the
risk
associated with a particular random loss. One of the
most well-studied risk measures in recent years is the
condi-
tional value-at-risk
(CVaR), defined as the expectation of a
random variable
X
on its (1
δ
)-fraction of worst outcomes.
It is defined formally as follows.
Definition 2.1 (Conditional Value-at-Risk).
Let
X
be
a real-valued random loss variable with inverse CDF
F
1
X
.
6
Performance
Evaluation
Review,
Vol.
52,
No.
2, September
2024
Its conditional value-at-risk
CVaR
δ
[
X
]
is defined as [
1]:
CVaR
δ
[
X
]=
1
1
δ
Z
1
δ
F
1
X
(
p
)d
p.
This definition, which is one of several ways of expressing
CVaR
δ
, highlights the intuition that CVaR
δ
[
X
] computes
the expected loss of
X
on the largest (1
δ
)-fraction of out-
comes in its distribution. For random rewards, CVaR
δ
is
defined similarly as the expectation on the smallest, rather
than largest, (1
δ
)-fraction of outcomes. Note that CVaR
0
[
X
]=
E
[
X
] and CVaR
1
[
X
]
:
= lim
δ
"
1
CVaR
δ
[
X
] = ess sup
X
.
2.2 Competitive Analysis
In the study of online algorithms, algorithm performance is
typically measured via the
competitive ratio
, or the worst
case ratio in (expected) cost between an algorithm and the
o
ine optimal strategy that knows all uncertainty in ad-
vance. In this work, we introduce a modified version of the
competitive ratio for randomized algorithms that goes be-
yond expected performance: instead, we penalize a random-
ized algorithm via the ratio between the conditional value-
at-risk of its cost and the o
ine optimal algorithm’s cost,
terming this metric the CVaR
δ
-competitive ratio
(
δ
-CR).
Definition 2.2 (CVaR

-Competitive Ratio).
Consider
an online problem with uncertainty drawn adversarially from
a set of instances
I
. Let
Alg
be a randomized algorithm,
and let
Opt
betheo
ine optimal algorithm. The
CVaR

-
Competitive Ratio (

-
CR
)
is defined as the worst-case
ratio between the
CVaR
δ
of
Alg
’s cost and the o
ine opti-
mal cost:
δ
-
CR(
Alg
)
:
= sup
I
2
I
CVaR
δ
[
Alg
(
I
)]
Opt
(
I
)
,
where the
CVaR
δ
is taken over
Alg
’s randomness.
Note that the
δ
-CR can be viewed as an interpolation
between the classic randomized case where the adversary has
no power over
Alg
’s randomness and
Alg
pays its expected
cost (
δ
= 0), and the case where the adversary has full
control over
Alg
’s randomness, so
Alg
pays the worst cost
in its support and determinism is optimal (
δ
= 1).
2.3 Online Problems Studied
We now provide brief descriptions of the three problems
studied in this work.
Continuous-Time Ski Rental.
In the
continuous-time
ski rental (CSR)
problem, a player faces a ski season of
unknown and adversarially-chosen duration
s
2
R
++
, and
must choose how long to rent skis before purchasing them.
In particular, the player pays cost equal to the duration of
renting, and cost 1 (without loss of generality) for purchasing
the skis. Deterministic algorithms for ski rental are wholly
determined by the day
x
2
R
++
on which the player stops
renting and purchases the skis: an algorithm that rents until
day
x
and then purchases pays cost
s
·
1
x>s
+(
x
+ 1)
·
1
x
s
.
Randomized algorithms can be described by a random vari-
able
X
over purchase days, in which case the algorithm pays
(random) cost
s
·
1
X>s
+(
X
+1)
·
1
X
s
. Given knowledge of
the total number of skiing days
s
, the o
ine optimal strat-
egy is to rent for the entire season if
s<
1, incurring cost
s
,
and to buy immediately otherwise, yielding cost 1. Defining
CSR
δ
as the
δ
-CR of a strategy
X
μ
, we have
CSR
δ
:
= sup
s
2
R
++
CVaR
δ
[
s
·
1
X>s
+(
X
+ 1)
·
1
X
s
]
min
{
s,
1
}
.
We denote by
CSR
,
δ
the smallest
δ
-CR of any strategy. It is
well known that
CSR
,
1
= 2, which is achieved by purchasing
skis deterministically at time 1, and
CSR
,
0
=
e
e
1
[10].
Discrete-Time Ski Rental.
In the
discrete-time ski rental
(DSR)
problem, a player faces a ski season of unknown and
adversarially-chosen duration
s
2
N
and must choose an in-
teger number of days to rent skis before purchasing them;
renting for a day costs 1, and purchasing skis has an integer
cost
B
2. The cost structure is essentially identical to
the continuous-time case, except the algorithm’s and adver-
sary’s decisions are restricted to lie in
N
: if a player buys
skis at the start of day
x
2
N
and the true season duration
is
s
2
N
, their cost will be
s
·
1
x>s
+(
B
+
x
1)
·
1
x
s
. Thus
for a random strategy
X
μ
with support on
N
, the
δ
-CR
is defined as
DSR(
B
)
δ
:
= sup
s
2
N
CVaR
δ
[
s
·
1
X>s
+(
B
+
X
1)
·
1
X
s
]
min
{
s, B
}
.
As in the continuous-time setting, we denote by
DSR(
B
)
,
δ
the smallest
δ
-CR of any strategy. It is well known that
DSR(
B
)
,
1
=2
1
B
, achieved by deterministically purchasing
skis at the start of day
B
, and
DSR(
B
)
,
0
=
1
1
(
1
B

1
)
B
,
which approaches
CSR
,
0
=
e
e
1
as
B
!1
.
One-Max Search.
In the
one-max search (OMS)
problem,
a player faces a sequence of prices
v
t
2
[
L, U
] arriving online,
with
U
L>
0 known upper and lower bounds on the price
sequence; the
fluctuation ratio
=
U
L
is defined as the ratio
between these bounds. The player seeks to sell an indivisible
item for the greatest possible price; after observing a price
v
t
, they can choose to either accept the price and earn profit
v
t
, or to wait and observe the next price. The duration
T
2
N
of the sequence is
a priori
unknown to the player; if
T
elapses and the player has not yet sold the item, they sell
it for the smallest possible price
L
in a compulsory trade.
An algorithm that sells the item at the first price
x
will
earn profit, in the worst case,
L
·
1
x>v
+
x
·
1
x
v
when the
true maximum price in the sequence is
v
. Thus the
δ
-CR of
a “random threshold” algorithm that accepts the first price
meeting or exceeding some random value
X
μ
is defined:
OMS(
)
δ
:
= sup
v
2
[
L,U
]
v
CVaR
δ
[
L
·
1
X>v
+
X
·
1
X
v
]
;
Note that in this definition, we take the ratio between the
o
ine optimal cost and the CVaR
δ
of the algorithm’s cost,
since we are maximizing profit rather than minimizing cost.
Similarly, the CVaR
δ
definition employed here is the reward
form; see our full paper [
4] for full details. We denote by
OMS(
)
,
δ
the optimal
δ
-CR for the problem; it is known
that
OMS(
)
,
1
=
p
and
OMS(
)
,
0
=1+
W
0
1
e
, where
W
0
is the principal branch of the Lambert
W
function [
6].
3 Results
In this section, we describe our algorithmic upper bounds
and lower bounds for each of the three problems studied.
Continuous-Time Ski Rental.
For the CSR problem,
the optimal algorithm arises as the solution to a delay dif-
ferential equation.
Performance
Evaluation
Review,
Vol.
52,
No.
2, September
2024
7
Theorem 3.1.
For any
δ
2
[0
,
1)
, let
φ
: [0
,
1]
!
[0
,
1]
be
the solution to the following delay di
erential equation:
φ
0
(
t
)=
1
(1
δ
)
[
φ
(
t
)
φ
(
t
(1
δ
))]
for
t
2
[1
δ
,
1]
,
with initial condition
φ
(
t
) = log
1+
t
(
1)(1
δ
)
on
t
2
[0
,
1
δ
]
. Then when
=
CSR
,
δ
,
φ
is the inverse CDF
of the unique optimal strategy for continuous-time ski rental
with the
δ
-
CR
metric. Moreover,
CSR
,
δ
=2
2
(
1
1


)
as
δ
"
1
.
Proving this result requires the insight that the inverse CDF
is more tractably analyzed than the PDF when considering
the
δ
-CR, and in addition depends on a set of structural re-
sults characterizing the optimal algorithm’s indi
erence to
the adversary’s decision.
Discrete-Time Ski Rental.
The DSR problem exhibits
a remarkably di
erent structure than the CSR problem: in
particular, there is a phase transition at
δ
=1
(
1
log
B
)
marking a transition from a regime where randomness im-
proves performance to one in which determinism is optimal.
Theorem 3.2.
Let
DSR
(
B
)
,
δ
be the optimal
δ
-
CR
for discrete-
time ski rental with buying cost
B
2
N
. Then
DSR
(
B
)
,
δ
ex-
hibits a
phase transition
at
δ
=1
(
1
log
B
)
, whereby before
this transition,
DSR
(
B
)
,
δ
strictly improves on the determin-
istic optimal
δ
-
CR
of
2
1
B
, whereas after this transition,
DSR
(
B
)
,
δ
=2
1
B
.
Moreover, in the discrete setting, the optimal algorithm
for
δ
= 0 remains optimal for su
ffi
ciently small
δ
.
Theorem 3.3.
Suppose
δ
=
O
(
1
B
)
. Then the optimal
δ
-
CR
DSR
(
B
)
,
δ
and strategy
p
B,
δ
,
for discrete-time ski rental
with buying cost
B
are
B,
δ
=
C
δ
1
δ
and
p
B,
δ
,
i
=
C
B
1
1
B
B
i
for all
i
2
[
B
]
, where
C
=
1
1
(1
1
/B
)
B
is the optimal com-
petitive ratio for the
δ
=0
case. In particular,
p
B,
δ
,
is
identical to the optimal algorithm for the
δ
=0
setting.
One-Max Search
Finally, we show that OMS admits an
algorithm that arises as the solution to a delay di
erential
equation, just like CSR. Like DSR, though, OMS also ex-
hibits a phase transition at
δ
=
1
2
, after which randomness
does not help and determinism is optimal.
Theorem 3.4.
Let
δ
2
[0
,
1]
, and let
φ
: [0
,
1]
!
[
L, U
]
be
the solution to the following delay di
erential equation:
φ
0
(
t
)=
1
δ
[
φ
(
t
δ
)
L
]
for
t
2
[
δ
,
1]
,
with initial condition
φ
(
t
)=
L
on
t
2
[0
,
δ
]
, where
is
chosen such that
φ
(1) =
U
when
δ
<
1
, and
:
=
p
when
δ
=1
. Then
φ
is the inverse CDF of a random threshold
algorithm for one-max search with
δ
-
CR
. In particular,
is upper bounded as:
(
1+
W
0
1
e
+
O
(
δ
)
as
δ
#
0
p
when
δ
>
1
5
.
Moreover, there is an identical lower bound of
OMS
(
)
,
δ
p
=
OMS
(
)
,
1
when
δ
1
2
and an asymptotically identical
lower bound of
OMS
(
)
,
δ
=1+
W
0
1
e
+
(
δ
)
as
δ
#
0
.
4Acknowledgments
The authors acknowledge support from an NSF Graduate
Research Fellowship (DGE-2139433), NSF Grants CNS-2146814,
CPS-2136197, CNS-2106403, and NGSDI-2105648, the Resnick
Sustainability Institute, and a Caltech S2I Grant.
References
[1] C. Acerbi and D. Tasche. On the coherence of
expected shortfall.
Journal of Banking & Finance
,
26(7):1487–1503, July 2002.
[2] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath.
Coherent Measures of Risk.
Mathematical Finance
,
9(3):203–228, July 1999.
[3] Y. Chow and M. Ghavamzadeh. Algorithms for CVaR
Optimization in MDPs. In
Advances in Neural
Information Processing Systems
, volume 27. Curran
Associates, Inc., 2014.
[4] N. Christianson, B. Sun, S. Low, and A. Wierman.
Risk-Sensitive Online Algorithms, 2024.
arXiv:2405.09859.
[5] M. Dinitz, S. Im, T. Lavastida, B. Moseley, and
S. Vassilvitskii. Controlling Tail Risk in Online
Ski-Rental. In
Proceedings of the 2024 Annual
ACM-SIAM Symposium on Discrete Algorithms
(SODA)
, pages 4247–4263, 2024.
[6] R. El-Yaniv, A. Fiat, R. M. Karp, and G. Turpin.
Optimal Search and One-Way Trading Online
Algorithms.
Algorithmica
, 30(1):101–139, May 2001.
[7] E. Even-Dar, M. Kearns, and J. Wortman.
Risk-Sensitive Online Learning. In J. L. Balc ́azar,
P. M. Long, and F. Stephan, editors,
Algorithmic
Learning Theory
, pages 199–213, Berlin, Heidelberg,
2006. Springer.
[8] S. Geulen, B. Vocking, and M. Winkler. Regret
Minimization for Online Bu
ering Problems Using the
Weighted Majority Algorithm. In
23rd Annual
Conference on Learning Theory
, 2010.
[9] A. R. Karlin, C. Kenyon, and D. Randall. Dynamic
TCP acknowledgement and other stories about
e/(e-1). In
Proceedings of the Thirty-Third Annual
ACM Symposium on Theory of Computing
, STOC ’01,
pages 502–509, New York, NY, USA, July 2001.
Association for Computing Machinery.
[10] A. R. Karlin, M. S. Manasse, L. A. McGeoch, and
S. Owicki. Competitive randomized algorithms for
nonuniform problems.
Algorithmica
, 11(6):542–571,
June 1994.
[11] H. M. Markowitz.
Portfolio Selection: E
ffi
cient
Diversification of Investments
. Yale University Press,
1959.
[12] R. T. Rockafellar and S. Uryasev. Conditional
value-at-risk for general loss distributions.
Journal of
Banking & Finance
, 26(7):1443–1471, July 2002.
[13] A. Tamkin, C. Dann, R. Keramati, and E. Brunskill.
Distributionally-Aware Exploration for CVaR
Bandits. In
NeurIPS 2019 Workshop on Safety and
Robustness on Decision Making
, 2019.
8
Performance
Evaluation
Review,
Vol.
52,
No.
2, September
2024