Accepted for publication in the Astronomical Journal
Preprint typeset using LaTeX style emulateapj v. 12/16/11
THE CALIFORNIA-KEPLER SURVEY.
III. A GAP IN THE RADIUS DISTRIBUTION OF SMALL PLANETS
1
Benjamin J. Fulton^{2,3,12,*}, Erik A. Petigura^{3,15}, Andrew W. Howard^{3}, Howard Isaacson^{4},
Geoffrey W. Marcy^{4}, Phillip A. Cargile^{5}, Leslie Hebb^{6}, Lauren M. Weiss^{7,13},
John Asher Johnson^{5}, Timothy D. Morton^{8}, Evan Sinukoff^{2,3,14}, Ian J. M. Crossfield^{9,16},
Lea A. Hirsch^{3}
ABSTRACT
The size of a planet is an observable property directly connected to the physics of its formation
and evolution. We used precise radius measurements from the California-Kepler Survey (CKS) to
study the size distribution of 2025 Kepler planets in fine detail. We detect a factor of ≥2 deficit in
the occurrence rate distribution at 1.5–2.0 $R_\oplus$. This gap splits the population of close-in
($P < 100$ d) small planets into two size regimes: $R_P < 1.5\,R_\oplus$ and $R_P = 2.0$–$3.0\,R_\oplus$,
with few planets in between. Planets in these two regimes have nearly the same intrinsic frequency
based on occurrence measurements that account for planet detection efficiencies. The paucity of
planets between 1.5 and 2.0 $R_\oplus$ supports the emerging picture that close-in planets smaller than
Neptune are composed of rocky cores measuring 1.5 $R_\oplus$ or smaller with varying amounts of
low-density gas that determine their total sizes.
Subject headings:
planetary systems,
Kepler
1. INTRODUCTION
NASA’s Kepler space telescope enabled the discovery of over 4000 transiting planet
candidates^{16,17} and opened the door to detailed studies of exoplanet demographics. One
of the first surprises to arise from studies of the newly revealed sample of planets was the
multitude of planets with radii smaller than Neptune but larger than Earth
($R_P = 1.0$–$3.9\,R_\oplus$, Batalha et al. 2013). Our solar system has no example of these
intermediate planets, yet they are by far the most common in the Kepler sample
(Howard et al. 2012; Fressin et al. 2013; Petigura et al. 2013b; Youdin 2011;
Christiansen et al. 2015; Dressing & Charbonneau 2015; Morton & Swift 2014).

^{1} Based on observations obtained at the W. M. Keck Observatory, which is operated jointly by the University of California and the California Institute of Technology. Keck time was granted for this project by the University of California, the California Institute of Technology, the University of Hawaii, and NASA.
^{2} Institute for Astronomy, University of Hawai‘i, 2680 Woodlawn Drive, Honolulu, HI 96822, USA
^{3} California Institute of Technology, Pasadena, California, U.S.A.
^{4} Department of Astronomy, University of California, Berkeley, CA 94720, USA
^{5} Harvard-Smithsonian Center for Astrophysics, 60 Garden St, Cambridge, MA 02138, USA
^{6} Hobart and William Smith Colleges, Geneva, NY 14456, USA
^{7} Institut de Recherche sur les Exoplanètes, Université de Montréal, Montréal, QC, Canada
^{8} Department of Astrophysical Sciences, Peyton Hall, 4 Ivy Lane, Princeton, NJ 08540 USA
^{9} Astronomy and Astrophysics Department, University of California, Santa Cruz, CA, USA
^{12} National Science Foundation Graduate Research Fellow
^{13} Trottier Fellow
^{14} Natural Sciences and Engineering Research Council of Canada Graduate Student Fellow
^{15} Hubble Fellow
^{16} NASA Sagan Fellow
^{*} bfulton@hawaii.edu
^{16} NASA Exoplanet Archive, 2/27/2017
^{17} The false positive probability for the majority of the Kepler candidates is 5–10% (Morton & Johnson 2011).
A key early question of the
Kepler
mission was whether
these sub-Neptune-size planets are predominantly rocky
or possess low-density envelopes that contribute signif-
icantly to the planet’s overall size. The radial velocity
(RV) follow-up effort of the
Kepler
project focused on
22 stars hosting one or more sub-Neptunes (Marcy et al.
2014). In addition, detailed modeling of transit timing
variations (TTVs) provided mass constraints for a large
number of systems in specific architectures (e.g., Wu &
Lithwick 2013; Hadden & Lithwick 2014, 2016). The resulting mass measurements
revealed that most planets larger than 1.6 $R_\oplus$ have low densities that are
inconsistent with purely rocky compositions and instead require gaseous envelopes
(Weiss & Marcy 2014; Rogers 2015).
The distinction between rocky and gaseous planets re-
flects the typical core sizes of planets as well as the
physical mechanisms by which planets acquire (and lose)
gaseous envelopes. The densities of planets with radii
smaller than ∼1.6 $R_\oplus$ are generally consistent with a
purely rocky composition (Weiss & Marcy 2014; Rogers
2015) and their radius distribution likely reflects their
initial core sizes. However, a small amount of H/He gas
added to a roughly Earth-size rocky core can substan-
tially increase planet size, without significantly increas-
ing planet mass. For this reason, it has been suggested
that the radii of sub-Neptune-size planets, along with
knowledge of the irradiation history, would be sufficient
to estimate bulk composition without additional infor-
mation (Lopez & Fortney 2013; Wolfgang & Lopez 2015).
The large number of planets smaller than Neptune dis-
covered by the
Kepler
mission was unexpected given pre-
vailing theories of planet formation, which were devel-
oped to explain the distribution of giant planets (Ida &
Lin 2004; Mordasini et al. 2009). These theories pre-
dicted that planets should either fail to accrete enough
material to become super-Earths, or they would grow
quickly, accreting all of the gas in their feeding zones
arXiv:1703.10375v2 [astro-ph.EP] 16 Jun 2017
and growing into massive, gas-rich giant planets. Modern formation
models are now able to reproduce the observed
population of super-Earths (Hansen & Murray 2012;
Mordasini et al. 2012; Alibert et al. 2013; Chiang &
Laughlin 2013; Lee et al. 2014; Chatterjee & Tan 2014;
Coleman & Nelson 2014; Raymond & Cossou 2014; Lee
& Chiang 2016). Many of these new models can be cor-
roborated by measuring the bulk properties of individual
planets and the typical properties of the population.
As formation models continue to be refined, the role of
atmospheric erosion on these short-period planets is be-
coming more apparent. Several authors have predicted
the existence of a “photoevaporation valley” in the dis-
tribution of planet radii (e.g., Owen & Wu 2013; Lopez
& Fortney 2014; Jin et al. 2014; Chen & Rogers 2016;
Lopez & Rice 2016).
Photoevaporation models predict that there should be
a dearth of intermediate sub-Neptune size planets orbit-
ing in highly irradiated environments. The mass of H/He
in the envelope must be finely tuned to produce a planet
in this intermediate size range. Planets with too little
gas in their envelopes are stripped to bare, rocky cores
by the radiation from their host stars. In general, the
radii of bare, rocky cores versus planets with a few per-
cent by mass H/He envelopes depend on many uncertain
variables such as the initial core mass distribution and
the insolation flux received by the planet. A rift in the
distribution of small planet radii is a common result of
the planet formation models that include photoevapora-
tion.
Owen & Wu (2013) provided tentative observational
evidence for such a feature in the radius distribution
of
Kepler
planets. They observed a bimodal structure
in the planet radius distribution, particularly when the
planet sample was split into subsamples with low and
high integrated X-ray exposure histories. However, the
relatively large planet radius uncertainties in Owen & Wu
(2013) diluted the gap and reduced its statistical significance.
Their study also considered the number distribution of planets
and did not correct for completeness, as we do below.
Such corrections mitigate sample bias
and allow for the recovery of the underlying planet dis-
tribution from the observed one.
Here, we examine a sample of planets orbiting stars
with precisely measured radii from the California-Kepler
Survey (CKS; see Petigura et al. (2017) and Johnson
et al. (2017)). We use the precise stellar radii to update
the planet radii, bringing the distribution of planet radii
into sharper focus and revealing a gap between 1.5 and
2.0 $R_\oplus$.
This paper is structured as follows. In §2 we dis-
cuss our stellar and planetary samples. We describe
our methods for correcting for pipeline search sensitiv-
ity and transit probabilities in §3. In §4 we examine
the one-dimensional marginalized radius distribution and
also two-dimensional distributions of planet radius as a
function of orbital period, stellar radius, and insolation
flux. We discuss potential explanations for the observed
planet radius gap in §5 and finish with some concluding
remarks in §6.
2. SAMPLE OF PLANETS

2.1. California Kepler Survey
For this work we adopt the stellar sample and the measured stellar parameters from the
CKS program (Petigura et al. 2017, hereafter Paper I). The measured values of
$T_{\rm eff}$, $\log g$, and [Fe/H] are based on a detailed spectroscopic characterization
of 1305 Kepler Object of Interest (KOI) host stars using observations from Keck/HIRES
(Vogt et al. 1994). In Johnson et al. (2017, hereafter Paper II), we matched the stellar
parameters from Paper I to Dartmouth isochrones (Dotter et al. 2008) to derive improved
stellar radii and masses, allowing us to recalculate planetary radii using the light curve
parameters from Mullally et al. (2015), hereafter “Q16”. Median uncertainties in stellar
radius improve from 25% (Huber et al. 2014) to 11% after our CKS spectroscopic analysis.
Stellar mass uncertainties improve from 14% to 4% in the Paper II catalog. This leads to
median uncertainties in planet radii of 12%, which enable the detection of finer
structures in the planet radius distribution.
2.2. Sample Selection
The CKS stellar sample was constructed to address a
variety of science topics (Paper I). The core sample is
a magnitude-limited set of KOIs ($Kp < 14.2$). Additional
fainter stars were added to include habitable zone
planets, ultra-short-period planets, and multi-planet sys-
tems. Here, we enumerate a list of cuts in parameter
space designed to create a sample of planets with well-
measured radii and with well-quantified detection com-
pleteness. The primary goal is to determine anew the
occurrence of planets as a function of planet radius, with
greater reliability than was previously possible.
We start by removing planet candidates deemed false
positives in Paper I. The Paper I false positive desig-
nations were determined using the false positive proba-
bilities calculated by Morton & Johnson (2011); Morton
(2012); Morton et al. (2016), the
Kepler
team’s desig-
nation available on the NASA Exoplanet Archive, and a
search for secondary lines in the HIRES spectra (Kolbl
et al. 2015) as well as any other information available
in the literature for individual KOIs. Next, we restrict
our sample to only the magnitude-limited portion of the
larger CKS sample ($Kp < 14.2$).
The planet-to-star radius ratio ($R_P/R_\star$) becomes uncertain at high impact
parameters ($b$) due to degeneracies with limb-darkening. We excluded KOIs with
$b > 0.7$ to minimize the impact of grazing geometries. We experimented with other
thresholds in $b$ and found that our results are relatively insensitive to the choice of
$b < 0.6$, 0.7, or 0.8, with the trade-off that a lower threshold in $b$ yields a
smaller sample.
We removed planets with orbital periods longer than
100 days in order to avoid domains of low completeness
(especially for planets smaller than about 4 $R_\oplus$) and low
transit probability.
We also excised planets orbiting evolved stars since
they have somewhat lower detectability and less cer-
tain radii. This was implemented using an
ad hoc
temperature-dependent stellar radius filter,
$$\frac{R_\star}{R_\odot} > 10^{\,0.00025\,(T_{\rm eff}/{\rm K} - 5500) + 0.20}, \qquad (1)$$
which is plotted in Figure 1. We also restricted our sample to planets orbiting stars
within the temperature range where we can extract precise stellar parameters from our
high resolution optical spectra (4700–6500 K). Finally, we accounted for uncertainties
in the completeness corrections caused by systematic and random measurement errors in
the simulations, described in Appendix C.

[Figure 1. Top panel: stellar radius (solar radii) versus stellar effective temperature (K). Bottom panel: stellar radius (solar radii) versus apparent magnitude ($Kp$).]

Fig. 1.— Top: HR diagram of the sample of stars selected for analysis. The full Paper II
sample is plotted as light grey points and the sample selected for analysis after applying
the filters discussed in Section 2.2 is plotted as blue squares. Giant planet-hosting stars
that fall above the dashed line given by Equation 1 are omitted from the final sample.
Bottom: Stellar radius of CKS stars as a function of Kepler magnitude ($Kp$). We note that
stars fainter than 14.2 do not follow the same stellar radius distribution. We omit stars
fainter than $Kp = 14.2$ to avoid biasing our planet radius distribution. The point colors
are the same as in the top panel.

TABLE 1
Depth of the Gap

Filter                          V_A
Full CKS sample                 0.746
False positives removed         0.742
Kp < 14.2                       0.686
b < 0.7                         0.572
P < 100 d                       0.498
Giant stars removed             0.507
T_eff = 4700–6500 K             0.483
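
The stellar cuts of Equation 1 and the temperature range can be sketched in a few lines of Python. This is a minimal illustration rather than the survey's actual code: the function names `is_evolved` and `in_teff_range` are hypothetical, and the strict inequalities at the temperature limits are an assumption.

```python
import numpy as np

def is_evolved(teff_k, rstar_rsun):
    """True where a star lies above the Equation 1 boundary (hypothetical helper)."""
    teff_k = np.asarray(teff_k, dtype=float)
    rstar_rsun = np.asarray(rstar_rsun, dtype=float)
    # Boundary radius in solar radii: 10**(0.00025*(Teff/K - 5500) + 0.20)
    r_max = 10.0 ** (0.00025 * (teff_k - 5500.0) + 0.20)
    return rstar_rsun > r_max

def in_teff_range(teff_k, lo=4700.0, hi=6500.0):
    """True where Teff falls inside the adopted range (strict limits assumed)."""
    teff_k = np.asarray(teff_k, dtype=float)
    return (teff_k > lo) & (teff_k < hi)

# Example: keep dwarfs inside the temperature range
teff = np.array([5500.0, 5500.0, 6800.0])
rstar = np.array([1.0, 2.0, 1.2])
keep = ~is_evolved(teff, rstar) & in_teff_range(teff)
```

At $T_{\rm eff} = 5500$ K the boundary sits at $10^{0.20} \approx 1.58\,R_\odot$, so the second star in the example is flagged as evolved and the third is rejected by the temperature cut.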
The successive filters purify the CKS sample of stars and planets and are summarized in
Figure 2. We assessed the impact of each filter on the depth of the planet radius valley
using an ad hoc metric, $V_A$. This quantity is defined as the ratio of the number of
planets with radii of 1.64–1.97 $R_\oplus$ (the bottom of the valley) to the average number
of planets with radii of 1.2–1.44 $R_\oplus$ or 2.16–2.62 $R_\oplus$ (the peaks of the
distribution immediately outside of the valley). The radius limits for the calculation of
$V_A$ were chosen so that $V_A = 1$ for a log-uniform distribution of planets with radii
between 1.2 $R_\oplus$ and 2.62 $R_\oplus$. Smaller values of $V_A$ denote a deeper valley.
The values of $V_A$ after applying each successive filter are tabulated in Table 1.
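
The definition of $V_A$ translates directly into a short function. The sketch below assumes half-open intervals (lower edge included, upper edge excluded), which the text does not specify, and the name `valley_depth` is illustrative only.

```python
import numpy as np

def valley_depth(radii):
    """Ad hoc valley metric V_A for an array of planet radii in Earth radii."""
    radii = np.asarray(radii, dtype=float)

    def count(lo, hi):
        # Half-open interval [lo, hi); edge handling is an assumption
        return np.sum((radii >= lo) & (radii < hi))

    n_valley = count(1.64, 1.97)
    n_peaks = 0.5 * (count(1.20, 1.44) + count(2.16, 2.62))
    if n_peaks == 0:
        return float("nan")
    return float(n_valley / n_peaks)

# A log-uniform sample between 1.2 and 2.62 Earth radii gives V_A close to 1
log_uniform = np.exp(np.linspace(np.log(1.2), np.log(2.62), 100001))
v_flat = valley_depth(log_uniform)
```

The three intervals have nearly equal widths in $\log R_P$, which is why a log-uniform population yields a value near unity.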
Furlan et al. (2017) compiled a catalog of KOI host
stars that were observed using a collection of high-
resolution imaging facilities (Lillo-Box et al. 2012, 2014;
Horch et al. 2012, 2014; Everett et al. 2015; Gilliland
et al. 2015; Cartier et al. 2015; Wang et al. 2015a,b;
Adams et al. 2012, 2013; Dressing et al. 2014; Law et al.
2014; Baranec et al. 2016; Howell et al. 2011). Many of
the 1902 KOIs in the Furlan et al. (2017) catalog also ap-
pear in our sample. We investigated removing KOI hosts
with known companions or large dilution corrections but
found no significant changes to the shape of the distribu-
tion. Since only a subset of our KOIs were observed by
Furlan et al. (2017) and it is difficult to determine the bi-
narity of the parent stellar population for occurrence cal-
culations, we chose not to filter our planet catalog using
the results of high-resolution imaging. However, many
of these stars may have already been identified as false
positives in the Paper I catalog and therefore removed
from our final sample of planets.
We investigated the impact of our apparent magnitude cut by examining the size
distribution for three ranges of $Kp$ (Figure 3). For these tests we applied all of the
filters described in this section except the $Kp < 14.2$ magnitude cut. We found that the
planet radius distribution for $Kp < 13.5$ is statistically indistinguishable from the
radius distribution for planets orbiting stars with $13.5 < Kp \le 14.2$. An
Anderson-Darling test (Anderson & Darling 1952; Scholz & Stephens 1987) is consistent with
the two distributions being drawn from the same parent population (p-value of 0.6).
However, the radius distribution of planets orbiting host stars with $Kp \ge 14.2$ is
visually and statistically different (p-value < 0.0004). This is somewhat expected given
the non-systematic target selection for both the initial Kepler target stars and the
stars observed in the CKS survey. Stars with $Kp > 14.2$ were only observed in the CKS
program because they were hosts to multi-planet systems, habitable-zone candidates,
ultra-short period planets, or other special cases. Targets fainter than $Kp = 14.0$ were
observed by Kepler only if their stellar and noise properties indicated that there was a
high probability of the detection of small planets (Batalha et al. 2010). These
non-uniform Kepler target selection effects motivate our choice to exclude faint stars.
The final distribution of planet radii does not depend on whether the cut is placed at
$Kp < 14.2$ or $Kp < 14.0$ (p-value > 0.95). However, there are 153 planet candidates with
$14.0 < Kp < 14.2$, so we chose to include those additional candidates to maximize the
statistical power of the final sample.
The two distinct peaks separated by a valley (Figure 2) are apparent in the initial number
distribution of planet radii and the final distribution after the filters are applied.
The depth of the valley increases as we apply these filters, suggesting that the purity
of the planet sample improves with filter application. Note that the filters act on the
stellar characteristics and are agnostic to planet radius.

[Figure 2. Seven stacked histograms of the number of planets versus planet size (Earth radii), each marked with the typical radius uncertainty. Panel annotations: (a) full CKS sample (2025); (b) false positives removed (1861); (c) $Kp < 14.2$ (1232); (d) $b < 0.7$ (1025); (e) $P < 100$ d (952); (f) giant stars removed (916); (g) 4700 K < $T_{\rm eff}$ < 6500 K (900).]

Fig. 2.— (a) Size distribution of all planet candidates in the CKS planet sample. Panels
(b)–(g) show the radius distribution after applying successive cuts: (b) remove known
false positives, (c) keep candidates orbiting bright stars ($Kp < 14.2$), (d) retain
candidates with low impact parameters ($b < 0.7$), (e) keep candidates with orbital periods
shorter than 100 days, (f) remove candidates orbiting giant host stars, and (g) include
only candidates orbiting stars within our adopted $T_{\rm eff}$ range
(4700 K < $T_{\rm eff}$ < 6500 K). The number of planets remaining after applying each
successive filter is annotated in the upper right portion of each panel. Our filters
produce a reliable sample of accurate planet radii and accentuate the deficit of planets
at 1.8 $R_\oplus$.

[Figure 3. Three histograms of the number of planets versus planet size (Earth radii), one panel per magnitude range, each marked with the typical radius uncertainty.]

Fig. 3.— Histograms of planet radii broken up into the three magnitude ranges annotated
in each panel. All of the filters have been applied to the sample as described in §2.2.
The gap is apparent in all magnitude ranges. The distributions of planet radii in the two
brightest magnitude ranges are indistinguishable (p-value = 0.6). However, the planets
orbiting stars with $Kp > 14.2$ are statistically different (p-value = 0.0004) when
compared to the $Kp$ = 13.5–14.2 magnitude range. This is expected due to the
non-systematic nature of the target selection for CKS and KIC stars fainter than
$Kp = 14.2$. This motivates our removal of planets with hosts fainter than $Kp = 14.2$.
Figure 4 shows histograms of the stellar radii and planet-to-star radius ratios
($R_P/R_\star$) for the filtered sample of stars. These two distributions are both
unimodal. This demonstrates that the bimodality of the planet radius distribution is not
an artifact of the stellar sample or the light curve fitting used to measure
$R_P/R_\star$.
3. COMPLETENESS CORRECTIONS
To recover the underlying planet radius distribution
from the observed distribution we made completeness
corrections to compensate for decreasing detectability of
planets with small radii and/or long orbital periods.
An additional complication associated with the com-
pleteness corrections in this work is that the stellar prop-
erties of the planet-hosting stars come from a different
source and have higher precision than the stellar prop-
erties for the full set of
Kepler
target stars. We explore
the additional uncertainties introduced by this fact by
running a suite of simulated transit surveys described
in Appendix C. We inflate the uncertainties on the histogram bin heights by the scaling
factors listed in Table C.1 to account for these effects.

[Figure 4. Top panel: number of planets versus stellar radius (solar radii). Bottom panel: number of planets versus planet-star radius ratio (%). The typical uncertainty is marked in each panel.]

Fig. 4.— Top: Histogram of stellar radii derived in Paper II and used to update planet
radii in this work after the filters described in Section 2.2 are applied. Bottom:
Histogram of planet-to-star radius ratios for the stars remaining after the filters
described in Section 2.2 are applied to the full Paper II sample of planet candidates. In
both cases, the median measurement uncertainties are plotted in the upper right. Neither
of these two histograms shows the same bimodal feature that is observed in the planet
radius distribution, which demonstrates that the feature is not an artifact of our
stellar sample or transit fitting.
3.1. Pipeline Efficiency
We followed the procedure described in Christiansen
et al. (2016) using the results from their injection-
recovery experiments (Christiansen et al. 2015). They
injected about ten-thousand transit signals into the raw
pixel data and processed the results with version 9.1 of
the official
Kepler
pipeline (Jenkins et al. 2010). These
completeness tests were used to identify combinations of
transit light curve parameters that could be recovered
by the
Kepler
pipeline for a given sample of target stars.
They injected signals onto both target stars and neigh-
boring pixels to quantify the pipeline’s ability to identify
astrophysical false positives. We assumed that our sam-
ple is free of the vast majority of false positives so we only
considered injections of transits onto the target stars. We
only considered injections on stars that would have been
included in the CKS sample and would not be removed
by the filters described in §2.2. Namely, we considered injected impact parameters less
than 0.7, injected periods shorter than 100 days, $Kp \le 14.2$,
4700 K < $T_{\rm eff}$ < 6500 K, and stellar radii compatible with Equation 1 based on the
values in the Stellar17 catalog^{18} prepared by the Kepler stellar parameters working
group (Mathur et al. 2016). This leaves a total of 3840 synthetic transit signals
injected onto the target pixels of 3840 stars observed by Kepler. We also apply these
same filters to the stars in the Stellar17 catalog. The number of stars remaining after
the filters are applied is the number of stars observed by Kepler that could have led to
detections of planets that would be present in our filtered planet catalog
($N_\star = 36{,}075$). We calculated the fraction of injected signals recovered as a
function of the injected signal-to-noise ratio, defined as

$$m_i = \left(\frac{R_P}{R_{\star,i}}\right)^2 \sqrt{\frac{T_{{\rm obs},i}}{P}} \left(\frac{1}{{\rm CDPP}_{{\rm dur},i}}\right), \qquad (2)$$

where $R_P$ and $P$ are the radius and period of the particular injected planet,
$R_{\star,i}$ is the stellar radius for the $i$th star in the Stellar17 catalog,
$T_{{\rm obs},i}$ is the amount of time that the particular star was observed, and
${\rm CDPP}_{{\rm dur},i}$ is the Combined Differential Photometric Precision
(CDPP, Koch et al. 2010) value for each star extrapolated to the transit duration for each
injection. We fit a 2nd-order polynomial in $1/\sqrt{d}$ to the $d$ = 3, 6, and 12-hour
CDPP values for each star to perform the extrapolation (Sinukoff et al. 2013).
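
The CDPP extrapolation and Equation 2 can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, `numpy.polyfit` stands in for whatever fitting routine was actually used, and the CDPP is assumed to be expressed in the same fractional units as the transit depth (Kepler CDPP values are usually tabulated in ppm and would need converting first).

```python
import numpy as np

def cdpp_at_duration(cdpp_3h, cdpp_6h, cdpp_12h, duration_hr):
    """Extrapolate CDPP to a transit duration with a quadratic in 1/sqrt(d)."""
    x = 1.0 / np.sqrt(np.array([3.0, 6.0, 12.0]))
    y = np.array([cdpp_3h, cdpp_6h, cdpp_12h], dtype=float)
    # Three points and three coefficients: the quadratic passes through all of them
    coeffs = np.polyfit(x, y, 2)
    return float(np.polyval(coeffs, 1.0 / np.sqrt(duration_hr)))

def injected_snr(rp_over_rstar, t_obs_days, period_days, cdpp_dur):
    """Equation 2: m_i = (R_P/R_star)^2 * sqrt(T_obs/P) / CDPP_dur."""
    return rp_over_rstar ** 2 * np.sqrt(t_obs_days / period_days) / cdpp_dur

# Example with made-up numbers: 1% radius ratio, 1400 d baseline, 14 d period
cdpp = cdpp_at_duration(100e-6, 70e-6, 50e-6, duration_hr=4.0)
m = injected_snr(0.01, 1400.0, 14.0, cdpp)
```

Here $T_{\rm obs}$ and $P$ only need to share the same time unit, since they enter as a ratio.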
We fit a Γ cumulative distribution function (CDF) of the form

$$C(m_i; k, \theta, l) = \frac{1}{\Gamma(k)} \int_0^{(m_i - l)/\theta} t^{k-1} e^{-t}\,dt, \qquad (3)$$

to the recovery fraction as a function of injected signal-to-noise ratio ($m_i$) to derive
the average pipeline efficiency. $C(m_i)$ is the probability that a signal with a given
value of $m_i$ would actually be detected by the Kepler transit search pipeline. In
practice we used the scipy.stats.gamma.cdf function in SciPy version 0.18.1, with shape
$k$, location $l$, and scale $\theta$. Using the lmfit Python package (Newville et al.
2014) to minimize the residuals we found best-fit values of $k = 17.56$, $l = 1.00$
(fixed), and $\theta = 0.49$.
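
With the best-fit parameters quoted above, the efficiency curve of Equation 3 can be evaluated directly with SciPy. The wrapper name `pipeline_efficiency` is illustrative; only `scipy.stats.gamma.cdf` and the three parameter values come from the text.

```python
import numpy as np
from scipy.stats import gamma

# Best-fit parameters quoted in the text
K_SHAPE, L_LOC, THETA_SCALE = 17.56, 1.00, 0.49

def pipeline_efficiency(m, k=K_SHAPE, loc=L_LOC, scale=THETA_SCALE):
    """Probability that a signal with SNR m is recovered (Equation 3)."""
    return gamma.cdf(np.asarray(m, dtype=float), k, loc=loc, scale=scale)

# Efficiency rises from 0 at m = l to ~1 at high SNR
snr_grid = np.array([1.0, 5.0, 9.6, 15.0, 50.0])
eff = pipeline_efficiency(snr_grid)
```

The mean of this distribution, $k\theta + l \approx 9.6$, marks roughly where the recovery fraction crosses 50%.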
Figure 5 shows the fraction of injections recovered as a function of $m_i$ and our model
for pipeline efficiency. Our pipeline efficiency curve is ∼15–25% lower than the
efficiency as a function of the Kepler multi-event statistic (MES) derived by
Christiansen et al. (2015) for their FGK subsample. The difference can be explained by
the fact that the MES is estimated in the Kepler pipeline during a multidimensional grid
search. In most cases, the search grid is not fine enough to find the exact period and
transit time for a given planet candidate. Since the grid search does not find the
best-fit transit model, it generally underestimates the SNR ($m_i$) by ∼25%
(Petigura et al., in preparation).
3.2. Survey Sensitivity

For each planet detection there are a number of similar planets that would not have been
detected due to a lack of sensitivity or unfavorable geometric transit probability. To
compensate, we weighted each planet detection by the inverse of these probabilities,

$$w_i = \frac{1}{p_{\rm det} \cdot p_{\rm tr}}, \qquad (4)$$

where $p_{\rm det}$ is the fraction of stars in our sample where a transiting planet with
a given signal-to-noise ratio (Equation 2) could be detected:

$$p_{\rm det} = \frac{1}{N_\star} \sum_{i}^{N_\star} C(m_i). \qquad (5)$$

The geometric transit probability is $p_{\rm tr} = 0.7\,R_\star/a$. The factor of 0.7
compensates for our omission of planet detections with $b > 0.7$ from the planet catalog.
Figure 6 shows the mean pipeline completeness ($p_{\rm det}$) and mean total search
completeness ($1/w_i$) as a function of planet radius and orbital period for the filtered
Stellar17 sample of Kepler target stars. The detection probabilities, transit
probabilities, and weights ($w_i$) for each planet in our final catalog are listed in
Table 2.

^{18} https://archive.stsci.edu/kepler/stellar17/search.php

TABLE 2
Planet Detection Statistics

Planet       P       R_P      SNR       Detection     Transit       Weight
candidate    (d)     (R⊕)     (m_i)     probability   probability   (w_i)
                                        (p_det)       (p_tr)
K00002.01    2.20    13.41     750.22   1.00          0.14           6.94
K00003.01    4.89     5.11     877.10   1.00          0.05          20.14
K00007.01    3.21     4.13     146.38   1.00          0.11           8.88
K00010.01    3.52    13.39     914.62   1.00          0.09          11.06
K00017.01    3.23    15.04    1212.38   1.00          0.11           9.40
K00018.01    3.55    13.94     820.96   1.00          0.10           9.58
K00020.01    4.44    21.41    1469.42   1.00          0.10          10.15
K00022.01    7.89    14.20    1085.97   1.00          0.06          17.98
K00041.01   12.82     2.37      37.15   0.98          0.05          22.37
K00041.02    6.89     1.35      15.04   0.91          0.07          15.98

Note. — Table 2 is available in its entirety in machine-readable format, which also
includes period and radius uncertainties. A portion is shown here for guidance regarding
its form and content. Refer to Paper II for the CKS stellar parameters associated with
each KOI. This table contains only the subset of planet detections that passed the
filters described in §2.2. The full sample of planet candidates orbiting CKS target stars
can be found in Paper II.

[Figure 5. Fraction of injections recovered (%) versus signal-to-noise ratio ($m_i$), with the best-fit Γ CDF overplotted.]

Fig. 5.— Fraction of injected transit signals recovered as a function of signal-to-noise
ratio ($m_i$, Equation 2) in our subsample of the Kepler target stars using the injection
recovery tests from Christiansen et al. (2015). We fit a Γ CDF (Equation 3) and plot the
best-fit model in green.
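
Equations 4 and 5 combine into a per-planet weight as sketched below. The function names are hypothetical, the input `m_per_star` is assumed to hold the signal-to-noise ratio the planet would produce around each star of the parent sample, and the scaled semi-major axis $a/R_\star$ is taken as a given input rather than derived.

```python
import numpy as np
from scipy.stats import gamma

def detection_probability(m_per_star, k=17.56, loc=1.00, scale=0.49):
    """Equation 5: pipeline efficiency averaged over all stars in the sample."""
    c = gamma.cdf(np.asarray(m_per_star, dtype=float), k, loc=loc, scale=scale)
    return float(c.mean())

def transit_probability(a_over_rstar, b_max=0.7):
    """Geometric transit probability restricted to b < b_max: 0.7 R_star / a."""
    return b_max / a_over_rstar

def planet_weight(m_per_star, a_over_rstar):
    """Equation 4: w = 1 / (p_det * p_tr)."""
    return 1.0 / (detection_probability(m_per_star) * transit_probability(a_over_rstar))

# Example: a high-SNR planet at a/R_star = 7 has p_det ~ 1 and p_tr = 0.1
w = planet_weight([100.0, 200.0, 300.0], a_over_rstar=7.0)
```

For large planets on short orbits the weight is set almost entirely by the transit probability, which matches the pattern in the first rows of Table 2.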
3.3. Occurrence Calculation

[Figure 6. Two maps of planet radius (Earth radii) versus orbital period (days). Top panel color scale: pipeline completeness. Bottom panel color scale: total detectability ($1/w_i$).]

Fig. 6.— Top: Mean survey completeness for transiting planets orbiting the stars in our
sample ($p_{\rm det}$). Bottom: Mean survey completeness for all planets orbiting stars in
our sample ($p_{\rm det} \cdot p_{\rm tr}$).
Following the definitions in Petigura et al. (2013a), the average planet occurrence rate
(number of planets per star) for any discrete bin in planet radius or orbital period is
the sum of these weights divided by the total number of stars in the sample ($N_\star$):

$$f_{\rm bin} = \frac{1}{N_\star} \sum_{i=1}^{n_{\rm pl,bin}} w_i. \qquad (6)$$
Again, $N_\star = 36{,}075$ is the total number of dwarf stars in the Stellar17 catalog
that pass the same filters on stellar parameters that were applied to the planet catalog:
no giant stars (selected using Equation 1), 4700 K < $T_{\rm eff}$ < 6500 K, and
$Kp \le 14.2$.

TABLE 3
Planet Occurrence

Radius bin      Number of planets per star
(R⊕)            (f_bin for P < 100 d)
1.16–1.29       0.078 ± 0.017
1.29–1.43       0.08 ± 0.013
1.43–1.59       0.053 ± 0.011
1.59–1.77       0.0334 ± 0.0092
1.77–1.97       0.05 ± 0.01
1.97–2.19       0.086 ± 0.016
2.19–2.43       0.098 ± 0.016
2.43–2.70       0.077 ± 0.016
2.70–3.00       0.053 ± 0.012
3.00–3.33       0.0316 ± 0.0089
3.33–3.70       0.0242 ± 0.0066
3.70–4.12       0.0094 ± 0.0057
4.12–4.57       0.0056 ± 0.0034
4.57–5.08       0.0037 ± 0.0031
5.08–5.65       0.0066 ± 0.0048
5.65–6.27       0.005 ± 0.003
6.27–6.97       0.0 ± inf
6.97–7.75       0.0019 ± 0.0029
7.75–8.61       0.0044 ± 0.0034
8.61–9.56       0.00022 ± 0.00032
9.56–10.63      0.001 ± 0.0015
10.63–11.81     0.00035 ± 0.00053
11.81–13.12     0.00104 ± 0.00094
13.12–14.58     0.0038 ± 0.0021
14.58–16.20     0.00084 ± 0.00066
16.20–18.00     0.0003 ± 0.0004
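
Equation 6 amounts to a weighted histogram divided by the number of stars. The sketch below is illustrative: `occurrence_histogram` is a hypothetical name, and the error bar is a simplified Poisson estimate (fractional error $1/\sqrt{n}$ applied to the corrected bin height) that assumes no additional scaling from simulated surveys.

```python
import numpy as np

N_STAR = 36075  # number of stars passing the stellar filters, from the text

def occurrence_histogram(radii, weights, bin_edges, n_star=N_STAR):
    """Planets per star in each radius bin (Equation 6) with a Poisson error."""
    radii = np.asarray(radii, dtype=float)
    weights = np.asarray(weights, dtype=float)
    w_sum, _ = np.histogram(radii, bins=bin_edges, weights=weights)
    counts, _ = np.histogram(radii, bins=bin_edges)
    f_bin = w_sum / n_star
    # Simplified uncertainty: f_bin / sqrt(n); empty bins get an infinite error
    with np.errstate(divide="ignore", invalid="ignore"):
        f_err = np.where(counts > 0, f_bin / np.sqrt(counts), np.inf)
    return f_bin, f_err

# Toy example with three planets and made-up weights
f, f_err = occurrence_histogram(
    radii=[1.20, 1.25, 1.50],
    weights=[10.0, 20.0, 30.0],
    bin_edges=[1.16, 1.29, 1.43, 1.59],
    n_star=1000,
)
```

Logarithmically spaced edges such as those listed in Table 3 can be passed as `bin_edges` unchanged.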
4. THE PLANET RADIUS GAP

Figure 7 shows the completeness-corrected distribution of planet radii for the filtered
sample of 900 planets, and the corresponding occurrence values are tabulated in Table 3.
Uncertainties on the bin heights are calculated using Poisson statistics on the number of
detections within the bin, scaled by the size of the completeness correction in each bin,
and scaled again by a correction factor determined from a collection of simulated transit
surveys as described in Appendix C. The completeness corrections are generally small. We
are sensitive to >80% of 2.0 $R_\oplus$ planets out to orbital periods of 100 days, and
>50% of 1.0 $R_\oplus$ planets out to 30 days (Figure 6). The transit probability term in
Equation 4 dominates the corrections in most of the parameter space explored. Somewhat
surprisingly, the larger sub-Neptunes receive a completeness boost that is larger than
the boost received by the smaller super-Earths (compare the dotted grey line in Figure 7
to the solid black line) because the sub-Neptunes tend to orbit at larger orbital
distances where transit probabilities are smaller. The mean transit probability
($p_{\rm tr}$) for planets with radii of 1.0–1.75 $R_\oplus$ in our sample is 6%, while
the mean transit probability for planets with radii of 1.75–3.5 $R_\oplus$ is a factor of
two lower (3%). However, the mean detectabilities ($p_{\rm det}$) for those same two
classes of planets are both very high, at 86% and 96% respectively.
4.1. Comparison with Log-Uniform Distribution

TABLE 4
Spline Fit

Node Location    Best-fit Value    1σ Credible Interval
(R⊕)             (f_bin)           (f_bin)
1.3              0.078             fixed
1.5              0.051             0.05 ± 0.02
1.9              0.030             0.03 ± 0.02
2.4              0.116             0.11 ± 0.01
3.0              0.043             0.044 ± 0.005
4.5              0.0050            0.005 ± 0.002
11.0             0.00050           0.0005 ± 0.0003
We performed several tests to quantify the significance of the gap in the planet radius
distribution. First, we performed a two-sided Kolmogorov-Smirnov (K-S, Kolmogorov 1933;
Smirnov 1948) test to assess the probability that the planet radius number distribution
for radii in the range 1–3 $R_\oplus$ is drawn from a log-uniform distribution. This test
returns a probability of 0.003 that the planet radii between 1–3 $R_\oplus$ are drawn
from a log-uniform distribution. However, we note that blind interpretation of p-values
from K-S tests can often lead to overestimates of significance (Babu & Feigelson 2006).
Similarly, an Anderson-Darling test also rejects the hypothesis that the planet radii
between 1–3 $R_\oplus$ were drawn from a log-uniform distribution with a p-value of
0.012.
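
A log-uniform distribution in $R_P$ is a uniform distribution in $\ln R_P$, so the K-S comparison can be sketched with `scipy.stats.kstest` against the built-in uniform distribution. The helper name `ks_loguniform` and the synthetic samples below are illustrative assumptions; they are not the CKS data.

```python
import numpy as np
from scipy.stats import kstest

def ks_loguniform(radii, r_min=1.0, r_max=3.0):
    """Two-sided K-S test of radii in [r_min, r_max] against a log-uniform law."""
    radii = np.asarray(radii, dtype=float)
    sel = radii[(radii >= r_min) & (radii <= r_max)]
    # Uniform in ln(R) on [ln r_min, ln r_max]: loc = ln r_min, scale = ln(r_max/r_min)
    loc = np.log(r_min)
    scale = np.log(r_max) - np.log(r_min)
    return kstest(np.log(sel), "uniform", args=(loc, scale))

# Synthetic check: an evenly spaced log-uniform sample versus a two-cluster sample
flat = np.exp(np.linspace(0.0, np.log(3.0), 500))
bimodal = np.concatenate([np.full(250, 1.3), np.full(250, 2.4)])
p_flat = ks_loguniform(flat).pvalue
p_bimodal = ks_loguniform(bimodal).pvalue
```

The flat sample returns a p-value near one, while the two-cluster sample is rejected decisively.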
4.2. Dip Test of Multimodality

Hartigan’s dip test is a statistical tool used to estimate the probability that a sample
was drawn from a unimodal distribution or a multi-modal distribution with ≥2 modes
(Hartigan & Hartigan 1985). It is similar to the K-S statistic in that it measures the
maximum distance between an empirical distribution and a unimodal distribution. Applying
this test to the number distribution of $\log R_P$ for planet radii in the range
1–3 $R_\oplus$ returns a p-value of $1.4 \times 10^{-3}$ that the distribution was drawn
from a unimodal distribution. This strongly suggests that the planet radius distribution
is multi-modal.
4.3. Spline Model

Modeling the planet radius distribution with splines having nodes at fixed values gives a
good fit for a range of planet sizes. Virtues of this model are the small number of free
parameters and model flexibility, particularly in asymptotic regions where other models
(e.g. Gaussians) force the distribution to zero. We fit a second-order spline with seven
node points fixed at specific radii to the weighted histogram of planet occurrence. We
excluded from the fit bins for radii smaller than 1.14 $R_\oplus$ where the pipeline
completeness at $P = 100$ days is less than 25%. The model was adjusted by varying the
amplitudes of the spline nodes, then convolving with a Gaussian kernel whose width is the
median fractional planet radius uncertainty (12%). The convolved model is averaged over
each of the histogram bins before performing the $\chi^2$ comparison. This allows us to
separate the smearing of the observed distribution due to measurement uncertainties from
a “deconvolved” view of the underlying distribution. Again we found the best-fit solution
using the lmfit package to minimize the normalized residuals of the histogram bins
relative to the convolved model.
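
The forward model described above (spline through fixed nodes, Gaussian smoothing, bin averaging) can be sketched as follows. Several choices here are assumptions made for illustration: the spline is built in $\ln R_P$ with `scipy.interpolate.make_interp_spline`, the 12% fractional uncertainty is approximated by a Gaussian of width 0.12 in $\ln R_P$, and the function name is hypothetical. The fitting step with lmfit is omitted.

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d

def binned_spline_model(node_radii, node_values, bin_edges,
                        frac_sigma=0.12, n_grid=2000):
    """Quadratic spline through the nodes, smoothed and averaged over bins."""
    ln_nodes = np.log(np.asarray(node_radii, dtype=float))
    spline = make_interp_spline(ln_nodes, np.asarray(node_values, dtype=float), k=2)
    # Uniform grid in ln(R) so a constant fractional error is a constant kernel width
    ln_grid = np.linspace(ln_nodes[0], ln_nodes[-1], n_grid)
    raw = spline(ln_grid)
    sigma_pix = frac_sigma / (ln_grid[1] - ln_grid[0])
    smooth = gaussian_filter1d(raw, sigma_pix, mode="nearest")
    grid_r = np.exp(ln_grid)
    edges = np.asarray(bin_edges, dtype=float)
    # Average the smoothed model over the grid points that fall in each bin
    return np.array([smooth[(grid_r >= lo) & (grid_r < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Node positions and best-fit values from Table 4; bin edges from Table 3
nodes = [1.3, 1.5, 1.9, 2.4, 3.0, 4.5, 11.0]
values = [0.078, 0.051, 0.030, 0.116, 0.043, 0.0050, 0.00050]
model = binned_spline_model(nodes, values, [1.43, 1.59, 1.77, 1.97, 2.19, 2.43])
```

In a fit, the node values would be the free parameters and `model` would be compared with the observed bin heights; bins must lie inside the node range for the averaging step to be defined.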
[Figure 7. Two panels showing the number of planets per star (orbital period < 100 days) versus planet size (Earth radii), each marked with the typical radius uncertainty.]

Fig. 7.— Top: Completeness-corrected histogram of planet radii for planets with orbital
periods shorter than 100 days. Uncertainties in the bin amplitudes are calculated using
the suite of simulated surveys described in Appendix C. The light gray region of the
histogram for radii smaller than 1.14 $R_\oplus$ suffers from low completeness. The
histogram plotted in the dotted grey line is the same distribution of planet radii
uncorrected for completeness. The median radius uncertainty is plotted in the upper right
portion of the plot. Bottom: Same as top panel with the best-fit spline model
over-plotted in the solid dark red line. The region of the histogram plotted in light
grey is not included in the fit due to low completeness. Lightly shaded regions encompass
our definitions of “super-Earths” (light red) and “sub-Neptunes” (light cyan). The dashed
cyan line is a plausible model for the underlying occurrence distribution after removing
the smearing caused by uncertainties on the planet radii measurements. The cyan circles
on the dashed cyan line mark the node positions and values from the spline fit described
in §4.3.