Tests of general relativity with the binary black hole signals
from the LIGO-Virgo catalog GWTC-1
B. P. Abbott et al.*
(The LIGO Scientific Collaboration and the Virgo Collaboration)
(Received 29 March 2019; published 20 November 2019)
The detection of gravitational waves by Advanced LIGO and Advanced Virgo provides an opportunity
to test general relativity in a regime that is inaccessible to traditional astronomical observations and
laboratory tests. We present four tests of the consistency of the data with binary black hole gravitational
waveforms predicted by general relativity. One test subtracts the best-fit waveform from the data and
checks the consistency of the residual with detector noise. The second test checks the consistency of the
low- and high-frequency parts of the observed signals. The third test checks that phenomenological
deviations introduced in the waveform model (including in the post-Newtonian coefficients) are
consistent with 0. The fourth test constrains modifications to the propagation of gravitational waves
due to a modified dispersion relation, including that from a massive graviton. We present results both for
individual events and also results obtained by combining together particularly strong events from the first
and second observing runs of Advanced LIGO and Advanced Virgo, as collected in the catalog GWTC-1.
We do not find any inconsistency of the data with the predictions of general relativity and improve our
previously presented combined constraints by factors of 1.1 to 2.5. In particular, we bound the mass of
the graviton to be $m_g \leq 4.7 \times 10^{-23}\,\mathrm{eV}/c^2$ (90% credible level), an improvement of a factor of 1.6 over
our previously presented results. Additionally, we check that the four gravitational-wave events published
for the first time in GWTC-1 do not lead to stronger constraints on alternative polarizations than those
published previously.
DOI: 10.1103/PhysRevD.100.104036
I. INTRODUCTION
Einstein's theory of gravity, general relativity (GR), has withstood a large number of
experimental tests [1]. With the advent of gravitational-wave (GW) astronomy and the
observations by the Advanced LIGO [2] and Advanced Virgo [3] detectors, a range of new tests
of GR have become possible. These include both weak-field tests of the propagation of GWs, as
well as tests of the strong-field regime of compact binary sources. See [4–8] for previous
applications of such tests to GW data.
We report results from tests of GR on all the confident binary black hole GW events in the
catalog GWTC-1 [9], i.e., those from the first and second observing runs of the advanced
generation of detectors. Besides all of the events previously announced (GW150914, GW151012,
GW151226, GW170104, GW170608, and GW170814) [5–7,10–13], this includes the four new GW events
reported in [14] (GW170729, GW170809, GW170818, and GW170823). We do not investigate any of
the marginal triggers in GWTC-1, which have a false-alarm rate (FAR) greater than one per
year. Table I displays a complete list of the events we consider. Tests of GR on the binary
neutron star event GW170817 are described in [8].
The search results in [14] originate from two modeled searches and one weakly modeled search
[5,11,14,15]. The modeled searches use templates based on GR to find candidate events and to
assess their significance. However, detection by such searches does not in itself imply full
compatibility of the signal with GR [16,17]. The weakly modeled search relies on coherence of
signals between multiple detectors, as expected for an astrophysical source. While it assumes
that the morphology of the signal resembles a chirp (whose frequency increases with time), as
expected for a compact binary coalescence, it does not assume that the detailed waveform shape
agrees with GR. A transient signal strongly deviating from GR would likely be found by the
weakly modeled search, even if missed by the modeled searches. So far, however, all
significant [FAR $< (1\,\mathrm{yr})^{-1}$] transient signals found by the weakly modeled
search were also found by at least one of the modeled searches [14].
At present, there are no complete theories of gravity other than GR that are mathematically
and physically viable and provide well-defined alternative predictions for the waveforms
arising from the coalescence of two black holes (if, indeed, these theories even admit black
holes).¹ Thus, we cannot test GR by direct comparison with other specific theories. Instead,
we can (i) check the consistency of the GR predictions with the data and (ii) introduce ad hoc
modifications in GR waveforms to determine the degree to which the values of the deviation
parameters agree with GR.
*Full author list given at the end of the article.
Physical Review D 100, 104036 (2019); 2470-0010/2019/100(10)/104036(30); © 2019 American Physical Society
These methods are agnostic to any particular choice of
alternative theory. For the most part, our results should
therefore be interpreted as observational constraints on
possible GW phenomenologies, independent of the overall
suitability or well-posedness of any specific alternative to GR.
These limits are useful in providing a quantitative indication
of the degree to which the data are described by GR; they may
also be interpreted more specifically in the context of any
given alternative to produce constraints, if applicable.
In particular, with regard to the consistency of the GR predictions (i), we (a) look for
residual power after subtracting the best-fitting GR waveform from the data, and (b) evaluate
the consistency of the high- and low-frequency components of the observed signal. With regard
to deviations from GR (ii), we separately introduce parametrizations for (a) the emitted
waveform, and (b) its propagation. The former could be viewed as representing possible GR
modifications in the strong-field region close to the binary, while the latter would
correspond to weak-field modifications away from the source. Although we consider these
independently, modifications to GW propagation would most likely be accompanied by
modifications to GW generation in any given extension of GR. We have also checked that none of
the events discussed here provide stronger constraints on models with purely vector and purely
scalar GW polarizations than those previously published in [7,8]. Our analyses do not reveal
any inconsistency of the data with the predictions of GR. These results supersede all our
previous results from tests of GR on the binary black hole signals found in O1 and O2 [4–7].
In particular, the previously published residuals and propagation test results were affected
by a slight normalization issue.
Limits on deviations from GR for individual events are dominated by statistical errors due to
detector noise. These errors can be reduced by appropriately combining results from multiple
events. Sources of systematic errors, on the other hand, include uncertainties in the detector
calibration and power spectral density (PSD) estimation and errors in the modeling of
waveforms in GR. Detector calibration uncertainties are modeled as corrections to the measured
detector response function and are marginalized over. Studies on the effect of PSD
uncertainties are currently ongoing. A full characterization of the systematic errors due to
the GR waveform models that we employ is beyond the scope of this study; some investigations
can be found in [21–25].
This paper is organized as follows. Section II provides an overview of the data sets employed
here, while Sec. III details which GW events are used to produce the individual and combined
results presented in this paper. In Sec. IV we explain the gravitational waveforms and data
analysis formalisms on which our tests of GR are based, before we present the results in the
following sections. Section V contains two signal consistency tests: the residuals test in
Sec. V A and the inspiral-merger-ringdown consistency test in Sec. V B. Results from
parametrized tests are given in Sec. VI for GW generation, and in Sec. VII for GW propagation.
We briefly discuss the study of GW polarizations in Sec. VIII. Finally, we conclude in
Sec. IX. We give results for individual events and some checks on waveform systematics in the
Appendix.
The results of each test and associated data products can be found in Ref. [26]. The GW strain
data for all the events considered are available at the Gravitational Wave Open Science
Center [27].
II. DATA, CALIBRATION, AND CLEANING
The first observing run of Advanced LIGO (O1) lasted
from September 12th, 2015 to January 19th, 2016. The
second observing run (O2) lasted from November 30th, 2016
to August 25th, 2017, with the Advanced Virgo observatory
joining on August 1st, 2017. This paper includes all GW
events originating from the coalescence of two black holes
found in these two data sets and published in [5,14].
The GW detector's response to changes in the differential arm length (the interferometer's
degree of freedom most sensitive to GWs) must be calibrated using independent, accurate,
absolute references. The LIGO detectors use photon recoil (radiation pressure) from auxiliary
laser systems to induce mirror motions that change the arm cavity lengths, allowing a direct
measure of the detector response [28–30]. Calibration of Virgo relies on measurements of
Michelson interference fringes as the main optics swing freely, using the primary laser
wavelength as a fiducial length. Subsequent measurements propagate the calibration to arrive
at the final detector response [31,32].
These complex-valued, frequency-dependent measurements of the LIGO and Virgo detectors'
response yield the uncertainty in their respective estimated amplitude and phase of the GW
strain output. The amplitude and phase correction factors are modeled as cubic splines and
marginalized over in the estimation of astrophysical source parameters [14,33–35].
Additionally, the uncertainty in the time stamping of Virgo data (much larger than the LIGO
timing uncertainty, which is included in the phase correction factor) is also accounted for in
the analysis.
Postprocessing techniques to subtract noise contributions and frequency lines from the data
around gravitational-wave events were developed in O2 and introduced in [7,13,36], for the
astrophysical parameter estimation of GW170608, GW170814, and GW170817. This noise subtraction
was achieved using optimal Wiener filters to calculate coupling transfer functions from
auxiliary sensors [37]. A new, optimized parallelizable method in the frequency domain [38]
allows large scale noise subtraction on LIGO data. All of the O2 analyses presented in this
manuscript use the noise-subtracted data set with the latest calibration available. The O1
data set is the same used in previous publications, as the effect of noise subtraction is
expected to be negligible. Reanalysis of the O1 events is motivated by improvements in the
parameter estimation pipeline, an improved frequency-dependent calibration, and the
availability of new waveform models.
¹ There are very preliminary simulations of scalar waveforms from binary black holes in the
effective field theory (EFT) framework in alternative theories [18,19], and the leading
corrections to the gravitational waveforms in head-on collisions [20], but these simulations
require much more development before their results can be used in gravitational-wave data
analysis. There are also concerns about the mathematical viability of the theories considered
when they are not treated in the EFT framework.
III. EVENTS AND SIGNIFICANCE
We present results for all confident detections of binary black hole events in GWTC-1 [9],
i.e., all such events detected during O1 and O2 with a FAR lower than one per year, as
published in [14]. The central columns of Table I list the FARs of each event as evaluated by
the three search pipelines used in [14]. Two of these pipelines (PyCBC and GstLAL) rely on
waveform templates computed from binary black hole coalescences in GR. Making use of a measure
of significance that assumes the validity of GR could potentially lead to biases in the
selection of events to be tested, systematically disfavoring signals in which a GR violation
would be most evident. Therefore, it is important to consider the possibilities that (1) there
were GW signals with such large deviations from GR that they were missed entirely by the
modeled searches, and (2) there were events that were picked up by the modeled searches but
classified as marginal (and thus excluded from our analysis) because of their significant
deviations from GR.
These worries can largely be dispelled by considering the third GW search pipeline, the
coherent WaveBurst (cWB) weakly modeled search presented in [14]. This cWB search [15,39,40]
was tuned to detect chirping signals, like those that would be expected from compact binary
coalescences, but was not tuned to any specific GR predictions.² cWB is most sensitive to
short signals
TABLE I. The GW events considered in this paper, separated by observing run. The first block of columns gives the names of the events
and lists some of their relevant properties obtained using GR waveforms (luminosity distance $D_L$, source frame total mass
$M_{\rm tot}$ and final mass $M_f$, and dimensionless final spin $a_f$). The next block of columns gives the significance, measured
by the FAR, with which each event was detected by each of the three searches employed, as well as the matched filter signal-to-noise
ratio from the stochastic sampling analyses with GR waveforms. An ellipsis indicates that an event was not identified by a search.
The parameters and SNR values give the medians and 90% credible intervals. All the events except for GW151226 and GW170729 are
consistent with a binary of nonspinning black holes (when analyzed assuming GR). See [14] for more details about all the events. The
last block of columns indicates which GR tests are performed on a given event: RT = residuals test (Sec. V A); IMR =
inspiral-merger-ringdown consistency test (Sec. V B); PI and PPI = parametrized tests of GW generation for inspiral and postinspiral
phases (Sec. VI); MDR = modified GW dispersion relation (Sec. VII). The events with bold names are used to obtain the combined
results for each test.
| Event | $D_L$ [Mpc] | $M_{\rm tot}$ [$M_\odot$] | $M_f$ [$M_\odot$] | $a_f$ | PyCBC FAR [yr$^{-1}$] | GstLAL FAR [yr$^{-1}$] | cWB FAR [yr$^{-1}$] | SNR | RT | IMR | PI | PPI | MDR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **GW150914**^a | $440^{+150}_{-170}$ | $66.1^{+3.8}_{-3.3}$ | $63.1^{+3.4}_{-3.0}$ | $0.69^{+0.05}_{-0.04}$ | $<1.5\times10^{-5}$ | $<1.0\times10^{-7}$ | $<1.6\times10^{-4}$ | $25.3^{+0.1}_{-0.2}$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| GW151012^a | $1080^{+550}_{-490}$ | $37.2^{+10.6}_{-3.9}$ | $35.6^{+10.8}_{-3.8}$ | $0.67^{+0.13}_{-0.11}$ | 0.17 | $7.9\times10^{-3}$ | ... | $9.2^{+0.3}_{-0.4}$ | ✓ |  |  | ✓ | ✓ |
| **GW151226**^{a,b} | $450^{+180}_{-190}$ | $21.5^{+6.2}_{-1.5}$ | $20.5^{+6.4}_{-1.5}$ | $0.74^{+0.07}_{-0.05}$ | $<1.7\times10^{-5}$ | $<1.0\times10^{-7}$ | 0.02 | $12.4^{+0.2}_{-0.3}$ | ✓ |  | ✓ |  | ✓ |
| **GW170104** | $990^{+440}_{-430}$ | $51.0^{+5.3}_{-4.1}$ | $48.9^{+5.1}_{-4.0}$ | $0.66^{+0.08}_{-0.11}$ | $<1.4\times10^{-5}$ | $<1.0\times10^{-7}$ | $2.9\times10^{-4}$ | $14.0^{+0.2}_{-0.3}$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| **GW170608** | $320^{+120}_{-110}$ | $18.6^{+3.2}_{-0.7}$ | $17.8^{+3.4}_{-0.7}$ | $0.69^{+0.04}_{-0.04}$ | $<3.1\times10^{-4}$ | $<1.0\times10^{-7}$ | $1.4\times10^{-4}$ | $15.6^{+0.2}_{-0.3}$ | ✓ |  | ✓ | ✓ | ✓ |
| GW170729^c | $2840^{+1400}_{-1360}$ | $84.4^{+15.8}_{-11.1}$ | $79.5^{+14.7}_{-10.2}$ | $0.81^{+0.07}_{-0.13}$ | 1.4 | 0.18 | 0.02 | $10.8^{+0.4}_{-0.5}$ | ✓ | ✓ |  | ✓ | ✓ |
| **GW170809** | $1030^{+320}_{-390}$ | $59.0^{+5.4}_{-4.1}$ | $56.3^{+5.2}_{-3.8}$ | $0.70^{+0.08}_{-0.09}$ | $1.4\times10^{-4}$ | $<1.0\times10^{-7}$ | ... | $12.7^{+0.2}_{-0.3}$ | ✓ | ✓ |  | ✓ | ✓ |
| **GW170814** | $600^{+150}_{-220}$ | $55.9^{+3.4}_{-2.6}$ | $53.2^{+3.2}_{-2.4}$ | $0.72^{+0.07}_{-0.05}$ | $<1.2\times10^{-5}$ | $<1.0\times10^{-7}$ | $<2.1\times10^{-4}$ | $17.8^{+0.3}_{-0.3}$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| GW170818 | $1060^{+420}_{-380}$ | $62.2^{+5.2}_{-4.1}$ | $59.4^{+4.9}_{-3.8}$ | $0.67^{+0.07}_{-0.08}$ | ... | $4.2\times10^{-5}$ | ... | $11.9^{+0.3}_{-0.4}$ | ✓ | ✓ |  | ✓ | ✓ |
| **GW170823** | $1940^{+970}_{-900}$ | $68.7^{+10.8}_{-8.1}$ | $65.4^{+10.1}_{-7.4}$ | $0.72^{+0.09}_{-0.12}$ | $<3.3\times10^{-5}$ | $<1.0\times10^{-7}$ | $2.1\times10^{-3}$ | $12.0^{+0.2}_{-0.3}$ | ✓ | ✓ |  | ✓ | ✓ |
a The FARs for these events differ from those in [5] because the data were reanalyzed with the new pipeline statistics used in O2
(see [14] for more details).
b At least one black hole has dimensionless spin $> 0.28$ (99% credible level).
c This event has a higher significance in the unmodeled search than in the modeled searches. Additionally, at least one black hole has
dimensionless spin $> 0.27$ (99% credible level).
² Chirping signals from compact binary coalescences are a
feature of many theories of gravity. All that is required is that the
orbital frequency increases as the binary radiates energy and
angular momentum in GWs.
from high-mass binary black holes. It is still able to detect signals from lower mass binaries
(e.g., GW151226), though with reduced significance compared to the modeled searches. Thus, a
signal from a low-mass binary, or a marginal event, with a significant departure from the GR
predictions (hence not detected by the GR modeled searches) would not necessarily be detected
by the cWB search with a FAR $< (1\,\mathrm{yr})^{-1}$. However, if there is a population of
such signals, they will not all be weak and/or from low-mass binaries. Thus, one would expect
some of the signals in the population to be detected by cWB, even if they evade detection by
the modeled searches.
All signals detected by the cWB search with FAR $< (1\,\mathrm{yr})^{-1}$ were also found by at
least one modeled search with FAR $< (1\,\mathrm{yr})^{-1}$. Given the above considerations,
this is evidence that our analysis does not exclude chirping GW signals that were missed in
the modeled searches because of drastic departures from GR. Similarly, this is also evidence
against the possibility of marginal events representing a population of GR-deviating signals,
as none of them show high significance [FAR $< (1\,\mathrm{yr})^{-1}$] in the cWB search only.
Thus, we believe that we have not biased our analysis by considering only the ten events with
FAR $< (1\,\mathrm{yr})^{-1}$, as published in [14].
We consider each of the GW events individually, carrying out different analyses on a
case-by-case basis. Some of the tests presented here, such as the inspiral-merger-ringdown
(IMR) consistency test in Sec. V B and the parametrized tests in Sec. VI, distinguish between
the inspiral and the postinspiral regimes of the signal. The separation between these two
regimes is performed in the frequency domain, choosing a particular cutoff frequency
determined by the parameters of the event. Larger-mass systems merge at lower frequencies,
presenting a short inspiral signal in band; lower mass systems have longer observable inspiral
signals, but the detector's sensitivity decreases at higher frequencies and hence the
postinspiral signal becomes less informative. Therefore, depending on the total mass of the
system, a particular signal might not provide enough information within the sensitive
frequency band of the GW detectors for all analyses.
As a proxy for the amount of information that can be extracted from each part of the signal,
we calculate the signal-to-noise ratio (SNR) of the inspiral and the postinspiral parts of the
signals separately. We only apply inspiral (postinspiral) tests if the inspiral (postinspiral)
SNR is greater than 6. Each test uses a different inspiral-cutoff frequency, and hence they
assign different SNRs to the two regimes (details provided in the relevant section for each
test). In Table I we indicate which analyses have been performed on which event, based on this
frequency and the corresponding SNR.³
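The SNR split described above can be sketched numerically. The following is a minimal illustration, not the collaboration's pipeline: it assumes a frequency-domain waveform and a one-sided PSD sampled on a uniform grid, uses the standard optimal-SNR integral, and takes a hypothetical cutoff frequency `f_cut`. The toy waveform, the flat PSD, and the function names are all assumptions made for illustration.

```python
import numpy as np

def band_snr(freqs, h_tilde, psd, f_low, f_high):
    """Optimal SNR accumulated in [f_low, f_high): sqrt(4 * sum(|h|^2 / S_n) * df)."""
    df = freqs[1] - freqs[0]  # assumes a uniform frequency grid
    mask = (freqs >= f_low) & (freqs < f_high)
    return np.sqrt(4.0 * np.sum(np.abs(h_tilde[mask]) ** 2 / psd[mask]) * df)

def applicable_tests(snr_inspiral, snr_postinspiral, threshold=6.0):
    """Apply the SNR > 6 criterion separately to each regime."""
    return {
        "inspiral": bool(snr_inspiral > threshold),
        "postinspiral": bool(snr_postinspiral > threshold),
    }

# Toy inputs (hypothetical): inspiral-like amplitude and a flat PSD.
freqs = np.arange(20.0, 1024.0, 0.25)
h_tilde = 50.0 * freqs ** (-7.0 / 6.0)
psd = np.ones_like(freqs)
f_cut = 100.0  # hypothetical inspiral-cutoff frequency in Hz

snr_insp = band_snr(freqs, h_tilde, psd, 20.0, f_cut)
snr_post = band_snr(freqs, h_tilde, psd, f_cut, np.inf)
snr_total = band_snr(freqs, h_tilde, psd, 20.0, np.inf)
flags = applicable_tests(snr_insp, snr_post)
```

Because the two bands are disjoint, the squared SNRs add up to the squared full-band SNR, which is why a louder event can still fail the threshold in one of the two regimes.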
In addition to the individual analysis of each event, we
derive combined constraints on departures from GR using
multiple signals simultaneously. Constraints from individual
events are largely dominated by statistical uncertainties due
to detector noise. Combining events together can reduce
such statistical errors on parameters that take consistent
values across all events. However, it is impossible to make
joint probabilistic statements from multiple events without
prior assumptions about the nature of each observation and
how it relates to others in the set. This means that, although
there are well-defined statistical procedures for producing
joint results, there is no unique way of doing so.
In light of this, we adopt what we take to be the most straightforward strategy, although
future studies may follow different criteria. First, in combining events we assume that
deviations from GR are manifested equally across events, independent of source properties.
This is justified for studies of modified GW propagation, since those effects should not
depend on the source.⁴ For other analyses, it is quite a strong assumption to take all
deviations from GR to be independent of source properties. Such combined tests should not be
expected to necessarily reveal generic source-dependent deviations, although they might if the
departures from GR are large enough (see, e.g., [41]). Future work may circumvent this issue
by combining marginalized likelihood ratios (Bayes factors), instead of posterior probability
distributions [42]. More general ways of combining results are discussed and implemented in
Refs. [43,44].
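As a rough illustration of the first assumption, the sketch below combines single-event posteriors on one shared deviation parameter by multiplying them on a common grid. This is only a toy under stated assumptions: it presumes a flat prior on the shared parameter (so each posterior is proportional to the likelihood), uses made-up Gaussian posteriors, and the function names are hypothetical rather than taken from any analysis code.

```python
import numpy as np

def gaussian(grid, mu, sigma):
    """Normalized Gaussian density evaluated on the grid."""
    return np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def combine_posteriors(grid, densities):
    """Multiply per-event densities (flat-prior assumption) and renormalize."""
    log_joint = np.sum([np.log(d + 1e-300) for d in densities], axis=0)
    joint = np.exp(log_joint - log_joint.max())  # subtract max for numerical stability
    dx = grid[1] - grid[0]
    return joint / (joint.sum() * dx)

# Toy single-event posteriors on a shared deviation parameter (hypothetical values).
grid = np.linspace(-5.0, 5.0, 20001)
sigmas = [0.5, 0.8, 0.4]
events = [gaussian(grid, 0.1, sigmas[0]),
          gaussian(grid, -0.2, sigmas[1]),
          gaussian(grid, 0.0, sigmas[2])]

joint = combine_posteriors(grid, events)
dx = grid[1] - grid[0]
joint_mean = np.sum(grid * joint) * dx
joint_std = np.sqrt(np.sum((grid - joint_mean) ** 2 * joint) * dx)
```

For Gaussian inputs the combined width follows the usual inverse-variance rule, so the joint constraint is tighter than any single-event one; this is the sense in which combining events reduces statistical errors on parameters that take consistent values across events.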
Second, we choose to produce combined constraints only from events that were found in both
modeled searches (PyCBC [45–47] and GstLAL [48,49]) with a FAR of at most one per one thousand
years. This ensures that there is a very small probability of inclusion of a nonastrophysical
event. The events used for the combined results are indicated with bold names in Table I. The
events thus excluded from the combined analysis have low SNR and would therefore contribute
only marginally to tightened constraints. Excluding marginal events from our analyses amounts
to assigning a null a priori probability to the possibility that those data contain any
information about the tests in question. This is, in a sense, the most conservative choice.
³ While we perform these tests on all events with SNR $> 6$ in the appropriate regime, in a few
cases the results appear uninformative and the posterior distribution extends across the
entire prior considered. Since the results are prior dependent, upper limits should not be set
from these individual analyses. See Sec. A 3 of the Appendix for details.
⁴ Propagation effects do depend critically on source distance. However, this dependence is
factored out explicitly, in a way that allows for combining events as we do here (see
Sec. VII).
In summary, we enforce two significance thresholds: FAR $< (1\,\mathrm{yr})^{-1}$, for
single-event analyses, and FAR $< (1000\,\mathrm{yr})^{-1}$, for combined results. This
two-tiered setup allows us to produce conservative joint results by including only the most
significant events, while also providing information about a broader (less significant) set of
triggers. This is intended to enable the interested reader to combine individual results with
less stringent criteria and under different statistical assumptions, according to their
specific needs and tolerance for false positives. In the future, we may adapt our thresholds
depending on the rate of detections.
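The two-tiered selection can be reproduced from the FAR columns of Table I. The sketch below is illustrative only: FAR upper limits are stored as plain numbers, `None` marks a search that did not identify the event, and the single-event rule is written here as "at least one search with FAR below one per year", which is an assumption about how the threshold is applied across pipelines.

```python
# FAR in yr^-1 as (PyCBC, GstLAL, cWB), transcribed from Table I.
# Upper limits ("<") are stored as their bounding value; None = not identified.
FAR = {
    "GW150914": (1.5e-5, 1.0e-7, 1.6e-4),
    "GW151012": (0.17, 7.9e-3, None),
    "GW151226": (1.7e-5, 1.0e-7, 0.02),
    "GW170104": (1.4e-5, 1.0e-7, 2.9e-4),
    "GW170608": (3.1e-4, 1.0e-7, 1.4e-4),
    "GW170729": (1.4, 0.18, 0.02),
    "GW170809": (1.4e-4, 1.0e-7, None),
    "GW170814": (1.2e-5, 1.0e-7, 2.1e-4),
    "GW170818": (None, 4.2e-5, None),
    "GW170823": (3.3e-5, 1.0e-7, 2.1e-3),
}

def single_event_set(far_table, threshold=1.0):
    """Events with FAR below `threshold` in at least one search (assumed rule)."""
    return [name for name, fars in far_table.items()
            if any(f is not None and f < threshold for f in fars)]

def combined_set(far_table, threshold=1.0e-3):
    """Events found by both modeled searches with FAR of at most `threshold`."""
    selected = []
    for name, (pycbc, gstlal, _cwb) in far_table.items():
        if pycbc is not None and gstlal is not None and max(pycbc, gstlal) <= threshold:
            selected.append(name)
    return selected

single = single_event_set(FAR)
combined = combined_set(FAR)
```

Loosening `threshold` in `combined_set` is one simple way a reader could build a less stringent combined sample from the individual results.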
IV. PARAMETER INFERENCE
The starting point for all the analyses presented here is a set of waveform models that
describe the GWs emitted by coalescing black hole binaries. The GW signature depends on the
intrinsic parameters describing the binary as well as the extrinsic parameters specifying the
location and orientation of the source with respect to the detector network. The intrinsic
parameters for circularized black-hole binaries in GR are the two masses $m_i$ of the black
holes and the two spin vectors $\vec{S}_i$ defining the rotation of each black hole, where
$i \in \{1, 2\}$ labels the two black holes. We assume that the binary has negligible orbital
eccentricity, as is expected to be the case when the binary enters the band of ground-based
detectors [50,51] (except in some more extreme formation scenarios,⁵ e.g., [60–63]). The
extrinsic parameters comprise four parameters that specify the space-time location of the
binary black hole, namely, the sky location (right ascension and declination), the luminosity
distance, and the time of coalescence. In addition, there are three extrinsic parameters that
determine the orientation of the binary with respect to Earth, namely, the inclination angle
of the orbit with respect to the observer, the polarization angle, and the orbital phase at
coalescence.
We employ two waveform families that model binary black holes in GR: the effective-one-body
based SEOBNRv4 [21] waveform family that assumes nonprecessing spins for the black holes (we
use the frequency domain reduced order model SEOBNRv4_ROM for reasons of computational
efficiency), and the phenomenological waveform family IMRPhenomPv2 [22,64,65] that models the
effects of precessing spins using two effective parameters by twisting up the underlying
aligned-spin model. We use IMRPhenomPv2 to obtain all the main results given in this paper,
and use SEOBNRv4 to check the robustness of these results, whenever possible. When we use
IMRPhenomPv2, we impose a prior $m_1/m_2 \leq 18$ on the mass ratio, as the waveform family is
not calibrated against numerical-relativity (NR) simulations for $m_1/m_2 > 18$. We do not
impose a similar prior when using SEOBNRv4, since it includes information about the extreme
mass ratio limit. Neither of these waveform models includes the full spin dynamics (which
requires six spin parameters). Fully precessing waveform models have been recently developed
[24,66–69] and will be used in future applications of these tests.
The waveform models used in this paper do not include the effects of subdominant
(nonquadrupole) modes, which are expected to be small for comparable-mass binaries [70,71].
The first generation of binary black hole waveform models including spin and higher order
modes has recently been developed [68,69,72–74]. Preliminary results in [14], using NR
simulations supplemented by NR surrogate waveforms, indicate that the higher mode content of
the GW signals detected by Advanced LIGO and Virgo is weak enough that models without the
effect of subdominant modes do not introduce substantial biases in the intrinsic parameters of
the binary. For unequal-mass binaries, the effect of the nonquadrupole modes is more
pronounced [75], particularly when the binary's orientation is close to edge on. In these
cases, the presence of nonquadrupole modes can show up as a deviation from GR when using
waveforms that only include the quadrupole modes, as was shown in [76]. Applications of tests
of GR with the new waveform models that include nonquadrupole modes will be carried out in the
future.
We believe that our simplifying assumptions on the
waveform models (zero eccentricity, simplified treatment
of spins, and neglect of subdominant modes) are justified
by astrophysical considerations and previous studies.
Indeed, as we show in the remainder of the paper, the
observed signals are consistent with the waveform models.
Of course, had our analyses resulted in evidence for
violations of GR, we would have had to revisit these
simplifications very carefully.
The tests described in this paper are performed within the framework of Bayesian inference, by
means of the LALInference code [34] in the LIGO Scientific Collaboration Algorithm Library
Suite (LALSuite) [77]. We estimate the PSD using the BayesWave code [78,79], as described in
Appendix B of [14]. Except for the residuals test described in Sec. V A, we use the waveform
models described in this section to estimate from the data the posterior distributions of the
parameters of the binary. These include not only the intrinsic and extrinsic parameters
mentioned above, but also other parameters that describe possible departures from the GR
predictions. Specifically, for the parametrized tests in Secs. VI and VII, we modify the phase
$\Phi(f)$ of the frequency-domain waveform
$$\tilde{h}(f) = A(f)\, e^{i\Phi(f)}. \tag{1}$$
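The role of the phase in Eq. (1) can be illustrated with a short sketch. It is only a toy under stated assumptions: the amplitude, the GR-like phase, and the deviation `delta_phase` below are made-up functions of frequency, and `apply_phase_deviation` is a hypothetical helper rather than part of LALInference or any waveform library.

```python
import numpy as np

def apply_phase_deviation(amplitude, phase, delta_phase):
    """Return A(f) * exp(i * [Phi(f) + delta_Phi(f)]), leaving A(f) untouched."""
    return amplitude * np.exp(1j * (phase + delta_phase))

# Toy frequency-domain ingredients (hypothetical, for illustration only).
freqs = np.linspace(20.0, 200.0, 181)
amplitude = freqs ** (-7.0 / 6.0)
phase_gr = 1.0e3 * freqs ** (-5.0 / 3.0)
delta_phase = 0.05 * (freqs / 100.0) ** (-1.0)

h_gr = apply_phase_deviation(amplitude, phase_gr, 0.0)
h_mod = apply_phase_deviation(amplitude, phase_gr, delta_phase)

# Phase difference between the modified and unmodified waveforms.
recovered = np.angle(h_mod * np.conj(h_gr))
```

Setting `delta_phase` to zero recovers the unmodified waveform, which is the sense in which GR corresponds to vanishing deviation parameters.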
For the GR parameters, we use the same prior distributions as the main parameter estimation
analysis described in [14], though for a number of the tests we need to extend the ranges of
these priors to account for correlations with the non-GR parameters, or for the fact that only
a portion of the signal is analyzed (as in Sec. V B). We also use the same low-frequency
cutoffs for the likelihood integral as in [14], i.e., 20 Hz for all events except for
GW170608, where 30 Hz is used for LIGO Hanford, as discussed in [13], and GW170818, where
16 Hz is used for all three detectors. For the model-agnostic residuals test described in
Sec. V A, we use the BayesWave code [78], which describes the GW signals in terms of a number
of Morlet-Gabor wavelets.
⁵ These scenarios could occur often enough, compared to the expected rate of detections, that
the inclusion of eccentricity in waveform models is a necessity for tests of GR in future
observing runs; see, e.g., [52–59] for recent work on developing such waveform models.
V. CONSISTENCY TESTS
A. Residuals test
One way to evaluate the ability of GR to describe GW signals is to subtract the best-fit
template from the data and make sure the residuals have the statistical properties expected of
instrumental noise. This largely model-independent test is sensitive to a wide range of
possible disagreements between the data and our waveform models, including those caused by
deviations from GR and by modeling systematics. This analysis can look for GR violations
without relying on specific parametrizations of the deviations, making it a versatile tool.
Results from a similar study on our first detection were already presented in [4].
In order to establish whether the residuals agree with noise (Gaussian or otherwise), we
proceed as follows. For each event in our set, we use LALInference and the IMRPhenomPv2
waveform family to obtain an estimate of the best-fit (i.e., maximum likelihood) binary black
hole waveform based on GR. This waveform incorporates factors that account for uncertainty in
the detector calibration, as described in Sec. II. This best-fit waveform is then subtracted
from the data to obtain residuals for a 1 second window centered on the trigger time reported
in [14].⁶ If the GR-based model provides a good description of the signal, we expect the
resulting residuals at each detector to lack any significant coherent SNR beyond what is
expected from noise fluctuations. We compute the coherent SNR using BayesWave, which models
the multidetector residuals as a superposition of incoherent Gaussian noise and an
elliptically polarized coherent signal. The residual network SNR reported by BayesWave is the
SNR that would correspond to such a coherent signal.
In particular, for each event, BayesWave produces a distribution of possible residual signals
consistent with the data, together with corresponding a posteriori probabilities. This is
trivially translated into a probability distribution over the coherent residual SNR. We
summarize each of these distributions by computing the corresponding 90% credible upper limit
($\mathrm{SNR}_{90}$). This produces one number per event that represents an upper bound on
the coherent power that could be present in its residuals.
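Summarizing a distribution by its 90% credible upper limit amounts to taking the 0.9 quantile of the samples. The sketch below shows this on synthetic numbers; the sample values are randomly generated stand-ins, not BayesWave output, and the function name is an assumption.

```python
import numpy as np

def credible_upper_limit(samples, level=0.90):
    """Value below which a fraction `level` of the posterior samples lie."""
    return float(np.quantile(np.asarray(samples, dtype=float), level))

# Synthetic stand-in for posterior samples of the coherent residual SNR.
rng = np.random.default_rng(0)
residual_snr_samples = np.abs(rng.normal(4.0, 1.5, size=5000))

snr_90 = credible_upper_limit(residual_snr_samples)
fraction_below = float(np.mean(residual_snr_samples <= snr_90))
```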
We may translate the $\mathrm{SNR}_{90}$ into a measure of how well the
best-fit templates describe the signals in our data. We do this
through the fitting factor [80],
$\mathrm{FF} := \mathrm{SNR}_{\rm GR} / (\mathrm{SNR}_{\rm res}^{2} + \mathrm{SNR}_{\rm GR}^{2})^{1/2}$,
where $\mathrm{SNR}_{\rm res}$ is the coherent residual SNR
and $\mathrm{SNR}_{\rm GR}$ is the network SNR of the best-fit template
(see Table I for network SNRs). By setting
$\mathrm{SNR}_{\rm res} = \mathrm{SNR}_{90}$,
we produce a 90% credible lower limit on the fitting
factor ($\mathrm{FF}_{90}$). Because the FF is itself a lower limit on the
overlap between the true and best-fit templates, so is $\mathrm{FF}_{90}$.
As in [4], we may then assert that the disagreement between
the true waveform and our GR-based template is at most
$(1 - \mathrm{FF}_{90}) \times 100\%$. This is interesting as a measure of the
sensitivity of our test, but does not tell us about the
consistency of the residuals with instrumental noise.
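
As a worked illustration of the fitting-factor definition above, the sketch below evaluates FF and the corresponding bound on waveform disagreement. The function name is hypothetical, and the network SNR of 25.0 is an illustrative placeholder, not a value read from Table I.

```python
import math


def fitting_factor(snr_gr, snr_res):
    """FF = SNR_GR / sqrt(SNR_res^2 + SNR_GR^2), following the definition in the text."""
    if snr_gr <= 0 or snr_res < 0:
        raise ValueError("require snr_gr > 0 and snr_res >= 0")
    return snr_gr / math.hypot(snr_res, snr_gr)


snr_gr = 25.0  # illustrative network SNR of the best-fit template (assumed)
snr_90 = 6.1   # 90% credible upper limit on the coherent residual SNR

# Setting SNR_res = SNR_90 turns FF into a 90% credible lower limit, FF_90.
ff_90 = fitting_factor(snr_gr, snr_90)
max_disagreement_percent = (1.0 - ff_90) * 100.0
print(f"FF_90 >= {ff_90:.3f}, disagreement <= {max_disagreement_percent:.1f}%")
```

Because FF depends only on the ratio of the residual SNR to the template SNR, louder events give tighter bounds for the same residual SNR.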
To assess whether the obtained residual $\mathrm{SNR}_{90}$ values
are consistent with detector noise, we run an identical
BAYESWAVE analysis on 200 different sets of noise-only
detector data near each event. This allows us to estimate the
$p$-value for the null hypothesis that the residuals are
consistent with noise. The $p$-value gives the probability
of noise producing coherent power with $\mathrm{SNR}^{n}_{90}$ greater than
or equal to the residual value $\mathrm{SNR}_{90}$, i.e.,
$p := P(\mathrm{SNR}^{n}_{90} \geq \mathrm{SNR}_{90} \mid \mathrm{noise})$.
In that sense, a smaller $p$-value indicates a
smaller chance that the residual power arose from instrumental
noise only. For each event, our estimate of $p$ is
produced from the fraction of noise instantiations that
yielded $\mathrm{SNR}^{n}_{90} \geq \mathrm{SNR}_{90}$ (that is, from the empirical
survival function).⁷
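
The estimate of $p$ described above amounts to evaluating an empirical survival function. A minimal sketch follows, with a hypothetical function name and a synthetic Gaussian background standing in for the 200 noise-only $\mathrm{SNR}^{n}_{90}$ values (the real background is not reproduced here).

```python
import random


def empirical_p_value(background, observed):
    """Fraction of noise-only values >= observed (empirical survival function)."""
    if not background:
        raise ValueError("need at least one background value")
    return sum(b >= observed for b in background) / len(background)


# Synthetic stand-in for 200 noise-only SNR^n_90 values (illustrative only).
rng = random.Random(1)
background_snr90 = [rng.gauss(6.0, 1.0) for _ in range(200)]

observed_snr90 = 6.1  # residual SNR_90 measured for some event
p = empirical_p_value(background_snr90, observed_snr90)
print(f"p-value estimate: {p:.3f}")
```

With 200 background instantiations the estimate is quantized in steps of 1/200 = 0.005, so it cannot resolve $p$-values below that level.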
Our results are summarized in Table II. For each event,
we present the values of the residual $\mathrm{SNR}_{90}$, the lower limit
on the fitting factor ($\mathrm{FF}_{90}$), and the $\mathrm{SNR}_{90}$ $p$-value. The
background distributions that resulted in those $p$-values are
shown in Fig. 1. In Fig. 1 we represent these distributions
through the empirical estimate of their survival functions,
i.e., $p(\mathrm{SNR}_{90}) = 1 - \mathrm{CDF}(\mathrm{SNR}_{90})$, with CDF being the
cumulative distribution function. Figure 1 also displays the
actual value of $\mathrm{SNR}_{90}$ measured from the residuals of each
event (dotted vertical line). In each case, the height of the
curve evaluated at the $\mathrm{SNR}_{90}$ measured for the corresponding
detection yields the $p$-value reported in Table II
(markers in Fig. 1).
The values of residual $\mathrm{SNR}_{90}$ vary widely among events
because they depend on the specific state of the instruments
at the time of detection: segments of data with elevated
noise levels are more likely to result in spurious coherent
residual power, even if the signal agrees with GR.
In particular, the background distributions for events seen
by three detectors are qualitatively different from those seen
by only two. This is both due to (i) the fact that BAYESWAVE
is configured to expect the SNR to increase with the
number of detectors and (ii) the fact that Virgo data present
a higher rate of non-Gaussianities than LIGO. We
have confirmed that both these factors play a role by
studying the background $\mathrm{SNR}_{90}$ distributions for real data
from each possible pair of detectors, as well as distributions
over simulated Gaussian noise. Specifically, removing
Virgo from the analysis results in a reduction in the
coherent residual power for GW170729 ($\mathrm{SNR}^{\rm HL}_{90} = 6.5$),
GW170809 ($\mathrm{SNR}^{\rm HL}_{90} = 6.3$), GW170814 ($\mathrm{SNR}^{\rm HL}_{90} = 6.0$),
and GW170818 ($\mathrm{SNR}^{\rm HL}_{90} = 7.2$).

⁶ The analysis is sensitive only to residual power in that 1 s
window due to technicalities related to how BAYESWAVE handles
its sine-Gaussian basis elements [78,79].

⁷ Computing $p$-values would not be necessary if the noise was
perfectly Gaussian, in which case we could predict the noise-only
distribution of $\mathrm{SNR}^{n}_{90}$ from first principles.

The event-by-event variation of $\mathrm{SNR}_{90}$ is also reflected
in the values of $\mathrm{FF}_{90}$. GW150914 provides the strongest
result with $\mathrm{FF}_{90} = 0.97$, which corresponds to an upper
limit of 3% on the magnitude of potential deviations from
our GR-based template,⁸ in the specific sense defined in [4]
and discussed above. On the other hand, GW170818 yields
the weakest result with $\mathrm{FF}_{90} = 0.78$ and a corresponding
upper limit on waveform disagreement of $1 - \mathrm{FF}_{90} = 22\%$.
The average $\mathrm{FF}_{90}$ over all events is 0.88.
The set of $p$-values shown in Table II is consistent with
all coherent residual power being due to instrumental noise.
Assuming that this is indeed the case, we expect the
$p$-values to be uniformly distributed over [0, 1], which
explains the variation in Table II. With only ten events,
however, it is difficult to obtain strong quantitative evidence
of the uniformity of this distribution. Nevertheless,
we follow Fisher's method [81] to compute a meta
$p$-value for the null hypothesis that the individual $p$-values
in Table II are uniformly distributed. We obtain a meta
$p$-value of 0.4, implying that there is no evidence for
coherent residual power that cannot be explained by noise
alone. All in all, this means that there is no statistically
significant evidence for deviations from GR.
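
Fisher's method combines $k$ independent $p$-values through the statistic $X = -2\sum_i \ln p_i$, which follows a chi-square distribution with $2k$ degrees of freedom under the null hypothesis. The sketch below applies this to the ten $p$-values listed in Table II, using the closed-form chi-square survival function for even degrees of freedom; the function name is hypothetical and this is not necessarily the implementation used for the quoted result.

```python
import math


def fisher_meta_p_value(p_values):
    """Return (statistic, meta p-value) from Fisher's method for independent p-values."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom (closed form for
    # even dof): exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# p-values as listed in Table II (rounded to two decimals there).
table_ii_p_values = [0.46, 0.11, 0.81, 0.99, 0.05, 0.74, 0.78, 0.16, 0.19, 0.86]
statistic, meta_p = fisher_meta_p_value(table_ii_p_values)
print(f"Fisher statistic = {statistic:.2f}, meta p-value = {meta_p:.2f}")
```

The result should land near the quoted meta $p$-value of 0.4, although the two-decimal rounding of the tabulated inputs means the agreement is only approximate.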
FIG. 1. Survival function ($p = 1 - \mathrm{CDF}$) of the 90% credible
upper limit on the network SNR ($\mathrm{SNR}_{90}$) for each event (solid or
dashed curves), compared to the measured residual values
(vertical dotted lines). For each event, the value of the survival
function at the measured $\mathrm{SNR}_{90}$ gives the $p$-value reported in
Table II (markers). The colored bands correspond to uncertainty
regions for a Poisson process and have half width $p/\sqrt{N}$, with
$N$ being the number of noise-only instantiations that yielded
$\mathrm{SNR}^{n}_{90}$ greater than the abscissa value.

TABLE II. Results of the residuals analysis. For each event, this
table presents the 90% credible upper limit on the reconstructed
network SNR after subtraction of the best-fit GR waveform
($\mathrm{SNR}_{90}$), a corresponding lower limit on the fitting factor
($\mathrm{FF}_{90}$ in the text), and the $\mathrm{SNR}_{90}$ $p$-value. $\mathrm{SNR}_{90}$ is a measure of
the maximum possible coherent signal power not captured by the
best-fit GR template, while the $p$-value is an estimate of the
probability that instrumental noise produced such $\mathrm{SNR}_{90}$ or
higher. We also indicate which interferometers (IFOs) were used
in the analysis of a given event, either the two Advanced LIGO
detectors (HL) or the two Advanced LIGO detectors plus
Advanced Virgo (HLV). See Sec. V A in the main text for details.

Event      IFOs   Residual SNR_90   Fitting factor   p-value
GW150914   HL     6.1               ≥ 0.97           0.46
GW151012   HL     7.3               ≥ 0.79           0.11
GW151226   HL     5.6               ≥ 0.91           0.81
GW170104   HL     5.1               ≥ 0.94           0.99
GW170608   HL     7.9               ≥ 0.89           0.05
GW170729   HLV    6.5               ≥ 0.85           0.74
GW170809   HLV    6.5               ≥ 0.88           0.78
GW170814   HLV    8.9               ≥ 0.88           0.16
GW170818   HLV    9.2               ≥ 0.78           0.19
GW170823   HL     5.5               ≥ 0.90           0.86

⁸ This value is better than the one quoted in [4] by 1 percentage
point. The small difference is explained by several factors,
including that paper's use of the maximum a posteriori waveform
(instead of maximum likelihood) and 95% (instead of 90%)
credible intervals, as well as improvements in data calibration.

B. Inspiral-merger-ringdown consistency test

The inspiral-merger-ringdown consistency test for
binary black holes [41,82] checks the consistency of the
low-frequency part of the observed signal (roughly corresponding
to the inspiral of the black holes) with the high-frequency
part (to a good approximation, produced by the
postinspiral stages).⁹ The cutoff frequency $f_c$ between the
two regimes is chosen as the frequency of the innermost
stable circular orbit of a Kerr black hole [83], with mass and
dimensionless spin computed by applying NR fits [84–87]
to the median values of the posterior distributions of the
initial masses and spherical coordinate components of the
spins. This determination of $f_c$ is performed separately for
each event and based on parameter inference of the full
signal (see Table III for values of $f_c$).¹⁰ The binary's
parameters are then estimated independently from the
low- (high-) frequency parts of the data by restricting
the noise-weighted integral in the likelihood calculation
to frequencies below (above) this frequency cutoff $f_c$.
For each of these independent estimates of the source
parameters, we make use of fits to numerical-relativity
simulations given in [84–86] to infer the mass $M_f$ and
dimensionless spin magnitude $a_f = c|\vec{S}_f|/(G M_f^{2})$ of the
remnant black hole.¹¹ If the data are consistent with GR,
these two independent estimates have to be consistent with
each other [41,82]. Because this consistency test ultimately
compares between the inspiral and the postinspiral results,
posteriors of both parts must be informative. In the case of
low-mass binaries, the SNR in the part $f > f_c$ is insufficient
to perform this test, so that we only analyze seven
events as indicated in Tables I and III.

⁹ Note that this is not exactly equal to testing the consistency
between the early and late part of the waveform in time domain,
because the low-frequency part of the signal could be
"contaminated" by power from late times and vice versa. In practice,
this effect is negligible with our choice of cutoff frequencies. See
[41] for a discussion.
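
To make the choice of cutoff frequency more concrete, the sketch below evaluates the gravitational-wave frequency at the innermost stable circular orbit of a Kerr black hole using the standard closed-form ISCO radius. It assumes a prograde equatorial orbit, the dominant quadrupole mode (twice the orbital frequency), and a detector-frame mass; the function names and the input values are illustrative assumptions and are not taken from the analysis or from Table III.

```python
import math

# G * M_sun / c^3 in seconds (one solar mass expressed as a time).
SOLAR_MASS_SECONDS = 4.925491e-6


def kerr_isco_radius(spin):
    """Prograde ISCO radius in units of G M / c^2 for dimensionless spin in [0, 1]."""
    if not 0.0 <= spin <= 1.0:
        raise ValueError("spin must lie in [0, 1]")
    z1 = 1.0 + (1.0 - spin**2) ** (1.0 / 3.0) * (
        (1.0 + spin) ** (1.0 / 3.0) + (1.0 - spin) ** (1.0 / 3.0)
    )
    z2 = math.sqrt(3.0 * spin**2 + z1**2)
    # max(...) guards against a tiny negative argument from rounding at small spin.
    return 3.0 + z2 - math.sqrt(max(0.0, (3.0 - z1) * (3.0 + z1 + 2.0 * z2)))


def isco_gw_frequency(mass_msun, spin):
    """Gravitational-wave frequency (Hz) of the dominant mode at the Kerr ISCO."""
    if mass_msun <= 0:
        raise ValueError("mass must be positive")
    r = kerr_isco_radius(spin)
    mass_seconds = mass_msun * SOLAR_MASS_SECONDS
    orbital_angular_frequency = 1.0 / ((r**1.5 + spin) * mass_seconds)
    return orbital_angular_frequency / math.pi  # f_GW = 2 * f_orb = Omega / pi


# Illustrative remnant parameters (assumed, not read from the paper's tables).
f_c = isco_gw_frequency(mass_msun=68.0, spin=0.67)
print(f"ISCO gravitational-wave frequency: {f_c:.0f} Hz")
```

The frequency scales inversely with the mass and increases with the spin, so heavier remnants place the cutoff at lower frequencies.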
In order to quantify the consistency of the two different
estimates of the final black hole's mass and spin we define
two dimensionless quantities that quantify the fractional
difference between them:
$\Delta M_f/\bar{M}_f := 2(M_f^{\rm insp} - M_f^{\rm postinsp})/(M_f^{\rm insp} + M_f^{\rm postinsp})$
and
$\Delta a_f/\bar{a}_f := 2(a_f^{\rm insp} - a_f^{\rm postinsp})/(a_f^{\rm insp} + a_f^{\rm postinsp})$,
where the superscripts indicate the
estimates of the mass and spin from the inspiral and
postinspiral parts of the signal.¹² The posteriors of
these dimensionless parameters, estimated from different
events, are shown in Fig. 2. For all events, the posteriors are
consistent with the GR value ($\Delta M_f/\bar{M}_f = 0$, $\Delta a_f/\bar{a}_f = 0$).
The fraction of the posterior enclosed by the isoprobability
contour that passes through the GR value (i.e., the GR
quantile) for each event is shown in Table III. Figure 2
also shows the posteriors obtained by combining all
the events that pass the stronger significance threshold
$\mathrm{FAR} < (1000\,\mathrm{yr})^{-1}$, as outlined in Sec. III (see the same
section for a discussion of caveats).
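
The two deviation parameters and the GR quantile can be sketched from posterior samples as follows. This is a simplified illustration under stated assumptions: the inspiral and postinspiral posteriors are treated as independent sets of equally weighted samples that are paired at random, the posterior density is approximated by a coarse two-dimensional histogram (a swapped-in technique, not necessarily the estimator used in the analysis), and no correction for the nonflat prior is applied. All names and the synthetic samples are hypothetical.

```python
import random


def fractional_difference(x_insp, x_postinsp):
    """2 (x_insp - x_postinsp) / (x_insp + x_postinsp), as defined in the text."""
    return 2.0 * (x_insp - x_postinsp) / (x_insp + x_postinsp)


def gr_quantile(dm, da, bins=25):
    """Fraction of samples in histogram cells denser than the cell containing (0, 0)."""
    # Extend the histogram range so that it always contains the GR value.
    lo_m, hi_m = min(min(dm), 0.0), max(max(dm), 0.0)
    lo_a, hi_a = min(min(da), 0.0), max(max(da), 0.0)
    if hi_m == lo_m or hi_a == lo_a:
        raise ValueError("samples have zero spread")

    def cell(value, lo, hi):
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    counts = {}
    for m, a in zip(dm, da):
        key = (cell(m, lo_m, hi_m), cell(a, lo_a, hi_a))
        counts[key] = counts.get(key, 0) + 1
    gr_count = counts.get((cell(0.0, lo_m, hi_m), cell(0.0, lo_a, hi_a)), 0)
    return sum(c for c in counts.values() if c > gr_count) / len(dm)


# Synthetic remnant mass and spin samples (illustrative only).
rng = random.Random(2)
n = 20000
mf_insp = [rng.gauss(68.0, 4.0) for _ in range(n)]
af_insp = [rng.gauss(0.68, 0.06) for _ in range(n)]
mf_post = [rng.gauss(67.0, 5.0) for _ in range(n)]
af_post = [rng.gauss(0.66, 0.08) for _ in range(n)]

delta_m = [fractional_difference(i, p) for i, p in zip(mf_insp, mf_post)]
delta_a = [fractional_difference(i, p) for i, p in zip(af_insp, af_post)]
quantile = gr_quantile(delta_m, delta_a)
print(f"GR quantile: {100.0 * quantile:.1f}%")
```

The histogram estimate is noisy when the GR value lies close to the peak of the posterior, so the bin count trades resolution against sampling noise.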
The parameter estimation is performed employing uniform
priors in component masses and spin magnitudes and
isotropic priors in spin directions [14]. This introduces a
nonflat prior in the deviation parameters $\Delta M_f/\bar{M}_f$ and
$\Delta a_f/\bar{a}_f$, which is shown as a thin, dashed contour in Fig. 2.
Posteriors are estimated employing the precessing spin
TABLE III. Results from the inspiral-merger-ringdown consistency
test for selected binary black hole events. $f_c$ denotes
the cutoff frequency used to demarcate the division between the
inspiral and postinspiral regimes; $\rho_{\rm IMR}$, $\rho_{\rm insp}$, and $\rho_{\rm postinsp}$ are the
median values of the SNR in the full signal, the inspiral part, and
the postinspiral part, respectively; and the GR quantile denotes
the fraction of the posterior enclosed by the isoprobability
contour that passes through the GR value, with smaller values
indicating better consistency with GR. (Note, however, that the
posterior distribution is broader for smaller SNRs, and hence the
GR quantile is typically smaller in such cases. This effect is
further complicated by the randomness of the noise.)

Event      f_c [Hz]   ρ_IMR   ρ_insp   ρ_postinsp   GR quantile [%]
GW150914   132        25.3    19.4     16.1         55.5
GW170104   143        13.7    10.9     8.5          24.4
GW170729   91         10.7    8.6      6.9          10.4
GW170809   136        12.7    10.6     7.1          14.7
GW170814   161        16.8    15.3     7.2          7.8
GW170818   128        12.0    9.3      7.2          25.5
GW170823   102        11.9    7.9      8.5          80.4

FIG. 2. Results of the inspiral-merger-ringdown consistency
test for the selected binary black hole events (see Table I).
The main panel shows 90% credible regions of the posterior
distributions of $(\Delta M_f/\bar{M}_f,\ \Delta a_f/\bar{a}_f)$, with the cross marking
the expected value for GR. The side panels show the
marginalized posteriors for $\Delta M_f/\bar{M}_f$ and $\Delta a_f/\bar{a}_f$. The thin
black dashed curve represents the prior distribution, and the
grey shaded areas correspond to the combined posteriors
from the five most significant events (as outlined in Sec. III
and Table I).

¹⁰ The frequency $f_c$ was determined using preliminary parameter
inference results, so the values in Table III are slightly
different than those that would be obtained using the posterior
samples in GWTC-1 [9]. However, the test is robust against small
changes in the cutoff frequency [41].

¹¹ As in [6], we average the $M_f$, $a_f$ posteriors obtained by
different fits [84–86] after augmenting the fitting formulas for
aligned-spin binaries by adding the contribution from in-plane
spins [87]. However, unlike in [6,87], we do not evolve the spins
before applying the fits, due to technical reasons.

¹² For black hole binaries with comparable masses and moderate
spins, as we consider here, the remnant black hole is
expected to have $a_f \gtrsim 0.5$; see, e.g., [84–86] for fitting formulas
derived from numerical simulations, or Table I for values of the
remnant's spins obtained from GW events. Hence, $\Delta a_f/\bar{a}_f$ is
expected to yield finite values.