Increasing Proteome Coverage Through a Reduction in Analyte Complexity in Single-Cell Equivalent Samples
Marion Pang,* Jeff J. Jones, Ting-Yu Wang, Baiyi Quan, Nicole J. Kubat, Yanping Qiu, Michael L. Roukes,* and Tsui-Fen Chou*
Cite This: https://doi.org/10.1021/acs.jproteome.4c00062
ABSTRACT: Advances in mass spectrometry instrumentation have catalyzed in-depth exploration of complex proteomes. Such exploration requires a careful balance in experimental design, particularly between quantitative precision and the number of analytes detected. In bottom-up proteomics, a key challenge is that oversampling of abundant proteins can reduce the number of distinct proteins identified. This issue is especially pronounced in analyte-limited samples, such as small tissue biopsies or single cells. Methods such as depletion and fractionation are poorly suited to reducing oversampling in single-cell samples, and other improvements to LC and mass spectrometry technologies and methods have been developed to address the trade-off between precision and enumeration. We demonstrate that using a monosubstrate protease for proteomic analysis of single-cell equivalent digest samples can improve quantitative accuracy while maintaining the high proteome coverage established by trypsin. This improvement is particularly important for single-cell proteomics. There, samples contain a limited number of protein copies, especially of low-abundance proteins, and can therefore benefit from attention to analyte complexity. Considerations of analyte complexity, alongside chromatographic complexity, integration with data acquisition methods, and other factors such as enzyme kinetics, will be crucial in designing future single-cell workflows.
KEYWORDS: single-cell proteomics, peptide identification optimization, protease choice, bottom-up proteomics
■ INTRODUCTION
Bottom-up proteomics is the mass spectrometry approach used in the majority of current proteomic studies. It involves sequencing and identifying protease-derived peptides as proxies for full-length proteome constituents. Expanding the depth of coverage in single-cell proteomics poses a significant technical challenge because of the limited copy number per protein in single-cell samples, especially for low-abundance proteins.
Notable progress has been made in single-cell proteomic sample preparation. Examples include:
- new methods and tools that use smaller sample volumes to minimize sample loss and increase reaction efficiency;1−4
- improved throughput from new multiplexing methods, such as that demonstrated in SCoPE-MS;5
- advances in tools for parallelizing sample preparation, such as the ProteoCHIP,6 and other techniques.3,7,8

However, reducing sample complexity remains a relatively unexplored facet of single-cell proteomics. Here, this means limiting the total number of analytes both theoretically and in situ. Even within a single cell, protein copy numbers span a considerable dynamic range, with nearly 6 orders of magnitude between the most and least abundant proteins.9−11 The detection of abundant proteins often obscures that of biologically interesting low-abundance sequences, such as regulatory proteins12 and the uncharted "dark proteome".13,14 As a consequence, the robustness, precision, and accuracy of quantification are diminished.15−18 Strategies aimed at reducing the oversampling of abundant proteins may therefore improve both the quantitative precision and the proteome coverage attainable in low-analyte and single-cell proteomic samples.
Depletion, which wholly removes the most abundant proteins, is routinely applied to reduce complexity when analyzing complex mixtures such as cellular lysates or plasma.19 Alternative strategies for mitigating analyte complexity include refining separation through advanced liquid chromatography and peptide fractionation techniques.20−23 Complexity reduction can also be achieved with mass spectrometry tools such as parallel reaction monitoring (PRM) and ion mobility.15,16,18 However, these approaches may either limit throughput or prove unsuitable for the analysis of sparse-analyte samples and single-cell proteomes.11,24,25

Special Issue: Single-Cell Omics
Received: January 31, 2024. Revised: April 23, 2024. Accepted: May 14, 2024.
© XXXX The Authors. Published by American Chemical Society. This article is licensed under CC-BY 4.0.
J. Proteome Res. XXXX, XXX, XXX-XXX. pubs.acs.org/jpr. https://doi.org/10.1021/acs.jproteome.4c00062
Trypsin is currently the most widely used protease in bottom-up proteomics, and trypsin-based protocols have traditionally been optimized using bulk sample lysates.26−28 In single-cell proteomics, various enzyme-related optimizations have been explored, including the type of trypsin, the substrate-to-enzyme ratio, and the combination of trypsin with detergents.2,29,30 Alternative proteases have recently been explored to improve protein sequence coverage by purposefully oversampling individual protein sequences.31,32 However, these methods typically use large sample quantities and long chromatography gradients to maximize analytical depth. This makes them poorly suited to single-cell proteomics, where analytes are sparse and high throughput is desirable.
In high-throughput proteomics, oversampling abundant proteins can reduce the total number of unique proteins identified in a sample. This is especially problematic for analyte-limited samples such as small tissue biopsies or single cells. To address this, we hypothesized that reducing the number of peptide analytes on a per-protein basis would effectively reduce the analyte complexity of the sample. This, in turn, should yield improved results for high-throughput single-cell analyses.
We explored monosubstrate proteases in silico as alternatives to trypsin. The aim was to identify proteases that generate fewer total peptides than trypsin for a given proteome while maintaining similar proteome coverage. Trypsin cleaves C-terminal to both lysine (K) and arginine (R), whereas LysC cleaves C-terminal to lysine only. For the human proteome, LysC yields less than 40% as many peptides as trypsin (Figure 1A) while still theoretically covering 98% of the known proteome (Figure 1B). Other proteases, such as those cleaving only at glutamic acid (E; GluC) or arginine, also theoretically show similarly low total peptide yields while retaining high overall proteome coverage. Trypsin, LysC, and GluC are all readily available commercially. They were therefore chosen for further study to assess their experimental impact on proteome coverage.
In this study, we investigate the use of monosubstrate proteases, such as LysC, to mitigate the challenges associated with oversampling of abundant proteins in bottom-up proteomics assays.33 Our experimental findings demonstrate three effects of LysC. It identifies a similar number of proteins from significantly fewer peptides, it improves quantitative accuracy, and it effectively reduces overall sample complexity. We also explore the broader implications of this method for single-cell proteomic methodologies, along two axes:
- how its effects interact with chromatographic complexity at varying LC run times;
- how they interact with analyte complexity, particularly in the presence of carrier and reference proteomes.

By elucidating the impact of monosubstrate proteases on analyte complexity, our study contributes to a more comprehensive understanding of the interplay between experimental methodology and sample complexity in single-cell proteomics. We conclude that monosubstrate proteases, such as LysC, can offer a significant advantage in proteome coverage and quantitation accuracy. This advantage applies to high-throughput analysis of samples containing analyte amounts equivalent to single cells.
■ MATERIALS AND METHODS
In Silico Simulation of Protease Digestion
A computational simulation was carried out to estimate the net reduction in complexity relative to trypsin. It was run for given proteases (GluC: FYWI, trypsin: KR, LysC: K, ArgC: R) or for N-terminal cleavage at a specific amino acid, using protein sequences from the human UniProt database (20 398 sequences). Digestion was simulated allowing up to two missed cleavages. Only peptides 6 to 60 amino acids long were enumerated, and PTMs were not considered. All analyses were performed in R (version 4.1.2, 2021-11-01) using the package msfastr.34 Peptide and unique protein counts are shown in Figure 1. They illustrate the differences in the total number of peptides that account for a given proportion of the total proteome.
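The simulation described above can be sketched in Python. This is a hypothetical re-implementation for illustration, not the msfastr code the authors used, and the function names and toy sequences are assumptions. Cleavage is modeled C-terminal to the listed residues, with trypsin's proline rule ignored as in the plain "KR" specification. The sketch allows up to two missed cleavages and keeps peptides of 6 to 60 residues. It reports total peptides, proteome coverage, and the median number of peptides per covered protein. Only the trypsin, LysC, and ArgC specificities are included, and N-terminal cleavage is not modeled.

```python
import statistics
from typing import Dict, Iterable, List, Set

# Specificities as listed in the Methods text (cleavage C-terminal to these
# residues). Trypsin's "not before proline" rule is ignored, as in the "KR" spec.
PROTEASES: Dict[str, Set[str]] = {
    "trypsin": {"K", "R"},
    "LysC": {"K"},
    "ArgC": {"R"},
}


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {first token of header: sequence}."""
    proteome: Dict[str, str] = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                proteome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        proteome[name] = "".join(chunks)
    return proteome


def digest(seq: str, residues: Set[str], max_missed: int = 2,
           min_len: int = 6, max_len: int = 60) -> List[str]:
    """All peptides with <= max_missed missed cleavages and min_len..max_len residues."""
    # Cut just after each specific residue; a cut after the final residue is a no-op.
    cuts = [i + 1 for i, aa in enumerate(seq[:-1]) if aa in residues]
    bounds = [0] + cuts + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # seq[bounds[i]:bounds[j]] spans j - i - 1 missed cleavage sites.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptide = seq[bounds[i]:bounds[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


def summarize(proteome: Dict[str, str], residues: Set[str]) -> Dict[str, float]:
    """Total peptides, proteins with >= 1 peptide, and median peptides per such protein."""
    counts = [len(digest(seq, residues)) for seq in proteome.values()]
    covered = [c for c in counts if c > 0]
    return {
        "peptides": sum(counts),
        "proteins_covered": len(covered),
        "fraction_covered": len(covered) / len(counts) if counts else 0.0,
        "median_peptides_per_protein": statistics.median(covered) if covered else 0,
    }


# Toy sequences (not real proteins) to check the bookkeeping.
toy = {"toy1": "AAAAAAKGGGGGGRLLLLLL", "toy2": "KKKK"}
trypsin = summarize(toy, PROTEASES["trypsin"])
lysc = summarize(toy, PROTEASES["LysC"])
print(lysc["peptides"] / trypsin["peptides"])  # 0.5
```

Applied to a human UniProt FASTA file, `summarize(read_fasta(open(path)), PROTEASES["LysC"])` would give proteome-scale counts (here `path` is a placeholder). The toy call only checks the arithmetic.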
Figure 1. In silico simulation of protease digestion reveals that monosubstrate proteases maintain high proteome coverage while yielding fewer peptides per protein than trypsin. A. Number of peptides (relative to trypsin) versus the total number of proteins yielding peptides that meet the criteria. Note that LysC (K) and ArgC (R) yield less than 50% as many peptides as trypsin while still covering at least 98% of the human proteome. B. Median number of peptides per protein, relative to trypsin, again demonstrating the potential for monosubstrate proteases to reduce protein oversampling compared with trypsin.

Cell Culture
The human cell line HeLa S3 (CCL-2.2) was purchased from ATCC and grown as adherent cultures in 10 cm plates. Cells were maintained in DMEM (Sigma-Aldrich) supplemented with 10% (v/v) fetal bovine serum, glutamine (2 mmol/L), penicillin (100 IU/mL), and streptomycin (100 IU/mL). HeLa cells were passaged every 2 days, typically when cell density reached approximately 10^6 cells/mL. Cells were harvested with TrypLE Express (Thermo Fisher Scientific) and gentle pipetting. Cell suspensions were then washed with cold phosphate-buffered saline and centrifuged at 300 × g for 4 min, and the supernatant was discarded. Pellets, each containing approximately 2 × 10^6 cells, were promptly frozen at −80 °C until cell lysis and digestion.
Sample Preparation for Mass Spectrometry
Identical sets of samples were processed through two independent workflows:
- Dilute-then-digest: samples were diluted before protease digestion.
- Digest-then-dilute (typical bulk digestion): samples were digested under optimal conditions, with respect to sample-to-protease ratio and reaction volume, and then diluted.
Lysis buffer (500 μL) was added to each cell pellet. It consisted of 50 mM triethylammonium bicarbonate (TEAB) (Thermo Scientific, 90114) and 0.1% n-dodecyl-β-maltoside (DDM) (Thermo Scientific, 89903). Each pellet was gently pipetted and then sonicated with a Branson 550 probe sonicator (five rounds of 3 s, 10 J pulses at 60% amplitude) to lyse the cells. For protein denaturation, samples were then heated for 1 h at 70 °C with the thermocycler's heated lid set to 105 °C. Finally, samples were centrifuged at 3000 rpm. The protein concentration of the lysate was determined using a Pierce BCA Protein Assay kit (Thermo Scientific, 23225).
For serial dilution of bulk-digested samples, samples were diluted with freshly prepared Solvent A containing 0.1% DDM. Solvent A comprised 97.8% water, 2% acetonitrile, and 0.2% formic acid (FA). For samples prepared using the dilute-then-digest method, serial dilutions were prepared in Solvent A with 0.1% DDM at protein concentrations of 100 ng/μL, 20 ng/μL, 2 ng/μL, and 200 pg/μL. A 100 μL volume of each dilution was then aliquoted into wells of a nonskirted 96-well PCR plate (Thermo Scientific, AB0600). The remaining 100 ng/μL sample was used to prepare the digest-then-dilute samples. It was aliquoted (300 μL) into Protein LoBind 1.5 mL tubes (Eppendorf, 022431081).
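The serial dilution above follows C1V1 = C2V2. The sketch below plans each step from the previous one. It assumes that each dilution was made from the preceding one, which the text does not state. It also uses a hypothetical 150 μL per tube, so that 100 μL remains for the plate aliquot after the next transfer is removed. The function name is illustrative.

```python
from typing import List, Tuple


def serial_dilution(start_conc: float, targets: List[float],
                    final_volume: float) -> List[Tuple[float, float, float, float]]:
    """Plan a serial dilution where each step is made from the previous one.

    Uses C1 * V1 = C2 * V2. Returns (source_conc, target_conc, transfer_volume,
    diluent_volume) per step; concentrations and volumes must each share one unit.
    """
    plan = []
    source = start_conc
    for target in targets:
        if not 0 < target <= source:
            raise ValueError("each target must be positive and not exceed its source")
        transfer = target * final_volume / source  # V1 = C2 * V2 / C1
        plan.append((source, target, transfer, final_volume - transfer))
        source = target
    return plan


# Concentrations from the text in ng/uL (200 pg/uL = 0.2 ng/uL); the 150 uL
# per-tube volume is a hypothetical choice that leaves >= 100 uL to aliquot.
for src, tgt, v1, diluent in serial_dilution(100.0, [20.0, 2.0, 0.2], 150.0):
    print(f"{src:g} -> {tgt:g} ng/uL: {v1:.1f} uL sample + {diluent:.1f} uL diluent")
```

The steps correspond to 5-fold, 10-fold, and 10-fold dilutions, so each intermediate tube gives up only a small transfer volume.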
Proteolytic digestions were carried out using Glu-C (Thermo Fisher Scientific), Lys-C (Wako Chemicals, Lysyl Endopeptidase), and trypsin (Thermo Fisher). For samples prepared using the dilute-then-digest method, 2 μL of enzyme was added to each sample, giving a final enzyme-to-substrate ratio of 1:10 at each protein concentration. For samples prepared using the digest-then-dilute method, 6 μL of each enzyme (500 ng/μL) was added to each aliquot, giving a 1:10 enzyme-to-substrate ratio. Both sets of samples were then incubated at 37 °C overnight.
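A quick arithmetic check on the stated 1:10 enzyme-to-substrate ratio follows. In the digest-then-dilute workflow, 6 μL × 500 ng/μL = 3 μg of enzyme acts on 300 μL × 100 ng/μL = 30 μg of protein. The text does not give the enzyme stock concentrations for the dilute-then-digest wells. The second part therefore back-calculates them, assuming each 2 μL addition went into a 100 μL well at the stated protein concentration. The function names are illustrative.

```python
def enzyme_to_substrate(enzyme_ul: float, enzyme_ng_per_ul: float,
                        substrate_ul: float, substrate_ng_per_ul: float) -> float:
    """Mass ratio of enzyme to substrate (0.1 corresponds to 1:10)."""
    return (enzyme_ul * enzyme_ng_per_ul) / (substrate_ul * substrate_ng_per_ul)


def required_enzyme_conc(ratio: float, substrate_ul: float,
                         substrate_ng_per_ul: float, enzyme_ul: float) -> float:
    """Enzyme stock concentration (ng/uL) needed to reach `ratio` with `enzyme_ul` added."""
    return ratio * substrate_ul * substrate_ng_per_ul / enzyme_ul


# Digest-then-dilute, values from the text: 6 uL of 500 ng/uL enzyme into
# 300 uL of 100 ng/uL lysate -> 3 ug enzyme : 30 ug protein.
print(enzyme_to_substrate(6, 500, 300, 100))  # 0.1, i.e. 1:10

# Dilute-then-digest (back-calculated, not stated in the text): 2 uL of enzyme
# into each 100 uL well needs a stock scaled to that well's protein concentration.
for conc in (100.0, 20.0, 2.0, 0.2):  # ng/uL
    print(conc, required_enzyme_conc(0.1, 100, conc, 2))
```

Under this assumption, the required stock spans about 500 ng/μL down to 1 ng/μL across the dilution series.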
Following digestion, samples were centrifuged at 1000 × g for 1 min, and digestion was quenched with 1 μL of Solvent A containing 4% FA. Peptide concentration was determined using a Pierce Quantitative Fluorometric Peptide Assay kit (Thermo Scientific, 23290). Digest-then-dilute samples were then serially diluted in Solvent A with 0.1% DDM to peptide concentrations of 100 ng/μL, 20 ng/μL, 2 ng/μL, and 200 pg/μL, and subsequently added to a 96-well plate.

Figure 2. Comparison of monosubstrate enzymes versus trypsin for digesting a 200 pg HeLa lysate (n = 3). Two μL of each sample was used for LC-MS/MS analysis as described in the Methods, with 200 pg representing a single-cell equivalent load (see also SI Figures S1 and S2). A. Average number of protein groups identified with high confidence (q < 0.01). B. Number of peptides identified. C. Consensus feature counts for trypsin (orange) and LysC (blue) that were either only extracted (light shading) or also identified (dark shading). Dotted and solid lines denote medians for extracted and identified features, respectively. D. UpSet plot indicating intersection sizes between the sets of proteins identified in samples digested with trypsin, LysC, and GluC. E. Quantitative rank plot of identified proteins. Proteins uniquely identified in samples prepared using trypsin or LysC are outlined in black in D and colored black in E.
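The UpSet plot in Figure 2D counts each protein once, under the exact combination of digests in which it was identified. A minimal Python sketch of that bookkeeping follows. The protein IDs are placeholders, not data from this study, and plotting is omitted. The combinations {trypsin} and {LysC} correspond to the proteins uniquely identified with a single enzyme.

```python
from collections import Counter
from typing import Dict, Set


def upset_counts(sets: Dict[str, Set[str]]) -> Counter:
    """Exclusive intersection sizes, as drawn in an UpSet plot.

    Each item is counted once, under the exact combination of sets that
    contain it, so the counts sum to the size of the union.
    """
    membership: Dict[str, Set[str]] = {}
    for set_name, items in sets.items():
        for item in items:
            membership.setdefault(item, set()).add(set_name)
    return Counter(frozenset(names) for names in membership.values())


# Hypothetical protein-group IDs, for illustration only.
identified = {
    "trypsin": {"prot_1", "prot_2", "prot_3"},
    "LysC": {"prot_2", "prot_3", "prot_4"},
    "GluC": {"prot_3", "prot_5"},
}
for combo, size in sorted(upset_counts(identified).items(), key=lambda kv: -kv[1]):
    print(" & ".join(sorted(combo)), size)
```

Using exclusive combinations avoids double counting. A plain pairwise intersection such as trypsin ∩ LysC would also include proteins seen with GluC.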
LC-MS/MS Analysis
Peptides were separated on an Aurora Ultimate UHPLC column (25 cm × 75 μm, 1.7 μm C18; AUR3-25075C18, IonOpticks) maintained at 50 °C. To optimize system sensitivity, peptides were introduced directly onto the analytical column without a trapping column. All gradients used a flow rate of 0.22 μL/min. For the digestion-method comparisons and dilution series, samples were run using a 50 min gradient (including washing) unless otherwise noted.
The LC system (Vanquish Neo UHPLC, Thermo Scientific) was coupled to an Orbitrap Exploris 480 mass spectrometer (Thermo Scientific) with a Nanospray Flex ion source (Thermo Scientific). Data-dependent acquisition (DDA) was carried out in positive ion mode with a positive ion voltage of 1600 V and the ion transfer tube maintained at 300 °C. MS1 scans were acquired over an m/z range of 375-1200 at a resolution of 60 000, with a cycle time of 3 s. The maximum injection time was set to auto, and the normalized AGC target was set to 300%. Precursor ions with charges from +2 to +6 were selected for fragmentation using a minimum intensity threshold of 5 × 10^3. Dynamic exclusion was set to exclude precursors after one acquisition, with a 45 s exclusion duration and 10 ppm mass tolerance.
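The dynamic exclusion settings above can be illustrated with a toy model: exclude after one acquisition, 45 s duration, 10 ppm tolerance. This sketch is an assumption about the general behavior, not the instrument's actual implementation. In particular, anchoring the ppm window on the excluded m/z is a simplification, and the class name is hypothetical. At m/z 650.325, a 10 ppm tolerance corresponds to about ±6.5 mDa.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DynamicExclusion:
    """Toy dynamic-exclusion list (illustrative only, not vendor logic).

    After one MS2 acquisition, a precursor m/z is excluded for `duration_s`
    seconds; later precursors within +/- `tolerance_ppm` of it are skipped.
    """
    duration_s: float = 45.0
    tolerance_ppm: float = 10.0
    _entries: List[Tuple[float, float]] = field(default_factory=list, init=False)

    def record_acquisition(self, mz: float, t: float) -> None:
        # "Exclude after one acquisition": a single MS2 event starts the timer.
        self._entries.append((mz, t + self.duration_s))

    def is_excluded(self, mz: float, t: float) -> bool:
        # Forget entries whose exclusion window has ended (assumes t never decreases).
        self._entries = [(m, end) for m, end in self._entries if end > t]
        return any(abs(mz - m) <= m * self.tolerance_ppm * 1e-6
                   for m, _ in self._entries)


dex = DynamicExclusion()
dex.record_acquisition(650.3250, t=10.0)
print(dex.is_excluded(650.3300, t=20.0))  # True: 5 mDa apart, window is ~6.5 mDa
print(dex.is_excluded(650.3400, t=20.0))  # False: 15 mDa apart
print(dex.is_excluded(650.3250, t=56.0))  # False: exclusion ended at t = 55 s
```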
MS2 scans were acquired in the Orbitrap at 60 000 resolution with an isolation window of 1.6 m/z, HCD collision energy set
Figure 3. Bottom-up proteomics analysis of HeLa cell lysates (200 pg sample load) digested with three different enzymes (trypsin, LysC, GluC) and analyzed with five different LC gradient run times (10, 14, 20, 30, and 50 min; n = 3; see also SI Table S1 and Figures S4-S6). A. Number of quantified proteins (q < 0.01) and B. peptides (q < 0.01) across LC gradient run times. C. CV distributions and D. proportion of quantified peptides. E. Quantitative rank plots for the same sample runs. Proteins identified exclusively with the shortest (10 min) gradient are shown in red.
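Figure 3C summarizes quantitative reproducibility as CV distributions. This section does not specify how the CVs were computed. The sketch below therefore assumes the common definition: sample standard deviation divided by the mean, on linear-scale intensities. It keeps only proteins quantified in all n = 3 replicates. The intensities are invented for illustration, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def percent_cv(values: List[float]) -> float:
    """Coefficient of variation (%) = 100 * sample standard deviation / mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; CV undefined")
    return 100.0 * statistics.stdev(values) / mean


def cv_distribution(quant: Dict[str, List[Optional[float]]],
                    min_reps: int = 3) -> Dict[str, float]:
    """Per-protein CVs, keeping proteins quantified in >= min_reps replicates."""
    cvs = {}
    for protein, reps in quant.items():
        observed = [v for v in reps if v is not None]  # None marks a missing value
        if len(observed) >= min_reps:
            cvs[protein] = percent_cv(observed)
    return cvs


# Hypothetical linear-scale intensities for n = 3 injections.
quant = {
    "protein_a": [1.0e6, 1.1e6, 0.9e6],
    "protein_b": [2.0e5, None, 2.4e5],  # dropped: only two replicates
    "protein_c": [5.0e4, 6.0e4, 4.0e4],
}
cvs = cv_distribution(quant)
print(cvs, statistics.median(list(cvs.values())))
```

A per-run distribution of these values, for example across gradient lengths, is what a CV panel like Figure 3C would typically summarize.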