Synthetic recording and in situ readout of lineage information in single cells
Kirsten L. Frieda (1,*), James M. Linton (1,*), Sahand Hormoz (1,*), Joonhyuk Choi (2), Ke-Huan K. Chow (1), Zakary S. Singer (1), Mark W. Budde (1), Michael B. Elowitz (1,3,§), and Long Cai (2,§)
1 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA.
2 Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA.
3 Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California 91125, USA.
Abstract
Reconstructing the lineage relationships and dynamic event histories of individual cells within
their native spatial context is a long-standing challenge in biology. Many biological processes of
interest occur in optically opaque or physically inaccessible contexts, necessitating approaches
other than direct imaging. Here we describe a synthetic system that enables cells to record lineage
information and event histories in the genome in a format that can be subsequently read out of
single cells in situ. This system, termed memory by engineered mutagenesis with optical in situ
readout (MEMOIR), is based on a set of barcoded recording elements termed scratchpads. The
state of a given scratchpad can be irreversibly altered by CRISPR/Cas9-based targeted
mutagenesis, and later read out in single cells through multiplexed single-molecule RNA
fluorescence hybridization (smFISH). Using MEMOIR as a proof of principle, we engineered
mouse embryonic stem cells to contain multiple scratchpads and other recording components. In
these cells, scratchpads were altered in a progressive and stochastic fashion as the cells
proliferated. Analysis of the final states of scratchpads in single cells in situ enabled reconstruction
of lineage information from cell colonies. Combining analysis of endogenous gene expression
with lineage reconstruction in the same cells further allowed inference of the dynamic rates at
which embryonic stem cells switch between two gene expression states. Finally, using simulations,
we show how parallel MEMOIR systems operating in the same cell could enable recording and
readout of dynamic cellular event histories. MEMOIR thus provides a versatile platform for
information recording and in situ, single-cell readout across diverse biological systems.
Reprints and permissions information is available at www.nature.com/reprints.
Correspondence and requests for materials should be addressed to L.C. (lcai@caltech.edu) or M.B.E. (melowitz@caltech.edu).
* These authors contributed equally to this work.
§ These authors jointly supervised this work.
Author Contributions
K.L.F. and J.M.L. performed the experiments with assistance from S.H., J.C., K.K.C. and Z.S.S.; K.L.F. and
S.H. analysed the data; S.H. performed the simulations; M.B.E. and L.C. supervised the project. All authors wrote the manuscript.
The authors declare competing financial interests: details are available in the online version of the paper.
Supplementary Information is available in the online version of the paper.
HHS Public Access
Author manuscript
Nature. Author manuscript; available in PMC 2019 April 29.
Published in final edited form as:
Nature. 2017 January 05; 541(7635): 107–111. doi:10.1038/nature20777.
Somatic mutations occur stochastically and independently in different cells, and are
inherited from one cell generation to the next. They can therefore leave a record of lineage
relationships, or other information, in the genomes of related cells. Pioneering work showed
that sequencing can be used to identify somatic mutations and thereby recover lineage
information (refs 1–6). However, sequencing has generally required disrupting the spatial context of
cells, and somatic mutations are distributed throughout the genome, hindering their
identification and analysis. Two recent advances together enable an alternative approach.
First, CRISPR/Cas9 (refs 7–9) can target mutagenesis to specific genomic elements,
facilitating the continuous and controlled generation of stochastic genetic variation at
designated genomic regions. Second, in situ single cell analysis by sequential smFISH (seqFISH;
refs 10, 11) allows genetic information to be directly interrogated in a highly multiplexed
fashion in individual cells within native tissue. Together, these techniques could in principle
permit recording and in situ readout of genetic changes at specific loci for lineage
reconstruction and event recording.
To implement such a system, we devised a bipartite genetic recording element termed the
‘barcoded scratchpad’. The state of this scratchpad can be stochastically altered in live cells
and read out in situ in single cells by smFISH (Fig. 1a, Extended Data Fig. 1a). The
scratchpad element consists of 10 repeat units (ref. 12). gRNA targeting of Cas9 to the scratchpad
generates double-strand breaks that result in its deletion, or ‘collapse’ (Fig. 1a, b). Adjacent
to each scratchpad, we incorporated a co-transcribed barcode (Supplementary Table 1). The
barcode and scratchpad components can each be identified using specific sets of smFISH
probes (Supplementary Table 2), and thus serve as an addressable ‘bit’.
Using a pool of such barcoded scratchpads enables lineage recording and readout through a
two-step process. During cell proliferation, Cas9 generates gradual and stochastic
accumulation of collapsed scratchpads in each cell lineage. Subsequently, cells can be fixed
and analysed by seqFISH to identify barcodes and assess their states based on the presence
or absence of a co-localized scratchpad signal (Fig. 1c).
To implement the MEMOIR system, we engineered a stable mouse embryonic stem (ES)
cell line, designated MEM-01, incorporating barcoded scratchpads, Cas9, and a
scratchpad-targeting gRNA (Fig. 1b). First, we used PiggyBac transposition (ref. 13) to integrate
a set of 28 barcoded scratchpad elements into the genome. We identified a clone in which 13 different
barcodes were highly expressed (Extended Data Fig. 1b–d). Within this line, we stably
integrated a Cas9 variant containing an inducible degron to allow external modulation of
Cas9 activity (ref. 14). Finally, we engineered a scratchpad-targeting gRNA expressed from a
Wnt-regulated promoter (ref. 15; Methods), to enable both external control and recording of Wnt
pathway activity.
Using this cell line, we verified that smFISH could detect scratchpad collapse. After 48 h of
Cas9 and gRNA induction, we observed a substantial loss of scratchpad smFISH signal, but
not barcode signal (Fig. 2a, b, Extended Data Fig. 2). By contrast, in cells in which
MEMOIR recording was not induced, co-localization between barcode and scratchpad
signals was observed in approximately 90% of the transcripts, consistent with expected
smFISH accuracies (refs 16, 17; Fig. 2b, c). Although individual barcoded scratchpad transcripts
appeared either collapsed or uncollapsed based on co-localization, cells typically exhibited a
mixture of collapsed and uncollapsed scratchpads with the same barcode owing to the
existence of multiple genomic integrations undergoing independent collapse events
(Extended Data Fig. 1b). Together, these results indicate that scratchpad states can be altered
and that the fraction of collapsed scratchpads for each barcode can be subsequently read out
in situ.
The fraction of collapsed scratchpads increased progressively over time after Cas9 and
gRNA induction, as required for MEMOIR operation. We observed an approximately 27%
decrease in mean co-localization fraction after 48 h of Cas9 and gRNA induction (Fig. 2b,
c). Additionally, the collapse rate correlated with the level of gRNA expression, suggesting
that collapse rates are tuneable (Extended Data Fig. 2d). By contrast, in the absence of
induction, scratchpad states remained stable (Extended Data Fig. 2e–g). Further, a Cre-
activated gRNA functioned similarly to the Wnt-activated gRNA (Extended Data Fig. 3a–d),
and scratchpad collapse also occurred in CHO-K1 cells and budding yeast (Extended Data
Fig. 3e, f), suggesting that the system design can be generalized to other methods of
activation and to other species. Finally, we verified that seqFISH could enable readout of 13
distinct barcoded scratchpads in single cells using 7 rounds of hybridization (Fig. 2d, e;
Methods).
To analyse cell lineage, we activated MEMOIR and allowed cells to grow for 3 or 4
generations, while performing time-lapse imaging to establish an independent ‘ground truth’
lineage for later validation (Fig. 3a). We then fixed the cells and analysed their barcoded
scratchpads by seqFISH (Fig. 3b). Altogether, we analysed 108 colonies, including 836
cells.
Inspection of scratchpad collapse patterns revealed lineage information. For example, in one
colony, barcode 9 was differentially collapsed between two 4-cell clades, consistent with a
collapse event occurring after the first cell division (Fig. 3c, left). Similarly, barcode 2
revealed distinct collapse frequencies between first cousins, but similar frequencies between
sister cell pairs (Fig. 3c, middle). Barcode 10 provided additional lineage information, as
different sister cell pairs showed collapse frequencies that were similar to each other but
different from their cousins (Fig. 3c, right). These examples, along with others (Extended
Data Figs 4 and 5), show how scratchpad collapse patterns can provide insight into lineage
relationships.
To analyse lineage reconstruction more systematically, we tabulated scratchpad collapse
frequencies for all probed barcodes in each colony (Fig. 3d) and used these data to calculate
a cell-to-cell ‘distance’ matrix, representing differences in collapse patterns between each
pair of cells (Fig. 3e; Supplementary Information). We then applied a binary hierarchical
clustering algorithm adapted from phylogenetic analysis to these distance scores in order to
reconstruct a lineage tree (refs 18, 19; Fig. 3f; see Methods). Finally, as validation, we compared
each reconstructed tree to the actual colony lineage obtained directly from the corresponding
time-lapse video (Fig. 3a).
Across all 108 colonies, we observed a broad distribution of reconstruction fidelity (Fig. 3g,
all colonies). However, using a bootstrap procedure to rank colonies based on the robustness
of reconstruction to resampling of the underlying data, it was possible to identify colonies
with more informative scratchpad collapse patterns, and these tended to reconstruct with
higher accuracy (Extended Data Fig. 6; Methods). For example, within the top 20% of
colonies ranked by bootstrap, 72% of lineage relationships were correctly reconstructed
(Fig. 3g, subset 1 and Extended Data Fig. 6a).
To compare these results to theoretical expectations, we simulated idealized MEMOIR
operation in three-generation binary trees (Methods). As expected, mean reconstruction
fidelity increased with the number of distinct scratchpads and required relatively few
scratchpads to reach high fidelity. For example, fidelity was 81% (+19/−29; mean and 68% central
confidence interval) for 10 scratchpads and 93% (+7/−8) for 20 scratchpads at the experimentally
measured collapse rate of approximately 0.1 per scratchpad per cell generation (Fig. 3h).
With around eight scratchpads, the performance of these idealized simulations matched that
of the bootstrap selected colonies (Fig. 3g, Extended Data Fig. 6b, subset 1), consistent with
the majority of the 13 barcoded scratchpads targeted by seqFISH providing useful
information. The diversity of states generated corresponds to approximately 2^8 = 256
scratchpad configurations, comparable to the number of distinguishable alleles observed by
sequencing-based approaches (ref. 20).
The current implementation of MEMOIR exhibited limited reconstruction depth and
accuracy. To understand the relevant sources of error, we performed more detailed
simulations, incorporating empirical measurements of noise in both recording (Cas9 and
gRNA expression) and readout (for example, scratchpad expression and smFISH detection)
(Extended Data Fig. 7). Notably, stochasticity in Cas9 and gRNA expression, as well as
smFISH detection, contributed relatively minor errors in reconstruction. Rather, for a given
number of scratchpads, the primary sources of error in reconstruction were stochastic
fluctuations in scratchpad expression, and ambiguities introduced due to multiple
incorporations of the same barcoded scratchpad (Fig. 3i, Extended Data Fig. 7;
Supplementary Information). On the basis of this analysis, future versions of MEMOIR can
be improved by increasing the number of unique scratchpad variants and reducing noise in
their expression (see Supplementary Information for further discussion of potential
improvements). These improvements should enable MEMOIR to reconstruct deeper and/or
more sparsely sampled trees (Extended Data Figs 8 and 9).
Because MEMOIR is compatible with same-cell measurements of endogenous gene
expression through additional rounds of smFISH, it can provide both lineage and endpoint
cell state information for the same colony. This combination can provide insight into the
dynamics of switching between gene expression states (Fig. 4a). For example, ES cells
stochastically transition among states with distinct expression levels of the pluripotency
regulator Esrrb (refs 21, 22). To infer the rates of these transitions, we measured Esrrb expression,
and assigned each cell a probability of being in a high or low Esrrb expression state
(refs 23, 24; Fig. 4b; Supplementary Information). Using the MEMOIR-inferred lineage, we found that
sisters or first cousins were significantly more likely to appear in the same Esrrb expression
state compared with pairs of second cousins (P < 0.004) (Fig. 4c, d). Using a dynamic
inference framework (refs 24, 25), we further inferred the quantitative rates of switching between
states (Fig. 4d, right panel, and Extended Data Fig. 10; Supplementary Information), and
verified that they were consistent with direct measurements of switching dynamics (ref. 23). Going
forward, multiplexed in situ transcriptional profiling of endogenous genes (refs 10, 11), together with
MEMOIR, should enable analysis of more complex dynamic cell state transition processes.
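To illustrate why endpoint states on a known lineage constrain switching rates, the following minimal sketch computes the probability that two related cells share an expression state. It is not the inference framework of refs 24, 25. It assumes a stationary two-state Markov chain with hypothetical per-generation switching probabilities (0.1 in the example), evolving independently along each branch after the common ancestor.

```python
import numpy as np

def same_state_probability(p_low_to_high, p_high_to_low, generations_to_ancestor):
    """Probability that two related cells are in the same state at the endpoint.

    Illustrative assumption: a stationary two-state Markov chain with
    per-generation switching probabilities, independent along each branch.
    """
    a, b = p_low_to_high, p_high_to_low
    transition = np.array([[1 - a, a], [b, 1 - b]])
    stationary = np.array([b, a]) / (a + b)
    # A two-state chain is reversible, so the joint distribution of two cells
    # g generations below their common ancestor is pi_i * (P^(2g))_ij.
    path = np.linalg.matrix_power(transition, 2 * generations_to_ancestor)
    return float(np.sum(stationary * np.diag(path)))

# Sisters, first cousins and second cousins are 1, 2 and 3 generations
# below their most recent common ancestor, respectively.
for g, label in [(1, "sisters"), (2, "first cousins"), (3, "second cousins")]:
    print(label, round(same_state_probability(0.1, 0.1, g), 3))
```

Under these assumptions the probability of sharing a state decays with relatedness at a rate set by the switching probabilities, which is the kind of signal a lineage-aware inference can exploit.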
The design of MEMOIR provides a platform that can record and read out histories of
dynamic cellular events beyond lineage information (Fig. 4e, f). Specifically, orthogonal
gRNAs expressed from signal-specific promoters can in principle record multiple
intracellular signals onto distinct sets of scratchpads. We simulated binary trees of six
generations in which different cell lineages experienced distinct time courses of two input
signals (Fig. 4g). In these simulations, one gRNA variant was constitutively expressed solely
to enable lineage reconstruction using one set of scratchpads. In addition, each of the signals
activated expression of a corresponding gRNA variant, generating collapse events in its own
specific set of 50 scratchpads, at a rate proportional to the signal magnitude. By analysing
endpoint scratchpad collapse patterns for all three sets of scratchpads, we were able to
reconstruct both lineage trees and event histories (Fig. 4e–g; Methods). This reconstruction
process takes advantage of the reconstructed lineage tree to map the most likely assignment
of collapse events from the signal-recording gRNAs to specific positions on the lineage tree,
with a maximum possible time resolution of one cell cycle (since the sequence of collapse
events within a cell cycle cannot be distinguished). Thus, over timescales of multiple cell
cycles, MEMOIR should enable analysis of the sequence, duration, and magnitude of signals
along individual cell lineages (Fig. 4g).
Using genomic DNA as a writable and readable recording medium within living cells is a
long-standing goal of synthetic biology (refs 26–30). A key application for this technology is to
enable analysis of lineage and molecular event histories that unfold in complex and optically
inaccessible developmental systems over timescales of multiple cell generations. MEMOIR
provides a proof of principle, showing recording and readout of such information with
endpoint single-cell in situ measurements. Importantly, the capacity of MEMOIR can be
extended beyond the current demonstration using more scratchpads with improved designs
and highly multiplexed seqFISH (refs 10, 11). Thus, we anticipate this approach will open up new
ways of studying developmental trajectories in developing embryos, tumours, and other
systems, eventually enabling us to read, within their native spatial contexts, each cell’s own
individual ‘memoir’.
METHODS
Data reporting.
No statistical methods were used to predetermine sample size. The experiments were not
randomized and the investigators were not blinded to allocation during experiments and outcome
assessment.
MEMOIR component construction.
The scratchpad transposon was constructed from a ten-repeat array (20X PP7 stem loops)
derived from plasmid pCR4–24XPP7SL (ref. 12) and ligated directionally using BamHI and BglII
sites into a modified form of the PiggyBac (PB) vector PB510B (SBI) lacking the 3′
insulator and including a multiple cloning site (MCS). The CMV promoter was then
removed using NheI and SpeI and replaced by a PGK promoter with Gibson assembly. A
gBlock (IDT) containing the AvrII and XhoI restriction sites, priming sequences, and the
BGH polyA was then introduced 3′ of the PP7 array by Gibson assembly using the EagI
site in the backbone. Unique barcodes were then inserted into the transposon in the region 3′
of the scratchpad array either by Gibson assembly or directed ligation using AvrII and XhoI.
A total of 28 unique barcode sequences (Supplementary Table 1, GenScript Biotech) derived
from Saccharomyces cerevisiae were used to generate the barcoded scratchpads. Scratchpad
transposons were found to produce transcripts with half-lives of approximately 2 h
(Extended Data Fig. 1e–g).
The Cas9 construct was made using hSpCas9 from pX330 (ref. 7). First, the FKBP degron (DD)
was PCR-amplified from pBMN FKBP(DD)-YFP (ref. 14) and introduced with Gibson assembly
into pX330 restricted with AgeI, 5′ of the open reading frame of hSpCas9, to create
pX330-DD-hSpCas9. DD-hSpCas9 was amplified from this plasmid by PCR and introduced into
another plasmid, 3′ of a PGK promoter using Gibson assembly. After sequence verification,
the PGK-DD-hSpCas9 construct was excised using restriction enzymes (AvrII and SacII),
blunted with T4 polymerase, and ligated into a modified form of the PiggyBac vector
PB510B (SBI) lacking the CMV promoter and including a MCS. A non-transposon version
of Cas9 was also created using hSpCas9 amplified from pX330 and introduced with Gibson
assembly at the 3′ end of a CMV promoter containing two Tet operator sites into a standard
plasmid backbone.
The Wnt-pathway-responsive gRNA expression transposon was created using a LEF-1
response element (ref. 15). The enhancer and promoter combination exhibited low basal activity,
large dynamic range, and responsiveness to the GSK3 inhibitor CHIR99021 and the Wnt3a
ligand. This Wnt sensor was cloned upstream of a nuclear localization signal (NLS)-tagged
mTurquoise2, which contained an embedded gRNA and served as a reporter of guide
expression. The gRNA was flanked by self-cleaving ribozymes to excise it from the mRNA
(refs 31, 32), and was purchased as a gBlock (IDT) and inserted using Gibson assembly between
the end of the mTurquoise2 coding sequence and an SV40 polyA. This construct was contained in a
modified form of the PiggyBac vector PB510B.
The Cre-activated gRNA expression transposon was created using the U6 TATA-lox
promoter design (ref. 33), as illustrated (Extended Data Fig. 3a). The promoter, shRNA against
mTurquoise2, and gRNA regions were purchased as gBlocks or oligos (IDT) and inserted
into a modified form of the PiggyBac vector PB510B containing PGK-H2B-mTurquoise2.
Cell line engineering and culture conditions.
To create MEM-01 we co-transfected the E14 mouse embryonic stem cell line (ATCC cat
no. CRL-1821) with expression plasmids for hSpCas9 and the Tet repressor and then
selected on neomycin. A single Cas9-positive clone was then used for co-transfection of 28
PB transposon barcoded scratchpads and a PB transposon PGK-palmitoylatedmTurquoise2/HygroR
to facilitate segmentation of cell membranes and selection on hygromycin.
Subsequent scratchpad-containing clones were inspected for overall scratchpad expression
by smFISH. Scratchpad clones were also assessed for Cas9 expression, which was found to
be very low and heterogeneous in most clones, with no expression in many cells (for
example, 6 ± 21 transcripts per cell). A scratchpad clone with good scratchpad expression
was then simultaneously transfected with the DD-hSpCas9 PB transposon (to improve Cas9
expression (26 ± 17 transcripts per cell)) and the Wnt-activated gRNA expression PB
transposon. Cells were selected on blasticidin. Single clones were assessed for activation
potential on the basis of mTurquoise2 expression in response to CHIR99021 (Stemgent) or
Wnt3a (1324-WN-002 R&D systems), and enhanced Cas9 expression was measured by
smFISH. Among these clones was MEM-01, which demonstrated good gRNA activation in
response to Wnt3a and increased Cas9 activity in the presence of the stabilizing agent,
Shield1 (Clontech) (Extended Data Fig. 2c). MEM-01 resembled the parental E14 line in
terms of cell morphology, cycle times, and expression of pluripotency markers including
Esrrb, Nanog, and SSEA-1. Stably selected MEMOIR lines containing a Cre-activated
gRNA were similarly engineered (Extended Data Fig. 3a–d).
The transfections described above were carried out using Fugene HD (Promega) at a mass
(μg) DNA/volume (μl) Fugene ratio of 1:3 and following the manufacturer’s instructions.
For transfection of the PB components a total DNA mass of 1 μg was used at a ratio of 6:1,
PB transposons to PB transposase PB200PA-1 (SBI). For selection with antibiotics,
transfected cells were lifted with Accutase (ThermoFisher) after transfection media was
removed and plated on 100-mm plates (Nunc). After 24 h, growth media was replaced with
selection media. Single colonies were lifted from selection plates as they matured.
During standard cell culturing, ES cells were maintained at 37 °C and 5% CO2 in GMEM
(Sigma), 15% ES cell qualified fetal bovine serum (FBS) (Gibco/ThermoFisher), PSG (2
mM L-glutamine, 100 units per ml penicillin, 100 μg ml⁻¹ streptomycin) (ThermoFisher), 1
mM sodium pyruvate (ThermoFisher), 1,000 units per ml Leukaemia Inhibitory Factor (LIF,
Millipore), 1× Minimum Essential Medium Non-Essential Amino Acids (MEM NEAA,
ThermoFisher) and 50–100 μM β-mercaptoethanol (Gibco/ThermoFisher). Cells were
maintained on polystyrene (Falcon) coated with 0.1% gelatin (Sigma).
Quantitative PCR.
For detection of genomic barcode copy number, genomic DNA was prepared from cells
using the DNeasy Blood and Tissue kit (Qiagen). DNA was quantified on a NanoDrop 8000
spectrophotometer (ThermoScientific). Reactions were assembled as above with around
1,000–5,000 haploid genome copies, based on 3 picograms per haploid genome
approximation. For gene expression analysis, total RNA was prepared using the RNeasy
Mini kit (Qiagen). One microgram of total RNA was used with the iScript cDNA synthesis
kit (BioRad) following the manufacturer’s instructions. For qPCR a 1:20 dilution of the
cDNA was used in each reaction. All reactions were performed with IQ SYBR Green
Supermix (BioRad). Reaction cycling was carried out on a BioRad CFX96 thermocycler.
Both genomic DNA and cDNA samples were compared against Sdha copy number or
expression level, respectively. Analyses included at least three biological replicates with
each reaction run in triplicate, unless otherwise noted. Primer sets for all barcodes and
normalizers were obtained from IDT, and the efficiencies of all primer pairs were tested.
Time-lapse videos and cell culture for imaging.
Tissue culture grade glass bottom 24-well plates (MatTek) were treated with laminin-511
(20 μg ml⁻¹) (Biolamina) for 4 h at 37 °C and plated with cells at approximately 2,500 cells
per cm². Cells were exposed to Wnt3a (50–100 ng ml⁻¹) and Shield1 (50–100 nM) at the
time of plating. After approximately 16 h, cells were selected for time-lapse imaging based
on system activation, assessed by visible mTurquoise2 signal, and then imaged in an
incubated microscope environment every 14 min over 20–40 h before being immediately
fixed. Samples were fixed with 4% formaldehyde in PBS for 5 min. Samples cultured for
smFISH imaging, but without time-lapse video tracking, were prepared similarly (typically
with a higher plated cell density) and activated for different lengths of time, as stated.
Single molecule fluorescence in situ hybridization (smFISH).
Hybridization and imaging were carried out as previously described (ref. 23) with the following
exceptions: scratchpad transcripts were targeted with 40 DNA oligo 20mer probes and
barcode regions were targeted with 18 20mer probes (Supplementary Table 2). Probes were
coupled to one of three dyes (Alexa 555, 594 or 647 (ThermoFisher)) and used at
approximately 130 nM concentration per probe set. Post-hybridization, cells were washed in
20% formamide in 2× SSC containing DAPI at 30 °C for 30 min, rinsed in 2× SSC at room
temperature, and imaged in 2× SSC. For seqFISH, after imaging each round of
hybridization, 2× SSC was replaced with wash buffer for about 5 min at room temperature
and then replaced with the next probe set in hybridization buffer for overnight incubation.
Most barcode signals from the previous hybridization were no longer visible during imaging
of the following hybridization (owing to photobleaching and probe loss facilitated by the
small number of barcode probes (18) used per barcode); any remaining visible transcripts
were computationally subtracted during analysis. Incubation, washing, and imaging
proceeded as above for up to nine rounds of hybridization.
For analysis of smFISH images, semi-automated cell segmentation and dot detection were
performed using custom Matlab software. Raw images were processed by a Laplacian of the
Gaussian filter and then thresholded to select dots. Co-localization between dots in the
scratchpad image and barcode image was detected if both dots were above the threshold and
within a few pixels of each other. To generate the histogram of intensities for the collapsed
and uncollapsed scratchpads in Fig. 2b, we integrated the fluorescence intensities in the
regions of the scratchpad smFISH image that corresponded to individual barcode dots or the
detected scratchpad dots, respectively. For the collapse rate experiment in Fig. 2c and
Extended Data Fig. 3c, we measured the aggregate smFISH scratchpad co-localization levels
for four highly expressed barcodes in cells that had been induced for different lengths of
time. For activating conditions shown in Fig. 2b, c, only data from cells that were actually
activated (as assessed by mTurquoise2 expression) were included.
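The co-localization step described above can be sketched as follows. This is a minimal illustration in Python rather than the custom Matlab software used in the study; the function name, the input format (dot coordinates that have already passed thresholding) and the 3-pixel radius standing in for "a few pixels" are all assumptions.

```python
import numpy as np

def colocalization_fraction(barcode_dots, scratchpad_dots, max_distance_px=3.0):
    """Fraction of barcode dots with a scratchpad dot within max_distance_px.

    Both inputs are (n, 2) arrays of dot coordinates in pixels that have
    already passed the intensity threshold. The 3-pixel default is an
    assumed placeholder for "a few pixels".
    """
    barcode_dots = np.asarray(barcode_dots, dtype=float).reshape(-1, 2)
    scratchpad_dots = np.asarray(scratchpad_dots, dtype=float).reshape(-1, 2)
    if len(barcode_dots) == 0:
        return float("nan")  # no transcripts detected for this barcode
    if len(scratchpad_dots) == 0:
        return 0.0
    # Pairwise Euclidean distances between every barcode and scratchpad dot.
    deltas = barcode_dots[:, None, :] - scratchpad_dots[None, :, :]
    distances = np.linalg.norm(deltas, axis=2)
    colocalized = distances.min(axis=1) <= max_distance_px
    return float(colocalized.mean())

# Toy example: two of three barcode dots have a nearby scratchpad dot.
barcodes = [(10, 10), (40, 42), (80, 15)]
scratchpads = [(11, 10), (41, 41)]
print(colocalization_fraction(barcodes, scratchpads))
```

A barcode dot without a nearby scratchpad dot is interpreted as a collapsed scratchpad, so one minus this fraction approximates the collapsed fraction for that barcode in that cell.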
Lineage reconstruction of experimental data.
Cell-to-cell barcode distance scores were determined for each pair of cells based on the
similarity of the two cells’ co-localization fractions for each barcode and weighted by the
barcode’s transcript number (as a measure of confidence in the observation). See
Supplementary Information for details.
Lineage trees were reconstructed from the cell-to-cell barcode distance matrices using a
modified version of a standard agglomerative hierarchical clustering algorithm (ref. 34).
Reconstructions were constrained to binary trees such that cells were paired into sisters
before first cousin pairs were assigned. Pairing proceeded by successively grouping pairs of
cells or cell clusters with the minimum barcode distance. At each step, if the two best
(that is, minimum-distance) pairings were close in distance, the algorithm optimized
for the lowest combined distance of the current and next minimum distances. The distance
between two clusters was computed using the standard UPGMA algorithm (ref. 19) by averaging
the cell-to-cell barcode distance between all possible pairs of cells across the two clusters.
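A simplified sketch of this constrained clustering is given below. It is written in Python for illustration only: the distance score is an assumed stand-in (a weighted mean absolute difference of co-localization fractions; the actual definition is in the Supplementary Information), the function names are hypothetical, and the look-ahead step for near-tied pairings is omitted.

```python
import itertools
import numpy as np

def barcode_distance(frac_a, frac_b, weights):
    """Weighted mean absolute difference in co-localization fractions.

    Assumed stand-in for the published distance score; weights play the
    role of per-barcode transcript numbers.
    """
    frac_a, frac_b, weights = map(np.asarray, (frac_a, frac_b, weights))
    return float(np.sum(weights * np.abs(frac_a - frac_b)) / np.sum(weights))

def reconstruct_binary_tree(distance):
    """Pair cells into sisters, then sister pairs into cousins, and so on.

    distance is a symmetric (n, n) array. Cluster-to-cluster distances use
    UPGMA-style averaging over all cross-cluster cell pairs. Returns the
    tree as a nested tuple of cell indices.
    """
    distance = np.asarray(distance, dtype=float)
    clusters = [((i,), i) for i in range(len(distance))]  # (members, subtree)
    while len(clusters) > 1:
        unpaired, merged = list(clusters), []
        while len(unpaired) > 1:
            # Greedily join the closest remaining pair at this tree level.
            x, y = min(
                itertools.combinations(unpaired, 2),
                key=lambda pair: distance[np.ix_(pair[0][0], pair[1][0])].mean(),
            )
            unpaired.remove(x)
            unpaired.remove(y)
            merged.append((x[0] + y[0], (x[1], y[1])))
        clusters = merged + unpaired
    return clusters[0][1]

# Toy colony: 4 cells x 2 barcodes (co-localization fractions).
fractions = np.array([[1.0, 0.2], [0.9, 0.2], [0.1, 0.8], [0.1, 0.9]])
weights = np.array([10.0, 5.0])
matrix = np.array([[barcode_distance(a, b, weights) for b in fractions]
                   for a in fractions])
print(reconstruct_binary_tree(matrix))
```

Completing each level of pairing before moving to the next enforces the constraint that sisters are assigned before first cousins, at the cost of committing to early greedy choices.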
Bootstrap to identify robust reconstructions.
For each colony, the barcoded scratchpad data were resampled by bootstrap and
corresponding lineage trees were reconstructed (n = 1,000 resampled reconstructions per
colony). On the basis of the frequency at which the original cousin clades occurred in the
resampled reconstructed trees, a robustness score was assigned to each colony. Colonies
whose clade reconstructions were less sensitive to resampling showed significantly
improved overall reconstruction accuracy. Subsets of colonies with more reliable
reconstructions could thus be selected without prior knowledge of their accuracy by
selecting colonies with higher robustness scores, for example, scores in the top 20–40% of
the data.
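The bootstrap scoring can be sketched as below, under several assumptions that are not taken from the source: barcodes (table columns) are the resampled unit, the distance is an unweighted mean absolute difference of co-localization fractions, reconstruction uses a simple greedy pairing, and all function names are hypothetical.

```python
import itertools
import numpy as np

def pair_level(distance, groups):
    """Greedily pair groups of cells by minimum average cross-group distance."""
    groups, paired = list(groups), []
    while len(groups) > 1:
        x, y = min(itertools.combinations(groups, 2),
                   key=lambda p: distance[np.ix_(p[0], p[1])].mean())
        groups.remove(x)
        groups.remove(y)
        paired.append(tuple(sorted(x + y)))
    return paired + groups

def cousin_clades(fractions):
    """Clades obtained after two rounds of pairing (cells x barcodes table)."""
    fractions = np.asarray(fractions, dtype=float)
    distance = np.abs(fractions[:, None, :] - fractions[None, :, :]).mean(axis=2)
    sisters = pair_level(distance, [(i,) for i in range(len(fractions))])
    return set(pair_level(distance, sisters))

def robustness_score(fractions, n_resamples=1000, seed=0):
    """Mean frequency with which the original clades recur after resampling."""
    fractions = np.asarray(fractions, dtype=float)
    rng = np.random.default_rng(seed)
    original = cousin_clades(fractions)
    n_barcodes = fractions.shape[1]
    hits = 0.0
    for _ in range(n_resamples):
        # Resample barcodes with replacement (assumed resampling unit).
        columns = rng.integers(0, n_barcodes, size=n_barcodes)
        resampled = cousin_clades(fractions[:, columns])
        hits += len(original & resampled) / len(original)
    return hits / n_resamples

# Toy 8-cell colony in which every barcode carries the same clear pattern.
cells = np.arange(8)
pattern = (cells // 4) + 0.2 * ((cells // 2) % 2) + 0.01 * (cells % 2)
table = np.tile(pattern[:, None], (1, 3))
print(robustness_score(table, n_resamples=200))
```

Colonies whose clades survive most resamples would receive scores near 1, whereas colonies with sparse or ambiguous collapse patterns would score lower, without any reference to the ground-truth lineage.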
Alternative metrics for identifying colonies with robust lineage information were also tested.
These metrics similarly enriched for subsets of data with improved reconstruction accuracy,
further supporting the observation that some colonies showed clear lineage information
while others did not acquire well-defined collapse patterns, probably owing to limited,
excessive, or ambiguous collapse events.
Lineage reconstruction simulations.
To simulate MEMOIR for three-generation binary trees, we started with one cell with a fixed
number of idealized scratchpads. At each division, the daughter cells inherited the same
scratchpad profile as their parent and independently collapsed each uncollapsed site with a
fixed probability, defined as the collapse rate. After three generations, the scratchpad profiles
of the eight resulting cells were used to reconstruct their lineage tree using either a modified
neighbour joining algorithm (ref. 34), or the Camin–Sokal maximum parsimony algorithm (ref. 35) that
exhaustively scored all 315 possible tree reconstructions. Both forward simulations and the
reconstruction algorithms were implemented in Matlab. For the heat map and the cumulative
distribution functions shown in Fig. 3g–i, the fraction of correct relationships was computed
as the fraction of all distinct pairwise relationships in the actual tree that were correctly
identified in the reconstructed tree. If multiple reconstructions were equally valid (same
parsimony score), the fraction of correct relationships was averaged over all of them.
Reconstruction accuracy was tested over a wide range of collapse rates (Fig. 3h) or for the
approximate collapse rate observed in our experiments, 0.1 per site per generation (Fig. 3g, i).
The empirical collapse rate, 0.1, was estimated from the observed co-localization
fraction of the barcodes, ~0.67, in 108 MEM-01 colonies induced for approximately 48 h
(same colonies as in Fig. 3). In Extended Data Fig. 8a, trees of a higher number of
generations were reconstructed from the final collapse pattern using a modified neighbour
joining algorithm (ref. 34) in which allowed reconstructions were restricted to full binary trees.
The fraction of correct relationships was again computed as the fraction of all distinct pairwise
relationships in the actual tree that were correctly identified in the reconstructed tree,
averaged over at least 1,000 trees.
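The forward simulation and the pairwise-relationship score described above can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab implementation. The function names, the default of 10 sites, and the encoding of a tree as a list of cells in tree order (positions 2k and 2k+1 are sisters) are assumptions made for the example.

```python
import itertools
import math
import random


def simulate_leaves(n_sites=10, rate=0.1, generations=3, rng=random):
    """Simulate one colony; return leaf profiles in tree order.

    Leaves at positions 2k and 2k+1 are sisters. 0 = uncollapsed, 1 = collapsed.
    """
    cells = [[0] * n_sites]
    for _ in range(generations):
        daughters = []
        for parent in cells:
            for _ in range(2):
                # Inherit the parental profile, then collapse each intact
                # site independently with probability `rate` (irreversible).
                daughters.append(
                    [1 if site or rng.random() < rate else 0 for site in parent]
                )
        cells = daughters
    return cells


def relationship(i, j):
    """Generations back to the common ancestor of leaf positions i and j
    (1 = sisters, 2 = cousins, 3 = second cousins)."""
    depth = 0
    while i != j:
        i //= 2
        j //= 2
        depth += 1
    return depth


def fraction_correct(true_order, recon_order):
    """Fraction of distinct cell pairs whose relationship matches in both trees.

    Each argument lists the same cell labels in tree order.
    """
    pos_true = {cell: k for k, cell in enumerate(true_order)}
    pos_recon = {cell: k for k, cell in enumerate(recon_order)}
    pairs = list(itertools.combinations(pos_true, 2))
    hits = sum(
        relationship(pos_true[a], pos_true[b])
        == relationship(pos_recon[a], pos_recon[b])
        for a, b in pairs
    )
    return hits / len(pairs)


# Each of the 7 internal nodes has two interchangeable daughters, so the
# 8! leaf orderings correspond to 8!/2**7 distinct three-generation trees.
N_TREES = math.factorial(8) // 2**7
```

In this encoding, `N_TREES` evaluates to 315, matching the number of candidate reconstructions quoted above. A reconstruction that exchanges two cousins, for example `fraction_correct(list(range(8)), [0, 2, 1, 3, 4, 5, 6, 7])`, scores 24 of the 28 pairwise relationships correctly.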
Event recording simulations.
Simulation of signal recording.—
To demonstrate event recording, we ran the
same forward tree-generation algorithm as in the MEMOIR lineage reconstruction
simulations (Fig. 3h and Methods), for trees of six generations, assuming 50 idealized
scratchpads and a collapse rate of 0.1 per scratchpad per generation. The simulated cells also
contained two additional sets of recording scratchpads of 50 sites each (Fig. 4e). We
assumed these scratchpads collapsed through independent events occurring at rates
proportional to the magnitude of their respective input signals. The minimum and maximum
collapse rates at low and high signal were set to 0 and 0.2 per scratchpad per generation,
respectively. The magnitude of the input signals varied over time and from branch to branch
as shown in Fig. 4f, g, resulting in different collapse rates for each of the two recording
scratchpad sets over time and along different lineages.
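A minimal Python sketch of this signal-recording simulation is given below. It assumes a linear mapping from a signal in [0, 1] to a collapse rate between 0 and 0.2, which is one simple reading of "proportional". The two signal functions are hypothetical stand-ins and do not reproduce the inputs of Fig. 4f, g. All names are illustrative.

```python
import random

R_MIN, R_MAX = 0.0, 0.2  # collapse rates at low and high signal


def collapse_rate(signal):
    """Map a signal in [0, 1] linearly onto the allowed collapse-rate range."""
    return R_MIN + (R_MAX - R_MIN) * signal


def inherit(profile, rate, rng):
    """Copy the parental profile, collapsing intact sites with probability `rate`."""
    return [1 if site or rng.random() < rate else 0 for site in profile]


def simulate_recording(signal_fns, generations=6, n_sites=50,
                       lineage_rate=0.1, seed=0):
    """Return the final cells; each has a lineage set and one recording set per signal."""
    rng = random.Random(seed)
    cells = [{"path": "",
              "lineage": [0] * n_sites,
              "records": [[0] * n_sites for _ in signal_fns]}]
    for generation in range(generations):
        daughters = []
        for cell in cells:
            for bit in "01":
                path = cell["path"] + bit  # branch identity, e.g. "010"
                daughters.append({
                    "path": path,
                    "lineage": inherit(cell["lineage"], lineage_rate, rng),
                    "records": [
                        inherit(rec, collapse_rate(fn(generation, path)), rng)
                        for rec, fn in zip(cell["records"], signal_fns)
                    ],
                })
        cells = daughters
    return cells


def signal_a(generation, path):
    """Hypothetical input: on in one half of the tree from generation 3 onward."""
    return 1.0 if generation >= 3 and path.startswith("0") else 0.0


def signal_b(generation, path):
    """Hypothetical input: on everywhere during the first two generations."""
    return 1.0 if generation < 2 else 0.0


cells = simulate_recording([signal_a, signal_b])
```

Because the minimum rate is 0, recording sites in branches that never see `signal_a` stay uncollapsed, so any collapse in that set marks exposure somewhere along the cell's lineage.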
Reconstruction of simulated signal dynamics.—
We first reconstructed the lineage
tree using only the lineage-tracking scratchpad sites. This reconstruction used a neighbour-joining
algorithm, as in Fig. 3h (ref. 34). We then reconstructed the history of the collapse events
of the recording scratchpads on the reconstructed lineage tree. For this procedure, we used a
Camin–Sokal maximum parsimony algorithm (ref. 35). In brief, the algorithm proceeds from the
leaves of the tree to the root. At each generation, it infers the collapse state of the parental
node, based on the known collapse states of the two daughters, while minimizing the number
of new collapse events occurring between the parent and the daughters. For binary
scratchpads this corresponds to computing the intersection between the collapse patterns of
the two daughters. This procedure is then repeated for the parent and its sister until reaching
the root. At the end of this procedure, one obtains a maximum parsimony assignment of
scratchpad states to each node in the tree. On the basis of these assignments, we calculated
the number of scratchpad collapse events in recording scratchpads that occurred along each
branch. Finally, this reconstructed collapse level provides an estimate of the underlying
signal intensity along each lineage (for example, actual and reconstructed signals shown for
two lineages of interest in Fig. 4g).
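The leaf-to-root parsimony pass described above can be sketched in Python as follows. The sketch assumes binary scratchpads (0 = uncollapsed, 1 = collapsed), a full binary tree that has already been reconstructed, and leaves listed in tree order so that positions 2k and 2k+1 are sisters. The function names are illustrative and this is not the authors' code.

```python
def reconstruct_ancestors(leaves):
    """Infer ancestral collapse states by intersecting the two daughters.

    `leaves` is a list of 2**n profiles (tuples of 0/1) in tree order.
    Returns levels[0] = leaves, levels[1] = their parents, ..., levels[-1] = [root].
    """
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = [
            # A site is collapsed in the parent only if it is collapsed in
            # both daughters; this minimizes the number of new collapse events.
            tuple(a & b for a, b in zip(current[k], current[k + 1]))
            for k in range(0, len(current), 2)
        ]
        levels.append(parents)
    return levels


def branch_events(levels):
    """Count new collapse events on the branch leading to each node.

    events[g][k] is the count for node k at level g (leaves are level 0).
    """
    events = []
    for children, parents in zip(levels, levels[1:]):
        events.append(
            # The parent state is a subset of each daughter state, so the
            # difference in collapsed-site counts equals the new events.
            [sum(child) - sum(parents[k // 2]) for k, child in enumerate(children)]
        )
    return events
```

For the four leaves `[(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1)]`, the inferred parents are `(1, 0, 0)` and `(0, 0, 1)`, the root is `(0, 0, 0)`, and the per-branch event counts are `[1, 0, 0, 1]` for the leaves and `[1, 1]` for the parents. Applied to the recording scratchpads, these per-branch counts are the quantity used to estimate signal intensity along each lineage.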
Data availability.
Data that are not included in the paper are available upon reasonable request to the authors.
Extended Data
Extended Data Figure 1 |. MEM-01 consistently expresses short-lived transcripts from multiple
integrated barcoded scratchpads.
a, The barcoded scratchpad transposon is composed of the following elements (left to right):
the PiggyBac 5′ terminal repeat (triangle), the chicken HS4 insulator (ref. 36), a PGK promoter
driving expression of the hygromycin resistance coding sequence, a 5′ FRT site, the PP7
scratchpad array consisting of 10 repeats, a 3′ FRT site, a barcode sequence (Supplementary
Table 1), a priming region for sequencing and PCR, the BGH polyA, and the PiggyBac 3′
terminal repeat (triangle). b, Unique genomic integrations for the MEM-01 cell line were
detected by qPCR. Bars show mean ± s.d. of four biological repeats with individual data
points marked. c, The relative RNA expression levels of barcode integrations were
quantified by RT–qPCR. Bars show mean ± s.d. of three biological repeats with individual
data points marked. d, Scratchpad expression profiles remain constant over 1.3 months of
passaging. Low- and high-passage cultures of MEM-01 cells (light and dark bars,
respectively) were assayed for RNA expression levels by RT–qPCR. The unchanged
expression levels indicate that most barcoded scratchpads express at a consistent level and
are not routinely silenced over time. Bars show values from single biological samples with
error bars calculated by combining in quadrature the technical replicate variation in barcode
and normalizer quantitation cycle, Cq, values. e–g, RNA half-lives assessed by RT–qPCR
analysis of transcript levels after blocking transcription with actinomycin D (10 μg ml⁻¹). e,
Barcoded scratchpad transcripts were assayed with two different sets of qPCR primers (left
and right panels). These data indicate a half-life of approximately 2 h. f, g, Myc and Sdha
are known to have short and long mRNA half-lives, respectively, and were assessed as
controls, for comparison (refs 37–39). Myc half-life (f) of 1 h was shorter than the other measured
half-lives, while Sdha (g) was longer lived. For Sdha, the measured half-life value (indicated
with an asterisk) is expected to overestimate the true value, as Sdha levels were determined
relative to those of the similarly long-lived gene Atp5e, whose transcript levels were also
decaying over the time course. A previous estimate of Sdha half-life in mESCs was 8–13 h
(ref. 37). All sample transcript levels were assessed relative to those of Atp5e (refs 37–39).
Transcript abundances were normalized to 1 at time zero. Decay curves were fit assuming
one-phase exponential decay using weighted nonlinear least squares regression (e, f) or
assuming a linear approximation to exponential decay (g). Half-lives were determined on the
basis of the best fit decay constants and a range reported based on the 95% confidence
interval (shown in parentheses). Data represent two biological replicates with multiple
technical replicates; error bars show standard deviations.
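The relationships behind these fits can be written out as a short worked derivation. The symbols are assumptions introduced here for illustration: N(t) is transcript abundance normalized to 1 at time zero, k is the fitted decay constant, and the last line assumes both Sdha and Atp5e decay exponentially with k_Sdha > k_Atp5e ≥ 0.

```latex
\begin{align}
  % one-phase exponential decay and the corresponding half-life
  N(t) &= e^{-kt}, & t_{1/2} &= \frac{\ln 2}{k} \\
  % linear approximation, valid while kt \ll 1 (slowly decaying transcripts)
  N(t) &\approx 1 - kt \\
  % normalizing to a reference transcript that is itself decaying
  \frac{N_{Sdha}(t)}{N_{Atp5e}(t)} &= e^{-(k_{Sdha} - k_{Atp5e})\,t}
  & t_{1/2}^{\mathrm{app}} &= \frac{\ln 2}{k_{Sdha} - k_{Atp5e}} \;\ge\; \frac{\ln 2}{k_{Sdha}}
\end{align}
```

For instance, a half-life of about 2 h corresponds to k = ln 2 / 2 h ≈ 0.35 per hour. The last line shows why normalizing to a decaying reference makes the apparent half-life an upper bound on the true one, consistent with the note on the Sdha value above.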
Frieda et al.
Page 13
Nature
. Author manuscript; available in PMC 2019 April 29.
Author Manuscript
Author Manuscript
Author Manuscript
Author Manuscript
Extended Data Figure 2 |. Barcoded scratchpads collapse to truncated products in activated cells
and are stable in full-length and collapsed forms.
a, Agarose gel electrophoresis of PCR-amplified scratchpads reveals scratchpad collapse
after gRNA induction. Full-length scratchpads were amplified from plasmid DNA (lane 1),
as well as from cells without gRNA constructs (lane 3), or with uninduced gRNAs (lane 4).
By contrast, cells expressing gRNA showed shorter products (lane 5). Cells with no
scratchpads are also shown as a negative control (lane 2). Bands corresponding to the full-
length scratchpad and the collapsed scratchpad are indicated (arrows). Note that the
laddering effect seen in all lanes and gels is due in part to PCR amplification artefacts with
the repetitive arrays. For gel source data, see Supplementary Fig. 1. b, The lowest molecular
weight band from scratchpad collapse, as shown in lane 5 in a, was extracted and subcloned
into a vector. Nine of the colonies were sequenced. They aligned to a single repeat unit with
5′ and 3′ flanking regions, suggesting complete collapse of the repeats owing to Cas9
activity. Six of the nine sequencing reads resulted in collapse to a perfect single repeat (with
a possible point mutation in the scratchpad sequence associated with barcode 2), and the
remaining three sequencing reads had additional small deletions in the scratchpad. c,
Scratchpad collapse requires induction of both Cas9 and gRNA. The gel shows scratchpad
states for MEM-01 cells treated with no ligand, with Shield1 (to stabilize Cas9 protein), with
Wnt3a (to induce gRNA expression), and with both Wnt3a (100 ng ml⁻¹) and Shield1 (100
nM), all after 48 h. d, Scratchpad collapse increased with increasing gRNA activation, as
assessed using smFISH to detect scratchpad co-localization with four highly expressed
barcodes. Cells were analysed either without gRNA activation or 48 h after gRNA activation
by addition of Wnt3a and Shield1 (same concentrations as in c). gRNA expression was
measured by the intensity of co-expressed nuclear mTurquoise signal. Box plots show
median (red bar), first and third quartiles (box), and extrema of distributions; n = 1,826,
1,081, 345, 191 cells, left to right. Related to Fig. 2c. e–g, Scratchpad states remain stable
over extended periods. e, Unactivated MEM-01 cells maintained uncollapsed scratchpads
over timescales of months. f, To check the stability of individual barcoded scratchpad
variants over time, multiple subclones of MEM-01 were isolated after no activation (control;
top panels) and after a pulse of activation for 24 h (Wnt3a 100 ng ml⁻¹, Shield1 100 nM;
bottom panels). Subclones were assessed for the states of different barcoded scratchpad
types after initial isolation (0 month relative age, left) and after one month of maintenance
(right). The apparent collapse states (from uncollapsed to fully collapsed) of the barcoded
scratchpad types were distinct in different subclones and remained stable over a month,
indicating that scratchpad states are stable over these timescales. g, Barcoded scratchpads
are also stable over long periods as assessed by smFISH readout. The fraction per cell of
barcode transcripts (from four distinct barcode types) that co-localized with scratchpad
signal was essentially unchanged between an unactivated low passage cell culture and one
maintained for over a month. The imperfect co-localization fraction is largely the result of
errors in smFISH detection and not gradual scratchpad collapse. Box plots as in d; n = 1,826,
983 cells, left to right.
Extended Data Figure 3 |. Scratchpad collapse works with an alternative gRNA, and in multiple
cell types.
a–d, A Cre-recombinase-activated gRNA is effective at inducing collapse events. a,
Schematic of Cre-activated gRNA system. The construct contains a constitutive PGK
promoter driving expression of a histone 2B (H2B)–mTurquoise fusion protein (the H2B
provides nuclear localization). This is followed by a U6 TATA-lox promoter (ref. 33) driving
expression of an shRNA against mTurquoise, followed in turn by a polyT (T6)
transcriptional stop, and then a gRNA directed against scratchpad regions. Prior to Cre
expression, expression of the shRNA keeps mTurquoise levels low (brown dashed line) and
prevents expression of the gRNA. After the introduction of Cre, the shRNA-stop cassette is
removed, allowing mTurquoise and gRNA expression. Thus, mTurquoise provides a visual
marker of gRNA expression. This type of gRNA architecture could allow MEMOIR
activation in specific tissues expressing Cre. b, PCR analysis shows that Cre can induce
scratchpad collapse. Gel shows genomic DNA from a clonal cell line harbouring the
construct in a. Scratchpads appear uncollapsed in untransfected cells (left lane), but show
significant collapse after transfection with mRNA encoding Cre protein (right lane,
approximately 52 h after transfection). Note that the laddering effect seen in all lanes and
gels is due in part to PCR amplification artefacts with the repetitive arrays. c, smFISH
analysis reveals Cre-activated scratchpad collapse. Quantification of barcode–scratchpad
co-localization fractions as measured by smFISH. Cre transfection reduced scratchpad and
barcode co-localization levels in cells that showed evidence of Cre activity, as assessed by
mTurquoise expression (right). Transfected cells that were mTurquoise-negative or low and
untransfected cells retained high co-localization levels (middle and left). Co-localization
levels per cell were assessed based on the co-localization of four expressed barcodes with
scratchpad transcripts. Box plots show median (red bar), first and third quartiles (box), and
extrema of distributions; n = 995, 643, 649 cells, left to right. d, Example smFISH images of
scratchpad and barcode co-localization detected in single cells containing the Cre-activated
gRNA. Some activated cells (top panels, mTurquoise expression ‘on’) show loss of
co-localized signal for a specific barcode (top panels, lower cell). Unactivated cells, as assessed
by low mTurquoise expression, typically show no loss of co-localization (bottom panels).
Scale bars, 10 μm. e, f, Scratchpads in CHO-K1 cells and yeast also undergo
Cas9/gRNA-dependent collapse. e, Cas9- and gRNA-expressing plasmids were transiently transfected
into Chinese Hamster Ovary (CHO-K1) cells containing stably integrated scratchpads. Gel
analysis reveals Cas9- and gRNA-dependent scratchpad collapse (middle lane), while
transfection with a Cas9-expressing plasmid alone or control plasmids resulted in no
collapse (left and right lanes, respectively). f, Scratchpad collapse was tested in a yeast strain
with doxycycline-inducible Cas9 and gRNA and integrated scratchpads. Before inducing
Cas9-gRNA expression (lanes 1 and 3), the scratchpads were intact. After Cas9-gRNA
induction with 2 μg ml⁻¹ doxycycline for 11 h, scratchpads appeared collapsed (lanes 2 and
4). Left two lanes (lanes 1 and 2) and right two lanes (lanes 3 and 4) correspond to two
biological replicates. Note that the scratchpads in CHO-K1 and yeast cells have a similar
scratchpad PP7 array to that used elsewhere but different flanking sequences, so their
absolute PCR product lengths differ. For gel source data, see Supplementary Fig. 1.
Extended Data Figure 4 |. Examples of lineage reconstruction for ten colonies.
Data for ten colonies that reconstructed with > 70% of pairwise relationships correctly
identified are shown here. The bubble chart shows the number of barcode transcripts
detected (bubble size) and the uncollapsed fraction (colour scale). Matrices of cell-to-cell
barcode distance (dissimilarity) scores were computed from the data. Low (blue) values
indicate more similar barcoded scratchpad collapse patterns. Note that sisters and cousins
tend to have lower distance scores than second cousins, creating a block diagonal pattern in
the distance matrix. Lineage trees were reconstructed based on the distance matrix using an
agglomerative hierarchical clustering algorithm (see Methods). Cluster distances from the
reconstruction algorithm are shown as branch heights in the reconstructed linkage trees.
Percentages on the linkage trees represent frequencies of clade occurrence from a barcode
resampling bootstrap. The fraction of correct relationships identified by the depicted
lineage reconstruction is shown as a percentage and the actual tree is reported as
[(x y)(x y)][(x y)(x y)], where sister pairs are denoted as (x y) and cousins are grouped in brackets
([...]).
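As a rough illustration of the reconstruction and bootstrap steps named in this legend, the Python sketch below clusters cells from a distance matrix and estimates clade support by resampling barcodes. The distance score and linkage rule actually used are specified in the Methods and are not reproduced here. The simple mismatch distance, the average-linkage rule, and all function names are assumptions chosen for the example.

```python
import itertools
import random


def distance(p, q, sites):
    """Hypothetical dissimilarity: fraction of the chosen sites whose states differ."""
    return sum(p[s] != q[s] for s in sites) / len(sites)


def cluster(profiles, sites):
    """Average-linkage agglomerative clustering (an assumed linkage rule).

    Returns the set of clades (frozensets of cell indices) formed by merging.
    """
    clusters = [frozenset([i]) for i in range(len(profiles))]
    clades = set()

    def average_distance(a, b):
        total = sum(distance(profiles[i], profiles[j], sites) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        clades.add(a | b)
    return clades


def bootstrap_support(profiles, n_boot=200, seed=0):
    """Fraction of barcode-resampled trees that contain each reference clade."""
    rng = random.Random(seed)
    n_sites = len(profiles[0])
    reference = cluster(profiles, range(n_sites))
    counts = dict.fromkeys(reference, 0)
    for _ in range(n_boot):
        # Resample barcodes (sites) with replacement and rebuild the tree.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = cluster(profiles, sample)
        for clade in reference:
            counts[clade] += clade in resampled
    return {clade: counts[clade] / n_boot for clade in reference}
```

Each value returned by `bootstrap_support` plays the role of the clade-occurrence percentages drawn on the linkage trees. Clades that depend on only one or two barcodes tend to receive low support in this scheme.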