Scientific Data | (2020) 7:78 | https://doi.org/10.1038/s41597-020-0415-9 | www.nature.com/scientificdata
A NWB-based dataset and processing pipeline of human single-neuron activity during a declarative memory task
N. Chandravadia¹, D. Liang², A. G. P. Schjetnan³, A. Carlson¹, M. Faraut¹, J. M. Chung⁴, C. M. Reed⁴,
B. Dichter⁵,⁶, U. Maoz²,⁷, S. K. Kalia⁸,³, T. A. Valiante³,⁸, A. N. Mamelak¹ & U. Rutishauser¹,⁴,⁷,⁹,¹⁰
A challenge for data sharing in systems neuroscience is the multitude of different data formats used.
Neurodata Without Borders: Neurophysiology 2.0 (NWB:N) has emerged as a standardized data format
for the storage of cellular-level data together with meta-data, stimulus information, and behavior.
A key next step to facilitate NWB:N adoption is to provide easy-to-use processing pipelines to import/export
data from/to NWB:N. Here, we present a NWB-formatted dataset of 1863 single neurons recorded
from the medial temporal lobes of 59 human subjects undergoing intracranial monitoring while they
performed a recognition memory task. We provide code to analyze and export/import stimuli, behavior,
and electrophysiological recordings to/from NWB in both MATLAB and Python. The data files are
NWB:N compliant, which affords interoperability between programming languages and operating
systems.
This combined data and code release is a case study for how to utilize NWB:N for human
single-neuron recordings and enables easy re-use of this hard-to-obtain data for both teaching and
research on the mechanisms of human memory.
Background & Summary
In-vivo experiments in awake, behaving animals produce a large and complex mixture of different types of
data, which is typically stored in a heterogeneous set of files formatted in a variety of equipment- or
laboratory-specific data formats. As a result, it is a challenge to share such data for re-use by others to, for example,
perform meta-analysis across datasets. To enable the wide re-use and sharing of systems neuroscience data, it
is instrumental to utilize a standardized data format capable of storing all elements associated with an
experiment. Key requirements for such a standard¹⁻³ include the ability to store large-scale complex data and
meta-data, language- and platform-independent accessibility, extensibility for custom use cases, and easy usability for
neuroscientists.
While various file formats have been introduced to address the requirements noted⁴⁻⁹, a universally accepted
standard has yet to emerge for the storage of cellular-level data. A comprehensive, standardized data format
satisfying these requirements that is suitable for the storage of cellular-level imaging and electrophysiology data has
recently emerged: the Neurodata Without Borders: Neurophysiology 2.0 format (NWB:N)¹⁻³. NWB:N is designed
to store both raw and processed data and associated metadata for diverse types of imaging and electrophysiology
data. NWB:N provides APIs for both Python and MATLAB to store, query, and retrieve data in a platform and
programming language independent manner. NWB:N utilizes HDF5 (Hierarchical Data Format) (see
https://www.hdfgroup.org/solutions/hdf5/) as a storage backend, which is well-suited to store large amounts of data and
which is supported by many programming languages (including Python, MATLAB, and C++), assuring
accessibility and interoperability of NWB:N (see https://neurodatawithoutborders.github.io/storage_hdf). Within
NWB:N, data is organized according to the following primitives: Groups (which are similar to a folder), Datasets
(n-Dimensional Data tables), Attributes (the meta-data), and Links (References to Datasets). The NWB standard
makes use of these primitives to organize all data associated with an experiment (see
https://neurodatawithoutborders.github.io/schemalanguage).
¹Department of Neurosurgery, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
²Institute for Interdisciplinary Brain and Behavioral Sciences, Crean College of Health and Behavioral Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, USA.
³Krembil Brain Institute, Toronto Western Hospital, Toronto, Canada.
⁴Department of Neurology, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
⁵Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
⁶Department of Neurosurgery, Stanford University, Stanford, CA, USA.
⁷Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
⁸Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Canada.
⁹Computational and Neural Systems Program, California Institute of Technology, Pasadena, CA, USA.
¹⁰Center for Neural Science and Medicine, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
e-mail: urut@caltech.edu
Here, we describe how we exported a complex, large dataset of single neuron recordings from the
human medial temporal lobe and behavior to the NWB:N format¹⁰,¹¹ and show how to import and use the
NWB:N-formatted data to perform single-neuron analysis. The goal of this release is four-fold: (i) to demonstrate
the feasibility of using NWB:N for human single-neuron studies, (ii) to demonstrate that the resulting NWB:N
files are fully interoperable between programming languages, (iii) to provide MATLAB and Python code
templates that can be used by others, and (iv) to release a large human single-neuron dataset as NWB:N (as part of a
new NIH BRAIN initiative consortium, we added 17 subjects and 288 neurons, including from a new study site,
relative to our previously released dataset, which used a proprietary format¹⁰). All NWB operations were executed
using the standard NWB:N Python (PyNWB, version 1.1.0) and MATLAB (MatNWB, version 0.2.1) APIs, which
we utilized to both export our data as well as to re-import it for analysis.
The data described here was recorded extracellularly from individual neurons in the human medial temporal
lobe (MTL) in patients with intractable epilepsy¹²,¹³. Patients were implanted with hybrid depth electrodes with
embedded microwires for the purpose of identifying their seizure focus¹²,¹⁴. We recorded the activity of single
neurons during the administration of a new/old recognition memory task that we and others have used
extensively to investigate the neural basis of declarative memory¹⁰,¹¹,¹⁵⁻¹⁷.
Together, this data descriptor and the publicly available code and data demonstrate the utility of NWB:N as an
instrument to store, retrieve, and share cell-based electrophysiology data together with all associated meta data,
stimulus information and behavior. This release additionally provides tools in both MATLAB and Python that
will facilitate the adoption of NWB:N in the community of human intracranial recordings. Lastly, the
experimental results shown confirm the reproducibility of previous results on the selectivity of MTL cells during the
new/old task at a new study site, together with 17 new subjects that were not previously released.
Methods
Although described extensively elsewhere¹⁰, here we briefly summarize details of the dataset, followed by
the NWB:N-specific methods particular to this data descriptor.
Subjects.
In total, we recorded from patients across 89 sessions (see Online-only Table 1) during intracranial
monitoring of seizure activity in the epilepsy monitoring unit (EMU). Patients were admitted to the EMU
to localize their seizure focus for potential surgical excision. Each patient has a recording-site specific identifier
(H = Huntington Memorial Hospital, C = Cedars-Sinai Medical Center, T = Toronto Western Hospital). The
number of sessions that an individual patient performed was variable. If the patient performed more than one
session, a different variant of the task (with new images) was administered, thus allowing the patient to perform
various versions of the task (either 1, 2, or 3 with different stimuli). All patients provided written informed
consent to participate in the study. All protocols were approved by the Institutional Review Boards of the California
Institute of Technology, the Huntington Memorial Hospital, Cedars-Sinai Medical Center, and Toronto Western
Hospital.
Task.
The task consists of two parts: an encoding and a recognition phase¹⁰. In the encoding phase, subjects
were presented with 100 novel images chosen from distinct visual categories (houses, landscapes, mobility,
phones, animals, fruits, kids, military, space, cars, food, people, and spatial). Subsequently, in the recognition
phase, subjects were presented with 50 “novel” images and 50 “old” images. During the recognition phase,
subjects indicated whether they thought that the image was “novel” (never seen before), or “old” (seen during
encoding) together with confidence ratings on a 1–6 scale. During the encoding phase, subjects indicated for each
image whether it contained an animal or not (yes or no).
Data acquisition.
To isolate the activity of single neurons in the human MTL, we utilized hybrid depth electrodes
with eight embedded microwires each (Ad-Tech Medical) as described previously¹². The signal from each
microelectrode was locally referenced to one of the eight microelectrodes. The continuously acquired raw signal
was recorded with a Neuralynx ATLAS or Neuralynx Cheetah System (Neuralynx Inc.). Signals were recorded
broadband (0.1 to 9000 Hz) and sampled at 32 kHz. Offline, each channel (i.e., microelectrode) was band-pass
filtered from 300–3000 Hz before spike sorting.
Spikes were detected using threshold crossings of the local energy, or power, of the filtered signal, and sorted
offline with the semiautomatic template-matching algorithm OSort¹⁸. To classify the detected clusters as putative
units, we assessed the following criteria: (1) shape of mean waveform, (2) interspike interval distribution, (3)
violation of the refractory period (<3% of the spikes have an ISI of less than 3 ms), and (4) stable firing rate and
waveform amplitude during the task. For each isolated cluster, we computed several quality metrics for further
analysis and quantification of spike sorting quality (isolation distance, mean waveform, and signal-to-noise ratio).
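Criterion (3) can be illustrated with a minimal Python sketch. It is not the OSort implementation: the function names and the example spike times are hypothetical, and the violation rate is computed as a fraction of inter-spike intervals (ISIs), which is one plausible reading of "3% of the spikes".

```python
def isi_violation_fraction(spike_times_s, refractory_ms=3.0):
    """Fraction of inter-spike intervals shorter than the refractory period.

    spike_times_s: spike times of one cluster, in seconds.
    """
    times = sorted(spike_times_s)
    if len(times) < 2:
        return 0.0  # no intervals, hence no violations
    # Convert consecutive differences to milliseconds.
    isis_ms = [(later - earlier) * 1000.0 for earlier, later in zip(times, times[1:])]
    n_short = sum(1 for isi in isis_ms if isi < refractory_ms)
    return n_short / len(isis_ms)


def passes_refractory_criterion(spike_times_s, max_fraction=0.03):
    """True if fewer than 3% of ISIs fall below 3 ms (assumed reading of the criterion)."""
    return isi_violation_fraction(spike_times_s) < max_fraction


# Hypothetical spike times (seconds); one of five ISIs is 2 ms long.
spikes = [0.000, 0.050, 0.052, 0.120, 0.300, 0.450]
print(isi_violation_fraction(spikes), passes_refractory_criterion(spikes))
```

A cluster like this toy example would fail the criterion, since a fifth of its intervals are shorter than 3 ms.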
Data Records
NWB:N workflow: export.
Our goal is to create NWB:N files that include all data used and acquired during
the experiment as well as accompanying meta data that is needed for subsequent analysis (Fig. 1). In our case,
the source data (stored in proprietary formats) that is exported includes: the stimuli (pictures) shown to subjects,
behavioral responses (choices, reaction times), NEV (Neuralynx Event) files that indicate event markers (TTLs),
spike times and waveforms from the OSort spike sorting software (‘Ax_cells.mat’ files, where x is the channel
number), and information from the raw CSC (Continuously Sampled Channel) Neuralynx files. A variety of
customized code is needed to read these files from their original data format. We use these tools to import the
data either into MATLAB or Python and then utilize the NWB:N APIs to re-export the data for storage inside an
NWB:N file (Fig. 1, left). This yields a single NWB:N file for each recorded session of the experiment. All data in
both NWB and the native format have been deposited online¹⁹.
Structure of the NWB file.
At the top-most level, an NWB file consists of several main groups, each of
which is a container (similar to a directory) for different subsets of the data (see Fig. 2a for a summary). The
main groups of interest here are acquisition (recorded raw data streams), intervals (epochs/trials), stimulus
(stimulus data), units (spike times of isolated neurons), and general (metadata on devices, electrodes, and subject).
Within each main group, different sets of pre-defined variables are part of the NWB:N specification. Each variable
in NWB:N is of a pre-specified type, called a ‘neurodata_type’. For each pre-specified type, a certain set of variables
is mandatory, assuring standard compliance. For example, each Group is of type NWBContainer. Similarly, each
Dataset specification within each Group is represented by the type NWBData, from which all other base types,
including Image, VectorData, DynamicTableRegion, and Index, inherit. Below, we describe the elements that we
utilized within the different top-level Groups (Fig. 2a).
A key goal of the NWB:N standard is to include all meta-data of each experiment within each NWB file. To
achieve this, we have utilized the various meta-data fields within the NWB:N file to specify all the pertinent
information needed to understand and analyze an experiment. Note that, in particular, many of the pre-specified data
fields within the NWB file have a free text ‘description’ field that we utilized to add additional information. There
are both structured/required meta-data fields such as the start time of the experiment (e.g., ‘session_start_time’),
and descriptive/unstructured free text explanatory fields such as ‘description’ (a field that is part of many of the
NWB data types used). Note that in order to protect PHI (patient health information), we had to omit or modify
a small subset of the metadata provided. For instance, in the field session_start_time, we set only the year and
month of the experiment but defaulted the actual day of the experiment to the first of the month for all sessions.
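The date handling described for session_start_time can be sketched in a few lines of standard-library Python. This is an illustration only, not the released export code: the function name and the example date are hypothetical, and zeroing the time of day is an added assumption (the text only states that the day is defaulted to the first of the month).

```python
from datetime import datetime, timezone


def deidentify_session_start(recorded: datetime) -> datetime:
    """Keep year and month; default the day to the first of the month."""
    # Assumption: the time of day is zeroed as well; the text specifies only the day.
    return recorded.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# Hypothetical recording date, for illustration only.
recorded = datetime(2017, 5, 23, 14, 30, tzinfo=timezone.utc)
print(deidentify_session_start(recorded).isoformat())
```

Any analysis that depends on absolute dates or on the interval between sessions within a month cannot be recovered from a field treated this way, which is the intended effect.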
NWB file content: acquisition group.
The \acquisition Group contains the raw data and meta-data collected
for each session that is essential to align the behavioral markers with the processed data. Two streams are
included: \acquisition\events (‘events’) and \acquisition\experiment_ids (‘experiment_ids’). Both streams include
the same number of entries in the same order.
Fig. 1 Overview of NWB workflow. Data is first acquired and stored in equipment/laboratory specific formats
(left). This data is then read into MATLAB (top row) or Python (bottom row) and exported into NWB (middle).
Subsequently, either MATLAB or Python can be used to read the NWB files and analyze the data (right). The
example data loaded and plotted is the mean waveform of an individual neuron separately for the two phases of
the task.
\Events stores data and timestamps along with a meta-data field (‘description’) that details the meaning of the
behavior markers. Data stores the event markers (i.e., TTLs) of the experiment (see Table 1 for a summary). The
following TTL values are used: Start of Experiment (55), Stimulus Onset (1), Stimulus Offset (2), Question Screen
Onset (3), Response During Learning (20 or 21), Response During Recognition (31–36), End of Trial (6), End of
Experiment (66). For the learning block, at the time marked as “Question Screen Onset” (TTL = 3), the question “Is this an
animal?” is shown. There are two possible answers, which are encoded as either 20 (Yes, this is an animal) or 21
(No, this is not an animal). For the recognition block, at the time marked as “Question Screen Onset” (TTL = 3),
the question “Have you seen this image before?” is shown. There are six possible answers, which are encoded
as TTLs 31–36 [31 (new, confident), 32 (new, probably), 33 (new, guess), 34 (old, guess), 35 (old, probably), 36
(old, confident)]. The timestamps (recorded in seconds relative to start of the experiment) record the time each
experiment marker occurred.
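The TTL coding just described lends itself to a small lookup table. The following Python sketch is illustrative and not part of the released code: the dictionary and function names are hypothetical, and the example marker sequence is made up; only the code values and their meanings are taken from the text.

```python
# Markers that structure the experiment and each trial.
STRUCTURAL_TTLS = {
    55: "Start of Experiment",
    1: "Stimulus Onset",
    2: "Stimulus Offset",
    3: "Question Screen Onset",
    6: "End of Trial",
    66: "End of Experiment",
}

# Learning block: answer to "Is this an animal?"
LEARNING_RESPONSES = {20: "yes, this is an animal", 21: "no, this is not an animal"}

# Recognition block: answer to "Have you seen this image before?"
RECOGNITION_RESPONSES = {
    31: ("new", "confident"),
    32: ("new", "probably"),
    33: ("new", "guess"),
    34: ("old", "guess"),
    35: ("old", "probably"),
    36: ("old", "confident"),
}


def describe_ttl(code: int) -> str:
    """Return a human-readable description of one TTL value."""
    if code in STRUCTURAL_TTLS:
        return STRUCTURAL_TTLS[code]
    if code in LEARNING_RESPONSES:
        return f"Learning response: {LEARNING_RESPONSES[code]}"
    if code in RECOGNITION_RESPONSES:
        choice, confidence = RECOGNITION_RESPONSES[code]
        return f"Recognition response: {choice}, {confidence}"
    raise ValueError(f"unknown TTL code: {code}")


# Hypothetical marker sequence of a single recognition trial.
for code in [1, 2, 3, 36, 6]:
    print(code, describe_ttl(code))
```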
For every entry in \Events, there is also an entry in \experiment_ids that stores the following attributes: data
and timestamps. Here, data refers to the trial type, either learning or recognition, with the corresponding
timestamps (events and experiment_ids have the same number of entries, thereby assigning each TTL to an
experiment). This information is used to designate which block a trial corresponds to. The learning block is labeled with
only one of the following: 80, 83, or 88, while the recognition block is labeled with only one of the following: 81,
84, or 89 (see Table 2 for a summary). The experiment_ids vary only so that different runs of the same experiment
can be disambiguated.
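Because events and experiment_ids are aligned entry by entry, TTLs can be assigned to a block by pairing the two streams. A minimal Python sketch under that assumption follows; the names and the example streams are hypothetical, while the ID values come from the text.

```python
# (learning, recognition) ID pairs; a session uses exactly one pair.
ID_PAIRS = ((80, 81), (83, 84), (88, 89))

PHASE_BY_ID = {}
for learn_id, recog_id in ID_PAIRS:
    PHASE_BY_ID[learn_id] = "learning"
    PHASE_BY_ID[recog_id] = "recognition"


def split_by_phase(ttls, experiment_ids):
    """Group TTL values by task phase using the parallel experiment_ids stream."""
    if len(ttls) != len(experiment_ids):
        raise ValueError("events and experiment_ids must have the same length")
    phases = {"learning": [], "recognition": []}
    for ttl, exp_id in zip(ttls, experiment_ids):
        phases[PHASE_BY_ID[exp_id]].append(ttl)
    return phases


# Hypothetical streams: one learning trial followed by one recognition trial.
ttls = [1, 2, 3, 20, 6, 1, 2, 3, 35, 6]
experiment_ids = [80] * 5 + [81] * 5
print(split_by_phase(ttls, experiment_ids))
```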
Fig. 2 Organization of an NWB file when used for storing human single-neuron data. (a) Top-level structure of
an NWB file. The top-level groups are acquisition, general, intervals, stimulus, and units. (b) Illustration of the
\units (top) and \electrode (bottom) table. Shown are three example units (top) and three example electrodes
(bottom). Notice how the electrode table refers to the Device table (right).
NWB file content: general group.
The \general Group contains metadata about the experiment (Fig. 2a).
There are several sub-groups: general\devices (‘devices’), general\extracellular_ephys (‘extracellular_ephys’),
and general\subject (‘subject’). Devices documents the device(s) used for signal acquisition, which here is
the Neuralynx Inc. amplifier (“Neuralynx-Atlas” or “Neuralynx-cheetah”). Other signal acquisition systems can
be indicated here accordingly by adding a new entry to ‘devices’. General/extracellular_ephys contains
information about the electrodes recorded from, including their location (brain area and coordinates), impedance, and
filters used (Fig. 2b, bottom). This information is combined in the electrodes table, which is part of the
extracellular_ephys group. For example (see Fig. 2b), the \electrodes table identifies that ‘neuron1’ has id 0, was recorded
in the Left Hippocampus (location) with (19.0 mm, 12.2 mm, 13.3 mm) as the MNI coordinates (x, y, z), and
the filter applied before spike sorting was 300–3000 Hz. The origChannel (a custom column) refers to the
hardware channel that was used to record from this electrode. An explicit object reference in the ‘group’ column of the
\electrodes table links to an ElectrodeGroup, which contains additional information about the electrodes used.
Here, the information provided is that the electrodes were microwires. The ‘device’ soft link (Fig. 2b, lower right)
within the ElectrodeGroup contains an object reference to the Device group (/general/devices), which provides
additional metadata about the electrodes and recording system used (here, we used one entry to describe the
combination of both). Lastly, the general/subject group contains meta-data about the subject (age, description,
sex, species, and subject id).
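The chain of references from an electrode row to its ElectrodeGroup and on to the Device can be pictured with plain Python objects. This is a conceptual sketch only and does not use the PyNWB API: the class layout, the group name, and the orig_channel value are hypothetical, and the coordinates are copied as printed in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str


@dataclass(frozen=True)
class ElectrodeGroup:
    name: str
    description: str
    device: Device  # plays the role of the 'device' soft link


@dataclass(frozen=True)
class ElectrodeRow:
    id: int
    location: str
    xyz_mm: tuple  # MNI coordinates (x, y, z)
    filtering: str
    orig_channel: int  # custom column: hardware channel (value below is made up)
    group: ElectrodeGroup  # plays the role of the 'group' object reference


def device_name_for(row: ElectrodeRow) -> str:
    """Follow electrode -> group -> device, as the references in the file do."""
    return row.group.device.name


atlas = Device(name="Neuralynx-Atlas")
microwires = ElectrodeGroup(name="example-group", description="Microwire", device=atlas)
row = ElectrodeRow(
    id=0,
    location="Left Hippocampus",
    xyz_mm=(19.0, 12.2, 13.3),
    filtering="300-3000 Hz",
    orig_channel=1,
    group=microwires,
)
print(device_name_for(row))
```

Storing the device once and referring to it from each group avoids repeating the same recording-system metadata for every electrode.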
NWB file content: interval group.
The \intervals Group contains information about individual trials
in the field \trials. It contains the following trial attributes: start_time, stop_time, delay1_time, response_time,
delay2_time, new_old_labels_recog, response_value, category_name, stimCategory, and stim_phase. There is one
entry for every trial. Start_time is the time of stimulus onset of each trial, and stop_time is the time of stimulus
offset. Delay1_time is the time of the question screen onset, and response_time records the time the subject
provided a response. Delay2_time indicates the end of the trial. All times are in seconds relative to the start of the
experiment. The remaining attributes provide additional information about each trial. Response_value is the
response (button press) given by the subject to the stimuli shown (see acquisition group for details on the
response values). Stim_phase describes the part of the experiment this trial belongs to (learning or recognition).
Category_name and stimCategory indicate the visual category the image shown belongs to (as a string and
number, respectively). New_old_labels_recog provides the ground truth
label of whether the trial showed a new or old stimulus during the recognition phase (0 is old, 1 is new).
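To show how these trial attributes combine in practice, the sketch below computes a reaction time and scores recognition responses against the ground truth label. It is an illustration, not the released analysis code: trials are represented as plain dictionaries with made-up values, the function names are hypothetical, and defining reaction time as response_time minus delay1_time (question screen onset) is an assumption.

```python
def reaction_time_s(trial):
    """Seconds from question screen onset to the response (assumed definition)."""
    return trial["response_time"] - trial["delay1_time"]


def responded_old(response_value):
    """Recognition responses 31-33 mean 'new', 34-36 mean 'old'."""
    if not 31 <= response_value <= 36:
        raise ValueError(f"not a recognition response: {response_value}")
    return response_value >= 34


def is_correct(trial):
    """Compare the response with the ground truth (0 is old, 1 is new)."""
    truth_is_old = trial["new_old_labels_recog"] == 0
    return responded_old(trial["response_value"]) == truth_is_old


# Two hypothetical recognition trials, both showing an old image.
trials = [
    {"delay1_time": 11.5, "response_time": 13.0, "response_value": 36, "new_old_labels_recog": 0},
    {"delay1_time": 21.5, "response_time": 22.25, "response_value": 32, "new_old_labels_recog": 0},
]
accuracy = sum(is_correct(t) for t in trials) / len(trials)
print([reaction_time_s(t) for t in trials], accuracy)
```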
NWB file content: stimuli group.
The \stimuli Group stores the stimuli (i.e., images) presented during the
experiment. Each stimulus is listed within stimuli\presentation\ as stimuli_learn_x and stimuli_recog_x, with
x = 1...100. The actual image is stored within each as the data attribute. There are a total of 200 trials (100
encoding trials and 100 recognition trials). The order corresponds to the order of stimuli presented during the task with
the category of each stimulus specified within \intervals\category_name.
NWB file content: units group.
The \units Group contains information about all recorded units (“single neurons”)
after spike sorting, including their electrophysiological features (e.g., spikes, waveforms, etc.).
Event ID    Description
55          Start of Experiment
1           Stimulus Onset
2           Stimulus Offset
3           Question Screen Onset
20, 21      Response During Learning ᵃ
31–36       Response During Recognition ᵇ
6           End of Trial
66          End of Experiment
Table 1. Event markers (“TTLs”) used. ᵃDuring the learning phase, subjects are instructed to respond to the
following question: “Is this an animal?” in each trial. Responses are encoded as “Yes, this is an animal” (20)
and “No, this is not an animal” (21). ᵇDuring the recognition phase, subjects are instructed to respond to
the following question: “Have you seen this image before?” in each trial. Responses are encoded as: 31 (new,
confident), 32 (new, probably), 33 (new, guess), 34 (old, guess), 35 (old, probably), 36 (old, confident). The
‘description’ field within \acquisition\events of the NWB file also contains the information listed in this table.
Experiment ID    Description
80, 83, 88       Learning Phase
81, 84, 89       Recognition Phase
Table 2. Experiment IDs used. The learning and recognition phase are denoted by the IDs listed. A session will
have one of the following ID pairs (learning, recognition): (80, 81), (83, 84) or (88, 89). The ‘description’ field
within \acquisition\experiment_ids of the NWB file indicates the experiment ID used for each phase of the
experiment of that particular session.