Kim et al. Light: Science & Applications (2022) 11:131
https://doi.org/10.1038/s41377-022-00820-w
Official journal of the CIOMP 2047-7538
www.nature.com/lsa

ARTICLE Open Access
Deep learning acceleration of multiscale superresolution localization photoacoustic imaging

Jongbeom Kim1, Gyuwon Kim1, Lei Li2, Pengfei Zhang3, Jin Young Kim1,4, Yeonggeun Kim1, Hyung Ham Kim1, Lihong V. Wang2✉, Seungchul Lee1✉ and Chulhong Kim1,4✉
Abstract
A superresolution imaging approach that localizes very small targets, such as red blood cells or droplets of injected photoacoustic dye, has significantly improved spatial resolution in various biological and medical imaging modalities. However, this superior spatial resolution is achieved by sacrificing temporal resolution, because many raw image frames, each containing the localization target, must be superimposed to form a sufficiently sampled high-density superresolution image. Here, we demonstrate a computational strategy based on deep neural networks (DNNs) to reconstruct high-density superresolution images from far fewer raw image frames. The localization strategy can be applied to both 3D label-free localization optical-resolution photoacoustic microscopy (OR-PAM) and 2D labeled localization photoacoustic computed tomography (PACT). For the former, the required number of raw volumetric frames is reduced from tens to fewer than ten. For the latter, the required number of raw 2D frames is reduced 12-fold. Therefore, our proposed method simultaneously improves temporal (via the DNN) and spatial (via the localization method) resolution in both label-free microscopy and labeled tomography. Deep-learning-powered localization PA imaging can potentially provide a practical tool for preclinical and clinical studies requiring fast temporal and fine spatial resolution.
© The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Correspondence: Lihong V. Wang (LVW@caltech.edu) or Seungchul Lee (seunglee@postech.ac.kr) or Chulhong Kim (chulhong@postech.edu)
1Departments of Electrical Engineering, Mechanical Engineering, Convergence IT Engineering, and Interdisciplinary Bioscience and Bioengineering, Graduate School of Artificial Intelligence, Medical Device Innovation Center, Pohang University of Science and Technology (POSTECH), 77 Cheongam-ro, Nam-gu, Pohang, Gyeongbuk 37673, Republic of Korea
2Caltech Optical Imaging Laboratory, Andrew and Peggy Cherng Department of Medical Engineering, Department of Electrical Engineering, California Institute of Technology, 1200 E. California Blvd., MC 138-78, Pasadena, CA 91125, USA
Full list of author information is available at the end of the article
These authors contributed equally: Jongbeom Kim, Gyuwon Kim, Lei Li

Introduction
Photoacoustic imaging (PAI), a hybrid imaging technology employing optical excitation and ultrasonic detection, enables multiscale in vivo imaging on scales from organelles to organs1,2. PAI generates ultrasonic waves by shining short laser pulses onto biomolecules, which absorb the excitation light pulses, undergo transient thermo-elastic expansion, and transform their energy into ultrasonic waves, called photoacoustic (PA) waves. The induced PA waves are detected by an ultrasound (US) transducer. Depending on the light illumination pattern, the US transducer frequency, and the target imaging depth, the PAI modality is commonly divided into two modes: photoacoustic microscopy (PAM) and photoacoustic computed tomography (PACT). Thus, PAI can provide multiscale and multiparametric imaging solutions covering resolutions from nanometers to millimeters at imaging depths from hundreds of micrometers to several centimeters. From single cells to organs in vivo, preclinical PAI systems have been widely used to obtain several types of information: molecular (e.g., biomarkers, contrast agents, and gene expressions), anatomical (e.g., vasculatures, lymphatic networks, and organs), and functional (e.g., oxygen saturation, blood flows, metabolic rates, brain activity, and responses to drug delivery and treatment)2–14. PAI has also demonstrated its utility in clinical studies of
various cancers, brain diseases, intestinal diseases, and peripheral diseases15–19.
Until now, multiscale PAI systems have evolved by improving their spatial and/or temporal resolutions. For example, in optical-resolution PAM (OR-PAM), the temporal resolution has been technically improved by faster scanning and/or laser systems2. Theoretically, the lateral spatial resolution is limited by optical diffraction, while the bandwidth of the US transducer determines the axial resolution20. Over the last decade, nonlinear PA effects or localization methods, first popularized through single-molecule localization in fluorescence microscopy, such as photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM), have been adapted in OR-PAM to improve its limited spatial resolution10,21–24. Notably, a label-free approach to localization OR-PAM using red blood cells (RBCs) has provided superior spatial resolution without any contrast agent10. However, obtaining a localized image requires tens of 3D OR-PAM images, which can be infeasible. Inescapably, to significantly improve the spatial resolution, the temporal resolution must be sacrificed. In PACT systems, the temporal resolution is technically restricted by their multi-element US detection and the laser pulse repetition rates, and acoustic diffraction fundamentally limits the spatial resolution1,19. Recently, PACT systems using external contrast agents for localization have been actively explored in live animals, in an effort to improve the spatial resolution while maintaining the imaging depth25–28. Localizing and superimposing the externally introduced agents in consecutive regular PACT frames enables superresolution imaging beyond the acoustic diffraction limit. However, similar to localization in OR-PAM, localization in PACT requires that hundreds of thousands of images be overlapped, significantly slowing the temporal resolution.
Computational strategies based on a deep neural network (DNN) have proved effective in improving such biomedical imaging modalities as optical microscopy, US imaging, magnetic resonance imaging (MRI), and computed tomography (CT)29–37. An especially interesting emerging application minimizes data acquisition times by reconstructing dense data from spatially or temporally undersampled sparse data30,31. Here, we introduce DNN-based frameworks to expedite localization-based PAI by reconstructing dense images from sparse information for both 3D label-free localization OR-PAM and 2D labeled localization PACT. Without using any simulated data, we train and validate the DNNs with only in vivo 3D OR-PAM and 2D PACT images. Using only a few frames, our 3D DNN successfully reconstructs 3D dense superresolution OR-PAM images from sparse images, whereas such a dense image generally requires tens of frames to reconstruct. The 2D DNN synthesizes 2D dense superresolution PACT images from sparse images with 12× fewer localized sources than those used for dense images. Our DNN-based localization approach to PAI simultaneously improves the temporal and spatial resolutions, and it could significantly contribute to preclinical and clinical studies requiring fast and fine imaging.
Results
Use of a DNN to reconstruct label-free and labeled localization-based superresolution PA images from sparse ones
Figure 1 shows an overview of our deep-learning (DL)-based framework, which reconstructs a high-density localization-based PA image that includes approximately the same microvascular structural information as a dense localization-based PA image. As ground truth, a high-quality dense localization-based image is created by superimposing N frames in OR-PAM or N target dye droplet images in PACT. As an input to the generators, a poor-quality sparse localization-based image is produced by superimposing k (k ≪ N) frames in OR-PAM or k droplets in PACT, which are randomly selected among the N frames or N droplets (Fig. 1a, b). Because the localization processes of label-free OR-PAM and labeled PACT differ, we reconstructed sparse localization-based images for each case in different ways (Supplementary Text and Figs. S1, S2). For localization OR-PAM, a regular OR-PAM frame was translated into a localization frame (Fig. S1). Then, we randomly selected the translated localization frames to reconstruct sparse localization OR-PAM images. Unlike the OR-PAM localization process, in localization PACT, exogenous absorbers were extracted from regular PACT images. Localized points were then randomly picked to produce a sparse localization PACT image (Fig. S2).
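As an illustration of this pairing step, the short sketch below builds a dense image from all N localization frames and a sparse counterpart from k randomly chosen frames. The function name and the use of a maximum projection as the superposition rule are simplifying assumptions for illustration only; the actual localization pipelines are detailed in Figs. S1 and S2.

```python
import numpy as np

def build_localization_pair(localized_frames, k, rng=None):
    """Superimpose localized frames into a paired (sparse, dense) example.

    localized_frames: array of shape (N, ...) holding N localization frames
    (OR-PAM volumes or PACT droplet maps). The dense image uses all N frames;
    the sparse image uses k randomly chosen frames (k << N).
    """
    if rng is None:
        rng = np.random.default_rng()
    N = localized_frames.shape[0]
    dense = localized_frames.max(axis=0)                 # superposition over all frames
    idx = rng.choice(N, size=k, replace=False)           # random subset of k frames
    sparse = localized_frames[idx].max(axis=0)
    return sparse, dense

# example: 60 volumetric localization frames, sparse image from only 5 of them
frames = np.random.rand(60, 64, 64, 64)
sparse_img, dense_img = build_localization_pair(frames, k=5)
```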
Our framework employs two types of DNNs to cover both label-free localization OR-PAM and labeled localization PACT. Our network for localization OR-PAM contains 3D convolutional layers to maintain the 3D structural information of the volumetric OR-PAM images, and our network for labeled localization PACT has 2D convolutional layers because PACT images are 2D planar images. The DNNs, which are adapted from a pix2pix framework based on a generative adversarial network (GAN) with U-Net38–40, learn voxel-to-voxel or pixel-to-pixel transformations from a sparse localization-based PA image to a dense one. The GAN framework generally consists of a generator network that reconstructs a synthetic image and a discriminator network that outputs the probability that the input image is real or synthetic39. Both networks are trained simultaneously by competing against each other, and as training progresses, the distribution of real images is learned, so that the synthesized images become more similar to real ones.
In our GANs, generators are designed based on U-Net (Fig. 1c), which has recently proven effective for multiscale image learning, especially PA image reconstruction29,31,38,41. The generator for 3D OR-PAM images contains 17 3D convolutional layers and roughly 43 million trainable parameters (Table S1). The generator network for 2D PACT images shares the same structure as the 3D network, with 3D operations replaced by 2D operations, and it contains roughly 102 million trainable parameters (Table S1). One structural difference is that we adopted the pixel shuffle operation in the expansion layer of the 2D localization PACT network, because the transposed convolution operation resulted in unwanted checkerboard artifacts (Table S1)42. We additionally added both short skip connections (via element-wise summation) and long skip connections (via channel-wise concatenation) to the generator to help the training converge quickly and recover the full spatial resolution (Table S1)43. In particular, in the short connection, we used a max-pooling layer to emphasize the local maximum in learning a residual representation of the input data.
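The PyTorch sketch below illustrates these two design elements: a contraction block whose short skip uses max pooling and element-wise summation, and a 2D expansion block that upsamples with pixel shuffle and fuses the long skip by channel-wise concatenation. Channel widths, kernel sizes, and activation choices here are assumptions for illustration only; the full layer specifications are given in Table S1.

```python
import torch
import torch.nn as nn

class ShortSkipBlock3D(nn.Module):
    """Contraction block: convolutions plus a short skip connection whose branch
    uses max pooling and is merged by element-wise summation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=2, padding=1),  # downsample
        )
        # short skip: max-pool the input and match channels with a 1x1x1 convolution
        self.skip = nn.Sequential(
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(in_ch, out_ch, kernel_size=1),
        )

    def forward(self, x):
        return self.body(x) + self.skip(x)           # element-wise summation

class UpBlock2D(nn.Module):
    """Expansion block for the 2D PACT generator: pixel-shuffle upsampling
    (instead of transposed convolution) followed by channel-wise concatenation
    with the corresponding encoder feature map (long skip connection)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(in_ch, out_ch * 4, kernel_size=3, padding=1),
            nn.PixelShuffle(2),                       # rearranges channels into a 2x larger map
        )
        self.fuse = nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1)

    def forward(self, x, enc_feat):
        x = self.up(x)
        x = torch.cat([x, enc_feat], dim=1)           # long skip via concatenation
        return self.fuse(x)
```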
For the 3D model, we concatenated the volumetric sparse localization OR-PAM image and the volumetric regular OR-PAM image and used them as input to the generator to compensate for the vascular structure lacking in the sparse localization-based images30,33. For PACT, on the other hand, this approach performed rather poorly because the regular and sparse images differ in spatial resolution, so the corresponding dense localization-based image was used as the generator's target. Our discriminators consist of five convolutional layers connected in series and contain approximately 5 million trainable parameters for the 3D network and 1.5 million trainable parameters for the 2D network (Fig. S3 and Table S2). The dense localization-based image and the image synthesized by the generator were used as inputs to the discriminator. It is worth mentioning that we first trained our 2D network with localization OR-PAM
maximum amplitude projection (MAP) images, and then we fine-tuned the network using the localization PACT dataset to compensate for the relatively small amount of data in PACT compared to OR-PAM. We incorporated this training strategy because the two angiographic datasets share similar feature spaces that could provide useful guidance to the networks during training. By adopting this transfer learning technique, we could further enhance the 2D networks' reconstruction ability44. While training each network, to save a checkpoint, we evaluated the network at every epoch using a validation set, which consisted of 36 segmented volumetric images of 64 × 64 × 64 pixels (for the 3D network) or 30 planar images of 896 × 1024 pixels (for the 2D network). Network training ended at 200 epochs, and the trained networks were evaluated on an independent test set.

Fig. 1 Overview of 3D-2D hybrid deep-learning localization imaging. Acquisition of the localization a OR-PAM dataset and b PACT dataset. Dense localization-based images are generated using N frames in OR-PAM or N dye droplets in PACT. A sparse localization-based image is constructed using k randomly selected images in OR-PAM or k droplets in PACT (k < N). c Visual representation of the customized 2D and 3D U-Net generator network architecture. Either 3D sparse localization-based and regular OR-PAM images or a 2D sparse localization PACT image are fed as inputs to the generator. OR-PAM optical-resolution photoacoustic microscopy, PACT photoacoustic computed tomography, Sparse local. sparse localization-based, Dense local. dense localization-based
3D label-free localization OR-PAM based on a 3D DNN
Figure 2 shows representative 3D network outputs, where regular OR-PAM images were obtained from a mouse ear in vivo, and sparse images reconstructed with a frame count of 5 were used as input. The total imaging time for the dense localization-based image was 30 s, whereas for the sparse image it was just 2.5 s (Fig. 2a, b). The DNN localization OR-PAM images consist of 12 segmented volumetric images measuring 64 × 64 × 64 pixels along the x, y, and z axes, respectively. In Fig. 2a, we display PA MAP images with an amplitude-based color map that enables comparison of PA amplitude profiles. Additionally, Fig. 2b shows PA MAP images represented with a depth-encoded color map45.
The 3D structural information is well-inherited from the volumetric sparse images, thanks to the 3D operations in our DNN. To emphasize the ability of our trained network to produce 3D volumetric superresolution OR-PAM images, we enlarged the regions outlined by the green dotted boxes "i" in Fig. 2a, which include two adjacent micro blood vessels. Qualitatively, the sparse localization-based MAP image has a lower signal-to-noise ratio (SNR) and sparser vessel connectivity than the dense and generated DNN images. Furthermore, the line profiles of the regions indicated by the white dashed lines in the magnified images are qualitatively comparable between the DNN and dense localization-based MAP images (Fig. 2c). The two adjacent blood vessels are clearly resolved in the DNN and dense localization-based images, whereas they are not in the regular OR-PAM image. The profile from the sparse image indicates a lower SNR.
To demonstrate the advantage of using our 3D networks to reconstruct volumetric superresolution OR-PAM images, we also extracted B-scan images in the regions highlighted by the blue dashed lines "ii" in Fig. 2a (Fig. 2d). The profiles were measured in the regions highlighted by the white dashed lines in the B-mode images. Similar to the profiles in the MAP images, the unbranched blood vessels in the regular PA image are well distinguished in the profiles of the DNN and dense localization-based images. Notably, a blood vessel in the sparse image is invisible, whereas the same blood vessel is revealed with high contrast in the DNN localization-based image. Also note that our network helps visualize vessel connectivity. A blood vessel highlighted by the white dashed circles, in which the sparse image has a low SNR, is well restored in the DNN localization-based image. Even though the sparse image does not contain the vessels, they are restored in the DNN localization-based
image because our network is based on 3D convolutions, allowing for the reference of adjacent pixels in 3D space. These results prove that our DL-based framework can reconstruct a dense 3D super-resolved OR-PAM image from a sparse one, and can reduce the imaging time for an agent-free localization OR-PAM image by a factor of 12 (Movie S1).

Fig. 2 Performance of 3D deep-learning localization OR-PAM. a MAP and b depth-encoded mouse ear images of regular, sparse localization, DNN localization, and dense localization OR-PAM. Frame counts of 60 and 5 are used for the dense and sparse localization-based images, respectively. Close-up views of the regions outlined by the green dashed boxes and cross-sectional B-mode images of the regions highlighted by the blue dashed lines in a are displayed. Profiles of the PA amplitude are indicated by the white dashed lines in the close-up views of c MAP and d B-mode images, respectively. OR-PAM optical-resolution photoacoustic microscopy, MAP maximum amplitude projection, PA photoacoustic, Sparse local. sparse localization-based PA image, DNN local. deep neural network localization-based PA image, Dense local. dense localization-based PA image, ROI region of interest, Norm. PA amp. normalized PA amplitude
The number of frames used for the reconstruction of agent-free 3D localization OR-PAM images directly determines the quality of the superresolution localization-based image. We prepared training, validation, and test datasets with 2, 3, 4, 5, 6, 8, 10, 15, and 30 frames to compare the output qualities and trained nine generator networks for the 3D localization OR-PAM (Fig. 3). Each trained generator was applied to the test set, which included 240 segmented volumetric images with pixel counts of 64 × 64 × 64 along the x, y, and z axes, respectively, reconstructed with a frame count corresponding to that of the training set. The results are summarized in Fig. 3. The sparse localization-based images reconstructed with frame counts of 2, 6, 10, 15, and 30 (Fig. 3a) and their corresponding DNN localization-based images (Fig. 3b) are displayed. A dense localization-based image was reconstructed with a frame count of 60 (Fig. 3c). For the input frame count of 2, the overall blood vessel structures are well restored, but the blood vessels are clumped in the enlarged image. As the frame count increases, the clumped vessels disappear, and the DNN localization-based images become similar to the dense localization OR-PAM image. Additionally, the 3D peak signal-to-noise ratio (PSNR) and 3D multiscale structural similarity (MS-SSIM) between the DNN or sparse images and the dense images were calculated for frame counts of 2, 3, 4, 5, 6, 8, 10, 15, and 30 (Fig. 3d, e)46. Both the PSNR and MS-SSIM increase with the number of repetitions (Fig. 3d). A PSNR value of 40.70 dB and an MS-SSIM of 0.97 are achieved at a frame count of 5 for the DNN localization-based images, while the corresponding metrics for the sparse images are 38.47 dB and 0.89, respectively. Our network achieved MS-SSIM values of above 0.98 for input frame counts above 10.
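For reference, the 3D PSNR reported here can be computed directly from the voxel-wise mean squared error between a reconstructed volume and the dense ground-truth volume, as in the short sketch below (normalized amplitudes assumed; the MS-SSIM metric additionally requires a multiscale SSIM implementation46).

```python
import numpy as np

def psnr_3d(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio (in dB) between a DNN or sparse localization
    volume and the dense localization volume used as ground truth.
    Inputs are arrays of identical shape; data_range is the maximum possible
    amplitude (1.0 for normalized PA amplitude)."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

# example with two random 64 x 64 x 64 volumes
dnn_vol = np.random.rand(64, 64, 64)
dense_vol = np.random.rand(64, 64, 64)
print(f"3D PSNR: {psnr_3d(dnn_vol, dense_vol):.2f} dB")
```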
To demonstrate the extrapolation ability of our trained networks on datasets with various numbers of frames, we compared the evaluation metrics (3D PSNR and 3D MS-SSIM) obtained with all combinations of the frame counts of the trained networks and the sparse images (Fig. S4a, b and Table S3). In each column, containing the scores obtained with various frame counts of the sparse images and a fixed frame count of the trained network, the top three scores are bolded in green. Scores lower than those of the sparse images are bolded in red.
Fig. 3 Performance of 3D deep-learning localization OR-PAM depending on frame counts. a Sparse localization OR-PAM images reconstructed with 2, 6, 10, 15, and 30 frames. b DNN localization OR-PAM images generated from the sparse images. c A dense localization OR-PAM image reconstructed with 60 frames. All images correspond to the images in Fig. 2. Graphs for d 3D PSNR and e 3D MS-SSIM evaluation metrics for frame counts of 2, 3, 4, 5, 6, 8, 10, 15, and 30. OR-PAM optical-resolution photoacoustic microscopy, PA photoacoustic, Sparse local. sparse localization-based PA image, DNN local. deep neural network localization-based PA image, Dense local. dense localization-based PA image, Rep. repetition count, Norm. PA amp. normalized PA amplitude, PSNR peak signal-to-noise ratio, MS-SSIM multiscale structural similarity
Note that the test dataset with 30 frames does not always
enhance the image quality in each column, because the
input images are already perceptually similar to the
ground truth. Both metrics have high values in cases
where the frame count of the dataset used in training is
similar to that of the test dataset image, which follows
intuitively. Although scores for combinations with a large difference between the frame counts of the training and test sets were lower than the score of the input image, the network results were further improved in most combinations. These
results demonstrate that our DNN framework can
improve the quality of a sparse image, even if the quality
of the sparse image used for training differs from that of
an actual input image to be tested. Thus, to some extent,
our 3D DNNs can extrapolate to data not included in the
training dataset.
2D labeled localization PACT based on a 2D DNN
Representative 2D network results, including regular PACT, sparse localization-based, DNN localization-based, and dense localization-based images, are displayed in Fig. 4, where regular PACT images were obtained from a mouse brain in vivo. The dense localization-based image was reconstructed with 240,000 dye droplets, whereas 20,000 droplets were used to generate the sparse localization-based image measuring 896 × 1024 pixels along the x and y axes, respectively (Fig. 4a). Obtaining the dense localization PACT image took half an hour25, but only 2.5 min were required to acquire the sparse PACT image. We enlarged the two areas indicated by the green and blue dotted boxes in Fig. 4a to observe the synthetic ability of our network in detail. The connectivity of blood vessels can be compared in the magnified images: it is difficult to recognize the vascular morphology in the regular and sparse localization-based images, whereas the DNN and dense images exhibit microvasculatures. Furthermore, we obtained the profiles of the regions indicated by the white dotted lines in the magnified images to qualitatively compare the improvement (Fig. 4b, c). The graphs for the DNN and dense localization PACT images depict two blood vessels not captured in the regular and sparse images. The amplitudes of the blood vessels in the DNN and dense localization-based images are also larger than those in the regular and sparse images, which means that the network can provide a higher SNR and contrast than the sparse image. These results suggest that our DL-based framework can provide the super-resolved PACT image 12× faster than a conventional method (Movie S2).
To investigate the effect of the number of droplets on the quality of the output images synthesized by our DL network, as in the study on localization OR-PAM, we used various numbers of droplets (i.e., 1/32, 1/28, 1/24, 1/20, 1/16, 1/12, 1/8, 1/4, and 1/2 of the dense images' droplet counts) (Fig. 5). Each trained generator was applied to the test set, which consisted of 200 planar images measuring 896 × 1024 pixels along the x and y axes, respectively, reconstructed with a droplet count corresponding to that of the training set. Sparse localization-based images used as input were reconstructed with droplet counts of 7.5k, 15k, 30k, 60k, and 120k (Fig. 5a), and the corresponding DNN localization-based images are synthesized as output (Fig. 5b). A dense localization PACT image reconstructed with a droplet count of 240k is displayed (Fig. 5c). For a more detailed comparison, we zoomed in on a specific area in each image. Although the result for the droplet count of 7.5k compares poorly with the dense localization-based image, the sparse images with droplet counts of above 15k were restored similarly to the dense image. Additionally, we compared
the 2D PSNR and 2D MS-SSIM evaluation metrics to quantify the ability of the 2D networks (Fig. 5d, e). As the droplet count of the sparse image increases, the localization PACT image becomes denser, and thus the PSNR and MS-SSIM increase. The results demonstrate that our DL-based framework can reconstruct high-quality superresolution localization PACT images within a much shorter imaging time than typical localization PACT imaging.

Fig. 4 Performance of 2D deep-learning localization PACT. a Regular, sparse localization, DNN localization, and dense localization PACT images of a mouse brain. Droplet counts of 240,000 and 20,000 are used for the dense and sparse localization-based images, respectively. Close-up views of the regions outlined by the i green and ii blue dashed boxes in a are displayed. Profiles of the PA amplitude indicated by the dashed lines in the b i and c ii close-up images, respectively. PACT photoacoustic computed tomography, PA photoacoustic, Sparse local. sparse localization-based PA image, DNN local. deep neural network localization-based PA image, Dense local. dense localization-based PA image, ROI region of interest, Norm. PA amp. normalized PA amplitude
Similar to the extrapolation study in localization OR-
PAM, we compared the PSNR and MS-SSIM evaluation
metrics with various droplet counts of the sparse images
used for network training and the test set (Figs. S4c, d and
Table S4). The top three values in each column are bolded
in green, and output scores lower than the input are
bolded in red. Contrary to the results from localization
OR-PAM, the test datasets with high numbers of droplets
show high scores in most columns. Most of the generated
outputs also produced higher evaluation metric values
than sparse images, proving the extrapolation ability of
the 2D network. A possible reason for the improved
generalizability performance compared to the 3D network
is that we incorporated a transfer learning strategy when
training the 2D networks44. Thus, the 2D networks were
trained with datasets from a broader range of feature
spaces (localization OR-PAM and PACT), enabling
improved generalizability performance. The results
demonstrate that our 2D networks are robust to data
variations regarding the localized droplet count.
Discussion
For use with label-free OR-PAM and labeled PACT, we introduce fast localization-based PA imaging based on a DL method that reduces the need for large numbers of images. Conventional localization methods for both OR-PAM and PACT achieve super-resolved microvasculature images by continuously imaging a target and then localizing the absorbers (i.e., RBCs for label-free OR-PAM and dye droplets for labeled PACT). However, consecutive imaging slows down the temporal resolution, limiting the widespread use of the technique in preclinical and clinical applications requiring fast imaging. The realized DL-based framework synthesizes dense localization OR-PAM/PACT images from sparse
reconstructed ones with tens of times fewer frames or dye droplets than are used in conventional dense images. Our framework can reduce the data acquisition time by 12-fold for both localization OR-PAM (MS-SSIM > 0.97) and localization PACT (MS-SSIM > 0.92). These results demonstrate that our technique could dramatically enhance the temporal resolution of both superresolution localization OR-PAM and PACT without qualitative sacrifices.

Fig. 5 Performance of 2D deep-learning localization PACT as a function of droplet counts. a Sparse localization PACT images reconstructed with droplet counts of 7.5k, 15k, 30k, 60k, and 120k. b DNN localization PACT images generated from the sparse images. c A dense localization PACT image reconstructed with 240k droplets. All images correspond to the images in Fig. 4. Graphs for d 2D PSNR and e 2D MS-SSIM evaluation metrics for droplet counts of 7.5k, 8.6k, 10k, 12k, 15k, 20k, 30k, 60k, and 120k. PACT photoacoustic computed tomography, PA photoacoustic, Sparse local. sparse localization-based PA image, DNN local. deep neural network localization-based PA image, Dense local. dense localization-based PA image, Norm. PA amp. normalized PA amplitude, PSNR peak signal-to-noise ratio, MS-SSIM multiscale structural similarity
In detail, the framework consists of two subnetworks, which are developed with 2D and 3D layers, respectively, to cover both label-free volumetric localization OR-PAM images and labeled planar localization PACT images. Each subnetwork is adapted from the pix2pix framework, whose generator is based on the U-Net architecture38,40. In the training process of the 2D network, the localization OR-PAM MAP images were first used as input because of the relatively small amount of data in PACT compared to OR-PAM. After pre-training with the localization OR-PAM dataset, the network was fine-tuned with the localization PACT dataset, a process called transfer learning44. This training method allowed us to train the 2D networks successfully with relatively small amounts of PACT data.
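To make the two-stage strategy concrete, the sketch below shows only the mechanics of the transfer-learning step under simplifying assumptions: G2D is a toy stand-in for the actual 2D U-Net generator, the file name is hypothetical, and the reduced fine-tuning learning rate is an illustrative assumption rather than a reported hyperparameter.

```python
import torch
import torch.nn as nn

class G2D(nn.Module):
    """Toy stand-in for the 2D generator; the real architecture is in Table S1."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 1, 3, padding=1))
    def forward(self, x):
        return self.net(x)

generator = G2D()
# Stage 1: pre-train on localization OR-PAM MAP pairs, then save the weights.
torch.save(generator.state_dict(), "pretrained_orpam_map.pt")

# Stage 2: transfer learning -- load the pre-trained weights and fine-tune on the
# smaller localization PACT dataset (typically with a reduced learning rate).
generator.load_state_dict(torch.load("pretrained_orpam_map.pt"))
finetune_opt = torch.optim.Adam(generator.parameters(), lr=5e-5)
```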
Prior to our work, DNNs had been utilized in superresolution localization fluorescence microscopy (i.e., PALM and STORM) to accelerate the localization imaging process by reducing the total number of frames and localizations required to reconstruct a superresolution localization image30. However, our work differs in that PAI is scalable from microscopy to computed tomography, covering images on scales from micrometers to millimeters. Thus, our framework can extend to preclinical/clinical applications on various scales. Furthermore, we have shown feasibility not only for 2D image data but also for 3D volume structures (OR-PAM) by designing 3D convolutional neural networks, which had not been demonstrated in previous works.
An important caveat in our framework is the limited memory size of the graphics processing unit (GPU). Preprocessing is necessary because our DNNs use preprocessed sparse localization OR-PAM or PACT images as input, rather than regular images. For the 3D network, we use 3D convolutional layers to keep the 3D structural information intact; therefore, 3D volumetric images are used as input. However, 3D images contain many more pixels than 2D images, and 3D convolutional kernels store more trainable parameters than 2D kernels. Therefore, a sparse localization-based image is used as the input to synthesize a dense localization-based image, instead of multiple regular OR-PAM images. For localization PACT, a total of 36,000 regular PACT images are used to synthesize dense localization-based images, and at least 1125 images are used to synthesize sparse localization-based images. Because using regular PACT images as input would overflow the GPU memory, we instead use a preprocessed sparse localization-based image as input. Localization preprocessing is also cumbersome and time-consuming, and the framework would be much more user-friendly if regular PA images were used as input instead of sparse localization-based images. Using an auxiliary recurrent neural network (RNN) to predict the flow positions of absorbers with a minimum number of frames might enable a framework with regular PA images as input and accelerate our framework further, a topic for future work.
Another avenue for future research is further investigating the black-box mechanism of the proposed DNNs, thus strengthening the reliability and interpretability of our method. Saliency mapping algorithms (e.g., gradient-based class activation mapping47 and layer-wise relevance propagation48) can be utilized to better understand how the highly nonlinear 2D and 3D convolutional filters operate to reconstruct dense images from sparse ones. Such studies could provide valuable insight for designing a DNN model that is more robust to problems such as false blood flow generation.
Although our initial study was conducted with OR-PAM images of mouse ears and PACT images of mouse brains, we believe that our established networks could, to a certain extent, extrapolate to similar angiographic data, since microvascular profiles share morphological analogies between similar sample types and structures (e.g., mammalian retina, ear, brain, and subcutaneous microvessels)10,14. Therefore, we aim to continually refine our DL frameworks' generalizability by training with more images from various in vivo sample types and angiographic structures. Furthermore, by combining our established framework with transfer learning techniques44, acquiring the large amount of data required for retraining can be circumvented.
By reducing the image count needed in localization-based PA methods, our DL framework enhances the promising potential of existing in vivo label-free localization OR-PAM and labeled localization PACT. This framework provides superresolution PA images tens of times faster than conventional methods, so it can be used to study phenomena such as immediate drug responses that cannot be observed with conventional localization methods. For superresolution OR-PAM images, dense localization-based images are synthesized with the 3D structural information of sparse localization OR-PAM images intact. One practical result is that this new method can be used in diagnosing skin conditions and skin diseases, such as skin tumors, warts, and fungal infections, that require accurate structural information. Utilizing the framework can also significantly reduce the laser irradiation and imaging time, reducing the subject's burden during imaging. In addition, it increases the potential utility of localization PA imaging in neuroscience for monitoring brain hemodynamics and neuronal activity. The improved temporal resolution makes high-quality monitoring possible by sampling at a higher rate, allowing analysis of fast changes that cannot be observed with conventional low temporal resolution.
Materials and methods
Volumetric localization OR-PAM image acquisition and preprocessing
Volumetric image data were obtained from a galvanometer scanner OR-PAM system (OptichoM, Opticho, South Korea), shown in Fig. S5. The system imaged a region of interest (ROI) in a mouse ear over two hundred times. The obtained volumetric data measured 256 pixels along the z axis, with a pixel size of 3 μm. The pixel sizes along the x and y axes were 3.75 μm and 5 μm, respectively. To use GPU memory efficiently, we reduced the number of pixels in the axial direction by four times with bicubic downsampling and antialiasing in the B-mode images. Considering the theoretical axial resolution limit of over 114 μm for OR-PAM systems, this reduction increased the training efficiency of the 3D DL networks, which had limited GPU memory (Supplementary Materials and Methods). Our previously reported agent-free localization imaging process was used in the current work (Supplementary Text)10. As in the previously reported study, volumetric localization OR-PAM images were reconstructed from 60 frames randomly selected from the obtained data. The reconstructed image, called a dense localization OR-PAM image, is the target for training and the ground truth for evaluation. A corresponding regular OR-PAM image was randomly selected from among the 60 images. Using the same imaging process, a corresponding sparse localization OR-PAM image was reconstructed with k < 60 images randomly selected from among the 60 images. Regular, sparse localization, and dense localization OR-PAM images were paired. To standardize the image pixel size, we cropped the volume images with different pixel dimensions to 150 × 150 × 64 pixels. Before being fed into our DNNs, the volumetric localization OR-PAM images were augmented by random cropping to a size of 64 × 64 × 64 pixels and random flipping in the x and y axes (with a flip probability of 0.5). A total of ~3000 pairs were prepared.
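A minimal NumPy sketch of this augmentation step (a random 64 × 64 × 64 crop plus random x/y flips applied identically to the paired volumes) is shown below; the array ordering and the function name are assumptions for illustration only.

```python
import numpy as np

def augment_pair(sparse_vol, regular_vol, dense_vol, crop=64, flip_p=0.5, rng=None):
    """Random 64 x 64 x 64 crop plus random flips along the x and y axes
    (flip probability 0.5), applied identically to the paired sparse localization,
    regular, and dense localization OR-PAM volumes. Arrays are assumed to be
    ordered (x, y, z), e.g., 150 x 150 x 64 voxels."""
    if rng is None:
        rng = np.random.default_rng()
    starts = [rng.integers(0, s - crop + 1) for s in sparse_vol.shape]
    sl = tuple(slice(s, s + crop) for s in starts)
    vols = [v[sl] for v in (sparse_vol, regular_vol, dense_vol)]
    for axis in (0, 1):                               # x and y axes only
        if rng.random() < flip_p:
            vols = [np.flip(v, axis=axis) for v in vols]
    return vols
```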
Planar localization PACT image acquisition and preprocessing
RF signals acquired from the 512-channel DAQ systems were first jitter-corrected by using the PA signals from the surfaces of the ultrasonic transducer elements as reference timings (Fig. S7). The conventional PACT images were constructed using the dual-speed-of-sound universal back-projection algorithm, with a pixel size of 25 μm49. To trace injected dye droplets in the brain, we applied our previously reported algorithm to the conventional PACT images, precisely localizing the center of each droplet (Supplementary Text)25. Adding up all the N droplets yielded a superresolution image, called a dense localization PACT image, defined as the target for training and the ground truth. Among the N droplets, k droplets (k < N) were randomly selected to reconstruct a sparse localization PACT image. A pixel size of 5 μm was used in the superresolution image reconstruction. The sparse and dense localization PACT images were paired. To mimic localization OR-PAM MAP images and accommodate the transfer learning process, the PACT images were reduced from 2000 × 2400 pixels to 896 × 1024 pixels. The images were cropped to 512 × 768 pixels for the training set to utilize only regions with rich vascular profiles and flipped in the x and y axes (with a flip probability of 0.5) for augmentation. A total of ~500 pairs were prepared.
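For training, the paired sparse and dense localization images are typically wrapped in a dataset object so that each iteration yields an (input, target) pair. The sketch below is a simplified, hypothetical example for the 2D case; the class name and the transform hook are illustrative rather than the authors' implementation.

```python
import torch
from torch.utils.data import Dataset

class PairedLocalizationDataset(Dataset):
    """Minimal paired dataset sketch: each item returns a sparse localization
    image (network input) and its dense localization counterpart (target).
    sparse_imgs and dense_imgs are assumed to be equally long lists of 2D
    NumPy arrays prepared as described above."""
    def __init__(self, sparse_imgs, dense_imgs, transform=None):
        assert len(sparse_imgs) == len(dense_imgs)
        self.sparse_imgs, self.dense_imgs = sparse_imgs, dense_imgs
        self.transform = transform                   # e.g., crop/flip augmentation

    def __len__(self):
        return len(self.sparse_imgs)

    def __getitem__(self, i):
        sparse, dense = self.sparse_imgs[i], self.dense_imgs[i]
        if self.transform is not None:
            sparse, dense = self.transform(sparse, dense)
        to_tensor = lambda a: torch.from_numpy(a).float().unsqueeze(0)  # add channel dim
        return to_tensor(sparse), to_tensor(dense)
```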
Artificial neural network
The suggested framework is customized from the pix2pix architecture40, a special conditional GAN for image-to-image problems. The framework consists of two distinct DNN models: (1) a 3D model built with 3D operations for volumetric OR-PAM images, and (2) a 2D model built with 2D operations for planar PACT images. Although each model employs operations of different dimensions, their architectures are unified (Figs. 1 and S3 and Tables S1, S2). Each model includes a generator network G and a discriminator network D. The generator network G, adapted from U-Net, consists of an encoder network (downsampling blocks in Fig. 1) and a decoder network (upsampling blocks in Fig. 1). Each network is further presented in Fig. 1c and Table S1. In the 3D model, the encoder takes two-channel images, comprising a regular OR-PAM image and a sparse localization OR-PAM image. In contrast, a sparse localization PACT image is fed into the encoder in the 2D model. Each model adopts a different up-sampling method: transposed convolution for 3D, and pixel shuffle for 2D42. In the 2D model, the spatial dropout50 and batch normalization51 layers were omitted because these operations deteriorated the results (Table S1). The discriminator network consists of four convolution blocks in series, using the leaky rectified linear unit52 as the main activation function, and an output convolution layer with a sigmoid activation function (Fig. S3 and Table S2).
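A minimal sketch of such a discriminator is given below. The channel widths, kernel sizes, and strides are assumptions for illustration (the exact layout is in Table S2), but the overall pattern of four serial convolution blocks with leaky ReLU activations followed by a sigmoid output convolution matches the description above.

```python
import torch.nn as nn

class Discriminator2D(nn.Module):
    """Sketch of a 2D discriminator: four serial convolution blocks with
    LeakyReLU activations, then an output convolution with a sigmoid that
    estimates the probability that the input is a real dense image."""
    def __init__(self, in_ch=1, base=32):
        super().__init__()
        chs = [in_ch, base, base * 2, base * 4, base * 8]
        blocks = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        blocks += [nn.Conv2d(chs[-1], 1, kernel_size=4, padding=1),
                   nn.Sigmoid()]
        self.model = nn.Sequential(*blocks)

    def forward(self, x):
        return self.model(x)
```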
DL training is generally performed by minimizing the objective function (also called the loss function). We designed our loss functions using an adversarial training scheme consisting of a generator network G and a discriminator network D, which we optimized in an alternating manner to solve the adversarial min-max problem and boost the reconstruction performance:
$$\min_G \max_D \; \mathbb{E}_{y \sim P_{\mathrm{data}}(y)}\big[\log D(y)\big] + \mathbb{E}_{x \sim P_{\mathrm{data}}(x)}\big[\log\big(1 - D(G(x))\big)\big] \quad (1)$$

where x denotes the sparse localization PAI image used as input, and y denotes the corresponding dense PAI image used as the ground truth. The idea is that we train our generator network G to fool the discriminator that distinguishes the reconstructed PAIs from their dense localization counterparts. The adversarial training strategy allows our generator network G to create perceptually superior images residing in the manifold of the real dense PAIs. The adversarial loss function for our 3D localization OR-PAM network is defined as follows:
$$\mathcal{L}_{3D} = 0.01 \times \frac{1}{N}\sum \left| y - G(x) \right| + \log D\big(G(x)\big) \quad (2)$$

where N denotes the number of pixels in each OR-PAM image. We implemented the loss function by combining the mean absolute error (MAE) with the adversarial loss instead of the mean squared error, which yields poor results in image-to-image translation tasks39,53. For the 2D localization PACT network, we additionally incorporated the MS-SSIM loss because it better preserved the contrast in high-frequency regions53. The pre-training loss function for the transfer learning process is defined as follows:
$$\mathcal{L}_{2D}^{TL} = 0.3 \times \frac{1}{N}\sum \left| y - G(x) \right| + 0.7 \times \big(1 - \mathrm{MSSSIM}\big(y, G(x)\big)\big) \quad (3)$$

where TL denotes transfer learning, and the MSSSIM function calculates the corresponding metric. After pre-training the generator networks, we further trained the networks with the PACT dataset, using the full adversarial loss defined as follows:
$$\mathcal{L}_{2D} = 0.03 \times \frac{1}{N}\sum \left| y - G(x) \right| + 0.07 \times \big(1 - \mathrm{MSSSIM}\big(y, G(x)\big)\big) + \log D\big(G(x)\big) \quad (4)$$
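As a concrete illustration of the reconstruction terms shared by Eqs. (3) and (4), the sketch below computes the weighted MAE and MS-SSIM components of the pre-training loss in Eq. (3). It assumes the third-party pytorch-msssim package (any MS-SSIM implementation with an equivalent interface would work), images normalized to [0, 1], and tensors shaped (batch, 1, H, W); the adversarial term of Eq. (4) would be added on top of this during fine-tuning.

```python
import torch
from pytorch_msssim import ms_ssim  # assumed third-party MS-SSIM implementation

def pretrain_loss_2d(generated, dense, w_mae=0.3, w_msssim=0.7):
    """Weighted MAE + (1 - MS-SSIM) reconstruction loss as in Eq. (3).
    generated: G(x); dense: ground-truth dense localization image y."""
    mae = torch.mean(torch.abs(dense - generated))                     # (1/N) * sum |y - G(x)|
    msssim_term = 1.0 - ms_ssim(generated, dense, data_range=1.0)      # 1 - MSSSIM(y, G(x))
    return w_mae * mae + w_msssim * msssim_term
```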
The MS-SSIM loss was not used when training the 3D networks: using only the MAE loss provided better results with stable performance. All trainable parameters were initialized using the He normal initialization method52 and optimized using the Adam optimizer54. In addition, an L2 regularization technique was incorporated to avoid overfitting the network parameters55. To set model checkpoints, we calculated the MS-SSIM metrics of the validation set during training. All hyper-parameters, including the loss function coefficients, were searched using a grid search approach and were found sufficient for all established networks (Table S5). All networks were implemented using Python 3.8.3 with a PyTorch backend. The 3D localization OR-PAM network training was conducted on NVIDIA RTX 3090 GPUs and an Intel® Core™ i9-10900X CPU. The 2D localization PACT network training was conducted on an NVIDIA TITAN Xp GPU and an Intel® Core™ i5-8400 CPU.
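The alternating optimization and checkpointing procedure described above can be outlined as follows. This is a minimal sketch, not the authors' code: generator, discriminator, generator_loss, the data loaders, and the evaluation helper are placeholders passed in by the caller, and the learning-rate and weight-decay (L2 regularization) values are assumptions rather than the grid-searched values in Table S5.

```python
import torch

def train_gan(generator, discriminator, generator_loss, train_loader, val_loader,
              evaluate_msssim, epochs=200, lr=2e-4, l2=1e-5, ckpt="best_generator.pt"):
    """Alternating generator/discriminator updates with Adam, L2 regularization
    (via weight_decay), and per-epoch checkpointing on validation MS-SSIM."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, weight_decay=l2)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, weight_decay=l2)
    bce = torch.nn.BCELoss()
    best = -1.0
    for epoch in range(epochs):
        for sparse, dense in train_loader:
            fake = generator(sparse)

            # discriminator step: dense (real) -> 1, synthesized -> 0
            opt_d.zero_grad()
            pred_real = discriminator(dense)
            pred_fake = discriminator(fake.detach())
            d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
                     bce(pred_fake, torch.zeros_like(pred_fake))
            d_loss.backward()
            opt_d.step()

            # generator step: reconstruction plus adversarial terms, e.g., Eq. (2) or (4)
            opt_g.zero_grad()
            g_loss = generator_loss(fake, dense, discriminator(fake))
            g_loss.backward()
            opt_g.step()

        # checkpoint whenever the validation MS-SSIM improves
        score = evaluate_msssim(generator, val_loader)
        if score > best:
            best = score
            torch.save(generator.state_dict(), ckpt)
    return best
```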
PAI of animals in vivo
For OR-PAM, animal procedures in all experiments followed the regulations of the National Institutes of Health Guide for the Care and Use of Experimental Animals, with permission from the Institutional Animal Care and Use Committee of Pohang University of Science and Technology (POSTECH). During PAI, female Balb/c mice, 3–8 weeks old, were anesthetized by inhalation of 4% isoflurane gas at a 1.0 L/min flow rate. A silicone heating pad under the mouse kept the animal's body warm. The imaging experiments used a 532 nm wavelength laser with a pulse fluence of 10 mJ/cm2, less than the ANSI safety limit of 20 mJ/cm2. Before imaging, hair was removed with a depilatory agent to maximize the PA signal. Ultrasonic gel was applied between the polyvinyl chloride membrane of the water tank and the ear of the mouse to match the acoustic impedances between the ear and the ultrasonic transducer. For PACT, all experimental procedures were conducted according to a laboratory animal protocol (IA20-1737) approved by the Institutional Animal Care and Use Committee of the California Institute of Technology. In the PACT animal experiments, 6–8-week-old female mice (Swiss Webster, Invigo) were used. The left carotid artery of the mouse was cannulated with a polytetrafluoroethylene catheter, through which the droplet suspension was injected to administer droplets into the brain. The cannulation procedure followed the protocol reported previously56. Before brain imaging, the hair on the mouse head was removed by depilatory cream, and the scalp was cut open, but the skull was kept intact. During in vivo imaging, the mouse was fixed on a lab-made animal holder with its cortical plane oriented horizontally and was anesthetized with 1.5% isoflurane at an air flow rate of 1 L/min. The temperature of the mouse was regulated at ~38 °C. A piece of plastic Saran™ wrap was used to seal the bottom of the full-ring ultrasonic transducer array, and the chamber was filled with water for acoustic coupling. The mouse was placed under the water chamber of the imaging system, and US gel was applied between the skull and the plastic wrap for acoustic coupling. The holder was then lifted until the brain's cortical layer was in the focal plane of the transducer array. The maximum light fluence on the surface of the animal was ~30 mJ/cm2, which is below the American National Standards Institute safety limit at 780 nm.
Acknowledgements
J.K. would like to thank Joongho Ahn for fruitful discussions about the
operating software of the OR-PAM system. This research was supported by
Basic Science Research Program through the National Research Foundation of
Korea (NRF), funded by the Ministry of Education (2020R1A6A1A03047902),
supported by National R&D Program through the NRF funded by the Ministry
of Science and ICT (MSIT) (2020M3H2A1078045), supported by the NRF grant
funded by the Korea government MSIT (No. NRF-2019R1A2C2006269 and No.
2020R1C1C1013549). This work was partly supported by the Institute of
Information & communications Technology Planning & Evaluation (IITP) grant
funded by the Korea government MSIT (No. 2019-0-01906, Artificial Intelligence
Graduate School Program (POSTECH)) and Korea Evaluation Institute of
Industrial Technology (KEIT) grant funded by the Ministry of Trade, Industry
and Energy (MOTIE). This work was also supported by the Korea Medical
Device Development Fund grant funded by the MOTIE (9991007019,
KMDF_PR_20200901_0008). It was also supported by the BK21 Four project.
Author details
1Departments of Electrical Engineering, Mechanical Engineering, Convergence IT Engineering, and Interdisciplinary Bioscience and Bioengineering, Graduate School of Artificial Intelligence, Medical Device Innovation Center, Pohang University of Science and Technology (POSTECH), 77 Cheongam-ro, Nam-gu, Pohang, Gyeongbuk 37673, Republic of Korea. 2Caltech Optical Imaging Laboratory, Andrew and Peggy Cherng Department of Medical Engineering, Department of Electrical Engineering, California Institute of Technology, 1200 E. California Blvd., MC 138-78, Pasadena, CA 91125, USA. 3School of Precision Instruments and Optoelectronics Engineering, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China. 4Opticho, 532, CHANGeUP GROUND, 87 Cheongam-ro, Nam-gu, Pohang, Gyeongsangbuk 37673, Republic of Korea
Author contributions
C.K. and J.K. conceived and designed the study. J.K., J.Y.K., Y.K., and L.L.
constructed the imaging systems. J.K., L.L., and P.Z. contributed to managing
the imaging systems for collecting the raw data. J.K., G.K., and L.L. developed the image processing algorithms and DL networks. J.K. and G.K. contributed to performing the training of the DNNs and analyzing the results. C.K. supervised the entire project. J.K., G.K., and L.L. prepared the figures and wrote the manuscript
under the guidance of C.K., L.V.W., and S.L. All authors contributed to the
critical reading and writing of the manuscript.
Data availability
All data are available within the Article and Supplementary Files or available
from the authors upon request.
Conflict of interest
C. Kim and J.Y. Kim have financial interests in Opticho, and the OR-PAM system (i.e., OptichoM) was supported by Opticho. L.V. Wang has financial interests in
Microphotoacoustics, Inc., CalPACT, LLC, and Union Photoacoustic
Technologies, Ltd., which did not support this work.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s41377-022-00820-w.
Received: 21 November 2021 Revised: 24 April 2022 Accepted: 26 April
2022
References
1. Wang, L. V. & Hu, S. Photoacoustic tomography: in vivo imaging from organelles to organs. Science 335, 1458–1462 (2012).
2. Jeon, S. et al. Review on practical photoacoustic microscopy. Photoacoustics 15, 100141 (2019).
3. Jeon, S. et al. In vivo photoacoustic imaging of anterior ocular vasculature: a random sample consensus approach. Sci. Rep. 7, 4318 (2017).
4. Kim, H. et al. PAExM: label-free hyper-resolution photoacoustic expansion microscopy. Opt. Lett. 45, 6755–6758 (2020).
5. Baik, J. W. et al. Super wide-field photoacoustic microscopy of animals and humans in vivo. IEEE Trans. Med. Imaging 39, 975–984 (2020).
6. Kim, J. Y. et al. Fast optical-resolution photoacoustic microscopy using a 2-axis water-proofing MEMS scanner. Sci. Rep. 5, 7932 (2015).
7. Wong, T. T. W. et al. Label-free automated three-dimensional imaging of whole organs by microtomy-assisted photoacoustic microscopy. Nat. Commun. 8, 1386 (2017).
8. Shi, J. H. et al. High-resolution, high-contrast mid-infrared imaging of fresh biological samples with ultraviolet-localized photoacoustic microscopy. Nat. Photonics 13, 609–615 (2019).
9. Yao, J. J. et al. High-speed label-free functional photoacoustic microscopy of mouse brain in action. Nat. Methods 12, 407–410 (2015).
10. Kim, J. et al. Superresolution localization photoacoustic microscopy using intrinsic red blood cells as contrast absorbers. Light Sci. Appl. 8, 103 (2019).
11. Baik, J. W. et al. Intraoperative label-free photoacoustic histopathology of clinical specimens. Laser Photonics Rev. 15, 2100124 (2021).
12. Ahn, J. et al. High-resolution functional photoacoustic monitoring of vascular dynamics in human fingers. Photoacoustics 23, 100282 (2021).
13. Cho, S. W. et al. High-speed photoacoustic microscopy: a review dedicated on light sources. Photoacoustics 24, 100291 (2021).
14. Park, J. et al. Quadruple ultrasound, photoacoustic, optical coherence, and fluorescence fusion imaging with a transparent ultrasound transducer. Proc. Natl Acad. Sci. USA 118, e1920879118 (2021).
15. Lin, L. et al. Single-breath-hold photoacoustic computed tomography of the breast. Nat. Commun. 9, 2352 (2018).
16. Park, B. et al. 3D wide-field multispectral photoacoustic imaging of human melanomas in vivo: a pilot study. J. Eur. Acad. Dermatol. Venereol. 35, 669–676 (2021).
17. Na, S. et al. Massively parallel functional photoacoustic computed tomography of the human brain. Nat. Biomed. Eng. 1–9 (2021).
18. Kim, J. et al. Multiparametric photoacoustic analysis of human thyroid cancers in vivo. Cancer Res. 81, 4849–4860 (2021).
19. Choi, W. et al. Clinical photoacoustic imaging platforms. Biomed. Eng. Lett. 8, 139–155 (2018).
20. Yao, J. J. & Wang, L. V. Photoacoustic microscopy. Laser Photonics Rev. 7, 758–778 (2013).
21. Yao, J. J. et al. Photoimprint photoacoustic microscopy for three-dimensional label-free subdiffraction imaging. Phys. Rev. Lett. 112, 014302 (2014).
22. Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006).
23. Rust, M. J., Bates, M. & Zhuang, X. W. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–796 (2006).
24. Danielli, A. et al. Label-free photoacoustic nanoscopy. J. Biomed. Opt. 19, 086006 (2014).
25. Zhang, P. F. et al. In vivo superresolution photoacoustic computed tomography by localization of single dyed droplets. Light Sci. Appl. 8, 36 (2019).
26. Dean-Ben, X. L. & Razansky, D. Localization optoacoustic tomography. Light Sci. Appl. 7, 18004 (2018).
27. Vilov, S., Arnal, B. & Bossy, E. Overcoming the acoustic diffraction limit in photoacoustic imaging by the localization of flowing absorbers. Opt. Lett. 42, 4379–4382 (2017).
28. Choi, W. & Kim, C. Toward in vivo translation of super-resolution localization photoacoustic computed tomography using liquid-state dyed droplets. Light Sci. Appl. 8, 57 (2019).
29. Zhao, H. X. et al. Deep learning enables superior photoacoustic imaging at ultralow laser dosages. Adv. Sci. 8, 2003097 (2021).
30. Ouyang, W. et al. Deep learning massively accelerates super-resolution localization microscopy. Nat. Biotechnol. 36, 460–468 (2018).
31. DiSpirito, A. et al. Reconstructing undersampled photoacoustic microscopy images using deep learning. IEEE Trans. Med. Imaging 40, 562–570 (2021).
32. Wang, H. D. et al. Deep learning enables cross-modality super-resolution in fluorescence microscopy. Nat. Methods 16, 103–110 (2019).
33. Nehme, E. et al. DeepSTORM3D: dense 3D localization microscopy and PSF design by deep learning. Nat. Methods 17, 734–740 (2020).
34. Qiao, C. et al. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat. Methods 18, 194–202 (2021).
35. Milecki, L. et al. A deep learning framework for spatiotemporal ultrasound localization microscopy. IEEE Trans. Med. Imaging 40, 1428–1437 (2021).
36. Masutani, E. M., Bahrami, N. & Hsiao, A. Deep learning single-frame and multiframe super-resolution for cardiac MRI. Radiology 295, 552–561 (2020).
37. Brady, S. L. et al. Improving image quality and reducing radiation dose for pediatric CT by using deep learning reconstruction. Radiology 298, 180–188 (2021).
38. Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In: Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich: Springer, 234–241 (2015).
39. Goodfellow, I. J. et al. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal: MIT Press, 2672–2680 (2014).
40. Isola, P. et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 5967–5976 (2017).
41. Vu, T. et al. Deep image prior for undersampling high-speed photoacoustic microscopy. Photoacoustics 22, 100266 (2021).
42. Shi, W. Z. et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 1874–1883 (2016).
43. Drozdzal, M. et al. The importance of skip connections in biomedical image segmentation. In: Proceedings of the 1st International Workshop on Deep Learning in Medical Image Analysis. Athens, Greece: Springer, 179–187 (2016).
44. Raghu, M. et al. Transfusion: understanding transfer learning for medical imaging. Adv. Neural Inf. Process. Syst. 32, 3347–3357 (2019).
45. Cho, S. et al. 3D PHOVIS: 3D photoacoustic visualization studio. Photoacoustics 18, 100168 (2020).
46. Wang, Z., Simoncelli, E. P. & Bovik, A. C. Multiscale structural similarity for image quality assessment. In: Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers. Pacific Grove: IEEE, 1398–1402 (2003).
47. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 618–626 (2017).
48. Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One 10, e0130140 (2015).
49. Li, L. et al. Single-impulse panoramic photoacoustic computed tomography of small-animal whole-body dynamics at high spatiotemporal resolution. Nat. Biomed. Eng. 1, 1–11 (2017).
50. Srivastava, N. et al. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
51. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: PMLR, 448–456 (2015).
52. He, K. M. et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 1026–1034 (2015).
53. Zhao, H. et al. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3, 47–57 (2017).
54. Kingma, D. P. & Ba, L. J. Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, 2015.
55. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, Cambridge, 2016).
56. Feng, J. et al. Catheterization of the carotid artery and jugular vein to perform hemodynamic measures, infusions and blood sampling in a conscious rat model. J. Vis. Exp. 30, 51881 (2015).