IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING
IEEJ Trans 2024; 19: 535–541
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/tee.23997
Paper
Systematic Face Pareidolia Generation Method Using Cycle-Consistent
Adversarial Networks
Yoshitaka Endo*, Non-member
Rinka Asanuma**, Non-member
Shinsuke Shimojo***, Non-member
Takuya Akashi****,a, Member
Pareidolia is the psychological tendency to perceive a face in a non-face stimulus. As a majority of people globally experience this tendency, it has been extensively studied and measured in terms of tendencies such as frequency. However, no study has investigated the systematic manipulation of stimuli, owing to the lack of a systematic image-generation method. Therefore, herein, we generated face pareidolia stimuli using a face data set with annotated data. We employed cycle-consistent adversarial networks (CycleGAN), an image-to-image style translation framework, to generate stimuli by translating face images into a natural-image style. We manipulated the weight of the cycle-consistency loss in CycleGAN and conducted an experiment to evaluate the images generated using CycleGAN. We found that the weight value correlated with the pareidolia-inducing power when blurring was applied as preprocessing to the face data set. As a result, we achieved the systematic generation of pareidolia stimuli.
© 2024 The Authors. IEEJ Transactions on Electrical and Electronic Engineering published by Institute of Electrical Engineers of Japan and Wiley Periodicals LLC
Keywords: face pareidolia; cycle-consistent adversarial networks; psychophysics
Received 28 July 2023; Revised 2 December 2023; Accepted 9 January 2024
1. Introduction
We often experience a psychological phenomenon called pareidolia, the psychological tendency to perceive specific patterns in natural scenes, in our daily life. As shown in Fig. 1, pareidolia can be perceived from a specific object, such as an outlet, in front of the car. DeepDream, which can create psychedelic images based on the pareidolia concept using deep learning, has been developed. A virtual reality video based on DeepDream can induce a subjective experience similar to that of a real psychedelic [1]. Therefore, pareidolia is associated with human perception and the field of neuropsychology. As humans perceive faces even in static objects, in this study, we propose an image-generation technique to provide a systematic generation framework.
Pareidolia can be associated with certain illnesses, for example, Lewy body dementia [2]. Patients with Lewy body dementia experience more pareidolia compared with healthy individuals. Thus, a diagnosis method called the pareidolia test was developed
a Correspondence to: Takuya Akashi. E-mail: akashi@iwate-u.ac.jp
* Graduate School of Science and Engineering, Department of Design and Media Technology, Iwate University, 4-3-5 Ueda, Morioka, Iwate 020-8551, Japan
** Graduate School of Arts and Sciences, Division of Science and Engineering, Iwate University, 4-3-5 Ueda, Morioka, Iwate 020-8551, Japan
*** Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA
**** Faculty of Science and Engineering, Iwate University, 4-3-5 Ueda, Morioka, Iwate 020-8551, Japan
to determine whether a patient is suffering from Lewy body dementia. Similarly, patients with Parkinson’s disease also experience more pareidolia compared with healthy individuals [3]. Both these illnesses feature symptoms of visual hallucinations, thus suggesting that pareidolia may be closely related to visual hallucinations. By contrast, certain illnesses induce less pareidolia experience compared with that of healthy individuals. One typical example is autism spectrum disorder [4,5], whereby patients perceive faces less readily than those without autism. Essentially, they cannot perceive the face from the facial components.
Several types of pareidolia stimuli have been used to investigate face pareidolia perception. One stimulus type is an artifact artificially made from an image or photograph; ‘Arcimboldo’ and the ‘Mooney Face Test’ are typical examples of artifacts. Based on these artifacts, new stimuli have been generated to investigate face pareidolia perception [6,7]. Artificially generated stimuli are also widely used [8]. Another approach involves using image stimuli that can induce pareidolia [9].
The pareidolia-inducing power of these stimuli is uncertain
because the stimuli are generated artificially, and a systematic
method to manipulate pareidolia-inducing power in comparable
images is lacking. The pareidolia-inducing power may differ
according to the pareidolia type and form; essentially, only similar
stimuli can be compared when estimating the pareidolia-inducing
power.
To the best of our knowledge, no study thus far has generated
pareidolia stimuli. Therefore, in this study, we systematically
generated face pareidolia stimuli using a face data set for which
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Fig. 1. An example of pareidolia. The actual object in the image is a bag. It can appear as a face because the orange buckles can be regarded as the eyes and the handle can appear as the mouth
annotation data were available [10]. The annotation data were constructed from the coordinates of face parts. We believe that the annotation data can be used to generate the same form of the face pareidolia stimuli. Using the annotation data, the locations of the elements causing pareidolia were clearly determined, thus facilitating the creation of pareidolia stimuli whose pareidolia position can be manipulated.
Further, we employed cycle-consistent adversarial networks (CycleGAN), which translate between unpaired data sets, to systematically generate pareidolia. We attempted to generate face pareidolia stimuli by image-style translation from a face image to a natural image. The cycle-consistency loss, a parameter of CycleGAN, was found to affect the performance of the generated image.
2. Related Work
2.1. Face pareidolia
As described in Section 1, pareidolia is a psychological tendency experienced by humans globally. The frequency of experiencing pareidolia tends to be higher for the face category [11], and our brain can react to a face-like object in approximately 170 ms [12]. Rhesus monkeys also experience face pareidolia [13], thus suggesting that they too can perceive face outlines. Faces are socially important information in daily life. The frequency of face pareidolia differs based on gender and personality. However, to the best of our knowledge, research on face pareidolia-inducing power is currently lacking. Notably, face pareidolia can be investigated in detail if an indicator of face pareidolia-inducing power is established; such an indicator can quantify pareidolia-inducing power and thus be used to compare different stimuli. The threshold for face pareidolia can be defined based on this indicator.
2.2. Image style translation
Image-style translation is commonly performed in computer vision. Approaches using generative adversarial networks (GANs) [14] are well known. CycleGAN, which can translate between two different image styles [15], is a typical framework for image-style translation using GANs. The CycleGAN framework is illustrated in Fig. 2. CycleGAN comprises two types of networks: discriminators and generators. The function of the generator is to translate the style of the input image, which differs from the original GANs. CycleGAN can mutually translate image styles, such as a combination of zebra and horse, or painted art and realistic pictures.
Fig. 2. The CycleGAN framework on data sets A and B. G_AB aims to translate from the A style to the B style; G_BA behaves similarly from data set B style to data set A style. D_A aims to determine whether the input image is from data set A or generated by G_BA; D_B behaves likewise on data set B
CycleGAN introduces cycle-consistency loss and restricts the number of translation-mapping candidates. Herein, we employed CycleGAN owing to its unique ability to translate between two unpaired data sets. Using CycleGAN, an image translated into the style of another data set can reproduce the original input image by translating it back into the original style.
3. Proposed Method
3.1. Face pareidolia generation
We used CycleGAN to generate face pareidolia stimuli. CycleGAN cannot learn specific objects; thus, we can generate face pareidolia from the facial features. We used real-face and natural-image datasets to generate stimuli. Additionally, if we simply apply pixel synthesis, the generation result contains unnatural colors, such as red lips and eyes. In this study, we attempted to generate pareidolia stimuli that resemble the after-translation style, including the eyes and lips, using training data. Given that our objective was to use face attribute information (such as eyes and mouth), the CelebAMask-HQ dataset was employed herein. The CelebAMask-HQ dataset includes the label information of facial attributes such as eyes, hair, and skin. Additionally, natural images collected from Flickr [16] were also used. Figure 3 shows exemplar images used for training styles.
Cycle-consistency loss is a loss function of CycleGAN. As shown in Fig. 4, CycleGAN calculates the difference between the original input image and the image generated to be similar to the original image. This difference is regarded as the cycle-consistency loss, and it makes the generators learn so that the after-translation image can be translated back to the original input image. The cycle-consistency loss function reduces the number of style translation-mapping solutions. Following [15], the overall objective function is shown in (1):
L(G_AB, G_BA, D_A, D_B) = L_GAN(G_AB, D_B, A, B)
                        + L_GAN(G_BA, D_A, B, A)
                        + λ L_cyc(G_AB, G_BA)    (1)
Here, L_GAN is the loss function of the original generative adversarial networks (GANs) and L_cyc is the cycle-consistency loss. The weight of the cycle-consistency loss (λ) affects the performance of the generation. This effect might be associated with pareidolia-inducing power; thus, we trained the CycleGAN on the
Fig. 3. Examples of natural images for style training. Top picture: public domain. Bottom picture: ‘Fairyland Mesclun Mix, 2008’ by Brian Boucheron, licensed under CC BY 2.0
Fig. 4. The cycle-consistency loss. Images 3 and 6 are generated to resemble the original images (1 and 4) from the firstly translated data set B images (2 and 5). The difference between 1 and 3 is regarded as the cycle-consistency loss; the difference between 4 and 6 is treated likewise
aforementioned data set with various λ values, and conducted the generated-stimuli evaluation experiment using the models trained on these various λ values.
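As a minimal sketch of how the weighted objective in (1) could be assembled — assuming an L1 cycle-consistency term and a least-squares GAN term, as in the original CycleGAN implementation; the function names here are illustrative, not the authors' code:

```python
import numpy as np

def gan_loss(d_fake):
    # Least-squares GAN generator term: push D's score on fakes toward 1.
    return float(np.mean((d_fake - 1.0) ** 2))

def cycle_loss(original, reconstructed):
    # L1 distance between the input and its round-trip reconstruction.
    return float(np.mean(np.abs(original - reconstructed)))

def total_objective(d_b_on_fake, d_a_on_fake, a, a_rec, b, b_rec, lam):
    # Overall objective of (1): two adversarial terms plus the
    # cycle-consistency term weighted by lambda.
    return (gan_loss(d_b_on_fake) + gan_loss(d_a_on_fake)
            + lam * (cycle_loss(a, a_rec) + cycle_loss(b, b_rec)))
```

Raising λ makes the reconstruction term dominate, which is the knob manipulated in the experiments below.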
3.2. Preprocessing for the face image
The original face images include the face, hair, and background. We preprocessed the images to reduce the effect on generation caused by the color difference of each part. Facial contours are often located at the borders of the skin, hair, and background. As described in Subsection 3.1, CycleGAN cannot learn specific objects. Additionally, we confirmed that the translated image changed significantly with changes in the image color. Therefore, after translation, the image strongly retained the facial contours. To overcome this problem, we applied two preprocessing steps, namely noise processing and blurring, to the original face data set. As to
Fig. 5. Preprocessed image example. (a) Original image, (b) Blur process excluding the pareidolia elements, (c) Blur process including the pareidolia elements, (d) Noise process excluding the pareidolia elements, and (e) Noise process including the pareidolia elements
the noise process, random RGB noise replaces the pixels of the original face images except for the eye and mouth regions. As to the blurring process, we utilize a Gaussian blur with a filter size of 51 × 51 and σ of 8. An image example illustrating the preprocessing steps is shown in Fig. 5. Preprocessing helps reduce the effect of the pareidolia region [17]. Because our primary objective was to generate both stimuli that cause pareidolia and stimuli that do not, we processed the elements that induce pareidolia: the eyes and mouth. The face image dataset contained several images with the face present at the center of the image. Therefore, the participants may have developed a bias toward finding the face at the center of the stimulus. To overcome this problem, we moved the components that induce face pareidolia.
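The two preprocessing steps could be sketched as follows — a minimal numpy-only version assuming a single-channel image and a boolean mask marking the eye/mouth regions to preserve; the separable convolution stands in for a library Gaussian blur, and all names are illustrative:

```python
import numpy as np

def gaussian_kernel(size=51, sigma=8.0):
    # 1-D Gaussian kernel; applied twice for a separable 2-D blur.
    x = np.arange(size) - size // 2
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, size=51, sigma=8.0):
    # Separable Gaussian blur: convolve rows, then columns.
    k = gaussian_kernel(size, sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)

def noise_process(img, keep_mask, rng):
    # Replace every pixel outside the eye/mouth mask with random noise.
    out = rng.integers(0, 256, size=img.shape).astype(img.dtype)
    out[keep_mask] = img[keep_mask]
    return out
```

For an RGB image the same operations would be applied per channel, with independent noise in each channel.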
3.3. Evaluation of the generated stimuli
A standard psychophysical method for measuring pareidolia-inducing power has not yet been developed. In addition, the frequency and sensitivity of pareidolia differ between individuals. Thus, we examined experiments conducted previously on pareidolia to design the experimental procedure. In particular, the noise pareidolia test [2] was found to be similar to our research objective. The procedure for the noise pareidolia test was as follows. First, the investigator presented the experimental stimulus to the participants. Next, the participants answered whether a face or a specific object was in the stimulus. If the participants identified a face in the stimulus, the investigator instructed them to point out the facial region. In this study, we investigated the pareidolia-inducing power of the generated stimuli. Therefore, the participants determined whether the stimulus included pareidolia, and we instructed them to evaluate the face pareidolia intensity for the stimulus.
4. Experiment
We performed two types of experiments and data analyses.
The first was a pareidolia stimuli-generation experiment; we
generated face pareidolia image stimuli using CycleGAN. The
second experiment involved evaluating the generated stimuli.
Pareidolia may differ with the sensitivity of pareidolia-inducing
power between individuals; therefore, we developed an analytical
method to normalize sensitivity.
4.1. Face pareidolia generation
We used CycleGAN to generate the images. As described in Subsection 3.3, the training dataset comprised a natural-image dataset and a preprocessed face-image dataset. Each training image dataset comprised 100 images of resolution 256 × 256. The images were randomly selected. The number of learning iterations (epochs) was 1000, and the learning rate for CycleGAN was initially set to 0.0002 until the 900th epoch and linearly decayed to 0 from the 901st to 1000th epochs. Further, the λ values used for training were 2, 10, and 20. The two preprocessing steps described in Subsection 3.2 were applied to the images, which were used as training data. In training, the face images used were not preprocessed in terms of the elements causing pareidolia. CycleGAN cannot learn specific objects; therefore, even if the input image did not contain elements that cause pareidolia, the output image was considered to be slightly affected.
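The learning-rate schedule described above (constant at 0.0002 through epoch 900, then linearly decayed to 0 by epoch 1000) can be written as a small helper; the function name and signature are illustrative:

```python
def learning_rate(epoch, base_lr=0.0002, decay_start=900, total_epochs=1000):
    # Constant until decay_start, then linear decay to 0 at total_epochs.
    if epoch <= decay_start:
        return base_lr
    remaining = total_epochs - epoch
    return base_lr * remaining / (total_epochs - decay_start)
```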
4.2. Evaluation experiment
4.2.1. Evaluation procedure
The generated stimuli were evaluated to investigate their systematic generation. For qualitative evaluation, the participants scored the generated images across all λ values and preprocessing conditions. We evaluated face strength as face-pareidolia-inducing power. A total of 210 images were evaluated in the experiment. First, we randomly selected 15 images not used for training from the CelebAMask-HQ dataset as the input images. Then, for each input image, we generated four different stimulus images, depending on whether the pareidolia elements were included or excluded and whether blur or noise preprocessing was used. In total, we generated 60 images for each value of λ. Finally, 180 stimulus images were evaluated because we investigated three different values of λ. Additionally, we extracted 15 face images and 15 natural images that were not used for training the style translation. First, the participants answered whether a face was contained in the displayed stimulus. If the participant reported ‘yes’, the participant pointed out the face region by enclosing it in an ellipse. Thereafter, the participant scored the pareidolia-inducing power, which ranged from 1 to 99. After the scoring, or after the participant reported ‘no’, the displayed image was changed to the next image, and the evaluation was repeated for all the image stimuli. The monitor used for the experiment was a 21.5-in BenQ G2222HDL. We analyzed the results based on the evaluation by each participant.
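The stimulus counts above (15 inputs × 4 conditions × 3 λ values, plus 15 real faces and 15 natural images) can be checked with a short enumeration; the condition names are illustrative labels, not the authors' identifiers:

```python
from itertools import product

inputs = range(15)                     # held-out CelebAMask-HQ images
elements = ["including", "excluding"]  # pareidolia elements
preprocess = ["blur", "noise"]
lambdas = [2, 10, 20]

generated = list(product(inputs, elements, preprocess, lambdas))
per_lambda = len(inputs) * len(elements) * len(preprocess)  # 60 per lambda
total = len(generated) + 15 + 15  # plus real faces and natural images
```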
4.2.2. Participants
We enrolled 11 participants (nine males and two females, aged between 21 and 24 years). All participants consented to the experiment and signed the relevant agreement.
4.3. Data analysis
The results were analyzed using signal detection theory (SDT) [18]. SDT is effective in investigating whether the presented stimuli include specific information. In this study, the specific information is set to the face and regarded as ‘signal’. The natural image is regarded as ‘no-signal’ because it excludes the specific information. Additionally, the case of inclusion of pareidolia elements is regarded as ‘signal’ (Fig. 6(b)–(d) and (j)–(l)), and that of exclusion is regarded as ‘no-signal’ (Fig. 6(f)–(h) and (n)–(p)). Responses in SDT can be classified
Fig. 6. Generation result example. (a) Input (blur, including the pareidolia elements), (b) λ = 2, (c) λ = 10, (d) λ = 20, (e) Input (blur, excluding the pareidolia elements), (f) λ = 2, (g) λ = 10, (h) λ = 20, (i) Input (noise, including the pareidolia elements), (j) λ = 2, (k) λ = 10, (l) λ = 20, (m) Input (noise, excluding the pareidolia elements), (n) λ = 2, (o) λ = 10, and (p) λ = 20
into four types: miss (M), hit (H), false alarm (FA), and correct rejection (CR). When the participant reported ‘yes’ on observing a stimulus with signal (e.g., Fig. 6(b)–(d) and 6(j)–(l)), the response is classified as H, and ‘no’ is classified as M. Similarly, if the participant reported ‘yes’ when no signal is present in the displayed stimulus (e.g., Fig. 6(f)–(h) and 6(n)–(p)), it is classified as FA, and if the subject reported ‘no’, it is classified as CR. In SDT, the ability to discriminate the stimuli is parameterized as d′ and calculated by (2):
d′ = Z(1 − P(FA)) − Z(1 − P(H))    (2)
Here, Z(·) calculates the z-score of each probability. Also, P(H) and P(FA) are the probabilities of H and FA, respectively.
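Equation (2) can be computed directly with the inverse normal CDF; a minimal sketch, where Z is the standard-normal quantile function:

```python
from statistics import NormalDist

def d_prime(p_hit, p_fa):
    # d' = Z(1 - P(FA)) - Z(1 - P(H)), equivalent to Z(P(H)) - Z(P(FA)).
    z = NormalDist().inv_cdf
    return z(1 - p_fa) - z(1 - p_hit)
```

Note that Z(1 − p) = −Z(p), so this matches the familiar form Z(P(H)) − Z(P(FA)); hit or false-alarm rates of exactly 0 or 1 make the quantile infinite, which corresponds to the ‘inf’ case discussed in Subsection 5.3.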
We recorded the response time, the center of the ellipse, the major axis, and the minor axis. Following the above SDT, the responses of the participants were classified into four types: M, H, FA, and CR. Based on the recorded ellipse information, this study determined whether the reported region was the intended pareidolia. The mask image corresponding to the face image was moved by the same amount as the pareidolia components, and the degree of overlap with the ellipse was determined. The pareidolia was considered acceptable when more than 90% was enclosed, and the presented stimulus was treated as H. If the overlap ratio was less than 90% and the presented stimulus included pareidolia elements, the presented stimulus was treated as FA.
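The overlap check could be sketched as follows — a minimal version assuming boolean masks of the same shape for the shifted pareidolia region and the participant's reported ellipse; the function and label names are illustrative:

```python
import numpy as np

def classify_report(pareidolia_mask, ellipse_mask, threshold=0.9):
    # Fraction of the intended pareidolia region enclosed by the ellipse.
    region = pareidolia_mask.sum()
    if region == 0:
        return "FA"  # 'yes' reported on a stimulus without pareidolia elements
    overlap = np.logical_and(pareidolia_mask, ellipse_mask).sum() / region
    # More than 90% enclosed -> hit; otherwise treated as a false alarm.
    return "H" if overlap > threshold else "FA"
```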
5. Result
5.1. Face pareidolia generation result
Examples of the results of the generation experiments are shown in Fig. 6. These images embed the face pareidolia structure and can be perceived as faces. We consider that the same form of pareidolia stimuli can be generated using face annotation data. The face form appears to remain in the image when the input image includes pareidolia elements (Fig. 6(a)–(d) and (i)–(l)). Conversely, the face form does not seem to remain in the image when the input image does not include pareidolia elements (Fig. 6(e)–(h) and (m)–(p)).
5.2. Evaluation experiment result
Pareidolia is a psychological tendency; however, whether the stimuli can be perceived as a face differs between individuals. Figures 7 and 8 show the hit transition for each preprocessing method.
The reported number increased monotonically as the value of λ increased for 7 of the 11 participants with blur preprocessing (Fig. 7). Further, the reported number increased monotonically as the value of λ increased for only 1 of the 11 participants with noise preprocessing (Fig. 8). One participant could not report the real face because of an error in the evaluation experiment.
5.3. ROC curve
In the experiment, almost all participants sometimes reported ‘face’ on the not-intended pareidolia stimuli, i.e., FA. There are two possible factors. One is a low ability to discriminate whether the participant can perceive the face; the task itself is too difficult to discriminate objectively, and d′ is low. Another is an abnormal setting of the internal criterion; the participants tend to set the threshold to report ‘face’ too low. We calculate d′ and draw the receiver operating characteristic (ROC) curve based on
Fig. 7. Hit number transition on blur preprocessing. The red
and blue lines represent the monotonically and nonmonotonically
increasing participant intensity, respectively. The orange bar chart
represents the average of the reported number of participants. The
error bar represents 95% confidence interval
Fig. 8. Hit number transition on noise preprocessing
Fig. 9. ROC curve for each participant
the intensity to reveal the reason. The drawn ROC curve is shown in Fig. 9.
A d′ of ‘inf’ means that the participant reported ‘face’ on the intended pareidolia stimuli perfectly, without FA. When d′ is ‘inf’, the line of the ROC curve cannot be drawn. The ROC curve and d′ show that all participants have the ability to discriminate. Therefore, the internal criterion for the face signal is suggested to be abnormal.
5.4. Data analysis result
The pareidolia-inducing power scores differed between individuals. Almost all participants gave a score of 99 for real facial stimuli. On the other hand, in some cases, a participant could not perceive the face (score: 0). Such cases are not considered here because we focus on the perceived cases. The minimum score varied between participants. Therefore, we applied min-max normalization based on each minimum value to normalize the score range. The formula that maps the minimum value to 1 and the maximum value to 99 after normalization is shown in (3):

Intensity = (Score − Intensity_min) / (Intensity_max − Intensity_min) × 98 + 1    (3)
Here, Intensity_min is the minimum value of each participant, and Intensity_max is the maximum value of each participant. The results after the min-max normalization process are shown in Figs. 10 and 11.
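Equation (3) could be implemented per participant as follows (a minimal sketch; the function name is illustrative):

```python
def normalize_intensity(score, i_min, i_max):
    # Map each participant's scores so that the minimum becomes 1
    # and the maximum becomes 99, per equation (3).
    return (score - i_min) / (i_max - i_min) * 98 + 1
```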
If the intensity of each participant monotonically increases with respect to the increase of λ, the lines of the corresponding participants in Figs. 10 and 11 are drawn in red. As shown in Fig. 10, when blurring is applied as the preprocessing for stimuli generation, 8 out of 11 lines show a monotonic increase. When the noise process is applied as the preprocessing, all participants are nonmonotonically increasing, which can be observed from Fig. 11. In addition, we investigate the correlation coefficient between the λ value and the average participant intensity for each preprocessing method based on Figs. 10 and 11, respectively. When the preprocessing is blurring, the correlation coefficient is 0.92. However, when the preprocessing is noise, the correlation coefficient degrades to 0.59. Based on the correlation coefficient, we can confirm that a strong correlation between the λ value and the average intensity exists in the case of blur preprocessing. On the other hand, we cannot intuitively observe whether a significant difference exists between λ = 10 and λ = 20 in Fig. 10. Therefore,
Fig. 10. Blur intensity transition. The red and blue lines represent
the monotonically and nonmonotonically increasing participant
intensity, respectively. The orange bar chart represents the average
of the participants’ intensity. The error bar represents 95%
confidence interval
Fig. 11. Noise intensity transition. The red and blue lines represent
the monotonically and nonmonotonically increasing participant
intensity, respectively
we investigate whether there is a significant difference using the Wilcoxon signed-rank test [19]. As a result, the test statistic (T) is 12 under a significance level of 10%. As the number of participants is 11, there is a significant difference when the significance level is set to 10%. This result suggests that, when the preprocessing was blurring, the weight (λ) was associated with the pareidolia-inducing power. In each category, the tendency of the scores was different from that of the faces, thus indicating that the stimuli were generated with characteristics different from those of the faces. Figures 7 and 10 have the same intensity trend, as do Figs. 8 and 11. The correlation coefficient between the hit number and the average participant intensity of the generated stimuli is 0.98 in the case of blur preprocessing, and 1.00 in the case of noise preprocessing. These observations imply that the pareidolia report number might correlate with the pareidolia-inducing power.
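The correlation coefficients reported above are Pearson correlations between the λ values and the corresponding averages; a minimal sketch of the computation (the intensity values in the usage example are illustrative, not the paper's measurements):

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

lambdas = [2, 10, 20]
avg_intensity = [30.0, 45.0, 52.0]  # illustrative values only
r = pearson(lambdas, avg_intensity)
```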
6. Conclusion
Herein, we systematically investigated the generation of facial pareidolia through pareidolia-inducing power. We manipulated the weight of the cycle-consistency loss and generated stimuli. The pareidolia-inducing power can be manipulated via the cycle-consistency loss of CycleGAN. We employed a human face data set and generated pareidolia stimuli that had the same form as the face using the annotation data of the face data set. We trained on the natural and face images, and systematically generated pareidolia stimuli. The results of the evaluation experiment revealed a correlation between the cycle-consistency loss and pareidolia-inducing power when blurring was applied as preprocessing. This suggests that preserving features such as the eyes and mouth is critical for inducing pareidolia. Also, the ROC curve reveals that the cause of the false reports is mostly the internal criterion. In future work, we intend to apply the proposed method to existing pareidolia stimuli. Further, the method can also be applied to more versatile applications by systematic generation in the form of existing pareidolia stimuli other than faces. For example, prosopagnosia, the opposite tendency to pareidolia, is known [20]. Some people cannot experience pareidolia even if the pareidolia-inducing power is strong; in such cases, prosopagnosia may have occurred. In this study, healthy controls were tested. In future work, we must investigate the correlation between pareidolia-inducing power and the reported pareidolia number, and extend our experiment to patients suffering from not only Lewy body dementia but also prosopagnosia.
Acknowledgments
The authors thank Chao Zhang, Faculty of Engineering, University of Fukui, for providing advice on research and useful information. This work was supported by JSPS KAKENHI Grant Numbers 22K11547, 19K11515, and 16KK0069.
Funding information
This work was supported by JSPS KAKENHI Grant Numbers 22K11547, 19K11515, and 16KK0069.
References
(1) Suzuki K, Roseboom W, Schwartzman DJ, Seth AK. A deep-dream virtual reality platform for studying altered perceptual phenomenology. Scientific Reports 2017;7(1):1–11.
(2) Yokoi K, Nishio Y, Uchiyama M, Shimomura T, Iizuka O, Mori E. Hallucinators find meaning in noises: Pareidolic illusions in dementia with Lewy bodies. Neuropsychologia 2014;56:245–254.
(3) Uchiyama M, Nishio Y, Yokoi K, Hosokai Y, Takeda A, Mori E. Pareidolia in Parkinson's disease without dementia: A positron emission tomography study. Parkinsonism & Related Disorders 2015;21(6):603–609.
(4) Pavlova MA, Guerreschi M, Tagliavento L, Gitti F, Sokolov AN, Fallgatter AJ, Fazzi E. Social cognition in autism: Face tuning. Scientific Reports 2017;7(1):1–9.
(5) Rahman M, van Boxtel JJ. Seeing faces where there are none: Pareidolia correlates with age but not autism traits. Vision Research 2022;199:108071.
(6) Kobayashi M, Otsuka Y, Nakato E, Kanazawa S, Yamaguchi MK, Kakigi R. Do infants recognize the Arcimboldo images as faces? Behavioral and near-infrared spectroscopic study. Journal of Experimental Child Psychology 2012;111(1):22–36.
(7) Pavlova MA, Scheffler K, Sokolov AN. Face-n-food: Gender differences in tuning to faces. PLoS One 2015;10(7):e0130363.
(8) Kato M, Mugitani R. Pareidolia in infants. PLoS One 2015;10(2):e0118539.
(9) Wardle SG, Taubert J, Teichmann L, Baker CI. Rapid and dynamic processing of face pareidolia in the human brain. Nature Communications 2020;11(1):1–14.
(10) Lee CH, Liu Z, Wu L, Luo P. MaskGAN: Towards diverse and interactive facial image manipulation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2020.
(11) Uchiyama M, Nishio Y, Yokoi K, Hirayama K, Imamura T, Shimomura T, Mori E. Pareidolias: Complex visual illusions in dementia with Lewy bodies. Brain 2012;135(8):2458–2469.
(12) Hadjikhani N, Kveraga K, Naik P, Ahlfors SP. Early (M170) activation of face-specific cortex by face-like objects. Neuroreport 2009;20(4):403–407.
(13) Taubert J, Wardle SG, Flessert M, Leopold DA, Ungerleider LG. Face pareidolia in the rhesus monkey. Current Biology 2017;27(16):2505–2509.
(14) Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial networks. Communications of the ACM 2020;63(11):139–144.
(15) Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision; 2223–2232. 2017.
(16) Flickr. https://www.flickr.com/.
(17) Omer Y, Sapir R, Hatuka Y, Yovel G. What is a face? Critical features for face detection. Perception 2019;48(5):437–446. PMID: 30939991.
(18) Green DM, Swets JA. Signal Detection Theory and Psychophysics, vol. 1. Wiley: New York; 1966.
(19) Wilcoxon F. Individual comparisons by ranking methods. Biometrics 1945;1:196–202.
(20) Bornstein B, Kidron D. Prosopagnosia. Journal of Neurology, Neurosurgery & Psychiatry 1959;22(2):124–131.
Yoshitaka Endo
(Non-member) received a B.E. from the Faculty of Engineering, Iwate University, in 2019. He received an M.E. and is now a Ph.D. candidate at the Graduate School of Science and Engineering, Department of Design and Media Technology, Iwate University. His research interests include image generation, defect detection, and neuropsychology. He is a member of IEEE and IEICE.
Rinka Asanuma
(Non-member) received a B.E. from the Faculty of Science and Engineering, Iwate University, in 2023. She is now a master's course student at the Graduate School of Integrated Arts and Sciences, Graduate Course in Design and Media Technology, Iwate University. Her research interests include pareidolia detection and mixed reality.
Shinsuke Shimojo
(Non-member) is an experimental psychologist/cognitive neuroscientist and the Gertrude Baltimore Professor in the Division of Biology & Biological Engineering/Computation & Neural Systems at the California Institute of Technology. He earned a Master's degree from the University of Tokyo and a Ph.D. from the Massachusetts Institute of Technology. His research has focused on sensory perception, its development and adaptation, sensory-motor coordination, multisensory integration, attention and consciousness, and emotional decision making. He is the recipient of awards including the Most Creative Research Award (Japanese Society of Cognitive Science) and the Tokizane Memorial Award (Japanese Neuroscience Society).
Takuya Akashi
(Member) received his Ph.D. degree in system design engineering from the University of Tokushima in 2006. Since April 2009, he has been at the Department of Electrical Engineering, Electronics and Computer Science, Iwate University. In 2015, he was a visiting associate at the California Institute of Technology. He is currently an associate professor at Iwate University. His research interests include evolutionary algorithms, image processing, and human sensing. He is a member of the IEEE, RISP, IEICE, and IEEJ.