of 4
Thought Graph: Generating Thought Process for Biological
Reasoning
Chi-Yang Hsu
University of Texas at Austin
Austin, Texas, USA
ch52669@utexas.edu
Kyle Cox
University of Texas at Austin
Austin, Texas, USA
kylecox@utexas.edu
Jiawei Xu
University of Texas at Austin
Austin, Texas, USA
jiaweixu@utexas.edu
Zhen Tan
Arizona State University
Tempe, Arizona, USA
ztan36@asu.edu
Tianhua Zhai
University of Pennsylvania
Philadelphia, Pennsylvania, USA
tianhua.zhai@pennmedicine.upenn.edu
Mengzhou Hu
University of California San Diego
La Jolla, California, USA
mhu@health.ucsd.edu
Dexter Pratt
University of California San Diego
La Jolla, California, USA
depratt@health.ucsd.edu
Tianlong Chen
The University of North Carolina at
Chapel Hill
Chapel Hill, North Carolina, USA
tianlong@cs.unc.edu
Ziniu Hu
California Institute of Technology
Pasadena, California, USA
acbull@caltech.edu
Ying Ding
University of Texas at Austin
Austin, Texas, USA
ying.ding@ischool.utexas.edu
ABSTRACT
We present the Thought Graph as a novel framework to support
complex reasoning and use gene set analysis as an example to
uncover semantic relationships between biological processes. Our
framework stands out for its ability to provide a deeper understand-
ing of gene sets, significantly surpassing GSEA by 40.28% and LLM
baselines by 5.38% based on cosine similarity to human annotations.
Our analysis further provides insights into future directions of bio-
logical processes naming, and implications for bioinformatics and
precision medicine. Here’s our
Github Code.
CCS CONCEPTS
Applied computing
Bioinformatics;
Computing method-
ologies
Natural language processing.
KEYWORDS
large language model, natural language processing, semantic web
biological process, gene ontology, bioinformatics
ACM Reference Format:
Chi-Yang Hsu, Kyle Cox, Jiawei Xu, Zhen Tan, Tianhua Zhai, Mengzhou
Hu, Dexter Pratt, Tianlong Chen, Ziniu Hu, and Ying Ding. 2024. Thought
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
WWW ’24 Companion, May 13–17, 2024, Singapore, Singapore.
©
2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0172-6/24/05
https://doi.org/10.1145/3589335.3651572
Graph: Generating Thought Process for Biological Reasoning. In
Companion
Proceedings of the ACM Web Conference 2024 (WWW ’24 Companion), May
13–17, 2024, Singapore, Singapore.
ACM, New York, NY, USA, 4 pages. https:
//doi.org/10.1145/3589335.3651572
1 INTRODUCTION
The systematic study of human disease necessitates an in-depth
understanding of the links between diseases, drugs, phenotypes,
genes, and biological processes [
4
]. Analyzing gene sets that share
common biological functions, locations, or regulatory mechanisms
can reveal patterns in gene behavior across health and disease
states, contributing to the advancement of precision medicine for
cancer treatment [
7
]. Yet, the task of identifying biological processes
from gene sets is fraught with challenges. Individual genes often
display weak signals, and when strong signals are present, they
rarely converge on a singular biological theme [
7
]. This complexity
is compounded when different research groups studying the same
biological systems arrive at vastly divergent conclusions.
In response to these challenges, our paper introduces the
Thought
Graph
framework that aims to address two critical aspects: firstly,
it adopted a Tree-of-Thought (ToT) [
11
] architecture to facilitate
thought expansion for Large Language Models (LLMs), ensuring
inclusive yet precise coverage of biological processes across varying
specificity levels. Thought expansion is strategically directed with
the assistance of a voter LLM, which guides the decision-making for
future steps. This design aims to mitigate the potential discrepan-
cies in human annotations encountered by researchers, yet ensure
the quality of the generated processes. Second, our framework prior-
itizes the integration of domain-specific external knowledge bases
to understand the semantics of connections within the Thought
Graph. Consequently, it creates semantic relationships like “is-a”
537
WWW ’24 Companion, May 13–17, 2024, Singapore, Singapore.
Hsu, et al.
Thought
Graph
(TG)
Cellular
communication
Transport
Regulation of pH
I
on transport
Amino acid
transport
Drug efflux
Gap junction
assembly
Intercellular
Transport
Carbon dioxide
transport
Anion
transmembrane
transport
Bicarbonate
transport
Proton
transmembrane
transport
Cystine import
across plasma
membrane
Proton
-
coupled
oligopeptide
transport
Regulation of
intracellular amino
acid concentration
Sulfate
transmembrane
transport
Chloride
transmembrane
transport
Bicarbonate
transmembrane
transport
Intestinal
oligopeptide
absorption
Renal oligopeptide
reabsorption
Regulation of
cellular amino acid
homeostasis
Brush border
oligopeptide
translocation
Proton
-
coupled
oligopeptide
symport
Postprandial
oligopeptide
uptake regulation
Proximal tubule
oligopeptide
reabsorption
Regulation of renal
oligopeptide
reabsorption by pH
ATP
-
dependent
regulation of renal
oligopeptide
reabsorption
...
SLC7A11 SLC25A39 SLC26A6 ABCB9
SLC15A4 ABCC5 CDH17
You are given a set of genes, and your task is
to propose at least five high
-
level BPs that
may be likely to be performed by the system
involving the expression of these genes.
Here is the set of genes:
Genes: [GENE SET]
Initial prompt:
Biological Processes
(BPs) Candidate Generation
I
I
S
S
S
S
S
S
S
S
Voter
GPT
-
4
BP
12211
BP
12212
BP
12213
BP
12221
BP
12222
BP
12223
Voter
Round 1
BP
1
,
BP
2
Round 2
BP
2,
BP
1
Round 3
BP
1
,
BP
3
Voter
Round 1
BP
12,
BP
11
Round 2
BP
12
, BP
11
Round 3
BP
11
,
BP
12
Voter
Round 1
BP
111
,
BP
122
Round 2
BP
123,
BP
122
Round 3
BP
122,
BP
111
Voter
Round 1
BP
1221
,
BP
1222
Round 2
BP
1221,
BP
1222
Round 3
BP
1221,
BP
1222
Voter
Round 1
BP
12211,
BP
12213
Round 2
BP
12212,
BP
12211
Round 3
BP
12211,
BP
12213
Given a set of genes and proposed BPs
describing the system, your task is to vote on
the two best BPs describing the system.
Here is the set of genes: Genes: [GENE
SETS].
Here are the BPs for you to vote on: BPs
[CANDIDATES]
Vote prompt:
BPs Vote
V
V
V
V
V
V
BP
Predict
Brush border oligopeptide translocation
BP
Ground
Truth
Oligopeptide transmembrane transport
Output
Ground Truth
Cosine
(
BP
Predict
,
BP
Ground
Truth
) = 0.697
Similarity
Similarity Percentile
Percentile
(
BP
Predict
,
BP
All
) = 0.999
Validation
GPT
-
4
GPT
-
4
GPT
-
4
GPT
-
4
GPT
-
4
is a
part of
has part
regulates
Edge semantics
between BPs
Gene Set
Given a set of genes and proposed biological
processes describing the system, your task is
to generate more specific biological
processes describing the system.
Here is the set of genes: Genes: [GENE
SET].
Proposed BPs: [BPs]
Subsequent prompt:
BPs Candidate Generation
S
BP
1
BP
3
BP
2
BP
11
BP
12
BP
13
BP
21
BP
22
BP
23
BP
111
BP
112
BP
113
BP
121
BP
122
BP
123
BP
1111
BP
1112
BP
1113
BP
1221
BP
1222
BP
1223
Figure 1: The flowchart presents the application of the Thought Graph to the Gene Ontology (GO) database. First, Thought
Graph uses a gene set and initial prompt to generate three Biological Processes (BPs). Then, a voter evaluates and selects the
best BP (dark green) and second best BP (light green), which are more accurately descriptive of the gene set. Each chosen
BP, along with a subsequent prompt, is utilized to generate two additional, more specific BPs. This procedure is conducted
recursively until Thought Graph has reached five layers. Finally, a voter chooses the final answer from the last layer.
and “part-of” among various thought steps. This strategy not only
facilitates complex decision-making processes but also ensures a
more nuanced and interconnected understanding of biological sys-
tems, facilitating data interoperability and knowledge integration.
Our novel contributions can be summarized as follows:
(1)
We propose Thought Graph as a complex reasoning frame-
work that generates diverse yet precise entities to tackle
potential annotations discrepancies in biological processes.
(2)
Thought Graph can generate thought graphs with edge se-
mantics by recalling external knowledge (e.g., Gene Ontol-
ogy) to build rich semantics among thought steps.
(3)
We have successfully applied Thought Graph in biological
process generation with significant improvement compared
to SOTA methods, surpassing GSEA by 40.28% and LLM base-
lines by 5.38% in cosine similarity score, and identified the
optimal steps of complex reasoning by balancing specificity
and accuracy.
2 RELATED WORK
2.1 LLM Reasoning
Prompt strategies attempt to decompose a complicated problem into
a sequence of smaller sub-problems to make the problem more man-
ageable [
12
]. One popular line of study is the Chain-of-Thought
(CoT) [
9
] series, structuring prompts to encourage the LLM to
step through its reasoning process, such as Least-to-Most prompt-
ing [
12
], and Self-Consistency with CoT (CoT-SC) [
8
]. However,
these prompting strategies only utilize linear reasoning paths and
struggle in tasks that require exploration and strategic lookahead.
Alternatively, Tree of Thoughts (ToT) [
11
] and Graph of Thoughts
(GoT) [
3
] excel in these sorts of tasks. LLM-based prompting frame-
works’ effectiveness is hindered by inherent limitations such as
self-bias and hallucination. To address this, our work introduces the
semantics of edges within our Thought Graph through in-context
learning, offering structural information.
2.2 Knowledge Graph for LLM Reasoning
LLMs exhibit limitations in integrating new knowledge and occa-
sionally generate hallucinations. A survey [
1
] on knowledge-graph-
based knowledge augmentation in LLMs reveals using knowledge
graphs (KGs) as a source of external information has promising
results in reducing hallucinations. For example, MindMap [
10
] has
developed a prompt pipeline enabling LLMs to comprehend and
integrate KG input with their implicit knowledge. In our approach,
we give LLM examples from the gene ontology knowledge graph
to enable the edge semantics.
2.3 LLM Reasoning in Biomedical Domain
With the rise of LLMs, recent studies explore LLMs’ application
in various biomedical tasks. The gene set biological process was
538
Thought Graph: Generating Thought Process for Biological Reasoning
WWW ’24 Companion, May 13–17, 2024, Singapore, Singapore.
formulated by [
5
] as inputting a gene set to an LLM and outputting
a biological process name that is predominant in the system and
correctly describing the function of the gene set. It’s challenging
because it requires the LLM to accurately understand and interpret
complex biological concepts, including the nuanced roles of genes
in various cellular contexts and their interactions within intricate
biological networks. Although their results [
5
] have shown that
GPT-4 provides better biological process names than the conven-
tional Gene Set Enrichment Analysis (GSEA) [
7
], the performance
is still far from perfect.
3 METHODOLOGY
3.1 Problem Formulation
Given a gene set
=
{
1
,푥
2
, ...,푥
}
, where each
is a gene, the
objective
=
(
)
is to design a framework
to generate a tree
structure graph
=
(
푁,퐸
)
that represents the terms (e.g., biological
processes or pathways) associated with the genes in
. In this graph,
is the set of nodes, and
is the set of edges between these nodes.
3.2 Infrastructure of Thought Graph
Our framework Thought Graph adapts ToT [
11
] as a graph gen-
erator to generate a curated tree graph
, named Thought Graph.
Thought Graph contains terms as the nodes
and their dependen-
cies as edges
. ToT uses self-reflection to prune and only explore
relevant paths. The result, after exploration, is a graph Thought
Graph that illustrates the reasoning path and a final answer selected
from the last layer of the graph as the term that best describes the
gene set
=
{
1
,푥
2
, ...,푥
}
.
3.2.1 Thoughts expansion.
Thought Graph process with
steps
proceeds in a breadth-first fashion to generate a tree of depth
.
At each step, the process expands the tree by generating a set of
candidate nodes. The first step generates a set of general “high-level”
terms that describe the gene set, and subsequent steps iterate on
the candidate terms by proposing more specific but related terms.
Step 1 (Initial Expansion).
The first step is unique from all
subsequent steps because its task is to generate the initial set of
candidate terms
=
1
1
, . . .,푡
1
, where
denotes the term
from
layer
. This set of candidate terms is generated with an “initial
prompt” that takes the gene set as input:
(
1
1
...푘
|
1
. . .,푥
)
.
Subsequent Steps (Recursive Expansion).
In step
, we use
a Voter (
) to examine and vote across the candidate terms
:
(
,푇
)(
)
=
1
[
=
∗]
, where a good term
∗∼
푣표푡푒
(
∗|
)
is based on comparing the candidate terms
in the vote prompt,
and select two best terms. For each selected term from the previous
step,
1
and gene set
are added to the “subsequent prompt” for
the LLM generates
new terms:
{
1
, . . .,푡
}∼
(
1
...푘
|
1
...푛
,푡
1
)
.
This process will be conducted recursively for
1
times (minus
the initial expansion). For the final layer,
1
are presented to
the LLM to choose the final answer.
3.3 Thought Graph
The Thought Graph output provides a representation of the step-
wise reasoning process and integrates edge and node semantics
for domain-specific context. Each node
is a unique bio-
logical process, arranged hierarchically to reflect varying levels of
specificity. The edges
represent the relationships between these
processes. Specifically, we use four pre-defined relations from the
Gene Ontology (GO):
is a
,
part of
,
has part
, and
regulates.
These
relations establish a hierarchy where, for instance, if A
is a
subtype
of B, A is deemed more specific than B. This approach helps to
elucidate the nuanced relationships between different biological
processes, as detailed in the GO database.
1
4 EXPERIMENT & EVALUATION
4.1 Data Collection
The GO database [
2
] forms the basis of our study. We specifically
use a dataset compiled by Hu et al. [
5
] from the Biological Process
branch of Gene Ontology consisting of 12,214 human gene sets,
each annotated with a biological process name and description.
Due to constraints in financial and computational resources, we
randomly select 100 samples from this dataset for evaluation.
4.2 Baselines and Model Description
Our evaluation framework includes one domain-specific tool and
five LLM baselines. GSEA (gene set enrichment analysis) [
7
] is
a statistical method for associating the expression of groups of
genes with biological processes. Our LLM baselines involve differ-
ent approaches. Input-Output (IO) Prompting with zero-shot and
zero-shot-9 prompts generate one and nine unique terms for a single
gene set, respectively, with no examples, while few-shot includes
five question-answer examples. Chain-of-Thought (CoT) employs
the two top pathways from Thought Graph for detailed step-by-step
prompting. The approach by Hu et al. [
5
] integrates expert-curated
prompts with specific guidelines that solicit post-hoc critical anal-
ysis. For all LLM instances, we use GPT-4 (
gpt-4-1106-preview
) in
Chat Completion mode with temperature 0.7. In Thought Graph,
we set the number of steps to five and vote on two samples at each
step to proceed.
4.3 Evaluation Methods
We use two evaluation metrics: cosine similarity and similarity
percentile. Cosine similarity measures the semantic similarity of the
predicted term to the ground-truth term from 0 (no similarity) to 1
(identical). We calculate similarity using embeddings from SapBERT
[
6
], a masked language model trained to model medical entity
relations. After calculating the similarity between the predicted
and ground-truth terms, we also calculate the similarity between
the predicted term and all 12,214 terms in our dataset to form
a null distribution. The percentile score is the percentile of the
similarity between the predicted and ground-truth terms in our null
distribution. We also include the proportion of similarity percentiles
greater than 99% as a proxy for accuracy.
Among the nine nodes that receive positive votes (indicated as
green nodes in Fig. 1), the one with the highest similarity score is
selected as the best score (b), while the score of the node predicted
by Thought Graph is recorded as the predicted score (p). To estab-
lish a fair baseline comparison, we implemented IO zero-shot-9 to
generate nine answers, and select the best for evaluation.
1
https://geneontology.org/docs/ontology-relations/
539
WWW ’24 Companion, May 13–17, 2024, Singapore, Singapore.
Hsu, et al.
1
2
3
4
5
Layer Number
0.0
0.2
0.4
0.6
0.8
1.0
Mean Similarity Score
Thought Graph Distribution of Mean Similarity by Layer
Figure 2: The distribution of the mean similarity score at
each layer using Thought Graph (p). The blue line denotes
the median of layer 3.
Method
Similarity
Percentile
Percentile > 99%
GSEA
24.78%
52.00%
17%
IO zero-shot
45.75%
77.00%
27%
IO zero-shot-9 (b)
59.68%
91.42%
61%
IO few-shot
48.73%
81.85%
32%
CoT
28.83%
43.71%
0%
Hu et al. [5]
52.31%
84.44%
43%
Thought Graph (p)
48.53%
80.90%
42%
Thought Graph (b)
65.06%
95.05
%
65%
Table 1: Mean cosine similarity, mean cosine similarity per-
centile, and proportion percentile above 99% of a domain-
specific tool and seven LLM methods on 100 GO data samples.
4.4 Performance Evaluation
Overall Performance:
Table 1 indicates that Thought Graph (b)
achieves the top performance in both cosine similarity (65.06%) and
similarity percentile (95.05%). In particular, we want to posit that
IO zero-shot learning emphasizes coverage across a wide range
of biological process names (diversity), while the CoT focuses on
an in-depth exploration of these names (specificity), whereas our
framework is designed to balance both. Thought Graph (b) outper-
forms IO zero-shot-9 (b) and CoT, indicating that depth without
breadth, or vice versa, is insufficient. Thought Graph and other LLM
baselines outperform GSEA, and we also noticed GSEA cannot pro-
vide any terms for 26% of the time, highlighting the advantage
of the LLMs. In addition, Thought Graph (p) scores lower than
few-shot and Hu et al. baselines. This may result from our deci-
sion to constrain the final answer to the last layer. However, that
Thought Graph (b) outperforms all baselines, including zero-shot-9,
assures us that our approach to generating candidate sets of terms
is promising, and that it is adept at generating a correct answer, but
further optimization is needed.
Thought Graph Analysis:
Layer-by-layer analysis in Fig.2
demonstrates increasing performance from layers 1 to 3, followed
by a decrease in layers 4 and 5. This trend suggests a trade-off
between specificity and accuracy, with layer 3 the optimal level by
a small margin. While the performance at layer 1 is lower, this is
largely because our initial prompt specifically requests “high-level”
terms, generating only three of them. As expected, the variance
in mean similarity scores increases with the number of layers, as
deeper layers explore deeper and more distant parts of the ontol-
ogy, but stabilize after layer 3. In the latter layers, more specific
terms are often voted out in favor of more accurate, general terms,
demonstrating the ability of the voting mechanism to dynamically
moderate specificity. Though our results reflect a modest sample
size, layer 3 emerges as an early candidate for the optimal depth.
5 CONCLUSION
Thought Graph represents an advancement in the field of gene
ontology and bioinformatics. Integrating gene set analysis with
semantic graphs allows for a more nuanced and comprehensive
understanding of biological processes. The effectiveness of the
Thought Graph in mapping complex gene interactions and func-
tions has been demonstrated, showing its potential to outperform
existing methods. This novel method not only enhances the ac-
curacy of gene set analysis but also opens avenues for research
in understanding genetic influences on various BPs. Future work
can expand on this foundation, exploring broader applications and
measuring uncertainty in complex reasoning.
6 ACKNOWLEDGEMENT
We thank the support from NIH (OTA-21-008, R01LM014306-01)
and NSF (NSF 2303038, NSF 2333703).
REFERENCES
[1]
Garima Agrawal, Tharindu Kumarage, Zeyad Alghami, and Huan Liu.
2023. Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey.
arXiv:2311.07914 [cs.CL]
[2]
M. Ashburner, C.A. Ball, Judith Blake, David Botstein, Heather Butler, and J.
Cherry. 2000. Gene ontology: Tool for the unification of biology.
The Gene
Ontology Consortium. Nat Genet
25 (01 2000), 25–29.
[3]
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi,
Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski,
Piotr Nyczyk, and Torsten Hoefler. 2023. Graph of Thoughts: Solving Elaborate
Problems with Large Language Models. arXiv:2308.09687 [cs.CL]
[4] Payal Chandak, Kexin Huang, and Marinka Zitnik. 2023. Building a knowledge
graph to enable precision medicine.
Scientific Data
10, 1 (2023), 67.
[5]
Mengzhou Hu, Sahar Alkhairy, Ingoo Lee, Rudolf T. Pillich, Robin Bachelder,
Trey Ideker, and Dexter Pratt. 2023. Evaluation of large language models for
discovery of gene set function. arXiv:2309.04019 [q-bio.GN]
[6]
Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella, and Nigel Col-
lier. 2021. Self-Alignment Pretraining for Biomedical Entity Representations.
arXiv:2010.11784 [cs.CL]
[7]
Aravind Subramanian, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee,
Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, Scott L. Pomeroy,
Todd R. Golub, Eric S. Lander, and Jill P. Mesirov. 2005.
Gene set en-
richment analysis: A knowledge-based approach for interpreting genome-
wide expression profiles.
Proceedings of the National Academy of Sci-
ences
102, 43 (2005), 15545–15550.
https://doi.org/10.1073/pnas.0506580102
arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.0506580102
[8]
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang,
Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain
of Thought Reasoning in Language Models. arXiv:2203.11171 [cs.CL]
[9]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia,
Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-Thought Prompting Elicits
Reasoning in Large Language Models. arXiv:2201.11903 [cs.CL]
[10]
Yilin Wen, Zifeng Wang, and Jimeng Sun. 2023.
MindMap: Knowledge
Graph Prompting Sparks Graph of Thoughts in Large Language Models.
arXiv:2308.09729 [cs.AI]
[11]
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao,
and Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving
with Large Language Models. arXiv:2305.10601 [cs.CL]
[12]
Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang,
Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, and Ed Chi. 2023.
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.
arXiv:2205.10625 [cs.AI]
540