SciPost Phys. 18, 040 (2025)
Learning the simplicity of scattering amplitudes

Clifford Cheung^{1⋆}, Aurélien Dersy^{2,3†} and Matthew D. Schwartz^{2,3‡}

1 Walter Burke Institute for Theoretical Physics, California Institute of Technology, 91125 Pasadena, CA, USA
2 Department of Physics, Harvard University, 02138 Cambridge, MA, USA
3 NSF Institute for Artificial Intelligence and Fundamental Interactions

⋆ clifford.cheung@caltech.edu, † adersy@g.harvard.edu, ‡ schwartz@g.harvard.edu
Abstract

The simplification and reorganization of complex expressions lies at the core of scientific progress, particularly in theoretical high-energy physics. This work explores the application of machine learning to a particular facet of this challenge: the task of simplifying scattering amplitudes expressed in terms of spinor-helicity variables. We demonstrate that an encoder-decoder transformer architecture achieves impressive simplification capabilities for expressions composed of handfuls of terms. Lengthier expressions are handled by an additional embedding network, trained using contrastive learning, which isolates subexpressions that are more likely to simplify. The resulting framework is capable of reducing expressions with hundreds of terms—a regular occurrence in quantum field theory calculations—to vastly simpler equivalent expressions. Starting from lengthy input expressions, our networks can generate the Parke-Taylor formula for five-point gluon scattering, as well as new compact expressions for five-point amplitudes involving scalars and gravitons. An interactive demonstration can be found at https://spinorhelicity.streamlit.app.
Copyright C. Cheung et al. This work is licensed under the Creative Commons Attribution 4.0 International License. Published by the SciPost Foundation.

Received: 2024-09-06, Accepted: 2024-12-19, Published: 2025-02-03
doi:10.21468/SciPostPhys.18.2.040
Contents

1 Introduction
2 Notation and training data
  2.1 Spinor-helicity formalism
  2.2 Target data set
  2.3 Input data set
  2.4 Analytic simplification
3 One-shot learning
  3.1 Network architecture
  3.2 Results
  3.3 Embedding analysis
4 Sequential simplification
  4.1 Contrastive learning
  4.2 Grouping terms
  4.3 Simplifying long expressions
  4.4 Physical amplitudes
5 Conclusion
A Parsing amplitudes
B Training data composition
C Nucleus sampling calibration
D Training on intricate amplitudes
E Integer embeddings
F Cosine similarity for dissimilar terms
G Physical amplitudes
References
1 Introduction
The modern scattering amplitude program involves both the computation of amplitudes as well as the study of their physical properties. Are there better, more efficient, or more transparent ways to compute these objects? The dual efforts to devise powerful techniques for practical calculation and to then use those results to glean new theoretical structures have led to sustained progress over the last few decades. An archetype of this approach appears in the context of QCD, whose Feynman diagrams yield famously cumbersome and lengthy expressions. For example, even for the relatively simple process of tree-level, five-point gluon scattering, Feynman diagrams produce hundreds of terms. However, in the much-celebrated work of Parke and Taylor [1], it was realized that this apparent complexity is illusory. These hundreds of terms at five-point—and more generally, for any maximally helicity violating configuration—simplify to a shockingly compact monomial formula,
A(1^+\, 2^+\, 3^+ \cdots i^- \cdots j^- \cdots n^+) = \frac{\langle ij \rangle^4}{\langle 12 \rangle \langle 23 \rangle \cdots \langle n1 \rangle} ,    (1)
shown here in its color-ordered form. The simplicity of the Parke-Taylor formula strongly
suggests an alternative theoretical framework that directly generates expressions like Eq. (1)
without the unnecessarily complicated intermediate steps of Feynman diagrams.
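For orientation, Eq. (1) is straightforward to evaluate numerically once the angle bracket ⟨ij⟩ is defined as the antisymmetric contraction of two holomorphic spinors. The short Python sketch below is our own illustration rather than code from the paper, and the helper names are hypothetical.

```python
import numpy as np

def angle(l_i, l_j):
    """Angle bracket <i j>: antisymmetric contraction of two holomorphic spinors."""
    return l_i[0] * l_j[1] - l_i[1] * l_j[0]

def parke_taylor(spinors, i, j):
    """Color-ordered MHV amplitude of Eq. (1): <i j>^4 / (<1 2><2 3>...<n 1>)."""
    n = len(spinors)
    denominator = np.prod([angle(spinors[k], spinors[(k + 1) % n]) for k in range(n)])
    return angle(spinors[i], spinors[j]) ** 4 / denominator

# Five random complex two-component spinors standing in for a generic
# (complexified) five-point kinematic configuration.
rng = np.random.default_rng(0)
lambdas = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))

# Amplitude with the two negative-helicity gluons in positions 2 and 3 (0-indexed: 1, 2).
print(parke_taylor(lambdas, 1, 2))
```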
This essential fact—that on-shell scattering amplitudes are simple and can illuminate hidden structures in theories—has led to new physical insights. Indeed, shortly after [1] it was realized that Eq. (1) also describes the correlators of a two-dimensional conformal field theory [2], which is a pillar of the modern-day celestial holography program [3]. Much later, Witten deduced from Eq. (1) that Yang-Mills theory is equivalent to a certain topological string theory in twistor space [4], laying the groundwork for a vigorous research program that eventually led to the twistor Grassmannian formulation [5, 6] and the amplituhedron [7]. Examples like this abound in the amplitudes program—structures like the double copy [8, 9] and the scattering equations [10–12] were all derived from staring directly at amplitudes, rather than from the top-down principles of quantum field theory.
Progress here has hinged on the existence of simple expressions for on-shell scattering amplitudes. We are thus motivated to ask whether there is a more systematic way to recast a given expression from its raw form into its most compact representation. For example, a complicated spinor-helicity expression can often be simplified through repeated application of Schouten identities
|1\rangle \langle 23 \rangle + |2\rangle \langle 31 \rangle + |3\rangle \langle 12 \rangle = 0 ,    (2)
together with total momentum conservation of n-point scattering,

|1\rangle [1| + |2\rangle [2| + \cdots + |n\rangle [n| = 0 .    (3)
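Both relations are easy to probe numerically for explicit spinors. The sketch below is our own illustration, not code from the paper: the Schouten identity, Eq. (2), holds for any three two-component spinors, whereas Eq. (3) is a constraint satisfied only on physical kinematic configurations.

```python
import numpy as np

def angle(l_i, l_j):
    """Angle bracket <i j> = det(l_i, l_j) for two-component holomorphic spinors."""
    return l_i[0] * l_j[1] - l_i[1] * l_j[0]

rng = np.random.default_rng(1)
l1, l2, l3 = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))

# Schouten identity, Eq. (2): |1><23> + |2><31> + |3><12> = 0, component by component.
schouten = l1 * angle(l2, l3) + l2 * angle(l3, l1) + l3 * angle(l1, l2)
assert np.allclose(schouten, 0)  # holds identically for any three spinors
```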
However, the search space for these operations is expansive and difficult to navigate even with the help of existing computer packages [13, 14], and, to our knowledge, there exists no canonical algorithm for determining which operations will simplify a complicated expression analytically. This is where recent advances in machine learning (ML) offer a natural advantage.
The role of ML in high-energy physics has grown dramatically in recent years [15]. In the field of scattering amplitudes, much of the work to date has focused on reproducing the numerical output of these amplitudes using neural networks [16–19]. However, recent advances in ML have led to the development of powerful architectures capable of handling increasingly complex datasets, including those that are purely symbolic. In particular, the transformer architecture [20] has allowed for practical applications across a wide range of topics, including jet tagging [21], density estimation for simulation [22, 23], and anomaly detection [24].
The appeal of transformers comes from their ability to create embeddings for long sequences which take into account all of the objects composing that sequence. In natural language processing, where transformers first originated, this approach encodes a sentence by mixing the embeddings of all of the words in the sentence. These powerful representations have been a key driver of progress in automatic summarization, translation tasks, and natural language generation [25–27]. Since mathematical expressions can also be understood as a form of language, the transformer architecture has been successfully repurposed to solve certain interesting mathematical problems. For those problems, the validity of a model's output can often be confirmed through explicit numerical evaluation of the symbolic result, allowing one to easily discard any model hallucinations. From symbolic regression [28] to function integration [29], theorem proving [30], and the amplitudes bootstrap [31], transformers have proven to be effective in answering questions that are intrinsically analytical rather than numerical. In particular, transformers have been adapted to simplify short polylogarithmic expressions [32], and it is natural to expect that the same methodology can be extended to our present task, which is the simplification of spinor-helicity expressions.
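To make the verification step concrete, here is one way such a numerical check could look in practice. This is a minimal sympy sketch with an invented toy identity rather than an actual model output: a candidate simplification is accepted only if its difference from the input expression vanishes at random numerical points.

```python
import random
import sympy as sp

def numerically_equal(expr_a, expr_b, symbols, trials=10, tol=1e-8):
    """Accept a candidate output only if expr_a - expr_b vanishes at several
    random numerical points, a cheap filter for hallucinated (inequivalent) predictions."""
    for _ in range(trials):
        point = {s: random.uniform(1.0, 2.0) for s in symbols}
        if abs((expr_a - expr_b).evalf(subs=point)) > tol:
            return False
    return True

x, y = sp.symbols("x y")
original = x**2 - y**2
candidate = (x - y) * (x + y)        # a correct simplification
hallucination = (x - y) * (x + 2*y)  # an inequivalent guess

print(numerically_equal(original, candidate, [x, y]))      # True
print(numerically_equal(original, hallucination, [x, y]))  # False
```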
A common bottleneck for transformer-based approaches is the length of the mathematical expression that can be fed through the network. Typical amplitude expressions can easily have thousands of distinct terms, and processing the whole expression at once quickly becomes intractable. The self-attention operation in a transformer scales quadratically in time and memory with the sequence length and is therefore most efficiently applied to shorter expressions. For instance, the Longformer and BigBird architectures [33, 34] implement reduced self-attention patterns, using a sliding window view on the input sequence and resorting to global attention only for a few select tokens. In the context of simplifying mathematical expressions, it is quite clear that humans proceed similarly: we start by identifying a handful of terms that are likely to combine and then we attempt simplification on this subset.
[Figure 1 schematic: the terms of a complicated amplitude are mapped by a "Projection" transformer into an embedding space, grouped, reduced by "Simplify" transformers, and the result is fed back through an iteration loop.]
Figure 1: Spinor-helicity expressions are simplified in several steps. To start, indi-
vidual terms are projected into an embedding space (grey sphere). Using contrastive
learning, we train a “projection” transformer encoder to learn a mapping that groups
similar terms close to one another in the embedding space. After identifying similar
terms we use a “simplify” transformer encoder-decoder to predict the corresponding
simple form. After simplifying all distinct groups, this procedure is repeated with the
resulting expression, iterating until no further simplification is possible.
In this paper, we mimic this procedure by leveraging contrastive learning [35–39]. As illustrated in Fig. 1, we train a network to learn a representation for spinor-helicity expressions in which terms that are likely to simplify are close together in the learned embedding space. Grouping nearby terms, we then form a subset of the original expression which is input into yet another transformer network trained to simplify moderately-sized expressions. By repeating the steps of grouping and simplification we are then able to reduce spinor-helicity expressions with enormous numbers of distinct terms.
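Schematically, the grouping step might look like the sketch below. This is our own simplified illustration, not the algorithm detailed in Section 4.2: embeddings stand in for the output of the contrastively trained projection encoder, and terms whose embeddings have high cosine similarity are greedily clustered before being handed to the simplify transformer.

```python
import numpy as np

def group_by_cosine_similarity(embeddings, threshold=0.9):
    """Greedily group term embeddings whose pairwise cosine similarity exceeds a threshold."""
    # Normalize rows so that dot products are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    unassigned = set(range(len(embeddings)))
    groups = []
    while unassigned:
        seed = unassigned.pop()
        group = [seed] + [j for j in list(unassigned) if sim[seed, j] > threshold]
        unassigned -= set(group)
        groups.append(group)
    return groups

# Toy example: random vectors stand in for projection-encoder embeddings of five terms.
rng = np.random.default_rng(2)
fake_embeddings = rng.normal(size=(5, 16))
print(group_by_cosine_similarity(fake_embeddings, threshold=0.2))
```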
Our paper is organized as follows. We begin in Section 2 with a brief review of the spinor-helicity formalism and its role in scattering amplitude calculations. We describe the physical constraints that amplitudes must satisfy, as well as the various mathematical identities that can relate equivalent expressions. In Section 3 we introduce a transformer encoder-decoder architecture adapted to the simplification of moderately-sized spinor-helicity expressions. We describe our procedure for generating training data and discuss the performance of our networks. Afterwards, in Section 4 we present the concept of contrastive learning and describe how it arrives at a representative embedding space. We present an algorithm for grouping subsets of terms that are likely to simplify in lengthier amplitude expressions. We then showcase the performance of our full simplification pipeline on actual physical amplitudes, in many cases composed of hundreds of terms.^1 Finally, we conclude with a brief perspective on the prospects for ML in this area.
2 Notation and training data
In this section, we review the mechanics of the spinor-helicity formalism and then describe the generation of training data for our models. Our notation follows [40], though a more detailed exposition can also be found in [41–43] and references therein.
^1 Our implementation, datasets, and trained models are available at https://github.com/aureliendersy/spinorhelicity. This repository also contains a faster, locally runnable version of our online interactive demonstration, hosted at https://spinorhelicity.streamlit.app. This application reduces amplitudes following the procedure described in Fig. 1 and can simplify the amplitude expressions quoted in this paper.
2.1 Spinor-helicity formalism
The basic building blocks of spinor-helicity expressions are helicity spinors, which are two-component objects whose elements are complex numbers. Left-handed spinors transform in the $(\tfrac{1}{2}, 0)$ representation of the Lorentz group and are written as $\lambda_\alpha$. Right-handed spinors transform in the $(0, \tfrac{1}{2})$ representation of the Lorentz group and are written as $\tilde{\lambda}_{\dot{\alpha}}$. A general four-momentum transforms in the