CaltechAUTHORS: A Caltech Library Service

LHC physics dataset for unsupervised New Physics detection at 40 MHz

Govorkova, Ekaterina and Puljak, Ema and Aarrestad, Thea and Pierini, Maurizio and Woźniak, Kinga Anna and Ngadiuba, Jennifer (2022) LHC physics dataset for unsupervised New Physics detection at 40 MHz. Scientific Data, 9. Art. No. 118. ISSN 2052-4463. PMCID PMC9070018. doi:10.1038/s41597-022-01187-8. https://resolver.caltech.edu/CaltechAUTHORS:20220405-282432600

Files:
PDF - Published Version (1MB). Creative Commons Attribution.
PDF - Submitted Version (2MB). See Usage Policy.

Abstract

In the particle detectors at the Large Hadron Collider, hundreds of millions of proton-proton collisions are produced every second. If one could store the whole data stream produced in these collisions, tens of terabytes of data would be written to disk every second. The general-purpose experiments ATLAS and CMS reduce this overwhelming data volume to a sustainable level by deciding in real time whether each collision event should be kept for further analysis or discarded. We introduce a dataset of proton collision events that emulates a typical data stream collected by such a real-time processing system, pre-filtered by requiring the presence of at least one electron or muon. This dataset could be used to develop novel event selection strategies and assess their sensitivity to new phenomena. In particular, we intend to stimulate a community-based effort towards the design of novel algorithms for performing unsupervised new physics detection, customized to fit the bandwidth, latency and computational resource constraints of the real-time event selection system of a typical particle detector.
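The abstract's figure of tens of terabytes per second can be checked with a back-of-envelope calculation. The sketch below assumes one recorded event per bunch crossing at the 40 MHz rate named in the title, and a raw event size of about 1 MB; the event size is an assumed round number, not a value given in this record.

```python
# Back-of-envelope estimate of the unfiltered LHC data rate.
# ASSUMPTIONS (not stated in this record): one event is read out per
# bunch crossing at 40 MHz, and each raw event occupies about 1 MB.
BUNCH_CROSSING_RATE_HZ = 40e6   # 40 MHz, as in the dataset title
RAW_EVENT_SIZE_BYTES = 1e6      # ~1 MB per event (assumed)


def raw_data_rate_tb_per_s(rate_hz: float, event_bytes: float) -> float:
    """Return the data rate in terabytes per second (1 TB = 1e12 bytes)."""
    return rate_hz * event_bytes / 1e12


rate = raw_data_rate_tb_per_s(BUNCH_CROSSING_RATE_HZ, RAW_EVENT_SIZE_BYTES)
print(f"{rate:.0f} TB/s")  # prints "40 TB/s"
```

With these inputs the estimate is 40 TB/s, in line with the "tens of terabytes" quoted above; a different assumed event size scales the result linearly.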


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1038/s41597-022-01187-8 | DOI | Article
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9070018/ | PubMed Central | Article
https://arxiv.org/abs/2107.02157 | arXiv | Discussion Paper
ORCID:
Author | ORCID
Govorkova, Ekaterina | 0000-0003-1920-6618
Puljak, Ema | 0000-0002-6011-9965
Aarrestad, Thea | 0000-0002-7671-243X
Pierini, Maurizio | 0000-0003-1939-4268
Woźniak, Kinga Anna | 0000-0002-4395-1581
Ngadiuba, Jennifer | 0000-0002-0055-2935
Additional Information: © The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Received 02 September 2021; Accepted 02 February 2022; Published 29 March 2022.

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772369) and the ERC-POC programme (grant No. 996696).

Contributions: J.N. conceived the idea of publishing the dataset and creating a data challenge on it; M.P. created the data in raw format; E.P. and E.G. applied the event selection and produced the dataset in its final format; T.A., J.N. and K.W. conceived the package with example code; E.P. designed the example autoencoder; all drafted the paper.

The authors declare no competing interests.
Funders:
Funding Agency | Grant Number
European Research Council (ERC) | 772369
European Research Council (ERC) | 996696
Subject Keywords: Experimental particle physics; Research data
PubMed Central ID: PMC9070018
DOI: 10.1038/s41597-022-01187-8
Record Number: CaltechAUTHORS:20220405-282432600
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20220405-282432600
Official Citation: Govorkova, E., Puljak, E., Aarrestad, T. et al. LHC physics dataset for unsupervised New Physics detection at 40 MHz. Sci Data 9, 118 (2022). https://doi.org/10.1038/s41597-022-01187-8
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 114151
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 06 Apr 2022 15:57
Last Modified: 09 May 2022 16:32