
Large Scale Job Management and Experience in Recent Data Challenges within the LHC CMS experiment

Evans, D. and Mason, D. and Gutsche, O. and Metson, S. and Wakefield, S. and Hufnagel, D. and Hassan, A. and Mohapatra, A. and Miller, M. and van Lingen, F. (2008) Large Scale Job Management and Experience in Recent Data Challenges within the LHC CMS experiment. In: XII Advanced Computing and Analysis Techniques in Physics Research (ACAT08). Proceedings of Science. No. 070. SISSA, Trieste, Italy, Art. No. 032.

PDF - Published Version
Creative Commons Attribution Non-commercial Share Alike.


From its conception, the job management system has been distributed to increase scalability and robustness. The system consists of several applications (called ProdAgents) which manage Monte Carlo, reconstruction and skimming jobs on collections of sites within different Grid environments (OSG, NorduGrid, LCG) and submission systems such as GlideIn, local batch, etc. Production of simulated data in CMS mainly takes place on so-called Tier2 resources (small to medium size computing centers). Approximately 50% of the CMS Tier2 resources are allocated to running simulation jobs, while the so-called Tier1s (medium to large size computing centers with high capacity tape storage systems) will mainly be used for skimming and reconstructing detector data. During the last one and a half years the job management system has been adapted so that it can be configured to convert Data Acquisition (DAQ) / High Level Trigger (HLT) output from the CMS detector to the CMS data format and manage the real time data stream from the experiment. Simultaneously the system has been upgraded to accommodate the increasing scale of CMS production and adapted to the procedures used by its operators. In this paper we discuss the current (high level) architecture of ProdAgent, the experience in using this system in computing challenges, feedback from these challenges, and future work, including migration to a set of core libraries to facilitate convergence between the different data management projects within CMS that deal with analysis, simulation, and initial reconstruction of real data. This migration is important, as it will decrease the code footprint used by these projects and increase maintainability of the code base.

Item Type:Book Section
Additional Information:Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike license. This work is partly supported by US Department of Energy grant DOE DE-FG02-06ER86271 and US National Science Foundation grant NSF PHY-0533280. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and don’t necessarily reflect the views of the Department of Energy or NSF.
Funding Agency: Department of Energy (DOE)
Grant Number: DE-FG02-06ER86271
Series Name:Proceedings of Science
Issue or Number:070
Record Number:CaltechAUTHORS:20180910-131321917
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:89499
Deposited By: George Porter
Deposited On:10 Sep 2018 21:05
Last Modified:16 Nov 2021 00:35
