Overview
The Standard Model (SM) of particle physics describes the interactions of fundamental particles in terms of an underlying quantum field theory. It would be difficult to overstate the success of the SM; however, the SM is not a complete theory of nature. For example, it does not explain the existence of dark matter or the dominance of matter over anti-matter observed in the universe today. The Large Hadron Collider (LHC) was built to search for solutions to these puzzles, and to find and study the last missing piece of the SM, the Higgs boson. The first data-taking run of the LHC (Run 1, 2010–2012) was a huge success, producing over 1000 journal articles, highlighted by the discovery of the Higgs boson. Run 2 (2015–present) has already produced many world-leading results. However, despite some anomalous observations, no indisputable evidence for physics beyond the Standard Model (BSM) has been reported. The primary goal of this project is to develop and deploy software utilizing machine learning (ML) that will enable the LHCb experiment to significantly extend its BSM physics reach in Run 3 by allowing it to more fully process a much larger data set.
The data sets collected by the LHC experiments are some of the largest in the world. For example, the sensor arrays of the LHCb experiment, on which both PIs work, produce about 100 terabytes of data per second, close to a zettabyte of data per year. Even after drastic data reduction performed by custom-built read-out electronics, the data volume is still about 10 exabytes per year, comparable to the largest-scale industrial data sets. Such large data sets cannot be stored indefinitely; therefore, all high energy physics (HEP) experiments employ a data-reduction scheme executed in real time by a data-ingestion system—referred to as a trigger system in HEP—to decide whether each event is to be persisted for future analysis or permanently discarded. Trigger-system designs are dictated by the rate at which the sensors can be read out, the computational power of the data-ingestion system, and the available storage space for the data.
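The scale of these numbers can be checked with simple back-of-envelope arithmetic. The sketch below assumes roughly 10^7 live seconds of data taking per year and a purely illustrative persistent-storage budget; neither figure is an official LHCb number.

```python
# Back-of-envelope check of the data volumes quoted above. The live time per
# year and the storage budget are illustrative assumptions, not LHCb figures.
SENSOR_RATE_TB_PER_S = 100        # raw sensor-array output, ~100 TB/s
LIVE_SECONDS_PER_YEAR = 1.0e7     # assumed live time (the LHC runs part of the year)

raw_tb_per_year = SENSOR_RATE_TB_PER_S * LIVE_SECONDS_PER_YEAR
print(f"raw sensor output: ~{raw_tb_per_year / 1e9:.0f} ZB/year")   # ~1 ZB/year

POST_READOUT_EB_PER_YEAR = 10     # after reduction in the read-out electronics
STORAGE_BUDGET_EB_PER_YEAR = 0.1  # hypothetical persistent-storage budget

reduction = POST_READOUT_EB_PER_YEAR / STORAGE_BUDGET_EB_PER_YEAR
print(f"required further reduction by the trigger: ~{reduction:.0f}x")
```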
Many potential explanations for dark matter and the matter/anti-matter asymmetry of our universe are currently inaccessible due to trigger-system limitations. As HEP computing budgets are projected to remain nearly flat in the coming years, the LHCb trigger system must be redesigned for the experiment to realize its potential. This redesign must go beyond scalable technical upgrades; radical new strategies are needed.
The LHCb detector is being upgraded for Run 3 (2021–2023), when the trigger system will need to process 25 exabytes per year. Currently, only 0.3 exabytes of the 10 exabytes per year processed by the trigger are analyzed using high-level computing algorithms; the rest is discarded prior to this stage using simple algorithms executed on FPGAs. To process all the data on CPU farms, ML will be used to develop and deploy new trigger algorithms. The specific objectives of this proposal are to more fully characterize LHCb data using ML and to build algorithms using these characterizations:
- to replace the most computationally expensive parts of the event pattern recognition;
- to increase the performance of the event-classification algorithms (illustrated schematically below); and
- to reduce the number of bytes persisted per event without degrading physics performance.
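As one concrete illustration of the second objective, the following minimal sketch shows how a trained classifier could supply a per-event persist/discard decision in a software trigger. The features, model, and threshold are placeholders chosen for illustration; they are not the algorithms that will actually be deployed at LHCb.

```python
# Minimal sketch of ML-based event classification in a software trigger: a
# trained model scores each event, and only events above a threshold are
# persisted. Features, model, and threshold are placeholder assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(seed=0)

# Stand-in training sample: three per-event summary features (imagine, e.g.,
# track multiplicity, displaced-vertex quality, transverse momentum) and a
# signal/background label derived from a toy rule.
X_train = rng.normal(size=(10_000, 3))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0.8).astype(int)

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3)
clf.fit(X_train, y_train)

def trigger_decision(event_features: np.ndarray, threshold: float = 0.9) -> bool:
    """Return True to persist the event, False to discard it."""
    score = clf.predict_proba(event_features.reshape(1, -1))[0, 1]
    return bool(score >= threshold)

# Score one incoming event.
event = rng.normal(size=3)
print("persist" if trigger_decision(event) else "discard")
```

In the real trigger, the binding constraint is throughput on the CPU farm rather than classifier accuracy alone, which is what motivates the first objective of replacing the most expensive pattern-recognition steps as well.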
In addition to enabling the physics programs of the PIs, these advances are critically important to enabling the full physics program of the entire 800-member LHCb collaboration, which publishes more than 50 physics papers per year. The project will also serve as a “proof-of-principle” for the ATLAS and CMS experiments, whose major software upgrades will take place between Runs 3 and 4, at which point they will need to address the same qualitative issues LHCb is addressing now. Similarly, the success of this project will inform the design of trigger systems for the proposed electron-ion colliders.