Master's thesis defense by Midori Kato

Towards Self-Supervised Discoveries with LHC Open Data

Abstract

The recent success of large language models across various scientific domains has prompted active development of foundation model methodology for particle physics, both for general-purpose representation learning and for searches for physics beyond the Standard Model (BSM). Progress along this line faces several field-specific obstacles. High-precision generation of physics observables remains difficult, most existing models are trained for a single downstream task, and the great majority of published results rely on clean Monte Carlo simulations rather than recorded collider data.

In this thesis we introduce a Flow Matching Transformer diffusing on the on-shell manifolds of physics objects, trained directly on the ATLAS Open Data 13 TeV pp release. A single instance of the model, trained once without any task-specific supervision, recovers a broad range of physical structure. The single-particle kinematic distributions of electrons, muons, taus, photons, jets, and large-R jets are reproduced with reasonable accuracy. Inter-particle observables not seen as explicit training targets, including the dilepton angular separations ∆R and ∆φ and the dijet pseudorapidity, are learned from the data alone. The same model also reconstructs the J/ψ, Υ, and Z resonance peaks in the opposite-sign same-flavor (OSSF) dilepton invariant-mass spectrum without ever being given the masses of those particles.

These results suggest that a single generative model can internalise non-trivial physical relationships from real collider data. The trained model can serve as a natural starting point for a wider class of BSM search and event-generation tasks that have so far required dedicated analyses.
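
For readers less familiar with the observables named in the abstract, the following is a minimal sketch of how ∆φ, ∆R, and the dilepton invariant mass are conventionally computed from per-object (pT, η, φ, m) coordinates. It is not taken from the thesis: the function names and the choice of units (GeV) are assumptions made for illustration only. The energy is fixed by the mass and three-momentum, which is the usual meaning of an object being on-shell.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation: sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def four_momentum(pt, eta, phi, mass):
    """On-shell four-momentum (E, px, py, pz) from collider coordinates.

    The energy is not free: it follows from E^2 = |p|^2 + m^2.
    """
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(p1, p2):
    """Invariant mass of a two-object system from two four-momenta."""
    energy, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = energy * energy - px * px - py * py - pz * pz
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))
```

As a usage illustration, two hypothetical back-to-back muons with pT = 45 GeV at η = 0 give an invariant mass of about 90 GeV, close to the Z peak mentioned in the abstract: `invariant_mass(four_momentum(45.0, 0.0, 0.0, 0.10566), four_momentum(45.0, 0.0, math.pi, 0.10566))`.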

Zoom Link

https://us04web.zoom.us/j/78415738811?pwd=xkmDCaq6uJbn7bMcuvOI6EGMq17PBj.1

Supervisors

Oleg Ruchayskiy, Inar Timiryasov

Censor

Rasmus Mackeprang

Niels Bohr Institute