
stages.flux.daemon_flux module: remove numba jit decorators, clean up and add documentation #930

Open
thehrh wants to merge 6 commits into master from daemon_flux_no_numba

Conversation

thehrh (Collaborator) commented Apr 22, 2026

On top of resolving #929, this PR fixes missing documentation and cleans up the module.

thehrh (Collaborator, Author) commented Apr 22, 2026

Failing workflows are due to #927

thehrh added 2 commits April 22, 2026 21:45:
- …owed any longer), no need to set calc_mode in setup_function, cosmetics
- …> 5 statements), no need to set representation in compute_function, turn default energy grid into global variable with units [no ci]

thehrh (Collaborator, Author) commented Apr 23, 2026

Timings as of now:

branch (single CPU): flux daemon_flux
- setup:    Total time (s): +0.000, n calls: 1
- compute:  Total time (s): 74.671, n calls: 100, time/call (s): mean 0.747, max. 0.784, min. 0.735
- apply:    Total time (s): +0.000, n calls: 100, time/call (s): mean +0.000, max. +0.000, min. +0.000

master (single CPU): flux daemon_flux
- setup:    Total time (s): +0.000, n calls: 1
- compute:  Total time (s): 76.168, n calls: 100, time/call (s): mean 0.762, max. 2.039, min. 0.738
- apply:    Total time (s): +0.000, n calls: 100, time/call (s): mean +0.000, max. +0.000, min. +0.000

master (3 CPUs): flux daemon_flux
- setup:    Total time (s): +0.000, n calls: 1
- compute:  Total time (s): 76.544, n calls: 100, time/call (s): mean 0.765, max. 2.095, min. 0.740
- apply:    Total time (s): +0.000, n calls: 100, time/call (s): mean +0.000, max. +0.000, min. +0.000

Let's see how much 1) the external flux calculation by daemonflux, 2) the splining, and 3) the spline evaluation contribute (calc_mode = events using the 3y DeepCore events file). Here's a fairly typical example of the durations:

daemonflux numu flux generation duration (s): 3.8e-02
splining duration (s): 1.6e-03
daemonflux antinumu flux generation duration (s): 3.8e-02
splining duration (s): 1.5e-03
daemonflux nue flux generation duration (s): 3.9e-02
splining duration (s): 1.6e-03
daemonflux antinue flux generation duration (s): 4.0e-02
splining duration (s): 1.5e-03
PISA spline evaluation time (s): 6.0e-01

One can tell that evaluating the splines in event-by-event mode takes roughly 75% of the total time.

If we compare the pipeline IceCube_3y_neutrinos_daemon.cfg to IceCube_3y_neutrinos.cfg, the latter has the services flux.honda_ip + flux.barr_simple instead of flux.daemon_flux and performs its flux computations on a 200 x 200 grid.

Between these two analysis configurations, the template generation time of the one using daemonflux is ~0.5 s slower on a single CPU.

If the daemonflux pipeline employs the same 200 x 200 grid instead, the above PISA spline evaluation time drops from 0.6 s to 0.4 s (template generation time still ~0.3 s slower than for IceCube_3y_neutrinos.cfg).
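
As a toy illustration of why the grid mode is cheaper (made-up axes and values, not PISA code; the service's splines are scipy RectBivariateSpline objects, per the discussion below):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Toy spline standing in for one of the service's flux splines
# (node positions and flux values are invented for illustration).
rng = np.random.default_rng(0)
spline = RectBivariateSpline(
    np.linspace(-1.0, 1.0, 100),   # coszen nodes
    np.linspace(-1.0, 5.0, 100),   # log10(E/GeV) nodes
    rng.random((100, 100)),        # dummy flux table
)

# calc_mode = grid: tensor-product evaluation on two 200-element axes,
# i.e. 200 x 200 = 40k points in a single call.
flux_grid = spline(np.linspace(-1, 1, 200), np.linspace(-1, 5, 200))

# calc_mode = events: one scattered (coszen, energy) pair per event.
n_events = 200_000
flux_events = spline(
    rng.uniform(-1, 1, n_events),
    rng.uniform(-1, 5, n_events),
    grid=False,
)
```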

thehrh (Collaborator, Author) commented Apr 29, 2026

@marialiubarska @jpyanez
Given the significant contribution of the daemon_flux service to template generation times (I presume not just for this 3-year DRAGON pipeline; timings above) and to inform possible searches for alternatives or optimisation attempts, I'm wondering what reasons guided the choice of creating a 2D interpolant with RectBivariateSpline here? For example, was speed a factor at all?

marialiubarska (Contributor) commented

Hey, I'm not sure I understood if your question is about the reason we do 2D interpolation or about the choice of the specific interpolation method.

The reason for 2D interpolation is speed. Because the daemonflux package uses two 1D interpolations, which is not ideal for our case where each neutrino has a random coszen and energy, using it directly was much slower than re-interpolating in 2D for each set of updated parameters.
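
To illustrate the scheme (not the actual service code; grid ranges and names are invented, and the dummy lambda stands in for daemonflux.Flux.flux evaluated with the current parameter values):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Illustrative grid on which the external daemonflux calculation would be
# requested once per parameter update.
cz_nodes = np.linspace(-1.0, 1.0, 100)
e_nodes = np.logspace(-1.0, 5.0, 500)  # GeV

def build_splines(flavours, flux_on_grid):
    """One 2D spline per flavour, rebuilt for each set of updated parameters.
    flux_on_grid stands in for daemonflux.Flux.flux on the grid above."""
    return {
        flav: RectBivariateSpline(
            cz_nodes,
            np.log10(e_nodes),
            flux_on_grid(flav, cz_nodes, e_nodes),  # (100, 500) flux table
        )
        for flav in flavours
    }

# Dummy flux values; the real ones come from daemonflux.
rng = np.random.default_rng(0)
splines = build_splines(
    ["numu", "antinumu", "nue", "antinue"],
    lambda flav, cz, e: rng.random((cz.size, e.size)),
)

# Event-by-event evaluation at arbitrary (coszen, energy) pairs -- the case
# where calling daemonflux directly was reported to be much slower.
event_flux = splines["numu"](
    rng.uniform(-1.0, 1.0, 10_000),
    np.log10(rng.uniform(1.0, 1e3, 10_000)),
    grid=False,
)
```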

In terms of the specific method, I don't remember why RectBivariateSpline was chosen; I don't think I looked into speed comparisons with other interpolation methods.

thehrh (Collaborator, Author) commented Apr 29, 2026

The choice of RectBivariateSpline is what I'm interested in, but the background on the 2D choice is useful to know also. I presume you can confirm that daemon_flux is a significant contributor to the time it takes to generate expected distributions during any given fit?

marialiubarska (Contributor) commented

Hmm, I'm not 100% sure, but I think it's not just daemonflux; 2D interpolation takes at least a comparable amount of time to the grid evaluation. It's just that it was much faster than looping over events.

There was a third method: for N events, sort them by energy and coszen into an NxN grid, then use it to get the flux from daemonflux, then keep the diagonal and discard the other values. This approach was a bit faster than 2D interpolation, but we didn't use it because with larger numbers of events it could lead to memory issues.
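
My reading of that third method, as a hedged sketch (flux_on_grid again stands in for a daemonflux grid evaluation; the O(N²) intermediate table is where the memory issue comes from):

```python
import numpy as np

def per_event_flux_via_grid(coszen, energy, flux_on_grid):
    """Evaluate per-event fluxes through one big N x N grid call.
    flux_on_grid(cz, e) must return fluxes for every combination of the
    (sorted) cz and e values, shape (N, N)."""
    cz_order = np.argsort(coszen)   # grid evaluators expect increasing axes
    en_order = np.argsort(energy)
    table = flux_on_grid(coszen[cz_order], energy[en_order])  # O(N^2) memory
    # Pick out one entry per event: event k sits at (rank of its coszen,
    # rank of its energy); everything else is discarded.
    row = np.empty(coszen.size, dtype=int)
    col = np.empty(energy.size, dtype=int)
    row[cz_order] = np.arange(coszen.size)
    col[en_order] = np.arange(energy.size)
    return table[row, col]

# Quick self-check with a separable dummy flux, flux(cz, e) = cz + e:
rng = np.random.default_rng(0)
cz, e = rng.uniform(-1, 1, 1000), rng.uniform(1, 100, 1000)
assert np.allclose(per_event_flux_via_grid(cz, e, np.add.outer), cz + e)
```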

marialiubarska (Contributor) commented

I remember we had a discussion with @afedynitch regarding this, maybe he also has some thoughts

thehrh (Collaborator, Author) commented Apr 29, 2026

> Hmm, I'm not 100% sure, but I think it's not just daemonflux; 2D interpolation takes at least a comparable amount of time to the grid evaluation. It's just that it was much faster than looping over events.
>
> There was a third method: for N events, sort them by energy and coszen into an NxN grid, then use it to get the flux from daemonflux, then keep the diagonal and discard the other values. This approach was a bit faster than 2D interpolation, but we didn't use it because with larger numbers of events it could lead to memory issues.

Based on my timings above, obtaining the fluxes from the external daemonflux dependency and creating the RectBivariateSpline objects is much less important than evaluating the latter (all in event-by-event mode), which is why I'm interested in optimising the PISA side (=interpolants created from daemonflux.Flux.flux) at this point.

The other aspect is that the total template generation time on a single core is of the order of twice the daemon_flux PISA service's compute time, so an $X\,\%$ performance gain here should result in an overall $\sim X/2\,\%$ gain. I would assume this relation to be roughly independent of the underlying event sample.
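
Spelling that out (with $T_\mathrm{tot} \approx 2\,T_\mathrm{daemon}$ as per the timings above):

$$
\frac{\Delta T_\mathrm{tot}}{T_\mathrm{tot}}
= \frac{\frac{X}{100}\,T_\mathrm{daemon}}{2\,T_\mathrm{daemon}}
= \frac{X}{200}
\;\;\Rightarrow\;\; \text{an overall gain of } \sim X/2\,\% .
$$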

Any timings from a fit including the daemon_flux service similar to https://icecube.github.io/pisa/notebooks/IceCube_3y_oscillations_example.html#profiling (which uses honda+barr instead) would be very useful to me.

