Profiling / Optimization - to do list #30

@Jammy2211

I'm now in the process of profiling the code and comparing run-speeds to FORTRAN. In this issue I'll note the aspects of the code that are too slow, and how we can speed them up.

Convolution via frame convolver:

ORIGINAL:

psf = 21x21

lsst_solution: 0.7838027477264404
euclid_solution: 2.7468605041503906
hst_solution: 9.063872337341309
hst_up_solution: 20.911508083343506

psf = 41x41

lsst_solution: 3.622267246246338
euclid_solution: 10.063962459564209
hst_solution: 36.11542248725891
hst_up_solution: 121.993239402771

JITTED:

psf = 21x21

lsst_solution: 0.03985404968261719
euclid_solution: 0.09821200370788574
hst_solution: 0.30489659309387207
hst_up_solution: 0.7619879245758057

psf = 41x41

lsst_solution: 0.07627749443054199
euclid_solution: 0.1665787696838379
hst_solution: 0.45037412643432617 (FORTRAN = 0.04)
hst_up_solution: 1.0230822563171387 (FORTRAN = 0.096)

NOTES:

  • Numba is key to speeding convolution up to reasonable run-times.
  • The frame convolver is roughly 10x slower than FORTRAN (0.45 vs 0.04 for hst_solution and 1.02 vs 0.096 for hst_up_solution at psf = 41x41), which gives run-speeds that will be problematic for a realistic lens analysis.

The frame convolver should not be slower than FORTRAN after jitting, because the FORTRAN timings also include the process of mapping the image to 2D for convolution and reducing it back to 1D.

The frame convolver is slower because of its large number of -1 entries, which represent pixels in the frame that are outside of the masked region. The code reads every -1 and then continues on to the next index. Below is the fraction of values in each frame array that are -1:

LSST:

frame_array: 0.4806965585227189
blurring_frame_array: 0.8761812348231439

Euclid:

frame_array: 0.24736280724621756
blurring_frame_array: 0.8446561570493754

HST:

frame_array: 0.12451287733922718
blurring_frame_array: 0.826131601559918

HST_up:

frame_array: 0.07482219308446714
blurring_frame_array: 0.8173658286983688
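For reference, fractions like the ones above could be measured with something along these lines. This is a minimal sketch that assumes each frame array is a list of 1D integer NumPy arrays padded with -1; the function name `fraction_minus_one` and the toy data are hypothetical and not taken from the code base.

```python
import numpy as np

def fraction_minus_one(frames):
    """Return the fraction of entries equal to -1 over a list of 1D frame arrays."""
    total = sum(frame.size for frame in frames)
    padded = sum(int(np.count_nonzero(frame == -1)) for frame in frames)
    return padded / total

# Hypothetical toy data: two frames of length 4, each with two -1 entries.
example_frames = [
    np.array([-1, 0, 1, -1]),
    np.array([0, 1, -1, -1]),
]

print(fraction_minus_one(example_frames))
```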

Therefore, we need a new data representation that removes all of the -1s, although we need to think carefully about memory.

I suggest that we remove all of the -1s and store a new list of arrays which gives, for every element in frame_array / blurring_frame_array, the index of the kernel it maps to. Then, instead of iterating over the length of the kernel in the frame convolver, we would iterate over the length of each frame_array / blurring_frame_array and grab the kernel index using the new array.
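A rough sketch of this idea, under some assumptions: each frame is a 1D integer array with one entry per kernel element, holding the 1D index of the image pixel that kernel element maps to (or -1), and the kernel is passed flattened to 1D. The function names below are hypothetical and the loops are simplified compared to the real frame convolver.

```python
import numpy as np

def convolve_padded(image, kernel, frames):
    """Current scheme: loop over every kernel element and skip the -1 entries."""
    result = np.zeros_like(image)
    for pixel, frame in enumerate(frames):
        for k in range(kernel.size):
            target = frame[k]
            if target == -1:
                continue
            result[target] += image[pixel] * kernel[k]
    return result

def compact_frames(frames):
    """Drop the -1 entries and record which kernel index each kept entry had."""
    compact, kernel_indices = [], []
    for frame in frames:
        keep = np.flatnonzero(frame != -1)
        compact.append(frame[keep])
        kernel_indices.append(keep)
    return compact, kernel_indices

def convolve_compact(image, kernel, compact, kernel_indices):
    """Proposed scheme: loop only over the entries that are inside the mask."""
    result = np.zeros_like(image)
    for pixel in range(len(compact)):
        targets = compact[pixel]
        k_index = kernel_indices[pixel]
        for j in range(targets.size):
            result[targets[j]] += image[pixel] * kernel[k_index[j]]
    return result

# Hypothetical toy problem: 3 masked pixels and a 3-element (flattened) kernel.
image = np.array([1.0, 2.0, 3.0])
kernel = np.array([0.25, 0.5, 0.25])
frames = [
    np.array([-1, 0, 1]),
    np.array([0, 1, 2]),
    np.array([1, 2, -1]),
]

compact, kernel_indices = compact_frames(frames)
padded_result = convolve_padded(image, kernel, frames)
compact_result = convolve_compact(image, kernel, compact, kernel_indices)
print(np.allclose(padded_result, compact_result))
```

Both functions do the same arithmetic; the compact version simply never visits a -1, so its inner loop length scales with the number of in-mask entries rather than with the kernel size.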

Currently frame_array and blurring_frame_array are lists of NumPy arrays. This new scheme would allow us to make them one giant NumPy array. Numba might be happier if we did this, but could it produce a hit on memory?
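One possible layout for the "one giant array" version is a flat array plus an offsets array, similar to a compressed sparse row format. The sketch below assumes the -1 entries have already been removed; the names `flatten_frames`, `convolve_flat` and the toy data are hypothetical.

```python
import numpy as np

def flatten_frames(compact, kernel_indices):
    """Concatenate per-pixel arrays into flat arrays plus an offsets array."""
    lengths = np.array([c.size for c in compact], dtype=np.int64)
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    flat_targets = np.concatenate(compact)
    flat_kernel = np.concatenate(kernel_indices)
    return flat_targets, flat_kernel, offsets

def convolve_flat(image, kernel, flat_targets, flat_kernel, offsets):
    """Entries for pixel i live in the slice offsets[i]:offsets[i + 1]."""
    result = np.zeros_like(image)
    for pixel in range(offsets.size - 1):
        for j in range(offsets[pixel], offsets[pixel + 1]):
            result[flat_targets[j]] += image[pixel] * kernel[flat_kernel[j]]
    return result

# Hypothetical toy data with the -1 entries already removed.
image = np.array([1.0, 2.0, 3.0])
kernel = np.array([0.25, 0.5, 0.25])
compact = [np.array([0, 1]), np.array([0, 1, 2]), np.array([1, 2])]
kernel_indices = [np.array([1, 2]), np.array([0, 1, 2]), np.array([0, 1])]

flat_targets, flat_kernel, offsets = flatten_frames(compact, kernel_indices)
flat_result = convolve_flat(image, kernel, flat_targets, flat_kernel, offsets)
print(offsets, flat_result)
```

On memory: if every array uses the same integer dtype, the padded scheme stores one entry per kernel element per pixel, while this layout stores two entries per in-mask element plus the offsets. With a -1 fraction f, that is roughly 2(1 - f) times the padded size, so it only saves memory when f is above about 0.5. Going by the fractions listed above, that would favour blurring_frame_array (f around 0.82 to 0.88) but not the HST frame_array (f around 0.12), unless the kernel indices are stored in a smaller dtype. The loops here use only integer indexing into flat arrays, which is the kind of code Numba's nopython mode generally handles well, though that has not been checked against this code base.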

This problem will also affect the pixelization matrix convolution, and can be solved in the same manner!
