Describe the bug
The `mpol.losses.log_likelihood` routine calculates the likelihood as

$$\ln \mathcal{L} = \frac{1}{2} \sum_i \frac{(d_i - m_i)^2}{\sigma_i^2}$$

when it should actually be closer to

$$\ln \mathcal{L} = -\frac{1}{2} \sum_i \left[ \frac{(d_i - m_i)^2}{\sigma_i^2} + \ln \sigma_i^2 + \ln 2\pi \right]$$
(e.g., Deep Learning: Foundations and Concepts, Eqn 2.66).
There is at least one error: the
$$\sum_i \ln \sigma_i^2$$
term is missing (I think there may also be factors of 2 that remain slightly different between the complex-valued calculations in MPoL and the text, which assumes real-valued data only), and the source code is missing the overall negative sign.
This must have been a case of me being more tired than normal, since I think I've implemented this correctly in other codebases. This also implies the `mpol.losses.log_likelihood_gridded` routine is incorrect, since it calls `mpol.losses.log_likelihood`.
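For concreteness, a minimal NumPy sketch of the corrected quantity, with the $\sum_i \ln \sigma_i^2$ term and the leading negative sign restored. The function signature and the per-visibility bookkeeping (real and imaginary parts independent, each with variance $\sigma_i^2 = 1/w_i$) are assumptions of this sketch, not MPoL's actual API:

```python
import numpy as np

def log_likelihood(model_vis, data_vis, weight):
    """Sketch of the full Gaussian log likelihood for complex visibilities.

    Assumes weight_i = 1 / sigma_i**2, with the real and imaginary parts
    of each visibility independent and sharing variance sigma_i**2 (this
    is where the 'factor of 2' bookkeeping relative to the real-valued
    textbook expression enters).
    """
    resid = data_vis - model_vis
    # chi^2 over real and imaginary parts: sum_i w_i |V_i - M_i|^2
    chi2 = np.sum(weight * np.abs(resid) ** 2)
    n = data_vis.size
    # Each complex datum contributes -ln(2 pi sigma_i^2) of normalization:
    # the sum_i ln sigma_i^2 penalty (here +ln w_i), the ln(2 pi) constant,
    # and the overall negative sign the current code drops.
    return -0.5 * chi2 - n * np.log(2 * np.pi) + np.sum(np.log(weight))
```

With a perfect model and unit weights this reduces to the pure normalization term, $-N \ln 2\pi$, which is a quick sanity check on the sign and the constant.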
Moreover, other loss functions in `mpol.losses` are incorrectly named for the quantity they actually calculate:

- `mpol.losses.nll` does not actually calculate a negative log likelihood; it calculates a 'reduced' $\chi^2$, since it does not include the penalty for the weight values
- the same holds for `mpol.losses.nll_gridded`
Suggested fix
- correct `mpol.losses.log_likelihood` to calculate the correct quantity
- rename `mpol.losses.nll` and `mpol.losses.nll_gridded` to `mpol.losses.reduced_chi_squared` and `mpol.losses.reduced_chi_squared_gridded`, respectively
- add a `mpol.losses.log_likelihood_avg` routine that is the average of `mpol.losses.log_likelihood`. This is useful for cases where the weights may be adjusted (and thus the penalty factor is needed) and we are working with batches of different data sizes.
- recommend in the documentation that `mpol.losses.reduced_chi_squared` and `mpol.losses.reduced_chi_squared_gridded` are the default loss functions for RML imaging, and that the corrected `mpol.losses.log_likelihood` is the proper loss function for inference (e.g., MCMC)
- document the changes in the changelog
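The renamed and proposed routines could look roughly like the following. The names come from the list above; the exact normalization conventions (per complex visibility vs. per real degree of freedom) are assumptions of this sketch, not MPoL's settled behavior:

```python
import numpy as np

def reduced_chi_squared(model_vis, data_vis, weight):
    """Proposed rename of mpol.losses.nll: chi^2 per real degree of freedom.

    No ln sigma_i^2 (weight) penalty, so it is suitable as an RML loss
    but not as a log likelihood. Normalization convention is assumed here.
    """
    chi2 = np.sum(weight * np.abs(data_vis - model_vis) ** 2)
    return chi2 / (2 * data_vis.size)  # 2 real dof per complex visibility

def log_likelihood_avg(model_vis, data_vis, weight):
    """Proposed routine: full log likelihood averaged per visibility.

    Retains the ln sigma_i^2 penalty, so it stays meaningful when weights
    are adjusted, and the averaging makes batches of different sizes
    comparable.
    """
    n = data_vis.size
    chi2 = np.sum(weight * np.abs(data_vis - model_vis) ** 2)
    ln_like = -0.5 * chi2 - n * np.log(2 * np.pi) + np.sum(np.log(weight))
    return ln_like / n
```

For a perfect model, `reduced_chi_squared` returns 0 regardless of dataset size, while `log_likelihood_avg` returns the same per-visibility normalization for any batch size, which is the intercomparability property motivating the addition.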
Additional context
Recommend that we stay away from the `nll` name entirely, since it appears to be inconsistently defined in the broader ML context. Sometimes it is the negative log likelihood (i.e., the negative of Eqn 2.66), but more often than not it is some averaged or normalized quantity that does not include the contribution from the weights. These factors matter when building RML workflows and can make it very tricky to intercompare results from datasets of different sizes.
Downstream updates
When fixed, @briannazawadzki @jeffjennings will need to update their calls from `mpol.losses.nll_gridded` to `mpol.losses.reduced_chi_squared_gridded`.