Skip to content

fzhao70/gamlss-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gamlss-python

PyPI Python License: GPL-3.0

R's gamlss in pure Python

A Python port of the R packages gamlss (5.5-0) and gamlss.dist (6.1-1): Generalised Additive Models for Location, Scale and Shape (Rigby & Stasinopoulos, 2005).

The port reproduces the original R implementation numerically: the RS/CG fitting algorithms, link functions, distribution derivatives, deviance computations, standard errors and predictions are transcribed from the R sources and verified against reference results generated by the original R package (see tests/). Coefficients typically agree with R to ~1e-9 relative, deviances to 1e-9, iteration counts exactly.

Pure Python

The package is implemented entirely in Python — no C extensions, no Cython, no build step, and no dependency on R at runtime. The only requirements are numpy, scipy, pandas and patsy (whose own compiled internals do the numerical heavy lifting, as with any scientific Python package). Routines that the R version implements in C (e.g. the PIG/SICHEL-family recursions in gamlss.dist/src/*.c) were reimplemented in Python/numpy rather than wrapped.

Status

This is a young port — treat it as beta. The parametric fitting core is verified against R by the automated suite described below, but beyond that only some of the core functions have been manually tested; less-travelled code paths may still contain porting bugs. If a result matters, cross-check it against the original R package, and please report discrepancies via the issue tracker.

License and attribution

This package is a Python translation of the R packages gamlss and gamlss.dist by Mikis Stasinopoulos, Robert Rigby and co-authors (licensed GPL-2 | GPL-3), and bundles datasets from gamlss.data. As a derivative work it is distributed under the GNU GPL v3 (see LICENSE). It is not affiliated with or endorsed by the original authors.

If you use GAMLSS in published work, please cite:

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Applied Statistics, 54, 507-554.

Install

pip install gamlss        # from PyPI

pip install -e .          # from the repo root
pip install -e .[test]    # with test dependencies

Dependencies: numpy, scipy, pandas, patsy.

Quick start

import gamlss as gl
from gamlss import gamlss, fitted, coef, deviance, GAIC

ab = gl.load_data("abdom")

# R: gamlss(y ~ x, sigma.formula = ~x, family = BCT, data = abdom)
m = gamlss("y ~ x", sigma_formula="~x", family=gl.BCT(), data=ab)

m.summary()                  # R: summary(m)
coef(m, "mu")                # R: coef(m, "mu")
fitted(m, "sigma")           # R: fitted(m, "sigma")
deviance(m)                  # R: deviance(m)
GAIC(m, k=2)                 # R: GAIC(m, k=2)
m.predict(what="mu", newdata=ab.head(5), type="response")

API mapping (R -> Python)

R uses . inside argument and component names; Python uses _.

R Python
gamlss(y~x, sigma.formula=~z, family=GA, data=d) gamlss("y ~ x", sigma_formula="~z", family=GA(), data=d)
gamlss.control(n.cyc=50) gamlss_control(n_cyc=50) or gamlss(..., n_cyc=50)
glim.control(cc=1e-4) glim_control(cc=1e-4) or gamlss(..., cc=1e-4)
method=RS() / CG() / mixed(2, 20) method=RS() / CG() / mixed(2, 20)
mu.start=, sigma.fix=TRUE mu_start=, sigma_fix=True
weights=w weights=w (array or column name)
m$mu.fv, m$mu.lp, m$mu.coefficients m.mu_fv, m.mu_lp, m.mu_coefficients
m$G.deviance, m$aic, m$sbc, m$df.fit m.G_deviance, m.aic, m.sbc, m.df_fit
fitted(m, "mu") fitted(m, "mu") or m.fitted("mu")
coef(m, what="sigma") coef(m, what="sigma")
coefAll(m) coefAll(m)
deviance(m, "G") deviance(m, "G")
logLik(m) logLik(m)
AIC(m1, m2, k=2) / GAIC(m, k=log(n)) / IC AIC(m1, m2, k=2) / GAIC(m, k=...) / IC
summary(m, type="vcov"/"qr") m.summary(type="vcov"/"qr") or summary(m, ...)
vcov(m, type="se") vcov(m, type="se") or m.vcov("se")
predict(m, what="mu", newdata=nd, type="response") m.predict(what="mu", newdata=nd, type="response")
predictAll(m, newdata=nd) m.predictAll(newdata=nd)
lpred(m, what="mu", type="terms", se.fit=TRUE) lpred(m, what="mu", type="terms", se_fit=True)
residuals(m) (normalised quantile residuals) residuals(m) / m.get_residuals()
residuals(m, "mu", type="weighted") residuals(m, "mu", type="weighted")
Rsq(m) Rsq(m)
dNO/pNO/qNO/rNO(..., lower.tail=, log.p=) dNO/pNO/qNO/rNO(..., lower_tail=, log_p=)

Formulas

Formulas are strings interpreted by patsy with R-compatible behaviour:

  • "y ~ x + C(g)" — factors via C() (R: y ~ x + g with a factor g); design-matrix term order follows R's model.matrix.
  • "y ~ poly(x, 3)" — exact port of R's orthogonal poly() (including prediction via the stored recurrence coefficients).
  • "cbind(y, n - y) ~ x" — binomial responses with denominators.
  • "y ~ x + offset(log(t))" — in-formula offsets.
  • ^ is translated to R semantics ((a+b)^2 crossing, power inside I()).

Families implemented

Continuous: NO, NO2, EXP, GA, IG, IGAMMA, GU, RG, LO, LOGNO, WEI, WEI3, TF, PE, GG, SN1, JSU, JSUo, SHASHo, BCCG, BCCGo, BCT, BCTo, BCPE, BCPEo, BE, BEo Discrete: PO, NBI, NBII, GEOM, PIG, LG, ZIP, ZIP2, ZINBI, ZANBI Binomial-type: BI, BB, ZABI, ZIBI

Each family ships its d/p/q/r functions (dGA, pGA, ...) matching the R parameterisations exactly.

Verification against R

r-scripts/gen_reference.R fits 43 reference models with the original R gamlss (plus dense d/p/q grids for every family) and exports the results to JSON. The pytest suite re-fits every model in Python and compares:

  • coefficients (rtol 1e-6, typically agree to 1e-9),
  • global deviance / AIC / SBC (rtol 1e-9),
  • effective df, residual df, noObs and the exact RS/CG iteration count,
  • fitted values for every distribution parameter,
  • quantile residuals (continuous families; discrete ones are randomised by construction),
  • qr-type standard errors (chol2inv of the working-WLS R matrix),
  • vcov-type standard errors (numerical Hessian, optimHess with R's ndeps=1e-3, falling back to HessianPB exactly as R does),
  • predictions for new data (including R's "safe prediction" refit semantics, factor levels and poly() terms).

Run them:

# regenerate the reference (needs R with gamlss + jsonlite):
Rscript r-scripts/gen_reference.R
# verify the Python port:
python -m pytest tests/ -q

Known, documented differences

  • R quirks are reproduced where they affect results; e.g. the ifelse(x<0, 0, fy) shape-truncation in dBI that makes ZIBI/ZABI use the first observation's bd/mu in the zero-mass correction is replicated (gamlss/dist/BI.py:_dBI0).
  • qIG/pSN1-style quantiles use root finding: R's uniroot (tol ~1.2e-4) vs scipy's brentq (1e-12) — Python values are more precise; agreement is at R's tolerance.
  • vcov() on near-singular Hessians (e.g. BCTo on abdom where tau -> inf): R produces NaN standard errors and its summary() falls back to type="qr"; knife-edge sign differences in the fallback path can occur. The deviance/coefficients still agree.
  • Randomised quantile residuals for discrete families use NumPy's RNG; they match R distributionally, not bitwise (pass rng= for reproducibility).
  • Smoothing/additive terms (pb(), cs(), ...) are not ported yet — the parametric GAMLSS core (formulas, all four parameters, RS/CG, weights, offsets, factors) is complete.

Layout

gamlss/            the package
  engine.py        gamlss(), RS/CG/mixed, glim.fit, lm.wfit, controls
  family.py        gamlss.family infrastructure
  links.py         make.link.gamlss link functions
  formula.py       R formulas on patsy (poly, cbind, offset, naming)
  model.py         the fitted-model object + all S3-method equivalents
  methods.py       functional API (fitted, coef, GAIC, ...)
  rqres.py         normalised (randomised) quantile residuals
  dist/            one module per gamlss.dist family
  data/            datasets from gamlss.data as CSV
r-scripts/         R reference generation + fetch_r_sources.sh
tests/             verification suite (R reference JSONs included)

The original R sources used for the transcription are not committed; run r-scripts/fetch_r_sources.sh to download the exact CRAN versions into r-src/.

About

Python version of R package gamlss

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors