R's gamlss in pure Python
A Python port of the R packages gamlss (5.5-0) and gamlss.dist (6.1-1): Generalised Additive Models for Location, Scale and Shape (Rigby & Stasinopoulos, 2005).
- Pure Python
- Status
- License and attribution
- Install
- Quick start
- API mapping (R -> Python)
- Families implemented
- Verification against R
- Known, documented differences
- Layout
The port reproduces the original R implementation numerically: the
RS/CG fitting algorithms, link functions, distribution derivatives,
deviance computations, standard errors and predictions are transcribed
from the R sources and verified against reference results generated by
the original R package (see tests/). Coefficients typically agree
with R to ~1e-9 relative, deviances to 1e-9, iteration counts exactly.
The package is implemented entirely in Python — no C extensions, no
Cython, no build step, and no dependency on R at runtime. The only
requirements are numpy, scipy, pandas and patsy (whose own compiled
internals do the numerical heavy lifting, as with any scientific
Python package). Routines that the R version implements in C (e.g.
the PIG/SICHEL-family recursions in gamlss.dist/src/*.c) were
reimplemented in Python/numpy rather than wrapped.
This is a young port — treat it as beta. The parametric fitting core is verified against R by the automated suite described below, but beyond that only some of the core functions have been manually tested; less-travelled code paths may still contain porting bugs. If a result matters, cross-check it against the original R package, and please report discrepancies via the issue tracker.
This package is a Python translation of the R packages
gamlss and
gamlss.dist by Mikis
Stasinopoulos, Robert Rigby and co-authors (licensed GPL-2 | GPL-3),
and bundles datasets from
gamlss.data. As a
derivative work it is distributed under the GNU GPL v3 (see
LICENSE). It is not affiliated with or endorsed by the original
authors.
If you use GAMLSS in published work, please cite:
Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Applied Statistics, 54, 507-554.
pip install gamlss # from PyPI
pip install -e . # from the repo root
pip install -e .[test] # with test dependenciesDependencies: numpy, scipy, pandas, patsy.
import gamlss as gl
from gamlss import gamlss, fitted, coef, deviance, GAIC
ab = gl.load_data("abdom")
# R: gamlss(y ~ x, sigma.formula = ~x, family = BCT, data = abdom)
m = gamlss("y ~ x", sigma_formula="~x", family=gl.BCT(), data=ab)
m.summary() # R: summary(m)
coef(m, "mu") # R: coef(m, "mu")
fitted(m, "sigma") # R: fitted(m, "sigma")
deviance(m) # R: deviance(m)
GAIC(m, k=2) # R: GAIC(m, k=2)
m.predict(what="mu", newdata=ab.head(5), type="response")R uses . inside argument and component names; Python uses _.
| R | Python |
|---|---|
gamlss(y~x, sigma.formula=~z, family=GA, data=d) |
gamlss("y ~ x", sigma_formula="~z", family=GA(), data=d) |
gamlss.control(n.cyc=50) |
gamlss_control(n_cyc=50) or gamlss(..., n_cyc=50) |
glim.control(cc=1e-4) |
glim_control(cc=1e-4) or gamlss(..., cc=1e-4) |
method=RS() / CG() / mixed(2, 20) |
method=RS() / CG() / mixed(2, 20) |
mu.start=, sigma.fix=TRUE |
mu_start=, sigma_fix=True |
weights=w |
weights=w (array or column name) |
m$mu.fv, m$mu.lp, m$mu.coefficients |
m.mu_fv, m.mu_lp, m.mu_coefficients |
m$G.deviance, m$aic, m$sbc, m$df.fit |
m.G_deviance, m.aic, m.sbc, m.df_fit |
fitted(m, "mu") |
fitted(m, "mu") or m.fitted("mu") |
coef(m, what="sigma") |
coef(m, what="sigma") |
coefAll(m) |
coefAll(m) |
deviance(m, "G") |
deviance(m, "G") |
logLik(m) |
logLik(m) |
AIC(m1, m2, k=2) / GAIC(m, k=log(n)) / IC |
AIC(m1, m2, k=2) / GAIC(m, k=...) / IC |
summary(m, type="vcov"/"qr") |
m.summary(type="vcov"/"qr") or summary(m, ...) |
vcov(m, type="se") |
vcov(m, type="se") or m.vcov("se") |
predict(m, what="mu", newdata=nd, type="response") |
m.predict(what="mu", newdata=nd, type="response") |
predictAll(m, newdata=nd) |
m.predictAll(newdata=nd) |
lpred(m, what="mu", type="terms", se.fit=TRUE) |
lpred(m, what="mu", type="terms", se_fit=True) |
residuals(m) (normalised quantile residuals) |
residuals(m) / m.get_residuals() |
residuals(m, "mu", type="weighted") |
residuals(m, "mu", type="weighted") |
Rsq(m) |
Rsq(m) |
dNO/pNO/qNO/rNO(..., lower.tail=, log.p=) |
dNO/pNO/qNO/rNO(..., lower_tail=, log_p=) |
Formulas are strings interpreted by patsy with R-compatible behaviour:
"y ~ x + C(g)"— factors viaC()(R:y ~ x + gwith a factorg); design-matrix term order follows R'smodel.matrix."y ~ poly(x, 3)"— exact port of R's orthogonalpoly()(including prediction via the stored recurrence coefficients)."cbind(y, n - y) ~ x"— binomial responses with denominators."y ~ x + offset(log(t))"— in-formula offsets.^is translated to R semantics ((a+b)^2crossing, power insideI()).
Continuous: NO, NO2, EXP, GA, IG, IGAMMA, GU, RG, LO, LOGNO, WEI, WEI3, TF, PE, GG, SN1, JSU, JSUo, SHASHo, BCCG, BCCGo, BCT, BCTo, BCPE, BCPEo, BE, BEo
Discrete: PO, NBI, NBII, GEOM, PIG, LG, ZIP, ZIP2, ZINBI, ZANBI
Binomial-type: BI, BB, ZABI, ZIBI
Each family ships its d/p/q/r functions (dGA, pGA, ...)
matching the R parameterisations exactly.
r-scripts/gen_reference.R fits 43 reference models with the original
R gamlss (plus dense d/p/q grids for every family) and exports the
results to JSON. The pytest suite re-fits every model in Python and
compares:
- coefficients (rtol 1e-6, typically agree to 1e-9),
- global deviance / AIC / SBC (rtol 1e-9),
- effective df, residual df,
noObsand the exact RS/CG iteration count, - fitted values for every distribution parameter,
- quantile residuals (continuous families; discrete ones are randomised by construction),
- qr-type standard errors (
chol2invof the working-WLS R matrix), - vcov-type standard errors (numerical Hessian,
optimHesswith R'sndeps=1e-3, falling back toHessianPBexactly as R does), - predictions for new data (including R's "safe prediction" refit
semantics, factor levels and
poly()terms).
Run them:
# regenerate the reference (needs R with gamlss + jsonlite):
Rscript r-scripts/gen_reference.R
# verify the Python port:
python -m pytest tests/ -q- R quirks are reproduced where they affect results; e.g. the
ifelse(x<0, 0, fy)shape-truncation indBIthat makes ZIBI/ZABI use the first observation'sbd/muin the zero-mass correction is replicated (gamlss/dist/BI.py:_dBI0). qIG/pSN1-style quantiles use root finding: R'suniroot(tol ~1.2e-4) vs scipy'sbrentq(1e-12) — Python values are more precise; agreement is at R's tolerance.vcov()on near-singular Hessians (e.g. BCTo on abdom where tau -> inf): R produces NaN standard errors and itssummary()falls back totype="qr"; knife-edge sign differences in the fallback path can occur. The deviance/coefficients still agree.- Randomised quantile residuals for discrete families use NumPy's RNG;
they match R distributionally, not bitwise (pass
rng=for reproducibility). - Smoothing/additive terms (
pb(),cs(), ...) are not ported yet — the parametric GAMLSS core (formulas, all four parameters, RS/CG, weights, offsets, factors) is complete.
gamlss/ the package
engine.py gamlss(), RS/CG/mixed, glim.fit, lm.wfit, controls
family.py gamlss.family infrastructure
links.py make.link.gamlss link functions
formula.py R formulas on patsy (poly, cbind, offset, naming)
model.py the fitted-model object + all S3-method equivalents
methods.py functional API (fitted, coef, GAIC, ...)
rqres.py normalised (randomised) quantile residuals
dist/ one module per gamlss.dist family
data/ datasets from gamlss.data as CSV
r-scripts/ R reference generation + fetch_r_sources.sh
tests/ verification suite (R reference JSONs included)
The original R sources used for the transcription are not committed;
run r-scripts/fetch_r_sources.sh to download the exact CRAN versions
into r-src/.