Last updated: 2026-04-14
This is the single source of truth for the development roadmap of the TemporalHazard R package — a pure-R implementation of the multiphase parametric hazard model of Blackstone, Naftel, and Turner (1986).
For detailed feature-level parity analysis against the original C/SAS HAZARD
system, see SAS-PARITY-GAP-ANALYSIS.md in this directory.
Ported the core modeling engine from compiled C code and SAS macro wrappers to a pure-R package with modern idioms (S3 classes, formula interface, roxygen2, testthat).
| Milestone | Scope | Status |
|---|---|---|
| M0 | Package scaffold, CI, pkgdown, math primitives | Complete |
| M1 | Public constructor/predict/print API, argument mapping, migration vignette | Complete |
| M2 | Weibull fitting, analytical gradient, optimizer, golden fixtures, parity harness | Complete |
| M3 | Exponential, log-logistic, log-normal; mixed censoring; time-varying covariates; parity helpers | Complete |
| M4 | `summary.hazard()`, formula interface (`Surv() ~ predictors`), vignettes, release docs, CI | Complete |
- 5 distributions: Weibull, exponential, log-logistic, log-normal, multiphase
- Right, left, and interval censoring with correct likelihood contributions
- Piecewise time-varying coefficients via `time_windows`
- Formula interface: `hazard(Surv(time, status) ~ age + nyha, data = df, ...)`
- S3 methods: `coef()`, `vcov()`, `summary()`, `predict()`, `print()`
- `hzr_argument_mapping()` for SAS/C parameter name translation
- Golden fixture regression tests and C binary parity infrastructure
- 7 Quarto vignettes covering getting started through mathematical foundations
Detail: MIGRATION-STATUS.md
Implemented the N-phase additive hazard decomposition that is the distinguishing feature of the Blackstone/Naftel/Turner framework.
Additive cumulative hazard across J phases:
H(t|x) = Sigma_j mu_j(x) * Phi_j(t; t_half_j, nu_j, m_j)
where mu_j(x) = exp(alpha_j + x * beta_j) and Phi_j is the temporal shape function for phase j.
- Generalized temporal decomposition family (`hzr_decompos()`) — all 6 mathematical cases based on the signs of nu and m
- Phase types: G1/CDF (early), G2/constant, G3 (late, unbounded)
- `hzr_phase()` constructor with `fixed = "shapes"` matching SAS `FIXparm`
- Phase-specific covariate formulas
- Multi-start optimizer with Nelder-Mead warm-up for fixed-shape problems
- `predict(..., decompose = TRUE)` for per-phase cumulative hazard
- Phase-grouped coefficient table in `summary.hazard()`
- 540+ tests, C binary parity validation on the CABGKUL dataset (log-likelihood -3740.52, matching the C reference with fixed shapes)
| File | Description |
|---|---|
| `R/decomposition.R` | `hzr_decompos()`, `hzr_decompos_g3()`, phase cumhaz/hazard |
| `R/phase-spec.R` | `hzr_phase()` constructor and helpers |
| `R/likelihood-multiphase.R` | Likelihood, gradient, multi-start optimizer |
| `R/hazard_api.R` | Multiphase dispatch, predict decompose, summary |
Detail: MULTIPHASE_PLAN.md
Preparing the package for initial CRAN submission.
- `R CMD check --as-cran`: 0 ERROR, 0 WARNING, 1 NOTE (new submission)
- lintr clean with CI enforcement (`.github/workflows/lint.yaml`)
- NEWS.md, inst/CITATION, cran-comments.md
- Fixed `fit` default docs/behavior mismatch
- Removed non-deterministic `set.seed(NULL)` in multi-start optimizer
- Normalized vignette metadata (YAML `vignette:` key across all 7 files)
- Fixed CITATION and DESCRIPTION URL redirects (trailing slash)

Remaining pre-submission steps:

- Push local commits and verify all CI workflows pass green
- `devtools::spell_check()` — fix any typos
- Regenerate README figures: `source("inst/dev/generate-readme-figures.R")`
- Submit via `devtools::submit_cran()` or web form
- Monitor and respond to CRAN feedback
Close the remaining feature gaps between the C/SAS HAZARD system and the
R package. See SAS-PARITY-GAP-ANALYSIS.md for detailed implementation
notes on each item.
Turner's theorem integrated into .hzr_optim_multiphase(). One phase's
log_mu solved analytically at each iteration via .hzr_conserve_events().
Controlled via control$conserve (default TRUE). Currently supports
right-censored + exact-event data; interval/left-censored extension is a
follow-up item. Key functions: .hzr_conserve_events(),
.hzr_select_fixmu_phase(), .hzr_log_mu_positions().
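In miniature, the closed-form step looks like this (a hypothetical single-phase sketch; the real `.hzr_conserve_events()` handles phase selection, fixed shapes, and the multi-phase bookkeeping):

```r
# Hypothetical single-phase illustration of Conservation of Events: choose
# log(mu) so that the model's expected event count equals the (weighted)
# observed count. status is 0/1; Phi is the per-subject temporal integral
# Phi_j(t_i) evaluated at the current shape parameters.
conserve_log_mu <- function(status, Phi, weights = rep(1, length(status))) {
  observed <- sum(weights * status)   # weighted number of events D
  expected <- sum(weights * Phi)      # expected events per unit mu
  log(observed / expected)            # log(mu) such that D = mu * sum(w * Phi)
}
```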
Priority: High. Effort: Large. Impact: Critical for exploratory clinical research workflows.
The C/SAS system has forward, backward, and two-way stepwise selection with configurable SLENTRY/SLSTAY thresholds, MOVE limits, forced entry ordering, and FAST approximate Wald screening. Selection operates within the hazard model, accounting for multiphase structure.
Tasks:
- Wald test infrastructure for individual covariate effects within multiphase framework
- Forward step: fit with each candidate added, select smallest p-value below SLENTRY
- Backward step: test each included variable, remove largest p-value above SLSTAY
- Stepping loop with MOVE limits and convergence detection
- Phase-specific variable selection (variables enter different phases independently)
- FAST screening mode using approximate Wald updates (optimization)
- User-facing API: `hzr_stepwise()` or a `step.hazard()` method
- Selection trace printing/reporting
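A minimal sketch of the forward step described above, built from the exported `coef()`/`vcov()` methods; `wald_p()` and `forward_step()` are hypothetical names, not the planned API, and `formula(fit)` is assumed to work on a hazard fit:

```r
# Hypothetical forward-selection step (hzr_stepwise() is not yet implemented).
# Wald p-value from a fitted coefficient and its standard error:
wald_p <- function(fit, term) {
  z <- coef(fit)[term] / sqrt(diag(vcov(fit))[term])
  2 * pnorm(-abs(z))
}

forward_step <- function(fit, candidates, data, slentry = 0.05) {
  pvals <- vapply(candidates, function(v) {
    f <- hazard(update(formula(fit), as.formula(paste(". ~ . +", v))),
                data = data)
    wald_p(f, v)                       # p-value for the newly added term
  }, numeric(1))
  # Enter the best candidate only if it clears the SLENTRY threshold:
  if (min(pvals) < slentry) names(which.min(pvals)) else NA_character_
}
```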
The `weights` argument on `hazard()` applies Fisher weighting to the
log-likelihood across all five distributions (Weibull, exponential,
log-logistic, log-normal, multiphase). Weights are threaded through every
distribution's likelihood and analytic gradient, through the multiphase
Conservation-of-Events adjustment (`.hzr_conserve_events()` weights
per-phase cumhaz sums so Turner's adjustment is on the same scale as
the weighted event count), and through the numerical-gradient fallbacks for
mixed censoring. Integer weights reproduce the row-duplicated fit to
optimizer tolerance for every distribution.
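The duplication-parity property can be spot-checked like this (a sketch; `df`, its weight column `w`, and the formula are placeholders, and `weights` is assumed to be evaluated in `data` as in other R modeling functions):

```r
# Integer weights should reproduce the row-duplicated fit.
fit_w   <- hazard(Surv(time, status) ~ age, data = df, weights = w)
df_dup  <- df[rep(seq_len(nrow(df)), df$w), ]          # duplicate each row w times
fit_dup <- hazard(Surv(time, status) ~ age, data = df_dup)
all.equal(coef(fit_w), coef(fit_dup), tolerance = 1e-6)
```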
Surv(start, stop, event) is parsed (start -> time_lower, stop -> time,
event -> status) and scored as H(stop) - H(start) per epoch in the
Weibull and multiphase likelihoods. The trivial Surv(0, t, d) case
degenerates to H(stop) and recovers the plain-Surv fit exactly. Full
wire-up is tracked under 4f below.
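An illustrative epoch layout (toy data to show the encoding, not a fittable example):

```r
# A covariate that switches on at t = 3 for subject 1 becomes two epochs;
# each row is scored as H(stop) - H(start), and only the terminal epoch
# carries the event flag.
df <- data.frame(
  id     = c(1, 1, 2),
  start  = c(0, 3, 0),
  stop   = c(3, 7, 5),
  status = c(0, 1, 0),
  on_tx  = c(0, 1, 0)   # time-varying treatment indicator
)
# Layout illustration only; a real fit needs many more rows:
fit <- hazard(Surv(start, stop, status) ~ on_tx, data = df, dist = "weibull")
```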
Every LL term in R/likelihood-exponential.R,
R/likelihood-loglogistic.R, and R/likelihood-lognormal.R now
multiplies by weights[idx_*]; analytic gradients use weighted
building blocks (w * status, w * cumhaz, w * (1+delta) * pw
for log-logistic, w * (1-delta) * mills for log-normal). Numerical
gradient fallbacks forward weights through. The
dist %in% c("weibull", "multiphase") guard in hazard() has been
removed.
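As a concrete instance, a Fisher-weighted Weibull log-likelihood for exact events and right censoring reduces to the following sketch (the package's internal `.hzr_logl_*` functions are more general):

```r
# Weighted Weibull log-likelihood: events contribute w * log h(t), and every
# row contributes -w * H(t). H(t) = (rate * t)^shape, h(t) = dH/dt.
logl_weibull_weighted <- function(time, status, w, shape, rate) {
  H <- (rate * time)^shape
  h <- shape * rate * (rate * time)^(shape - 1)
  sum(w * (status * log(h) - H))
}
```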
.hzr_conserve_events() and .hzr_select_fixmu_phase() take an
optional weights argument applied to per-phase cumhaz sums; the
all(weights == 1) guard on CoE in R/likelihood-multiphase.R is
gone. Duplication-parity tests for each distribution are in
test-weights.R; weighted-CoE parity (CoE on and off) lives in
test-conservation-of-events.R.
.hzr_logl_weibull() and .hzr_logl_multiphase() apply
H(stop) - H(start) to event and right-censored terms via a
cumhaz_start term that is zero when no time_lower is supplied
(plain-Surv path). Analytic gradients for both distributions add the
matching -d H(start)/d theta contributions (Weibull closed-form;
multiphase per-phase Phi_j(start) and shape derivatives, reusing
the existing G3 finite-difference machinery). The hazard() guard on
start > 0 is removed; the three split-invariance tests in
test-repeating-events.R are live.
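A sketch of the invariance those tests assert, assuming a data frame `df` with start/stop/status/age columns:

```r
# Splitting every epoch at its midpoint must not change the fit, because
# H(stop) - H(start) telescopes across the split.
split_mid <- function(df) {
  mid <- (df$start + df$stop) / 2
  rbind(
    transform(df, stop = mid, status = 0),  # first half never carries the event
    transform(df, start = mid)              # second half keeps the event flag
  )
}
f1 <- hazard(Surv(start, stop, status) ~ age, data = df, dist = "weibull")
f2 <- hazard(Surv(start, stop, status) ~ age, data = split_mid(df), dist = "weibull")
all.equal(coef(f1), coef(f2))
```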
Remaining follow-ups (low priority): vignette section on epoch-decomposed longitudinal data, and a SAS-parity fixture for a counting-process reference fit.
predict.hazard() gains se.fit and level arguments (defaults se.fit = FALSE, level = 0.95). When
se.fit = TRUE, the return value is a data frame with columns
fit, se.fit, lower, upper. Per-row SEs come from the delta
method: se_i = sqrt( (J V J^T)_{ii} ) where J[i, ] = d eta_i / d theta
and V = vcov(theta_hat).
- Jacobians. Closed-form for Weibull (`.hzr_predict_jacobian_weibull()`) and multiphase (`.hzr_predict_jacobian_multiphase()`, reusing `.hzr_phase_derivatives()` / `.hzr_g3_phase_derivatives()`); `numDeriv::jacobian()` fallback for exponential / log-logistic / log-normal via a per-call cumhaz closure.
- Scale. `hazard` and `cumulative_hazard` use log-scale CLs; `survival` uses `log(-log(S)) = log H` CLs so 0 <= lower <= upper <= 1; `linear_predictor` is symmetric on the natural scale. Matches SAS `hzp_calc_haz_CL.c` / `hzp_calc_srv_CL.c`.
- Fixed parameters. `fixed = "shapes"` and CoE leave NA rows/cols in the expanded vcov. The sandwich restricts to the free-parameter submatrix and zeroes out the Jacobian columns for fixed parameters, so CLs reflect only free-parameter uncertainty.
- Backward compat. `se.fit = FALSE` preserves the pre-0.9.8 scalar-vector return shape. `decompose = TRUE` with `se.fit = TRUE` is rejected with a clean error.
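Typical usage, assuming a fitted model `fit` and new data `nd`:

```r
pr <- predict(fit, newdata = nd, type = "survival", se.fit = TRUE, level = 0.95)
head(pr)  # data frame with columns fit, se.fit, lower, upper
# Under the hood this is the delta method; diag(J V J^T) never requires the
# full matrix product:
# se <- sqrt(rowSums((J %*% V) * J))
```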
With 4a, 4b, 4c, 4d, 4e, 4f, and 4g complete, the remaining SAS-parity
gaps are: density / quantile prediction types, OUTEST / OUTVCOV
dataset export, anova.hazard() S3 method, NOCOV / NOCOR summary
flags, and the stepwise enhancements (FAST screening, multi-step MOVE
trace) tracked separately.
The SAS HAZARD system ships utility macros for non-parametric estimation,
goodness-of-fit, variable calibration, bootstrap inference, and competing
risks. These cover the full analytical workflow around the core fitting
engine. R equivalents should be exported functions in the hzr_ namespace.
| SAS Macro | Purpose | R Equivalent | Status |
|---|---|---|---|
| `kaplan.sas` | KM survival with exact logit CL | `hzr_kaplan()` | Done |
| `hazplot.sas` | Observed vs predicted overlay + CoE GOF | `hzr_gof()` | Done |
| `deciles.hazard.sas` | Calibration by predicted survival deciles | `hzr_deciles()` | Done |
| `chisqgf.sas` | Chi-square GOF test (obs vs expected) | (part of `hzr_deciles()`) | Done |
| `nelsonl.sas` | Nelson estimator for weighted/repeated events | `hzr_nelson()` | Done |
| `nelsont.sas` | Nelson estimator for terminating events | `hzr_nelson()` (unified) | Done |
| `bootstrap.hazard.sas` | Bootstrap bagging with variable selection | `hzr_bootstrap()` | Done |
| `bootstrap.summary.sas` | Summarize bootstrap coefficient distributions | (part of `hzr_bootstrap()`) | Done |
| `logit.sas` | Calibrate continuous variables (decile logit) | `hzr_calibrate()` | Done |
| `logitgr.sas` | Calibrate by natural grouping variable | `hzr_calibrate()` (with `by =`) | Done |
| `markov.sas` | Competing risks cumulative incidence (Greenwood) | `hzr_competing_risks()` | Done |
| `plot.sas` | Publication-quality SAS plots | Not needed (ggplot2) | — |
SAS reference: kaplan.sas
R's survival::survfit() provides KM estimates, but the SAS macro adds
logit-transformed exact confidence limits (more accurate than Greenwood
in the tails), hazard rate estimation at each interval, life integral
computation, and stratified log-rank testing. A thin wrapper around
survfit() that adds these extras and returns a tidy data frame would
match the SAS output structure used by hazplot.sas and deciles.hazard.sas.
Effort: Small — mostly formatting survfit() output with added CL
transform.
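A sketch of the logit-scale CL transform such a wrapper would apply (my reading of the macro's approach; not verified against kaplan.sas):

```r
# Back-transformed logit confidence limits for a KM estimate S with
# Greenwood standard error se (delta method on the logit scale).
logit_cl <- function(S, se, level = 0.95) {
  z    <- qnorm(1 - (1 - level) / 2)
  g    <- qlogis(S)            # log(S / (1 - S))
  se_g <- se / (S * (1 - S))   # d logit(S) / dS = 1 / (S (1 - S))
  cbind(lower = plogis(g - z * se_g), upper = plogis(g + z * se_g))
}
```

Because the limits are formed on an unbounded scale and back-transformed, they stay inside (0, 1) even near the tails.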
SAS reference: hazplot.sas
The central validation tool in the HAZARD workflow. Given a fitted hazard
object, compute at each observed event time:
- Parametric predicted survival, cumulative hazard, and hazard rate
- Nonparametric (KM or Nelson) estimates at the same times
- Cumulative observed vs expected event counts (conservation of events)
- Chi-square GOF p-value for the obs/exp comparison
- Per-phase hazard component percentages
Returns a data frame suitable for plotting parametric vs nonparametric
overlays with ggplot2. Optionally supports stratified GOF by a grouping
variable.
Effort: Medium — requires combining predict.hazard() output with
hzr_kaplan() output and implementing the CoE event-counting algorithm.
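Hypothetical downstream use once implemented (the returned column names here are assumptions):

```r
library(ggplot2)
g <- hzr_gof(fit)  # assumed columns: time, parametric, nonparametric
ggplot(g, aes(x = time)) +
  geom_step(aes(y = nonparametric), colour = "grey40") +
  geom_line(aes(y = parametric), colour = "firebrick") +
  labs(y = "Survival", title = "Parametric vs nonparametric overlay")
```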
SAS reference: deciles.hazard.sas
Given a fitted model and a time point, rank patients into deciles by predicted survival, then compare actual vs expected event counts per decile with chi-square GOF. The inference vignette already demonstrates this workflow manually — wrapping it as a function standardizes the output and enables programmatic calibration checks.
Effort: Small — the logic is already demonstrated in the vignette; needs packaging as an exported function.
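The underlying logic, roughly (a sketch; the exact predict() arguments for survival at a landmark time are assumptions):

```r
# Rank patients into deciles of predicted survival at a landmark time, then
# compare observed vs expected events per decile with a chi-square GOF.
p     <- predict(fit, newdata = df, type = "survival")  # S(t*) per patient
dec   <- cut(p, quantile(p, 0:10 / 10), include.lowest = TRUE, labels = FALSE)
obs_d <- tapply(df$status, dec, sum)   # observed events per decile
exp_d <- tapply(1 - p, dec, sum)       # expected events per decile
chisq <- sum((obs_d - exp_d)^2 / exp_d)
pchisq(chisq, df = length(obs_d) - 1, lower.tail = FALSE)
```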
SAS reference: nelsonl.sas, nelsont.sas
Wayne Nelson's estimator for cumulative hazard with lognormal confidence
limits. Handles both terminating (single) events and weighted/repeated
events. Not available in base survival — survfit() uses the
Breslow estimator which differs in variance estimation. Important for
validating fits on weighted or repeating-event data.
Effort: Medium — the Nelson algorithm is straightforward but the lognormal CI transform and weighted-event handling need careful implementation.
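A bare-bones sketch of the terminating-event estimator, assuming distinct event times; the lognormal CI transform and weighted-event handling are the real work:

```r
# Nelson cumulative-hazard estimator: at each event time, add d / n_at_risk.
nelson <- function(time, status) {
  ord <- order(time)
  t_s <- time[ord]; d <- status[ord]
  at_risk <- length(t_s):1              # subjects still at risk at each time
  data.frame(time   = t_s[d == 1],
             cumhaz = cumsum(d / at_risk)[d == 1])
}
```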
SAS reference: bootstrap.hazard.sas, bootstrap.summary.sas
Resample data with replacement, refit the hazard model on each replicate, and accumulate coefficient distributions. Supports fractional sampling (bagging), time-varying covariate data (resample by patient ID), and optional stepwise selection per replicate. Returns a tidy data frame of per-replicate coefficients with summary statistics (frequency of selection, mean, SD, CI).
Effort: Medium — the resampling loop is simple, but handling TVC data, failed fits, and optional stepwise selection adds complexity. The inference vignette already shows manual bootstrap; this wraps it as a function.
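The core loop, roughly (a sketch; patient-ID resampling for TVC data and per-replicate stepwise selection are omitted, and `fit` is an assumed full-data fit used only for its coefficient layout):

```r
# Resample rows, refit, accumulate coefficients; failed fits become NA rows.
boot_coefs <- replicate(200, {
  idx <- sample(nrow(df), replace = TRUE)
  f <- try(hazard(Surv(time, status) ~ age, data = df[idx, ]), silent = TRUE)
  if (inherits(f, "try-error")) rep(NA_real_, length(coef(fit))) else coef(f)
})
apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE)  # percentile CIs
```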
SAS reference: logit.sas, logitgr.sas
Group a continuous covariate into quantile bins, compute event probability per bin, and apply logit (or Gompertz/Cox) transform. Used for assessing functional form before model entry — does the covariate-outcome relationship look linear on the logit scale, or does it need a transform? Supports stratification by a grouping variable.
Effort: Small — simple rank + aggregate + transform pipeline.
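The pipeline in miniature (sketch; bins with zero or all events would need guarding before the logit):

```r
# Decile-logit check of functional form for a continuous covariate `age`:
bin <- cut(df$age, quantile(df$age, 0:10 / 10), include.lowest = TRUE)
p   <- tapply(df$status, bin, mean)   # crude event probability per bin
x   <- tapply(df$age, bin, mean)      # mean covariate value per bin
plot(x, qlogis(p), type = "b")        # roughly linear => enter age untransformed
```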
SAS reference: markov.sas
Compute cumulative incidence of competing events using Greenwood's variance formula. Unlike 1-KM (which overestimates incidence when competing risks exist), this provides correct marginal incidence for each event type with standard errors. The valves dataset (death, PVE, reoperation) is a natural test case.
Effort: Medium — the Greenwood variance matrix computation is
moderately complex. Could also wrap cmprsk::cuminc() if a dependency
is acceptable.
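If the dependency route is taken, the wrapper could be thin (sketch; the `valves` column names are assumptions):

```r
# One cumulative-incidence curve per competing event type; 0 = censored.
library(cmprsk)
ci <- cuminc(ftime = valves$time, fstatus = valves$event, cencode = 0)
plot(ci)
```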
The SAS HAZARD documentation walks users through a disciplined analytical sequence. The R vignettes cover the same pieces but have gaps relative to what a SAS HAZARD veteran would expect.
Priority: High
A new vignette that mirrors the SAS ac.* → hz.* → hm.* → hp.* →
hs.* workflow on one dataset (AVC):
- Nonparametric baseline (KM life table)
- Shape fitting: start simple (Weibull), build to multiphase, show how to fix overparameterized shapes
- Variable selection: manual screening with `hzr_calibrate()` and univariable logistic regression, then multivariable model building
- Prediction: patient-specific risk profiles, covariate sensitivity
- Validation: `hzr_gof()` overlay, `hzr_deciles()` calibration, conservation-of-events check
This is the single most impactful documentation addition — it teaches the analytical discipline, not just the API.
Priority: Medium
A section (in getting-started or a standalone vignette) making the case for multiphase parametric modeling vs Cox regression:
| Feature | Cox (semi-parametric) | TemporalHazard (parametric) |
|---|---|---|
| Proportional hazards | Required | Not required |
| Baseline hazard | Unspecified | Fully parametric |
| Multiple hazard phases | Not supported | Supported |
| Patient-specific prediction | Approximate | Exact |
| Event conservation check | Not available | Supported |
| Interval censoring | Not standard | Supported |
The SAS introduction makes this comparison prominently. R users coming
from coxph() need to understand why they'd switch.
Priority: Medium
A section in the fitting vignette or math foundations covering:
- How to choose starting values from the KM cumulative hazard plot
- When to fix shape parameters (`fixed = "shapes"`) vs estimate freely
- Multi-start strategy and `control$n_starts`
- Signs of overparameterization (boundary estimates, huge SEs, NaN)
- Optimization method differences (the SAS docs walk through Newton vs Quasi-Newton vs Steepest Descent)
Priority: Low
Worked examples demonstrating interval-censored and left-truncated data
with Surv(time1, time2, status) syntax. The math foundations vignette
covers the theory but there are no hands-on examples.
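For reference, the row coding such examples would use follows survival::Surv conventions (a sketch; whether hazard() accepts `type = "interval"` exactly like this is an assumption, and three rows are only enough to show the coding, not to fit):

```r
# status codes for Surv(time1, time2, status, type = "interval"):
# 0 = right-censored, 1 = exact event, 2 = left-censored, 3 = interval-censored
df <- data.frame(t1 = c(4, 2, 6), t2 = c(7, NA, NA), status = c(3, 1, 0))
fit <- hazard(Surv(t1, t2, status, type = "interval") ~ 1, data = df)
```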
Items that would improve the package beyond SAS parity.
- Rcpp acceleration for likelihood hot paths (decomposition, cumulative hazard evaluation) — currently pure R
- Analytical gradients for multiphase — currently uses numerical gradients via `numDeriv`; analytical score vectors would improve optimizer precision and speed
- `confint.hazard()` — profile likelihood or Wald-based confidence intervals as an S3 method
- Residuals — deviance, martingale, or Schoenfeld residuals for model diagnostics
- Frailty / random effects — shared frailty models for clustered data (distinct from the epoch-decomposition approach to repeating events)
- Parallel bootstrap — leverage `future`/`furrr` for bootstrap CI computation
| File | Purpose |
|---|---|
| `inst/dev/MIGRATION-STATUS.md` | Phase 1 milestone detail and reconciliation |
| `inst/dev/MULTIPHASE_PLAN.md` | Phase 2 architecture, math, API design, build order |
| `inst/dev/SAS-PARITY-GAP-ANALYSIS.md` | Phase 4 feature-by-feature gap analysis with C/SAS code references |
| `HANDOFF.md` | Session-to-session context transfer (top-level, .Rbuildignored) |
| `NEWS.md` | User-facing changelog |
| `cran-comments.md` | CRAN submission notes |
| Repo | Path | Purpose |
|---|---|---|
| hazard | `~/Documents/GitHub/hazard/` | Original C/SAS source code |
| mixhazard | `~/Documents/GitHub/mixhazard/` | Generalized `decompos()` R source |
| hvtiPlotR | `~/Documents/GitHub/hvtiPlotR/` | Survival plotting companion package |