diff --git a/NEWS.md b/NEWS.md
index baf486604..a93f4f838 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -11,6 +11,7 @@ If you read this from a place other than $dis` consists of only `NA`s will not concer
 In order to decide for a submodel size, we first inspect the `plot()` results:
 ```{r plot_vsel_lat}
-( gg_lat <- plot(vs_lat, stats = "mlpd", deltas = TRUE) )
+( gg_lat <- plot(vs_lat, stats = "mlpd", deltas = "mixed") )
 ```
 Although the submodels' MLPDs seem to be very close to the reference model's MLPD from a submodel size of 6 on, a zoomed plot reveals that there is still some discrepancy at sizes 6 to 11 and that size 12 would be a better choice (further down below in the `summary()` output, we will also see that on absolute scale, the discrepancy at sizes 6 to 11 is not negligible):
@@ -269,7 +269,7 @@ rm(warn_instable_orig)
 ```
 ```{r post_vs_trad}
 print(time_trad)
-( gg_trad <- plot(vs_trad, stats = "mlpd", deltas = TRUE) )
+( gg_trad <- plot(vs_trad, stats = "mlpd", deltas = "mixed") )
 smmry_trad <- summary(vs_trad, stats = "mlpd", type = c("mean", "lower", "upper", "diff"))
 print(smmry_trad, digits = 2)
@@ -365,7 +365,7 @@ The message concerning `latent_ilink` can be safely ignored here (the internal d
 Again, we first inspect the `plot()` results to decide for a submodel size:
 ```{r plot_vsel_nebin}
-( gg_nebin <- plot(vs_nebin, stats = "mlpd", deltas = TRUE) )
+( gg_nebin <- plot(vs_nebin, stats = "mlpd", deltas = "mixed") )
 ```
 Again, a zoomed plot is more helpful:
diff --git a/vignettes/projpred.Rmd b/vignettes/projpred.Rmd
index 73dced63a..e46310ec4 100755
--- a/vignettes/projpred.Rmd
+++ b/vignettes/projpred.Rmd
@@ -228,10 +228,14 @@ foreach::registerDoSEQ()
 We can now select a final submodel size by looking at a predictive performance plot similar to the one created for the preliminary `cv_varsel()` run above.
-By default, the performance statistics are plotted on their original scale, but with `deltas = TRUE`, they are plotted as differences^[For the geometric mean predictive density (GMPD, see argument `stats` of `summary.vsel()`), `deltas = TRUE` means to calculate the GMPD *ratio* (not difference) vs. the baseline model.] from a baseline model (which is the reference model by default, at least in the most common cases).
-Since the differences^[For the GMPD, this is again a ratio.] and the (frequentist) uncertainty in their estimation are usually of more interest than the original-scale performance statistics (at least with regard to the decision for a final submodel size), we directly plot with `deltas = TRUE` here:
+By default, the performance statistics are plotted on their actual scale and the uncertainty bars match this scale, but argument `deltas` offers two more options:
+
+* With `deltas = TRUE`, the performance statistics are plotted as differences^[For the geometric mean predictive density (GMPD, see argument `stats` of `summary.vsel()` and `plot.vsel()`), `deltas = TRUE` plots the GMPD *ratio* (not difference) vs. the baseline model. Despite this special case, we will call `deltas = TRUE` the "difference scale" for simplicity.] from the baseline model^[For the definition of the baseline model, see argument `baseline` of `summary.vsel()` and `plot.vsel()`; in the most common cases, the default baseline model is the reference model.] and the uncertainty bars match this scale,
+* With `deltas = "mixed"`, the performance statistics (i.e., their point estimates) are plotted on the actual scale, but the uncertainty bars visualize the difference-scale uncertainty.
+
+Since the difference-scale uncertainty is usually of more interest than the actual-scale uncertainty (at least with regard to the decision for a final submodel size) and the actual-scale point estimates are often of more interest than the difference-scale point estimates, we plot with `deltas = "mixed"` here:
 ```{r plot_vsel}
-plot(cvvs, stats = "mlpd", deltas = TRUE)
+plot(cvvs, stats = "mlpd", deltas = "mixed")
 ```
 
 ### Decision for final submodel size
@@ -260,8 +264,10 @@ A tabular representation of the plot created by `plot.vsel()` can be achieved vi
 For the output of `summary.vsel()`, there is a sophisticated `print()` method (`print.vselsummary()`) which is also called by the shortcut method `print.vsel()`^[`print.vsel()` is the method that is called when simply printing an object resulting from `varsel()` or `cv_varsel()`.].
 Specifically, to create the summary table matching the predictive performance plot above as closely as possible (and to also adjust the minimum number of printed significant digits), we may call `summary.vsel()` and `print.vselsummary()` as follows:
 ```{r smmry_vsel}
-smmry <- summary(cvvs, stats = "mlpd", type = c("mean", "lower", "upper"),
-                 deltas = TRUE)
+smmry <- summary(cvvs,
+                 stats = "mlpd",
+                 type = c("mean", "diff.lower", "diff.upper"),
+                 deltas = FALSE)
 print(smmry, digits = 1)
 ```
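
Note on the three `deltas` settings described in the vignette text above: the following base-R sketch illustrates, on simulated data, why the difference-scale uncertainty can be much smaller than the actual-scale uncertainty, and what combining an actual-scale point estimate with difference-scale bars might look like. Everything here is an assumption made for illustration: the per-observation log predictive densities are simulated, the names (`lpd_ref`, `lpd_sub`, `se`, `mixed`) are hypothetical, the bars are taken as plus or minus one standard error, and centering the difference-scale bars on the actual-scale estimate is only one plausible reading of `deltas = "mixed"`. It is not projpred's internal computation.

```r
# Simulated per-observation log predictive densities (NOT projpred output).
set.seed(1)
n <- 200
lpd_ref <- rnorm(n, mean = -1, sd = 1)          # hypothetical reference (baseline) model
lpd_sub <- lpd_ref - 0.05 + rnorm(n, sd = 0.1)  # hypothetical submodel, strongly correlated with it

# Standard error of a mean (helper introduced for this sketch only).
se <- function(x) sd(x) / sqrt(length(x))

# Actual scale (analogous to deltas = FALSE): estimate and uncertainty of the submodel's MLPD.
mlpd_sub  <- mean(lpd_sub)
se_actual <- se(lpd_sub)

# Difference scale (analogous to deltas = TRUE): estimate and uncertainty of the MLPD difference.
lpd_diff  <- lpd_sub - lpd_ref
mlpd_diff <- mean(lpd_diff)
se_diff   <- se(lpd_diff)

# "Mixed" (assumed reading): actual-scale point estimate with difference-scale bars.
mixed <- c(mean  = mlpd_sub,
           lower = mlpd_sub - se_diff,
           upper = mlpd_sub + se_diff)

print(c(se_actual = se_actual, se_diff = se_diff))
print(mixed)
```

Because the submodel and the reference model are evaluated on the same observations, their pointwise values are strongly correlated, so the paired differences vary far less than either set of values on its own. This is the motivation the vignette text gives for preferring difference-scale uncertainty when deciding on a submodel size.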