feat(Prophet-coeffs): DS-2469 store prophet coeffs #11

Merged
Eliel-Avilez merged 5 commits into main from feat/store-prophet-coeffs
Feb 6, 2026

@Eliel-Avilez
Collaborator

@Eliel-Avilez Eliel-Avilez commented Jan 22, 2026

SUMMARY

When a Robyn run is executed and factor_vars are specified, a new CSV file (prophet_regressor_coefficients.csv) is created in the Robyn init folder. The motivation for saving the Prophet regressors is the need to handle categorical variables in the Scenario Modeling predict flow, which requires the Prophet regressor coefficients of categorical variables to transform their values.
The regressors are named <variable>_<value>, where <value> is each value of the categorical variable present in the dataset.

Example of prophet_regressor_coefficients.csv with factor_vars = c('Marathons', 'RandomVariable'):

  regressor         coefficient
1 Marathons_0       -0.014
2 Marathons_1        0.006
1 RandomVariable_X  -0.003
2 RandomVariable_Y   0.006

Note that the original dataset that generated the above example contains the values 0, 1, and 2 for the variable Marathons, and the values X, Y, and Z for the variable RandomVariable. Prophet takes one value of each variable as the reference level and omits it when the coefficients are generated.
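To illustrate how a consumer such as the Scenario Modeling predict flow might use this file, here is a minimal sketch that maps factor values to coefficients by rebuilding the <variable>_<value> key. The helper name lookup_factor_coefs is hypothetical, and treating a missing key as the omitted reference level (coefficient 0) is an assumption made for illustration, not confirmed behavior of Robyn or Scenario Modeling.

```r
# Coefficients as they appear in the example CSV above.
coefs <- data.frame(
  regressor = c("Marathons_0", "Marathons_1", "RandomVariable_X", "RandomVariable_Y"),
  coefficient = c(-0.014, 0.006, -0.003, 0.006),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the coefficient for each value of one factor variable.
lookup_factor_coefs <- function(coefs, variable, values) {
  keys <- paste0(variable, "_", values)
  out <- coefs$coefficient[match(keys, coefs$regressor)]
  # Assumption: a key that is absent from the file is the omitted reference level.
  out[is.na(out)] <- 0
  out
}

# Values -0.014, 0.006, and 0 (Marathons_2 falls back to 0).
marathon_effects <- lookup_factor_coefs(coefs, "Marathons", c(0, 1, 2))
```

With this fallback, a value that was never seen in training is indistinguishable from the reference level, so a stricter consumer may prefer to fail on unknown keys.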

  1. Added code to extract Prophet's coefficients in R/R/inputs.R (delimited by the comments #! EA START and !# EA END).
  2. Saved Prophet's coefficients to a new CSV in R/R/outputs.R.

NOTE: The output of these changes was designed for the changes described in this other PR

STORY NUMBER and LINK

DS-2469

@GG-TechOpsSuper

GG-TechOpsSuper commented Jan 22, 2026

Snyk checks have passed. No issues have been found so far.

Status  Scanner               Critical  High  Medium  Low  Total (0)
        Open Source Security  0         0     0       0    0 issues
        Licenses              0         0     0       0    0 issues

@Eliel-Avilez Eliel-Avilez changed the title from "feat(Prophet-coeffs): DS-2107 store prophet coeffs" to "feat(Prophet-coeffs): DS-2469 store prophet coeffs" Jan 29, 2026
@Eliel-Avilez Eliel-Avilez marked this pull request as ready for review January 29, 2026 23:30
@Eliel-Avilez Eliel-Avilez self-assigned this Jan 29, 2026
@Eliel-Avilez Eliel-Avilez added the feature New feature or request label Jan 29, 2026
Collaborator

@sallyhong sallyhong left a comment

Oh!!!

I see, it's a double PR review. :) I'll check that one too!

Comment thread R/R/outputs.R Outdated
Comment on lines +332 to +338
if (!is.null(InputCollect$prophet_custom_output) &&
  !is.null(InputCollect$prophet_custom_output$prophet_coefficients)) {
  prophet_coefs <- InputCollect$prophet_custom_output$prophet_coefficients
  if (!is.null(prophet_coefs) && nrow(prophet_coefs) > 0) {
    write.csv(prophet_coefs, paste0(plot_folder, "prophet_regressor_coefficients.csv"), row.names = TRUE)
  }
}
Collaborator

I believe this can be refactored to this

prophet_coefs <- InputCollect$prophet_custom_output$prophet_coefficients

if (!is.null(prophet_coefs) && nrow(prophet_coefs) > 0) {
  write.csv(
    prophet_coefs,
    file = paste0(plot_folder, "prophet_regressor_coefficients.csv"),
    row.names = TRUE
  )
}
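
A short sketch of why the outer null checks can be dropped: in R, `$` applied to NULL returns NULL, so the chained access never errors, and `&&` short-circuits before nrow() is called. The object input_collect below is a hypothetical stand-in for InputCollect in a run where no Prophet output was stored.

```r
# Hypothetical stand-in for InputCollect without any stored Prophet output.
input_collect <- list(prophet_custom_output = NULL)

# `$` on NULL yields NULL, so this chained access is safe.
prophet_coefs <- input_collect$prophet_custom_output$prophet_coefficients
is_missing <- is.null(prophet_coefs)

# `&&` stops at the first FALSE, so nrow(NULL) > 0 is never evaluated.
should_write <- !is.null(prophet_coefs) && nrow(prophet_coefs) > 0
```

Here is_missing is TRUE and should_write is FALSE, so the single guard covers both cases that the nested version checked separately.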

Collaborator Author

Thanks, changes applied.

Comment thread R/R/inputs.R Outdated
Comment on lines +778 to +823
#! EA START
# Extract prophet regressor coefficients
prophet_coefficients <- NULL
if (!is.null(prophet_model) && !is.null(prophet_model$params) && !is.null(prophet_model$params$beta)) {
  # Get regressor names from extra_regressors
  regressor_names <- names(prophet_model$extra_regressors)

  if (length(regressor_names) > 0 && ncol(prophet_model$params$beta) > 0) {
    # Extract beta coefficients (mean across samples)
    # beta is a matrix: rows are samples, columns are regressors
    beta_matrix <- prophet_model$params$beta

    # Get the column indices for regressors (skip trend, seasonality components)
    # Regressors start after the base components
    n_base_components <- ncol(beta_matrix) - length(regressor_names)
    regressor_indices <- (n_base_components + 1):ncol(beta_matrix)

    if (length(regressor_indices) == length(regressor_names)) {
      # Calculate mean coefficient for each regressor across samples
      regressor_coefs <- colMeans(beta_matrix[, regressor_indices, drop = FALSE])

      # Create data frame with regressor names and coefficients
      prophet_coefficients <- data.frame(
        regressor = regressor_names,
        coefficient = as.numeric(regressor_coefs),
        stringsAsFactors = FALSE
      )
    } else {
      # Fallback: try to match by position or extract all extra regressor columns
      # Prophet stores regressors in extra_regressors, and their coefficients in beta
      # The order should match
      if (ncol(beta_matrix) >= length(regressor_names)) {
        # Take the last columns matching the number of regressors
        start_idx <- ncol(beta_matrix) - length(regressor_names) + 1
        regressor_coefs <- colMeans(beta_matrix[, start_idx:ncol(beta_matrix), drop = FALSE])

        prophet_coefficients <- data.frame(
          regressor = regressor_names,
          coefficient = as.numeric(regressor_coefs),
          stringsAsFactors = FALSE
        )
      }
    }
  }
}

Collaborator

Quite a few nested ifs... try this?

# Returns a data.frame with columns: regressor, coefficient
# or NULL if coefficients/regressors can't be extracted.
extract_prophet_regressor_coefs <- function(prophet_model) {
  # --- Guard clauses (bail early) ---
  if (is.null(prophet_model)) return(NULL)
  if (is.null(prophet_model$params) || is.null(prophet_model$params$beta)) return(NULL)
  if (is.null(prophet_model$extra_regressors)) return(NULL)

  regressor_names <- names(prophet_model$extra_regressors)
  if (length(regressor_names) == 0) return(NULL)

  beta_matrix <- prophet_model$params$beta
  if (is.null(dim(beta_matrix)) || ncol(beta_matrix) == 0) return(NULL)

  # --- Helper to build the output ---
  build_df <- function(coefs) {
    data.frame(
      regressor = regressor_names,
      coefficient = as.numeric(coefs),
      stringsAsFactors = FALSE
    )
  }

  # --- Preferred path: regressors are the last K columns ---
  k <- length(regressor_names)
  if (ncol(beta_matrix) < k) return(NULL)

  # In most Prophet implementations, extra regressors live in the last K beta columns.
  start_idx <- ncol(beta_matrix) - k + 1
  regressor_cols <- start_idx:ncol(beta_matrix)

  coefs <- colMeans(beta_matrix[, regressor_cols, drop = FALSE])
  build_df(coefs)
}

I think emitting a message like the one on line 761 would be helpful.
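
One way the message idea could look, as a rough sketch rather than the code that was merged: each guard clause reports why nothing was extracted before returning NULL. The names extract_coefs_verbose and bail are hypothetical, the message wording is invented, and mock_model imitates only the two fields the helper reads (a real Prophet model carries much more). The sketch keeps the assumption from the suggestion above that extra regressors occupy the last K beta columns.

```r
# Hypothetical variant of the suggested helper that explains each early exit.
extract_coefs_verbose <- function(prophet_model) {
  bail <- function(reason) {
    message("Prophet regressor coefficients not extracted: ", reason)
    NULL
  }

  beta <- prophet_model$params$beta
  if (is.null(beta) || is.null(dim(beta))) return(bail("no beta matrix"))

  regressor_names <- names(prophet_model$extra_regressors)
  k <- length(regressor_names)
  if (k == 0) return(bail("no extra regressors"))
  if (ncol(beta) < k) return(bail("fewer beta columns than regressors"))

  # Assumption carried over from the suggestion: regressors are the last k columns.
  cols <- (ncol(beta) - k + 1):ncol(beta)
  data.frame(
    regressor = regressor_names,
    coefficient = as.numeric(colMeans(beta[, cols, drop = FALSE])),
    stringsAsFactors = FALSE
  )
}

# Mock model: 2 samples (rows), 1 base column followed by 2 regressor columns.
mock_model <- list(
  params = list(beta = matrix(c(0.5, 0.5, -0.014, -0.014, 0.006, 0.006), nrow = 2)),
  extra_regressors = list(Marathons_0 = list(), Marathons_1 = list())
)

result <- extract_coefs_verbose(mock_model)
```

Calling extract_coefs_verbose(NULL) returns NULL and prints the "no beta matrix" message, which makes a silently missing CSV easier to diagnose.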

Collaborator Author

Yeah, that's a good structure, thanks for the suggestion.
I followed it and added some extra stuff; I tried to keep the same format you suggested because I think it is cleaner.

@Eliel-Avilez
Collaborator Author

Eliel-Avilez commented Feb 4, 2026

Added the reference level that Prophet omits. This change gives Scenario Modeling the actual factor values on which the model was trained, instead of leaving the reference level (coeff = 0) to guesswork (Marathons_2 and RandomVariable_Z in this case).

image

@sallyhong
Collaborator

Thanks for making the changes!
Really close to approving-- I promise!

I think we need one more sanity check: that all regressors are accounted for.
Maybe something like number of regressors == number of unique values in the factor column might be worth adding.

Also, have we checked how Robyn names factors if the values contain underscores or spaces? For example, if the factor values were "Test Name 1", "test-name-2", "TEST_NAME_3"? I just want to make sure we have the name conversion logic nailed down so it doesn't bite us downstream.
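
A minimal sketch of such a coverage check, assuming the <variable>_<value> naming described in the summary. The helper name missing_regressors is hypothetical, and the data frames are toy data built from the example values in this PR. It returns the expected regressor names that are absent from the coefficients table.

```r
# Hypothetical check: which expected <variable>_<value> regressors are missing?
missing_regressors <- function(coefs, data, factor_vars) {
  expected <- unlist(lapply(factor_vars, function(v) {
    paste0(v, "_", unique(data[[v]]))
  }))
  setdiff(expected, coefs$regressor)
}

# Toy dataset with three levels per factor variable.
data <- data.frame(
  Marathons = c(0, 1, 2, 1),
  RandomVariable = c("X", "Y", "Z", "X"),
  stringsAsFactors = FALSE
)

# Coefficients as Prophet reports them, without the reference levels.
coefs <- data.frame(
  regressor = c("Marathons_0", "Marathons_1", "RandomVariable_X", "RandomVariable_Y"),
  coefficient = c(-0.014, 0.006, -0.003, 0.006),
  stringsAsFactors = FALSE
)

missing <- missing_regressors(coefs, data, c("Marathons", "RandomVariable"))
```

In this toy case missing holds "Marathons_2" and "RandomVariable_Z". Comparing names rather than counts also catches the case where the count matches but a name was mangled.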

@Eliel-Avilez
Collaborator Author

Eliel-Avilez commented Feb 6, 2026

  1. Added the sanity check (number of regressors == number of unique values in the factor column) to the function add_reference_levels_to_prophet_coefficients(), to take advantage of the existing loop.
    R/R/inputs.R

  2. Regarding the factor variable names: if a name contains dots (Test.Name) or spaces (Test Name), that will be handled in the Rmd scenario modeling client template, because when the CSV is read by R, spaces in the name are automatically replaced with dots, which breaks robyn_inputs(). A user constraint will be added to disallow spaces and dots in variable names.

If the factor variable names contain an underscore (Test_Name), everything works as intended.
Here is an example of the output using a factor var Another_Random_Variable:
image
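
The renaming described in point 2 can be reproduced in base R with a small sketch. By default read.csv() uses check.names = TRUE, which turns column names into syntactically valid ones, so a space becomes a dot while an underscore is left alone. The inline CSV text below is a made-up example.

```r
# Made-up CSV with one column name containing a space and one with an underscore.
csv_text <- "Test Name,Test_Name\nA,B\n"

# Default behaviour: names are made syntactically valid.
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the original names are preserved.
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
```

Here default_names is "Test.Name", "Test_Name" and raw_names is "Test Name", "Test_Name", which is why underscores are safe while spaces are not.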

@sallyhong sallyhong self-requested a review February 6, 2026 19:18
Collaborator

@sallyhong sallyhong left a comment

Thanks for addressing all my points for Robyn2.

I think everything should be fine but if we catch something weird in ds-scenario-modeling we can always revisit :)

Sally

@Eliel-Avilez Eliel-Avilez merged commit b5c4da9 into main Feb 6, 2026
2 checks passed
@Eliel-Avilez Eliel-Avilez deleted the feat/store-prophet-coeffs branch February 10, 2026 15:44
Labels

feature New feature or request

3 participants