From cf4668ca4b057910c03ed355dfe95f4e775180b2 Mon Sep 17 00:00:00 2001
From: Fabio Vera <fabiovera@microsoft.com>
Date: Thu, 22 May 2025 15:26:53 -0400
Subject: [PATCH 1/3] initial commit for validation docs

Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
---
 doc/spec/spec.rst       |  1 +
 doc/spec/validation.rst | 68 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)
 create mode 100644 doc/spec/validation.rst
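
Note for reviewers: the p-value check described in the new "Confidence
Intervals and Inference" section can be sketched with the standard library
alone. This is a minimal illustration under a normal approximation; the
function name wald_inference and the numbers are hypothetical, and it is not
EconML's inference API.

```python
from statistics import NormalDist


def wald_inference(estimate, stderr, null_value=0.0, alpha=0.05):
    """Two-sided p-value and (1 - alpha) interval under a normal approximation.

    Hypothetical helper for illustration only; not an EconML function.
    """
    if stderr <= 0:
        raise ValueError("stderr must be positive")
    std_normal = NormalDist()
    # Standardized distance between the estimate and the null value.
    z = (estimate - null_value) / stderr
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Symmetric interval around the point estimate.
    half_width = std_normal.inv_cdf(1.0 - alpha / 2.0) * stderr
    return p_value, (estimate - half_width, estimate + half_width)


# Made-up effect estimate and standard error, not output from any estimator.
p_value, (lower, upper) = wald_inference(estimate=1.2, stderr=0.5)
reject_null = p_value < 0.05
```

As the section itself cautions, a small p-value from such a check is only as
trustworthy as the model specification behind the standard error.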

diff --git a/doc/spec/spec.rst b/doc/spec/spec.rst
index 38917c742..2afd6a93d 100644
--- a/doc/spec/spec.rst
+++ b/doc/spec/spec.rst
@@ -13,6 +13,7 @@ EconML User Guide
     estimation_dynamic
     inference
     model_selection
+    validation
     interpretability
     federated_learning
     references
diff --git a/doc/spec/validation.rst b/doc/spec/validation.rst
new file mode 100644
index 000000000..54280d2d7
--- /dev/null
+++ b/doc/spec/validation.rst
@@ -0,0 +1,68 @@
+Validation
+======================
+
+Validating causal estimates is inherently challenging, as the true counterfactual outcome for a given treatment is
+unobservable. However, there are several checks and tools available in EconML to help assess the credibility of causal
+estimates.
+
+
+Sensitivity Analysis
+---------------------
+
+For many EconML estimators, unobserved confounding can lead to biased causal estimates.
+Moreover, it is impossible to prove the absence of unobserved confounders.
+This is a fundamental problem for observational causal inference.
+
+To mitigate this problem, EconML provides a suite of sensitivity analysis tools,
+based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_,
+to assess the robustness of causal estimates to unobserved confounding. 
+
+Specifically, select estimators (subclasses of :class:`.DML` and :class:`.DRLearner`)
+have access to ``sensitivity_interval``, ``robustness_value``, and ``sensitivity_summary`` methods.
+
+``sensitivity_interval`` provides an updated confidence interval for the ATE based on a specified level of unobserved confounding.
+
+
+``robustness_value`` computes the minimum level of unobserved confounding required
+to make it impossible to reject a null hypothesis (default 0).
+
+
+``sensitivity_summary`` provides a summary of the two methods above.
+
+DRTester
+----------------
+
+EconML provides the :class:`.DRTester` class, which implements Best Linear Predictor (BLP), calibration r-squared,
+and uplift modeling methods for validation.
+
+See an example notebook `here <https://github.com/py-why/EconML/blob/main/notebooks/CATE%20validation.ipynb>`__.
+
+Scoring
+-------
+
+Many EconML estimators implement a ``.score`` method to evaluate the goodness-of-fit of the final model. Because a raw
+``.score`` value is hard to interpret on its own, EconML offers the :class:`.RScorer` class to facilitate model
+selection based on scoring.
+
+:class:`.RScorer` scores candidate causal models on a common basis so that they can be compared and the best one selected.
+
+See an example notebook `here
+<https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Model%20Selection%20with%20the%20RScorer.ipynb>`__.
+
+Confidence Intervals and Inference
+----------------------------------
+
+Most EconML estimators allow for inference, including standard errors, confidence intervals, and p-values for
+estimated effects. A common validation approach is to check whether the p-values are below a chosen significance level
+(e.g., 0.05). If not, the null hypothesis that the causal effect is zero cannot be rejected.
+
+**Note:** Inference results are only valid if the model specification is correct. For example, if a linear model is used
+but the true data-generating process is nonlinear, the inference may not be reliable. It is generally not possible to
+guarantee correct specification, so p-value inspection should be considered a surface-level check.
+
+DoWhy Refutation Tests
+----------------------
+
+The DoWhy library, which complements EconML, includes several refutation tests for validating causal estimates. These
+tests work by comparing the original causal estimate to estimates obtained from perturbed versions of the data, helping
+to assess the robustness of causal conclusions.
\ No newline at end of file

From c1f08100ddd2b0048853859d86f536895ab1bf64 Mon Sep 17 00:00:00 2001
From: Fabio Vera <fabiovera@microsoft.com>
Date: Thu, 5 Jun 2025 15:20:21 -0400
Subject: [PATCH 2/3] polish references

Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
---
 doc/spec/references.rst     | 6 ++++++
 doc/spec/validation.rst     | 2 +-
 econml/dml/causal_forest.py | 4 ++--
 econml/dml/dml.py           | 4 ++--
 econml/dr/_drlearner.py     | 4 ++--
 5 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/doc/spec/references.rst b/doc/spec/references.rst
index 07ef60507..51e7de5bd 100644
--- a/doc/spec/references.rst
+++ b/doc/spec/references.rst
@@ -17,6 +17,12 @@ References
     Two-Stage Estimation with a High-Dimensional Second Stage.
     2018.
 
+.. [Chernozhukov2022]
+    V. Chernozhukov, C. Cinelli, W. Newey, A. Sharma, and V. Syrgkanis.
+    Long Story Short: Omitted Variable Bias in Causal Machine Learning.
+    *NBER Working Paper No. 30302*, 2022.
+    URL https://www.nber.org/papers/w30302.
+
 .. [Hartford2017]
     Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy.
     Deep IV: A flexible approach for counterfactual prediction.
diff --git a/doc/spec/validation.rst b/doc/spec/validation.rst
index 54280d2d7..7a2177083 100644
--- a/doc/spec/validation.rst
+++ b/doc/spec/validation.rst
@@ -14,7 +14,7 @@ Moreover, it is impossible to prove the absence of unobserved confounders.
 This is a fundamental problem for observational causal inference.
 
 To mitigate this problem, EconML provides a suite of sensitivity analysis tools,
-based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_,
+based on [Chernozhukov2022]_,
 to assess the robustness of causal estimates to unobserved confounding. 
 
 Specifically, select estimators (subclasses of :class:`.DML` and :class:`.DRLearner`)
diff --git a/econml/dml/causal_forest.py b/econml/dml/causal_forest.py
index ab353e9fc..829204537 100644
--- a/econml/dml/causal_forest.py
+++ b/econml/dml/causal_forest.py
@@ -857,7 +857,7 @@ def sensitivity_interval(self, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interval_
 
         Can only be calculated when Y and T are single arrays, and T is binary or continuous.
 
-        Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
+        Based on [Chernozhukov2022]_
 
         Parameters
         ----------
@@ -901,7 +901,7 @@ def robustness_value(self, null_hypothesis=0, alpha=0.05, interval_type='ci'):
 
         Can only be calculated when Y and T are single arrays, and T is binary or continuous.
 
-        Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
+        Based on [Chernozhukov2022]_
 
         Parameters
         ----------
diff --git a/econml/dml/dml.py b/econml/dml/dml.py
index 01e184b74..0606f14cc 100644
--- a/econml/dml/dml.py
+++ b/econml/dml/dml.py
@@ -646,7 +646,7 @@ def sensitivity_interval(self, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interval_
 
         Can only be calculated when Y and T are single arrays, and T is binary or continuous.
 
-        Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
+        Based on [Chernozhukov2022]_
 
         Parameters
         ----------
@@ -690,7 +690,7 @@ def robustness_value(self, null_hypothesis=0, alpha=0.05, interval_type='ci'):
 
         Can only be calculated when Y and T are single arrays, and T is binary or continuous.
 
-        Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
+        Based on [Chernozhukov2022]_
 
         Parameters
         ----------
diff --git a/econml/dr/_drlearner.py b/econml/dr/_drlearner.py
index 3208c8064..e6f35e7e9 100644
--- a/econml/dr/_drlearner.py
+++ b/econml/dr/_drlearner.py
@@ -798,7 +798,7 @@ def sensitivity_interval(self, T, alpha=0.05, c_y=0.05, c_t=0.05, rho=1., interv
         The sensitivity interval is the range of values for the ATE that are
         consistent with the observed data, given a specified level of confounding.
 
-        Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
+        Based on [Chernozhukov2022]_
 
         Parameters
         ----------
@@ -848,7 +848,7 @@ def robustness_value(self, T, null_hypothesis=0, alpha=0.05, interval_type='ci')
 
         Returns 0 if the original interval already includes the null_hypothesis.
 
-        Based on `Chernozhukov et al. (2022) <https://www.nber.org/papers/w30302>`_
+        Based on [Chernozhukov2022]_
 
         Parameters
         ----------

From 940d3eae48692f406ad7b96c4d08a6288af2b5b5 Mon Sep 17 00:00:00 2001
From: Fabio Vera <fabiovera@microsoft.com>
Date: Thu, 5 Jun 2025 15:24:06 -0400
Subject: [PATCH 3/3] update wording

Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
---
 doc/spec/validation.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/spec/validation.rst b/doc/spec/validation.rst
index 7a2177083..859c2e9b4 100644
--- a/doc/spec/validation.rst
+++ b/doc/spec/validation.rst
@@ -24,7 +24,7 @@ have access to ``sensitivity_analysis``, ``robustness_value``, and ``sensitivity
 
 
 ``robustness_value`` computes the minimum level of unobserved confounding required
-to make it impossible to reject a null hypothesis (default 0).
+for the confidence interval around the ATE to begin to include the given null value (0 by default).
 
 
 ``sensitivity_summary`` provides a summary of the two methods above.