diff --git a/.gitignore b/.gitignore index cf2b1210b..28a457bd9 100644 --- a/.gitignore +++ b/.gitignore @@ -10,7 +10,6 @@ __pycache__/ *.log *.out *.synctex.gz -*.pdf # C extensions *.so diff --git a/azure-pipelines.yml b/azure-pipelines.yml index f3f94f883..3e8fc0548 100644 --- a/azure-pipelines.yml +++ b/azure-pipelines.yml @@ -35,6 +35,7 @@ jobs: foreach ($file in $editedFiles) { switch -Wildcard ($file) { "README.md" { Continue } + ".gitignore" { Continue } "econml/_version.py" { Continue } "prototypes/*" { Continue } "images/*" { Continue } @@ -70,7 +71,7 @@ jobs: - script: 'pip install git+https://github.com/slundberg/shap.git@d1d2700acc0259f211934373826d5ff71ad514de' displayName: 'Install specific version of shap' - - script: 'pip install sphinx sphinx_rtd_theme' + - script: 'pip install sphinx!=5.1.0 sphinx_rtd_theme' displayName: 'Install sphinx' - script: 'python setup.py build_sphinx -W' diff --git a/doc/Causal-Inference-User-Guide-v4-022520.pdf b/doc/Causal-Inference-User-Guide-v4-022520.pdf new file mode 100644 index 000000000..94a029a6e Binary files /dev/null and b/doc/Causal-Inference-User-Guide-v4-022520.pdf differ diff --git a/doc/conf.py b/doc/conf.py index 8edb25a3b..da401eb3f 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -21,7 +21,7 @@ # -- Project information ----------------------------------------------------- project = 'econml' -copyright = '2019, Microsoft Research' +copyright = '2022, Microsoft Research' author = 'Microsoft Research' version = econml.__version__ release = econml.__version__ @@ -119,7 +119,7 @@ # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". # html_static_path = ['_static'] -html_extra_path = ['map.svg'] +html_extra_path = ['map.svg', 'Causal-Inference-User-Guide-v4-022520.pdf', "spec/img"] # Custom sidebar templates, must be a dictionary that maps document names # to template names. 
diff --git a/doc/spec/causal_intro.rst b/doc/spec/causal_intro.rst
new file mode 100644
index 000000000..b563f4891
--- /dev/null
+++ b/doc/spec/causal_intro.rst
@@ -0,0 +1,10 @@
+Introduction to Causal Inference
+=================================
+
+If you are new to causal inference, it may be helpful to walk through a quick overview of concepts and techniques that we refer to over the course of the documentation. Below we provide a high-level introduction to causal inference tailored for EconML:
+
+.. raw:: html
+
+
+
+The folks at DoWhy also have a broader introduction `here `__.
\ No newline at end of file
diff --git a/doc/spec/faq.rst b/doc/spec/faq.rst
new file mode 100644
index 000000000..7e8b7ab41
--- /dev/null
+++ b/doc/spec/faq.rst
@@ -0,0 +1,77 @@
+Frequently Asked Questions (FAQ)
+====================================================================
+
+When should I use EconML?
+--------------------------
+
+EconML is designed to answer causal questions: what will happen in response to some change in behavior,
+prices, or conditions? These questions require different methods than forecasting questions:
+what will happen next if everything continues as it has been?
+
+
+What are the advantages of EconML?
+-----------------------------------
+
+EconML offers the broadest range of cutting-edge AI models designed specifically to answer causal questions.
+The EconML models also build on familiar Python packages, allowing users to easily select the best model for their question.
+Finally, EconML includes custom interpreters to create presentation-ready output.
+
+
+How do I know if the results make sense?
+----------------------------------------
+
+Try comparing the consistency of your estimates across multiple models, including some that make
+stronger structural assumptions like linear relationships and some that do not. Pay attention to the
+standard errors as well as the point estimates; imprecise estimates should be interpreted accordingly.
+While researchers can introduce bias by narrowly fishing for estimates that match their priors, it is also important
+to use your expertise to evaluate results. If you estimate that a 5% decrease in price generates
+an implausible 5000% increase in sales, you should carefully review your code!
+
+I'm getting causal estimates that don't make sense. What next?
+----------------------------------------------------------------
+First, carefully check your code for errors and try several causal models.
+If your estimates are consistent but implausible, you may have a confounding variable that hasn’t been measured in your data.
+Think carefully about the source of the data you are using: was there something unusual going on
+during the period when the data were collected (for example a holiday or an economic downturn)?
+Is there something unusual about your sample (for example, all men with pre-existing heart conditions)?
+
+
+What if I don't have a good instrument, can't run an experiment, and don't observe all confounders?
+------------------------------------------------------------------------------------------------------------
+In this case, no statistical approach can perfectly isolate the causal effect of the treatment on the outcome.
+DML, OrthoForest, or MetaLearners, each including all the confounders you can observe,
+will deliver the best approximation of the causal effect that minimizes the bias from confounders.
+Be aware that some bias will remain when using these estimates.
+
+
+How can I test whether I'm identifying the causal effect?
+------------------------------------------------------------
+You are identifying a valid causal effect if and only if the underlying assumptions of the causal model
+assumed by the estimation routine are correct. Those are often hard to test (though the `DoWhy `__ package may help).
+Having made those assumptions, the EconML package allows you to fit the best causal model you can.
+Many models will store a final stage fit metric that can be used to validate how well the causal model predicts out of sample,
+which is a good diagnostic as to the quality of your model.
+
+
+How do I give feedback?
+------------------------------------
+
+This project welcomes contributions and suggestions. Most contributions require you to agree to
+a Contributor License Agreement (CLA) declaring that you have the right to, and actually do,
+grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
+
+
+When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
+a CLA and decorate the PR appropriately (e.g., label, comment).
+Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
+
+
+This project has adopted the Microsoft Open Source Code of Conduct.
+For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
+ + + + + + + diff --git a/doc/spec/img/Attribution.png b/doc/spec/img/Attribution.png new file mode 100644 index 000000000..39d97502f Binary files /dev/null and b/doc/spec/img/Attribution.png differ diff --git a/doc/spec/img/Recommendation.png b/doc/spec/img/Recommendation.png new file mode 100644 index 000000000..e0b3413de Binary files /dev/null and b/doc/spec/img/Recommendation.png differ diff --git a/doc/spec/img/Segmentation.png b/doc/spec/img/Segmentation.png new file mode 100644 index 000000000..fafe81acf Binary files /dev/null and b/doc/spec/img/Segmentation.png differ diff --git a/doc/spec/img/imgFamiliar.png b/doc/spec/img/imgFamiliar.png new file mode 100644 index 000000000..b22b13e72 Binary files /dev/null and b/doc/spec/img/imgFamiliar.png differ diff --git a/doc/spec/img/imgFlexible.png b/doc/spec/img/imgFlexible.png new file mode 100644 index 000000000..58af38227 Binary files /dev/null and b/doc/spec/img/imgFlexible.png differ diff --git a/doc/spec/img/imgUnified.png b/doc/spec/img/imgUnified.png new file mode 100644 index 000000000..00b2e7541 Binary files /dev/null and b/doc/spec/img/imgUnified.png differ diff --git a/doc/spec/motivation.rst b/doc/spec/motivation.rst index 90b061771..1e59ecb6d 100644 --- a/doc/spec/motivation.rst +++ b/doc/spec/motivation.rst @@ -31,55 +31,76 @@ python API. Motivating Examples =================== -Customer Targeting ------------------- - -An important problem in modern business analytics is building automated tools to prioritize customer -acquisition and personalize customer interactions to increase sales and revenue. Typically businesses -will offer personalized incentives to customers to increase spend or increase the level of -engagement via more human resources. Any such personalized intervention corresponds to a monetary -investment and the main question that business analytics are called to answer is: what is the return -on investment (ROI)? 
-
-Analyzing the ROI is inherently a treatment effect question: what was the effect of any investment
-on a particular customer on its spend? Understanding how these return on investment varies across
-customers can enable more targeted investment policies and increased ROI via better targeting. Using historical
-data from deployed investments, and estimating the heterogeneous treatment effect via any of
-the proposed methods, business analysts can learn in an automated manner, data-driven
-customer targeting and prioritization policies.
-
-Personalized Pricing
---------------------
-
-Personalized discounts have become very widespread in the digital economy. To set the optimal
-personalized discount policy a business needs to understand what is the effect
-of a drop in price on the demand of a customer for a product as a function of customer
-characteristics. The estimation of such personalized demand elasticities can also be
-phrased in the language of heterogeneous treatment effects, where the treatment
-is the price (or typically log of price) on the demand (or typically log of demand)
-as a function of observable features of the customer. Hence, estimation of heterogeneous
-treatment effects can lead to optimal pricing policies.
-
-
-Stratification in Clinical Trials
-----------------------------------------
-
-Which patients should be selected for a clinical trial? If we want to demonstrate
-that a clinical treatment has an effect on at least some subset of a population, then
-fully randomized clinical trials are inappropriate as they will solely estimate
-average effects. Using heterogeneous treatment effect techniques, we can use
-observational data to come up with estimates of these effects and identify
-good candidate patients for a clinical trial that our model estimates have high
-treatment effects.
-
-Learning Click-Through-Rates
-----------------------------
-
-In the design of a page layout and more importantly in ad placement, it is important
-to understand the click-through-rate of page components (e.g. ads) on different positions
-of a page. Even though the modern approach is to run multiple A/B tests, when such
-page components involve revenue considerations (such as ad placement), then observational
-data can help guide correct A/B tests to run. Heterogeneous treatment effect estimation
-can provide estimates of the click-through-rate of page components from
-observational data. In this setting, the treatment is simply whether the component is
-placed on that page position and the response is whether the user clicked on it.
+EconML is designed to measure the causal effect of some treatment variable(s) T on an outcome variable Y, controlling for a set of features X. Use cases include:
+
+Recommendation A/B testing
+-----------------------------
+
+*Interpret experiments with imperfect compliance*
+
+.. image:: img/Recommendation.png
+    :alt: Recommendation A/B testing logo
+
+**Question**: A travel website would like to know whether joining a membership program
+causes users to spend more time engaging with the website.
+
+**Problem**: They can’t look directly at existing data, comparing members and non-members,
+because the customers who chose to become members are likely already more engaged than other users.
+Nor can they run a direct A/B test because they can’t force users to sign up for membership.
+
+**Solution**: The company had run an earlier experiment to test the value of a new,
+faster sign-up process. EconML’s DRIV estimator uses this experimental nudge towards membership
+as an instrument that generates random variation in the likelihood of membership.
+The DRIV model adjusts for the fact that not every customer who was offered the easier sign-up
+became a member and returns the effect of membership rather than the effect of receiving the quick sign-up.
+
+Link to Jupyter notebook:
+`Recommendation A/B Testing `__
+
+More details:
+`Trip Advisor Case Study `__
+
+
+Customer Segmentation
+----------------------
+
+*Estimate individualized responses to incentives*
+
+.. image:: img/Segmentation.png
+    :alt: Customer Segmentation logo
+
+**Question**: A media subscription service would like to offer targeted discounts
+through a personalized pricing plan.
+
+**Problem**: They observe many features of their customers,
+but are not sure which customers will respond most to a lower price.
+
+**Solution**: EconML’s DML estimator uses price variations in existing data,
+along with a rich set of user features, to estimate heterogeneous price sensitivities
+that vary with multiple customer features.
+The tree interpreter provides a presentation-ready summary of the key features
+that explain the biggest differences in responsiveness to a discount.
+
+Link to Jupyter notebook:
+`Customer Segmentation `__.
+
+Multi-investment Attribution
+-----------------------------
+*Distinguish the effects of multiple outreach efforts*
+
+.. image:: img/Attribution.png
+    :alt: Multi-investment Attribution logo
+
+**Question**: A startup would like to know the most effective approach for recruiting new customers:
+price discounts, technical support to ease adoption, or a combination of the two.
+
+**Problem**: The risk of losing customers makes experiments across outreach efforts too expensive.
+So far, customers have been offered incentives strategically;
+for example, larger businesses are more likely to get technical support.
+
+**Solution**: EconML’s Doubly Robust Learner model jointly estimates the effects of multiple discrete treatments.
+The model uses flexible functions of observed customer features to filter out confounding correlations
+in existing data and deliver the causal effect of each effort on revenue.
+
+Link to Jupyter notebook:
+`Multi-investment Attribution `__.
\ No newline at end of file
diff --git a/doc/spec/overview.rst b/doc/spec/overview.rst
new file mode 100644
index 000000000..137803342
--- /dev/null
+++ b/doc/spec/overview.rst
@@ -0,0 +1,32 @@
+Overview
+=========
+
+EconML is a Python package that applies the power of machine learning techniques to estimate individualized causal responses from observational or experimental data. The suite of estimation methods provided in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.
+
+EconML is open source software developed by the `ALICE `__ team at Microsoft Research.
+
+.. raw:: html
+

+
+
+
+

Flexible icon

Flexible

Allows for flexible model forms that do not impose strong assumptions, including models of heterogeneous responses to treatment.

+
+

Unified icon

Unified

Broad set of methods representing the latest advances in the econometrics and machine learning literature within a unified API.

+
+

Familiar icon

Familiar Interface

Built on standard Python packages for machine learning and data analysis.

+

+
+ +**Why causality?** + +Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals. + +**Why not just a vanilla machine learning solution?** + +Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior. + +**Why causal machine learning/EconML?** + +Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data. diff --git a/doc/spec/spec.rst b/doc/spec/spec.rst index 649dba98c..ad10428bb 100644 --- a/doc/spec/spec.rst +++ b/doc/spec/spec.rst @@ -1,19 +1,10 @@ EconML User Guide ================= -Causal machine learning applies the power of machine learning techniques to answer causal questions. - -* Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals. - -* Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior. - -* Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data. - -The EconML Python SDK, developed by the ALICE team at MSR New England, incorporates individual machine learning steps into interpretable causal models. 
By reducing the need for expert judgment, these innovations improve the reliability of what-if predictions and empower data scientists without extensive economic training to conduct causal analysis using existing data. - - .. toctree:: + overview motivation + causal_intro api flowchart comparison @@ -23,6 +14,7 @@ The EconML Python SDK, developed by the ALICE team at MSR New England, incorpora inference interpretability references + faq .. todo:: benchmark