Causal Meta-Mediation Analysis: Inferring Dose-Response Function from Summary Statistics of Many Randomized Experiments

KDD 2020.


Abstract:

It is common in the internet industry to use offline-developed algorithms to power online products that contribute to the success of a business. Offline-developed algorithms are guided by offline evaluation metrics, which are often different from online business key performance indicators (KPIs). To maximize business KPIs, it is important …

Introduction
  • Nowadays it is common in the internet industry to develop algorithms that power online products using historical data.
  • Offline evaluation metrics are different from online business KPIs. For instance, a ranking algorithm, which powers search pages in e-commerce platforms, typically optimizes for relevance by predicting purchase or click probabilities of items.
  • It can be evaluated offline with rank-aware evaluation metrics, for example normalized discounted cumulative gain (NDCG), mean reciprocal rank (MRR), or mean average precision (MAP), which are calculated from a test set of historical purchase or click-through feedback of users (a minimal sketch of these metrics follows this list).
  • The discrepancy between offline evaluation metrics and online business KPIs poses a challenge to product owners: it is not clear which offline evaluation metric should be adopted to guide the offline development of algorithms in order to maximize online business KPIs.
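
As a minimal, illustrative sketch (not the paper's code), the user-level rank-aware metrics named above can be computed from a single ranked result list with binary relevance feedback; the cutoff-free forms below are simplifications:

    import math

    # Minimal user-level rank-aware metrics over one ranked result list, using
    # binary (0/1) relevance derived from historical purchase/click feedback.
    # Illustrative only: real evaluations usually truncate at a cutoff k and
    # may use graded relevance for NDCG.

    def mrr(relevance):
        """Reciprocal rank of the first relevant item (0 if none)."""
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                return 1.0 / rank
        return 0.0

    def dcg(relevance):
        """Discounted cumulative gain with a log2 position discount."""
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(relevance, start=1))

    def ndcg(relevance):
        """DCG normalized by the DCG of the ideal (re-sorted) ranking."""
        ideal = dcg(sorted(relevance, reverse=True))
        return dcg(relevance) / ideal if ideal > 0 else 0.0

    def average_precision(relevance):
        """Mean of precision@k over the ranks k of relevant items."""
        hits, total = 0, 0.0
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                hits += 1
                total += hits / rank
        return total / hits if hits else 0.0

    print(mrr([0, 1, 0, 1]), ndcg([0, 1, 0, 1]), average_precision([0, 1, 0, 1]))
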
Highlights
  • Nowadays it is common in the internet industry to develop algorithms that power online products using historical data
  • Since the offline A/B test literature [6] bridges the inconsistency between changes of offline and online evaluation metrics, we focus only on how sitewide gross merchandise value (GMV) would change for 10% lifts in the online normalized discounted cumulative gain (NDCG), mean reciprocal rank (MRR), and mean average precision (MAP) of the search page, respectively.
  • By noticing that online products can be assessed by online counterparts of offline evaluation metrics, we decompose the problem into two parts.
  • Since the offline A/B test literature works out the first part (counterfactual estimators of offline evaluation metrics that move the same way as their online counterparts), we focus on the second part: inferring causal effects of online evaluation metrics on business key performance indicators (KPIs).
  • We model online evaluation metrics as mediators and formalize the problem as identifying, estimating, and testing the mediator dose-response function (DRF); a sketch of the estimation idea follows this list.
  • We apply the approach to Etsy’s real data to uncover the causal relationship between the three most popular rank-aware online evaluation metrics and GMV, and show how we successfully identify MRR as the offline evaluation metric for GMV maximization.
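
The estimation idea referenced above can be sketched as a meta-regression across experiments: under a polynomial mediator DRF, an experiment's ATE on GMV is a linear combination of its ATEs on M, M², and M³, so the DRF coefficients can be recovered by regressing the former on the latter. The sketch below substitutes simulated summary statistics for real experiment data; all names and numbers are illustrative assumptions, not the paper's implementation:

    import numpy as np
    import statsmodels.api as sm

    # One row per randomized experiment. Columns hold the experiment's ATE on
    # M, M^2, and M^3 (e.g., user-level MRR and its powers), simulated here as
    # stand-ins for real summary statistics.
    rng = np.random.default_rng(0)
    n_experiments = 200
    ate_m = rng.normal(0.0, 0.02, size=(n_experiments, 3))
    beta_true = np.array([5.0, -12.0, 8.0])             # illustrative DRF coefficients
    ate_gmv = ate_m @ beta_true + rng.normal(0.0, 0.05, n_experiments)

    # Under mu(m) = b1*m + b2*m^2 + b3*m^3, the ATE on GMV equals
    # b1*ATE(M) + b2*ATE(M^2) + b3*ATE(M^3), so a no-intercept regression of
    # ATE-on-GMV on the three ATE columns recovers the DRF coefficients.
    fit = sm.OLS(ate_gmv, ate_m).fit(cov_type="HC1")    # robust SEs across experiments
    print(fit.params)
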
Results
  • To decide the polynomial terms in the model, the authors perform Wald tests.
  • The results of the Wald tests are reported in Table 4.
  • Since all the null hypotheses are rejected, the results suggest including the ATE on M, the ATE on M², and the ATE on M³ in the model for each M (NDCG, MAP, MRR); a minimal sketch of such tests follows this list.
  • Figure 5 shows the estimation results.
  • Blue lines depict the estimated mediator DRFs; scattered points represent the per-experiment summary statistics (the ATE on a ranking evaluation metric and the ATE on GMV) of all experiments.
  • The ranges of the three evaluation metrics can be much smaller than those reported in the IR literature, since here they are defined at the user level.
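
A minimal sketch of such Wald tests with statsmodels, again on simulated summary statistics rather than the paper's data (all names and numbers are illustrative):

    import numpy as np
    import statsmodels.api as sm

    # Simulated experiment-level summary statistics: per-experiment ATE on
    # M, M^2, M^3 (columns) and ATE on GMV (response).
    rng = np.random.default_rng(1)
    ate_m = rng.normal(0.0, 0.02, size=(200, 3))
    ate_gmv = ate_m @ np.array([5.0, -12.0, 8.0]) + rng.normal(0.0, 0.05, 200)

    fit = sm.OLS(ate_gmv, ate_m).fit()
    # With plain ndarray inputs, statsmodels names the regressors x1, x2, x3.
    print(fit.wald_test("x3 = 0", use_f=True))           # H0: the M^3 term is zero
    print(fit.wald_test("x2 = 0, x3 = 0", use_f=True))   # joint H0: drop M^2 and M^3
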
Conclusion
  • The algorithms developed offline power online products, and online products contribute to the success of a business.
  • Offline evaluation metrics, which guide algorithm development, are different from online business KPIs. It is important for product owners to pick the offline evaluation metric under whose guidance the algorithm can maximize online business KPIs. By noticing that online products can be assessed by online counterparts of offline evaluation metrics, the authors decompose the problem into two parts.
  • The authors apply the approach to Etsy’s real data to uncover the causal relationship between the three most popular rank-aware online evaluation metrics and GMV, and show how they successfully identify MRR as the offline evaluation metric for GMV maximization.
Summary
  • Objectives:

    The authors' goal is to estimate the average mediator DRF, E[μi(m)], with which they can compute the percentage change of E[μi(m)] for a 10% increase in m, ceteris paribus (a minimal sketch of this computation follows).
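
A minimal sketch of that computation, with purely illustrative coefficients and baseline (not the paper's fitted values):

    import numpy as np

    beta = np.array([5.0, -12.0, 8.0])   # assumed fitted DRF coefficients (illustrative)

    def mu(m):
        """Cubic mediator DRF: mu(m) = beta1*m + beta2*m^2 + beta3*m^3."""
        return beta[0] * m + beta[1] * m**2 + beta[2] * m**3

    m0 = 0.15                            # hypothetical baseline user-level metric value
    lift = 100.0 * (mu(1.10 * m0) - mu(m0)) / mu(m0)
    print(f"{lift:.2f}% change in E[mu(m)] for a 10% increase in m")
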
Tables
  • Table 1: Finite-Sample Performance Comparison
  • Table 2: Assumption Violation
  • Table 3: Model Selection and Wald Tests for μ(m) = β1m + β2m² + β3m³
  • Table 4: Wald Test Results
  • Table 5: Parameter Values
  • Table 6: Second-Stage Regression Results
References
  • Joshua Angrist, Guido Imbens, and Donald Rubin. 1996. Identification of Causal Effects Using Instrumental Variables. J. Amer. Statist. Assoc. 91, 434 (1996), 444–455.
  • Joshua Angrist and Alan Krueger. 2001. Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives 15, 4 (2001), 69–85.
  • Reuben Baron and David Kenny. 1986. The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology 51, 6 (1986), 1173–1182.
  • Will Browne and Mike Jones. 2017. What Works in E-commerce: A Meta-analysis of 6700 Online Experiments. Qubit Digital Ltd (2017), 1–21.
  • Harris Cooper, Larry Hedges, and Jeffrey Valentine. 2009. The Handbook of Research Synthesis and Meta-Analysis. Russell Sage Foundation.
  • Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, and Simon Dollé. 2018. Offline A/B Testing for Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM ’18). ACM, New York, NY, USA, 198–206.
  • Donald Green, Shang Ha, and John Bullock. 2010. Enough Already about “Black Box” Experiments: Studying Mediation Is More Difficult than Most Scholars Suppose. The ANNALS of the American Academy of Political and Social Science 628, 1 (2010), 200–208.
  • William Greene. 2011. Econometric Analysis (7th ed.). Pearson Education Inc. 1232 pages.
  • James Heckman and Rodrigo Pinto. 2015. Econometric Mediation Analyses: Identifying the Sources of Treatment Effects from Experimentally Estimated Production Technologies with Unmeasured and Mismeasured Inputs. Econometric Reviews 34 (2015), 6–31.
  • Julian Higgins and Simon Thompson. 2002. Quantifying Heterogeneity in a Meta-Analysis. Statistics in Medicine 21, 11 (2002), 1539–1558.
  • Kosuke Imai, Luke Keele, Dustin Tingley, and Teppei Yamamoto. 2014. Comment on Pearl: Practical Implications of Theoretical Results for Causal Mediation Analysis. Psychological Methods 19, 4 (2014), 482–487.
  • Kosuke Imai, Luke Keele, and Teppei Yamamoto. 2010. Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Statistical Science 25, 1 (2010), 51–71.
  • Guido Imbens. 2000. The Role of the Propensity Score in Estimating Dose-Response Functions. Biometrika 87, 3 (2000), 706–710.
  • Keisuke Hirano and Guido Imbens. 2004. The Propensity Score with Continuous Treatments. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives. Wiley, 73–84.
  • David MacKinnon, Amanda Fairchild, and Matthew Fritz. 2006. Mediation Analysis. Annual Review of Psychology 58, 1 (2006), 593–614.
  • Judea Pearl. 2001. Direct and Indirect Effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 411–420.
  • Judea Pearl. 2014. Interpretation and Identification of Causal Mediation. Psychological Methods 19, 4 (2014), 459–481.
  • Judea Pearl. 2014. Reply to Commentary by Imai, Keele, Tingley, and Yamamoto Concerning Causal Mediation Analysis. Psychological Methods 19, 4 (2014), 488–492.
  • Alexander Peysakhovich and Dean Eckles. 2018. Learning Causal Effects from Many Randomized Experiments Using Regularized Instrumental Variables. In The Web Conference 2018 (WWW 2018). ACM, New York, NY.
  • James Robins. 2003. Semantics of Causal DAG Models and the Identification of Direct and Indirect Effects. In Highly Structured Stochastic Systems. Oxford University Press, 70–82.
  • James Robins and Sander Greenland. 1992. Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology 3, 2 (1992), 143–155.
  • James Robins and Thomas Richardson. 2010. Alternative Graphical Causal Models and the Identification of Direct Effects. In Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures. Oxford University Press.
  • Donald Rubin. 2003. Basic Concepts of Statistical Inference for Causal Effects in Experiments and Observational Studies. (2003).
  • Derek Rucker, Kristopher Preacher, Zakary Tormala, and Richard Petty. 2011. Mediation Analysis in Social Psychology: Current Practices and New Recommendations. Social and Personality Psychology Compass 5, 6 (2011), 359–371.
  • Dylan Small. 2012. Mediation Analysis without Sequential Ignorability: Using Baseline Covariates Interacted with Random Assignment as Instrumental Variables. Journal of Statistical Research 46, 2 (2012), 91–103.
  • Michael Sobel. 2008. Identification of Causal Parameters in Randomized Studies with Mediating Variables. Journal of Educational and Behavioral Statistics 33, 2 (2008), 230–251.
  • Tom Stanley and Hristos Doucouliagos. 2012. Meta-Regression Analysis in Economics and Business. Routledge.
  • Jeffrey Wooldridge. 2010. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA. 1096 pages.
  • Xuan Yin and Liangjie Hong. 2019. The Identification and Estimation of Direct and Indirect Effects in A/B Tests Through Causal Mediation Analysis. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19). ACM, New York, NY, USA, 2989–2999.