Boba: Authoring and Visualizing Multiverse Analyses

Yang Liu
Alex Kale

IEEE Trans. Vis. Comput. Graph., pp. 1753-1763, 2021.

DOI: https://doi.org/10.1109/TVCG.2020.3028985
This paper presents Boba, an integrated domain-specific language and visual analysis system for authoring and interpreting multiverse analyses

Abstract

Multiverse analysis is an approach to data analysis in which all "reasonable" analytic decisions are evaluated in parallel and interpreted collectively, in order to foster robustness and transparency. However, specifying a multiverse is demanding because analysts must manage myriad variants from a cross-product of analytic decisions, and ...

Introduction
  • The last decade saw widespread failure to replicate findings in published literature across multiple scientific fields [2, 6, 35, 41].
  • As the replication crisis emerged [1], scholars began to re-examine how data analysis practices might lead to spurious findings.
  • An important contributing factor is the flexibility in making analytic decisions [16,17,48].
  • Flexibility in making decisions might inflate false-positive rates when researchers explore multiple alternatives and selectively report desired outcomes [48], a practice known as p-hacking [34].
  • A crowdsourced study [47] shows that well-
Highlights
  • The last decade saw widespread failure to replicate findings in published literature across multiple scientific fields [2, 6, 35, 41]
  • An important contributing factor is the flexibility in making analytic decisions [16,17,48]
  • We enable users to take into account sampling uncertainty and model fit by comparing observed data with model predictions [14]
  • This paper presents Boba, an integrated domain-specific language (DSL) and visual analysis system for authoring and interpreting multiverse analyses
  • With the DSL, users annotate their analysis script to insert local variations, from which the compiler synthesizes executable script variants corresponding to all compatible analysis paths
  • We provide a command line tool for compiling the DSL specification, running the generated scripts, merging the outputs, and invoking the visual analysis system
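
The authoring idea in the highlights above (annotate a script with local variations, then synthesize one executable script per analysis path) can be illustrated with a toy sketch. The `{{name}}` placeholder syntax, the `DECISIONS` dictionary, the template contents, and the `expand` function are all assumptions made for illustration; they are not taken from Boba's DSL or compiler, and the sketch ignores compatibility constraints between decisions.

```python
import itertools
import re

# Hypothetical annotated script: each {{name}} marks a decision point.
TEMPLATE = """df = df[df.score < {{cutoff}}]
model = fit(df, family="{{family}}")
"""

# Hypothetical options for each decision point (illustrative only).
DECISIONS = {
    "cutoff": ["2.5", "3.0"],
    "family": ["gaussian", "poisson"],
}


def expand(template, decisions):
    """Return a (choice, script) pair for every combination of options."""
    names = sorted(decisions)
    scripts = []
    for combo in itertools.product(*(decisions[n] for n in names)):
        choice = dict(zip(names, combo))
        # Replace each placeholder with the option chosen for this variant.
        script = re.sub(r"\{\{(\w+)\}\}", lambda m: choice[m.group(1)], template)
        scripts.append((choice, script))
    return scripts


variants = expand(TEMPLATE, DECISIONS)
```

Each element of `variants` pairs the chosen options with the fully substituted script text, so a driver could write every script to disk and run it separately.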
Methods
  • Design requirements: The authors' overarching goal is to make it easier for researchers to conduct multiverse analyses.
  • As noted in prior work [12, 30], specifying a multiverse is tedious.
  • This is primarily because a multiverse is composed of many forking paths, yet non-linear program structures are not well supported in conventional tools [45].
Conclusion
  • Through the process of designing, building, and using Boba, the authors gain insights into challenges that multiverse analysis poses for software designers and users.
  • One strategy is to represent analysis goals in higher-level abstractions, from which appropriate analysis methods might be synthesized [22].
  • Another is to guide less experienced users through key decision points and possible alternatives [30], starting from an initial script.
  • This paper presents Boba, an integrated DSL and visual analysis system for authoring and interpreting multiverse analyses.
  • Boba is available as open source software at https://github.com/uwdata/boba
Related Work
  • We draw on prior work on authoring and visualizing multiverse analyses, and approaches for authoring alternative programs and designs.

    2.1 Multiverse Analysis

    Analysts begin a multiverse analysis by identifying reasonable analytic decisions a priori [37, 49, 50]. Prior work defines reasonable decisions as those with firm theoretical and statistical support [49], and decisions can span the entire analysis pipeline from data collection and wrangling to statistical modeling and inference [30, 56]. While general guidelines such as a decision checklist [56] exist, defining which decisions are reasonable still involves a high degree of researcher subjectivity.

    The next step in multiverse analyses is to exhaust all compatible decision combinations and execute the analysis variants (we call a variant a universe). Despite the growing interest in performing multiverse analysis (e.g., [6,9,21,36,43]), few tools currently exist to aid authoring. Young and Holsteen [59] developed a STATA module that simplifies multimodel analysis into a single command, but it only works for simple variable substitution. Rdfanalysis [13], an R package, supports more complex alternative scenarios beyond simple value substitution, but the architecture assumes a linear sequential relationship between decisions. Our DSL similarly provides scaffolding for specifying a multiverse, but it has a simpler syntax, extends to other languages, and handles procedural dependencies between decisions.
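
    The enumeration step described above, exhausting all compatible decision combinations, can be sketched as a filtered cross-product. The decision names, their options, and the `compatible` rule below are hypothetical and chosen only to show one procedural dependency (a downstream option that makes sense only after a particular upstream choice); this is a minimal sketch and not how Boba or any of the cited tools implements it.

```python
import itertools

# Hypothetical decisions; names and options are illustrative only.
DECISIONS = {
    "outlier_rule": ["none", "2sd", "3sd"],
    "model": ["linear", "mixed"],
    "random_slope": ["no", "yes"],
}


def compatible(universe):
    """Assumed rule: a random slope only applies to the mixed model."""
    return universe["model"] == "mixed" or universe["random_slope"] == "no"


def enumerate_universes(decisions, is_compatible):
    """Yield every combination of options that passes the compatibility check."""
    names = list(decisions)
    for combo in itertools.product(*(decisions[n] for n in names)):
        universe = dict(zip(names, combo))
        if is_compatible(universe):
            yield universe


universes = list(enumerate_universes(DECISIONS, compatible))
```

    The full cross-product here has 12 combinations; the assumed rule rules out the 3 that pair the linear model with a random slope, leaving 9 universes to execute.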
Funding
  • This work was supported by NSF Award 1901386
References
  • [1] M. Baker. 1,500 scientists lift the lid on reproducibility. Nature, 533(7604):452–454, 2016. doi: 10.1038/533452a
  • [2] C. G. Begley and L. M. Ellis. Raise standards for preclinical cancer research. Nature, 483(7391):531–533, 2012. doi: 10.1038/483531a
  • [3] J. Bernard, M. Hutter, H. Reinemuth, H. Pfeifer, C. Bors, and J. Kohlhammer. Visual-interactive preprocessing of multivariate time series data. In Computer Graphics Forum, vol. 38, pp. 401–412. Wiley Online Library, 2019. doi: 10.1111/cgf.13698
  • [4] J. Bernard, T. Ruppert, O. Goroll, T. May, and J. Kohlhammer. Visual-interactive preprocessing of time series data. In Proceedings of SIGRAD 2012, number 81, pp. 39–48. Linkoping University Electronic Press, 2012.
  • [5] M. Booshehrian, T. Moller, R. M. Peterman, and T. Munzner. Vismon: Facilitating analysis of trade-offs, uncertainty, and sensitivity in fisheries management decision making. In Computer Graphics Forum, vol. 31, pp. 1235–1244. Wiley Online Library, 2012. doi: 10.1111/j.1467-8659.2012.03116.x
  • [6] R. Border, E. C. Johnson, L. M. Evans, A. Smolen, N. Berley, P. F. Sullivan, and M. C. Keller. No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression across multiple large samples. American Journal of Psychiatry, 176(5):376–387, 2019. doi: 10.1176/appi.ajp.2018.18070881
  • [7] N. Boukhelifa, M.-E. Perrin, S. Huron, and J. Eagan. How Data Workers Cope with Uncertainty: A Task Characterisation Study. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2017. doi: 10.1145/3025453.3025738
  • [8] J. Cesario, D. J. Johnson, and W. Terrill. Is there evidence of racial disparity in police use of deadly force? Analyses of officer-involved fatal shootings in 2015–2016. Social Psychological and Personality Science, 10(5):586–595, 2019. doi: 10.1177/1948550618775108
  • [9] M. Crede and L. A. Phillips. Revisiting the power pose effect: How robust are the results reported by Carney, Cuddy, and Yap (2010) to data analytic decisions? Social Psychological and Personality Science, 8(5):493–499, 2017. doi: 10.1177/1948550617714584
  • [10] E. Dejonckheere, E. K. Kalokerinos, B. Bastian, and P. Kuppens. Poor emotion regulation ability mediates the link between depressive symptoms and affective bipolarity. Cognition and Emotion, 33(5):1076–1083, 2019. doi: 10.1080/02699931.2018.1524747
  • [11] E. Dejonckheere, M. Mestdagh, M. Houben, Y. Erbas, M. Pe, P. Koval, A. Brose, B. Bastian, and P. Kuppens. The bipolarity of affect and depressive symptoms. Journal of Personality and Social Psychology, 114(2):323, 2018. doi: 10.1037/pspp0000186
  • [12] P. Dragicevic, Y. Jansen, A. Sarma, M. Kay, and F. Chevalier. Increasing the transparency of research papers with explorable multiverse analyses. In Proc. ACM Human Factors in Computing Systems, pp. 65:1–65:15, 2019. doi: 10.1145/3290605.3300295
  • [13] J. Gassen. A package to explore and document your degrees of freedom. https://github.com/joachim-gassen/rdfanalysis, 2019.
  • [14] A. Gelman. A Bayesian Formulation of Exploratory Data Analysis and Goodness-of-Fit Testing. International Statistical Review, 2003.
  • [15] A. Gelman, J. Hwang, and A. Vehtari. Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6):997–1016, 2014. doi: 10.1007/s11222-013-9416-2
  • [16] A. Gelman and E. Loken. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no fishing expedition or p-hacking and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 2013.
  • [17] A. Gelman and E. Loken. The statistical crisis in science. American Scientist, 102(6):460, 2014. doi: 10.1511/2014.111.460
  • [18] P. J. Guo. Software tools to facilitate research programming. PhD thesis, Stanford University, 2012.
  • [19] B. Hartmann, L. Yu, A. Allison, Y. Yang, and S. R. Klemmer. Design as exploration: Creating interface alternatives through parallel authoring and runtime tuning. In Proc. ACM User Interface Software and Technology, pp. 91–100, 2008. doi: 10.1145/1449715.1449732
  • [20] J. Hoffswell, W. Li, and Z. Liu. Techniques for flexible responsive visualization design. In Proc. ACM Human Factors in Computing Systems, pp. 1–1, 2020. doi: 10.1145/3313831.3376777
  • [21] Z. Jelveh, B. Kogut, and S. Naidu. Political language in economics. Columbia Business School Research Paper, (14-57), 2018. doi: 10.2139/ssrn.2535453
  • [22] E. Jun, M. Daum, J. Roesch, S. E. Chasins, E. D. Berger, R. Just, and K. Reinecke. Tea: A high-level language and runtime system for automating statistical analysis. CoRR, abs/1904.05387, 2019.
  • [23] K. Jung, S. Shavitt, M. Viswanathan, and J. M. Hilbe. Female hurricanes are deadlier than male hurricanes. Proceedings of the National Academy of Sciences, 111(24):8782–8787, 2014. doi: 10.1073/pnas.1402786111
  • [24] A. Kale, M. Kay, and J. Hullman. Decision-making under uncertainty in research synthesis: Designing for the garden of forking paths. In Proc. ACM Human Factors in Computing Systems, pp. 202:1–202:14, 2019. doi: 10.1145/3290605.3300432
  • [25] M. Kay, G. L. Nelson, and E. B. Hekler. Researcher-centered design of statistics: Why Bayesian statistics better fit the culture and incentives of HCI. In Proc. ACM Human Factors in Computing Systems, pp. 4521–4532, 2016. doi: 10.1145/2858036.2858465
  • [26] M. B. Kery, A. Horvath, and B. Myers. Variolite: Supporting exploratory programming by data scientists. In Proc. ACM Human Factors in Computing Systems, pp. 1265–1276, 2017. doi: 10.1145/3025453.3025626
  • [27] M. B. Kery and B. A. Myers. Interactions for untangling messy history in a computational notebook. In 2018 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 147–155, 2018. doi: 10.1109/VLHCC.2018.8506576
  • [28] Q. Li, M. R. Morris, A. Fourney, K. Larson, and K. Reinecke. The impact of web browser reader views on reading speed and user experience. In Proc. ACM Human Factors in Computing Systems, pp. 524:1–524:12, 2019. doi: 10.1145/3290605.3300754
  • [29] R. Lipshitz and O. Strauss. Coping with Uncertainty: A Naturalistic Decision-Making Analysis. Organizational Behavior and Human Decision Processes, 69(2):149–163, 1997. doi: 10.1006/obhd.1997.2679
  • [30] Y. Liu, T. Althoff, and J. Heer. Paths explored, paths omitted, paths obscured: Decision points & selective reporting in end-to-end data analysis. In Proc. ACM Human Factors in Computing Systems, pp. 406:1–406:14, 2020. doi: 10.1145/3313831.3376533
  • [31] A. Lunzer. Towards the subjunctive interface: General support for parameter exploration by overlaying alternative application states. In Late Breaking Hot Topics, IEEE Visualization, vol. 98, pp. 45–48, 1998.
  • [32] A. Lunzer. Choice and comparison where the user wants them: Subjunctive interfaces for computer-supported exploration. In Proceedings of INTERACT, pp. 474–482, 1999.
  • [33] S. McConnell. Code complete. Microsoft Press, 2 ed., 2004.
  • [34] L. D. Nelson, J. Simmons, and U. Simonsohn. Psychology’s renaissance. Annual Review of Psychology, 69(1):511–534, 2018. doi: 10.1146/annurev-psych-122216-011836
  • [35] Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251), 2015. doi: 10.1126/science.aac4716
  • [36] A. Orben and A. K. Przybylski. The association between adolescent wellbeing and digital technology use. Nature Human Behaviour, 3(2):173, 2019.
  • [37] C. J. Patel, B. Burford, and J. P. A. Ioannidis. Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. Journal of Clinical Epidemiology, 68(9):1046–1058, 2015. doi: 10.1016/j.jclinepi.2015.05.029
  • [38] C. Pettitt. Dagre. https://github.com/dagrejs/dagre, 2015.
  • [39] C. Phelan, J. Hullman, M. Kay, and P. Resnick. Some prior(s) experience necessary: Templates for getting started with Bayesian analysis. In Proc. ACM Human Factors in Computing Systems, pp. 479:1–479:12, 2019. doi: 10.1145/3290605.3300709
  • [40] G. J. Poarch, J. Vanhove, and R. Berthele. The effect of bidialectalism on executive function. International Journal of Bilingualism, 23(2):612–628, 2019. doi: 10.1177/1367006918763132
  • [41] F. Prinz, T. Schlange, and K. Asadullah. Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10(9):712, 2011. doi: 10.1038/nrd3439-c1
  • [42] J. R. Rae, S. Gulgoz, L. Durwood, M. DeMeules, R. Lowe, G. Lindquist, and K. R. Olson. Predicting early-childhood gender transitions. Psychological Science, 2019. doi: 10.1177/0956797619830649
  • [43] J. M. Rohrer, B. Egloff, and S. C. Schmukle. Probing birth-order effects on narrow traits using specification-curve analysis. Psychological Science, 28(12):1821–1832, 2017.
  • [44] M. Rubin. Do p values lose their meaning in exploratory analyses? It depends how you define the familywise error rate. Review of General Psychology, 21(3):269–275, 2017. doi: 10.1037/gpr0000123
  • [45] A. Rule, A. Tabard, and J. D. Hollan. Exploration and explanation in computational notebooks. In Proc. ACM Human Factors in Computing Systems, p. 32, 2018. doi: 10.1145/3173574.3173606
  • [46] M. Sedlmair, C. Heinzl, S. Bruckner, H. Piringer, and T. Moller. Visual parameter space analysis: A conceptual framework. IEEE Transactions on Visualization and Computer Graphics, 20(12):2161–2170, 2014. doi: 10.1109/TVCG.2014.2346321
  • [48] J. P. Simmons, L. D. Nelson, and U. Simonsohn. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11):1359–1366, 2011. doi: 10.1177/0956797611417632
  • [49] U. Simonsohn, J. P. Simmons, and L. D. Nelson. Specification curve: Descriptive and inferential statistics on all reasonable specifications. Available at SSRN 2694998, 2015. doi: 10.2139/ssrn.2694998
  • [50] S. Steegen, F. Tuerlinckx, A. Gelman, and W. Vanpaemel. Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5):702–712, 2016. doi: 10.1177/1745691616658637
  • [51] K. Sugiyama, S. Tagawa, and M. Toda. Methods for visual understanding of hierarchical system structures. IEEE Transactions on Systems, Man, and Cybernetics, 11(2):109–125, 1981. doi: 10.1109/TSMC.1981.4308636
  • [52] M. Terry, E. D. Mynatt, K. Nakakoji, and Y. Yamamoto. Variation in element and action: Supporting simultaneous development of alternative solutions. In Proc. ACM Human Factors in Computing Systems, pp. 711–718, 2004. doi: 10.1145/985692.985782
  • [53] E. R. Tufte, N. H. Goeler, and R. Benson. Envisioning information. Graphics Press, 1990.
  • [54] W. Vanpaemel, S. Steegen, F. Tuerlinckx, and A. Gelman. Multiverse analysis. https://osf.io/zj68b/, 2018.
  • [55] A. Vehtari, A. Gelman, and J. Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5):1413–1432, 2017. doi: 10.1007/s11222-016-9696-4
  • [56] J. M. Wicherts, C. L. S. Veldkamp, H. E. M. Augusteijn, M. Bakker, R. C. M. van Aert, and M. A. L. M. van Assen. Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7:1832, 2016. doi: 10.3389/fpsyg.2016.01832
  • [57] L. Wilkinson. Dot plots. The American Statistician, 53(3):276–281, 1999.
  • [58] Y. Yao, A. Vehtari, D. Simpson, and A. Gelman. Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3):917–1007, 2018. doi: 10.1214/17-BA1091
  • [59] C. Young and K. Holsteen. Model uncertainty and robustness: A computational framework for multimodel analysis. Sociological Methods & Research, 46(1):3–40, 2017. doi: 10.1177/0049124115610347
  • [60] E. Zgraggen, Z. Zhao, R. Zeleznik, and T. Kraska. Investigating the effect of the multiple comparisons problem in visual analysis. In Proc. ACM Human Factors in Computing Systems, pp. 479:1–479:12, 2018. doi: 10.1145/3173574.3174053