Boba: Authoring and Visualizing Multiverse Analyses
IEEE Trans. Vis. Comput. Graph., pp. 1753-1763, 2021.
Indexed in: WOS, EI
Abstract:
Multiverse analysis is an approach to data analysis in which all "reasonable" analytic decisions are evaluated in parallel and interpreted collectively, in order to foster robustness and transparency. However, specifying a multiverse is demanding because analysts must manage myriad variants from a cross-product of analytic decisions, and ...
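As a worked illustration of how quickly that cross-product grows (the numbers are hypothetical, not taken from the paper): with k decisions, where decision i offers n_i options, the number of universes before any incompatible combinations are removed is the product of the option counts.

```latex
|\mathcal{U}| = \prod_{i=1}^{k} n_i, \qquad k = 5,\ n_i = 3 \ \Rightarrow\ |\mathcal{U}| = 3^5 = 243
```

Compatibility constraints can only remove combinations, so this product is an upper bound on the number of universes.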
Introduction
- The last decade saw widespread failure to replicate findings in published literature across multiple scientific fields [2, 6, 35, 41].
- As the replication crisis emerged [1], scholars began to re-examine how data analysis practices might lead to spurious findings.
- An important contributing factor is the flexibility in making analytic decisions [16,17,48].
- Flexibility in making decisions might inflate false-positive rates when researchers explore multiple alternatives and selectively report desired outcomes [48], a practice known as p-hacking [34].
- A crowdsourced study [47] shows that well-
Highlights
- The last decade saw widespread failure to replicate findings in published literature across multiple scientific fields [2, 6, 35, 41].
- An important contributing factor is the flexibility in making analytic decisions [16,17,48].
- We enable users to take into account sampling uncertainty and model fit by comparing observed data with model predictions [14].
- This paper presents Boba, an integrated domain-specific language (DSL) and visual analysis system for authoring and interpreting multiverse analyses.
- With the DSL, users annotate their analysis script to insert local variations, from which the compiler synthesizes executable script variants corresponding to all compatible analysis paths.
- We provide a command line tool for compiling the DSL specification, running the generated scripts, merging the outputs, and invoking the visual analysis system.
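The annotate-then-compile workflow described in these highlights can be sketched as template substitution over the cross-product of options. This is a minimal sketch under stated assumptions: the `{{name}}` placeholder syntax, the `compile_variants` helper, and the decision names are hypothetical and are not Boba's actual DSL or compiler, and the calls inside the template (`load`, `model`) are placeholder text that is never executed.

```python
"""Sketch: compile one annotated template into one script per universe.

Assumption: placeholders are written as {{name}}. This syntax and every name
below are hypothetical illustrations, not Boba's actual DSL or compiler.
"""
import itertools
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def compile_variants(
    template: str, decisions: dict[str, list[str]]
) -> list[tuple[dict[str, str], str]]:
    """Return (choices, script) pairs, one per combination of options."""
    names = sorted(decisions)
    # Fail early if the template uses a placeholder with no declared options.
    missing = set(PLACEHOLDER.findall(template)) - set(names)
    if missing:
        raise KeyError(f"undeclared decisions: {sorted(missing)}")
    variants = []
    for combo in itertools.product(*(decisions[n] for n in names)):
        choices = dict(zip(names, combo))
        # Replace each placeholder with the option chosen for this universe.
        script = PLACEHOLDER.sub(lambda m: choices[m.group(1)], template)
        variants.append((choices, script))
    return variants


# Hypothetical analysis script; its contents are only text to be specialized.
TEMPLATE = """\
df = load("data.csv")
df = df[df.rt < {{cutoff}}]
fit = model("{{formula}}", df)
"""

DECISIONS = {
    "cutoff": ["2000", "3000"],
    "formula": ["y ~ x", "y ~ x + age", "y ~ x * age"],
}

if __name__ == "__main__":
    for choices, script in compile_variants(TEMPLATE, DECISIONS):
        print(choices)
```

Two decisions with 2 and 3 options yield 6 scripts. Each script is a plain string that could be written to its own file and run independently; a full compiler would also need to handle multi-line code blocks, dependencies between decisions, and naming of outputs.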
Methods
DESIGN REQUIREMENTS
- The authors' overarching goal is to make it easier for researchers to conduct multiverse analyses.
- As noted in prior work [12, 30], specifying a multiverse is tedious.
- This is primarily because a multiverse is composed of many forking paths, yet non-linear program structures are not well supported in conventional tools [45].
Conclusion
- Through the process of designing, building, and using Boba, the authors gained insights into the challenges that multiverse analysis poses for software designers and users.
- One strategy is to represent analysis goals in higher-level abstractions, from which appropriate analysis methods might be synthesized [22].
- Another is to guide less experienced users through key decision points and possible alternatives [30], starting from an initial script.
- This paper presents Boba, an integrated DSL and visual analysis system for authoring and interpreting multiverse analyses.
- Boba is available as open source software at https://github.com/uwdata/boba
Related Work
- We draw on prior work on authoring and visualizing multiverse analyses, and approaches for authoring alternative programs and designs.
2.1 Multiverse Analysis
Analysts begin a multiverse analysis by identifying reasonable analytic decisions a priori [37, 49, 50]. Prior work defines reasonable decisions as those with firm theoretical and statistical support [49], and decisions can span the entire analysis pipeline, from data collection and wrangling to statistical modeling and inference [30, 56]. While general guidelines such as a decision checklist [56] exist, defining which decisions are reasonable still involves a high degree of researcher subjectivity.
The next step in a multiverse analysis is to enumerate all compatible decision combinations and execute the resulting analysis variants (we call a variant a universe). Despite the growing interest in performing multiverse analysis (e.g., [6,9,21,36,43]), few tools currently exist to aid authoring. Young and Holsteen [59] developed a Stata module that simplifies multimodel analysis into a single command, but it only works for simple variable substitution. The R package rdfanalysis [13] supports more complex alternative scenarios beyond simple value substitution, but its architecture assumes a linear sequential relationship between decisions. Our DSL similarly provides scaffolding for specifying a multiverse, but it has a simpler syntax, extends to other languages, and handles procedural dependencies between decisions.
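The difference between a purely sequential list of decisions and one with procedural dependencies can be illustrated by enumerating the cross-product and then pruning it. This sketch is an assumption-laden illustration, not Boba's algorithm: a hypothetical rule forbids a Poisson model on a log-transformed outcome, and a hypothetical `offset` decision exists only on the Poisson branch, so it is dropped elsewhere and universes that become identical are collapsed.

```python
"""Sketch: enumerate compatible universes when decisions depend on each other.

All decision names and rules here are hypothetical examples; this is not
Boba's implementation.
"""
import itertools

Universe = dict[str, str]

DECISIONS: dict[str, list[str]] = {
    "transform": ["none", "log"],
    "model": ["ols", "poisson"],
    "offset": ["none", "exposure"],  # only meaningful for the Poisson model
}


def is_compatible(u: Universe) -> bool:
    """Hypothetical constraint: no Poisson model on a log-transformed outcome."""
    return not (u["model"] == "poisson" and u["transform"] == "log")


def prune_irrelevant(u: Universe) -> Universe:
    """Procedural dependency: 'offset' only exists on the Poisson branch."""
    if u["model"] != "poisson":
        return {k: v for k, v in u.items() if k != "offset"}
    return u


def enumerate_universes(decisions: dict[str, list[str]]) -> list[Universe]:
    names = list(decisions)
    seen: set[tuple[tuple[str, str], ...]] = set()
    universes: list[Universe] = []
    for combo in itertools.product(*(decisions[n] for n in names)):
        u = dict(zip(names, combo))
        if not is_compatible(u):
            continue  # drop incompatible combinations
        u = prune_irrelevant(u)
        key = tuple(sorted(u.items()))
        if key not in seen:  # collapse universes that became identical
            seen.add(key)
            universes.append(u)
    return universes


if __name__ == "__main__":
    for u in enumerate_universes(DECISIONS):
        print(u)
```

Here the full product has 8 combinations; removing the 2 incompatible ones and collapsing the OLS universes that differ only in the irrelevant `offset` choice leaves 4. A strictly linear pipeline would instead carry `offset` through every branch and report duplicate universes.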
Funding
- This work was supported by NSF Award 1901386
References
- [1] M. Baker. 1,500 scientists lift the lid on reproducibility. Nature, 533(7604):452–454, 2016. doi: 10.1038/533452a
- [2] C. G. Begley and L. M. Ellis. Raise standards for preclinical cancer research. Nature, 483(7391):531–533, 2012. doi: 10.1038/483531a
- [3] J. Bernard, M. Hutter, H. Reinemuth, H. Pfeifer, C. Bors, and J. Kohlhammer. Visual-interactive preprocessing of multivariate time series data. In Computer Graphics Forum, vol. 38, pp. 401–412. Wiley Online Library, 2019. doi: 10.1111/cgf.13698
- [4] J. Bernard, T. Ruppert, O. Goroll, T. May, and J. Kohlhammer. Visual-interactive preprocessing of time series data. In Proceedings of SIGRAD 2012, number 81, pp. 39–48. Linköping University Electronic Press, 2012.
- [5] M. Booshehrian, T. Möller, R. M. Peterman, and T. Munzner. Vismon: Facilitating analysis of trade-offs, uncertainty, and sensitivity in fisheries management decision making. In Computer Graphics Forum, vol. 31, pp. 1235–1244. Wiley Online Library, 2012. doi: 10.1111/j.1467-8659.2012.03116.x
- [6] R. Border, E. C. Johnson, L. M. Evans, A. Smolen, N. Berley, P. F. Sullivan, and M. C. Keller. No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression across multiple large samples. American Journal of Psychiatry, 176(5):376–387, 2019. doi: 10.1176/appi.ajp.2018.18070881
- [7] N. Boukhelifa, M.-E. Perrin, S. Huron, and J. Eagan. How Data Workers Cope with Uncertainty: A Task Characterisation Study. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2017. doi: 10.1145/3025453.3025738
- [8] J. Cesario, D. J. Johnson, and W. Terrill. Is there evidence of racial disparity in police use of deadly force? Analyses of officer-involved fatal shootings in 2015–2016. Social Psychological and Personality Science, 10(5):586–595, 2019. doi: 10.1177/1948550618775108
- [9] M. Credé and L. A. Phillips. Revisiting the power pose effect: How robust are the results reported by Carney, Cuddy, and Yap (2010) to data analytic decisions? Social Psychological and Personality Science, 8(5):493–499, 2017. doi: 10.1177/1948550617714584
- [10] E. Dejonckheere, E. K. Kalokerinos, B. Bastian, and P. Kuppens. Poor emotion regulation ability mediates the link between depressive symptoms and affective bipolarity. Cognition and Emotion, 33(5):1076–1083, 2019. doi: 10.1080/02699931.2018.1524747
- [11] E. Dejonckheere, M. Mestdagh, M. Houben, Y. Erbas, M. Pe, P. Koval, A. Brose, B. Bastian, and P. Kuppens. The bipolarity of affect and depressive symptoms. Journal of Personality and Social Psychology, 114(2):323, 2018. doi: 10.1037/pspp0000186
- [12] P. Dragicevic, Y. Jansen, A. Sarma, M. Kay, and F. Chevalier. Increasing the transparency of research papers with explorable multiverse analyses. In Proc. ACM Human Factors in Computing Systems, pp. 65:1–65:15, 2019. doi: 10.1145/3290605.3300295
- [13] J. Gassen. A package to explore and document your degrees of freedom. https://github.com/joachim-gassen/rdfanalysis, 2019.
- [14] A. Gelman. A Bayesian Formulation of Exploratory Data Analysis and Goodness-of-Fit Testing. International Statistical Review, 2003.
- [15] A. Gelman, J. Hwang, and A. Vehtari. Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6):997–1016, 2014. doi: 10.1007/s11222-013-9416-2
- [16] A. Gelman and E. Loken. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no fishing expedition or p-hacking and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 2013.
- [17] A. Gelman and E. Loken. The statistical crisis in science. American Scientist, 102(6):460, 2014. doi: 10.1511/2014.111.460
- [18] P. J. Guo. Software tools to facilitate research programming. PhD thesis, Stanford University, 2012.
- [19] B. Hartmann, L. Yu, A. Allison, Y. Yang, and S. R. Klemmer. Design as exploration: Creating interface alternatives through parallel authoring and runtime tuning. In Proc. ACM User Interface Software and Technology, pp. 91–100, 2008. doi: 10.1145/1449715.1449732
- [20] J. Hoffswell, W. Li, and Z. Liu. Techniques for flexible responsive visualization design. In Proc. ACM Human Factors in Computing Systems, pp. 1–1, 2020. doi: 10.1145/3313831.3376777
- [21] Z. Jelveh, B. Kogut, and S. Naidu. Political language in economics. Columbia Business School Research Paper, (14-57), 2018. doi: 10.2139/ssrn.2535453
- [22] E. Jun, M. Daum, J. Roesch, S. E. Chasins, E. D. Berger, R. Just, and K. Reinecke. Tea: A high-level language and runtime system for automating statistical analysis. CoRR, abs/1904.05387, 2019.
- [23] K. Jung, S. Shavitt, M. Viswanathan, and J. M. Hilbe. Female hurricanes are deadlier than male hurricanes. Proceedings of the National Academy of Sciences, 111(24):8782–8787, 2014. doi: 10.1073/pnas.1402786111
- [24] A. Kale, M. Kay, and J. Hullman. Decision-making under uncertainty in research synthesis: Designing for the garden of forking paths. In Proc. ACM Human Factors in Computing Systems, pp. 202:1–202:14, 2019. doi: 10.1145/3290605.3300432
- [25] M. Kay, G. L. Nelson, and E. B. Hekler. Researcher-centered design of statistics: Why Bayesian statistics better fit the culture and incentives of HCI. In Proc. ACM Human Factors in Computing Systems, pp. 4521–4532, 2016. doi: 10.1145/2858036.2858465
- [26] M. B. Kery, A. Horvath, and B. Myers. Variolite: Supporting exploratory programming by data scientists. In Proc. ACM Human Factors in Computing Systems, pp. 1265–1276, 2017. doi: 10.1145/3025453.3025626
- [27] M. B. Kery and B. A. Myers. Interactions for untangling messy history in a computational notebook. In 2018 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 147–155, 2018. doi: 10.1109/VLHCC.2018.8506576
- [28] Q. Li, M. R. Morris, A. Fourney, K. Larson, and K. Reinecke. The impact of web browser reader views on reading speed and user experience. In Proc. ACM Human Factors in Computing Systems, pp. 524:1–524:12, 2019. doi: 10.1145/3290605.3300754
- [29] R. Lipshitz and O. Strauss. Coping with Uncertainty: A Naturalistic Decision-Making Analysis. Organizational Behavior and Human Decision Processes, 69(2):149–163, 1997. doi: 10.1006/obhd.1997.2679
- [30] Y. Liu, T. Althoff, and J. Heer. Paths explored, paths omitted, paths obscured: Decision points & selective reporting in end-to-end data analysis. In Proc. ACM Human Factors in Computing Systems, pp. 406:1–406:14, 2020. doi: 10.1145/3313831.3376533
- [31] A. Lunzer. Towards the subjunctive interface: General support for parameter exploration by overlaying alternative application states. In Late Breaking Hot Topics, IEEE Visualization, vol. 98, pp. 45–48, 1998.
- [32] A. Lunzer. Choice and comparison where the user wants them: Subjunctive interfaces for computer-supported exploration. In Proceedings of INTERACT, pp. 474–482, 1999.
- [33] S. McConnell. Code complete. Microsoft Press, 2 ed., 2004.
- [34] L. D. Nelson, J. Simmons, and U. Simonsohn. Psychology's renaissance. Annual Review of Psychology, 69(1):511–534, 2018. doi: 10.1146/annurev-psych-122216-011836
- [35] Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251), 2015. doi: 10.1126/science.aac4716
- [36] A. Orben and A. K. Przybylski. The association between adolescent wellbeing and digital technology use. Nature Human Behaviour, 3(2):173, 2019.
- [37] C. J. Patel, B. Burford, and J. P. A. Ioannidis. Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. Journal of Clinical Epidemiology, 68(9):1046–1058, 2015. doi: 10.1016/j.jclinepi.2015.05.029
- [38] C. Pettitt. Dagre. https://github.com/dagrejs/dagre, 2015.
- [39] C. Phelan, J. Hullman, M. Kay, and P. Resnick. Some prior(s) experience necessary: Templates for getting started with Bayesian analysis. In Proc. ACM Human Factors in Computing Systems, pp. 479:1–479:12, 2019. doi: 10.1145/3290605.3300709
- [40] G. J. Poarch, J. Vanhove, and R. Berthele. The effect of bidialectalism on executive function. International Journal of Bilingualism, 23(2):612–628, 2019. doi: 10.1177/1367006918763132
- [41] F. Prinz, T. Schlange, and K. Asadullah. Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10(9):712, 2011. doi: 10.1038/nrd3439-c1
- [42] J. R. Rae, S. Gulgoz, L. Durwood, M. DeMeules, R. Lowe, G. Lindquist, and K. R. Olson. Predicting early-childhood gender transitions. Psychological Science, 2019. doi: 10.1177/0956797619830649
- [43] J. M. Rohrer, B. Egloff, and S. C. Schmukle. Probing birth-order effects on narrow traits using specification-curve analysis. Psychological Science, 28(12):1821–1832, 2017.
- [44] M. Rubin. Do p values lose their meaning in exploratory analyses? It depends how you define the familywise error rate. Review of General Psychology, 21(3):269–275, 2017. doi: 10.1037/gpr0000123
- [45] A. Rule, A. Tabard, and J. D. Hollan. Exploration and explanation in computational notebooks. In Proc. ACM Human Factors in Computing Systems, p. 32, 2018. doi: 10.1145/3173574.3173606
- [46] M. Sedlmair, C. Heinzl, S. Bruckner, H. Piringer, and T. Möller. Visual parameter space analysis: A conceptual framework. IEEE Transactions on Visualization and Computer Graphics, 20(12):2161–2170, 2014. doi: 10.1109/TVCG.2014.2346321
- [48] J. P. Simmons, L. D. Nelson, and U. Simonsohn. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11):1359–1366, 2011. doi: 10.1177/0956797611417632
- [49] U. Simonsohn, J. P. Simmons, and L. D. Nelson. Specification curve: Descriptive and inferential statistics on all reasonable specifications. Available at SSRN 2694998, 2015. doi: 10.2139/ssrn.2694998
- [50] S. Steegen, F. Tuerlinckx, A. Gelman, and W. Vanpaemel. Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5):702–712, 2016. doi: 10.1177/1745691616658637
- [51] K. Sugiyama, S. Tagawa, and M. Toda. Methods for visual understanding of hierarchical system structures. IEEE Transactions on Systems, Man, and Cybernetics, 11(2):109–125, 1981. doi: 10.1109/TSMC.1981.4308636
- [52] M. Terry, E. D. Mynatt, K. Nakakoji, and Y. Yamamoto. Variation in element and action: Supporting simultaneous development of alternative solutions. In Proc. ACM Human Factors in Computing Systems, pp. 711–718, 2004. doi: 10.1145/985692.985782
- [53] E. R. Tufte, N. H. Goeler, and R. Benson. Envisioning information. Graphics Press, 1990.
- [54] W. Vanpaemel, S. Steegen, F. Tuerlinckx, and A. Gelman. Multiverse analysis. https://osf.io/zj68b/, 2018.
- [55] A. Vehtari, A. Gelman, and J. Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and Estimating out-of-sample pointwise predictive accuracy using posterior simulations. Statistics and Computing, 27(5):1413–1432, 2017. doi: 10.1007/s11222-016-9696-4
- [56] J. M. Wicherts, C. L. S. Veldkamp, H. E. M. Augusteijn, M. Bakker, R. C. M. van Aert, and M. A. L. M. van Assen. Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7:1832, 2016. doi: 10.3389/fpsyg.2016.01832
- [57] L. Wilkinson. Dot plots. The American Statistician, 53(3):276–281, 1999.
- [58] Y. Yao, A. Vehtari, D. Simpson, and A. Gelman. Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3):917–1007, 2018. doi: 10.1214/17-BA1091
- [59] C. Young and K. Holsteen. Model uncertainty and robustness: A computational framework for multimodel analysis. Sociological Methods & Research, 46(1):3–40, 2017. doi: 10.1177/0049124115610347
- [60] E. Zgraggen, Z. Zhao, R. Zeleznik, and T. Kraska. Investigating the effect of the multiple comparisons problem in visual analysis. In Proc. ACM Human Factors in Computing Systems, pp. 479:1–479:12, 2018. doi: 10.1145/3173574.3174053