nanoHUB user behavior: moving from retrospective statistics to actionable behavior analysis

Abstract
nanoHUB annually serves 17,000+ registered users who run over 1 million simulations. In the past, we have used data analytics to demonstrate that nanoHUB can be a powerful scientific knowledge-sharing platform. We used retrospective data analytics to show how simulation tools were used in structured education and in novel research. With such retrospective analytics, we have made strategic decisions in terms of tool and content development and justified continued nanoHUB investments by the US National Science Foundation (NSF). As we migrate towards a sustainable nanoHUB, we must embrace processes similar to those pursued by platforms such as Uber or AirBnB: we need to create actionable data analytics that can rapidly support the user experience and help grow the supply side of the two-sided market platform; we need to improve the experience of providers as well as end-users. This paper describes some aspects of how we pursue user behavior analysis inside the virtual worlds of nanotechnology simulation tools. From such user behavior we plan to derive actionable analytics that influence user behavior as users interact with nanoHUB.

Keywords— nanoHUB; HUBzero; science gateways; user behavior; analytics; cluster; meander; education

I. INTRODUCTION AND BACKGROUND

nanoHUB is a scientific knowledge platform that has enabled over 3,500 researchers and educators to share 500+ research simulation tools and models as well as 6,000+ lectures and tutorials globally through a novel cyberinfrastructure. nanoHUB annually serves 17,000+ registered users with over 1 million simulations in an end-to-end, user-oriented scientific computing cloud. Over 1.5 million visitors access the openly available web content items annually. These might be considered impressive summative numbers, but they do not address whether the site has any impact or what these users are actually doing. Understanding these numbers requires some background on the original intentions and cyberinfrastructure developments around nanoHUB. Fundamental issues raised by peer reviewers concerned the ability of a university project to provide a stable, national-level infrastructure, to support the offered services, and to provide compute cycles for an ever-growing user base.

From the very beginning in 1996 [1], the predecessor to nanoHUB, called the Purdue Network Computing Hub (PUNCH), was created to enable researchers to share their code, without rewrites, through novel web interfaces with end-users in education and research. PUNCH was so novel that even the web server had to be created within the team. By 2004 the standard web-form interfaces were antiquated and did not inspire the interactive exploration of simulation results through the rapid "What if?" questions that users might have. Users had to download their simulation data to manipulate them into a form in which they could truly be used; nanoHUB was not an end-to-end usage platform. It became clear that the system had to be revamped to enable the hosting of user-friendly, engineering-use-inspired interactive applications. Such interactive sessions had to be hosted in reliable, scalable middleware running in production mode, not as a research-paper demonstration. 3D dataset exploration had to be supported on remote, dedicated GPUs that deliver results to end users.
RAPPTURE, the Rapid APPlication infrastrucTURE toolkit [2], enabled researchers who typically had no graphical user interfaces for their codes to describe the inputs and outputs of those codes in XML and to generate a GUI from that description. New middleware [3] enabled 1,000+ users to be hosted simultaneously on a moderate cluster of about 20 compute nodes. A novel remote GPU-based visualization system [4] supported hundreds of simultaneous sessions. nanoHUB established the first community accounts on TeraGrid and OSG, which would execute heavy-lifting nanoHUB simulation jobs completely transparently on behalf of users who had no accounts on these grid platforms [5]. We developed processes [6] to continually test the reliability of these remote grid services to ensure smooth user services. For application support we developed policies and operational infrastructure that enabled tool contributors to support and improve their tools through question-and-answer forums and through wishlists. As this novel infrastructure emerged in 2005, we observed rapid growth in the simulation user base from the historical 500 annual users to over 10,000 within a few years.

As questions of technical feasibility were addressed, new questions as to actual and potential impact emerged. Early on, our peer reviewers raised fundamental questions about whether such research-based simulation tools could be used by other researchers at all, and whether these tools could be used in education without specific customizations. The nanoHUB team developed analytics that documented nanoHUB use in research through reference and citation searches in the scientific literature. Today we can document over 2,200 papers that cite nanoHUB, and we keep track of the resources and tools they used in order to provide attribution to the published tools. When we showed the first 200 formal citations, our peers remained unconvinced that this could be good research. We then began to track secondary citations, which today sum to over 30,000, resulting in an h-index of 82. Our peers held a similarly strong opinion that research tools could not be used in education. We therefore developed novel clustering algorithms [7] that documented systematic use of simulation tools in formal education settings. Today we can show that over 35,000 students in over 1,800 classes at over 180 institutions have used nanoHUB in formalized education settings. We could also measure the time-to-adoption between tool publication and first-time systematic use in a classroom; the median time was determined to be less than six months.

From the analysis of research use and education use we can begin to qualify the attributes of the underlying simulation tools. We found significant use in both education and research for many of the nanoHUB tools. These research and education impact studies are documented in detail in Nature Nanotechnology [8]. We used retrospective data analytics to show how simulation tools were used in structured education and in novel research. We showed that the transition from research-tool publication to adoption in the classroom typically happens in less than six months, and we demonstrated through longitudinal data how research tools migrate into education. With these retrospective analytics, we have made strategic decisions in terms of tool and content development and justified continued NSF investments into nanoHUB.
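RAPPTURE's workflow is that a tool author declares inputs and outputs in an XML description, RAPPTURE renders a GUI from it, and the wrapped code exchanges values with that XML tree at run time. The Python sketch below illustrates this pattern under our own assumptions: the parameter path input.number(temperature).current, the output curve name, and the placeholder "physics" are hypothetical and are not taken from any actual nanoHUB tool.

```python
# Minimal sketch of a RAPPTURE-wrapped simulation driver using the
# Rappture Python bindings. XML paths and values are hypothetical.
import sys
import Rappture

io = Rappture.library(sys.argv[1])  # driver file handed over by the generated GUI

# Read a user-chosen input declared in the tool's XML description.
temp_str = io.get('input.number(temperature).current')  # e.g. "300K"
temperature = float(temp_str.rstrip('K'))

# ... the real tool would invoke its simulation code here ...
current = 1.0e-6 * (temperature / 300.0)  # placeholder result, not real physics

# Write an output curve back into the XML tree for the GUI to render.
io.put('output.curve(iv).about.label', 'I-V characteristic')
io.put('output.curve(iv).component.xy', '0.0 %g\n' % current, append=True)

Rappture.result(io)  # signal completion; the GUI picks up the outputs
```

Because every RAPPTURE tool reads and writes through the same structured input/output tree, simulation runs across hundreds of different tools can be compared with a single analysis pipeline, which is what makes the behavior studies described below feasible.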
As we migrate towards a sustainable nanoHUB, we must embrace processes similar to those pursued by platforms such as Uber or AirBnB: we need to create actionable data analytics that can rapidly support the user experience and help grow the supply side of the two-sided market platform; we need to improve the experience of providers as well as end-users.

II. RESEARCH QUESTIONS

Beyond raw numbers of users and simulations, we have over the years continued to ask ourselves: How do users behave in the virtual world of a simulation tool? More specifically:

- How do they "travel" through the design/exploration world?
- How many individual simulations do they run within one session?
- How many parameters do users change?
- How differently do researchers, classroom users, and self-study users behave?
- How differently do different classes behave? Does different class instruction material or scaffolding make a difference? Can we provide feedback to instructors on their classrooms?
- Given certain usage patterns inside a tool, can we improve the tools and provide feedback to the developers?

A variety of requirements must be met to address some of these questions in a scalable infrastructure, such as:

- storage and availability of individual simulation runs within user sessions;
- a data description language that is shared across different tools;
- a large set of simulation runs and participants;
- other user data, such as classroom participation, researcher identification, or geolocation.

In the following sections we describe some of our first results that begin to address some of these questions. For the initial study presented here we focus on user behavior in the PN Junction Lab [9], which is consistently among the top 10 nanoHUB tools [10] in any given year. Despite its codename pntoy, the tool is powered by an industrial-strength semiconductor device modeling tool called PADRE [11]. Instead of learning the complex PADRE input language, which involves gridding, geometry, material, and environmental specifications, users can easily ask "What if?" questions in a toy-like fashion.

III. SEARCHERS AND WILDCATTERS

RAPPTURE provides a rather generic description of simulation tool inputs and outputs. Over 90% of the 500+ nanoHUB simulation tools utilize RAPPTURE as their data description language. With the existing simulation logs we can now begin to study user behavior inside simulation tools. Each simulation tool typically exposes 10 to 50 parameters to the users. Most of these parameters are free-form numbers, such as length, doping, effective mass, dielectric constant, or temperature, each with its specific units, while there is also a significant set of discrete options, such as model or geometry choices. Assuming that each parameter has just 10 reasonable choices, each tool spans a configurational design space of at least 10^10 to 10^50 points. The dimensionality of these tools is clearly too large to be understood intuitively. We developed a visualization methodology [12] to flatten an N-dimensional space into two dimensions; a generic sketch of this flattening idea follows below. Figure 1 shows the conceptual mapping and shows two significantly different ...
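The actual flattening construction is given in [12]; as a generic stand-in illustration of projecting N-dimensional simulation inputs onto a plane, the sketch below uses principal component analysis (PCA). PCA is our substitute here, not the method of [12], and the parameter names and values are hypothetical.

```python
# Generic sketch: project N-dimensional simulation-input vectors to 2-D.
# PCA is a stand-in illustration, not the flattening method of [12];
# parameter names (length_nm, doping_cm3, temperature_K) are hypothetical.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# One row per simulation run: [length_nm, doping_cm3, temperature_K].
runs = np.array([
    [100.0, 1e16, 300.0],
    [100.0, 1e17, 300.0],
    [200.0, 1e16, 350.0],
    [400.0, 1e18, 300.0],
])

# Log-scale and standardize so parameters with wildly different units
# and magnitudes (nm vs. cm^-3) do not dominate the projection.
X = StandardScaler().fit_transform(np.log10(np.abs(runs) + 1e-30))

# Flatten to 2-D: each run becomes a point, and a user's sequence of
# runs within a session becomes a trajectory through the design space.
xy = PCA(n_components=2).fit_transform(X)
for run, (x, y) in zip(runs, xy):
    print(run, '->', (round(float(x), 3), round(float(y), 3)))
```

In such a projection each simulation run is a point and a user session is a trajectory, so systematic parameter sweeps and scattered one-off exploration become visually distinguishable, which is the kind of behavioral signature this section sets out to study.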