Joint ALMA Observatory data science exploration on the cloud

Sergio Pavez, Ignacio Toledo, Tomas Staig, Nicolás Ovando, Gastón Vélez, Jorge Ibsen, Jorge Sierra, Agustin Grangetto

Observatory Operations: Strategies, Processes, and Systems IX (2022)

Abstract
The Joint ALMA Observatory (JAO) decided some years ago to become a data-centric operational facility, basing its operational decision-making on evidence and committing to several efforts to adopt data science practices in its daily operations. Key non-profit collaborations allowed ALMA to work with Dataiku, empowering us to design projects that explore high data volumes and to prepare solutions that enable informed operational decisions. To increase the capabilities of the data science platform, JAO invested in in-house infrastructure, providing a Hadoop ecosystem that allowed big datasets to be processed in reasonable time. Provisioning such an ecosystem is laborious and expensive in terms of system administration effort, which highlighted the need to explore alternatives. JAO sought to collaborate with cloud providers to investigate these alternatives and decided to experiment with Amazon Web Services (AWS). Key elements in this decision were the flexibility provided and a practical, hands-on exploratory approach close to JAO's vision. The relationship, formalized through a Memorandum of Understanding, enabled the development of a proof of concept (PoC) aiming to replicate the existing system on the cloud. Although a PoC may not seem an ambitious goal, designing an architecture from the broad set of technologies offered by AWS that works seamlessly with Dataiku was a non-trivial challenge, compounded by the limited six weeks available to complete it and the need to continuously learn new technologies and concepts. This paper summarizes our results, lessons learned, and key insights gained during this focused and successful rapid prototyping effort.
Keywords
ALMA, Data Science, DSS, AWS, Cloud