Decades of Transformation: Evolution of the NASA Astrophysics Data System's Infrastructure
CoRR(2024)
Abstract
The NASA Astrophysics Data System (ADS) is the primary Digital Library portal
for researchers in astronomy and astrophysics. Over the past 30 years, the ADS
has gone from being an astronomy-focused bibliographic database to an open
digital library system supporting research in space and (soon) Earth sciences.
This paper describes the evolution of the ADS system, its capabilities, and the
technological infrastructure underpinning it.
We give an overview of the ADS's original architecture, constructed primarily
around simple database models. This bespoke system allowed for the efficient
indexing of metadata and citations, the digitization and archival of full-text
articles, and the rapid development of discipline-specific capabilities running
on commodity hardware. The move towards a cloud-based microservices
architecture and an open-source search engine in the late 2010s marked a
significant shift, bringing full-text search capabilities, a modern API, higher
uptime, more reliable data retrieval, and integration of advanced
visualizations and analytics.
Another crucial evolution came with the gradual and ongoing incorporation of
Machine Learning and Natural Language Processing algorithms in our data
pipelines. Originally used for information extraction and classification tasks,
NLP and ML techniques are now being developed to improve metadata enrichment,
search, notifications, and recommendations. We describe how these computational
techniques are being embedded into our software infrastructure, the challenges
faced, and the benefits reaped.
Finally, we describe the future prospects of the ADS and its
ongoing expansion, discussing the challenges of managing an interdisciplinary
information system in the era of AI and Open Science, where information is
abundant and technology is transformative, but the trustworthiness of both can
be elusive.