Hydra -- A Federated Data Repository over NDN

arxiv(2022)

引用 0|浏览4
暂无评分
摘要
Today's big data science communities manage their data publication and replication at the application layer. These communities utilize myriad mechanisms to publish, discover, and retrieve datasets - the result is an ecosystem of either centralized, or otherwise a collection of ad-hoc data repositories. Publishing datasets to centralized repositories can be process-intensive, and those repositories do not accept all datasets. The ad-hoc repositories are difficult to find and utilize due to differences in data names, metadata standards, and access methods. To address the problem of scientific data publication and storage, we have designed Hydra, a secure, distributed, and decentralized data repository made of a loose federation of storage servers (nodes) provided by user communities. Hydra runs over Named Data Networking (NDN) and utilizes the State Vector Sync (SVS) protocol that lets individual nodes maintain a "global view" of the system. Hydra provides a scalable and resilient data retrieval service, with data distribution scalability achieved via NDN's built-in data anycast and in-network caching and resiliency against individual server failures through automated failure detection and maintaining a specific degree of replication. Hydra utilizes "Favor", a locally calculated numerical value to decide which nodes will replicate a file. Finally, Hydra utilizes data-centric security for data publication and node authentication. Hydra uses a Network Operation Center (NOC) to bootstrap trust in Hydra nodes and data publishers. The NOC distributes user and node certificates and performs the proof-of-possession challenges. This technical report serves as the reference for Hydra. It outlines the design decisions, the rationale behind them, the functional modules, and the protocol specifications.
更多
查看译文
关键词
federated data repository,hydra
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要