Colt: concept lineage tool for data flow metadata capture and analysis

Hosted Content (2017)

Abstract
Most organizations are becoming increasingly data-driven, often processing data from many different sources to enable critical business operations. Beyond the well-addressed challenge of storing and processing large volumes of data, financial institutions in particular are subject to a growing body of federal regulations requiring high levels of accountability for the accuracy and lineage of this data. For companies like GE Capital, which maintain data across a globally interconnected network of thousands of systems, it is increasingly difficult to capture an accurate understanding of the data flowing between those systems. To address this problem, we designed and developed a concept lineage tool that allows organizational data flows to be modeled, visualized, and interactively explored. The tool has novel features that allow a data flow network to be contextualized in terms of business-specific metadata such as the concept, business, and product to which it applies. Key analysis features have been implemented, including the ability to trace the origination of particular datasets and to discover all systems containing data that meets user-defined criteria. The tool has been readily adopted by users at GE Capital and in a short time has become a business-critical application, with over 2,200 data systems and over 1,000 data flows captured.
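The two analysis features named in the abstract, tracing where a dataset originates and finding every system that carries data matching some criteria, can be illustrated with a small directed-graph sketch. This is not the tool's actual implementation, which the abstract does not describe. The `Flow` and `FlowNetwork` classes, their method names, and the sample system names are all hypothetical; only the metadata fields (concept, business, product) come from the abstract.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical record: one directed data flow tagged with business metadata."""
    source: str
    target: str
    concept: str
    business: str
    product: str


class FlowNetwork:
    """Hypothetical in-memory model of a data flow network."""

    def __init__(self, flows):
        self.flows = list(flows)
        # Index flows by the system they arrive at, for upstream traversal.
        self._incoming = defaultdict(list)
        for flow in self.flows:
            self._incoming[flow.target].append(flow)

    def trace_origins(self, system, concept):
        """Walk upstream from `system` along flows carrying `concept` and
        return the systems that have no further upstream feed for it."""
        seen = {system}
        queue = deque([system])
        origins = set()
        while queue:
            current = queue.popleft()
            upstream = [f for f in self._incoming.get(current, [])
                        if f.concept == concept]
            if not upstream:
                origins.add(current)
            for flow in upstream:
                if flow.source not in seen:
                    seen.add(flow.source)
                    queue.append(flow.source)
        return origins

    def systems_matching(self, **criteria):
        """Return every system touched by a flow whose metadata equals
        all of the given field=value criteria."""
        matched = set()
        for flow in self.flows:
            if all(getattr(flow, key) == value for key, value in criteria.items()):
                matched.update((flow.source, flow.target))
        return matched


# Illustrative data only; these system names are made up.
network = FlowNetwork([
    Flow("ledger", "warehouse", "loan_balance", "lending", "mortgage"),
    Flow("warehouse", "reporting", "loan_balance", "lending", "mortgage"),
    Flow("crm", "reporting", "customer_id", "lending", "mortgage"),
])

origins = network.trace_origins("reporting", "loan_balance")
customer_systems = network.systems_matching(concept="customer_id")
```

In this sketch, tracing is a breadth-first walk over incoming flows filtered by concept, and discovery is a scan over flow metadata. A production tool covering thousands of systems would more plausibly sit on a graph database or an indexed store, which the abstract leaves unspecified.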