A Sequential Addressing Subsampling Method for Massive Data Analysis Under Memory Constraint

Rui Pan, Yingqiu Zhu, Baishan Guo, Xuening Zhu, Hansheng Wang

arXiv (2023)

Abstract

The emergence of massive data in recent years poses challenges for automatic statistical inference. This is particularly true when the data are too large to be read into memory as a whole. Accordingly, new techniques are needed to sample data directly from a hard drive. In this paper, we propose a sequential addressing subsampling (SAS) method that samples data directly from the hard drive. Compared with the random addressing subsampling (RAS) method, the proposed SAS method saves time by reducing the addressing cost. Estimators (e.g., the sample mean) based on the SAS subsamples are constructed, and their properties are studied. We conduct a series of simulation studies to verify the finite-sample performance of the proposed SAS estimators. The time cost of the SAS and RAS methods is also compared. An analysis of the airline data is presented for illustration.
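
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the RAS/SAS contrast under stated assumptions: records are fixed-width float64 values stored in a single binary file, one "addressing" operation is modeled as one `seek` call, RAS seeks once per sampled record, and SAS seeks once per randomly placed block and then reads contiguous records. The helper names (`write_data`, `ras_sample`, `sas_sample`, `mean`) and the block parameters are hypothetical illustrations, not the authors' implementation.

```python
import array
import os
import random
import tempfile

ITEM_SIZE = 8  # bytes per record: one float64 ('d' typecode)


def write_data(path, values):
    """Write values to disk as raw float64 records (a stand-in for a file too large for memory)."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def ras_sample(path, n_total, n, rng):
    """Random addressing (assumed scheme): pick n distinct record indices, seek to each one."""
    out, seeks = [], 0
    with open(path, "rb") as f:
        for idx in rng.sample(range(n_total), n):
            f.seek(idx * ITEM_SIZE)  # one addressing operation per record
            seeks += 1
            out.append(array.array("d", f.read(ITEM_SIZE))[0])
    return out, seeks


def sas_sample(path, n_total, n_blocks, block_len, rng):
    """Sequential addressing (assumed scheme): seek to a random start, then read
    block_len contiguous records. Blocks are drawn independently and may overlap."""
    out, seeks = array.array("d"), 0
    with open(path, "rb") as f:
        for _ in range(n_blocks):
            start = rng.randrange(n_total - block_len + 1)
            f.seek(start * ITEM_SIZE)  # one addressing operation per block
            seeks += 1
            out.frombytes(f.read(block_len * ITEM_SIZE))
    return list(out), seeks


def mean(xs):
    """Plain sample mean, used here as the subsample estimator."""
    return sum(xs) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    n_total = 100_000
    data = [rng.gauss(5.0, 1.0) for _ in range(n_total)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        write_data(path, data)
        ras, ras_seeks = ras_sample(path, n_total, 1000, rng)
        sas, sas_seeks = sas_sample(path, n_total, 10, 100, rng)
    print(f"full mean {mean(data):.4f}")
    print(f"RAS mean {mean(ras):.4f} from {len(ras)} records, {ras_seeks} seeks")
    print(f"SAS mean {mean(sas):.4f} from {len(sas)} records, {sas_seeks} seeks")
```

In this sketch both subsamples contain 1000 records, but RAS performs 1000 seeks while SAS performs 10. Seek counts are only a rough proxy for addressing cost; on a real drive the cost also depends on caching and file layout. Because SAS reads neighbouring records together, its estimator need not have the same variance as an RAS estimator of the same size when the on-disk order is correlated.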

Keywords

Massive data, random addressing subsampling, sequential addressing subsampling