Large linear classification when data cannot fit in memory

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three(2011)

引用 223|浏览7
Linear classification is a useful tool for dealing with large-scale data in applications such as document classification and natural language processing. Recent developments of linear classification have shown that the training process can be efficiently conducted. However, when the data size exceeds the memory capacity, most training methods suffer from very slow convergence due to the severe disk swapping. Although some methods have attempted to handle such a situation, they are usually too complicated to support some important functions such as parameter selection. In this paper, we introduce a block minimization framework for data larger than memory. Under the framework, a solver splits data into blocks and stores them into separate files. Then, at each time, the solver trains a data block loaded from disk. Although the framework is simple, the experimental results show that it effectively handles a data set 20 times larger than the memory capacity.
linear classification,memory capacity,data set,data size,large-scale data,block minimization framework,document classification,severe disk,training method,training process,large linear classification
AI 理解论文