Improving Data-Scarce Image Classification Through Multimodal Synthetic Data Pretraining

2023 IEEE Sensors Applications Symposium (SAS), 2023

Abstract
Deep learning algorithms and models benefit greatly from the release of large-scale datasets, including synthetically generated data when real-life data is scarce. Multimodal datasets carry more descriptive environmental information than single-sensor ones, but they are generally small and not widely accessible. In this paper, we construct a synthetically generated image classification dataset consisting of grayscale camera images and depth information acquired from an 8x8-pixel Time-of-Flight sensor. We propose and evaluate six Convolutional Neural Network-based feature-level fusion models to integrate the multimodal data, outperforming the accuracy of the camera-only model by up to 17% in real-world settings. By pretraining the model on synthetically generated sample pairs and then fine-tuning it with only 16 real-domain samples, we outperform a non-pretrained counterpart by 35% while keeping storage requirements on the order of hundreds of kB. Our proposed convolutional model, pretrained on both synthetic and real-world sensor data, achieves a top-1 accuracy of 86.48%, demonstrating the benefits of using multimodal datasets to train feature-level data fusion neural networks. Emerging low-power embedded microcontrollers, such as multi-core RISC-V systems-on-chip, are well suited to running our model thanks to their reduced power consumption and parallel computing capabilities, which speed up inference.
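To make the idea of feature-level fusion concrete, the following is a minimal sketch and not the architecture used in the paper. Each modality is reduced to a feature vector by its own branch, the two vectors are concatenated, and a single classifier operates on the fused vector. Everything except the 8x8 depth map size is an assumption made for illustration: the 32x32 camera resolution, the four classes, the random weights, and the use of average pooling as a stand-in for the convolutional branches.

```python
import random


def avg_pool(grid, k):
    """Average-pool a square 2D list with a k x k window and stride k."""
    n = len(grid) // k
    return [
        [
            sum(grid[i * k + di][j * k + dj] for di in range(k) for dj in range(k))
            / (k * k)
            for j in range(n)
        ]
        for i in range(n)
    ]


def flatten(grid):
    """Turn a 2D list into a flat feature vector."""
    return [value for row in grid for value in row]


def linear(features, weights, biases):
    """Dense layer: one score per class."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def fuse_and_classify(image, depth, weights, biases):
    """Feature-level fusion: extract per-modality features, concatenate, classify.

    The pooling steps are placeholders for learned convolutional branches.
    """
    cam_feat = flatten(avg_pool(image, 8))  # assumed 32x32 image -> 16 features
    tof_feat = flatten(avg_pool(depth, 2))  # 8x8 depth map -> 16 features
    fused = cam_feat + tof_feat             # concatenation is the fusion step
    scores = linear(fused, weights, biases)
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return fused, predicted


# Hypothetical inputs: random data in place of real sensor readings.
random.seed(0)
image = [[random.random() for _ in range(32)] for _ in range(32)]
depth = [[random.random() for _ in range(8)] for _ in range(8)]
num_classes = 4  # assumed; the abstract does not state the class count
weights = [[random.uniform(-1, 1) for _ in range(32)] for _ in range(num_classes)]
biases = [0.0] * num_classes

fused, predicted = fuse_and_classify(image, depth, weights, biases)
print(len(fused), predicted)
```

In a trained model the branch parameters and the classifier weights would be learned jointly, so the classifier can weigh camera and depth features against each other instead of combining two separate predictions.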
Keywords
sensor fusion, multimodal data fusion, time of flight sensor, synthetic data, tinyML, low-power, image classification