Lightweight single pass numerical reading extraction for displays in the wild.

Shanmukha Yenneti, Yan-Ming Chiou, Bob Price

Imaging and Multimedia Analytics at the Edge (2023)

Abstract

Although considerable progress has been made in recognizing multi-character text from images, robust, computationally efficient methods that can run on portable devices to read device displays in the wild are still lacking. We specifically address the problem of parsing digits from seven-segment displays. Recognizing these displays is important for many tasks, such as augmented reality agents that assist users and need to verify their actions, or connecting legacy devices to the internet for process control using cheap cameras. Legacy techniques based on image processing operators and OCR are brittle, whereas massive deep networks are too computationally expensive. We describe a computationally tractable VGG-style backbone combined with a novel digit inference head that can be trained using a synthetic display generator with novel augmentations. We show that the model trained on augmented synthetic data generalizes well to a corpus of real-world display images, achieving 97.8% single-frame accuracy at a throughput of 30 frames per second. We describe how the output can be further stabilized, and accuracy improved, through a form of mode filtering.
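
The abstract does not specify how the mode filtering is implemented. As a minimal sketch of the general idea, assuming a sliding window over per-frame readings represented as strings, one could return the most frequent reading in the window so that an isolated misread frame is suppressed. The names `ModeFilter`, `smooth`, and the `window` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


class ModeFilter:
    """Sliding-window mode filter over per-frame readings.

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def __init__(self, window: int = 5):
        if window < 1:
            raise ValueError("window must be >= 1")
        # Keep only the most recent `window` readings.
        self._history = deque(maxlen=window)

    def update(self, reading: str) -> str:
        """Add one per-frame reading and return the stabilized reading."""
        self._history.append(reading)
        counts = Counter(self._history)
        best = max(counts.values())
        # Break ties in favor of the most recent reading, so the filter
        # follows a genuine change in the displayed value.
        for candidate in reversed(self._history):
            if counts[candidate] == best:
                return candidate
        return reading  # not reached: the history is never empty here


def smooth(readings, window: int = 5):
    """Apply the mode filter to a sequence of per-frame readings."""
    mode_filter = ModeFilter(window)
    return [mode_filter.update(r) for r in readings]
```

For example, with a window of 3, a single misread frame such as `"72.5"` in a run of `"12.5"` readings is replaced by `"12.5"`. A larger window suppresses more noise but reacts more slowly when the display actually changes; at 30 frames per second, a window of a few frames adds only a fraction of a second of lag.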

Keywords

numerical reading extraction, displays, lightweight single pass
