The role of scene context in object recognition by humans and convolutional neural networks

Journal of Vision (2023)

Abstract
It is rare that humans are required to recognize objects without a surrounding context. Previous research has shown that modifying the scene information can decrease the speed and accuracy of object recognition in human observers. Although convolutional neural networks (CNNs) can attain near human-level performance on simple object recognition tasks, it remains unclear whether these models of biological vision continue to reflect human abilities when objects occur in complex scenes. Here, we investigated the impact of visual clutter and semantic incongruence on object recognition accuracy in humans and CNNs. Eighteen undergraduate students and four CNNs implemented in PyTorch were shown 384 greyscale images, each consisting of a target object superimposed on a background scene. We manipulated the level of visual clutter, defined as how much texture, pattern, or excess information is in an image, and the semantic congruency, defined as whether the object-scene pairing was realistic. The eight target categories consisted of animals (bear, bison, elephant, owl) and common indoor objects (lamp, teapot, vacuum, vase), which were presented in either outdoor nature scenes or indoor scenes. The scenes were rated on their degree of clutter by separate participants and sorted into low- or high-clutter scenes. We found that human observers performed significantly worse with increased clutter, yet CNN performance was unaffected by clutter. Interestingly, the CNNs showed significantly better classification accuracy for congruent than incongruent object-scene pairings, while the human observers did not. However, human participants did show a congruency bias effect, choosing a congruent category over an incongruent category in a significant portion of trials where they reported low confidence. Our findings reveal notable deviations between human and CNN object classification performance and indicate that CNN models do not process background scene context in the same way that humans do.
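The 2 × 2 design described in the abstract (low vs. high clutter × congruent vs. incongruent pairing) can be made concrete with a small scoring sketch. This is a hypothetical illustration, not the authors' analysis code: the `Trial` record, the `"indoor"`/`"outdoor"` scene labels, the `low_confidence` flag, and the congruency rule (animals belong in outdoor nature scenes, indoor objects in indoor scenes) are assumptions inferred from the abstract. The congruency-bias measure here is a simplified stand-in; the study's actual analysis may condition on different trials.

```python
from collections import defaultdict
from dataclasses import dataclass

# Category groupings as listed in the abstract.
ANIMALS = {"bear", "bison", "elephant", "owl"}
INDOOR_OBJECTS = {"lamp", "teapot", "vacuum", "vase"}


@dataclass
class Trial:
    """Hypothetical per-trial record for one observer (human or CNN)."""
    target: str           # true object category
    scene_type: str       # assumed labels: "outdoor" or "indoor"
    clutter: str          # "low" or "high", from separate clutter ratings
    response: str         # category chosen by the observer
    low_confidence: bool  # assumed flag for low-confidence responses


def is_congruent(category: str, scene_type: str) -> bool:
    """Assumed rule: animals fit outdoor scenes, indoor objects fit indoor scenes."""
    if category in ANIMALS:
        return scene_type == "outdoor"
    if category in INDOOR_OBJECTS:
        return scene_type == "indoor"
    raise ValueError(f"unknown category: {category}")


def accuracy_by_condition(trials):
    """Proportion correct for each (clutter, congruency) cell of the design."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t in trials:
        congruency = "congruent" if is_congruent(t.target, t.scene_type) else "incongruent"
        key = (t.clutter, congruency)
        totals[key] += 1
        hits[key] += int(t.response == t.target)
    return {key: hits[key] / totals[key] for key in totals}


def congruent_choice_rate(trials):
    """Simplified bias measure: fraction of low-confidence trials whose
    chosen category is congruent with the scene it appeared in."""
    low = [t for t in trials if t.low_confidence]
    if not low:
        raise ValueError("no low-confidence trials")
    return sum(is_congruent(t.response, t.scene_type) for t in low) / len(low)
```

Feeding in a list of `Trial` records yields one accuracy per design cell, so the clutter effect (reported for humans) and the congruency effect (reported for CNNs) can be read off by comparing cells along each factor.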
Keywords
scene context, object recognition, convolutional neural networks