Top–Down Gaze Movement Control in Target Search Using Population Cell Coding of Visual Context

IEEE Transactions on Autonomous Mental Development, no. 3, 2010, pp. 196–215

Abstract

Visual context plays an important role in humans' top-down gaze movement control for target searching. Exploring the mental development mechanism in terms of incremental visual context encoding by population cells is an interesting issue. This paper presents a biologically inspired computational model. The visual contextual cues were used…

Introduction
  • Target search or object detection is an important ability of the human visual system. Generally this process consists of two phases: 1) prediction of a target or an object’s place; and …

    Manuscript received February 02, 2010; revised May 17, 2010; accepted May 28, 2010.
  • The third layer uses a single neuron or a group of population neurons to represent an object or a visual field image, and controls or synthesizes the eye movement through connection weights from the third layer to the fourth layer.
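As a rough illustration of this third-to-fourth-layer mapping (a sketch under assumed numbers, not the authors' implementation), population-cell gaze synthesis can be written as a response-weighted average of per-neuron movement vectors:

```python
import numpy as np

# Hypothetical sketch: each third-layer coding neuron stores a pair of
# connection weights to the two movement-control neurons in the fourth
# layer (horizontal and vertical gaze displacement). The responses and
# weights below are invented for illustration.
responses = np.array([0.2, 0.5, 0.8])        # third-layer neuron responses
W = np.array([[1.0, 0.0],                    # per-neuron weights to the
              [0.0, 1.0],                    # two movement-control neurons
              [4.0, 2.0]])

# The gaze movement is synthesized as the response-weighted average of
# the movement vectors encoded by the participating coding neurons.
movement = responses @ W / responses.sum()
print(movement)
```

With all coding neurons participating this reduces to a single weighted average; the paper's population mechanism restricts the average to the most responsive neurons.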
Highlights
  • Target search or object detection is an important ability of the human visual system
  • We propose a developmental neural network system that encodes the top–down knowledge of the visual context and infers the location of the target using a population-cell coding mechanism
  • The visual context is encoded at the learning stage regardless of the prediction results, based on all given starting gaze points uniformly distributed on images at all scales
  • In determining the number of coding neurons to involve: the averaged count is the average number of coding neurons participating in the visual context encoding or decoding procedure; the total number of coding neurons generated in the third layer represents the system’s model complexity; and the system’s generalization error is evaluated by the comprehensive error described in Section IV-A
  • Our theoretical analysis and experimental results indicated that the population-cell coding system is generally more efficient than the single-cell coding system and the k-NN-based coding system in representing the visual context and controlling the gaze motion for target searching
  • 1) The ratios of the average encoding quantity required by the population-cell coding system (0.35 million connection weights for the left eye center) to the encoding quantities required by the single-cell coding system and the k-NN-based coding system (0.43 and 6.87 million connection weights, respectively) are about 77% and 5%; 2) with small samples, as in Exp. 1 (30 images selected as training images), the locating accuracy for the left eye center achieved by the population-cell coding system is 3.11 pixels, which is 35.6% and 16.4% higher than the accuracies provided by the single-cell coding system (4.83 pixels) and the k-NN-based coding system (3.72 pixels), respectively
  • Because this paper mainly intends to discuss the efficiency of visual context encoding and its top–down control of gaze movement in target search, some aspects are not included in the current system
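The accuracy-improvement percentages quoted above follow directly from the reported pixel errors; a quick check using only the numbers given in the text:

```python
# Reported mean locating errors (pixels) for the left eye center.
pop_err = 3.11      # population-cell coding system
single_err = 4.83   # single-cell coding system
knn_err = 3.72      # k-NN-based coding system

# Relative improvement of the population-cell system over each baseline.
improvement_vs_single = (single_err - pop_err) / single_err * 100
improvement_vs_knn = (knn_err - pop_err) / knn_err * 100
print(round(improvement_vs_single, 1))  # 35.6
print(round(improvement_vs_knn, 1))     # 16.4
```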
Results
  • Section III describes the developmental learning structure using population-cell coding mechanism and its principle on encoding visual context and controlling gaze movement in target search.
  • The coding neurons are linked to the feature neurons in the second layer and to two movement-control neurons in the fourth layer with two groups of connection weights to represent the encoded context knowledge and experience.
  • For a system with population coding neurons involved in representing the visual field image and synthesizing the gaze movement, given the number of targets in a test set, the mean and the standard deviation of the target locating errors are formulated [see (21) and (22)].
  • According to the principle of structural risk minimization [55], the problem can be transformed into finding a system that minimizes the system’s structural risk, the sum of the comprehensive error on the training set and a term representing the system’s model complexity; the number of coding neurons in the third layer represents the number of visual context patterns encoded in the system, and a further parameter determines how many cells are involved in representing the visual field image and synthesizing the gaze movement.
  • The system encodes all the visual contexts it encountered at the encoding stage and, at the decoding stage, uses the coding neurons with the largest responses to represent a visual field image and synthesize a gaze movement.
  • A population-cell coding mechanism for visual context learning and gaze movement control is presented.
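The model-selection trade-off described above (training error versus number of coding neurons) can be sketched roughly as follows; the candidate systems and the trade-off constant are invented for illustration and are not the paper's actual risk functional:

```python
def select_model(candidates, lam=0.01):
    """Hypothetical structural-risk-style selection: among candidate
    systems, pick the one minimizing training error plus a complexity
    penalty proportional to the number of coding neurons."""
    return min(candidates, key=lambda c: c["train_error"] + lam * c["n_neurons"])

# Made-up candidates: more coding neurons lower the training error but
# raise the complexity term, so an intermediate size wins.
candidates = [
    {"n_neurons": 50,  "train_error": 6.0},
    {"n_neurons": 200, "train_error": 3.0},
    {"n_neurons": 800, "train_error": 2.8},
]
best = select_model(candidates)
print(best["n_neurons"])  # 200
```

The same shape of trade-off underlies Table 4's "complexity versus generalization error" comparison.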
Conclusion
  • The authors' theoretical analysis and experimental results indicated that the population-cell coding system is generally more efficient than the single-cell coding system and the k-NN-based coding system in representing the visual context and controlling the gaze motion for target searching.
  • Because this paper mainly intends to discuss the efficiency of visual context encoding and its top–down control of gaze movement in target search, some aspects are not included in the current system.
Tables
  • Table 1: Algorithm for visual context encoding. The key part of the algorithm is dynamically generating coding neurons. The coding neurons are linked to the feature neurons in the second layer and to two movement-control neurons in the fourth layer with two groups of connection weights to represent the encoded context knowledge and experience.
  • Table 2: Experiments: performances of three coding systems for multitarget search
  • Table 3: Algorithm for gaze movement control
  • Table 4: Model selection: complexity versus generalization error
  • Table 5: Algorithm for determining the number of population coding neurons
Funding
  • This research was supported in part by the National Basic Research Program of China (2009CB320902), the Natural Science Foundation of China (60673091, 60702031, and 60970087), the Hi-Tech Research and Development Program of China (2006AA01Z122), the Beijing Natural Science Foundation (4072023 and 4102013), the Beijing Municipal Education Committee (KM200610005012), and the Beijing Municipal Foundation for Excellent Talents (20061D0501500211)
References
  • G. Schneider, “Contrasting visuomotor functions of the tectum and cortex in the golden hamster,” Psychol. Forschung, vol. 31, no. 1, pp. 52–62, 1967.
  • R. Held, D. Ingle, G. Schneider, and C. Trevarthen, “Locating and identifying: Two modes of visual processing,” Psychol. Forschung, vol. 31, no. 1, pp. 42–43, 1967.
  • L. Ungerleider and M. Mishkin, “Two cortical visual systems,” in Analysis of Visual Behavior, D. J. Ingle, M. A. Goodale, and R. J. W. Mansfield, Eds. Cambridge, MA: MIT Press, 1982, pp. 549–586.
  • L. Ungerleider and J. Haxby, “‘What’ and ‘where’ in the human brain,” Current Opinion Neurobiol., vol. 4, no. 2, pp. 157–165, 1994.
  • What and Where Pathways [Online]. Available: http://www.scholarpedia.org/article/What_and_where_pathways
  • B. Velichkovsky, M. Joos, J. Helmert, and S. Pannasch, “Two visual systems and their eye movements: Evidence from static and dynamic scene perception,” in Proc. 27th Conf. Cogn. Sci. Soc., Stresa, Italy, Jul. 21–23, 2005, pp. 2283–2288.
  • N. Broadbent, L. Squire, and R. Clark, “Spatial memory, recognition memory, and the hippocampus,” in Proc. Nat. Acad. Sci. USA, 2004, vol. 11, pp. 14515–14520.
  • C. Siagian and L. Itti, “Biologically-inspired robotics vision Monte-Carlo localization in the outdoor environment,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2007.
  • R. McPeek and E. Keller, “Saccade target selection in the superior colliculus during a visual search task,” J. Neurophysiol., vol. 88, no. 4, pp. 2019–2034, 2002.
  • G. Shepherd, Neurobiology, 2nd ed. London, U.K.: Oxford Univ. Press, 1988.
  • A. Duchowski, Eye Tracking Methodology: Theory and Practice, 2nd ed. Berlin, Germany: Springer-Verlag, 2007.
  • Z. Ji, J. Weng, and D. Prokhorov, “Where-what network 1: “Where” and “what” assist each other through top-down connections,” in Proc. 7th IEEE Int. Conf. Develop. Learn., Monterey, CA, 2008, pp. 61–66.
  • R. Peters and L. Itti, “Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Las Vegas, NV, Jun. 2007.
  • G. Zelinsky, W. Zhang, B. Yu, X. Chen, and D. Samaras, “The role of top-down and bottom-up processes in guiding eye movements during visual search,” in Proc. Adv. Neural Inform. Process. Syst., Vancouver, BC, Canada, 2006.
  • M. Cerf, J. Harel, W. Einhaeuser, and C. Koch, “Predicting human gaze using low-level saliency combined with face detection,” in Proc. Adv. Neural Inform. Process. Syst., Vancouver, BC, Canada, 2007.
  • R. Milanese, H. Wechsler, S. Gil, J. Bost, and T. Pun, “Integration of bottom-up and top-down cues for visual attention using non-linear relaxation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Hilton Head, SC, 1994, pp. 781–785.
  • J. Tsotsos, S. Culhane, W. Wai, Y. Lai, N. Davis, and F. Nuflo, “Modeling visual attention via selective tuning,” Artif. Intell., vol. 78, pp. 507–545, 1995.
  • V. Navalpakkam, J. Rebesco, and L. Itti, “Modeling the influence of task on attention,” Vis. Res., vol. 45, no. 2, pp. 205–231, 2005.
  • A. Torralba, “Contextual priming for object detection,” Int. J. Comput. Vis., vol. 53, no. 2, pp. 169–191, 2003.
  • A. Oliva, A. Torralba, M. Castelhano, and J. Henderson, “Top down control of visual attention in object detection,” in Proc. IEEE Int. Conf. Image Process., Barcelona, Spain, 2003, vol. I, pp. 429–432.
  • K. Murphy, A. Torralba, and W. Freeman, “Using the forest to see the trees: A graphical model relating features, objects, and scenes,” in Proc. Adv. Neural Inform. Process. Syst., Vancouver, BC, Canada, 2003.
  • A. Torralba, A. Oliva, M. Castelhano, and J. Henderson, “Contextual guidance of eye movements and attention in real-world scenes: The role of global features on objects search,” Psychol. Rev., vol. 113, 2006.
  • K. Ehinger, B. Hidalgo-Sotelo, A. Torralba, and A. Oliva, “Modelling search for people in 900 scenes: A combined source model of eye guidance,” Vis. Cogn., vol. 17, no. 6–7, pp. 945–978, 2009.
  • L. Paletta and C. Greindl, “Context based object detection from video,” in Proc. Int. Conf. Comput. Vis. Syst., Graz, Austria, 2003, pp. 502–512.
  • H. Kruppa, M. Santana, and B. Schiele, “Fast and robust face finding via local context,” in Proc. Joint IEEE Int. Workshop Vis. Surveillance Perform. Eval. Tracking Surveillance, Nice, France, 2003.
  • N. Bergboer, E. Postma, and H. van den Herik, “Context-based object detection in still images,” Image Vis. Comput., vol. 24, pp. 987–1000, 2006.
  • J. Miao, X. Chen, W. Gao, and Y. Chen, “A visual perceiving and eyeball-motion controlling neural network for object searching and locating,” in Proc. Int. Joint. Conf. Neural Netw., Vancouver, BC, Canada, 2006, pp. 4395–4400.
  • E. Osuna, R. Freund, and F. Girosi, “Training support vector machines: An application to face detection,” in Proc. Comput. Vis. Pattern Recog., San Juan, Puerto Rico, 1997, vol. 3, pp. 130–136.
  • H. Schneiderman and T. Kanade, “A statistical method for 3D object detection applied to faces and cars,” in Proc. Comput. Vis. Pattern Recog., Hilton Head, SC, 2000, vol. 1, pp. 746–751.
  • P. Viola and M. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, 2004.
  • X. Chen and A. Yuille, “Detecting and reading text in natural scenes,” in Proc. Comput. Vis. Pattern Recog., Washington, DC, 2004.
  • C. Garcia and M. Delakis, “Convolutional face finder: A neural architecture for fast and robust face detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1408–1423, Nov. 2004.
  • G. Malcolm and J. Henderson, “Combining top-down processes to guide eye movements during real-world scene search,” J. Vis., vol. 10, no. 2, pp. 1–11, 2010.
  • M. Chun and Y. Jiang, “Contextual cueing: Implicit learning and memory of visual context guides spatial attention,” Cogn. Psychol., vol. 36, pp. 28–71, 1998.
  • M. Chun, “Contextual cueing of visual attention,” Trends Cogn. Sci., vol. 4, no. 5, pp. 170–178, 2000.
  • J. Henderson, P. Weeks, Jr., and A. Hollingworth, “The effects of semantic consistency on eye movements during complex scene viewing,” J. Exp. Psychol.: Human Perception Perform., vol. 25, no. 1, pp. 210–228, 1999.
  • M. Kunar, S. Flusberg, and J. Wolfe, “Contextual cueing by global features,” Perception Psychophys., vol. 68, no. 7, pp. 1204–1216, 2006.
  • J. Brockmole, M. Castelhano, and J. Henderson, “Contextual cueing in naturalistic scenes: Global and local context,” J. Exp. Psychol.: Learn. Memory and Cogn., vol. 32, no. 4, pp. 699–706, 2006.
  • J. Brockmole and J. Henderson, “Using real-world scenes as contextual cues for search,” Vis. Cogn., vol. 13, no. 1, pp. 99–108, 2006.
  • K. Chua and M. Chun, “Implicit scene learning is viewpoint dependent,” Perception Psychophys., vol. 65, no. 1, pp. 72–80, 2003.
  • M. Bear, B. Connors, and M. Paradiso, Neuroscience: Exploring the Brain, 2nd ed. New York: Lippincott Williams & Wilkins, 2001.
  • “Special issue on binding problem,” Neuron, vol. 24, no. 1, 1999.
  • D. Wang, “The time dimension for scene analysis,” IEEE Trans. Neural Netw., vol. 16, no. 6, pp. 1401–1426, Jun. 2005.
  • J. Weng and W. Hwang, “From neural networks to the brain: Autonomous mental development,” IEEE Comput. Intell. Mag., vol. 1, no. 3, pp. 15–31, Aug. 2006.
  • M. Young and S. Yamane, “Sparse population coding of faces in the inferotemporal cortex,” Science, vol. 256, no. 1, pp. 1327–1330, 1992.
  • J. Weng and N. Zhang, “Optimal in-place learning and the lobe component analysis,” in Proc. Int. Joint Conf. Neural Netw., Vancouver, BC, Canada, 2006, pp. 3887–3894.
  • J. Weng, T. Luwang, H. Lu, and X. Xue, “Multilayer in-place learning networks for modeling functional layers in the laminar cortex,” Neural Netw., vol. 21, pp. 150–159, 2008.
  • A. Bell and T. Sejnowski, “The independent components of natural scenes are edge filters,” Vis. Res., vol. 37, no. 23, pp. 3327–3338, 1997.
  • S. Hornillo-Mellado, R. Martin-Clemente, C. Puntonet, and J. Gorriz, “Connections between ICA and sparse coding revisited,” Lecture Notes Comput. Sci., vol. 3512, pp. 1035–1042, 2005.
  • B. Olshausen and D. Field, “Sparse coding with an overcomplete basis set: A strategy employed by V1?,” Vis. Res., vol. 37, pp. 3313–3325, 1997.
  • A. Hyvarinen and P. Hoyer, “A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images,” Vis. Res., vol. 41, no. 18, pp. 2413–2423, 2002.
  • D. Lee and H. Seung, “Learning the parts of objects with nonnegative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
  • J. Weng and M. Luciw, “Dually optimal neuronal layers: Lobe component analysis,” IEEE Trans. Autonom. Mental Develop., vol. 1, no. 1, pp. 68–85, May 2009.
  • T. Ahonen, A. Hadid, and M. Pietikainen, “Face recognition with local binary patterns,” in Proc. 8th Eur. Conf. Comput. Vis., Prague, Czech Republic, 2004, vol. 3021, pp. 469–481.
  • V. Vapnik, The Nature of Statistical Learning Theory. New York: Wiley, 1995.
  • T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer-Verlag, 2001.
  • The Face Database of the University of Bern [Online]. Available: ftp://iamftp.unibe.ch/pub/Images/FaceImages/, 2008.
  • Between Bottom-Up and Top-Down: What is “The Much In-Between”? Panel session for IJCNN, 2010 [Online]. Available: http://www.cse.msu.edu/ei/IJCNN10panel
  • F. Rosenblatt, “Perceptron simulation experiments,” in Proc. Inst. Radio Eng., 1960, vol. 48, pp. 301–309.
  • S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999.
  • R. Caruana, “Multitask learning,” Mach. Learn., vol. 28, no. 1, pp. 41–75, 1997.
  • R. Raina, A. Ng, and D. Koller, “Constructing informative priors using transfer learning,” in Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, 2006.
  • W. Dai, Y. Chen, G. Xue, Q. Yang, and Y. Yu, “Translated learning: Transfer learning across different feature spaces,” in Proc. Adv. Neural Inform. Process. Syst., Vancouver, BC, Canada, 2008, vol. 21.

Authors
  • Baixian Zou received the M.Sc. degree in applied mathematics from South East University, Nanjing, China, in 1996, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2004.
  • Jun Miao (S’00–M’04) received the B.Sc. and M.Sc. degrees in computer science from Beijing University of Technology, Beijing, China, in 1993 and 1999, respectively. He received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2005.
  • Laiyun Qing (S’03–M’09) received the B.Sc. and M.Sc. degrees in computer science from Northeastern University, Shenyang, China, in 1996 and 1999, respectively. She received the Ph.D. degree in computer science from Chinese Academy of Sciences, Beijing, in 2005.
  • Lijuan Duan (M’08) received the B.Sc. and M.Sc. degrees in computer science from Zhengzhou University of Technology, Zhengzhou, China, in 1995 and 1998, respectively. She received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2003. She is currently an Associate Professor at the College of Computer Science and Technology, Beijing University of Technology, China. Her research interests include artificial intelligence, image processing and machine vision, and information security. She has published more than 40 research articles in refereed journals and proceedings on image retrieval, neural oscillation, image segmentation, visual perception, and cognition.
  • Wen Gao (M’88–SM’05–F’09) received the B.Sc. and M.Sc. degrees in computer science from the Harbin Institute of Technology, Harbin, China, in 1985 and 1988, respectively. He received the Ph.D. degree in electronics engineering from the University of Tokyo, Tokyo, Japan, in 1991.