
Video structural description technology for the new generation video surveillance systems

Frontiers of Computer Science, no. 6 (2015): 980–989


Abstract

The increasing need for video-based applications raises the importance of parsing and organizing the content in videos. However, accurate understanding and management of video content at the semantic level is still insufficient. The semantic gap between low-level features and high-level semantics cannot be bridged by manual or semi-automat...

Introduction
  • Recent research shows that videos “in the wild” are growing at a staggering rate [1,2].
  • A semantic-based model for representing and organizing video resources is proposed to bridge the gap between low-level representative features and high-level semantic content in terms of object, event, and semantic relation extraction.
  • Video structural description (VSD) aims to parse video content into text information using spatiotemporal segmentation, feature selection, object recognition, and semantic web technology.
Highlights
  • Recent research shows that videos “in the wild” are growing at a staggering rate [1,2]
  • Video structural description aims to parse video content into text information using spatiotemporal segmentation [10], feature selection [11], object recognition [12], and semantic web technology [13,14,15,16]
  • A number of concepts including people, vehicle, and traffic sign are given, which users can use for annotating and representing video traffic events unambiguously
  • Spatial and temporal relations in events are proposed, which users can use for annotating and representing the semantic relations between objects in video traffic events
  • Video surveillance plays a key role in ensuring security at many institutions such as airports, banks, and casinos
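The highlights above repeatedly mention temporal relations between objects and events. The paper gives no code, but a minimal sketch of how such relations might be derived from frame intervals is shown below; the function name and the choice of Allen-style relation labels are illustrative assumptions, not taken from the paper.

```python
def temporal_relation(a, b):
    """Classify the temporal relation between two annotated events,
    each given as a (start_frame, end_frame) interval.
    Labels loosely follow Allen's interval algebra, a common basis
    for ontology-based event models."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"       # a finishes strictly before b starts
    if a_end == b_start:
        return "meets"        # a ends exactly where b begins
    if a_start == b_start and a_end == b_end:
        return "equals"       # same interval
    if a_start >= b_start and a_end <= b_end:
        return "during"       # a is contained in b
    if a_start < b_start and a_end > b_end:
        return "contains"     # a contains b
    return "overlaps"         # partial overlap

# e.g. a car enters during frames 10-50, a pedestrian crosses during 60-90
print(temporal_relation((10, 50), (60, 90)))  # → before
```

Such a classifier would let an annotation tool emit relation triples like (event_a, before, event_b) automatically from detected intervals.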
Results
  • Different from existing video content extraction and representation methods, VSD uses a domain ontology including basic objects, events, and relations.
  • The spatial and temporal relations are defined, which can be used by users for annotating and representing the semantic relations between objects in video events.
  • Users can search, annotate, and browse the related video resources.
  • Attributes, spatial relations, temporal relations, and events are basic components of the proposed domain ontology framework.
  • In the VSD model, a video representation and annotation framework applicable across a wide range of domains is proposed in order to model the semantic content in videos.
  • Based on the organized video resources layer, the related application is introduced.
  • A semantic based model named video structural description (VSD) for representing and organizing the content in videos is proposed.
  • Video structural description aims to parse video content into text information using spatiotemporal segmentation, feature selection, object recognition, and semantic web technology.
  • In this paper, a new model named video structural description (VSD) is proposed for bridging the gap between low-level features and high-level semantics of the contents in videos.
  • The VSD model considers the objects, events, concepts, and semantic relations in the videos.
Conclusion
  • Besides content extraction from videos, the VSD model focuses on organizing video resources based on their semantic relations.
  • Spatial and temporal relations in events are proposed, which users can use for annotating and representing the semantic relations between objects in video traffic events.
  • The major contributions of this paper are summarized as follows.
Related work
  • The important issue in semantic content extraction from videos is the representation of the semantic content, which many researchers have studied from different aspects. A simple representation method may associate video events with low-level features (texture, shape, color, etc.) using frames or shots from videos. These simple methods do not use any relations between features, such as spatial or temporal relations. Obviously, using spatial or temporal relations between objects in videos is important for accurate extraction of events. Systems such as BilVideo [17], extended-AVIS [18], MultiView [19], and ClassView [20] use spatial and temporal relations but do not have ontology-based models for semantic content representation. Bai et al. [21] presented a semantic-based framework using domain ontology; their work represents video events with temporal description logic, but the event extraction is manual and the event descriptions include only temporal information. Nevatia et al. [22] gave an ontology model using spatial-temporal relations to extract complex events, where the extraction process is manual. In Ref. [23], each defined concept is related to a corresponding visual concept with only temporal relations for soccer videos. Nevatia et al. [22] also built an event ontology for natural representation of complex spatial-temporal events given simpler events.
Funding
  • This work was supported by the National Science and Technology Major Project (2013ZX01033002-003), the National High Technology Research and Development Program of China (863 Program) (2013AA014601, 2013AA014603), the National Key Technology Support Program (2012BAH07B01), the National Natural Science Foundation of China (Grant No. 61300202), and the Science Foundation of Shanghai (13ZR1452900).
References
  1. Xu Z, Liu Y, Mei L, Hu C, Chen L. Semantic based representing and organizing surveillance big data using video structural description technology. Journal of Systems and Software, 2015, 102: 217–225
  2. Hu C, Xu Z, Liu Y, Mei L, Chen L, Luo X. Semantic link network-based model for organizing multimedia big data. IEEE Transactions on Emerging Topics in Computing, 2014, 2(3): 376–387
  3. Wu L, Wang Y. The process of criminal investigation based on grey hazy set. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics. 2010, 26–28
  4. Liu L, Li Z, Delp E J. Efficient and low-complexity surveillance video compression using backward-channel aware Wyner-Ziv video coding. IEEE Transactions on Circuits and Systems for Video Technology, 2009, 19(4): 452–465
  5. Zhang J, Zulkernine M, Haque A. Random-forests-based network intrusion detection systems. IEEE Transactions on Systems, Man, and Cybernetics (Part C: Applications and Reviews), 2008, 38(5): 649–659
  6. Yu H Q, Pedrinaci C, Dietze S, Domingue J. Using linked data to annotate and search educational video resources for supporting distance learning. IEEE Transactions on Learning Technologies, 2012, 5(2): 130–142
  7. Xu C, Zhang Y F, Zhu G, Rui Y, Lu H, Huang Q. Using webcast text for semantic event detection in broadcast sports video. IEEE Transactions on Multimedia, 2008, 10(7): 1342–1355
  8. Berners-Lee T, Hendler J, Lassila O. The semantic web. Scientific American, 2001, 284(5): 34–43
  9. Ma H, Zhu J, Lyu M R T, King I. Bridging the semantic gap between image contents and tags. IEEE Transactions on Multimedia, 2010, 12(5): 462–473
  10. Chen H T, Ahuja N. Exploiting nonlocal spatiotemporal structure for video segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2012, 741–748
  11. Javed K, Babri H, Saeed M. Feature selection based on class-dependent densities for high-dimensional binary data. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(3): 465–477
  12. Choi M, Torralba A, Willsky A. A tree-based context model for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(2): 240–252
  13. Luo X, Xu Z, Yu J, Chen X. Building association link network for semantic link on web resources. IEEE Transactions on Automation Science and Engineering, 2011, 8(3): 482–494
  14. Xu Z, Luo X, Wang L. Incremental building association link network. Computer Systems Science and Engineering, 2011, 26(3): 153–162
  15. Liu Y, Zhu Y, Ni L M, Xue G. A reliability-oriented transmission service in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(12): 2100–2107
  16. Liu Y, Zhang Q, Ni L M. Opportunity-based topology control in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2010, 21(3): 405–416
  17. Donderler M, Saykol E, Arslan U, Ulusoy O, Gudukbay U. BilVideo: design and implementation of a video database management system. Multimedia Tools and Applications, 2005, 27(1): 79–104
  18. Sevilmis T, Bastan M, Gudukbay U, Ulusoy O. Automatic detection of salient objects and spatial relations in videos for a video database system. Image and Vision Computing, 2008, 26(10): 1384–1396
  19. Fan J, Aref W G, Elmagarmid A K, Hacid M S, Marzouk M S, Zhu X. MultiView: multilevel video content representation and retrieval. Journal of Electronic Imaging, 2001, 10(4): 895–908
  20. Fan J, Elmagarmid A K, Zhu X, Aref W G, Wu L. ClassView: hierarchical video shot classification, indexing, and accessing. IEEE Transactions on Multimedia, 2004, 6(1): 70–86
  21. Bai L, Lao S, Jones G J, Smeaton A F. Video semantic content analysis based on ontology. In: Proceedings of the 11th International Machine Vision and Image Processing Conference. 2007, 117–124
  22. Nevatia R, Natarajan P. EDF: a framework for semantic annotation of video. In: Proceedings of the 10th IEEE International Conference on Computer Vision Workshops. 2005, 1876
  23. Bagdanov A D, Bertini M, Del Bimbo A, Torniai C, Serra G. Semantic annotation and retrieval of video events using multimedia ontologies. In: Proceedings of IEEE International Conference on Semantic Computing. 2007, 713–720
  24. Francois A R, Nevatia R, Hobbs J, Bolles R, Smith J R. VERL: an ontology framework for representing and annotating video events. IEEE Multimedia, 2005, 12(4): 76–86
  25. Akdemir U, Turaga P, Chellappa R. An ontology based approach for activity recognition from video. In: Proceedings of the ACM International Conference on Multimedia. 2008, 709–712
  26. Marszalek M, Schmid C. Semantic hierarchies for visual object recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2007, 1–7
  27. Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2009, 248–255
  28. Yao B, Yang X, Lin L, Lee M W, Zhu S C. I2T: image parsing to text description. Proceedings of the IEEE, 2010, 98(8): 1485–1508
  29. Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2008, 1–8
  30. Felzenszwalb P, Girshick R, McAllester D, Ramanan D. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627–1645
  31. Felzenszwalb P F, Girshick R B, McAllester D. Cascade object detection with deformable part models. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2010, 2241–2248
  32. Chen N, Zhou Q Y, Prasanna V. Understanding web image by object relation network. In: Proceedings of the 21st International Conference on World Wide Web. 2012, 291–300
  33. Kulkarni G, Premraj V, Dhar S, Li S, Choi Y, Berg A C, Berg T L. Baby talk: understanding and generating image descriptions. In: Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition. 2011
  34. Qi G J, Aggarwal C, Huang T. Towards semantic knowledge propagation from text corpus to web images. In: Proceedings of the 20th International Conference on World Wide Web. 2011, 297–306
Authors
  • Zheng Xu received his degrees from the School of Computer Engineering and Science, Shanghai University, China in 2007 and 2012, respectively. He is currently working in the Third Research Institute of the Ministry of Public Security and pursuing his postdoctoral research at Tsinghua University, China. His current research interests include intelligent surveillance systems, big data, and crowdsourcing.
  • Yunhuai Liu is a professor in the Third Research Institute of the Ministry of Public Security, China. He received his PhD from the Hong Kong University of Science and Technology (HKUST), China in 2008. His main research interests include wireless sensor networks, pervasive computing, and wireless networks.