
Deep multimodal hashing with orthogonal regularization

IJCAI, pp. 2291–2297 (2015)


Abstract

Hashing is an important method for performing efficient similarity search. With the explosive growth of multimodal data, how to learn hashing-based compact representations for multimodal data becomes highly non-trivial. Compared with shallow-structured models, deep models present superiority in capturing multimodal correlations due to the…

Introduction
  • People have generated huge volumes of multimodal content on the Internet, such as texts, videos and images.
  • Recommendation systems aim to find preferred multimodal items for users, and image search systems aim to retrieve images for text queries.
  • Among these important applications, multimodal search, which integrates multimodal information for similarity search, is a fundamental problem.
  • The fundamental problem of multimodal hashing is to capture the correlation of multiple modalities in order to learn compact binary hash codes (a minimal sketch of Hamming-space retrieval with such codes follows this list).
  • It is difficult for shallow models to learn such a high-level correlation [Bengio, 2009].
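The retrieval side of any such hashing method works the same way: items are encoded as short binary strings and ranked by Hamming distance to the query code. Below is a minimal, illustrative sketch of this step, not the authors' code; the function name, code length and database size are made up for illustration.

```python
# Minimal sketch of hashing-based similarity search: rank database items
# by Hamming distance between their binary codes and the query code.
import numpy as np

def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Return database indices sorted by Hamming distance to the query.

    query_code: (n_bits,) array of 0/1
    db_codes:   (n_items, n_bits) array of 0/1
    """
    # Counting differing bits between the query and each database code.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.integers(0, 2, size=(1000, 16))   # 1000 items, 16-bit codes (illustrative)
    q = rng.integers(0, 2, size=16)            # a query code
    print("nearest items by Hamming distance:", hamming_rank(q, db)[:10])
```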
Highlights
  • Nowadays, people have generated huge volumes of multimodal content on the Internet, such as texts, videos and images.
  • To address the above problems, we propose a Deep Multimodal Hashing model with Orthogonal Regularization for mapping multimodal data into a common Hamming space (a sketch of such an orthogonality penalty is given after this list).
  • As the correlation of multiple modalities exists in the high-level space, we propose a multimodal Deep Belief Network, as shown in Figure 1(a).
  • The results show that Deep Multimodal Hashing with Orthogonal Regularization outperforms Cross-modality AE, which demonstrates that removing redundancy from the hash codes is critical in improving the performance.
  • We propose a novel Deep Multimodal Hashing with Orthogonal Regularization (DMHOR) for performing similarity search on multimodal data.
  • Experimental results demonstrate a substantial gain of our method compared with the state of the art on two widely used public datasets.
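The exact objective of DMHOR is given in the paper. As a hedged illustration of what an orthogonal regularizer on the code layer can look like, the sketch below uses the common soft penalty ||W^T W - I||_F^2 on a projection matrix W, which pushes hash bits to be decorrelated (i.e., non-redundant). The function names, the matrix W and the coefficient `lam` are assumptions for illustration, not the authors' exact formulation.

```python
# Hedged sketch of an orthogonality penalty of the form ||W^T W - I||_F^2,
# a common way to encourage decorrelated output units. This illustrates the
# idea of orthogonal regularization, not DMHOR's exact loss.
import numpy as np

def orthogonal_penalty(W: np.ndarray) -> float:
    """W: (input_dim, n_bits) projection matrix of the code layer."""
    gram = W.T @ W                                 # (n_bits, n_bits) bit-by-bit correlations
    identity = np.eye(W.shape[1])
    return float(np.sum((gram - identity) ** 2))   # squared Frobenius norm

def orthogonal_penalty_grad(W: np.ndarray) -> np.ndarray:
    """Gradient of the penalty w.r.t. W, usable in gradient-based training."""
    return 4.0 * W @ (W.T @ W - np.eye(W.shape[1]))

# Usage (illustrative): add `lam * orthogonal_penalty(W)` to the training loss
# and `lam * orthogonal_penalty_grad(W)` to the corresponding weight gradient.
```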
Methods
  • Table 3 reports MAP on WIKI and NUS-WIDE with varying code lengths (8 to 32 bits), comparing DMHOR against the baselines Bimodal-DBN, DMVH, CHMIS, CVH and PDH (a sketch of the MAP computation is given after this list).
  • The proposed orthogonal regularization method can well address the redundancy problem.
  • When the length of the codes increases, the performance of DMHOR improves more significantly than that of the baseline methods.
  • The reason is that the method effectively reduces redundant information, so the authors can make use of the additional bits to represent more useful information.
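MAP (mean average precision) is the metric reported in Table 3. A minimal sketch of how MAP can be computed from ranked retrieval lists is given below; it is illustrative only and is not the paper's evaluation script, and the example rankings and relevance sets are made up.

```python
# Minimal sketch of mean average precision (MAP) over ranked retrieval lists.
# `rankings` holds, for each query, database indices sorted by (e.g. Hamming)
# distance; `relevant` holds the set of ground-truth relevant indices per query.
import numpy as np

def average_precision(ranking, relevant_set):
    hits, precisions = 0, []
    for rank, idx in enumerate(ranking, start=1):
        if idx in relevant_set:
            hits += 1
            precisions.append(hits / rank)      # precision at each relevant hit
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(rankings, relevant):
    return float(np.mean([average_precision(r, s) for r, s in zip(rankings, relevant)]))

# Example: two queries over a 5-item database (made-up data).
print(mean_average_precision(
    rankings=[[3, 1, 4, 0, 2], [2, 0, 1, 3, 4]],
    relevant=[{1, 4}, {2}],
))
```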
Results
  • Table 3 shows the MAP when the number of hashing bits is varied in {8, 12, 16, 20, 32} on both datasets.

    From these comparison results, some observations and analyses follow:

    The deep-structured models consistently and clearly outperform the shallow-structured models, which demonstrates that multimodal correlations cannot be well captured by shallow-structured models, and that deep models have a clear advantage in this respect due to their intrinsic nonlinearity.

    The results show that DMHOR outperforms Cross-modality AE, which demonstrates that removing redundancy from the hash codes is critical in improving the performance.
Conclusion
  • The authors propose a novel Deep Multimodal Hashing with Orthogonal Regularization (DMHOR) for performing similarity search on multimodal data.
  • The proposed model with orthogonal regularization solves the redundancy problem.
  • The authors' strategy of applying different numbers of layers to different modalities yields a more precise representation and a more compact learning process.
  • Experimental results demonstrate a substantial gain of the method compared with the state of the art on two widely used public datasets.
  • The authors' future work will aim at automatically determining the ideal number of layers for different modalities.
Tables
  • Table 1: Terms and notations
  • Table 2: Number of units on NUS-WIDE and WIKI
  • Table 3: MAP on WIKI and NUS-WIDE with varying length of hash codes
  • Table 4: MAP for WIKI with varying orthogonality constraints with 16-bit codes
Related Work
  • In recent years, hashing methods have achieved great success in many real-world applications because of their advantages in search efficiency and storage requirements. In general, there are two main ways for hashing to generate hash codes: data-independent and data-dependent.

    Data-independent hashing methods often generate random projections as hash functions. Locality Sensitive Hashing (LSH) [Datar et al., 2004] is one of the most well-known representatives. It uses a set of random locality-sensitive hash functions to map examples to hash codes. Further improvements such as multi-probe LSH [Lv et al., 2007] have been proposed, but the performance is still limited by the random projection technique. Data-dependent hashing methods were then proposed; they use machine learning to exploit the distribution of the data and improve retrieval quality. Spectral Hashing [Weiss et al., 2008] is a representative example. Other hashing methods have since been proposed, including shallow-structured methods [Norouzi and Blei, 2011; Liu et al., 2012; Wang et al., 2010] and deep-learning-based methods [Salakhutdinov and Hinton, 2009; Xia et al., 2014].
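As a concrete illustration of the data-independent family, the p-stable scheme of [Datar et al., 2004] hashes a vector x as h(x) = floor((a·x + b)/w), where a has i.i.d. Gaussian (2-stable) entries, b is uniform in [0, w), and w is a bucket width. The sketch below is a minimal illustration; the class name, the number of tables, and the value of w are assumptions, not a production LSH index.

```python
# Minimal sketch of a data-independent LSH function in the style of
# [Datar et al., 2004]: h(x) = floor((a . x + b) / w) with Gaussian a
# (2-stable), so points that are close under L2 distance tend to collide.
import numpy as np

class PStableLSH:
    def __init__(self, dim: int, n_tables: int = 8, w: float = 4.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_tables, dim))        # random projection directions
        self.b = rng.uniform(0.0, w, size=n_tables)      # random offsets in [0, w)
        self.w = w

    def hash(self, x: np.ndarray) -> np.ndarray:
        """Return one integer bucket id per hash table for vector x."""
        return np.floor((self.a @ x + self.b) / self.w).astype(int)

# Nearby vectors land in the same buckets more often than distant ones.
lsh = PStableLSH(dim=64)
x = np.random.default_rng(1).normal(size=64)
print(lsh.hash(x), lsh.hash(x + 0.01))   # a small perturbation keeps most buckets equal
```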
Funding
  • This work was supported in part by the National Basic Research Program of China under Grant No. 2015CB352300, and by the National Natural Science Foundation of China under Grants No. 61370022 and No. 61210008.
  • The authors also thank the NExT Research Center, funded by MDA, Singapore, under research grant WBS:R-252-300-001-490, for its support.
References
  • [Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
  • [Blei et al., 2003] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
  • [Bronstein et al., 2010] Michael M Bronstein, Alexander M Bronstein, Fabrice Michel, and Nikos Paragios. Data fusion through cross-modality metric learning using similarity-sensitive hashing. In CVPR, pages 3594–3601. IEEE, 2010.
  • [Chua et al., 2009] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yantao Zheng. NUS-WIDE: a real-world web image database from National University of Singapore. In CIVR, page 48. ACM, 2009.
  • [Datar et al., 2004] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SOCG, pages 253–262. ACM, 2004.
  • [Erhan et al., 2010] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? JMLR, 11:625–660, 2010.
  • [Feng et al., 2014] Fangxiang Feng, Xiaojie Wang, and Ruifan Li. Cross-modal retrieval with correspondence autoencoder. In ACM MM, pages 7–16. ACM, 2014.
  • [Hinton et al., 2006] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
  • [Hinton et al., 2012] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
  • [Hinton, 2002] Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
  • [Kang et al., 2012] Yoonseop Kang, Saehoon Kim, and Seungjin Choi. Deep learning to hash with multiple representations. In ICDM, pages 930–935, 2012.
  • [Kumar and Udupa, 2011] Shaishav Kumar and Raghavendra Udupa. Learning hash functions for cross-view similarity search. In IJCAI, volume 22, page 1360, 2011.
  • [Liu et al., 2012] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, and Shih-Fu Chang. Supervised hashing with kernels. In CVPR, pages 2074–2081. IEEE, 2012.
  • [Lowe, 1999] David G Lowe. Object recognition from local scale-invariant features. In ICCV, volume 2, pages 1150–1157, 1999.
  • [Lv et al., 2007] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In VLDB, pages 950–961, 2007.
  • [Ngiam et al., 2011a] Jiquan Ngiam, Adam Coates, Ahbik Lahiri, Bobby Prochnow, Quoc V Le, and Andrew Y Ng. On optimization methods for deep learning. In ICML, pages 265–272, 2011.
  • [Ngiam et al., 2011b] Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y Ng. Multimodal deep learning. In ICML, pages 689–696, 2011.
  • [Norouzi and Blei, 2011] Mohammad Norouzi and David M Blei. Minimal loss hashing for compact binary codes. In ICML, pages 353–360, 2011.
  • [Ou et al., 2013] Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu, and Shiqiang Yang. Comparing apples to oranges: a scalable solution with heterogeneous hashing. In SIGKDD, pages 230–238. ACM, 2013.
  • [Ou et al., 2015] Mingdong Ou, Peng Cui, Jun Wang, Fei Wang, and Wenwu Zhu. Probabilistic attributed hashing. In AAAI, 2015.
  • [Rasiwasia et al., 2010] Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert RG Lanckriet, Roger Levy, and Nuno Vasconcelos. A new approach to cross-modal multimedia retrieval. In ACM MM, pages 251–260. ACM, 2010.
  • [Rastegari et al., 2013] Mohammad Rastegari, Jonghyun Choi, Shobeir Fakhraei, Hal Daumé III, and Larry Davis. Predictable dual-view hashing. In ICML, pages 1328–1336, 2013.
  • [Salakhutdinov and Hinton, 2009] Ruslan Salakhutdinov and Geoffrey Hinton. Semantic hashing. IJAR, 50(7):969–978, 2009.
  • [Song et al., 2013] Jingkuan Song, Yang Yang, Yi Yang, Zi Huang, and Heng Tao Shen. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In SIGMOD, pages 785–796. ACM, 2013.
  • [Srivastava and Salakhutdinov, 2012a] Nitish Srivastava and Ruslan Salakhutdinov. Learning representations for multimodal data with deep belief nets. In ICML Workshop, 2012.
  • [Srivastava and Salakhutdinov, 2012b] Nitish Srivastava and Ruslan Salakhutdinov. Multimodal learning with deep Boltzmann machines. In NIPS, pages 2231–2239, 2012.
  • [Wang et al., 2010] Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. Semi-supervised hashing for scalable image retrieval. In CVPR, pages 3424–3431. IEEE, 2010.
  • [Wang et al., 2014a] Wei Wang, Beng Chin Ooi, Xiaoyan Yang, Dongxiang Zhang, and Yueting Zhuang. Effective multi-modal retrieval based on stacked auto-encoders. Proceedings of the VLDB Endowment, 7(8), 2014.
  • [Wang et al., 2014b] Zhiyu Wang, Peng Cui, Fangtao Li, Edward Chang, and Shiqiang Yang. A data-driven study of image feature extraction and fusion. Information Sciences, 281:536–558, 2014.
  • [Weiss et al., 2008] Yair Weiss, Antonio Torralba, and Robert Fergus. Spectral hashing. In NIPS, volume 9, page 6, 2008.
  • [Welling et al., 2004] Max Welling, Michal Rosen-Zvi, and Geoffrey E Hinton. Exponential family harmoniums with an application to information retrieval. In NIPS, pages 1481–1488, 2004.
  • [Xia et al., 2014] Rongkai Xia, Yan Pan, Hanjiang Lai, Cong Liu, and Shuicheng Yan. Supervised hashing for image retrieval via image representation learning. In AAAI, 2014.
  • [Zhai et al., 2013] Deming Zhai, Hong Chang, Yi Zhen, Xianming Liu, Xilin Chen, and Wen Gao. Parametric local multimodal hashing for cross-view similarity search. In IJCAI, 2013.
  • [Zhang and Li, 2014] Dongqing Zhang and Wu-Jun Li. Large-scale supervised multimodal hashing with semantic correlation maximization. In AAAI, 2014.
  • [Zhang et al., 2011] Dan Zhang, Fei Wang, and Luo Si. Composite hashing with multiple information sources. In SIGIR, pages 225–234. ACM, 2011.
  • [Zhen and Yeung, 2012] Yi Zhen and Dit-Yan Yeung. A probabilistic model for multimodal hash function learning. In SIGKDD, pages 940–948. ACM, 2012.