Big Data Deep Learning: Challenges and Perspectives
IEEE Access, (2014): 514-525
Deep learning is currently an extremely active research area in the machine learning and pattern recognition communities. It has gained huge success in a broad range of applications such as speech recognition, computer vision, and natural language processing. With the sheer size of data available today, big data brings big opportunities and tran…
- INTRODUCTION: Deep learning and Big Data are two of the hottest trends in the rapidly growing digital world.
- A deep belief network (DBN) uses a deep architecture that is capable of learning feature representations from both the labeled and unlabeled data presented to it.
- Several algorithmic approaches have been explored for large-scale learning: for example, locally connected networks, improved optimizers, and new structures that can be implemented in parallel.
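DBN feature learning is conventionally done by greedy layer-wise training of restricted Boltzmann machines (RBMs) with contrastive divergence. As an illustrative sketch only (not the implementation evaluated in the paper; all names here are our own), a CD-1 update for a binary RBM can be written in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) step for a binary RBM.

    W: (n_visible, n_hidden) weights; v0: (n_samples, n_visible) data batch.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: hidden activations driven by the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```

Stacking RBMs trained this way, each layer on the hidden activations of the one below, yields the greedy layer-wise DBN pretraining; the matrix products above are exactly the operations that map naturally onto GPU hardware.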
- Experimental results show that with 45 million parameters in an RBM and one million examples, the GPU-based implementation increases the speed of deep belief network learning by a factor of up to 70, compared with a dual-core CPU implementation.
- Large-scale convolutional neural network learning is often implemented on GPUs with several hundred parallel processing cores.
- For parallelizing forward propagation, one or more blocks are assigned to each feature map, depending on the size of the maps.
- Each thread in a block is devoted to a single neuron in a map.
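The block-per-feature-map, thread-per-neuron mapping described above can be mimicked sequentially; in this hypothetical NumPy sketch (our own illustration, not code from the paper), the outer loop plays the role of GPU blocks and each independent output pixel the role of a thread:

```python
import numpy as np

def conv_forward(x, kernels, stride=1):
    """Valid 2-D convolution of one input map with a bank of kernels.

    x: (H, W) input; kernels: (n_maps, kH, kW). On a GPU, the loop over
    maps corresponds to one block per feature map, and every (i, j) output
    neuron, being independent of the others, corresponds to one thread.
    """
    n_maps, kH, kW = kernels.shape
    H, W = x.shape
    oH = (H - kH) // stride + 1
    oW = (W - kW) // stride + 1
    out = np.zeros((n_maps, oH, oW))
    for m in range(n_maps):          # "block": one feature map
        for i in range(oH):          # "thread": one output neuron
            for j in range(oW):
                patch = x[i * stride:i * stride + kH, j * stride:j * stride + kW]
                out[m, i, j] = np.sum(patch * kernels[m])
    return out
```

Because no output neuron depends on another, all of them can be computed concurrently, which is why this layer parallelizes so well on hundreds of cores.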
- The use of massive computing power to speed up the training process has shown significant potential in Big Data deep learning.
- A. LARGE-SCALE DEEP BELIEF NETWORKS: Raina et al. proposed a GPU-based framework for massively parallelizing unsupervised learning models, including DBNs and sparse coding.
- B. LARGE-SCALE CONVOLUTIONAL NEURAL NETWORKS: CNNs are a type of locally connected deep learning method.
- C. COMBINATION OF DATA- AND MODEL-PARALLEL SCHEMES: DistBelief is a software framework recently designed for distributed training and learning of deep networks with very large models and large-scale data sets.
- For large-scale data with high dimensionality, deep learning often involves many densely connected layers with a large number of free parameters.
- This very-large-scale deep learning system is capable of training models with more than 11 billion parameters, the largest reported so far, using far fewer machines.
- Data and models are divided into blocks that fit in memory, so forward and backward propagation can be implemented effectively in parallel; nevertheless, deep learning algorithms are not trivially parallel.
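The data-parallel half of this scheme can be sketched in a few lines: each model replica computes a gradient on its own shard of the batch, and a parameter server aggregates the results. A toy example for linear least squares (our own illustration, not DistBelief code):

```python
import numpy as np

def full_gradient(w, X, y):
    """Gradient of the mean squared error ||Xw - y||^2 / (2n) for linear regression."""
    return X.T @ (X @ w - y) / len(y)

def data_parallel_gradient(w, X, y, n_replicas):
    """Split the batch across replicas: each computes its shard's gradient
    (as a worker holding a model replica would), and the parameter server
    sums the unnormalized gradients and divides by the total example count."""
    grads, total = [], 0
    for Xs, ys in zip(np.array_split(X, n_replicas), np.array_split(y, n_replicas)):
        grads.append(Xs.T @ (Xs @ w - ys))   # computed on one worker
        total += len(ys)
    return sum(grads) / total                # aggregation at the server
```

For this loss, the sharded computation reproduces the full-batch gradient exactly; in practice DistBelief applies updates asynchronously (Downpour SGD), so replicas work with slightly stale parameters in exchange for throughput.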
- To build the future deep learning system scalable to Big Data, one needs to develop high performance computing infrastructure-based systems together with theoretically sound parallel learning algorithms or novel architectures.
- Deep learning can leverage both the high variety and velocity of Big Data through transfer learning or domain adaptation, where training and test data may be sampled from different distributions.
- Glorot et al. implemented a deep architecture based on stacked denoising autoencoders for domain adaptation: an unsupervised representation is trained on a large amount of unlabeled data from a set of domains and then used to train a classifier with few labeled examples from a single domain.
- Bengio applied deep learning of multiple levels of representation to transfer learning, where the training examples may not represent the test data well.
- Big Data presents significant challenges to deep learning, including large scale, heterogeneity, noisy labels, and non-stationary distribution, among many others.
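The stacked-denoising-autoencoder approach rests on a simple building block: corrupt the input, then learn to reconstruct the clean version. Below is a minimal single-layer sketch with tied weights and a linear decoder (an illustration under our own simplifying assumptions, not Glorot et al.'s implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_denoising_ae(X, n_hidden, noise=0.3, lr=0.5, epochs=1000):
    """Single-layer denoising autoencoder with tied weights: mask a fraction
    of the inputs, then learn to reconstruct the clean version (squared error)."""
    n_vis = X.shape[1]
    W = rng.normal(0.0, 0.1, (n_vis, n_hidden))
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)   # masking corruption
        H = sigmoid(X_noisy @ W)                      # encoder
        R = H @ W.T                                   # linear decoder, tied weights
        err = R - X                                   # target is the *clean* input
        dW_dec = err.T @ H                            # gradient through the decoder
        dW_enc = X_noisy.T @ (err @ W * H * (1 - H))  # gradient through the encoder
        W -= lr * (dW_dec + dW_enc) / len(X)
    return W

def encode(X, W):
    """Learned representation, used as features for a downstream classifier."""
    return sigmoid(X @ W)
```

Stacking such layers trained on unlabeled data from many domains, then fitting a classifier on `encode(X, W)` with the few labeled target-domain examples, reproduces the domain-adaptation pipeline sketched above.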
- Table 1: Summary of recent research progress in large-scale deep learning
- National Security Agency. The National Security Agency: Missions, Authorities, Oversight and Partnerships [Online]. Available: http://www.nsa.gov/public_info/_files/speeches_testimonies/2013_08_09 _the_nsa_story.pdf
- J. Gantz and D. Reinsel, Extracting Value from Chaos. Hopkinton, MA, USA: EMC, Jun. 2011.
- J. Gantz and D. Reinsel, The Digital Universe Decade—Are You Ready. Hopkinton, MA, USA: EMC, May 2010.
- (2011, May). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute [Online]. Available: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
- J. Lin and A. Kolcz, ‘‘Large-scale machine learning at twitter,’’ in Proc. ACM SIGMOD, Scottsdale, Arizona, USA, 2012, pp. 793–804.
- A. Smola and S. Narayanamurthy, ‘‘An architecture for parallel topic models,’’ Proc. VLDB Endowment, vol. 3, no. 1, pp. 703–710, 2010.
- A. Ng et al., ‘‘Map-reduce for machine learning on multicore,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 19. 2006, pp. 281–288.
- B. Panda, J. Herbach, S. Basu, and R. Bayardo, ‘‘MapReduce and its application to massively parallel learning of decision tree ensembles,’’ in Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge, U.K.: Cambridge Univ. Press, 2012.
- E. Crego, G. Munoz, and F. Islam. (2013, Dec. 8). Big data and deep learning: Big deals or big delusions? Business [Online]. Available: http://www.huffingtonpost.com/george-munoz-frank-islamand-ed-crego/big-data-and-deep-learnin_b_3325352.html
- Y. Bengio and S. Bengio, ‘‘Modeling high-dimensional discrete data with multi-layer neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 12. 2000, pp. 400–406.
- M. Ranzato, Y.-L. Boureau, and Y. LeCun, ‘‘Sparse feature learning for deep belief networks,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 20. 2007, pp. 1185–1192.
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, ‘‘Context-dependent pretrained deep neural networks for large-vocabulary speech recognition,’’ IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30–41, Jan. 2012.
- G. Hinton et al., ‘‘Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,’’ IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
- R. Salakhutdinov, A. Mnih, and G. Hinton, ‘‘Restricted Boltzmann machines for collaborative filtering,’’ in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 791–798.
- D. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, ‘‘Deep, big, simple neural nets for handwritten digit recognition,’’ Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
- M. Zeiler, G. Taylor, and R. Fergus, ‘‘Adaptive deconvolutional networks for mid and high level feature learning,’’ in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 2018–2025.
- A. Efrati. (2013, Dec. 11). How ‘deep learning’ works at Apple, beyond. Information [Online]. Available: https://www.theinformation.com/How-Deep-Learning-Works-at-Apple-Beyond
- N. Jones, ‘‘Computer science: The learning machines,’’ Nature, vol. 505, no. 7482, pp. 146–148, 2014.
- Y. Wang, D. Yu, Y. Ju, and A. Acero, ‘‘Voice search,’’ in Language Understanding: Systems for Extracting Semantic Information From Speech, G. Tur and R. De Mori, Eds. New York, NY, USA: Wiley, 2011, ch. 5.
- J. Kirk. (2013, Oct. 1). Universities, IBM join forces to build a brain-like computer. PCWorld [Online]. Available: http://www.pcworld.com/article/2051501/universities-join-ibm-in-cognitive-computing-researchproject.html
- G. Hinton and R. Salakhutdinov, ‘‘Reducing the dimensionality of data with neural networks,’’ Science, vol. 313, no. 5786, pp. 504–507, 2006.
- Y. Bengio, ‘‘Learning deep architectures for AI,’’ Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
- V. Nair and G. Hinton, ‘‘3D object recognition with deep belief nets,’’ in Proc. Adv. NIPS, vol. 22. 2009, pp. 1339–1347.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learning applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
- R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, ‘‘Natural language processing almost from scratch,’’ J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Nov. 2011.
- P. Le Callet, C. Viard-Gaudin, and D. Barba, ‘‘A convolutional neural network approach for objective video quality assessment,’’ IEEE Trans. Neural Netw., vol. 17, no. 5, pp. 1316–1327, Sep. 2006.
- D. Rumelhart, G. Hinton, and R. Williams, ‘‘Learning representations by back-propagating errors,’’ Nature, vol. 323, pp. 533–536, Oct. 1986.
- G. Hinton, ‘‘A practical guide to training restricted Boltzmann machines,’’ Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Tech. Rep. UTML TR 2010-003, 2010.
- G. Hinton, S. Osindero, and Y. Teh, ‘‘A fast learning algorithm for deep belief nets,’’ Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
- Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, ‘‘Greedy layerwise training of deep networks,’’ in Proc. Neural Inf. Process. Syst., 2006, pp. 153–160.
- G. Hinton, ‘‘Training products of experts by minimizing contrastive divergence,’’ Neural Comput., vol. 14, no. 8, pp. 1771–1800, 2002.
- P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol ‘‘Extracting and composing robust features with denoising autoencoders,’’ in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 1096–1103.
- H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, ‘‘Exploring strategies for training deep neural networks,’’ J. Mach. Learn. Res., vol. 10, pp. 1–40, Jan. 2009.
- H. Lee, A. Battle, R. Raina, and A. Ng, ‘‘Efficient sparse coding algorithms,’’ in Proc. Neural Inf. Process. Syst., 2006, pp. 801–808.
- F. Seide, G. Li, and D. Yu, ‘‘Conversational speech transcription using context-dependent deep neural networks,’’ in Proc. Interspeech, 2011, pp. 437–440.
- D. C. Cireşan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, ‘‘Flexible, high performance convolutional neural networks for image classification,’’ in Proc. 22nd Int. Conf. Artif. Intell., 2011, pp. 1237–1242.
- D. Scherer, A. Müller, and S. Behnke, ‘‘Evaluation of pooling operations in convolutional architectures for object recognition,’’ in Proc. Int. Conf. Artif. Neural Netw., 2010, pp. 92–101.
- Y. LeCun, L. Bottou, G. Orr, and K. Muller, ‘‘Efficient backprop,’’ in Neural Networks: Tricks of the Trade, G. Orr and K. Muller, Eds. New York, NY, USA: Springer-Verlag, 1998.
- K. Kavukcuoglu, M. A. Ranzato, R. Fergus, and Y. LeCun, ‘‘Learning invariant features through topographic filter maps,’’ in Proc. Int. Conf. CVPR, 2009, pp. 1605–1612.
- D. Hubel and T. Wiesel, ‘‘Receptive fields and functional architecture of monkey striate cortex,’’ J. Physiol., vol. 195, pp. 215–243, Mar. 1968.
- R. Raina, A. Madhavan, and A. Ng, ‘‘Large-scale deep unsupervised learning using graphics processors,’’ in Proc. 26th Int. Conf. Mach. Learn., Montreal, QC, Canada, 2009, pp. 873–880.
- J. Martens, ‘‘Deep learning via Hessian-free optimization,’’ in Proc. 27th Int. Conf. Mach. Learn., 2010.
- K. Zhang and X. Chen, ‘‘Large-scale deep belief nets with MapReduce,’’ IEEE Access, vol. 2, pp. 395–403, Apr. 2014.
- L. Deng, D. Yu, and J. Platt, ‘‘Scalable stacking and learning for building deep architectures,’’ in Proc. IEEE ICASSP, Mar. 2012, pp. 2133–2136.
- B. Hutchinson, L. Deng, and D. Yu, ‘‘Tensor deep stacking networks,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1944–1957, Aug. 2013.
- V. Vanhoucke, A. Senior, and M. Mao, ‘‘Improving the speed of neural networks on CPUs,’’ in Proc. Deep Learn. Unsupervised Feature Learn. Workshop, 2011.
- A. Krizhevsky, ‘‘Learning multiple layers of features from tiny images,’’ Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
- C. Farabet et al., ‘‘Large-scale FPGA-based convolutional networks,’’ in Machine Learning on Very Large Data Sets, R. Bekkerman, M. Bilenko, and J. Langford, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2011.
- CUDA C Programming Guide, PG-02829-001_v5.5, NVIDIA Corporation, Santa Clara, CA, USA, Jul. 2013.
- Q. Le et al., ‘‘Building high-level features using large scale unsupervised learning,’’ in Proc. Int. Conf. Mach. Learn., 2012.
- M. Ranzato and M. Szummer, ‘‘Semi-supervised learning of compact document representations with deep networks,’’ in Proc. Int. Conf. Mach. Learn., 2008, pp. 792–799.
- S. Geman and D. Geman, ‘‘Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 6, pp. 721–741, Nov. 1984.
- G. Casella and E. George, ‘‘Explaining the Gibbs sampler,’’ Amer. Statist., vol. 46, no. 3, pp. 167–174, 1992.
- P. Simard, D. Steinkraus, and J. Platt, ‘‘Best practices for convolutional neural networks applied to visual document analysis,’’ in Proc. 7th ICDAR, 2003, pp. 958–963.
- A. Krizhevsky, I. Sutskever, and G. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. NIPS, 2012, pp. 1106–1114.
- J. Dean et al., ‘‘Large scale distributed deep networks,’’ in Proc. Adv. NIPS, 2012, pp. 1232–1240.
- J. Duchi, E. Hazan, and Y. Singer, ‘‘Adaptive subgradient methods for online learning and stochastic optimization,’’ J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Jul. 2011.
- A. Coates, B. Huval, T. Wang, D. Wu, and A. Ng, ‘‘Deep learning with COTS HPC systems,’’ J. Mach. Learn. Res., vol. 28, no. 3, pp. 1337–1345, 2013.
- S. Tomov, R. Nath, P. Du, and J. Dongarra. (2011). MAGMA users guide. ICL, Univ. Tennessee, Knoxville, TN, USA [Online]. Available: http://icl.cs.utk.edu/magma
- (2012). Obama Administration Unveils ‘Big Data’ Initiative: Announces $200 Million in New R&D Investments. Office of Science and Technology Policy, Executive Office of the President, Washington, DC, USA [Online]. Available: http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
- K. Haberlin, B. McGilpin, and C. Ouellette. Governor Patrick Announces New Initiative to Strengthen Massachusetts’ Position as a World Leader in Big Data. Commonwealth of Massachusetts [Online]. Available: http://www.mass.gov/governor/pressoffice/pressreleases/2012/2012530-governor-announces-big-data-initiative.html
- Fact Sheet: Brain Initiative, Office of the Press Secretary, The White House, Washington, DC, USA, 2013.
- D. Laney, The Importance of ‘Big Data’: A Definition. Stamford, CT, USA: Gartner, 2012.
- A. Torralba, R. Fergus, and W. Freeman, ‘‘80 million tiny images: A large data set for nonparametric object and scene recognition,’’ IEEE Trans. Softw. Eng., vol. 30, no. 11, pp. 1958–1970, Nov. 2008.
- J. Wang and X. Shen, ‘‘Large margin semi-supervised learning,’’ J. Mach. Learn. Res., vol. 8, no. 8, pp. 1867–1891, 2007.
- J. Weston, F. Ratle, and R. Collobert, ‘‘Deep learning via semi-supervised embedding,’’ in Proc. 25th Int. Conf. Mach. Learn., Helsinki, Finland, 2008.
- K. Sinha and M. Belkin, ‘‘Semi-supervised learning using sparse eigenfunction bases,’’ in Proc. Adv. NIPS, 2009, pp. 1687–1695.
- R. Fergus, Y. Weiss, and A. Torralba, ‘‘Semi-supervised learning in gigantic image collections,’’ in Proc. Adv. NIPS, 2009, pp. 522–530.
- J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Ng, ‘‘Multimodal deep learning,’’ in Proc. 28th Int. Conf. Mach. Learn., Bellevue, WA, USA, 2011.
- N. Srivastava and R. Salakhutdinov, ‘‘Multimodal learning with deep Boltzmann machines,’’ in Proc. Adv. NIPS, 2012.
- L. Bottou, ‘‘Online algorithms and stochastic approximations,’’ in On-Line Learning in Neural Networks, D. Saad, Ed. Cambridge, U.K.: Cambridge Univ. Press, 1998.
- A. Blum and C. Burch, ‘‘On-line learning and the metrical task system problem,’’ in Proc. 10th Annu. Conf. Comput. Learn. Theory, 1997, pp. 45–53.
- N. Cesa-Bianchi, Y. Freund, D. Helmbold, and M. Warmuth, ‘‘On-line prediction and conversation strategies,’’ in Proc. Conf. Comput. Learn. Theory (EuroCOLT), vol. 53. Oxford, U.K., 1994, pp. 205–216.
-  Y. Freund and R. Schapire, ‘‘Game theory, on-line prediction and boosting,’’ in Proc. 9th Annu. Conf. Comput. Learn. Theory, 1996, pp. 325–332.
-  N. Littlestone, P. M. Long, and M. K. Warmuth, ‘‘On-line learning of linear functions,’’ in Proc. 23rd Symp. Theory Comput., 1991, pp. 465–475.
-  S. Shalev-Shwartz, ‘‘Online learning and online convex optimization,’’ Found. Trends Mach. Learn., vol. 4, no. 2, pp. 107–194, 2012.
-  T. M. Heskes and B. Kappen, ‘‘On-line learning processes in artificial neural networks,’’ North-Holland Math. Library, vol. 51, pp. 199–233, 1993.
-  R. Marti and A. El-Fallahi, ‘‘Multilayer neural networks: An experimental evaluation of on-line training methods,’’ Comput. Operat. Res., vol. 31, no. 9, pp. 1491–1513, 2004.
-  C. P. Lim and R. F. Harrison, ‘‘Online pattern classification with multiple neural network systems: An experimental study,’’ IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 33, no. 2, pp. 235–247, May 2003.
-  M. Rattray and D. Saad, ‘‘Globally optimal on-line learning rules for multi-layer neural networks,’’ J. Phys. A, Math. General, vol. 30, no. 22, pp. L771–776, 1997.
-  P. Riegler and M. Biehl, ‘‘On-line backpropagation in two-layered neural networks,’’ J. Phys. A, vol. 28, no. 20, pp. L507–L513, 1995.
-  D. Saad and S. Solla, ‘‘Exact solution for on-line learning in multilayer neural networks,’’ Phys. Rev. Lett., vol. 74, no. 21, pp. 4337–4340, 1995.
-  A. West and D. Saad, ‘‘On-line learning with adaptive back-propagation in two-layer networks,’’ Phys. Rev. E, vol. 56, no. 3, pp. 3426–3445, 1997.
-  P. Campolucci, A. Uncini, F. Piazza, and B. Rao, ‘‘On-line learning algorithms for locally recurrent neural networks,’’ IEEE Trans. Neural Netw., vol. 10, no. 2, pp. 253–271, Mar. 1999.
-  N. Liang, G. Huang, P. Saratchandran, and N. Sundararajan, ‘‘A fast and accurate online sequential learning algorithm for feedforward networks,’’ IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1411–1423, Nov. 2006.
-  V. Ruiz de Angulo and C. Torras, ‘‘On-line learning with minimal degradation in feedforward networks,’’ IEEE Trans. Neural Netw., vol. 6, no. 3, pp. 657–668, May 1995.
-  M. Choy, D. Srinivasan, and R. Cheu, ‘‘Neural networks for continuous online learning and control,’’ IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1511–1531, Nov. 2006.
-  L. Bottou and O. Bousquet, ‘‘Stochastic gradient learning in neural networks,’’ in Proc. Neuro-Nîmes, 1991.
-  S. Shalev-Shwartz, Y. Singer, and N. Srebro, ‘‘Pegasos: Primal estimated sub-gradient solver for SVM,’’ in Proc. Int. Conf. Mach. Learn., 2007.
-  J. Chien and H. Hsieh, ‘‘Nonstationary source separation using sequential and variational Bayesian learning,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 5, pp. 681–694, May 2013.
-  M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation. Cambridge, MA, USA: MIT Press, Mar. 2012.
-  R. Elwell and R. Polikar, ‘‘Incremental learning in nonstationary environments with controlled forgetting,’’ in Proc. Int. Joint Conf. Neural Netw., 2009, pp. 771–778.
-  R. Elwell and R. Polikar, ‘‘Incremental learning of concept drift in nonstationary environments,’’ IEEE Trans. Neural Netw., vol. 22, no. 10, pp. 1517–1531, Oct. 2011.
-  C. Alippi and M. Roveri, ‘‘Just-in-time adaptive classifiers—Part I: Detecting nonstationary changes,’’ IEEE Trans. Neural Netw., vol. 19, no. 7, pp. 1145–1153, Jul. 2008.
-  C. Alippi and M. Roveri, ‘‘Just-in-time adaptive classifiers—Part II: Designing the classifier,’’ IEEE Trans. Neural Netw., vol. 19, no. 12, pp. 2053–2064, Dec. 2008.
-  L. Rutkowski, ‘‘Adaptive probabilistic neural networks for pattern classification in time-varying environment,’’ IEEE Trans. Neural Netw., vol. 15, no. 4, pp. 811–827, Jul. 2004.
-  W. de Oliveira, ‘‘The Rosenblatt Bayesian algorithm learning in a nonstationary environment,’’ IEEE Trans. Neural Netw., vol. 18, no. 2, pp. 584–588, Mar. 2007.
-  P. Bartlett, ‘‘Optimal online prediction in adversarial environments,’’ in Proc. 13th Int. Conf. DS, 2010, p. 371.
-  Y. Bengio, ‘‘Deep learning of representations for unsupervised and transfer learning,’’ J. Mach. Learn. Res., vol. 27, pp. 17–37, 2012.
-  X. Glorot, A. Bordes, and Y. Bengio, ‘‘Domain adaptation for large-scale sentiment classification: A deep learning approach,’’ in Proc. 28th Int. Conf. Mach. Learn., Bellevue, WA, USA, 2011.
-  G. Mesnil et al., ‘‘Unsupervised and transfer learning challenge: A deep learning approach,’’ J. Mach. Learn. Res., vol. 7, pp. 1–15, 2011.
-  S. J. Pan and Q. Yang, ‘‘A survey on transfer learning,’’ IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
-  S. Gutstein, O. Fuentes, and E. Freudenthal, ‘‘Knowledge transfer in deep convolutional neural nets,’’ Int. J. Artif. Intell. Tools, vol. 17, no. 3, pp. 555–567, 2008.
-  A. Blum and T. Mitchell, ‘‘Combining labeled and unlabeled data with co-training,’’ in Proc. 11th Annu. Conf. Comput. Learn. Theory, 1998, pp. 92–100.
-  R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, ‘‘Self-taught learning: Transfer learning from unlabeled data,’’ in Proc. 24th ICML, 2007.
-  S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, ‘‘Domain adaptation via transfer component analysis,’’ IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
-  G. Mesnil, S. Rifai, A. Bordes, X. Glorot, Y. Bengio, and P. Vincent, ‘‘Unsupervised and transfer learning under uncertainty: From object detections to scene categorization,’’ in Proc. ICPRAM, 2013, pp. 345–354.

XUE-WEN CHEN (M’00–SM’03) is currently a Professor and the Chair of the Department of Computer Science, Wayne State University, Detroit, MI, USA. He received the Ph.D. degree from Carnegie Mellon University, Pittsburgh, PA, USA, in 2001. He is currently serving as an Associate Editor or an Editorial Board Member for several international journals, including IEEE ACCESS, BMC Systems Biology, and the IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE. He served as a Conference Chair or Program Chair for a number of conferences, such as the 21st ACM Conference on Information and Knowledge Management in 2012 and the 10th IEEE International Conference on Machine Learning and Applications in 2011. He is a Senior Member of the IEEE Computer Society.
- XIAOTONG LIN is currently a Visiting Assistant Professor with the Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA. She received the Ph.D. degree from the University of Kansas, Lawrence, KS, USA, in 2012, and the M.Sc. degree from the University of Pittsburgh, Pittsburgh, PA, USA, in 1999. Her research interests include large scale machine learning, data mining, high-performance computing, and bioinformatics.