Label-Free Supervision of Neural Networks with Physics and Domain Knowledge

Russell Stewart, Stefano Ermon

AAAI Conference on Artificial Intelligence, 2017.


Abstract:

In many machine learning applications, labeled data is scarce and obtaining more labels is expensive. We introduce a new approach to supervising neural networks by specifying constraints that should hold over the output space, rather than direct examples of input-output pairs. These constraints are derived from prior domain knowledge, e.g., from known laws of physics.

Introduction
  • Applications of machine learning are often encumbered by the need for large amounts of labeled training data.
  • Neural networks have made large amounts of labeled data even more crucial to success (Krizhevsky, Sutskever, and Hinton 2012; LeCun, Bengio, and Hinton 2015).
  • The authors observe that humans are often able to learn without direct examples, opting instead for high level instructions for how a task should be performed, or what it will look like when completed.
  • The authors ask whether a similar principle can be applied to teaching machines; can the authors supervise networks without individual examples by instead describing only the structure of desired outputs?
Highlights
  • Applications of machine learning are often encumbered by the need for large amounts of labeled training data
  • We observe that humans are often able to learn without direct examples, opting instead for high level instructions for how a task should be performed, or what it will look like when completed
  • We explored options for incorporating constraints pertaining to dynamics equations in real world phenomena, i.e., prior knowledge derived from elementary physics
  • We presented a new strategy for incorporating domain knowledge in three computer vision tasks
  • We have introduced a new method for using physics and other domain constraints to supervise neural networks
Methods
  • The goal of the method is to train a network, f, mapping from inputs to outputs that the authors care about, without needing direct examples of those outputs.
  • In the first two experiments, the authors construct a mapping from an image to the location of an object it contains.
  • Learning is made possible by exploiting structure that holds in images over time.
  • The authors map an image to two boolean variables describing whether or not the image contains two special objects.
  • Learning exploits the unique causal semantics existing between these objects.
  • The authors provide labels only for the purpose of evaluation
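The temporal-structure idea above can be made concrete for the free-fall case: under known gravity, the predicted heights across frames must lie on a parabola, so a label-free loss can measure the residual after fitting only the remaining free parameters (initial height and velocity). Below is a minimal numpy sketch of such a constraint loss; the function name, time step, and units are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def freefall_constraint_loss(heights, dt=0.1, g=9.8):
    """Penalize per-frame height predictions that deviate from free fall.

    Subtract the known gravity term, then least-squares fit the remaining
    free parameters (initial height y0 and velocity v0). The loss is the
    mean squared residual: zero iff the predictions lie exactly on some
    parabola with acceleration g. (Illustrative sketch only.)
    """
    heights = np.asarray(heights, dtype=float)
    t = np.arange(len(heights)) * dt
    gravity_term = 0.5 * g * t ** 2
    # After removing gravity, the target must be linear in t: y0 + v0 * t.
    target = heights - gravity_term
    A = np.stack([np.ones_like(t), t], axis=1)  # design matrix [1, t]
    params, *_ = np.linalg.lstsq(A, target, rcond=None)
    fit = A @ params
    return float(np.mean((target - fit) ** 2))
```

Because the fit is over (y0, v0) only, the loss supervises the network without ever specifying the object's height in any individual frame.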
Results
  • The authors manually labeled the height of the falling objects in pixel space.
  • The authors trained a supervised network on the labels to directly predict the height of the object in pixels.
  • This network achieved a correlation of 94.5%; the task is somewhat easier, as it does not require the network to compensate for the object’s distance from the camera.
  • The experiment shows that sophisticated sufficiency conditions can be key to success when learning from constraints
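The correlation-based evaluation above can be sketched as follows; correlation is a natural score here because it sidesteps any fixed offset or scale between the learned output and the hand-labeled pixel heights. The function name is ours, not the paper's.

```python
import numpy as np

def pixel_height_correlation(predicted, labeled):
    """Pearson correlation between predicted and hand-labeled heights.

    Invariant to affine rescaling of the predictions, so it scores the
    constraint-trained network and the supervised baseline on equal
    footing. (Evaluation sketch; not the paper's exact code.)
    """
    predicted = np.asarray(predicted, dtype=float)
    labeled = np.asarray(labeled, dtype=float)
    return float(np.corrcoef(predicted, labeled)[0, 1])
```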
Conclusion
  • The authors have introduced a new method for using physics and other domain constraints to supervise neural networks.
  • Future challenges include extending these results to larger data sets with multiple objects per image, and simplifying the process of picking sufficiency terms for new and interesting problems.
  • By freeing the operator from collecting labels, the method shows promise in these small-scale experiments for training neural networks with weak supervision.
Related work
  • In this work, we presented a new strategy for incorporating domain knowledge in three computer vision tasks. The networks in our experiments learn without labels by exploiting high level instructions in the form of constraints.

    Constraint learning is a generalization of supervised learning that allows for more creative methods of supervision. For example, multiple-instance learning as proposed by (Dietterich, Lathrop, and Lozano-Perez 1997; Zhou and Xu 2007) allows for more efficient labeling by providing annotations over groups of images and learning to predict properties that hold over at least one input in a group, rather than providing individual labels. In rank learning, labels may be given as orderings between inputs, with the objective being to find an embedding of inputs that respects the ordering relation (Joachims 2002). Inductive logic programming approaches rely on logical formalisms and constraints to represent background knowledge and learn hypotheses from data (Muggleton and De Raedt 1994; De Raedt 2008; De Raedt and Kersting 2003). Various types of constraints have also been used extensively to guide unsupervised learning algorithms, such as clustering and dimensionality reduction techniques (Lee and Seung 2001; Basu, Davidson, and Wagstaff 2008).
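The multiple-instance idea mentioned above — a bag of inputs is labeled positive if at least one instance in it is positive — can be sketched as a simple bag-level loss. This is an illustrative max-pooling formulation under our own naming, not the cited papers' exact method.

```python
import math

def mil_bag_loss(instance_probs, bag_label):
    """Multiple-instance bag loss.

    A bag is positive iff at least one instance is positive, so the bag
    probability is taken as the max over per-instance probabilities and
    supervised with binary cross-entropy. (Illustrative sketch only.)
    """
    p_bag = max(instance_probs)
    eps = 1e-7
    p_bag = min(max(p_bag, eps), 1.0 - eps)  # clamp for numerical safety
    return -(bag_label * math.log(p_bag)
             + (1 - bag_label) * math.log(1.0 - p_bag))
```

Only the group-level label enters the loss; which instance made the bag positive is left for the model to resolve, which is exactly the labeling efficiency the related-work discussion points to.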
Funding
  • This work was supported by a grant from the SAIL-Toyota Center for AI Research
Reference
  • Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G. S.; Davis, A.; Dean, J.; Devin, M.; et al. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
  • Basu, S.; Davidson, I.; and Wagstaff, K. 2008. Constrained clustering: Advances in algorithms, theory, and applications. CRC Press.
  • Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 1247–1250. ACM.
  • Chang, M.-W.; Ratinov, L.; and Roth, D. 2007. Guiding semi-supervision with constraint-driven learning. In Annual Meeting-Association for Computational Linguistics, volume 45.
  • De Raedt, L., and Kersting, K. 2003. Probabilistic logic learning. ACM SIGKDD Explorations Newsletter 5(1):31–48.
  • De Raedt, L. 2008. Logical and relational learning. Springer Science & Business Media.
  • Dietterich, T. G.; Lathrop, R. H.; and Lozano-Perez, T. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence 89(1):31–71.
  • Ermon, S.; Le Bras, R.; Suram, S. K.; Gregoire, J. M.; Gomes, C. P.; Selman, B.; and van Dover, R. B. 2015. Pattern decomposition with complex combinatorial constraints: Application to materials discovery. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
  • Ganchev, K.; Gillenwater, J.; Taskar, B.; et al. 2010. Posterior regularization for structured latent variable models. Journal of Machine Learning Research 11(Jul).
  • Joachims, T. 2002. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 133–142. ACM.
  • Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kingma, D. P., and Welling, M. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  • Kotzias, D.; Denil, M.; de Freitas, N.; and Smyth, P. 2015. From group to individual labels using deep features. In ACM SIGKDD.
  • Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 1097–1105.
  • Lafferty, J.; McCallum, A.; and Pereira, F. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, volume 1, 282–289.
  • Le, Q. V. 2013. Building high-level features using large scale unsupervised learning. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8595–8598. IEEE.
  • LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning. Nature 521(7553):436–444.
  • Lee, D. D., and Seung, H. S. 2001. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, 556–562.
  • Lenat, D. B. 1995. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM 38(11):33–38.
  • Liang, P.; Jordan, M. I.; and Klein, D. 2009. Learning from measurements in exponential families. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM.
  • Lin, K.; Lu, J.; Chen, C.-S.; and Zhou, J. 2016. Learning compact binary descriptors with unsupervised deep neural networks. In CVPR.
  • Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; and Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
  • Muggleton, S., and De Raedt, L. 1994. Inductive logic programming: Theory and methods. The Journal of Logic Programming 19:629–679.
  • Ratner, A.; De Sa, C.; Wu, S.; Selsam, D.; and Re, C. 2016. Data programming: Creating large training sets, quickly. arXiv preprint arXiv:1605.07723.
  • Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine Learning 62(1):107–136.
  • Shcherbatyi, I., and Andres, B. 2016. Convexification of learning from constraints. arXiv preprint arXiv:1602.06746.
  • Wolpert, D. H. 2002. The supervised learning no-free-lunch theorems. In Soft Computing and Industry. Springer. 25–42.
  • Zhi, W.; Wang, X.; Qian, B.; Butler, P.; Ramakrishnan, N.; and Davidson, I. 2013. Clustering with complex constraints: Algorithms and applications. In AAAI.
  • Zhou, Z.-H., and Xu, J.-M. 2007. On the relation between multi-instance learning and semi-supervised learning. In Proceedings of the 24th International Conference on Machine Learning, 1167–1174. ACM.
  • Zhuang, B.; Lin, G.; Shen, C.; and Reid, I. 2016. Fast training of triplet-based deep binary embedding networks. arXiv preprint arXiv:1603.02844.
Best Paper of AAAI, 2017