# Learning Set Functions that are Sparse in Non-Orthogonal Fourier Bases

Abstract:

Many applications of machine learning on discrete domains, such as learning preference functions in recommender systems or auctions, can be reduced to estimating a set function that is sparse in the Fourier domain. In this work, we present a new family of algorithms for learning Fourier-sparse set functions. They require at most $nk - k$…

Introduction

- Numerous problems in machine learning on discrete domains involve learning set functions, i.e., functions s : 2^N → R that map subsets of some ground set N to the real numbers.
- A natural way to learn such set functions is to compute their respective sparse Fourier transforms.
- The authors present an algorithm for learning Fourier-sparse set functions w.r.t. model 4.
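
To make these objects concrete, the following sketch (helper names are ours, not the paper's) stores a set function on a small ground set explicitly and computes the classical Walsh-Hadamard transform (WHT), the orthogonal special case of the Fourier bases considered here, by direct summation:

```python
from itertools import combinations

def powerset(ground):
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def wht(s, ground):
    # Walsh-Hadamard transform: shat(B) = sum_A (-1)^{|A ∩ B|} s(A).
    # Direct summation costs O(4^n); fine for illustration only.
    return {B: sum((-1) ** len(A & B) * v for A, v in s.items())
            for B in powerset(ground)}

def inverse_wht(shat, ground):
    # Inverse WHT; the transform is self-inverse up to a factor 2^n.
    n = len(ground)
    return {A: sum((-1) ** len(A & B) * c for B, c in shat.items()) / 2 ** n
            for A in powerset(ground)}

N = ["a", "b", "c"]
# A 2-Fourier-sparse set function, synthesized from two WHT coefficients.
shat = {frozenset(): 4.0, frozenset({"a", "b"}): -2.0}
s = inverse_wht(shat, N)                 # dense: all 2^3 values
recovered = wht(s, N)
support = {B for B, c in recovered.items() if abs(c) > 1e-9}
# support == {frozenset(), frozenset({"a", "b"})}
```

The transform recovers exactly the two planted coefficients; the sparse algorithms discussed in this paper aim to find them without querying all 2^n values.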

Highlights

- Numerous problems in machine learning on discrete domains involve learning set functions, i.e., functions s : 2^N → R that map subsets of some ground set N to the real numbers
- In this paper we develop, analyze, and evaluate novel algorithms for computing the sparse Fourier transform under the various notions of Fourier basis introduced by Püschel (2018)
- We introduce background and definitions for set functions and associated Fourier bases, following the discrete-set signal processing (DSSP) framework introduced by Püschel (2018) and Püschel and Wendler (2020)
- We introduce an algorithm for learning set functions that are sparse with respect to various generalized, non-orthogonal Fourier bases
- Our work significantly expands the set of efficiently learnable set functions
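
For intuition on what a non-orthogonal basis can look like: the zeta and Möbius transforms on the subset lattice (cf. Björklund et al. 2007, cited below) form a non-orthogonal transform pair. Whether this matches the paper's exact model numbering is not claimed here; the sketch is purely illustrative:

```python
from itertools import combinations

def powerset(ground):
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def zeta(s, ground):
    # Zeta transform: (zs)(B) = sum of s(A) over A ⊆ B.
    # Its matrix is triangular, hence invertible, but not orthogonal.
    return {B: sum(v for A, v in s.items() if A <= B)
            for B in powerset(ground)}

def moebius(t, ground):
    # Möbius transform, the inverse of zeta:
    # s(B) = sum over A ⊆ B of (-1)^{|B \ A|} t(A).
    return {B: sum((-1) ** len(B - A) * v for A, v in t.items() if A <= B)
            for B in powerset(ground)}

N = ["x", "y"]
s = {A: float(len(A)) for A in powerset(N)}  # toy set function
assert moebius(zeta(s, N), N) == s           # Möbius inversion round trip
```

A set function with few non-zero Möbius coefficients is sparse in this basis even when its WHT is dense, which is the kind of gap the generalized bases exploit.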

Results

- Exactly learning a k-Fourier-sparse set function is equivalent to computing its k non-zero Fourier coefficients and associated support.
- Given oracle access to query a k-Fourier-sparse set function s, compute its Fourier support and associated Fourier coefficients.
- As a result, the authors can solve Problem 1 with the algorithm SSFT, under mild conditions on the coefficients, by successively computing the non-zero Fourier coefficients of restricted set functions along the chain s↓2^∅, s↓2^{x1}, s↓2^{x1,x2}, ….
- The authors consider set functions s that are k-Fourier-sparse (but not (k − 1)-Fourier-sparse) with support supp(s) = {B1, …, Bk}.
- Building on the analysis of SSFT, recall that S denotes the set of k-Fourier-sparse (but not (k − 1)-Fourier-sparse) set functions and P_C^{M_i} denotes the elements B ∈ supp(s) satisfying B ∩ M_i = C.
- SSFT: sparse set function Fourier transform of s
- There is a substantial body of research concerned with learning Fourier/WHT-sparse set functions (Stobbe and Krause 2012; Scheibler, Haghighatshoar, and Vetterli 2013; Kocaoglu et al. 2014; Li and Ramchandran 2015; Cheraghchi and Indyk 2017; Amrollahi et al. 2019).
- Kocaoglu et al. (2014) propose a method to compute the WHT of a k-Fourier-sparse set function that satisfies a so-called unique sign property, using a number of queries polynomial in n and 2^k.
- In a different line of work, Stobbe and Krause (2012) utilize results from compressive sensing to compute the WHT of k-WHT-sparse set functions for which a superset P of the support is known.
- If the facility locations function is k-sparse w.r.t. model 4 for a given |N| = n, the authors set the expected sparsity parameter of R-WHT to successive multiples αk, up to the first α for which the algorithm runs out of memory.
- The authors learn these bidders using the prior Fourier-sparse learning algorithms, this time including SSFT+, but excluding CS-WHT, since P is not known in this scenario.
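
The restriction idea behind SSFT can be illustrated in the orthogonal WHT case (the paper's contribution is extending it to non-orthogonal models): restricting s to subsets of M collapses each support element B to B ∩ M, so growing M one element at a time lets one track candidate support sets while querying only 2^{|M|} values per step. The code and names below are a simplified sketch, not the paper's implementation:

```python
from itertools import combinations

def powerset(ground):
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def inverse_wht(shat, ground):
    # Synthesize a set function from sparse WHT coefficients.
    n = len(ground)
    return {A: sum((-1) ** len(A & B) * c for B, c in shat.items()) / 2 ** n
            for A in powerset(ground)}

def restricted_wht(s, M):
    # WHT of the restriction of s to 2^M: queries only the 2^{|M|} subsets of M.
    return {C: sum((-1) ** len(A & C) * s[A] for A in powerset(M))
            for C in powerset(M)}

N = ["a", "b", "c"]
shat = {frozenset({"a", "c"}): 8.0, frozenset({"b"}): 4.0}
s = inverse_wht(shat, N)

M = frozenset({"a", "b"})
proj = restricted_wht(s, M)
# Each B in supp(s) shows up as B ∩ M (coefficients scaled by 2^{|M|-n}):
# {a, c} -> {a} with 8 * 2^{2-3} = 4.0, and {b} -> {b} with 2.0.
support = {C for C, c in proj.items() if abs(c) > 1e-9}
```

Support elements can collide after intersecting with M; the "mild conditions on the coefficients" mentioned above rule out the cancellations that would hide a frequency.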

Conclusion

- The authors introduced an algorithm for learning set functions that are sparse with respect to various generalized, non-orthogonal Fourier bases.
- The authors' approach is motivated by a range of real world applications, including modeling preferences in recommender systems and combinatorial auctions, that require the modeling, processing, and analysis of set functions, which is notoriously difficult due to their exponential size.
- The new notions of sparsity connect well with preference functions in recommender systems, which the authors consider an exciting avenue for future research.


- Table1: Shifts and Fourier concepts
- Table2: Multi-region valuation model (n = 98). Each row corresponds to a different bidder type
- Table3: Comparison of model 4 sparsity (SSFT) against WHT sparsity (R-WHT) of facility locations functions in terms of reconstruction error ‖p − p′‖ / ‖p‖ for varying |N|; italic results are averages over 10 runs

Related work

- We briefly discuss related work on learning set functions. Fourier-sparse learning: there is a substantial body of research concerned with learning Fourier/WHT-sparse set functions (Stobbe and Krause 2012; Scheibler, Haghighatshoar, and Vetterli 2013; Kocaoglu et al. 2014; Li and Ramchandran 2015; Cheraghchi and Indyk 2017; Amrollahi et al. 2019). Recently, Amrollahi et al. (2019) imported ideas from the hashing-based sparse Fourier transform algorithm (Hassanieh et al. 2012) to the set function setting. The resulting algorithms compute the WHT of k-WHT-sparse set functions with a query complexity of O(nk) for general frequencies, O(kd log n) for low-degree (≤ d) frequencies, and O(kd log n log(d log n)) for low-degree set functions that are only approximately sparse. To the best of our knowledge, this latest work improves on previous algorithms, such as those by Scheibler, Haghighatshoar, and Vetterli (2013), Kocaoglu et al. (2014), Li and Ramchandran (2015), and Cheraghchi and Indyk (2017), providing the best guarantees in terms of both query complexity and runtime. E.g., Scheibler, Haghighatshoar, and Vetterli (2013) utilize similar hashing/aliasing ideas to derive sparse WHT algorithms that work under random-support (the frequencies are uniformly distributed on 2^N) and random-coefficient (the coefficients are sampled from continuous distributions) assumptions. Kocaoglu et al. (2014) propose a method to compute the WHT of a k-Fourier-sparse set function that satisfies a so-called unique sign property, using a number of queries polynomial in n and 2^k.
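
For contrast with these sparse methods, the dense baseline they improve upon is the fast WHT, which needs all 2^n queries but only O(n·2^n) arithmetic. A standard in-place butterfly, with subsets encoded as bit masks (our indexing convention, not tied to any of the cited papers):

```python
def fwht(vec):
    # In-place fast Walsh-Hadamard transform of a length-2^n vector,
    # O(n * 2^n) operations; entry i encodes a subset as a bit mask.
    a = list(vec)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                # Butterfly: sum and difference of paired entries.
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

# fwht([1, 2, 3, 4]) -> [10, -2, -4, 0]
```

Applying `fwht` twice returns 2^n times the input, since the Hadamard matrix is self-inverse up to that factor; sparse algorithms avoid ever forming the length-2^n vector.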

Study subjects and analysis

samples: 10000

We learn these bidders using the prior Fourier-sparse learning algorithms, this time including SSFT+, but excluding CS-WHT, since P is not known in this scenario. Table 2 shows the results: means and standard deviations of the number of queries, the number of Fourier coefficients, and the relative error (estimated using 10,000 samples), taken over the bidder types and 25 runs.
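
With n = 98 the relative error cannot be computed exactly over all 2^98 subsets, which is why it is estimated from samples. A minimal Monte-Carlo sketch of such an estimate (function names are ours, not the paper's):

```python
import random

def relative_error(s_true, s_learned, ground, num_samples=10_000, seed=0):
    # Estimate ||s - s'|| / ||s|| over uniformly random subsets of the
    # ground set; each element is included independently with prob. 1/2.
    rng = random.Random(seed)
    num, den = 0.0, 0.0
    for _ in range(num_samples):
        A = frozenset(x for x in ground if rng.random() < 0.5)
        diff = s_true(A) - s_learned(A)
        num += diff * diff
        den += s_true(A) ** 2
    return (num / den) ** 0.5
```

A perfect model gives error 0, and the all-zero model gives error exactly 1, which makes the scale of the tabulated errors easy to read.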

data: 98


Reference

- Amrollahi, A.; Zandieh, A.; Kapralov, M.; and Krause, A. 2019. Efficiently Learning Fourier Sparse Set Functions. In Advances in Neural Information Processing Systems, 15094– 15103.
- Balog, K.; Radlinski, F.; and Arakelyan, S. 2019.
- Bernasconi, A.; Codenotti, B.; and Simon, J. 1996. On the Fourier analysis of Boolean functions. preprint 1–24.
- Björklund, A.; Husfeldt, T.; Kaski, P.; and Koivisto, M. 2007. Fourier Meets Möbius: Fast Subset Convolution. In Proc. ACM Symposium on Theory of Computing, 67–74.
- Brero, G.; Lubin, B.; and Seuken, S. 2019. Machine Learning-powered Iterative Combinatorial Auctions. arXiv preprint arXiv:1911.08042.
- Buathong, P.; Ginsbourger, D.; and Krityakierne, T. 2020. Kernels over Sets of Finite Sets using RKHS Embeddings, with Application to Bayesian (Combinatorial) Optimization. In International Conference on Artificial Intelligence and Statistics, 2731–2741.
- Chakrabarty, D.; and Huang, Z. 2012. Testing Coverage Functions. In International Colloquium on Automata, Languages, and Programming, 170–181. Springer.
- Cheraghchi, M.; and Indyk, P. 2017. Nearly optimal deterministic algorithm for sparse Walsh-Hadamard transform. ACM Transactions on Algorithms (TALG) 13(3): 1–36.
- De Wolf, R. 2008. A brief introduction to Fourier analysis on the Boolean cube. Theory of Computing 1–20.
- Djolonga, J.; and Krause, A. 2017. Differentiable Learning of Submodular Models. In Advances in Neural Information Processing Systems, 1013–1023.
- Djolonga, J.; Tschiatschek, S.; and Krause, A. 2016. Variational Inference in Mixed Probabilistic Submodular Models. In Advances in Neural Information Processing Systems, 1759–1767.
- Dolhansky, B. W.; and Bilmes, J. A. 2016. Deep Submodular Functions: Definitions and Learning. In Advances in Neural Information Processing Systems, 3404–3412.
- Feldman, V.; Kothari, P.; and Vondrák, J. 2013.
- Hassanieh, H.; Indyk, P.; Katabi, D.; and Price, E. 2012. Nearly Optimal Sparse Fourier Transform. In Proc. ACM Symposium on Theory of Computing, 563–578.
- Khare, A. 2009. Vector spaces as unions of proper subspaces. Linear algebra and its applications 431(9): 1681–1686.
- Kocaoglu, M.; Shanmugam, K.; Dimakis, A. G.; and Klivans, A. 2014. Sparse Polynomial Learning and Graph Sketching. In Advances in Neural Information Processing Systems, 3122–3130.
- Krause, A.; and Golovin, D. 2014. Submodular function maximization.
- Krause, A.; Singh, A.; and Guestrin, C. 2008. Near-optimal Sensor Placements in Gaussian processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research 9: 235–284.
- Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; and Glance, N. 2007. Cost-effective Outbreak Detection in Networks. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 420–429.
- Li, X.; and Ramchandran, K. 2015. An Active Learning Framework using Sparse-Graph Codes for Sparse Polynomials and Graph Sketching. In Advances in Neural Information Processing Systems, 2170–2178.
- Nemhauser, G. L.; Wolsey, L. A.; and Fisher, M. L. 1978. An analysis of approximations for maximizing submodular set functions — I. Mathematical programming 14(1): 265–294.
- Ostfeld, A.; Uber, J. G.; Salomons, E.; Berry, J. W.; Hart, W. E.; Phillips, C. A.; Watson, J.-P.; Dorini, G.; Jonkergouw, P.; Kapelan, Z.; et al. 2008. The Battle of the Water Sensor Networks (BWSN): A Design Challenge for Engineers and Algorithms. Journal of Water Resources Planning and Management 134(6): 556–568.
- Püschel, M. 2018. A Discrete Signal Processing Framework for Set Functions. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4359– 4363. IEEE.
- Püschel, M.; and Moura, J. 2006. Algebraic Signal Processing Theory. arXiv preprint arXiv:cs/0612077v1.
- Püschel, M.; and Moura, J. M. 2008. Algebraic signal processing theory: Foundation and 1-D time. IEEE Trans. on Signal Processing 56(8): 3572–3585.
- Püschel, M.; and Wendler, C. 2020. Discrete Signal Processing with Set Functions. arXiv preprint arXiv:2001.10290.
- Raskhodnikova, S.; and Yaroslavtsev, G. 2013. Learning pseudo-Boolean k-DNF and Submodular Functions. In Proc. ACM-SIAM Symposium on Discrete Algorithms, 1356–1368.
- Scheibler, R.; Haghighatshoar, S.; and Vetterli, M. 2013. A Fast Hadamard Transform for Signals with Sub-linear Sparsity. In Proc. Annual Allerton Conference on Communication, Control, and Computing, 1250–1257. IEEE.
- Sharma, M.; Harper, F. M.; and Karypis, G. 2019. Learning from Sets of Items in Recommender Systems. ACM Trans. on Interactive Intelligent Systems (TiiS) 9(4): 1–26.
- Srinivas, N.; Krause, A.; Kakade, S. M.; and Seeger, M. 2010. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. In Proc. International Conference on Machine Learning (ICML), 1015–1022.
- Stobbe, P.; and Krause, A. 2012. Learning Fourier Sparse Set Functions. In Artificial Intelligence and Statistics, 1125– 1133.
- Tschiatschek, S.; Sahin, A.; and Krause, A. 2018. Differentiable Submodular Maximization. In Proc. International Joint Conference on Artificial Intelligence, 2731–2738.
- Vlastelica, M.; Paulus, A.; Musil, V.; Martius, G.; and Rolínek, M. 2019. Differentiation of Blackbox Combinatorial Solvers. arXiv preprint arXiv:1912.02175.
- Wang, P.-W.; Donti, P. L.; Wilder, B.; and Kolter, Z. 2019. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. arXiv preprint arXiv:1905.12149.
- Weiss, M.; Lubin, B.; and Seuken, S. 2017. SATS: A Universal Spectrum Auction Test Suite. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 51–59.
- Weissteiner, J.; Ionescu, S.; Olberg, N.; and Seuken, S. 2020a. Deep Learning-powered Iterative Combinatorial Auctions. In 34th AAAI Conference on Artificial Intelligence.
- Weissteiner, J.; Wendler, C.; Seuken, S.; Lubin, B.; and Püschel, M. 2020b. Fourier Analysis-based Iterative Combinatorial Auctions. arXiv preprint arXiv:2009.10749.
- Wendler, C.; and Püschel, M. 2019. Sampling Signals on Meet/Join Lattices. In Proc. Global Conference on Signal and Information Processing (GlobalSIP).
- Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R. R.; and Smola, A. J. 2017. Deep Sets. In Advances in Neural Information Processing Systems, 3391– 3401.
