Learning Based Distributed Tracking
KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Virtual Event..., pp.2040-2050, (2020)
- The great success of machine learning in the past decade supports the assumption that data in practice follows certain patterns.
- The DT problem setting:
- In the DT problem, there is a coordinator and k players (a.k.a. sites); between each player and the coordinator, there is a two-way communication channel.
- At any moment, at most one player has its counter increased by one, conceptually representing an item arriving at that player.
- The job of the coordinator is to raise an alarm at the exact moment that the N-th item arrives, equivalently, the moment that the sum of the counters of all the players reaches N.
- The efficiency of an algorithm for solving the DT problem is measured by the communication cost, i.e., the total number of messages received and sent by the coordinator, where each message can carry at most O(1) words.
- The experimental results show that the communication cost of our algorithms is at least as good as that of the state-of-the-art competitors.
- When the datasets are generated from some distribution, our algorithms perform consistently better than the CMY algorithms, regardless of the distribution
- Our algorithms are equipped with a backup mechanism that guarantees performance comparable to the data-independent CMY algorithm when the distribution fluctuates.
- We experimentally evaluate our algorithms against the state-of-the-art competitors, using both real and synthetic datasets
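The DT setting above can be made concrete with a minimal baseline that assumes nothing about the data: every player forwards each arrival to the coordinator, which raises the alarm on the N-th message. This is a hypothetical sketch for illustration (the function name and interface are not from the paper); it is correct but costs N messages, which is exactly the inefficiency the slack-based algorithms address.

```python
def naive_track(N, arrivals):
    """Naive DT baseline: forward every arrival to the coordinator.

    N        -- the threshold at which the alarm must be raised
    arrivals -- iterable of player ids, one per item arrival

    Returns (items_seen_at_alarm, total_messages).
    """
    messages = 0
    total = 0
    for _player in arrivals:
        messages += 1          # one message per item arrival
        total += 1
        if total == N:
            return total, messages   # alarm exactly at the N-th item
    return total, messages
```

With this scheme the communication cost grows linearly in N, independent of the number of players k, so any slack-based protocol that charges messages only per round is an asymptotic improvement.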
- Results on the Real Datasets
Figures 2-4 illustrate the results of the algorithms on the three real datasets.
- All plots in the figures show the percentage of untracked items as a function of the number of communications used.
- StcSlk-KwnDst and DynSlk-KwnDst perform the best, with DynSlk-KwnDst slightly better in most cases
- This is as expected because they know the frequency information and have more knowledge than the other algorithms.
- Figure 5 shows the efficiency of the algorithms under various distributions
- It plots the percentage of untracked items as a function of the number of communications.
- When the distribution is stable, the static slacks capture more accurately the number of items a player will receive
- The paper exploits the distribution of counter increments and presents four data-dependent algorithms that utilize knowledge of the data distribution.
- All the algorithms have communication cost O(k log log N + k log log(k/δ)), where δ is a parameter that controls the failure probability, improving the state-of-the-art bound of O(k log(N/k)).
- The authors experimentally evaluate the algorithms against the state-of-the-art competitors, using both real and synthetic datasets.
- The authors' experimental results show the efficiency and robustness of the algorithms
- Table 1: Frequently used notations. By the data distribution, we mean the multinomial distribution of the counter increments, more specifically, the probability distribution of a counter increment happening at each of the players.
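The multinomial view of counter increments can be sketched with standard-library sampling. This is a hypothetical illustration (the probabilities and player count are made up, not taken from the paper): each arriving item is assigned to player i with probability p[i], and the empirical counter shares then approximate p.

```python
import random

random.seed(0)                # deterministic for reproducibility
k, n_items = 3, 10_000
p = [0.6, 0.3, 0.1]           # assumed increment distribution, illustration only

counters = [0] * k
# Each item's arrival is one draw from the multinomial over players.
for player in random.choices(range(k), weights=p, k=n_items):
    counters[player] += 1

# Empirical share of increments per player, which approximates p.
shares = [c / n_items for c in counters]
```

A data-dependent algorithm that knows (or learns) p can assign each player a slack proportional to its expected share instead of the uniform N/k.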
- Table 2: Dataset Characteristics (M = 10^6 and K = 10^3)
- The distributed tracking (DT) problem has been well studied in terms of both upper bounds and lower bounds since it was first proposed. Prior to the CMY algorithm, a uniform-slack algorithm (called UniSlk) was proposed by Cormode et al. The communication cost of UniSlk is bounded by O(k^2 log(N/k)). The design of UniSlk is based on the observation that for N items arriving at k players, at least one player must have received N/k items. The UniSlk algorithm runs in rounds: at the beginning of the first round, the coordinator broadcasts a slack of N/k to each player, and a player notifies the coordinator when its counter exceeds N/k.
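The round-based scheme above can be sketched as a small simulation. This is a simplified sketch, not the paper's implementation: the exact message accounting (the coordinator polling all k counters and rebroadcasting the slack at the end of each round) is an assumption made for illustration.

```python
def unislk(N, k, arrivals):
    """Simulate a round-based uniform-slack (UniSlk-style) protocol.

    N        -- the threshold at which the alarm must be raised
    k        -- number of players
    arrivals -- iterable of player ids (0..k-1), one per item arrival

    Returns (items_seen_at_alarm, total_messages).
    """
    remaining = N                    # items the coordinator has yet to account for
    counters = [0] * k               # per-player counters since the round began
    slack = max(1, remaining // k)
    messages = k                     # initial broadcast of the slack to all players
    seen = 0
    for p in arrivals:
        counters[p] += 1
        seen += 1
        if counters[p] >= slack:
            messages += 1            # player p notifies the coordinator ...
            messages += 2 * k        # ... which polls and collects every counter
            remaining -= sum(counters)
            counters = [0] * k
            if remaining <= 0:
                return seen, messages    # alarm: the N-th item has arrived
            slack = max(1, remaining // k)
            messages += k            # broadcast the new slack for the next round
    return seen, messages
```

Because the remaining count shrinks by roughly a (1 - 1/k) factor per round, the number of rounds is O(k log(N/k)), and with O(k) messages per round the total cost matches the O(k^2 log(N/k)) bound stated above.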
- Junhao Gan is supported by Australian Research Council (ARC) DECRA DE190101118
- Anders Aamand, Piotr Indyk, and Ali Vakilian. 2019. (Learned) Frequency Estimation Algorithms under Zipfian Distribution. CoRR abs/1908.05198 (2019). arXiv:1908.05198
- Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. 2009. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 410, 19 (2009), 1876–1902.
- Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. 2017. Neural Combinatorial Optimization with Reinforcement Learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings.
- Fan R. K. Chung and Lincoln Lu. 2006. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Internet Mathematics 3, 1 (2006), 79–127.
- Graham Cormode. 2013. The continuous distributed monitoring model. SIGMOD Record 42, 1 (2013), 5–14.
- Graham Cormode, Minos N. Garofalakis, S. Muthukrishnan, and Rajeev Rastogi.
- Graham Cormode, S. Muthukrishnan, and Ke Yi. 2011. Algorithms for distributed functional monitoring. ACM Trans. Algorithms 7, 2 (2011), 21:1–21:20.
- Nikos Giatrakos, Antonios Deligiannakis, Minos N. Garofalakis, Izchak Sharfman, and Assaf Schuster. 2012. Prediction-based geometric monitoring over distributed data streams. In Proceedings of the International Conference on Management of Data, SIGMOD, Scottsdale, AZ, USA, May 20-24, 2012. 265–276.
- Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. 2019. Learning-Based Frequency Estimation Algorithms. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
- Zengfeng Huang, Ke Yi, and Qin Zhang. 2019. Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks. Algorithmica 81, 6 (2019), 2222–2243.
- Ram Keralapura, Graham Cormode, and Jeyashankher Ramamirtham. 2006. Communication-efficient distributed monitoring of thresholded counts. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006. 289–300.
- Elias B. Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. 2017. Learning Combinatorial Optimization Algorithms over Graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 6348–6358.
- David Kotz, Tristan Henderson, Ilya Abyzov, and Jihwang Yeo. 2009. CRAWDAD
- Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD 2018. 489–504.
- Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada. 462–471.
- Miao Qiao, Junhao Gan, and Yufei Tao. 2016. Range Thresholding on Streams. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016. 571–582.
- Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. In Annual Conference on Neural Information Processing Systems (NeurIPS), December 7-12, 2015, Montreal, Quebec, Canada. 2692–2700.