# Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion

NIPS 2020

Abstract

We consider the problem of reconstructing a rank-one matrix from a revealed subset of its entries when some of the revealed entries are corrupted with perturbations that are unknown and can be arbitrarily large. It is not known which revealed entries are corrupted. We propose a new algorithm combining alternating minimization with extre...

Introduction

- Matrix completion [10] [9] [13] refers to the problem of recovering a low-rank matrix from a subset of its entries.
- A fundamental challenge in the study of matrix completion is that, in some applications, the revealed entries will be inaccurate or corrupted.
- When these perturbations can be arbitrarily large, we refer to the problem as “robust matrix completion.” The motivating application for this paper is the estimation of worker reliability in crowdsourcing [30] [41] [21] [19] [45], where this issue appears if some workers deviate from their instructions.
- It has previously been observed that optimal estimation in the Dawid & Skene (D&S) model requires estimation of worker reliabilities [1], which in turn can be framed as a rank-one matrix completion problem [30].
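
The rank-one structure mentioned in the last point can be checked on a small case. The sketch below assumes the one-coin binary variant of the D&S model (labels in {+1, -1}, worker i correct with probability p_i independently of the others), which is an assumption of this illustration and not spelled out in the summary above. Under that assumption the expected product of two workers' labels on a shared task equals s_i * s_j, with skill s_i = 2 * p_i - 1, so the off-diagonal entries of the agreement matrix form a rank-one matrix. The function name `expected_agreement` is hypothetical.

```python
from itertools import product


def expected_agreement(p_i, p_j):
    """Exact E[y_i * y_j] for two workers labeling the same +1/-1 task.

    Assumes a one-coin model: each worker is correct with a fixed
    probability, independently of the other worker and of the true label.
    """
    total = 0.0
    for truth in (+1, -1):  # uniform prior on the true label (assumption)
        for correct_i, correct_j in product((True, False), repeat=2):
            prob = 0.5
            prob *= p_i if correct_i else 1.0 - p_i
            prob *= p_j if correct_j else 1.0 - p_j
            y_i = truth if correct_i else -truth
            y_j = truth if correct_j else -truth
            total += prob * y_i * y_j
    return total


# Skills s = 2p - 1 for three hypothetical workers.
accuracies = [0.9, 0.8, 0.6]
skills = [2.0 * p - 1.0 for p in accuracies]
agreement_01 = expected_agreement(accuracies[0], accuracies[1])
rank_one_01 = skills[0] * skills[1]
```

Here `agreement_01` and `rank_one_01` agree up to floating-point rounding, which is the identity that lets skill estimation be posed as completing a rank-one matrix from the worker pairs that share tasks.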

Highlights

- It has previously been observed that optimal estimation in the Dawid & Skene (D&S) model requires estimation of worker reliabilities [1], which in turn can be framed as a rank-one matrix completion problem [30]
- We propose a new algorithm for robust rank-one matrix completion which, in at least one regime, is provably optimal
- Because approaches based on sparse recovery cannot handle arbitrary A, we propose a new algorithm, M-MSR, for skill determination in this context
- Our algorithm is based on a connection to robust rank-one matrix completion
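
To make the effect of arbitrarily large corruptions concrete, the following sketch contrasts a plain average with a trimmed average when re-estimating one factor of a rank-one matrix from its revealed entries. It is a simplified illustration under stated assumptions (symmetric matrix with entries s_i * s_j, all off-diagonal entries revealed, one corrupted pair, trimming parameter `F`), and it should not be read as the M-MSR algorithm itself. The function name `filtered_ratio_step` is hypothetical.

```python
def filtered_ratio_step(C, s, F):
    """One re-estimation pass over all workers.

    C maps (i, j) to the revealed (possibly corrupted) entry, s holds the
    current estimates. For each i, form the ratios C[i, j] / s[j], drop the
    F smallest and F largest, and average what remains.
    """
    n = len(s)
    new_s = list(s)
    for i in range(n):
        ratios = sorted(
            C[(i, j)] / s[j] for j in range(n) if (i, j) in C and s[j] != 0
        )
        kept = ratios[F:len(ratios) - F]
        if kept:  # keep the old estimate if trimming removed everything
            new_s[i] = sum(kept) / len(kept)
    return new_s


# Hypothetical ground truth and fully revealed off-diagonal entries.
s_true = [0.9, 0.8, 0.7, 0.6, 0.5]
C = {(i, j): s_true[i] * s_true[j]
     for i in range(5) for j in range(5) if i != j}
# One revealed pair is corrupted by a large, unknown perturbation.
C[(0, 1)] = C[(1, 0)] = -5.0

robust = filtered_ratio_step(C, s_true, F=1)  # trimmed average
naive = filtered_ratio_step(C, s_true, F=0)   # plain average
```

Starting from the true factors, the trimmed pass leaves every estimate essentially unchanged because the corrupted ratio is always one of the discarded extremes, while the plain average for workers 0 and 1 is pulled far away by the single bad entry.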

Methods

- The authors compare the methods by their average prediction error.
- The authors ran similar experiments on 17 publicly available datasets that are commonly used to evaluate crowdsourcing algorithms.
- As shown in Figure 2 and Figure 6 (Supplementary Sec. D), the M-MSR algorithm consistently outperforms all the baseline methods.
- As the number of corrupted workers increases, the prediction error of the M-MSR algorithm remains the smallest on almost every dataset.

Conclusion

The authors studied a crowdsourcing model with (i) users who might choose adversarial responses and (ii) general worker-task assignment sets resulting in arbitrary interaction graphs G(Ω) among workers.

- Because approaches based on sparse recovery cannot handle arbitrary A, the authors proposed a new algorithm, M-MSR, for skill determination in this context.
- ̄j = 2 (αj−1 − (1 − α) ), j < j−1, ̄j < ̄j−1.
- If both sets SM (t0 + j, j) and Sm(t0 + j, ̄j) are nonempty, the authors can repeat the analysis above for time-step t0 + j.

- Table 1: Synthetic dataset: characteristic values of the original dataset
- Table 2: Real data: characteristic values after removing workers who provide fewer than 10 labels

Related work

- Matrix Completion: the standard approach to low-rank matrix completion [2][3] usually proceeds by nuclear norm minimization:

$$\min_{L} \; \|L\|_{*} \quad \text{s.t.} \quad [L]_{ij} = [L_0]_{ij}, \quad \forall (i, j) \in \Omega, \tag{1}$$

where $\|L\|_{*}$ is the nuclear norm of the matrix $L$, $\Omega$ is the set of locations of the observed entries, and $L_0$ is the matrix to be recovered. Candès and Recht [2] proved that $L_0$ can be recovered with high probability by solving (1) if $L_0$ is incoherent and $\Omega$ is sampled uniformly at random. These are strong assumptions and many papers, including this one, have sought to relax them. A popular approach has been to focus on non-uniform sampling. In particular, Negahban et al. [31] relaxed the condition of uniform sampling to weighted entrywise sampling. Király et al. [20] considered deterministic sampling. Liu et al. [27] proposed a new hypothesis called the "isomeric condition", which is weaker than uniform sampling, and proved that the matrix $L_0$ can be recovered by a nonconvex approach under this condition.
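
As a small illustration of why the assumptions behind (1) matter, the sketch below evaluates the nuclear norm on a toy 2×2 instance that is far outside the regime of the recovery guarantee. The instance (a rank-one matrix [[1, 2], [2, 4]] with its bottom-right entry hidden) and the grid search are assumptions made for this example only. It relies on the identity that, for a 2×2 matrix A, the squared sum of singular values equals the squared Frobenius norm plus 2|det A|.

```python
import math


def nuclear_norm_2x2(a, b, c, d):
    """Nuclear norm (sum of singular values) of [[a, b], [c, d]].

    Uses (s1 + s2)^2 = s1^2 + s2^2 + 2*s1*s2 = ||A||_F^2 + 2*|det A|.
    """
    frob_sq = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt(frob_sq + 2.0 * abs(det))


# Observed entries 1, 2, 2; the hidden entry x is chosen to minimize the
# nuclear norm, searched over a grid on [-10, 10] with step 0.01.
candidates = [k / 100 for k in range(-1000, 1001)]
best_x = min(candidates, key=lambda x: nuclear_norm_2x2(1.0, 2.0, 2.0, x))

norm_at_minimizer = nuclear_norm_2x2(1.0, 2.0, 2.0, best_x)
norm_at_rank_one = nuclear_norm_2x2(1.0, 2.0, 2.0, 4.0)
```

On this instance the minimizer is x = 1 with nuclear norm 4, whereas the rank-one completion x = 4 has nuclear norm 5, so solving (1) does not return the rank-one matrix here. This is consistent with the recovery result being conditional on incoherence and on the sampling pattern.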

Funding

- This work is supported by NSF awards 1914792 and 1933027.

Study subjects and analysis

publicly available data sets: 17

A detailed description of all of these methods can be found in.

We ran similar experiments on 17 publicly available datasets that are commonly used to evaluate crowdsourcing algorithms. A detailed discussion of all the datasets can be found in Supplementary Sec

M-MSR is the only method that can handle around n/2 corrupted workers on these datasets, where n is the total number of workers. Out of 17 real datasets, our algorithm is the best on 16. The only exception is the Surprise dataset (Figure 6 in Supplementary Sec

Reference

- [1] D. Berend and A. Kontorovich. Consistency of weighted majority votes. In Proceedings of Advances in Neural Information Processing Systems, pages 3446–3454, 2014.
- [2] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
- [3] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
- [4] I. Dagan, O. Glickman, and B. Magnini. The PASCAL recognising textual entailment challenge. In Proceedings of Machine Learning Challenges Workshop, pages 177–190.
- [5] N. Dalvi, A. Dasgupta, R. Kumar, and V. Rastogi. Aggregating crowdsourced binary ratings. In Proceedings of the 22nd International Conference on World Wide Web, pages 285–294, 2013.
- [6] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1):20–28, 1979.
- [7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.
- [8] D. Dolev, C. Dwork, O. Waarts, and M. Yung. Perfectly secure message transmission. Journal of the ACM, 40(1):17–47, 1993.
- [9] S. Fattahi and S. Sojoudi. Exact guarantees on the absence of spurious local minima for nonnegative robust principal component analysis. Journal of Machine Learning Research, 21:1–51, 2020.
- [10] D. Gamarnik and S. Misra. A note on alternating minimization algorithm for the matrix completion problem. IEEE Signal Processing Letters, 23(10):1340–1343, 2016.
- [11] A. Ghosh, S. Kale, and P. McAfee. Who moderates the moderators? Crowdsourcing abuse detection in user-generated content. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 167–176, 2011.
- [12] N. Hartsfield and G. Ringel. Pearls in Graph Theory: A Comprehensive Introduction. Courier Corporation, 2013.
- [13] J. M. Hendrickx, A. Olshevsky, and V. Saligrama. Minimax rank-1 factorization. In Proceedings of 23rd International Conference on Artificial Intelligence and Statistics, 2020.
- [14] J. Hromkovic, R. Klasing, A. Pelc, P. Ruzicka, and W. Unger. Dissemination of Information in Communication Networks: Broadcasting, Gossiping, Leader Election, and Fault-tolerance. Springer Science & Business Media, 2005.
- [15] S. Ibrahim, X. Fu, N. Kargas, and K. Huang. Crowdsourcing via pairwise co-occurrences: Identifiability and algorithms. In Proceedings of Advances in Neural Information Processing Systems, pages 7845–7855, 2019.
- [16] P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on Amazon Mechanical Turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 64–67, 2010.
- [17] S. Jagabathula, L. Subramanian, and A. Venkataraman. Identifying unreliable and adversarial workers in crowdsourced labeling tasks. The Journal of Machine Learning Research, 18(1):3233–3299, 2017.
- [18] D. R. Karger, S. Oh, and D. Shah. Efficient crowdsourcing for multi-class labeling. In Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, pages 81–92, 2013.
- [19] A. Khetan and S. Oh. Achieving budget-optimality with adaptive schemes in crowdsourcing. In Advances in Neural Information Processing Systems 29, pages 4844–4852, 2016.
- [20] F. J. Király, L. Theran, and R. Tomioka. The algebraic combinatorial approach for low-rank matrix completion. Journal of Machine Learning Research, pages 1391–1436, 2015.
- [21] M. Kleindessner and P. Awasthi. Crowdsourcing with arbitrary adversaries. In Proceedings of International Conference on Machine Learning, pages 2708–2717, 2018.
- [22] W. Kordecki. Poisson convergence of numbers of vertices of a given degree in random graphs. Discussiones Mathematicae Graph Theory, 16(2):157–172, 1996.
- [23] H. Landau and A. Odlyzko. Bounds for eigenvalues of certain stochastic matrices. Linear Algebra and its Applications, 38:5–15, 1981.
- [24] M. Lease and G. Kazai. Overview of the TREC 2011 crowdsourcing track. In Proceedings of the Text Retrieval Conference, 2011.
- [25] H. J. LeBlanc, H. Zhang, X. Koutsoukos, and S. Sundaram. Resilient asymptotic consensus in robust networks. IEEE Journal on Selected Areas in Communications, 31(4):766–781, 2013.
- [26] H. Li and B. Yu. Error rate bounds and iterative weighted majority voting for crowdsourcing. arXiv preprint arXiv:1411.4086, 2014.
- [27] G. Liu, Q. Liu, and X. Yuan. A new theory for matrix completion. In Proceedings of Advances in Neural Information Processing Systems, pages 785–794, 2017.
- [28] Q. Liu, J. Peng, and A. T. Ihler. Variational inference for crowdsourcing. In Advances in Neural Information Processing Systems 25, pages 692–700, 2012.
- [29] B. Loni, M. Menendez, M. Georgescu, L. Galli, C. Massari, I. S. Altingovde, D. Martinenghi, M. Melenhorst, R. Vliegendhart, and M. Larson. Fashion-focused creative commons social dataset. In Proceedings of the 4th ACM Multimedia Systems Conference, pages 72–77, 2013.
- [30] Y. Ma, A. Olshevsky, C. Szepesvari, and V. Saligrama. Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers. In Proceedings of International Conference on Machine Learning, pages 3335–3344, 2018.
- [31] S. Negahban and M. J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13(May):1665–1697, 2012.
- [32] S. Pradhan, E. Loper, D. Dligach, and M. Palmer. SemEval-2007 task-17: English lexical sample, SRL and all words. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 87–92, 2007.
- [33] J. Pustejovsky, P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro, et al. The TIMEBANK Corpus. In Proceedings of Corpus Linguistics, pages 647–656. Lancaster, UK, 2003.
- [34] V. C. Raykar and S. Yu. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. Journal of Machine Learning Research, 13(Feb):491–518, 2012.
- [35] N. B. Shah, S. Balakrishnan, and M. J. Wainwright. A permutation-based model for crowd labeling: Optimal estimation and robustness. arXiv preprint arXiv:1606.09632, 2016.
- [36] R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of Conference on Empirical Methods in Natural Language Processing, pages 254–263, 2008.
- [37] C. Strapparava and R. Mihalcea. SemEval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pages 70–74, 2007.
- [38] P. Welinder, S. Branson, P. Perona, and S. J. Belongie. The multidimensional wisdom of crowds. In Proceedings of Advances in Neural Information Processing Systems, pages 2424–2432, 2010.
- [39] H. Xiao, J. Gao, Q. Li, F. Ma, L. Su, Y. Feng, and A. Zhang. Towards confidence interval estimation in truth discovery. IEEE Transactions on Knowledge and Data Engineering, 31(3):575–588, 2018.
- [40] H. Zhang, E. Fata, and S. Sundaram. A notion of robustness in complex networks. IEEE Transactions on Control of Network Systems, 2(3):310–320, 2015.
- [41] Y. Zhang, X. Chen, D. Zhou, and M. I. Jordan. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. In Proceedings of Advances in Neural Information Processing Systems, pages 1260–1268, 2014.
- [42] D. Zhou, S. Basu, Y. Mao, and J. C. Platt. Learning from the wisdom of crowds by minimax entropy. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2195–2203, 2012.
- [43] D. Zhou, S. Basu, Y. Mao, and J. C. Platt. Learning from the wisdom of crowds by minimax entropy. In Proceedings of Advances in Neural Information Processing Systems, pages 2195–2203, 2012.
- [44] D. Zhou, Q. Liu, J. Platt, and C. Meek. Aggregating ordinal labels from crowds by minimax conditional entropy. Proceedings of Machine Learning Research, 32(2):262–270, 2014.
- [45] Y. Zhou and J. He. Crowdsourcing via tensor augmentation and completion. In IJCAI, pages 2435–2441, 2016.
- [46] Y. Zhou, L. Ying, and J. He. MultiC2: An optimization framework for learning from task and worker dual heterogeneity. In Proceedings of the 2017 SIAM International Conference on Data Mining, pages 579–587. SIAM, 2017.

- From [25]: Consider an arbitrary undirected graph $G$, and suppose each normal node begins with some private value $x_i(0) \in \mathbb{R}$ (the initial values can be arbitrary). The nodes interact synchronously by conveying their values to their neighbors in the graph. Each normal node updates its own value over time according to a prescribed rule, which is modeled as $x_i(t + 1) = f_i(x_{ij}(\cdot))$, $j \in \Omega_i$, $i \in N$, where $x_{ij}(\cdot)$ is the value sent from node $j$ to node $i$ before time-step $t + 1$. The update rule $f_i(\cdot)$ can be an arbitrary deterministic function and may be different for different nodes. Then the normal nodes of $G$ are said to achieve resilient asymptotic consensus in the presence of malicious nodes if
- From [22]: If $np \to \infty$ but $np/n^{\alpha} = o(1)$ for every $\alpha > 0$, then the distribution of $X_r$ converges to $\mathrm{Po}(\lambda)$ if $\lambda_r(n) \to \lambda < \infty$, and if $\lambda_r(n) \to \infty$, then the distribution of $(X_r - \lambda_r(n))/\sqrt{\lambda_r(n)}$ converges to $N(0, 1)$.
- Lemma 6. For any $r \in \mathbb{Z}_{\ge 1}$, if a graph $G$ is $r$-robust, then $G$ is at least $r$-connected.
- 7. Consider a random bipartite graph
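
The synchronous update model quoted from [25] above can be sketched with one concrete choice of update rule. The rule used here is a trimming rule in the style of the W-MSR family from the resilient consensus literature: each normal node discards up to `F` received values above its own and up to `F` below its own, then averages the rest with its own value. The graph, the initial values, the malicious node's behavior, and the function name `trimmed_consensus_step` are all assumptions made for this illustration. Since the quoted definition is cut off, reading "resilient asymptotic consensus" as agreement among normal nodes within the range of their initial values is also an assumption.

```python
def trimmed_consensus_step(x, neighbors, normal, F):
    """One synchronous update of all normal nodes.

    x[j] is the value node j sends to every neighbor in this round.
    Nodes outside `normal` (malicious ones) are left unchanged here.
    """
    new_x = list(x)
    for i in normal:
        received = sorted(x[j] for j in neighbors[i])
        larger = [v for v in received if v > x[i]]
        smaller = [v for v in received if v < x[i]]
        equal = [v for v in received if v == x[i]]
        # Drop the F smallest values below x[i] and the F largest above it
        # (or all of them if there are fewer than F on that side).
        kept_smaller = smaller[F:]
        kept_larger = larger[:max(len(larger) - F, 0)]
        kept = kept_smaller + equal + kept_larger + [x[i]]
        new_x[i] = sum(kept) / len(kept)
    return new_x


# Complete graph on 5 nodes; node 4 is malicious and always sends 100.
n = 5
neighbors = {i: [j for j in range(n) if j != i] for i in range(n)}
normal = [0, 1, 2, 3]
x = [0.0, 1.0, 2.0, 3.0, 100.0]
for _ in range(50):
    x = trimmed_consensus_step(x, neighbors, normal, F=1)
normal_values = [x[i] for i in normal]
```

In this toy run the normal nodes end up agreeing on a common value that lies between their smallest and largest initial values, even though the malicious node keeps broadcasting an extreme value.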
