Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion

NIPS 2020 (2020)


Abstract

We consider the problem of reconstructing a rank-one matrix from a revealed subset of its entries when some of the revealed entries are corrupted with perturbations that are unknown and can be arbitrarily large. It is not known which revealed entries are corrupted. We propose a new algorithm combining alternating minimization with extre...

Introduction
  • Matrix completion [10] [9] [13] refers to the problem of recovering a low-rank matrix from a subset of its entries.
  • A fundamental challenge in the study of matrix completion is that, in some applications, the revealed entries will be inaccurate or corrupted
  • When these perturbations can be arbitrarily large, we will refer to the problem as “robust matrix completion.” In particular, the motivating application for this paper is estimation of worker reliability in crowdsourcing [30] [41] [21][19][45], where this issue appears if some workers deviate from their instructions.
  • It has previously been observed that optimal estimation in the Dawid & Skene (D&S) model requires estimation of worker reliabilities [1], which in turn can be framed as a rank-one matrix completion problem [30].
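
To see why reliability estimation can take the form of rank-one matrix completion, consider a simplified one-coin binary setting. This is an assumption made here for illustration, not a description of the paper's exact model. Worker i labels each task with ±1 and is correct with probability p_i, so the skill s_i = 2p_i − 1 satisfies E[y_i y_j] = s_i s_j for i ≠ j. The off-diagonal worker correlations are then entries of the rank-one matrix s sᵀ, and pairs of workers with no tasks in common correspond to unobserved entries. The function names below (`simulate_labels`, `pairwise_correlation`) are hypothetical.

```python
import random

def simulate_labels(accuracies, n_tasks, rng):
    """Return labels[i][t] in {-1, +1} for worker i on task t (one-coin model)."""
    truth = [rng.choice((-1, 1)) for _ in range(n_tasks)]
    labels = []
    for p in accuracies:
        # A worker reports the true label with probability p, else flips it.
        labels.append([y if rng.random() < p else -y for y in truth])
    return labels

def pairwise_correlation(labels):
    """Empirical E[y_i * y_j] for each pair of workers (diagonal left as None)."""
    n = len(labels)
    n_tasks = len(labels[0])
    corr = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                corr[i][j] = sum(a * b for a, b in zip(labels[i], labels[j])) / n_tasks
    return corr

rng = random.Random(0)
accuracies = [0.9, 0.8, 0.7, 0.6]          # illustrative values, not from the paper
skills = [2 * p - 1 for p in accuracies]   # s_i = 2 p_i - 1
corr = pairwise_correlation(simulate_labels(accuracies, 20000, rng))

# Compare each observed correlation with the rank-one prediction s_i * s_j.
max_gap = max(abs(corr[i][j] - skills[i] * skills[j])
              for i in range(4) for j in range(4) if i != j)
print(f"largest gap to s_i * s_j: {max_gap:.3f}")
```

In this toy setup every pair of workers shares all tasks, so every off-diagonal entry is observed; with sparse worker-task assignments, some entries would be missing and the completion problem becomes nontrivial.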
Highlights
  • We propose a new algorithm for robust rank-one matrix completion which, in at least one regime, is provably optimal
  • Because approaches based on sparse recovery are not able to handle arbitrary A, we propose a new algorithm, M-MSR, for skill determination in this context
  • Our algorithm is based on a connection to robust rank-one matrix completion
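
This listing truncates the abstract and does not show the M-MSR update, so the authors' exact rule is not reproduced here. As a hedged illustration of how one step of an alternating update can tolerate a few arbitrarily large corruptions, the sketch below estimates one entry u_i of a rank-one matrix X = u vᵀ from a single row, assuming the other factor v is known and nonzero. It divides each observed entry by v_j and averages after discarding the extremes. The trimming rule, the parameter `n_trim`, and the function names are assumptions for illustration.

```python
def trimmed_mean(values, n_trim):
    """Average of `values` after dropping the n_trim smallest and n_trim largest."""
    if len(values) <= 2 * n_trim:
        raise ValueError("need more than 2 * n_trim values")
    kept = sorted(values)[n_trim:len(values) - n_trim]
    return sum(kept) / len(kept)

def robust_row_update(row, v, n_trim):
    """Estimate u_i from one row of X = u v^T, given the other factor v.

    Entries equal to None are treated as unobserved; v_j is assumed nonzero.
    """
    ratios = [x / vj for x, vj in zip(row, v) if x is not None]
    return trimmed_mean(ratios, n_trim)

v = [0.5, 0.8, 0.4, 0.9, 0.7]          # illustrative values
u_i = 0.6
row = [u_i * vj for vj in v]
row[2] = 5.0                            # one arbitrarily corrupted entry
row[4] = None                           # one unobserved entry

estimate = robust_row_update(row, v, n_trim=1)
# Plain average of the same ratios, for comparison.
plain = sum(x / vj for x, vj in zip(row, v) if x is not None) / 4
print(estimate, plain)
```

With the other factor exact, every clean ratio equals u_i, so a single corrupted entry lands at an extreme and is discarded, while the plain average is pulled far from u_i. When v is itself only an estimate, corrupted ratios need not be extreme, which is why guarantees for such schemes depend on conditions not shown in this sketch.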
Methods
  • The authors compare methods by their average prediction error.
  • The authors implemented similar experiments on 17 publicly available data sets that are commonly used to evaluate crowdsourcing algorithms.
  • As shown in Figure 2 and Figure 6 (Supplementary Sec. D), the M-MSR algorithm consistently outperforms all the baseline methods.
  • As the number of corrupted workers increases, the prediction error of the M-MSR algorithm remains the smallest on almost every dataset.
Conclusion
  • Discussion and Conclusions

    The authors studied a crowdsourcing model with (i) the presence of users who might choose adversarial responses and (ii) general worker-task assignment sets resulting in arbitrary interaction graphs G(Ω) among workers.
  • Because approaches based on sparse recovery are not able to handle arbitrary A, the authors proposed a new algorithm, M-MSR, for skill determination in this context.
  • ̄j = 2 (αj−1 − (1 − α) ), j < j−1, ̄j < ̄j−1.
  • If both sets SM (t0 + j, j) and Sm(t0 + j, ̄j) are nonempty, the authors can repeat the analysis above for time-step t0 + j.
Tables
  • Table 1: Synthetic dataset: characteristic values of the original dataset
  • Table 2: Real data: characteristic values after removing workers who provide fewer than 10 labels
Related work
  • Matrix Completion: the standard approach to low-rank matrix completion [2][3] usually proceeds by nuclear norm minimization:

    $$\min_{L} \|L\|_* \quad \text{s.t.} \quad [L]_{ij} = [L_0]_{ij}, \ \forall (i, j) \in \Omega, \tag{1}$$

    where $\|L\|_*$ is the nuclear norm of matrix $L$, $\Omega$ is the set of locations of the observed entries, and $L_0$ is the matrix to be recovered. Candès and Recht [2] proved that $L_0$ can be recovered with high probability by solving (1) if $L_0$ is incoherent and $\Omega$ is sampled uniformly at random. These are strong assumptions, and many papers, including this one, have sought to relax them. A popular approach has been to focus on non-uniform sampling. In particular, Negahban et al. [31] relaxed the condition of uniform sampling to weighted entrywise sampling. Király et al. [20] considered deterministic sampling. Liu et al. [27] proposed a new hypothesis called the "isomeric condition", which is weaker than uniform sampling, and proved that the matrix $L_0$ can be recovered by a nonconvex approach under this condition.
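
    As a brief worked note on (1), offered as standard background rather than anything specific to this paper: the nuclear norm is the sum of the singular values, and for a rank-one matrix it reduces to the product of the Euclidean norms of the two factors. The symbols $\sigma_k$, $u$, $v$, $m$, and $n$ are introduced here for illustration only.

```latex
\|L\|_* = \sum_{k=1}^{\min(m,n)} \sigma_k(L),
\qquad
L = u v^{\top} \;\Longrightarrow\; \|L\|_* = \sigma_1(L) = \|u\|_2 \, \|v\|_2 .
```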
Funding
  • Acknowledgments and Disclosure of Funding: This work is supported by NSF awards 1914792 and 1933027.
Study subjects and analysis
publicly available data sets: 17
Experiments on real data. We implemented similar experiments on 17 publicly available data sets that are commonly used to evaluate crowdsourcing algorithms. A detailed discussion of all the datasets can be found in the supplementary material.

real datasets: 17
M-MSR is the only method that can handle around n/2 corrupted workers on these datasets, where n is the total number of workers. Out of 17 real datasets, our algorithm is the best on 16 of them. The only exception is the dataset Surprise (Figure 6 in Supplementary Sec. D).

References
  • D. Berend and A. Kontorovich. Consistency of weighted majority votes. In Proceedings of Advances in Neural Information Processing Systems, pages 3446–3454, 2014.
  • E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational mathematics, 9(6):717–772, 2009.
  • E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
  • I. Dagan, O. Glickman, and B. Magnini. The pascal recognising textual entailment challenge. In Proceedings of Machine Learning Challenges Workshop, pages 177–190.
  • N. Dalvi, A. Dasgupta, R. Kumar, and V. Rastogi. Aggregating crowdsourced binary ratings. In Proceedings of the 22nd International Conference on World Wide Web, pages 285–294, 2013.
  • A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1):20–28, 1979.
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.
  • D. Dolev, C. Dwork, O. Waarts, and M. Yung. Perfectly secure message transmission. Journal of the ACM, 40(1):17–47, 1993.
  • S. Fattahi and S. Sojoudi. Exact guarantees on the absence of spurious local minima for nonnegative robust principal component analysis. Journal of Machine Learning Research, 21:1–51, 2020.
  • D. Gamarnik and S. Misra. A note on alternating minimization algorithm for the matrix completion problem. IEEE Signal Processing Letters, 23(10):1340–1343, 2016.
  • A. Ghosh, S. Kale, and P. McAfee. Who moderates the moderators? crowdsourcing abuse detection in user-generated content. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 167–176, 2011.
  • N. Hartsfield and G. Ringel. Pearls in graph theory: a comprehensive introduction. Courier Corporation, 2013.
  • J. M. Hendrickx, A. Olshevsky, and V. Saligrama. Minimax rank-1 factorization. In Proceedings of 23rd International Conference on Artificial Intelligence and Statistics, 2020.
  • J. Hromkovic, R. Klasing, A. Pelc, P. Ruzicka, and W. Unger. Dissemination of Information in Communication Networks: Broadcasting, Gossiping, Leader Election, and Fault-tolerance. Springer Science & Business Media, 2005.
  • S. Ibrahim, X. Fu, N. Kargas, and K. Huang. Crowdsourcing via pairwise co-occurrences: Identifiability and algorithms. In Proceedings of Advances in Neural Information Processing Systems, pages 7845–7855, 2019.
  • P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on amazon mechanical turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 64–67, 2010.
  • S. Jagabathula, L. Subramanian, and A. Venkataraman. Identifying unreliable and adversarial workers in crowdsourced labeling tasks. The Journal of Machine Learning Research, 18(1):3233–3299, 2017.
  • D. R. Karger, S. Oh, and D. Shah. Efficient crowdsourcing for multi-class labeling. In Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems, pages 81–92, 2013.
  • A. Khetan and S. Oh. Achieving budget-optimality with adaptive schemes in crowdsourcing. In Advances in Neural Information Processing Systems 29, pages 4844–4852. 2016.
  • F. J. Király, L. Theran, and R. Tomioka. The algebraic combinatorial approach for low-rank matrix completion. Journal of Machine Learning Research, pages 1391–1436, 2015.
  • M. Kleindessner and P. Awasthi. Crowdsourcing with arbitrary adversaries. In Proceedings of International Conference on Machine Learning, pages 2708–2717, 2018.
  • W. Kordecki. Poisson convergence of numbers of vertices of a given degree in random graphs. Discussiones Mathematicae Graph Theory, 16(2):157–172, 1996.
  • H. Landau and A. Odlyzko. Bounds for eigenvalues of certain stochastic matrices. Linear Algebra and its Applications, 38:5–15, 1981.
  • M. Lease and G. Kazai. Overview of the trec 2011 crowdsourcing track. In Proceedings of the Text Retrieval Conference, 2011.
  • H. J. LeBlanc, H. Zhang, X. Koutsoukos, and S. Sundaram. Resilient asymptotic consensus in robust networks. IEEE Journal on Selected Areas in Communications, 31(4):766–781, 2013.
  • H. Li and B. Yu. Error rate bounds and iterative weighted majority voting for crowdsourcing. arXiv preprint arXiv:1411.4086, 2014.
  • G. Liu, Q. Liu, and X. Yuan. A new theory for matrix completion. In Proceedings of Advances in Neural Information Processing Systems, pages 785–794, 2017.
  • Q. Liu, J. Peng, and A. T. Ihler. Variational inference for crowdsourcing. In Advances in Neural Information Processing Systems 25, pages 692–700. 2012.
  • B. Loni, M. Menendez, M. Georgescu, L. Galli, C. Massari, I. S. Altingovde, D. Martinenghi, M. Melenhorst, R. Vliegendhart, and M. Larson. Fashion-focused creative commons social dataset. In Proceedings of the 4th ACM Multimedia Systems Conference, pages 72–77, 2013.
  • Y. Ma, A. Olshevsky, C. Szepesvari, and V. Saligrama. Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers. In Proceedings of International Conference on Machine Learning, pages 3335–3344, 2018.
  • S. Negahban and M. J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13(May):1665–1697, 2012.
  • S. Pradhan, E. Loper, D. Dligach, and M. Palmer. Semeval-2007 task-17: English lexical sample, srl and all words. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007), pages 87–92, 2007.
  • J. Pustejovsky, P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro, et al. The TIMEBANK Corpus. In Proceedings of Corpus Linguistics, pages 647–656. Lancaster, UK., 2003.
  • V. C. Raykar and S. Yu. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. Journal of Machine Learning Research, 13(Feb):491–518, 2012.
  • N. B. Shah, S. Balakrishnan, and M. J. Wainwright. A permutation-based model for crowd labeling: Optimal estimation and robustness. arXiv preprint arXiv:1606.09632, 2016.
  • R. Snow, B. O’connor, D. Jurafsky, and A. Y. Ng. Cheap and fast–but is it good? evaluating non-expert annotations for natural language tasks. In Proceedings of Conference on Empirical Methods in Natural Language Processing, pages 254–263, 2008.
  • C. Strapparava and R. Mihalcea. Semeval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pages 70–74, 2007.
  • P. Welinder, S. Branson, P. Perona, and S. J. Belongie. The multidimensional wisdom of crowds. In Proceedings of Advances in Neural Information Processing Systems, pages 2424–2432, 2010.
  • H. Xiao, J. Gao, Q. Li, F. Ma, L. Su, Y. Feng, and A. Zhang. Towards confidence interval estimation in truth discovery. IEEE Transactions on Knowledge and Data Engineering, 31(3):575–588, 2018.
  • H. Zhang, E. Fata, and S. Sundaram. A notion of robustness in complex networks. IEEE Transactions on Control of Network Systems, 2(3):310–320, 2015.
  • Y. Zhang, X. Chen, D. Zhou, and M. I. Jordan. Spectral methods meet em: A provably optimal algorithm for crowdsourcing. In Proceedings of Advances in Neural Information Processing Systems, pages 1260–1268, 2014.
  • D. Zhou, S. Basu, Y. Mao, and J. C. Platt. Learning from the wisdom of crowds by minimax entropy. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2195–2203. 2012.
  • D. Zhou, S. Basu, Y. Mao, and J. C. Platt. Learning from the wisdom of crowds by minimax entropy. In Proceedings of Advances in Neural Information Processing Systems, pages 2195–2203, 2012.
  • D. Zhou, Q. Liu, J. Platt, and C. Meek. Aggregating ordinal labels from crowds by minimax conditional entropy. Proceedings of Machine Learning Research, 32(2):262–270, 2014.
  • Y. Zhou and J. He. Crowdsourcing via tensor augmentation and completion. In IJCAI, pages 2435–2441, 2016.
  • Y. Zhou, L. Ying, and J. He. Multic2: an optimization framework for learning from task and worker dual heterogeneity. In Proceedings of the 2017 SIAM International Conference on Data Mining, pages 579–587. SIAM, 2017.
  • [25] Consider an undirected arbitrary graph G, suppose each normal node begins with some private value xi(0) ∈ R (The initial values can be arbitrary). The nodes interact synchronously by conveying their values to their neighbors in the graph. Each normal node updates its own value over time according to a prescribed rule, which is modeled as xi(t + 1) = fi(xij(·)), j ∈ Ωi, i ∈ N, where xij(·) is the value sent from node j to node i before time-step t + 1. The update rule f (·) can be arbitrary deterministic function, and may be different for different nodes. Then the normal nodes of G are said to achieve resilient asymptotic consensus in the presence of malicious nodes if
  • [22] If $np \to \infty$ but $np/n^{\alpha} = o(1)$ for every $\alpha > 0$, then the distribution of $X_r \to \mathrm{Po}(\lambda)$ if $\lambda_r(n) \to \lambda < \infty$, and if $\lambda_r(n) \to \infty$, then the distribution of $(X_r - \lambda_r(n))/\sqrt{\lambda_r(n)} \to N(0, 1)$.
  • 1. Lemma 6. For any r ∈ Z≥1, if a graph G is r-robust, then G is at least r-connected.
  • 7. Consider a random bipartite graph