# SliceNDice - Mining Suspicious Multi-Attribute Entity Groups with Multi-View Graphs

DSAA, pp. 351-363, 2019.

Abstract:

Given the reach of web platforms, bad actors have considerable incentives to manipulate and defraud users at the expense of platform integrity. This has spurred research in numerous suspicious behavior detection tasks, including detection of sybil accounts, false information, and payment scams/fraud. In this paper, we draw the insight tha...

Introduction

- Online services and social networks (Snapchat, Quora, Amazon, etc.) have become common means of engagement in human activities including socialization, information sharing and commerce.
- Given their dissemination power and reach, they have long been exploited by bad actors looking to manipulate user perception, spread misinformation, and falsely promote bad content.
- Given the ever-increasing multitude of new abuse vectors, application-specific anti-abuse solutions and rich labeled sets are not always feasible or timely, motivating research towards flexible, unsupervised methods.

Highlights

- Upper-case variables denote properties of G, while lower-case letters denote properties of X. Given these terms, which are summarized in Table 1, we propose the following axioms, which should be satisfied by a multi-view subgraph (MVSG) scoring metric.
- We propose an MVSG scoring metric based on an underlying data model for G in which undirected edges between the N nodes are distributed i.i.d. within each of the K views.
- Prior work has suggested Mass, average degree (AvgDeg) and Dens as suspiciousness metrics for single graph views [24], [23], [14]. We extend these to multi-view cases by construing an aggregated view with edge weights summed across the K views. [22] proposes CSSusp for suspiciousness in discrete, multi-modal tensor data; we can apply this by construing an MVSG X as a 3-mode tensor of n × n × K
- We propose and formalize intuitive desiderata (Axioms 1-5) that MVSG scoring metrics should obey to match human intuition, and design a novel suspiciousness metric based on the proposed MVERE model which satisfies these desiderata, unlike alternatives.
- We propose the SliceNDice algorithm which enables efficient extraction of highly suspicious entity groups, and demonstrate its practicality in production, in terms of strong detection performance and discoveries on Snapchat's large advertiser ecosystem (89% precision and numerous discoveries of real fraud rings), marked outperformance of baselines and linear scalability
- We proposed the SLICENDICE algorithm, which enables scalable ranking and discovery of MVSGs suspicious according to our metric, and discussed practical implementation details which help result relevance and computational efficiency
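The aggregated-view baselines mentioned above (Mass, AvgDeg and Dens computed on a view whose edge weights are summed across the K views) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `aggregated_view_metrics`, the helper `add_edge`, the toy tensor, and the convention AvgDeg = mass / n are all assumptions made for this example.

```python
import numpy as np

def aggregated_view_metrics(tensor, nodes):
    """Baseline scores for the subgraph induced by `nodes`.

    tensor: array of shape (N, N, K), symmetric in its first two axes,
            holding per-view edge weights (hypothetical layout).
    nodes:  list of node indices defining the subgraph.
    """
    aggregated = tensor.sum(axis=2)            # sum edge weights across the K views
    sub = aggregated[np.ix_(nodes, nodes)]     # induced n x n block
    n = len(nodes)
    mass = float(np.triu(sub, k=1).sum())      # count each undirected pair once
    return {
        "mass": mass,
        "avg_deg": mass / n,                   # assumed convention: mass per node
        "dens": mass / (n * (n - 1) / 2),      # mass per possible pair
    }

def add_edge(tensor, u, v, view, weight=1.0):
    """Hypothetical helper: write an undirected edge into one view."""
    tensor[u, v, view] = weight
    tensor[v, u, view] = weight

# Toy data: 4 nodes, 2 views.
tensor = np.zeros((4, 4, 2))
add_edge(tensor, 0, 1, view=0)
add_edge(tensor, 0, 2, view=0)
add_edge(tensor, 1, 2, view=0)
add_edge(tensor, 0, 1, view=1)

metrics = aggregated_view_metrics(tensor, [0, 1, 2])
```

Because weights are summed across views, the density of the aggregated view can exceed 1, and the aggregation treats every view as equally important regardless of how rare overlap is within it.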

Results

- The authors' experiments aim to answer the following questions. Q1. Detection performance: How effective is SLICENDICE in detecting suspicious behaviors in real and simulated settings? How does it perform in comparison to prior works? Q2.

Conclusion

- The authors tackled the problem of scoring and discovering suspicious behavior in multi-attribute entity data.
- The authors construe this data as a multi-view graph, and formulate this task in terms of mining suspiciously dense multi-view subgraphs (MVSGs).
- The authors proposed the SLICENDICE algorithm, which enables scalable ranking and discovery of MVSGs suspicious according to the metric, and discussed practical implementation details which help result relevance and computational efficiency.
- The authors demonstrated strong empirical results, including experiments on real data from the Snapchat advertiser platform where the authors achieved 89% precision over 2.7K organizations and uncovered numerous fraudulent advertiser rings, consistently high precision/recall and outperformance of several state-of-the-art group mining algorithms, and linear scalability
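One way to picture how multi-attribute entity data can be construed as a multi-view graph is sketched below: each attribute becomes a view, and two entities are linked in that view when they share a value. This is an illustrative assumption about the construction, not the authors' implementation; the function `build_multiview_graph`, the edge weight (number of shared values) and the sample organizations are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_multiview_graph(entities):
    """Map {entity: {attribute: set(values)}} to {attribute: {(u, v): weight}}.

    An edge (u, v) exists in a view when u and v share at least one value
    of that attribute; the weight counts the shared values (assumed choice).
    """
    # Inverted index: attribute -> value -> entities holding that value.
    inverted = defaultdict(lambda: defaultdict(set))
    for entity, attributes in entities.items():
        for attribute, values in attributes.items():
            for value in values:
                inverted[attribute][value].add(entity)

    views = defaultdict(lambda: defaultdict(int))
    for attribute, index in inverted.items():
        for members in index.values():
            # Every pair of entities sharing this value gets one unit of weight.
            for u, v in combinations(sorted(members), 2):
                views[attribute][(u, v)] += 1
    return {attribute: dict(edges) for attribute, edges in views.items()}

# Hypothetical sample data.
entities = {
    "org_a": {"ip": {"1.2.3.4"}, "country": {"US"}},
    "org_b": {"ip": {"1.2.3.4"}, "country": {"US"}},
    "org_c": {"ip": {"5.6.7.8"}, "country": {"US"}},
}
views = build_multiview_graph(entities)
```

In this toy input the `ip` view has a single edge while the `country` view is fully connected, which illustrates why overlap in some views is much rarer than in others. Attributes with no shared values produce no view in this sketch.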

Summary

## Introduction:

- Online services and social networks (Snapchat, Quora, Amazon, etc.) have become common means of engagement in human activities including socialization, information sharing and commerce.
- Given their dissemination power and reach, they have long been exploited by bad actors looking to manipulate user perception, spread misinformation, and falsely promote bad content.
- Given the ever-increasing multitude of new abuse vectors, application-specific anti-abuse solutions and rich labeled sets are not always feasible or timely, motivating research towards flexible, unsupervised methods.

## Objectives:

- In order to find a highly suspicious group of entities, the authors aim to optimize the view set and node set selection via the UPDATEVIEWS and UPDATENODES methods.
- The authors aim to sample views in a weighted fashion, favoring those in which overlap occurs less frequently.
- The authors' detection task is to classify each attribute overlap as suspicious or non-suspicious; the authors aim to label each nonzero entry (“behavior”) in the resulting N × N × K tensor.
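The weighted view sampling described above could look roughly like the sketch below. The summary only states that views with less frequent overlap are favored; weighting each view by the inverse of its edge density is an assumption of this example, as are the names `view_sampling_weights` and `sample_views` and the toy tensor.

```python
import numpy as np

def view_sampling_weights(tensor):
    """Sampling probability per view for an (N, N, K) tensor of edge weights.

    Assumed scheme: probability proportional to 1 / (fraction of node pairs
    that overlap in the view). Views with no overlap at all get weight 0.
    """
    n_nodes, _, n_views = tensor.shape
    upper = np.triu_indices(n_nodes, k=1)          # each undirected pair once
    n_pairs = n_nodes * (n_nodes - 1) / 2
    density = np.array(
        [(tensor[:, :, k][upper] > 0).sum() / n_pairs for k in range(n_views)]
    )
    inverse = np.zeros(n_views)
    nonzero = density > 0
    inverse[nonzero] = 1.0 / density[nonzero]
    total = inverse.sum()
    if total == 0:
        raise ValueError("no view contains any overlap")
    return inverse / total

def sample_views(tensor, num_views, rng):
    """Draw distinct view indices, favoring views where overlap is rare."""
    weights = view_sampling_weights(tensor)
    num_views = min(num_views, int(np.count_nonzero(weights)))
    return rng.choice(len(weights), size=num_views, replace=False, p=weights)

# Toy data: view 0 has one overlapping pair, view 1 has three.
tensor = np.zeros((4, 4, 2))
tensor[0, 1, 0] = tensor[1, 0, 0] = 1.0
for u, v in [(0, 1), (0, 2), (1, 2)]:
    tensor[u, v, 1] = tensor[v, u, 1] = 1.0

weights = view_sampling_weights(tensor)
chosen = sample_views(tensor, num_views=1, rng=np.random.default_rng(0))
```

With these toy values the sparser view receives three times the sampling probability of the denser one. Other monotone weightings would serve the same stated goal.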
## Results:

- The authors' experiments aim to answer the following questions. Q1. Detection performance: How effective is SLICENDICE in detecting suspicious behaviors in real and simulated settings? How does it perform in comparison to prior works? Q2.

## Conclusion:

- The authors tackled the problem of scoring and discovering suspicious behavior in multi-attribute entity data.
- The authors construe this data as a multi-view graph, and formulate this task in terms of mining suspiciously dense multi-view subgraphs (MVSGs).
- The authors proposed the SLICENDICE algorithm, which enables scalable ranking and discovery of MVSGs suspicious according to the metric, and discussed practical implementation details which help result relevance and computational efficiency.
- The authors demonstrated strong empirical results, including experiments on real data from the Snapchat advertiser platform where the authors achieved 89% precision over 2.7K organizations and uncovered numerous fraudulent advertiser rings, consistently high precision/recall and outperformance of several state-of-the-art group mining algorithms, and linear scalability

- Table 1: Frequently used symbols and definitions
- Table 2: Comparison with alternative metrics

Related work

- We discuss prior work in two related contexts below. Mining entity groups. Prior works have shown that suspicious behaviors often manifest in synchronous group-level behaviors [10], [9]. Several works assume inputs in the form of a single graph snapshot. [11], [12] mine communities using eigenplots from singular value decomposition (SVD) over adjacency matrices. [13], [14], [15] propose greedy pruning/expansion algorithms for identifying dense subgraphs. [16], [17] co-cluster nodes based on information theoretic measures relating to minimum-description length. Some prior works [18], [19] tackle subgroup mining from a single graph which also has node attributes, via efficient subgroup enumeration using tree-based branch-and-bound approaches which rely on specialized community goodness scoring functions. Unlike these works, our work entails mining suspicious groups based on overlap across multiple attributes and graph views, such that attribute importance is respected and an appropriate suspiciousness measure is used.

Funding

- We propose the SliceNDice algorithm which enables efficient extraction of highly suspicious entity groups, and demonstrate its practicality in production, in terms of strong detection performance and discoveries on Snapchat's large advertiser ecosystem (89% precision and numerous discoveries of real fraud rings), marked outperformance of baselines (over 97% precision/recall in simulated settings) and linear scalability
- We discuss design decisions which improve performance including careful seeding, context-aware similarity weighting and performance optimizations
- Figure 4 shows precision/recall curves for all 5 attack scenarios; note that SLICENDICE significantly outperforms competitors in all cases, often maintaining over 90% precision while making limited false positives
- We demonstrated strong empirical results, including experiments on real data from the Snapchat advertiser platform where we achieved 89% precision over 2.7K organizations and uncovered numerous fraudulent advertiser rings, consistently high precision/recall (over 97%) and outperformance of several state-of-the-art group mining algorithms, and linear scalability

Study subjects and analysis

Intuitively, larger groups which share attributes are more suspicious than smaller ones, controlling for density of mass. For example, 100 users sharing the same IP address is more suspicious than 10 users doing the same.

Axiom 3 (Contrast): Intuitively, a group with fixed attribute synchrony is more suspicious when background similarities between attributes are rare. For example, 100 users using the same IP address is generally more rare (lower P_i) than 100 users all from the same country (higher P_i).

Axiom 4 (Concentration): Intuitively, a smaller group of entities sharing the same number of similarities is more suspicious than a larger group doing the same. For example, finding 10 instances (edges) of IP sharing between a group of 10 users is more suspicious than finding the same in a group of 100 users.

Axiom 5 (Cross-view Distribution): Intuitively, a fixed mass is more suspicious when distributed towards a view with higher edge rarity. For example, given 100 users, it is more suspicious for 100 pairs to share IP addresses (low P_i) and 10 pairs to share the same country (high P_j), than vice versa. This axiom builds from Axiom 3.
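The size, contrast and concentration intuitions above can be checked numerically against a simple likelihood-based score. The sketch below is not the paper's metric: it assumes, purely for illustration, that each pair's edge weight in a view is i.i.d. exponential with mean equal to the view's background density, so that the total mass over a group is Gamma-distributed, and it scores a group by the negative log-likelihood of its observed mass. The function `view_score` and the background densities are hypothetical, and the comparison is only meaningful when the group is denser than the background.

```python
from math import lgamma, log

def view_score(n, mass, background_density):
    """Negative log-likelihood of `mass` among `n` nodes in one view.

    Assumption of this sketch: the v = n(n-1)/2 pair weights are i.i.d.
    exponential with mean `background_density`, so total mass follows a
    Gamma distribution with shape v and rate 1 / background_density.
    """
    v = n * (n - 1) / 2
    rate = 1.0 / background_density
    # -log of the Gamma(shape=v, rate) density evaluated at `mass`.
    return -v * log(rate) + lgamma(v) - (v - 1) * log(mass) + rate * mass

def pairs(n):
    return n * (n - 1) / 2

P_IP = 0.001       # hypothetical: IP sharing is rare in the background
P_COUNTRY = 0.1    # hypothetical: country sharing is common

# Size: same density (0.5 per pair), 10 users versus 100 users.
size_small = view_score(10, 0.5 * pairs(10), P_IP)
size_large = view_score(100, 0.5 * pairs(100), P_IP)

# Contrast: same group and mass, rare view versus common view.
contrast_rare = view_score(100, 0.5 * pairs(100), P_IP)
contrast_common = view_score(100, 0.5 * pairs(100), P_COUNTRY)

# Concentration: 10 units of mass among 10 users versus among 100 users.
conc_small = view_score(10, 10, P_IP)
conc_large = view_score(100, 10, P_IP)
```

Under these assumptions the larger group, the rarer view and the more concentrated group each receive the higher score, matching the direction of the corresponding axioms.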

Reference

- K. Thomas, D. McCoy, C. Grier, A. Kolcz, and V. Paxson, “Trafficking fraudulent accounts: The role of the underground market in Twitter spam and abuse,” in USENIX Security, 2013, pp. 195–210.
- P. K. Smith, J. Mahdavi, M. Carvalho, S. Fisher, S. Russell, and N. Tippett, “Cyberbullying: Its nature and impact in secondary school pupils,” J. of Child Psych., vol. 49, no. 4, pp. 376–385, 2008.
- A. Bessi and E. Ferrara, “Social bots distort the 2016 US presidential election online discussion,” 2016.
- Q. Cao, X. Yang, J. Yu, and C. Palow, “Uncovering large groups of active malicious accounts in online social networks,” in CCS. ACM, 2014, pp. 477–488.
- C. Xiao, D. M. Freeman, and T. Hwa, “Detecting clusters of fake accounts in online social networks,” in WAIS. ACM, 2015, pp. 91–101.
- N. Shah, A. Beutel, B. Gallagher, and C. Faloutsos, “Spotting suspicious link behavior with fbox,” in ICDM. IEEE, 2014, pp. 959–964.
- M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Catchsync: Catching synchronized behavior in large directed graphs,” 2014.
- D. H. P. Chau, C. Nachenberg, J. Wilhelm, A. Wright, and C. Faloutsos, “Polonium: Tera-scale graph mining and inference for malware detection,” in SDM. SIAM, 2011, pp. 131–142.
- S. Kumar and N. Shah, “False information on web and social media: A survey,” arXiv preprint arXiv:1804.08559, 2018.
- N. Shah, H. Lamba, A. Beutel, and C. Faloutsos, “The many faces of link fraud,” in ICDM. IEEE, 2017, pp. 1069–1074.
- B. A. Prakash, A. Sridharan, M. Seshadri, S. Machiraju, and C. Faloutsos, “Eigenspokes: Surprising patterns and scalable community chipping in large graphs,” in PAKDD. Springer, 2010, pp. 435–448.
- M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Inferring lockstep behavior from connectivity pattern in large graphs,” PAKDD, vol. 48, no. 2, pp. 399–428, 2016.
- M. Charikar, “Greedy approximation algorithms for finding dense components in a graph,” in APPROX. Springer, 2000, pp. 84–95.
- B. Hooi, H. A. Song, A. Beutel, N. Shah, K. Shin, and C. Faloutsos, “Fraudar: Bounding graph fraud in the face of camouflage,” in KDD. ACM, 2016, pp. 895–904.
- V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” JSM, vol. 2008, no. 10, p. P10008, 2008.
- D. Chakrabarti, S. Papadimitriou, D. S. Modha, and C. Faloutsos, “Fully automatic cross-associations,” in KDD. ACM, 2004, pp. 79–88.
- I. S. Dhillon, S. Mallela, and D. S. Modha, “Information-theoretic co-clustering,” in KDD. ACM, 2003, pp. 89–98.
- M. Atzmueller, S. Doerfel, and F. Mitzlaff, “Description-oriented community detection using exhaustive subgroup discovery,” Information Sciences, vol. 329, pp. 965–984, 2016.
- M. Atzmueller and F. Mitzlaff, “Efficient descriptive community mining,” in Twenty-Fourth International FLAIRS Conference, 2011.
- J. Kim and J.-G. Lee, “Community detection in multi-layer graphs: A survey,” SIGMOD, vol. 44, no. 3, pp. 37–48, 2015.
- H.-H. Mao, C.-J. Wu, E. E. Papalexakis, C. Faloutsos, K.-C. Lee, and T.-C. Kao, “Malspot: Multi² malicious network behavior patterns analysis,” in PAKDD. Springer, 2014, pp. 1–14.
- M. Jiang, A. Beutel, P. Cui, B. Hooi, S. Yang, and C. Faloutsos, “Spotting suspicious behaviors in multimodal data: A general metric and algorithms,” TKDE, vol. 28, no. 8, pp. 2187–2200, 2016.
- A. Beutel, W. Xu, V. Guruswami, C. Palow, and C. Faloutsos, “Copycatch: stopping group attacks by spotting lockstep behavior in social networks,” in WWW. ACM, 2013, pp. 119–130.
- K. Shin, B. Hooi, and C. Faloutsos, “M-zoom: Fast dense-block detection in tensors with quality guarantees,” in ECML-PKDD. Springer, 2016, pp. 264–280.
- N. Shah, D. Koutra, T. Zou, B. Gallagher, and C. Faloutsos, “Timecrunch: Interpretable dynamic graph summarization,” in KDD. ACM, 2015, pp. 1055–1064.
- A. Metwally, J.-Y. Pan, M. Doan, and C. Faloutsos, “Scalable community discovery from multi-faceted graphs,” in BigData. IEEE, 2015, pp. 1053–1062.
- L. Chen, Y. Zhou, and D. M. Chiu, “Analysis and detection of fake views in online video services,” TOMM, vol. 11, no. 2s, p. 44, 2015.
- D. M. Freeman, “Using naive bayes to detect spammy names in social networks,” in WAIS. ACM, 2013, pp. 3–12.
- J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Beyond blacklists: learning to detect malicious web sites from suspicious urls,” in KDD. ACM, 2009, pp. 1245–1254.
- N. Shah, A. Beutel, B. Hooi, L. Akoglu, S. Gunnemann, D. Makhija, M. Kumar, and C. Faloutsos, “Edgecentric: Anomaly detection in edge-attributed networks,” in ICDMW. IEEE, 2016, pp. 327–334.
- B. Hooi, N. Shah, A. Beutel, S. Gunnemann, L. Akoglu, M. Kumar, D. Makhija, and C. Faloutsos, “Birdnest: Bayesian inference for ratings-fraud detection,” in SDM. SIAM, 2016, pp. 495–503.
- L. Akoglu, M. McGlohon, and C. Faloutsos, “Oddball: Spotting anomalies in weighted graphs,” in PAKDD. Springer, 2010, pp. 410–421.
- H. Lamba, B. Hooi, K. Shin, C. Faloutsos, and J. Pfeffer, “ZooRank: Ranking suspicious entities in time-evolving tensors,” in ECML-PKDD. Springer, 2017, pp. 68–84.
- Z. Gyongyi, H. Garcia-Molina, and J. Pedersen, “Combating web spam with trustrank,” in VLDB, 2004, pp. 576–587.
- G. B. Guacho, S. Abdali, N. Shah, and E. E. Papalexakis, “Semi-supervised content-based detection of misinformation via tensor embeddings,” ASONAM, 2018.
- V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal, “A survey of algorithms for dense subgraph discovery,” in Managing and Mining Graph Data. Springer, 2010, pp. 303–336.
- M. E. Newman, D. J. Watts, and S. H. Strogatz, “Random graph models of social networks,” PNAS, vol. 99, no. suppl 1, pp. 2566–2572, 2002.
- M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Inferring strange behavior from connectivity pattern in social networks,” in PAKDD. Springer, 2014, pp. 126–138.
- C. C. Aggarwal and C. Zhai, Mining text data. Springer Science & Business Media, 2012.
- E. E. Papalexakis, L. Akoglu, and D. Ienco, “Do more views of a graph help? Community detection and clustering in multi-graphs,” in FUSION. Citeseer, 2013, pp. 899–905.
- K. Maruhashi, F. Guo, and C. Faloutsos, “Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis,” in ASONAM. IEEE, 2011, pp. 203–210.
