SliceNDice - Mining Suspicious Multi-Attribute Entity Groups with Multi-View Graphs

Hamed Nilforoshan

DSAA, pp. 351-363, 2019.

We propose a novel suspiciousness metric for scoring entity groups given the abnormality of their synchronicity across multiple views, which obeys intuitive desiderata that existing metrics do not

Abstract:

Given the reach of web platforms, bad actors have considerable incentives to manipulate and defraud users at the expense of platform integrity. This has spurred research in numerous suspicious behavior detection tasks, including detection of sybil accounts, false information, and payment scams/fraud. In this paper, we draw the insight tha...

Introduction
  • Online services and social networks (Snapchat, Quora, Amazon, etc.) have become common means of engagement in human activities including socialization, information sharing and commerce.
  • Given their dissemination power and reach, they have long been exploited by bad actors looking to manipulate user perception, spread misinformation, and falsely promote bad content.
  • Given the ever-increasing multitude of new abuse vectors, application-specific anti-abuse solutions and rich labeled sets are not always feasible or timely, motivating research towards flexible, unsupervised methods.
Highlights
  • Upper-case variables denote properties of G, while lower-case letters denote properties of X. Given these terms, which are summarized in Table 1, we propose the following axioms, which should be satisfied by a multi-view subgraph (MVSG) scoring metric.
  • We propose an MVSG scoring metric based on an underlying data model for G in which undirected edges between the N nodes are distributed i.i.d. within each of the K views.
  • Prior work has suggested Mass, average degree (AvgDeg) and Dens as suspiciousness metrics for single graph views [24], [23], [14]. We extend these to multi-view cases by construing an aggregated view with edge weights summed across the K views. [22] proposes CSSusp for suspiciousness in discrete, multi-modal tensor data; we can apply this by construing an MVSG X as a 3-mode tensor of size n × n × K.
  • We propose and formalize intuitive desiderata (Axioms 1-5) that MVSG scoring metrics should obey to match human intuition, and design a novel suspiciousness metric based on the proposed MVERE model which satisfies these axioms, unlike alternatives.
  • We propose the SliceNDice algorithm which enables efficient extraction of highly suspicious entity groups, and demonstrate its practicality in production, in terms of strong detection performance and discoveries on Snapchat's large advertiser ecosystem (89% precision and numerous discoveries of real fraud rings), marked outperformance of baselines and linear scalability
  • We proposed the SLICENDICE algorithm, which enables scalable ranking and discovery of MVSGs suspicious according to our metric, and discussed practical implementation details which help result relevance and computational efficiency
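
The baseline extension mentioned in the highlights (summing edge weights across the K views and then computing Mass, AvgDeg and Dens on the aggregated view) can be sketched as follows. This is a minimal illustration, not the paper's code: the edge-set layout, the function name `aggregated_metrics`, and the normalizations (mass divided by n for AvgDeg, mass divided by the number of node pairs for Dens) are assumptions.

```python
from itertools import combinations

def aggregated_metrics(views, nodes):
    """Baseline metrics on the aggregated view of a multi-view subgraph.

    views: list of K sets of frozenset({u, v}) undirected edges (hypothetical layout).
    nodes: iterable of node ids that form the subgraph.
    """
    nodes = sorted(set(nodes))
    n = len(nodes)
    # Aggregated view: the weight of a pair is the number of views containing it.
    weights = {}
    for u, v in combinations(nodes, 2):
        w = sum(1 for view in views if frozenset((u, v)) in view)
        if w:
            weights[(u, v)] = w
    mass = sum(weights.values())
    pairs = n * (n - 1) // 2
    return {
        "mass": mass,                               # total aggregated edge weight
        "avg_deg": mass / n if n else 0.0,          # assumed: mass per node
        "dens": mass / pairs if pairs else 0.0,     # assumed: mass per node pair
    }

# Toy example: 4 entities, K = 2 views (say, shared IP and shared email domain).
ip_view = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
email_view = {frozenset((0, 1)), frozenset((2, 3))}
print(aggregated_metrics([ip_view, email_view], [0, 1, 2]))
```

Because the aggregated view treats every view identically, these baselines cannot distinguish a rare overlap (shared IP) from a common one (shared country), which is the gap the proposed axioms target.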
Results
  • The authors' experiments aim to answer the following questions. Q1. Detection performance: How effective is SLICENDICE in detecting suspicious behaviors in real and simulated settings? How does it perform in comparison to prior works? Q2.
Conclusion
  • The authors tackled the problem of scoring and discovering suspicious behavior in multi-attribute entity data.
  • The authors construe this data as a multi-view graph, and formulate this task in terms of mining suspiciously dense multi-view subgraphs (MVSGs).
  • The authors proposed the SLICENDICE algorithm, which enables scalable ranking and discovery of MVSGs suspicious according to the metric, and discussed practical implementation details which help result relevance and computational efficiency.
  • The authors demonstrated strong empirical results, including experiments on real data from the Snapchat advertiser platform where the authors achieved 89% precision over 2.7K organizations and uncovered numerous fraudulent advertiser rings, consistently high precision/recall and outperformance of several state-of-the-art group mining algorithms, and linear scalability
Summary
  • Objectives:

    In order to find a highly suspicious group of entities, the authors aim to optimize the view set and node set selection via the UPDATEVIEWS and UPDATENODES methods.
  • The authors aim to sample views in a weighted fashion, favoring those in which overlap occurs less frequently.
  • The authors' detection task is to classify each attribute overlap as suspicious or non-suspicious; the authors aim to label each nonzero entry (“behavior”) in the resulting N × N × K tensor
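
The weighted view sampling described in the objectives (favoring views in which overlap occurs less frequently) could look roughly like the sketch below. The inverse-density weighting, the exclusion of empty views, and the names `view_weights` and `sample_views` are assumptions made for illustration; the summary does not state the exact weighting the authors use.

```python
import random

def view_weights(edge_counts, num_nodes):
    """Weight each view by the inverse of its edge density (assumed scheme)."""
    pairs = num_nodes * (num_nodes - 1) / 2
    weights = []
    for count in edge_counts:
        density = count / pairs
        # Rarer overlap gives a larger weight; empty views get weight 0.
        weights.append(1.0 / density if density > 0 else 0.0)
    return weights

def sample_views(edge_counts, num_nodes, m, rng=None):
    """Draw up to m distinct view indices, biased towards sparse views."""
    rng = rng or random.Random()
    weights = view_weights(edge_counts, num_nodes)
    # Assumption: views with no overlap at all are skipped, since they
    # contain nothing to mine.
    candidates = [i for i, w in enumerate(weights) if w > 0]
    chosen = []
    while candidates and len(chosen) < m:
        pick = rng.choices(candidates,
                           weights=[weights[i] for i in candidates], k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return chosen

# Example: view 0 (e.g. shared IP) has 10 edges, view 1 (e.g. shared country) has 1000.
print(sample_views([10, 1000], num_nodes=100, m=1, rng=random.Random(0)))
```

With these numbers view 0 is 100 times more likely to be drawn first than view 1, so seeds tend to start from overlaps that are unlikely to occur by chance.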
Tables
  • Table 1: Frequently used symbols and definitions
  • Table 2: Comparison with alternative metrics
Related work
  • We discuss prior work in two related contexts below. Mining entity groups. Prior works have shown that suspicious behaviors often manifest in synchronous group-level behaviors [10], [9]. Several works assume inputs in the form of a single graph snapshot. [11], [12] mine communities using eigenplots from singular value decomposition (SVD) over adjacency matrices. [13], [14], [15] propose greedy pruning/expansion algorithms for identifying dense subgraphs. [16], [17] co-cluster nodes based on information theoretic measures relating to minimum-description length. Some prior works [18], [19] tackle subgroup mining from a single graph which also has node attributes, via efficient subgroup enumeration using tree-based branch-and-bound approaches which rely on specialized community goodness scoring functions. Unlike these works, our work entails mining suspicious groups based on overlap across multiple attributes and graph views, such that attribute importance is respected and an appropriate suspiciousness measure is used.
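
As background for the greedy pruning approaches cited above, the following is a minimal sketch of the classic greedy peeling heuristic for dense subgraphs on a single graph view: repeatedly delete the lowest-degree node and remember the intermediate subgraph with the highest edges-per-node ratio. It is a generic illustration and not SliceNDice; the function name `greedy_peel` and the adjacency-dict input are assumptions, and the simple minimum search makes it quadratic rather than optimized.

```python
def greedy_peel(adj):
    """Greedy peeling for a dense subgraph.

    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    Returns (best_nodes, best_score) where score = edges / nodes.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    num_edges = sum(len(vs) for vs in adj.values()) // 2
    best_nodes = set(adj)
    best_score = num_edges / len(adj) if adj else 0.0
    while len(adj) > 1:
        u = min(adj, key=lambda x: len(adj[x]))  # lowest-degree node
        num_edges -= len(adj[u])
        for v in adj.pop(u):                     # detach u from its neighbours
            adj[v].discard(u)
        score = num_edges / len(adj)
        if score > best_score:
            best_nodes, best_score = set(adj), score
    return best_nodes, best_score

# Toy graph: a 4-clique {0, 1, 2, 3} with a short tail 3-4-5 attached.
graph = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
print(greedy_peel(graph))
```

On the toy graph the tail nodes are peeled first and the 4-clique is kept, which shows why such methods work well on one view but need an extra notion of view importance in the multi-view setting.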
Funding
  • We propose the SliceNDice algorithm which enables efficient extraction of highly suspicious entity groups, and demonstrate its practicality in production, in terms of strong detection performance and discoveries on Snapchat's large advertiser ecosystem (89% precision and numerous discoveries of real fraud rings), marked outperformance of baselines (over 97% precision/recall in simulated settings) and linear scalability
  • We discuss design decisions which improve performance including careful seeding, context-aware similarity weighting and performance optimizations
  • Figure 4 shows precision/recall curves for all 5 attack scenarios; note that SLICENDICE significantly outperforms competitors in all cases, often maintaining over 90% precision while making limited false positives
  • We demonstrated strong empirical results, including experiments on real data from the Snapchat advertiser platform where we achieved 89% precision over 2.7K organizations and uncovered numerous fraudulent advertiser rings, consistently high precision/recall (over 97%) and outperformance of several state-of-the-art group mining algorithms, and linear scalability
Study subjects and analysis
Intuitively, larger groups which share attributes are more suspicious than smaller ones, controlling for density of mass. For example, 100 users sharing the same IP address is more suspicious than 10 users doing the same.

Axiom 3 (Contrast): Intuitively, a group with fixed attribute synchrony is more suspicious when background similarities between attributes are rare. For example, 100 users using the same IP address is generally more rare (lower P_i) than 100 users all from the same country (higher P_i).

Axiom 4 (Concentration): Intuitively, a smaller group of entities sharing the same number of similarities is more suspicious than a larger group doing the same. For example, finding 10 instances (edges) of IP sharing between a group of 10 users is more suspicious than finding the same in a group of 100 users.

Axiom 5 (Cross-view Distribution): Intuitively, a fixed mass is more suspicious when distributed towards a view with higher edge rarity. For example, given 100 users, it is more suspicious for 100 pairs to share IP addresses (low P_i) and 10 pairs to share the same country (high P_j), than vice versa. This axiom builds from Axiom 3.
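
The intuitions above can be checked numerically with a simple stand-in score. The sketch below is not the paper's metric: it assumes a single view in which each node pair is an edge independently with probability p (in the spirit of the i.i.d. edge model mentioned in the highlights) and scores a group by the negative log-probability of seeing at least c edges among its n nodes. The function names and the example probabilities are hypothetical.

```python
import math

def log_binom_pmf(k, trials, p):
    """log P(X = k) for X ~ Binomial(trials, p), with 0 < p < 1."""
    return (math.lgamma(trials + 1) - math.lgamma(k + 1)
            - math.lgamma(trials - k + 1)
            + k * math.log(p) + (trials - k) * math.log1p(-p))

def surprise(n, c, p):
    """-log P(at least c edges among n nodes) under i.i.d. edges with prob p."""
    trials = n * (n - 1) // 2
    logs = [log_binom_pmf(k, trials, p) for k in range(c, trials + 1)]
    top = max(logs)
    # log-sum-exp keeps the tail probability numerically stable
    log_tail = top + math.log(sum(math.exp(x - top) for x in logs))
    return -log_tail

P_IP, P_COUNTRY = 0.001, 0.05  # assumed background rates: IP overlap is rarer

# Size: fully connected groups (same density), 100 users vs. 10 users.
print(surprise(100, 4950, P_IP) > surprise(10, 45, P_IP))
# Contrast: the same 100 edges among 100 users, rare view vs. common view.
print(surprise(100, 100, P_IP) > surprise(100, 100, P_COUNTRY))
# Concentration: the same 10 edges, 10 users vs. 100 users.
print(surprise(10, 10, P_IP) > surprise(100, 10, P_IP))
```

Each comparison mirrors one of the worked examples in the text; a metric that fails any of them would rank groups against the stated intuition.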

Reference
  • K. Thomas, D. McCoy, C. Grier, A. Kolcz, and V. Paxson, “Trafficking fraudulent accounts: The role of the underground market in twitter spam and abuse.” in USENIX Security, 2013, pp. 195–210.
  • P. K. Smith, J. Mahdavi, M. Carvalho, S. Fisher, S. Russell, and N. Tippett, “Cyberbullying: Its nature and impact in secondary school pupils,” J. of Child Psych., vol. 49, no. 4, pp. 376–385, 2008.
  • A. Bessi and E. Ferrara, “Social bots distort the 2016 US presidential election online discussion,” 2016.
  • Q. Cao, X. Yang, J. Yu, and C. Palow, “Uncovering large groups of active malicious accounts in online social networks,” in CCS. ACM, 2014, pp. 477–488.
  • C. Xiao, D. M. Freeman, and T. Hwa, “Detecting clusters of fake accounts in online social networks,” in WAIS. ACM, 2015, pp. 91–101.
  • N. Shah, A. Beutel, B. Gallagher, and C. Faloutsos, “Spotting suspicious link behavior with fbox,” in ICDM. IEEE, 2014, pp. 959–964.
  • M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Catchsync: Catching synchronized behavior in large directed graphs,” 2014.
  • D. H. P. Chau, C. Nachenberg, J. Wilhelm, A. Wright, and C. Faloutsos, “Polonium: Tera-scale graph mining and inference for malware detection,” in SDM. SIAM, 2011, pp. 131–142.
  • S. Kumar and N. Shah, “False information on web and social media: A survey,” arXiv preprint arXiv:1804.08559, 2018.
  • N. Shah, H. Lamba, A. Beutel, and C. Faloutsos, “The many faces of link fraud,” in ICDM. IEEE, 2017, pp. 1069–1074.
  • B. A. Prakash, A. Sridharan, M. Seshadri, S. Machiraju, and C. Faloutsos, “Eigenspokes: Surprising patterns and scalable community chipping in large graphs,” in PAKDD. Springer, 2010, pp. 435–448.
  • M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Inferring lockstep behavior from connectivity pattern in large graphs,” PAKDD, vol. 48, no. 2, pp. 399–428, 2016.
  • M. Charikar, “Greedy approximation algorithms for finding dense components in a graph,” in APPROX. Springer, 2000, pp. 84–95.
  • B. Hooi, H. A. Song, A. Beutel, N. Shah, K. Shin, and C. Faloutsos, “Fraudar: Bounding graph fraud in the face of camouflage,” in KDD. ACM, 2016, pp. 895–904.
  • V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” JSM, vol. 2008, no. 10, p. P10008, 2008.
  • D. Chakrabarti, S. Papadimitriou, D. S. Modha, and C. Faloutsos, “Fully automatic cross-associations,” in KDD. ACM, 2004, pp. 79–88.
  • I. S. Dhillon, S. Mallela, and D. S. Modha, “Information-theoretic co-clustering,” in KDD. ACM, 2003, pp. 89–98.
  • M. Atzmueller, S. Doerfel, and F. Mitzlaff, “Description-oriented community detection using exhaustive subgroup discovery,” Information Sciences, vol. 329, pp. 965–984, 2016.
  • M. Atzmueller and F. Mitzlaff, “Efficient descriptive community mining,” in Twenty-Fourth International FLAIRS Conference, 2011.
  • J. Kim and J.-G. Lee, “Community detection in multi-layer graphs: A survey,” SIGMOD, vol. 44, no. 3, pp. 37–48, 2015.
  • H.-H. Mao, C.-J. Wu, E. E. Papalexakis, C. Faloutsos, K.-C. Lee, and T.-C. Kao, “Malspot: Multi² malicious network behavior patterns analysis,” in PAKDD. Springer, 2014, pp. 1–14.
  • M. Jiang, A. Beutel, P. Cui, B. Hooi, S. Yang, and C. Faloutsos, “Spotting suspicious behaviors in multimodal data: A general metric and algorithms,” TKDE, vol. 28, no. 8, pp. 2187–2200, 2016.
  • A. Beutel, W. Xu, V. Guruswami, C. Palow, and C. Faloutsos, “Copycatch: stopping group attacks by spotting lockstep behavior in social networks,” in WWW. ACM, 2013, pp. 119–130.
  • K. Shin, B. Hooi, and C. Faloutsos, “M-zoom: Fast dense-block detection in tensors with quality guarantees,” in ECML-PKDD. Springer, 2016, pp. 264–280.
  • N. Shah, D. Koutra, T. Zou, B. Gallagher, and C. Faloutsos, “Timecrunch: Interpretable dynamic graph summarization,” in KDD. ACM, 2015, pp. 1055–1064.
  • A. Metwally, J.-Y. Pan, M. Doan, and C. Faloutsos, “Scalable community discovery from multi-faceted graphs,” in BigData. IEEE, 2015, pp. 1053–1062.
  • L. Chen, Y. Zhou, and D. M. Chiu, “Analysis and detection of fake views in online video services,” TOMM, vol. 11, no. 2s, p. 44, 2015.
  • D. M. Freeman, “Using naive bayes to detect spammy names in social networks,” in WAIS. ACM, 2013, pp. 3–12.
  • J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Beyond blacklists: learning to detect malicious web sites from suspicious urls,” in KDD. ACM, 2009, pp. 1245–1254.
  • N. Shah, A. Beutel, B. Hooi, L. Akoglu, S. Gunnemann, D. Makhija, M. Kumar, and C. Faloutsos, “Edgecentric: Anomaly detection in edge-attributed networks,” in ICDMW. IEEE, 2016, pp. 327–334.
  • B. Hooi, N. Shah, A. Beutel, S. Gunnemann, L. Akoglu, M. Kumar, D. Makhija, and C. Faloutsos, “Birdnest: Bayesian inference for ratings-fraud detection,” in SDM. SIAM, 2016, pp. 495–503.
  • L. Akoglu, M. McGlohon, and C. Faloutsos, “Oddball: Spotting anomalies in weighted graphs,” in PAKDD. Springer, 2010, pp. 410–421.
  • H. Lamba, B. Hooi, K. Shin, C. Faloutsos, and J. Pfeffer, “zooRank: Ranking suspicious entities in time-evolving tensors,” in ECML-PKDD. Springer, 2017, pp. 68–84.
  • Z. Gyongyi, H. Garcia-Molina, and J. Pedersen, “Combating web spam with trustrank,” in VLDB, 2004, pp. 576–587.
  • G. B. Guacho, S. Abdali, N. Shah, and E. E. Papalexakis, “Semi-supervised content-based detection of misinformation via tensor embeddings,” ASONAM, 2018.
  • V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal, “A survey of algorithms for dense subgraph discovery,” in Managing and Mining Graph Data. Springer, 2010, pp. 303–336.
  • M. E. Newman, D. J. Watts, and S. H. Strogatz, “Random graph models of social networks,” PNAS, vol. 99, no. suppl 1, pp. 2566–2572, 2002.
  • M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang, “Inferring strange behavior from connectivity pattern in social networks,” in PAKDD. Springer, 2014, pp. 126–138.
  • C. C. Aggarwal and C. Zhai, Mining text data. Springer Science & Business Media, 2012.
  • E. E. Papalexakis, L. Akoglu, and D. Ienco, “Do more views of a graph help? community detection and clustering in multi-graphs.” in FUSION. Citeseer, 2013, pp. 899–905.
  • K. Maruhashi, F. Guo, and C. Faloutsos, “Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis,” in ASONAM. IEEE, 2011, pp. 203–210.