Diverse Rule Sets

KDD '20: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, July 2020, pp. 1532–1541.

DOI: https://doi.org/10.1145/3394486.3403204
Inspired by a recent line of work on diversification, we introduce a novel formulation for a diverse rule set that has both excellent discriminative power and diversity, and that can be optimized with an approximation guarantee.

Abstract:

While machine-learning models are flourishing and transforming many aspects of everyday life, the inability of humans to understand complex models poses difficulties for these models to be fully trusted and embraced. Thus, interpretability of models has been recognized as an equally important quality as their predictive power. In particul…

Introduction
  • There is a general consensus in the data-science community that interpretability is vital for data-driven models to be understood, trusted, and used by practitioners.
  • This is especially true in safety-critical applications, such as disease diagnosis and criminal justice systems [32].
  • Since overlapping rules create ambiguity and require a conflict-resolution strategy, it is reasonable to consider small overlap as a new, effective criterion for improving interpretability
Highlights
  • There is a general consensus in the data-science community that interpretability is vital for data-driven models to be understood, trusted, and used by practitioners
  • Rule sets are generally considered easier to interpret than decision lists, due to a flatter representation [20, 30]
  • We investigate several options for a candidate rule set in an exponential-size search space, including frequent and accurate rules, and examine their effectiveness and hardness
  • The rule-sampling algorithm described in Section 5 is designed to sample large-coverage candidate rules only among uncovered records, simulating a rule-discovery process similar to that of sequential covering algorithms (see the sketch after this list)
  • We discuss desirable properties of a rule set in order to be considered interpretable
  • Inspired by a recent line of work on diversification, we introduce a novel formulation for a diverse rule set that has both excellent discriminative power and diversity, and it can be optimized with an approximation guarantee
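
As a toy illustration of this two-step, record-seeded sampling idea, the sketch below assumes binary feature vectors and a set of uncovered record indices; it shows the flavor of the approach, not the paper's actual sampling algorithm.

    import random

    def sample_candidate_rule(records, uncovered, rng=random.Random(0)):
        # Toy two-step sampler (hypothetical, not the paper's algorithm):
        # (1) pick a seed record uniformly among the uncovered records,
        # (2) form a rule body from a random subset of the seed's features.
        # The rule covers its seed by construction, so its coverage among
        # uncovered records is at least one.
        seed = rng.choice(sorted(uncovered))
        items = [i for i, v in enumerate(records[seed]) if v == 1]
        body = frozenset(rng.sample(items, max(1, len(items) // 2)))
        cover = {t for t in uncovered
                 if body <= {i for i, v in enumerate(records[t]) if v == 1}}
        return body, cover

    # Usage: three binary records, all initially uncovered.
    records = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]]
    print(sample_candidate_rule(records, uncovered={0, 1, 2}))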
Methods
  • The authors evaluate the performance of the algorithm from three different perspectives — sensitivity to hyper-parameters, predictive power, and interpretability.
  • The authors compare the model DRS with three baselines — CBA [33], CN2 [16], and IDS [30].
  • Models CBA and IDS use frequent rules as candidates.
  • The baselines serve as strong competitors for examining two important aspects of the model: predictive power and diversity.
  • The metrics include balanced accuracy, ROC AUC, average diversity, and the number of distinct data records covered by more than one rule (a concrete sketch of these metrics follows this list).
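
For concreteness, the first two metrics are standard and available in scikit-learn, while overlap can be counted from the records each rule covers; representing a rule by its cover set is an assumption made only for this sketch.

    from sklearn.metrics import balanced_accuracy_score, roc_auc_score

    def overlap_count(covers):
        # Number of distinct records covered by more than one rule.
        seen, multi = set(), set()
        for cover in covers:          # each cover: set of record indices
            multi |= (cover & seen)
            seen |= cover
        return len(multi)

    y_true = [0, 1, 1, 0, 1]
    y_pred = [0, 1, 0, 0, 1]
    y_score = [0.2, 0.9, 0.4, 0.1, 0.8]  # toy scores for class 1

    print(balanced_accuracy_score(y_true, y_pred))  # bacc
    print(roc_auc_score(y_true, y_score))           # auc
    print(overlap_count([{0, 1}, {1, 2}, {3}]))     # 1: record 1 is shared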
Conclusion
  • Objective functions in the form of Equation (6) can be approximated within a factor of 2 via a greedy algorithm [11], provided that d(·) is a metric and the universe U can be enumerated in polynomial time (see the sketch after this list).
  • Maximizing Equation (6) yields a rule set that is desirable with respect to our goal, i.e., accuracy and small overlap.
  • Apart from low model complexity, small overlap among decision rules has been identified as essential.
  • A major contribution of this work is an efficient sampling algorithm that directly samples decision rules that are discriminative and have small overlap, from an exponential-size search space, with a distribution that perfectly suits the objective.
  • Potential future directions include extensions to other notions of diversity
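
A minimal sketch of that greedy template follows. For readability it uses the plain combined marginal gain; the variant analyzed in [11] weights the terms slightly differently to obtain the factor-2 guarantee, and the objective f, metric d, and toy universe below are placeholders rather than the paper's actual components.

    # Greedy selection of k elements for objectives of the form
    # f(S) + sum of pairwise distances, with d(.,.) a metric.
    def greedy_diverse(universe, k, f, d):
        S = set()
        while len(S) < min(k, len(universe)):
            # Marginal gain of u: gain in f plus distances to chosen set.
            gain = lambda u: (f(S | {u}) - f(S)) + sum(d(u, v) for v in S)
            S.add(max(universe - S, key=gain))
        return S

    # Toy usage: points on a line, f = set size, d = absolute distance.
    pts = {0.0, 1.0, 2.0, 5.0, 9.0}
    print(sorted(greedy_diverse(pts, 3, f=len, d=lambda a, b: abs(a - b))))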
Tables
  • Table 1: Dataset characteristics: n = |T|; n_cls = |Y|; imbalance = max_{y∈Y} |T^y| / min_{y∈Y} |T^y|; and |I| is the number of binary features
  • Table 2: Sensitivity to λ
  • Table 3: Predictive power; DRS1 is the rule set obtained in the first run of Algorithm 2. Columns: dataset, model, n_rules, n_conds, bacc, auc, div, overlap
Related work
  • Rule learning. Learning theory inspects rule learning from a computational perspective. Valiant [38] introduced PAC learning and asked whether polynomial-size DNF formulas can be efficiently PAC-learned in a noise-free setting. This question remains open, and researchers have attacked the problem in restricted settings; however, these settings are less practical for real-world applications with noisy data.

    Predominant practical rule-learning paradigms for rule sets include sequential covering algorithms [26] and associative classifiers [33]. The former iteratively learns one rule at a time over the uncovered data, typically by means of generalization or specialization, i.e., adding or removing a condition in the rule body [21]. Popular variants include CN2 [16] and RIPPER [17]. Associative classifiers use association rules, which are usually pre-mined using itemset-mining techniques. A set of rules is selected from candidate association rules via heuristics [33] or by optimizing an objective [4, 30, 39]. Our method falls into the second paradigm.
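
To make the cover-and-remove loop of sequential covering concrete, here is a schematic sketch; learn_one_rule is a hypothetical stand-in for any single-rule learner (e.g., greedy specialization), not a specific algorithm from the cited works.

    # Schematic sequential-covering loop: learn one rule on the records not
    # yet covered, then remove the records the new rule covers.
    def sequential_covering(records, labels, learn_one_rule, min_cover=1):
        rules, remaining = [], set(range(len(records)))
        while remaining:
            rule, cover = learn_one_rule(records, labels, remaining)
            if len(cover & remaining) < min_cover:  # no useful rule left
                break
            rules.append(rule)
            remaining -= cover                      # cover-and-remove step
        return rules

    # Trivial stand-in learner: each "rule" covers one remaining record.
    print(sequential_covering(
        records=[[1, 0], [0, 1]], labels=[0, 1],
        learn_one_rule=lambda X, y, rem: (("record", min(rem)), {min(rem)}),
    ))  # -> [('record', 0), ('record', 1)]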
Funding
  • This research is supported by three Academy of Finland projects (286211, 313927, 317085), the ERC Advanced Grant REBOUND (834862), the EC H2020 RIA project “SoBigData++” (871042), and the Wallenberg AI, Autonomous Systems and Software Program (WASP)
References
  • Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In VLDB, Vol. 1215. 487–499.
  • Mohammad Al Hasan and Mohammed Zaki. 2009. Musk: Uniform sampling of k maximal patterns. In ICDM. SIAM, 650–661.
  • Mohammad Al Hasan and Mohammed J Zaki. 2009. Output space sampling for graph patterns. VLDB 2, 1 (2009), 730–741.
  • Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, and Cynthia Rudin. 2017. Learning certifiably optimal rule lists for categorical data. JMLR 18, 1 (2017), 8753–8830.
  • Roberto J Bayardo, Rakesh Agrawal, and Dimitrios Gunopulos. 1999. Constraint-based rule mining in large, dense databases. In ICDE. IEEE, 188–197.
  • Mario Boley. 2007. On approximating minimum infrequent and maximum frequent sets. In International Conference on Discovery Science. Springer, 68–77.
  • Mario Boley, Thomas Gärtner, and Henrik Grosskreutz. 2010. Formal concept sampling for counting and threshold-free local pattern mining. In SDM. SIAM, 177–188.
  • Mario Boley and Henrik Grosskreutz. 2008. A randomized approach for approximating the number of frequent sets. In ICDM. IEEE, 43–52.
  • Mario Boley, Claudio Lucchese, Daniel Paurat, and Thomas Gärtner. 2011. Direct local pattern sampling by efficient two-step random procedures. In KDD. ACM, 582–590.
  • Mario Boley, Sandy Moens, and Thomas Gärtner. 2012. Linear space direct pattern sampling using coupling from the past. In KDD. ACM, 69–77.
  • Allan Borodin, Hyun Chul Lee, and Yuli Ye. 2012. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS.
  • Endre Boros, Vladimir Gurvich, Leonid Khachiyan, and Kazuhisa Makino. 2002. On the complexity of generating maximal frequent and minimal infrequent sets. In STACS. Springer, 133–141.
  • Niv Buchbinder, Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. 2015. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM J. Comput. 44, 5 (2015), 1384–1402.
  • Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, Jeremy Besson, and Mohammed J. Zaki. 2008. Origami: A novel and effective approach for mining representative orthogonal graph patterns. SADM 1, 2 (2008), 67–84.
  • Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu. 2007. Discriminative frequent pattern analysis for effective classification. In ICDE. IEEE, 716–725.
  • Peter Clark and Tim Niblett. 1989. The CN2 induction algorithm. Machine Learning 3, 4 (1989), 261–283.
  • William W Cohen. 1995. Fast effective rule induction. In Machine Learning Proceedings 1995.
  • Sanjeeb Dash, Oktay Gunluk, and Dennis Wei. 2018. Boolean decision rules via column generation. In NeurIPS. 4655–4665.
  • Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
  • Alex A Freitas. 2014. Comprehensible classification models: a position paper. SIGKDD Explorations 15, 1 (2014), 1–10.
  • Johannes Fürnkranz, Dragan Gamberger, and Nada Lavrač. 2012. Foundations of Rule Learning. Springer Science & Business Media.
  • Michael R Garey and David S Johnson. 2002. Computers and Intractability. Vol. 29. W. H. Freeman, New York.
  • Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining explanations: An overview of interpretability of machine learning. In DSAA. IEEE, 80–89.
  • Sreenivas Gollapudi and Aneesh Sharma. 2009. An axiomatic approach for result diversification. In WWW.
  • Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, and Ram Sewak Sharma. 2003. Discovering all most specific sentences. TODS 28, 2 (2003), 140–174.
  • Jiawei Han, Jian Pei, and Micheline Kamber. 2011. Data Mining: Concepts and Techniques. Elsevier.
  • Mark Jerrum. 2003. Counting, Sampling and Integrating: Algorithms and Complexity. Birkhäuser.
  • Subhash Khot. 2004. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique. In FOCS. IEEE.
  • Arno J Knobbe and Eric KY Ho. 2006. Pattern teams. In ECML PKDD. Springer, 577–584.
  • Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. 2016. Interpretable decision sets: A joint framework for description and prediction. In KDD. ACM, 1675–1684. Code: https://github.com/lvhimabindu/interpretable_decision_sets
  • Dennis Leman, Ad Feelders, and Arno Knobbe. 2008. Exceptional model mining. In ECML PKDD. Springer, 1–16.
  • Zachary C Lipton. 2018. The mythos of model interpretability. Queue 16, 3 (2018), 31–57.
  • Bing Liu, Wynne Hsu, Yiming Ma, et al. 1998. Integrating classification and association rule mining. In KDD, Vol. 98. 80–86.
  • Dmitry Malioutov and Kush Varshney. 2013. Exact rule learning via Boolean compressed sensing. In ICML. 765–773.
  • Sekharipuram S Ravi, Daniel J Rosenkrantz, and Giri Kumar Tayi. 1994. Heuristic and special case algorithms for dispersion problems. Operations Research (1994).
  • Guolong Su, Dennis Wei, Kush R Varshney, and Dmitry M Malioutov. 2015. Interpretable two-level Boolean rule learning for classification. arXiv preprint arXiv:1511.07361 (2015).
  • Hannu Toivonen et al. 1996. Sampling large databases for association rules. In VLDB, Vol. 96. 134–145.
  • Leslie G Valiant. 1984. A theory of the learnable. Commun. ACM 27, 11 (1984), 1134–1142.
  • Tong Wang, Cynthia Rudin, Finale Doshi-Velez, Yimin Liu, Erica Klampfl, and Perry MacNeille. 2017. A Bayesian framework for learning rule sets for interpretable classification. JMLR 18, 1 (2017), 2357–2393.
  • Guizhen Yang. 2004. The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD. ACM, 344–353.