# Diverse Rule Sets

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, July 2020, pp. 1532–1541.

Abstract:

While machine-learning models are flourishing and transforming many aspects of everyday life, the inability of humans to understand complex models poses difficulties for these models to be fully trusted and embraced. Thus, the interpretability of models has been recognized as a quality no less important than their predictive power. In particular […]

Introduction

- There is a general consensus in the data-science community that interpretability is vital for data-driven models to be understood, trusted, and used by practitioners.
- This is especially true in safety-critical applications, such as disease diagnosis and criminal-justice systems [32].
- Since overlapping rules create ambiguity and require a conflict-resolution strategy, it is rational to consider small overlap as a new, effective criterion for improving interpretability.
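As a concrete illustration of the ambiguity that overlapping rules create (the rules, features, and the highest-confidence tie-break below are invented for this sketch, not taken from the paper):

```python
# Two rules with conflicting labels can cover the same record, so a
# conflict-resolution strategy is unavoidable.  A rule here is a triple
# (predicate over a record, predicted label, confidence).
rules = [
    (lambda r: r["age"] > 30, "high_risk", 0.80),
    (lambda r: r["income"] > 50_000, "low_risk", 0.65),
]

def predict(record, rules, default="low_risk"):
    """Resolve overlap by letting the highest-confidence matching rule win."""
    matching = [(label, conf) for pred, label, conf in rules if pred(record)]
    if not matching:
        return default
    return max(matching, key=lambda m: m[1])[0]

# This record is covered by both rules, which disagree on the label.
record = {"age": 45, "income": 60_000}
print(predict(record, rules))  # "high_risk" (confidence 0.80 beats 0.65)
```

A rule set with small overlap avoids needing such a tie-break in the first place, which is exactly the interpretability argument made above.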

Highlights

- There is a general consensus in the data-science community that interpretability is vital for data-driven models to be understood, trusted, and used by practitioners
- Rule sets are generally considered easier to interpret than decision lists, due to a flatter representation [20, 30]
- We investigate several options for a candidate rule set in an exponential-size search space, including frequent and accurate rules, and examine their effectiveness and hardness
- A rule-sampling algorithm described in Section 5 is designed to sample large-coverage candidate rules only among uncovered records, simulating a rule-discovery process similar to that of sequential covering algorithms
- We discuss desirable properties of a rule set in order to be considered interpretable
- Inspired by a recent line of work on diversification, we introduce a novel formulation for a diverse rule set that combines strong discriminative power with diversity, and which can be optimized with an approximation guarantee
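The idea of sampling large-coverage rules only among uncovered records can be sketched as follows. This is a hedged illustration of the bias such sampling induces, not the paper's Algorithm from Section 5; all names and data are invented:

```python
import random

# A candidate rule is drawn with probability proportional to the number of
# still-uncovered records it covers, mimicking sequential covering's
# preference for large-coverage rules on the remaining data.

def sample_rule(rule_covers, uncovered, rng):
    """rule_covers maps rule id -> set of record ids covered by that rule."""
    weights = [(r, len(cov & uncovered)) for r, cov in rule_covers.items()]
    total = sum(w for _, w in weights)
    if total == 0:
        return None  # every candidate covers only already-covered records
    pick = rng.random() * total
    acc = 0.0
    for r, w in weights:
        acc += w
        if pick < acc:
            return r
    return weights[-1][0]  # float-rounding fallback

rule_covers = {"r1": {1, 2, 3, 4}, "r2": {4, 5}, "r3": {9}}
uncovered = {1, 2, 3, 4, 5}
rng = random.Random(0)
draws = [sample_rule(rule_covers, uncovered, rng) for _ in range(1000)]
# "r1" dominates (4 of the 6 units of weight); "r3" is never drawn,
# since its only record is already covered.
```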

Methods

- The authors evaluate the performance of the algorithm from three different perspectives — sensitivity to hyper-parameters, predictive power, and interpretability.
- The authors compare the model DRS with three baselines — CBA [33], CN2 [16], and IDS [30].
- Models CBA and IDS use frequent rules as candidates.
- These baselines serve as strong competitors for examining two important aspects of the model: predictive power and diversity.
- The metrics include balanced accuracy, ROC AUC, average diversity, and the number of distinct data records covered by more than one rule.
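Two of these metrics can be sketched in a few lines (definitions assumed from their standard usage; the paper's diversity metric is omitted here):

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, robust to class imbalance."""
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def overlap_count(rule_covers):
    """Number of distinct records covered by more than one rule."""
    cover_counts = Counter(rec for cov in rule_covers for rec in cov)
    return sum(1 for n in cover_counts.values() if n > 1)

y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
print(balanced_accuracy(y_true, y_pred))     # (2/3 + 1/1) / 2 ≈ 0.833
print(overlap_count([{1, 2}, {2, 3}, {3}]))  # records 2 and 3 -> 2
```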

Conclusion

- Objective functions in the form of Equation (6) can be approximated within a factor of 2 via a greedy algorithm [11], provided that d(·) is a metric and U can be enumerated in polynomial time.
- Maximizing Equation (6) yields a rule set that is desirable with respect to the stated goals, i.e., accuracy and small overlap.
- Apart from low model complexity, small overlap among decision rules has been identified as essential.
- [Figure: comparison of DRS, CN2, and CBA as the max number of rules varies from 1 to 5.]
- The major contribution of this work is an efficient sampling algorithm that directly samples decision rules that are discriminative and have small overlap, from an exponential-size search space, with a distribution that perfectly suits the objective.
- Potential future directions include extensions to other notions of diversity.
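A minimal sketch of a greedy of the kind analyzed in [11], which at each step adds the element with the best quality-plus-dispersion gain (all names and the toy data are illustrative; this is not the paper's algorithm, and the factor-2 guarantee requires d to be a metric):

```python
import math

def greedy_diverse(points, k, d, quality=lambda p: 0.0):
    """Greedily pick k points, each maximizing quality plus total
    distance to the already-selected set."""
    selected, remaining = [], list(points)
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda p: quality(p) + sum(d(p, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

euclid = math.dist  # Euclidean distance is a metric, as required
pts = [(0, 0), (1, 0), (0, 1), (5, 5), (10, 0), (0, 10)]
print(greedy_diverse(pts, 3, euclid))  # [(0, 0), (10, 0), (0, 10)]
```

In the paper's setting the elements would be candidate rules, `quality` a discriminative-power term, and `d` a small-overlap-inducing distance between rules.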


- Table 1: Dataset characteristics: n = |T|; ncls = |Y|; imbalance = max_{y∈Y} |T^y| / min_{y∈Y} |T^y|; and |I| is the number of binary features
- Table 2: Sensitivity to λ
- Table 3: Predictive power; DRS1 is the rule set obtained in the first run of Algorithm 2 (columns: dataset, model, nrules, nconds, bacc, auc, div, overlap)

Related work

- Rule learning. Learning theory inspects rule learning from a computational perspective. Valiant [38] introduced PAC learning and asked whether polynomial-size DNF can be efficiently PAC-learned in a noise-free setting. This question remains open; researchers have attacked the problem in restricted forms, but these scenarios are less practical for real-world applications with noisy data.

The predominant practical rule-learning paradigms for rule sets are sequential covering algorithms [26] and associative classifiers [33]. The former iteratively learn one rule at a time over the uncovered data, typically by means of generalization or specialization, i.e., adding or removing a condition in the rule body [21]. Popular variants include CN2 [16] and RIPPER [17]. Associative classifiers use association rules, which are usually pre-mined using itemset-mining techniques. A set of rules is then selected from the candidate association rules via heuristics [33] or by optimizing an objective [4, 30, 39]. Our method falls into the second paradigm.
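The sequential-covering loop described above can be sketched as follows; `learn_one_rule` stands in for any single-rule learner (e.g. a CN2-style beam search), and the toy parity learner is purely illustrative:

```python
from collections import Counter

def sequential_covering(records, learn_one_rule):
    """Learn a rule on the uncovered data, remove what it covers, repeat."""
    rules, uncovered = [], list(records)
    while uncovered:
        rule = learn_one_rule(uncovered)
        covered = [r for r in uncovered if rule(r)]
        if not covered:  # learner found no useful rule: stop
            break
        rules.append(rule)
        uncovered = [r for r in uncovered if not rule(r)]
    return rules

def parity_rule_learner(data):
    """Toy learner: cover the majority parity among the remaining records."""
    maj = Counter(r % 2 for r in data).most_common(1)[0][0]
    return lambda r, m=maj: r % 2 == m

rules = sequential_covering([1, 3, 5, 2, 4], parity_rule_learner)
print(len(rules))  # 2: one rule for the odd records, then one for the even
```

Because each rule is learned only on records the previous rules left uncovered, the resulting set has little overlap by construction, which is the behavior the paper's sampling algorithm emulates.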

Funding

- This research is supported by three Academy of Finland projects (286211, 313927, 317085), the ERC Advanced Grant REBOUND (834862), the EC H2020 RIA project “SoBigData++” (871042), and the Wallenberg AI, Autonomous Systems and Software Program (WASP)

References

- Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In VLDB, Vol. 1215. 487–499.
- Mohammad Al Hasan and Mohammed Zaki. 2009. Musk: Uniform sampling of k maximal patterns. In ICDM. SIAM, 650–661.
- Mohammad Al Hasan and Mohammed J Zaki. 2009. Output space sampling for graph patterns. VLDB 2, 1 (2009), 730–741.
- Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, and Cynthia Rudin. 2017. Learning certifiably optimal rule lists for categorical data. JMLR 18, 1 (2017), 8753–8830.
- Roberto J Bayardo, Rakesh Agrawal, and Dimitrios Gunopulos. 1999. Constraintbased rule mining in large, dense databases. In ICDE. IEEE, 188–197.
- Mario Boley. 2007. On approximating minimum infrequent and maximum frequent sets. In International Conference on Discovery Science. Springer, 68–77.
- Mario Boley, Thomas Gärtner, and Henrik Grosskreutz. 2010. Formal concept sampling for counting and threshold-free local pattern mining. In SDM. SIAM, 177–188.
- Mario Boley and Henrik Grosskreutz. 2008. A randomized approach for approximating the number of frequent sets. In ICDM. IEEE, 43–52.
- Mario Boley, Claudio Lucchese, Daniel Paurat, and Thomas Gärtner. 2011. Direct local pattern sampling by efficient two-step random procedures. In KDD. ACM, 582–590.
- Mario Boley, Sandy Moens, and Thomas Gärtner. 2012. Linear space direct pattern sampling using coupling from the past. In KDD. ACM, 69–77.
- Allan Borodin, Hyun Chul Lee, and Yuli Ye. 2012. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS.
- Endre Boros, Vladimir Gurvich, Leonid Khachiyan, and Kazuhisa Makino. 2002. On the complexity of generating maximal frequent and minimal infrequent sets. In STACS. Springer, 133–141.
- Niv Buchbinder, Moran Feldman, Joseph Seffi, and Roy Schwartz. 2015. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM J. Comput. 44, 5 (2015), 1384–1402.
- Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, Jeremy Besson, and Mohammed J. Zaki. 2008. Origami: A novel and effective approach for mining representative orthogonal graph patterns. SADM 1, 2 (2008), 67–84.
- Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu. 2007. Discriminative frequent pattern analysis for effective classification. In ICDE. IEEE, 716–725.
- Peter Clark and Tim Niblett. 1989. The CN2 induction algorithm. Machine learning 3, 4 (1989), 261–283.
- William W Cohen. 1995. Fast effective rule induction. In Machine learning proceedings 1995.
- Sanjeeb Dash, Oktay Gunluk, and Dennis Wei. 2018. Boolean decision rules via column generation. In NeurIPS. 4655–4665.
- Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
- Alex A Freitas. 2014. Comprehensible classification models: a position paper. KDD 15, 1 (2014), 1–10.
- Johannes Fürnkranz, Dragan Gamberger, and Nada Lavrač. 2012. Foundations of rule learning. Springer Science & Business Media.
- Michael R Garey and David S Johnson. 2002. Computers and intractability. Vol. 29. W. H. Freeman, New York.
- Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining explanations: An overview of interpretability of machine learning. In DSAA. IEEE, 80–89.
- Sreenivas Gollapudi and Aneesh Sharma. 2009. An axiomatic approach for result diversification. In WWW.
- Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, and Ram Sewak Sharma. 2003. Discovering all most specific sentences. TODS 28, 2 (2003), 140–174.
- Jiawei Han, Jian Pei, and Micheline Kamber. 2011. Data mining: concepts and techniques. Elsevier.
- Mark Jerrum. 2003. Counting, Sampling and Integrating: Algorithms and Complexity. Birkhäuser.
- Subhash Khot. 2004. Ruling Out PTAS for Graph Min-Bisection, Densest Subgraph and Bipartite Clique.
- Arno J Knobbe and Eric KY Ho. 2006. Pattern teams. In ECML PKDD. Springer, 577–584.
- Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. 2016. Interpretable decision sets: A joint framework for description and prediction. In KDD. ACM, 1675–1684.
- Dennis Leman, Ad Feelders, and Arno Knobbe. 2008. Exceptional model mining. In ECML PKDD. Springer, 1–16.
- Zachary C Lipton. 2018. The mythos of model interpretability. Queue 16, 3 (2018), 31–57.
- Bing Liu, Wynne Hsu, Yiming Ma, et al. 1998. Integrating classification and association rule mining.. In KDD, Vol. 98. 80–86.
- Dmitry Malioutov and Kush Varshney. 2013. Exact rule learning via boolean compressed sensing. In ICML. 765–773.
- Sekharipuram S Ravi, Daniel J Rosenkrantz, and Giri Kumar Tayi. 1994. Heuristic and special case algorithms for dispersion problems. Operations Research (1994).
- Guolong Su, Dennis Wei, Kush R Varshney, and Dmitry M Malioutov. 2015. Interpretable two-level boolean rule learning for classification. arXiv preprint arXiv:1511.07361 (2015).
- Hannu Toivonen et al. 1996. Sampling large databases for association rules. In VLDB, Vol. 96. 134–145.
- Leslie G Valiant. 1984. A theory of the learnable. Commun. ACM 27, 11 (1984), 1134–1142.
- Tong Wang, Cynthia Rudin, Finale Doshi-Velez, Yimin Liu, Erica Klampfl, and Perry MacNeille. 2017. A bayesian framework for learning rule sets for interpretable classification. JMLR 18, 1 (2017), 2357–2393.
- Guizhen Yang. 2004. The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD. ACM, 344–353.
