## AI Insight

AI extracts a summary of this paper

# Feature Noise Induces Loss Discrepancy Across Groups

ICML, pp. 5209-5219 (2020)

Abstract

The performance of standard learning procedures has been observed to differ widely across groups. Recent studies usually attribute this loss discrepancy to an information deficiency for one group (e.g., one group has less data). In this work, we point to a more subtle source of loss discrepancy: feature noise. Our main result is that even...

Introduction

- Standard learning procedures such as empirical risk minimization have been shown to result in models that perform well on average but whose performance differs widely across groups such as whites and non-whites (Angwin et al., 2016; Barocas and Selbst, 2016).
- This loss discrepancy across groups is especially problematic in critical applications that impact people’s lives (Berk, 2012; Chouldechova, 2017).
- The authors show that even under very favorable conditions—i.e., no bias in the prediction targets, infinite data, perfectly predictive features for both groups, and no hard decisions—adding the same amount of feature noise to all individuals still leads to loss discrepancy (a small simulation sketch below illustrates this).
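
As a rough illustration of this claim (our own toy simulation, not the paper's experiments; the sample size, noise level, and group means below are arbitrary choices), the following sketch gives two groups the same linear target function and adds identical Gaussian noise to everyone's feature, yet the pooled least-squares predictor ends up with systematically different residuals per group:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                    # large n stands in for the infinite-data setting
beta = 2.0                     # the same true linear function for both groups
sigma_z, sigma_eps = 1.0, 1.0  # identical feature noise for every individual

g = rng.integers(0, 2, size=n)                  # group membership, balanced
z = rng.normal(loc=1.0 * g, scale=sigma_z)      # groups differ only in the mean of z
y = beta * z                                    # unbiased prediction target
x = z + rng.normal(scale=sigma_eps, size=n)     # observed feature = true feature + noise

# Pooled least-squares linear predictor fit on the noisy feature.
X = np.column_stack([x, np.ones(n)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ w

# The mean residual differs across groups: the group with the larger mean of z
# is under-predicted and the other group is over-predicted.
print("mean residual, group 0:", residuals[g == 0].mean())
print("mean residual, group 1:", residuals[g == 1].mean())
print("fitted slope vs true beta:", w[0], beta)
```

Because the noise inflates the variance of x, the fitted slope is attenuated toward zero, so predictions are pulled toward the pooled mean; a group whose mean of z sits away from that pooled mean then receives systematically biased predictions, even though nothing in the data-generating process treats the groups differently.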

Highlights

- Standard learning procedures such as empirical risk minimization have been shown to result in models that perform well on average but whose performance differs widely across groups such as whites and non-whites (Angwin et al., 2016; Barocas and Selbst, 2016).
- We show that even under very favorable conditions—i.e., no bias in the prediction targets, infinite data, perfectly predictive features for both groups, and no hard decisions—adding the same amount of feature noise to all individuals still leads to loss discrepancy.
- What if the train and test distributions are different? Our formulation in Propositions 1 and 2 can be rewritten in terms of the train and test distributions as follows: Counterfactual Loss Discrepancy $\mathrm{CLD}(o_{+g}, \mathrm{res}) = (\Lambda\beta)^{\top}\,\Delta\mu_z^{\mathrm{test}}$; Statistical Loss Discrepancy $\mathrm{SLD}(o_{+g}, \mathrm{res}) = (\Lambda\beta)^{\top}\bigl(\Delta\mu_z^{\mathrm{test}} - \Delta\mu_z^{\mathrm{train}}\bigr)$; and $\mathrm{CLD}(o_{-g}, \mathrm{res}) = 0$ (a simplified scalar derivation of expressions of this form is sketched after this list).
- We first pointed out that in the presence of feature noise, the best estimate of y depends on the distribution of the inputs, which might result in loss discrepancy for groups with different distributions
- The studied loss discrepancies are not mitigated by collecting more data or designing a group-specific classifier, and designers should think of other methods such as feature replication to estimate the noise and de-noise the predictor
- Our results rely on three main points: (i) we assume the true function is linear, (ii) we study the predictor with minimum squared error among linear functions, and (iii) we consider two observation functions, namely feature noise with and without group information.
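
To make the shape of these expressions concrete, here is a simplified one-dimensional version of the argument (our own sketch with a scalar feature and Gaussian noise; the paper's propositions state the general matrix form, and its Λ need not match the scalar factor used here):

```latex
% Setting: y = \beta z, observed feature x = z + \varepsilon, with
% z \mid g having mean \mu_g and \varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2).
% Let \mu and \sigma_z^2 be the mean and variance of z over the whole population.
\begin{align*}
  f^{*}(x) &= \beta\mu + \lambda\beta\,(x - \mu),
  \qquad \lambda = \frac{\sigma_z^2}{\sigma_z^2 + \sigma_\varepsilon^2}
  && \text{(best linear predictor on the noisy } x\text{)} \\
  \mathbb{E}\left[\,y - f^{*}(x) \mid g\,\right]
  &= \beta\mu_g - \beta\mu - \lambda\beta(\mu_g - \mu)
   = (1-\lambda)\,\beta\,(\mu_g - \mu)
  && \text{(mean residual in group } g\text{)} \\
  \mathrm{SLD}(\mathrm{res})
  &= (1-\lambda)\,\beta\,\Delta\mu_z,
  \qquad \Delta\mu_z = \mu_1 - \mu_0
  && \text{(residual gap between groups)}
\end{align*}
```

So as long as there is any feature noise (λ < 1) and the groups differ in the mean of z, the mean residuals differ; this mirrors the inner-product-times-mean-gap structure of the expressions above, and the discrepancy vanishes only as the noise goes to zero.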

Conclusion

- The authors first pointed out that in the presence of feature noise, the best estimate of y depends on the distribution of the inputs, which might result in loss discrepancy for groups with different distributions.
- The studied loss discrepancies are not mitigated by collecting more data or designing a group-specific classifier, and designers should think of other methods such as feature replication to estimate the noise and de-noise the predictor (a rough sketch of such a correction follows this list).
- Data and experiments for this paper are available on the CodaLab platform at https://worksheets.codalab.org/worksheets/0x7c3fb3bf981646c9bc11c538e881f37e
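
As one way to picture the feature-replication idea, here is a classical measurement-error correction in the one-feature case (our own sketch; the variable names and the exact estimator are illustrative assumptions, not the paper's implementation). Two noisy measurements of the same underlying feature let us estimate the noise variance and undo the attenuation of the naive fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, sigma_z, sigma_eps = 100_000, 2.0, 1.0, 1.0

z = rng.normal(0.0, sigma_z, size=n)           # unobserved true feature
y = beta * z                                   # unbiased target
x1 = z + rng.normal(scale=sigma_eps, size=n)   # first noisy replicate
x2 = z + rng.normal(scale=sigma_eps, size=n)   # second, independent replicate

# Naive regression on a single noisy replicate: the slope is attenuated.
slope_naive = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

# Replicates identify the noise variance: Var(x1 - x2) = 2 * sigma_eps^2.
noise_var_hat = np.var(x1 - x2, ddof=1) / 2.0
signal_var_hat = np.var(x1, ddof=1) - noise_var_hat

# De-noise the predictor by dividing out the estimated reliability ratio.
reliability = signal_var_hat / (signal_var_hat + noise_var_hat)
slope_corrected = slope_naive / reliability

print(f"true beta            : {beta:.3f}")
print(f"naive (attenuated)   : {slope_naive:.3f}")
print(f"replication-corrected: {slope_corrected:.3f}")
```

With the corrected slope, the per-group mean residuals in the earlier simulation go to zero, so the residual-based discrepancy disappears in this simplified setting; this is the sense in which estimating the noise and de-noising the predictor, rather than collecting more data, addresses the discrepancy.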

Summary

## Introduction:

- Standard learning procedures such as empirical risk minimization have been shown to result in models that perform well on average but whose performance differs widely across groups such as whites and non-whites (Angwin et al., 2016; Barocas and Selbst, 2016).
- This loss discrepancy across groups is especially problematic in critical applications that impact people's lives (Berk, 2012; Chouldechova, 2017).
- The authors show that even under very favorable conditions—i.e., no bias in the prediction targets, infinite data, perfect predictive features for both groups, and no hard decisions—adding the same amount of feature noise to all individuals still leads to loss discrepancy
## Conclusion:

- The authors first pointed out that in the presence of feature noise, the best estimate of y depends on the distribution of the inputs, which might result in loss discrepancy for groups with different distributions.
- The studied loss discrepancies are not mitigated by collecting more data or designing a group-specific classifier, and designers should think of other methods such as feature replication to estimate the noise and de-noise the predictor.
- Data and experiments for this paper are available on the CodaLab platform at https://worksheets.codalab.org/worksheets/0x7c3fb3bf981646c9bc11c538e881f37e

- Table 1: Loss discrepancies between groups, as proved in Propositions 1 and 2. In summary: (1) feature noise without group information (o−g) causes high SLD (first and third rows); (2) using group information reduces SLD but increases CLD (second and fourth rows); and (3) for residual-based loss discrepancies the difference between group means matters, while for squared-error-based ones the difference between group variances matters.
- Table 2: Statistics of the datasets used. The size of the first group is denoted by P[g = 1], and Δμ_y and Δσ_y² denote the differences in the mean and variance of the prediction target between groups, respectively.

Related work

While many papers focus on measuring loss discrepancy (Kusner et al, 2017; Hardt et al, 2016; Pierson et al, 2017; Simoiu et al, 2017; Khani et al, 2019) and mitigating loss discrepancy (Calmon et al, 2017; Hardt et al, 2016; Zafar et al, 2017), there are relatively few that study how loss discrepancy arises in machine learning models.

Chen et al. (2018) decompose the loss discrepancy into three components: bias, variance, and noise. They mainly focus on bias and variance, and also consider scenarios in which the available features are not equally predictive for both groups. Another line of work assumes the model's loss discrepancy stems from biased target values (e.g., Madras et al. (2019)). Some work attributes high loss discrepancy to a lack of data for minority groups (Chouldechova and Roth, 2018). Others assume different groups have different (sometimes mutually conflicting) functions (Dwork et al., 2018), and therefore fitting the same model to both groups is suboptimal. In this work, we showed that even when the prediction target is correct (not biased), there is infinite data, the same function applies to both groups, and the noise is equal for both groups, loss discrepancy remains.

Funding

- This work was supported by an Open Philanthropy Project Award.

Reference

- Agarwal, A., Beygelzimer, A., Dudik, M., Langford, J., and Wallach, H. (2018). A reductions approach to fair classification. In International Conference on Machine Learning (ICML), pages 60–69.
- Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica, 23.
- Arrow, K. (1973). The theory of discrimination. Discrimination in labor markets, 3(10):3–33.
- Barocas, S. and Selbst, A. D. (2016). Big data’s disparate impact.
- Bechavod, Y. and Ligett, K. (2017). Penalizing unfairness in binary classification. arXiv preprint arXiv:1707.00044.
- Berk, R. (2012). Criminal justice forecasts of risk: A machine learning approach. Springer Science & Business Media.
- Bertrand, M. and Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4):991–1013.
- Calmon, F., Wei, D., Vinzamuri, B., Ramamurthy, K. N., and Varshney, K. R. (2017). Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems (NeurIPS), pages 3992–4001.
- Canetti, R., Cohen, A., Dikkala, N., Ramnarayan, G., Scheffler, S., and Smith, A. (2019). From soft classifiers to hard decisions: How fair can we be? In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 309–318.
- Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement error in nonlinear models: a modern perspective. Chapman and Hall/CRC.
- Chen, I., Johansson, F. D., and Sontag, D. (2018). Why is my classifier discriminatory? In Advances in Neural Information Processing Systems (NeurIPS), pages 3539–3550.
- Chiappa, S. (2019). Path-specific counterfactual fairness. In Association for the Advancement of Artificial Intelligence (AAAI), volume 33, pages 7801–7808.
- Chouldechova, A. (2017). A study of bias in recidivism prediction instruments. Big Data, pages 153–163.
- Chouldechova, A. and Roth, A. (2018). The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810.
- Corbett-Davies, S. and Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.
- Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., and Huq, A. (2017). Algorithmic decision making and the cost of fairness. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 797–806.
- Cortez, P. and Silva, A. M. G. (2008). Using data mining to predict secondary school student performance. Proceedings of 5th FUture BUsiness TEChnology Conference.
- Dwork, C., Immorlica, N., Kalai, A. T., and Leiserson, M. (2018). Decoupled classifiers for group-fair and efficient machine learning. In Conference on Fairness, Accountability and Transparency, pages 119–133.
- Freedman, D. A. (2004). Graphical models for causation, and the identification problem. Evaluation Review, 28(4):267–293.
- Frisch, R. (1934). Statistical confluence analysis by means of complete regression systems, volume 5. Universitetets Økonomiske Instituut.
- Fuller, W. A. (2009). Measurement error models, volume 305. John Wiley & Sons.
- Greiner, D. J. and Rubin, D. B. (2011). Causal effects of perceived immutable characteristics. Review of Economics and Statistics, 93(3):775–785.
- Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 3315–3323.
- Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960.
- Holland, P. W. (2003). Causation and race. ETS Research Report Series, 2003(1).
- Khani, F., Raghunathan, A., and Liang, P. (2019). Maximum weighted loss discrepancy. arXiv preprint arXiv:1906.03518.
- Kilbertus, N., Carulla, M. R., Parascandolo, G., Hardt, M., Janzing, D., and Scholkopf, B. (2017). Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems (NeurIPS), pages 656–666.
- Kusner, M. J., Loftus, J. R., Russell, C., and Silva, R. (2017). Counterfactual fairness. In Advances in Neural Information Processing Systems (NeurIPS), pages 4069–4079.
- Lipton, Z., McAuley, J., and Chouldechova, A. (2018). Does mitigating ml’s impact disparity require treatment disparity? In Advances in Neural Information Processing Systems (NeurIPS), pages 8125–8135.
- Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., and Hardt, M. (2018). Delayed impact of fair machine learning. arXiv preprint arXiv:1803.04383.
- Loftus, J. R., Russell, C., Kusner, M. J., and Silva, R. (2018). Causal reasoning for algorithmic fairness. arXiv preprint arXiv:1805.05859.
- Madras, D., Creager, E., Pitassi, T., and Zemel, R. (2019). Fairness through causal awareness: Learning causal latent-variable models for biased data. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 349–358.
- Simoiu, C., Corbett-Davies, S., Goel, S., et al. (2017). The problem of infra-marginality in outcome tests for discrimination. The Annals of Applied Statistics, 11(3):1193–1216.
- Wightman, L. F. and Ramsey, H. (1998). LSAC national longitudinal bar passage study. Law School Admission Council.
- Woodworth, B., Gunasekar, S., Ohannessian, M. I., and Srebro, N. (2017). Learning non-discriminatory predictors. In Conference on Learning Theory (COLT), pages 1920–1953.
- Zafar, M. B., Valera, I., Rodriguez, M. G., and Gummadi, K. P. (2017). Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In World Wide Web (WWW), pages 1171–1180.
- Nabi, R. and Shpitser, I. (2018). Fair inference on outcomes. In Association for the Advancement of Artificial Intelligence (AAAI).
- Phelps, E. S. (1972). The statistical theory of racism and sexism. The American Economic Review, 62(4):659–661.
- Pierson, E., Corbett-Davies, S., and Goel, S. (2017). Fast threshold tests for detecting discrimination. arXiv preprint arXiv:1702.08536.
- Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., and Weinberger, K. Q. (2017). On fairness and calibration. In Advances in Neural Information Processing Systems (NeurIPS), pages 5684–5693.
- Redmond, M. and Baveja, A. (2002). A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141(3):660–678.
- Rosenbaum, P. R. (1984). The consequences of adjustment for a concomitant variable that has been affected by the treatment. Journal of the Royal Statistical Society: Series A (General), 147(5):656–666.
- Sen, M. and Wasow, O. (2016). Race as a bundle of sticks: Designs that estimate effects of seemingly immutable characteristics. Annual Review of Political Science, 19.
- Sherman, J. and Morrison, W. J. (1950). Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. The Annals of Mathematical Statistics, 21(1):124–127.
