Avoiding Disparate Impact with Counterfactual Distributions


Hao Wang, Berk Ustun, and Flavio P. Calmon. 2018. “Avoiding Disparate Impact with Counterfactual Distributions.” In NeurIPS Workshop on Ethical, Social and Governance Issues in AI.
wesgai18.pdf (466 KB)


When a classification model is used to make predictions on individuals, it may be undesirable or illegal for the performance of the model to change with respect to a sensitive attribute such as race or gender. In this paper, we aim to evaluate and mitigate such disparities in model performance through a distributional approach. Given a black-box classifier that performs unevenly across sensitive groups, we consider a counterfactual distribution of input variables that minimizes the performance gap. We characterize properties of counterfactual distributions for common fairness criteria. We then present novel machinery to efficiently recover a counterfactual distribution given a sample of points from its target population. We describe how counterfactual distributions can be used to avoid discrimination between protected groups by: (i) identifying proxy variables to omit in training; and (ii) building a preprocessor that can mitigate discrimination. We validate both use cases through experiments on a real-world dataset.
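The core idea, shifting one group's input distribution so a fixed black-box classifier performs as well on it as on the reference group, can be sketched in a toy form. The snippet below is a minimal illustration, not the paper's actual machinery: it uses synthetic data, a hypothetical classifier `clf`, and represents the counterfactual distribution as a simple reweighting of one group's samples, fit by projected gradient descent on the squared accuracy gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box classifier: predicts 1 when the two features sum past 1.
def clf(X):
    return (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Synthetic data: two groups whose input distributions and label noise differ.
X0 = rng.normal(0.6, 0.3, size=(500, 2))
y0 = (X0.sum(axis=1) > 1).astype(int)
y0 = np.where(rng.random(500) < 0.05, 1 - y0, y0)   # 5% label noise

X1 = rng.normal(0.3, 0.5, size=(500, 2))
y1 = (X1.sum(axis=1) > 1).astype(int)
y1 = np.where(rng.random(500) < 0.20, 1 - y1, y1)   # 20% noise -> lower accuracy

acc0 = (clf(X0) == y0).mean()        # reference group's accuracy
c1 = (clf(X1) == y1).astype(float)   # per-sample correctness in the other group

# Represent the "counterfactual distribution" as simplex weights over group-1
# samples; minimize the squared accuracy gap by projected gradient descent.
w = np.full(len(c1), 1.0 / len(c1))
for _ in range(2000):
    gap = w @ c1 - acc0
    w -= 0.01 * 2.0 * gap * c1       # gradient step on (w @ c1 - acc0) ** 2
    w = np.clip(w, 0.0, None)
    w /= w.sum()                     # project back onto the probability simplex

print(f"original accuracy gap: {c1.mean() - acc0:+.3f}")
print(f"reweighted accuracy gap: {w @ c1 - acc0:+.3f}")
```

The recovered weights indicate which inputs the counterfactual distribution up- or down-weights, which is the same kind of signal the paper uses to flag proxy variables and to build a mitigating preprocessor.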
Last updated on 10/26/2021