# Highlights from NeurIPS

## A brief summary of some interesting papers related to ethical machine learning

Jan 29, 2019

Two of us from Gradient Institute were lucky enough to attend NeurIPS 2018 as co-organisers of the Workshop on Ethical, Social and Governance Issues in AI. With over 1000 accepted papers at the main conference, we only had time to see a small fraction of the amazing work on display. In this post we give a brief summary of three interesting papers on the topic of discrimination and fairness in machine learning. Concerns around fairness and discrimination arise in machine learning when some metric of an algorithm’s predictions (such as accuracy, number of people classified high risk or error rates) differs between groups of people. These groups are typically defined in terms of an attribute deemed sensitive, such as race, gender, religion, etc.

### 1) Does mitigating ML’s impact disparity require treatment disparity?

full paper, Zachary C. Lipton, Alexandra Chouldechova and Julian McAuley

This paper focuses on machine learning approaches to reducing disparate impact, that is, differences in the distributions of predictions between groups. The authors highlight some limitations of algorithms that make use of the sensitive attribute only at training time (and not at prediction time). This is a large family of fair machine learning algorithms, including both those that add fairness constraints or regularising terms to the loss function and those based on pre-processing the training data. Such algorithms are promoted on the grounds that making decisions explicitly on the basis of a sensitive attribute is regarded as discriminatory (disparate treatment).

Let $A$ represent a sensitive attribute such as race or gender, $X$ the set of other features (or columns) and $Y$ be the outcome we want to predict. This paper demonstrates that ignoring the sensitive attribute, $A$, at decision time is sub-optimal, in terms of a joint loss over accuracy and outcome discrepancy, when compared with estimating the probability given all information $p = P(Y = 1|X=x,A=a)$ and then applying separate thresholds for each group $\hat{y} = \unicode{x1D7D9}(p > \text{threshold}_a)$.

The core reasoning behind the optimality of separate thresholds is very simple. If we knew $p$, the Bayes optimal classifier would be obtained by classifying all instances with $p > 0.5$ as positive. Let’s say we are developing a hiring algorithm. If the proportion of women classified as ‘hire’ ($Y = 1$) is lower than the proportion of men and we want to reduce this discrepancy, then we will have to flip the classification of some women from ‘no hire’ to ‘hire’ and/or change the classification of some men from ‘hire’ to ‘no hire’. To minimise the impact on accuracy, we should pick women with scores just less than $0.5$ and men with scores just greater than $0.5$. We cannot do any better than this.

Algorithms that don’t utilise the sensitive attributes at decision time can only achieve the same level of performance if they make identical decisions, which can occur if the sensitive attribute $A$ can be predicted perfectly from the features $X$. However, in that case it is disingenuous to suggest that the algorithm is not making a decision on the basis of $A$.
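The threshold argument can be checked numerically. The sketch below is a toy illustration and not the paper's experiment: the Beta-distributed scores, the group coding and the choice to equalise both groups at the overall selection rate are all assumptions made here. It compares per-group thresholds against flipping the same number of randomly chosen candidates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
a = rng.integers(0, 2, size=n)  # assumed coding: 0 = men, 1 = women
# Assumed synthetic scores p = P(Y=1 | x, a); women's scores skew lower here.
p = np.where(a == 1, rng.beta(2, 3, size=n), rng.beta(3, 2, size=n))

def expected_accuracy(p, y_hat):
    """Expected accuracy of decisions y_hat when p is the true probability of Y=1."""
    return np.where(y_hat, p, 1 - p).mean()

# Bayes-optimal baseline: one threshold of 0.5 for everyone.
y_single = p > 0.5

# Separate thresholds chosen so each group is selected at the overall baseline rate.
target_rate = y_single.mean()
thresholds = {g: np.quantile(p[a == g], 1 - target_rate) for g in (0, 1)}
y_group = p > np.where(a == 1, thresholds[1], thresholds[0])

# Comparison: make the same number of flips per group, but pick people at random.
y_random = y_single.copy()
for g in (0, 1):
    in_group = a == g
    change = int(y_group[in_group].sum()) - int(y_single[in_group].sum())
    # Flip current negatives if the group needs more hires, otherwise current positives.
    pool = np.flatnonzero(in_group & (y_single != (change > 0)))
    y_random[rng.choice(pool, size=abs(change), replace=False)] = change > 0

acc_single = expected_accuracy(p, y_single)
acc_group = expected_accuracy(p, y_group)
acc_random = expected_accuracy(p, y_random)
rate_men, rate_women = y_group[a == 0].mean(), y_group[a == 1].mean()
print(thresholds, acc_single, acc_group, acc_random, rate_men, rate_women)
```

In this setup the women's threshold falls below 0.5 and the men's rises above it, so the flipped decisions are exactly those closest to 0.5. Equalising selection rates still costs some accuracy relative to the single threshold, but less than making the same number of flips at random.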

Under the hood, all these algorithms rely (at least implicitly) on predicting $A$ from $X$. If $A$ were independent of $X$ then the only way to ensure equalised outcomes would be with an uninformative strategy, such as constant or completely random predictions. The paper illustrates how adjusting the predictions of a model based on inferred rather than actual values of the sensitive attribute can lead to undesirable side effects. For instance, in the hiring example, such algorithms would reduce the probability that women who resembled men in the data (for example because they browsed machine learning blogs) would be hired, while improving the odds for men who resembled women. This can result in the most qualified candidates in the under-represented group being less likely to be hired under the ‘fair’ algorithm than under an algorithm optimised purely for accuracy.

There will be cases in which obtaining sensitive attributes at decision time is not feasible, which makes methods that require this information only at training time valuable. However, we should recognise their limitations, and that they do not fundamentally address the ethical concerns that underlie our discomfort with making decisions explicitly on the basis of sensitive attributes.

### 2) Why is my classifier discriminatory?

full paper, Irene Y. Chen, Fredrik D. Johansson and David Sontag

This paper takes a novel perspective on fair machine learning. Most work so far has focused on promoting fairness by trading it off against accuracy. Motivated by applications in which the cost of reduced accuracy is too high, the authors point out that there is another avenue to pursue fairness: collecting more data.

More precisely, they remind us that the expected error under certain popular loss functions (such as zero-one and squared losses) can be decomposed into three components: bias, variance, and noise. The decomposition helps assess whether improvements are most likely to come from refining the model, collecting more observations or collecting more features per observation.
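For squared loss the decomposition takes a familiar form. The notation below is an assumption made here rather than taken from the paper: $\hat{Y}_D$ is a predictor trained on a random dataset $D$, and the expectation is over both $D$ and the outcome $Y$ at a fixed input $x$. This is the standard textbook statement, not necessarily the paper's exact per-group formulation.

$$
\mathbb{E}_{D,Y}\big[(Y - \hat{Y}_D(x))^2 \mid x\big] = \underbrace{\mathbb{E}\big[(Y - \mathbb{E}[Y \mid x])^2 \mid x\big]}_{\text{noise}} + \underbrace{\big(\mathbb{E}[Y \mid x] - \mathbb{E}_D[\hat{Y}_D(x)]\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}_D\big[(\hat{Y}_D(x) - \mathbb{E}_D[\hat{Y}_D(x)])^2\big]}_{\text{variance}}
$$

Averaging each term over the inputs belonging to one group gives group-level bias, variance and noise, and the gap in expected error between two groups is then the sum of the gaps in the three terms.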

They argue that differences in bias between a protected minority group and the remaining majority are often due to the model not being equally good at making predictions for both groups, and that existing approaches for inducing fairness have addressed this issue by trading off prediction accuracy in one group against another.

They then make the observation that the complementary strategy of reducing differences in variance or noise between the groups has been largely neglected. Differences in variance arise due to differences in sample sizes between the groups, as well as differences in the intrinsic variance of features associated with each group. Differences in noise (or Bayes error, the minimum achievable error given the existing features), on the other hand, are fundamentally due to the fact that existing features better predict one group than another.
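The sample-size effect on variance can be seen in a small simulation. This is a minimal sketch under assumptions chosen here, not the paper's setup: each group gets its own least-squares line, the true relationship and noise level are identical across groups, and only the number of training rows differs. The function name and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def prediction_variance(n_train, n_repeats=500, x0=1.0):
    """Variance, across resampled training sets, of a fitted line's prediction at x0."""
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        x = rng.normal(size=n_train)
        y = 2.0 * x + rng.normal(size=n_train)  # assumed true relationship plus unit noise
        slope, intercept = np.polyfit(x, y, deg=1)
        preds[i] = slope * x0 + intercept
    return preds.var()

var_majority = prediction_variance(n_train=500)
var_minority = prediction_variance(n_train=20)
var_minority_more_data = prediction_variance(n_train=200)
print(var_majority, var_minority, var_minority_more_data)
```

Because the model is correctly specified and the noise is the same for both groups, the bias and noise terms match and the whole gap comes from variance. Collecting more rows for the smaller group narrows that gap without touching the model.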

Motivated by this observation, they go on to investigate empirically how increases in sample size are able to improve fairness in three real-world datasets. They also propose using clustering or topic modelling to identify which clusters have particularly large accuracy gaps between both groups so as to prioritise data collection within those clusters.

Overall, the paper provides an important example of how there may be better options to remediate discrepancies in prediction than changing the model (in this case collecting more data of a specific nature). However, this is only one approach to understanding the cause of discrepancies in predictions - one still focused on the work of a data scientist, data collection and modelling.

### 3) Equality of opportunity in classification: a causal approach

full paper, Junzhe Zhang and Elias Bareinboim

This paper proposes an alternative way of decomposing an observed discrepancy in prediction outcomes. The authors show how, given a structural causal model for the relationship between the sensitive attribute, features and predictor, the difference in error rates of the predictor can be decomposed into that resulting from direct, indirect and spurious causal paths. It extends earlier work providing a similar decomposition for unconditional fairness metrics such as disparate impact.

A direct causal path from a sensitive attribute, $A$, to a predictor $\hat{Y}$ means that the predictive function depends explicitly on $A$: it uses the sensitive attribute at test time, and there exist rows of data for which, holding all other features constant, flipping the value of the sensitive attribute changes the prediction.
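One simple way to probe for this kind of explicit dependence is to flip $A$ while holding the other features fixed and count how often the prediction changes. The sketch below uses two hypothetical predictors and a made-up helper, `direct_flip_rate`; it is a check on functional dependence only, not the paper's counterfactual measure of the direct effect.

```python
import numpy as np

def direct_flip_rate(predict, x, a):
    """Share of rows whose prediction changes when A is flipped and x is held fixed."""
    return float(np.mean(predict(x, a) != predict(x, 1 - a)))

# Two hypothetical predictors over one feature x and a binary attribute a.
def aware(x, a):
    return x + 0.5 * a > 1.0   # uses A explicitly: direct path present

def unaware(x, a):
    return x > 1.0             # ignores A: no direct path

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=1_000)
a = rng.integers(0, 2, size=1_000)

print(direct_flip_rate(aware, x, a), direct_flip_rate(unaware, x, a))
```

A rate of zero for the unaware predictor only rules out a direct path. It says nothing about indirect or spurious dependence on $A$ through $x$.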

An indirect (causal) path from $A$ to $\hat{Y}$ means there is a causal relationship between $A$ and the predictor but that it is mediated by some other variable (eg $W$ in figure (b)). In other words, if we were to imagine intervening on a sensitive attribute, flipping both its value and the value of other features that were consequences of it in the data, then the prediction for that row could change.

Finally, a spurious (non-causal) path means that $A$ and $\hat{Y}$ are correlated but not causally related, for example as shown in figure (c). If $A$ and $\hat{Y}$ are connected only by a spurious path, then intervening on $A$, and flipping only variables caused by $A$, would not change the prediction.
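The difference between indirect and spurious paths can be illustrated with a toy structural model. Everything in this sketch is an assumption made for illustration: the common cause `u`, the feature `z`, the coefficients and the two threshold predictors are hypothetical and are not taken from the paper's figures. Intervening on $A$ changes its consequence $W$ but leaves $Z$, which merely shares a cause with $A$, untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Exogenous noise is drawn once so that interventions reuse the same individuals.
u = rng.normal(size=n)        # hypothetical common cause of A and Z
noise_a = rng.normal(size=n)
noise_w = rng.normal(size=n)
noise_z = rng.normal(size=n)

def simulate(a_override=None):
    """Generate (A, W, Z); passing a_override mimics an intervention that sets A."""
    a = (u + noise_a > 0).astype(int) if a_override is None else a_override
    w = 1.5 * a + noise_w     # W is caused by A (mediator on an indirect path)
    z = u + noise_z           # Z shares the cause U with A but is not caused by A
    return a, w, z

a, w, z = simulate()
_, w_do, z_do = simulate(a_override=1 - a)   # flip A and propagate to its consequences

pred_w, pred_w_do = w > 0.75, w_do > 0.75    # predictor using only W
pred_z, pred_z_do = z > 0.0, z_do > 0.0      # predictor using only Z

indirect_change = np.mean(pred_w != pred_w_do)
spurious_change = np.mean(pred_z != pred_z_do)
spurious_corr = np.corrcoef(a, pred_z.astype(float))[0, 1]
print(indirect_change, spurious_change, spurious_corr)
```

The predictor built on `w` changes for many rows under the intervention, while the predictor built on `z` never changes even though its output is clearly correlated with $A$ in the observational data.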

While the existence of direct, indirect and spurious causal paths can be seen immediately from the causal graph, it is not always straightforward to estimate how much influence each of these pathways has on equality of opportunity. The paper provides a formula to decompose the discrepancy in equality of opportunity into that caused by direct, indirect and spurious paths if the causal structure is fully specified and contains no unobserved/latent variables. However, if the causal graph contains latent variables, it is not always possible to separately estimate the terms of the decomposition, even given infinite observational data (ie the decomposition may not be identifiable). The authors give a sufficient condition for identifiability, and propose a method to estimate the decomposition when it is satisfied. Note that unlike the Why is my Classifier Discriminatory paper, this decomposition is solely a property of the causal structure and joint distribution over the sensitive attribute, other features and the predictor. It does not say anything about the expected discrepancies due to finite samples.

However, what remains unclear is how to connect this decomposition to what we really want in a fair classifier. When would an influence along a particular form of causal path (direct, indirect or spurious) not be considered unfair? If we only cared about direct discrimination, then fairness via unawareness would be sufficient - all we have to do is exclude the sensitive attribute from the training data. However, if we define unfairness this narrowly, then it is almost always possible to make a classifier fair with minimal loss of accuracy because we need only find some set of variables from which we can predict the sensitive attribute. If we care about direct and indirect (causal) paths but not spurious ones, then the decomposition is very closely related to interventional notions of fairness previously explored in the literature. But it’s far from clear that spurious paths are irrelevant from a fairness perspective. Think about the kind of variables that might cause a sensitive attribute such as race (eg parents’ race). It is hard to think of an instance in which discrepancies in outcomes caused by such variables are not similarly controversial to discrepancies caused by the sensitive attribute. If we decide that causes of sensitive attributes should be regarded as sensitive themselves, it is not clear that spurious paths are possible.
