Ignorance Isn't Bliss

How human intuitions about AI can lead to unfair outcomes

Dec 6, 2018
Reading time: 13 minutes

Note: we recommend reading the preceding post before this one.

Societies are increasingly, and legitimately, concerned that automated decisions based on historical data can lead to unfair outcomes for disadvantaged groups. One of the most common pathways to unintended discrimination by AI systems is that they perpetuate historical and societal biases when trained on historical data. This is because an AI has no wider knowledge to distinguish between bias and legitimate selection.

In this post we investigate whether we can improve the fairness of a machine learning model by removing sensitive attribute fields from the data. By sensitive attributes we mean attributes on which the organisation responsible for the system does not intend to discriminate because of societal norms, law or policy, for example gender, race or religion.

How would you feel if you knew that someone was using your gender or race when determining your suitability for a job? What if that someone was an AI? Would you prefer a system that was ignorant of these attributes? Intuitively, we would expect ignoring such attributes to make the AI fair and therefore lead to more equitable outcomes. Let’s look at the behaviour of a machine learning system on an illustrative data set that we use to explore this scenario.

Automating the hiring process

You are a data scientist and you’ve been tasked by a large organisation to implement a scalable automated application screening process to select potential employees from a large pool of applicants.

In this scenario, you’ve been told to write software to select candidates based on their suitability for the role, but suitability isn’t something that we can easily observe in data. Instead, you could realistically use a large set of historical examples of whether an applicant was hired by the company after an interview as a proxy for suitability. This measure is closely related to suitability, but may be unintentionally biased through historical or societal factors.

For simplicity, let’s condense the features (or data fields) collected about each applicant into an experience feature and a gender-flag feature. Experience is an approximate indication of suitability for a role but, for societal reasons, also depends on the sensitive attribute, gender. In reality, there are many features in an application that conflate suitability and sensitive attributes to varying extents; we’ll use just one in this example.

We’ve generated a synthetic data set using the above assumptions. The table below lists some typical applicants and their features. You can access the code used to create the simulated data here.
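If you want to experiment without the original code, here is a minimal sketch of how a data set with this structure could be simulated. It is not the generator behind the figures in this post: the function name `make_applicants`, the distributions and every numeric parameter are assumptions, chosen only so that experience depends on both a hidden suitability score and gender. Unlike real historical data, this toy version also assumes the hiring decision itself was unbiased.

```python
import random

def make_applicants(n=1000, seed=0):
    """Simulate applicants; every distribution here is an illustrative assumption."""
    rng = random.Random(seed)
    applicants = []
    for i in range(n):
        # Assumption: roughly two thirds of applicants are male.
        gender = "Male" if rng.random() < 0.66 else "Female"
        # Hidden suitability score, drawn independently of gender.
        suitability = rng.gauss(0.0, 1.0)
        # Experience tracks suitability, but women accrue fewer years on average.
        gap = 0.0 if gender == "Male" else 3.0
        years = 10 + 3 * suitability - gap + rng.gauss(0.0, 2.0)
        applicants.append({
            "id": i,
            "experience": max(0, round(years)),
            "gender": gender,
            "suitability": suitability,  # not observable in a real data set
        })
    # Assumption: the most suitable 20% of applicants were historically hired.
    cutoff = sorted(a["suitability"] for a in applicants)[int(0.8 * n)]
    for a in applicants:
        a["hired"] = a["suitability"] >= cutoff
    return applicants

applicants = make_applicants()
print(applicants[0])
```

A model trained on such data would only ever see `experience`, `gender` and `hired`; the `suitability` column is kept here purely so the simulation can be inspected.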

ID     Experience   Gender   Hired
898    14           Male     Yes
2343   6            Male     No
2398   16           Female   Yes
5906   16           Female   Yes
9394   15           Male     No

Digging into the data, you would notice a moderate correlation (coefficient of 0.49) between an applicant’s experience and whether they were hired, which is our proxy for suitability.

Histogram of experience for hired and not-hired applicants.

Historically, roughly 20% of the cohort of applicants were hired. Within the historical data, there’s a very weak association between someone’s gender and whether or not that person was hired (correlation coefficient of about 0.06). Given that gender is a protected attribute, this link is something you should investigate once you’ve built the classifier.
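Assuming the coefficients quoted above are Pearson correlations, they can be computed by coding the hiring outcome and gender as 0/1 and applying the usual formula. The sketch below does this for the five sample rows in the table above; the helper name `pearson` is ours, and five rows are far too few to reproduce the 0.49 and 0.06 figures, which come from the full data set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# The five sample rows from the table above: (experience, gender, hired).
rows = [(14, "Male", "Yes"), (6, "Male", "No"), (16, "Female", "Yes"),
        (16, "Female", "Yes"), (15, "Male", "No")]
experience = [r[0] for r in rows]
is_male = [1 if r[1] == "Male" else 0 for r in rows]
hired = [1 if r[2] == "Yes" else 0 for r in rows]

print("experience vs hired:", round(pearson(experience, hired), 2))
print("gender vs hired:", round(pearson(is_male, hired), 2))
```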

A typical approach

The model’s role is to ingest features extracted from the applications, and predict whether each candidate will be suitable for the role or not. To get the best accuracy, you would use all of the features available to you to train the model. Training a logistic regressor on four fifths of the applicants (the training set) gives you a predictive accuracy of about 77% on the remaining one fifth of historical applicants (the test set). Crucially, the test set allows us to assess the predictive accuracy of our model because this data is historical and we know whether or not each individual was in fact hired after an interview.
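The train-then-evaluate procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the code behind the numbers in this post: the helper names, the plain gradient-descent fit, the learning rate and the toy data are all assumptions, and in practice you would more likely reach for a library implementation of logistic regression.

```python
import math
import random

def sigmoid(z):
    # Branch on the sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, X):
    """Select an applicant when the predicted probability is at least 0.5."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: [centred and scaled experience, is_male] -> hired (1/0).
rng = random.Random(0)
data = []
for _ in range(200):
    male = int(rng.random() < 0.66)
    years = rng.uniform(0, 20)
    hired = int(years + rng.gauss(0, 3) > 14)
    data.append(([(years - 10) / 10, male], hired))

rng.shuffle(data)
split = int(0.8 * len(data))  # four fifths for training, one fifth held out
train, test = data[:split], data[split:]
w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
test_acc = accuracy([y for _, y in test], predict(w, b, [x for x, _ in test]))
print(f"held-out accuracy: {test_acc:.2f}")
```

Holding out the final fifth matters because accuracy measured on the training rows would overstate how well the model generalises to new applicants.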

The image below shows the ratio of correctly predicted candidates to incorrectly predicted candidates when the model is applied to the historical test data. The candidates in this representative sample have been coloured green and red to indicate whether they were hired or not, respectively, according to the historical data. Those that the model selected to be hired are bold, while those it chose to reject are lighter:

Icons representing the various categories of applicant.
Accuracy of the standard model.

If you are worried about how fair your model is across genders, you might wisely decide to break down the accuracy score over the sensitive attribute. A couple of lines of code later, you’d see that the model has an accuracy of 77% on men versus 78% on women.
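Those couple of lines might look something like the sketch below. The function name and the six example applicants are hypothetical; the point is only that accuracy is computed separately for each value of the sensitive attribute.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: fraction of predictions that match the historical label}."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical labels and predictions for six test applicants.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0]
gender = ["Male", "Male", "Male", "Female", "Female", "Female"]

per_group = accuracy_by_group(y_true, y_pred, gender)
print(per_group)
```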

Accuracy of the standard model on the male cohort.
Accuracy of the standard model on the female cohort.

The visualisation above shows a difference in the error rate between males and females. It’s not uncommon to observe substantial variations in accuracy metrics across different sub-cohorts. Inherent dependencies between race or gender and features of interest are frequent occurrences. Perhaps, naively, you might be happy that our model is more accurate on the minority group.

However, under closer scrutiny, it becomes apparent that women are actually disadvantaged in the selection process. Despite representing about 66% of the total applicants, males make up 73% of the cohort selected to be hired by the algorithm.
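A comparison like this boils down to measuring one group’s share of the applicant pool and its share of the selected cohort. The following is a minimal sketch with made-up data; `share_of_group` is a hypothetical helper, not part of the original analysis.

```python
def share_of_group(groups, mask, group):
    """Fraction of the applicants picked out by `mask` that belong to `group`."""
    picked = [g for g, m in zip(groups, mask) if m]
    return sum(g == group for g in picked) / len(picked)

# Hypothetical pool of six applicants and the model's selections (1 = selected).
gender = ["Male", "Male", "Male", "Male", "Female", "Female"]
selected = [1, 1, 1, 0, 1, 0]
everyone = [1] * len(gender)

male_share_applicants = share_of_group(gender, everyone, "Male")
male_share_selected = share_of_group(gender, selected, "Male")
print(f"male share of applicants: {male_share_applicants:.2f}")
print(f"male share of selected:   {male_share_selected:.2f}")
```

A selected-cohort share that is noticeably higher than the applicant share is a signal worth investigating, although on its own it does not say whether the gap comes from bias or from a legitimate difference.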

Gender breakdown across all applicants.
Gender breakdown across selected applicants.

It’s possible that this discrepancy is due to a historical bias against women in the training data and is not consistent with the company’s notion of fair outcomes. When there is bias in the training data, a model’s predictions will perpetuate it because the model has no wider knowledge to discern actual suitability from discrimination.

An intuitive approach that you might adopt in an attempt to remedy this issue (and an approach that we’ve commonly seen used in practice) is to remove the protected attributes from the data so the model is ignorant or “unaware” of race or gender. Let’s see what impact this has on the model.

Fairness through unawareness

Retraining the model using only the experience feature, we notice a drop in the accuracy on the test set (from 77% to 74%). Perhaps this shouldn’t be surprising as we’re providing the model with less information.

Accuracy of the unaware model.

However, when we compare the gender bias in the selected cohort before and after removing the protected attribute, the results are quite counter-intuitive.

Selected applicants for the unaware model.

The percentage of females in the selected group has shrunk from 27% to just 17%. Things are equally bad when we look at equality of opportunity, measured here by recall: the fraction of applicants who were historically hired that the algorithm also selected.

Fraction of females in the test set that were hired and were also selected by the unaware algorithm.
Fraction of males in the test set that were hired and were also selected by the unaware algorithm.
                          Standard Model   Unaware Model
Selected cohort           74% male         83% male
Recall on female cohort   58%              49%
Recall on male cohort     80%              88%
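The per-group recall figures in the table can be computed along the following lines. This is a sketch with hypothetical names and made-up data, and it assumes every group contains at least one historically hired applicant (otherwise the division would fail).

```python
def recall_by_group(hired, selected, groups):
    """Per group: of those historically hired, the fraction the model selected."""
    result = {}
    for g in sorted(set(groups)):
        picks = [s for h, s, gg in zip(hired, selected, groups) if gg == g and h]
        result[g] = sum(picks) / len(picks)
    return result

# Hypothetical test set of eight applicants (1 = hired / selected).
hired    = [1, 1, 1, 0, 1, 1, 0, 0]
selected = [1, 1, 0, 0, 1, 0, 1, 0]
gender   = ["Male"] * 4 + ["Female"] * 4

recalls = recall_by_group(hired, selected, gender)
print(recalls)
```

Equality of opportunity asks for these recall values to be similar across groups, so a large gap between them is the quantity to watch.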

So despite the intuition that removing the protected feature from the set would help remove discriminatory historical bias, we’ve actually made things considerably worse for women.

What went wrong?

Our intuition has led us astray - the outcomes were much worse for women when we deleted gender from the data.

There are two factors at play here.

First, removing just the protected attributes from the model rarely makes the model “unaware” of gender or race. Often the remaining features, or combinations of them, still encode the protected attributes. For example, strong dependencies exist between address and ethnicity or between extra-curricular activities and socio-economic background. In a job application context, features such as the choice of degree majors, historical employers, part-time and flexible work choices, and extra-curricular activities can together form a strong indication of the gender of a candidate without explicitly specifying it.

In this simple example, the fact that the data indicates that women are likely to have less experience than men has enabled a basic classifier to predict gender with 66% weighted accuracy despite only having access to one feature (experience). In real populations, such a relationship might be due to factors such as women taking extended leave periods while raising a family or valuing flexible working conditions more highly than their male counterparts. This factor acts to diminish the impact of deleting the gender column on the outcomes of the model.
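One way to check how much a remaining feature leaks a protected attribute is to try predicting the attribute from that feature. The sketch below fits a single-threshold rule on experience and scores it with class-balanced accuracy. Both choices are assumptions on our part: the post does not say which classifier produced the 66% figure or exactly how “weighted accuracy” was defined, and the eight data points are made up. The score is also measured on the same rows used to pick the threshold, so it is optimistic.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the majority class cannot dominate the score."""
    recalls = []
    for c in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(p == c for p in preds) / len(preds))
    return sum(recalls) / len(recalls)

def best_threshold(experience, is_male):
    """Choose the cutoff t for the rule 'predict male if experience >= t'."""
    best = None
    for t in sorted(set(experience)):
        pred = [int(x >= t) for x in experience]
        score = balanced_accuracy(is_male, pred)
        if best is None or score > best[1]:
            best = (t, score)
    return best

# Hypothetical sample in which women tend to have fewer years of experience.
experience = [4, 6, 7, 9, 10, 12, 13, 15]
is_male    = [0, 0, 1, 0, 1, 1, 0, 1]

threshold, score = best_threshold(experience, is_male)
print(f"predict male if experience >= {threshold}; balanced accuracy {score:.2f}")
```

A balanced accuracy well above 0.5 means the supposedly unaware model can still partly infer gender from the features it keeps.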

Histogram of male and female experience.

Secondly, in our demonstration, the experience feature is conflated with gender. The standard model was actually using the gender feature to compensate for the fact that many suitable women in the data set had less experience than their male counterparts. In fact, if we inspect the predictive weights of the original model,

Feature         Weight
Experience      0.56
Gender = Male   -1.03

we can see that it was actually explicitly reducing the odds of a male being eligible (by giving that feature a negative weight) to compensate for their higher expected experience. Without access to the gender feature, the unaware model groups suitable women with less suitable men because it can’t distinguish between the groups as effectively as the standard model. This results in the model selecting a large number of unsuitable males while ignoring suitable females. The model needs to know gender to interpret experience correctly; otherwise women seem relatively less experienced, which reduces their chances of employment.
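To make the size of that compensation concrete, here is a short worked calculation from the two weights in the table above. It assumes the weights act directly on the log-odds of a standard logistic model; the post does not say whether experience was rescaled before training, so the result is expressed in whatever units the experience feature used.

```python
import math

w_experience = 0.56  # weight per unit of experience, from the table above
w_male = -1.03       # weight applied when Gender = Male, from the table above

# A logistic model adds weights on the log-odds scale, so being male
# multiplies the odds of selection by exp(w_male), all else being equal.
odds_multiplier_male = math.exp(w_male)

# Extra experience a man needs to cancel out the negative male weight.
offset_units = -w_male / w_experience

print(f"odds multiplier for male applicants: {odds_multiplier_male:.2f}")
print(f"experience needed to offset it: {offset_units:.2f} units")
```

Under those assumptions the male flag scales the odds by roughly 0.36, which is equivalent to about 1.8 units of experience. Dropping the gender column removes exactly this correction.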

In general, the effect of removing the protected attribute from the data is variable. In some cases it may improve the equity of outcomes, but there is also a good chance it will be detrimental, as was the case on this data.


  • Training a model on real data will invariably involve using proxies for the desired attribute, such as using “was hired” as a proxy for job suitability. These proxies introduce an unknown degree of historical and societal bias into the model, so in general we have to accept that no dataset will be unbiased, and consequently no naive model will be unbiased. As a community, we need to inform both high-level decision makers and the public of the inherent risks of bias and the existence of strategies to mitigate the damage.

  • When we try to make a model ethically aware, our intuitions often lead us astray. A very reasonable, intuitive thought such as “we should remove gender from the data” could have unintended, even detrimental consequences for a disadvantaged group, as our worked example has shown. Instead of drawing on naive intuitions, data scientists need to draw on an understanding of how bias can seep into AI and how that bias can be reduced, from the manner in which the problem is framed through to the data collection and policy implementation.

  • While we should take actions to make our AI ethically aware, it is also important to validate that our efforts are actually helping the disadvantaged group. Developers would benefit from tools to assess the impact of their actions on the fairness of outcomes.


The code used to perform the above analysis is available online here.