adverse_impact_ratio
- solas_disparity.disparity.adverse_impact_ratio(...)
Calculate the Adverse Impact Ratio (AIR) for a given set of protected and reference groups.
AIR is defined as the percentage of favorable outcomes of the protected group divided by the percentage of favorable outcomes of the reference group. For example, if 10% of Black or African American applicants were to receive a loan offer, but 20% of Non-Hispanic White applicants were to receive a loan offer, the AIR is equal to 10% / 20% = 0.50.
\[\text{AIR}_\text{Protected Group} = \frac{\text{% Favorable Outcome}_\text{Protected Group}}{\text{% Favorable Outcome}_\text{Reference Group}}\]
Or, in terms of the confusion matrix:
\[\text{AIR}_\text{Protected Group} = \frac{\frac{P_{\text{Protected Group}}}{(P_{\text{Protected Group}} + N_{\text{Protected Group}})}}{\frac{P_{\text{Reference Group}}}{(P_{\text{Reference Group}} + N_{\text{Reference Group}})}}\]
where \(P\) represents the positive outcomes (i.e., \(TP + FP\)) and \(N\) represents the negative outcomes (i.e., \(TN + FN\)). Importantly, for this implementation of the AIR, we consider \(P\) to be favorable from the perspective of the person being scored by the model.
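As a rough illustration of the confusion-matrix form, the sketch below computes the AIR from raw positive and negative counts per group. The helper names `favorable_rate` and `air_from_counts` are hypothetical and are not part of solas_disparity; the counts are chosen to reproduce the 10% / 20% loan example.

```python
def favorable_rate(positives: int, negatives: int) -> float:
    """Share of favorable outcomes in one group: P / (P + N)."""
    total = positives + negatives
    if total == 0:
        raise ValueError("group has no observations")
    return positives / total


def air_from_counts(p_protected: int, n_protected: int,
                    p_reference: int, n_reference: int) -> float:
    """Hypothetical helper: ratio of protected to reference favorable rates."""
    reference_rate = favorable_rate(p_reference, n_reference)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(p_protected, n_protected) / reference_rate


# 10 of 100 protected applicants and 20 of 100 reference applicants get an offer.
air = air_from_counts(p_protected=10, n_protected=90,
                      p_reference=20, n_reference=80)
print(air)  # 0.5
```

This sketch ignores sample weights and statistical testing, both of which the library function accepts parameters for.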
In some academic literature, the AIR is called the “disparate impact” metric. While it is true that the AIR may be used as a measure of disparate impact, other metrics are used to measure disparate impact in the legal sense of the word. Additionally, the AIR can also be used to measure other forms of discrimination, such as disparate treatment.
- An AIR is considered practically significant if the AIR is less than a chosen air_threshold, statistically significantly different from parity, AND its percent difference is greater than a chosen percent_difference_threshold.
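The three conditions combine with a logical AND, which can be sketched as follows. This is only an illustration, not the library's code: `is_practically_significant` and `p_value_threshold` are hypothetical names, the 0.05 significance level is an assumption, and the p-value and percent difference are taken as already-computed inputs because this page does not specify how they are derived.

```python
def is_practically_significant(
    air: float,
    p_value: float,
    percent_difference: float,
    air_threshold: float,
    percent_difference_threshold: float,
    p_value_threshold: float = 0.05,  # assumed significance level, not from the docs
) -> bool:
    """Hypothetical check: all three criteria must hold at once."""
    below_air_threshold = air < air_threshold
    differs_from_parity = p_value < p_value_threshold
    large_difference = percent_difference > percent_difference_threshold
    return below_air_threshold and differs_from_parity and large_difference


# Example values only: thresholds of 0.8 and 0.05 are illustrative choices.
flagged = is_practically_significant(
    air=0.5, p_value=0.01, percent_difference=0.10,
    air_threshold=0.8, percent_difference_threshold=0.05,
)
not_flagged = is_practically_significant(
    air=0.9, p_value=0.01, percent_difference=0.10,
    air_threshold=0.8, percent_difference_threshold=0.05,
)
print(flagged, not_flagged)  # True False
```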
- Parameters
group_data (pd.DataFrame) – Dataframe containing columns for group data.
protected_groups (List[str]) – List of protected groups.
reference_groups (List[str]) – List of reference groups with the same length as protected_groups.
group_categories (List[str]) – List of group categories to which each protected and reference group pair belongs (e.g. race, gender, age). Has the same length as protected_groups.
outcome (pd.Series) – Boolean outcome series where a value of 1 is assumed to be favorable.
air_threshold (float) – Adverse Impact Ratio threshold value.
percent_difference_threshold (float) – Percent difference threshold value. For example, a 20% difference is input as percent_difference_threshold=0.2.
label (Optional[pd.Series], optional) – Boolean label, true outcome, and/or target series evaluated alongside outcome. Defaults to None.
sample_weight (Optional[pd.Series], optional) – Sample weight series. Has the same length as group_data. Defaults to None.
max_for_fishers (int, optional) – Maximum number of samples for which Fisher’s exact test is used. Defaults to MAX_FOR_FISHERS.
shortfall_method (Optional[types.ShortfallMethod], optional) – Method used for shortfall calculation. Defaults to ShortfallMethod.TO_REFERENCE_MEAN.
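The sketch below shows one way the inputs described above might be assembled, with protected_groups, reference_groups, and group_categories aligned position by position. The toy data, the column names, and the choice of 0/1 membership columns are assumptions made for illustration; the call itself is left commented out because it requires solas_disparity to be installed.

```python
import pandas as pd

# Hypothetical toy data: one row per applicant, one 0/1 membership column per group.
group_data = pd.DataFrame(
    {
        "Black": [1, 1, 0, 0, 0, 0],
        "White": [0, 0, 1, 1, 0, 0],
        "Female": [1, 0, 1, 0, 1, 0],
        "Male": [0, 1, 0, 1, 0, 1],
    }
)
outcome = pd.Series([0, 1, 1, 1, 0, 1])  # 1 is the favorable outcome

# Entry i of each list describes the same protected/reference pair.
protected_groups = ["Black", "Female"]
reference_groups = ["White", "Male"]
group_categories = ["Race", "Gender"]

# The three lists must have equal length; outcome must match group_data row for row.
assert len(protected_groups) == len(reference_groups) == len(group_categories)
assert len(outcome) == len(group_data)

kwargs = dict(
    group_data=group_data,
    protected_groups=protected_groups,
    reference_groups=reference_groups,
    group_categories=group_categories,
    outcome=outcome,
    air_threshold=0.8,                  # example value
    percent_difference_threshold=0.05,  # example value
)

# With the library installed, the call would look like:
# from solas_disparity.disparity import adverse_impact_ratio
# result = adverse_impact_ratio(**kwargs)
```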
- Returns
Object containing results of the disparity calculation.
- Return type