solas_disparity.categorical_adverse_impact_ratio

solas_disparity.categorical_adverse_impact_ratio(group_data: pandas.core.frame.DataFrame, protected_groups: List[str], reference_groups: List[str], group_categories: List[str], outcome: pandas.core.series.Series, air_threshold: float, percent_difference_threshold: float, category_order: List[str], label: Optional[pandas.core.series.Series] = None, sample_weight: Optional[pandas.core.series.Series] = None, max_for_fishers: int = 100) → solas_disparity.types._disparity.Disparity

Calculate the Adverse Impact Ratio for a set of favorability-ordinal categorical outcomes.

AIR is defined as the percentage of favorable outcomes of the protected group divided by the percentage of favorable outcomes of the reference group.

\[\text{AIR}_\text{Protected Group} = \frac{\text{% Favorable Outcome}_\text{Protected Group}}{\text{% Favorable Outcome}_\text{Reference Group}}\]
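An AIR is considered practically significant if all three of the following hold:
  1. the AIR is less than a chosen air_threshold,

  2. the AIR is statistically significantly different from parity,

  3. and the percent difference in favorable outcomes between the groups is greater than a chosen percent_difference_threshold.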
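The arithmetic behind the ratio and the two threshold checks can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `favorable_rate` and `air_summary` are hypothetical, the percent difference is assumed to be the reference rate minus the protected rate, and the statistical significance test is left out.

```python
def favorable_rate(outcomes, favorable):
    """Fraction of outcomes that fall in the set of favorable categories."""
    return sum(o in favorable for o in outcomes) / len(outcomes)


def air_summary(protected, reference, favorable,
                air_threshold, percent_difference_threshold):
    """Compute the AIR and the two threshold checks (hypothetical helper)."""
    protected_rate = favorable_rate(protected, favorable)
    reference_rate = favorable_rate(reference, favorable)
    air = protected_rate / reference_rate
    # Assumption: percent difference = reference rate - protected rate.
    percent_difference = reference_rate - protected_rate
    return {
        "air": air,
        "percent_difference": percent_difference,
        "below_air_threshold": air < air_threshold,
        "above_percent_difference_threshold":
            percent_difference > percent_difference_threshold,
    }


# Toy data: 30% favorable for the protected group, 50% for the reference group.
protected = ["good"] * 30 + ["bad"] * 70
reference = ["good"] * 50 + ["bad"] * 50

summary = air_summary(protected, reference, {"good"},
                      air_threshold=0.8,
                      percent_difference_threshold=0.1)
print(summary)
```

With these toy counts the ratio is 0.3 / 0.5, so both threshold checks pass; a full assessment would still need the significance test from the second criterion.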

Parameters
  • group_data (DataFrame) – Dataframe containing columns for group data.

  • protected_groups (List[str]) – List of protected groups.

  • reference_groups (List[str]) – List of reference groups with the same length as protected_groups.

  • group_categories (List[str]) – List of group categories to which each protected and reference group pair belongs (e.g. race, gender, age, etc.). Has the same length as protected_groups.

  • outcome (Series) – Outcome series. Each element is one of the categories listed in category_order.

  • air_threshold (float) – Adverse Impact Ratio threshold value.

  • percent_difference_threshold (float) – Percent difference threshold value. For example, a 20% difference is input as percent_difference_threshold=0.2.

  • category_order (List[str]) – List of outcome categories in ascending order of favorability (e.g. ["bad", "good", "great", "best"]).

  • label (Optional[Series], optional) – Label, true outcome, and/or target series evaluated alongside outcome. Defaults to None.

  • sample_weight (Optional[Series], optional) – Sample weight series. Has the same length as group_data. Defaults to None.

  • max_for_fishers (int, optional) – Maximum number of samples for which Fisher’s exact test is used. Defaults to MAX_FOR_FISHERS (100 in the signature above).
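The length and membership constraints stated in the parameter descriptions can be checked before calling the function. The sketch below is a hypothetical pre-flight helper (`find_input_problems` is not part of solas_disparity), written against plain Python sequences; whether the library performs similar validation itself is not stated here.

```python
def find_input_problems(protected_groups, reference_groups, group_categories,
                        outcome, category_order):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []

    # The three group lists are described as having the same length.
    if not (len(protected_groups) == len(reference_groups)
            == len(group_categories)):
        problems.append(
            "protected_groups, reference_groups and group_categories "
            "must have the same length"
        )

    # Every outcome value should be one of the ordered categories.
    unknown = sorted(set(outcome) - set(category_order))
    if unknown:
        problems.append(
            f"outcome contains values not in category_order: {unknown}"
        )

    # Assumption: an ordering with repeated categories is ambiguous.
    if len(set(category_order)) != len(category_order):
        problems.append("category_order contains duplicate categories")

    return problems


ok = find_input_problems(
    protected_groups=["black", "female"],
    reference_groups=["white", "male"],
    group_categories=["race", "gender"],
    outcome=["bad", "good", "best", "good"],
    category_order=["bad", "good", "great", "best"],
)
bad = find_input_problems(
    protected_groups=["black", "female"],
    reference_groups=["white"],
    group_categories=["race", "gender"],
    outcome=["bad", "excellent"],
    category_order=["bad", "good", "great", "best"],
)
print(ok)
print(bad)
```

The group and category names are illustrative only. When outcome is a pandas Series it can be passed directly, since iterating a Series yields its values.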

Returns

Object containing results of the disparity calculation.

Return type

Disparity