solas_disparity.residual_standardized_mean_difference

solas_disparity.residual_standardized_mean_difference(group_data: pandas.core.frame.DataFrame, protected_groups: List[str], reference_groups: List[str], group_categories: List[str], prediction: pandas.core.series.Series, label: pandas.core.series.Series, residual_smd_threshold: float, lower_score_favorable: bool = True, sample_weight: Optional[pandas.core.series.Series] = None, residual_smd_denominator: Union[solas_disparity.types._residual_smd_denominator.ResidualSMDDenominator, str] = ResidualSMDDenominator.POPULATION)

Calculate the Standardized Mean Difference of residuals for a given set of protected and reference groups.

A residual is the label value minus the predicted value.

\[\text{Residual} = y - \hat{y}\]

The Residual SMD is the mean residual of a protected group minus the mean residual of its reference group, divided by the standard deviation of residuals, s. How s is calculated is set by residual_smd_denominator.

\[\text{Residual SMD}_\text{Protected Group} = \frac{\text{Mean Residual}_{\text{Protected Group}} - \text{Mean Residual}_{\text{Reference Group}}}{s}\]

A Residual SMD is considered practically significant if lower_score_favorable=True and the Residual SMD is:
  1. greater in magnitude than a chosen residual_smd_threshold

  2. AND statistically significantly different than zero.

Alternatively, a Residual SMD is considered practically significant if lower_score_favorable=False and the Residual SMD is:
  1. lesser in magnitude than a chosen residual_smd_threshold

  2. AND statistically significantly different than zero.
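
The Residual SMD formula above can be sketched by hand with pandas. This is a minimal illustration, not the library's implementation: the column names and toy values are made up, and s is assumed here to be the population standard deviation (ddof=0) of residuals over all rows, which may differ from what residual_smd_denominator actually selects. The statistical significance test is not covered.

```python
import pandas as pd

# Toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "label":      [1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
    "prediction": [0.8, 0.3, 0.6, 0.9, 0.1, 0.5],
    "group":      ["protected", "protected", "protected",
                   "reference", "reference", "reference"],
})

# Residual = y - y_hat
df["residual"] = df["label"] - df["prediction"]

# Mean residual within each group.
mean_protected = df.loc[df["group"] == "protected", "residual"].mean()
mean_reference = df.loc[df["group"] == "reference", "residual"].mean()

# ASSUMPTION: s is the population standard deviation (ddof=0) over all rows.
s = df["residual"].std(ddof=0)

residual_smd = (mean_protected - mean_reference) / s
print(residual_smd)
```

A negative value here means the protected group's mean residual is below the reference group's; the sign convention follows directly from the order of subtraction in the formula.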

Parameters
  • group_data (DataFrame) – Dataframe containing columns for group data.

  • protected_groups (List[str]) – List of protected groups.

  • reference_groups (List[str]) – List of reference groups.

  • group_categories (List[str]) – List of group categories.

  • prediction (Series) – Predictions series.

  • label (Series) – Label, true outcome, and/or target series evaluated alongside prediction.

  • residual_smd_threshold (float) – Residual Standardized Mean Difference threshold value.

  • lower_score_favorable (bool, optional) – Whether a lower value of label - prediction is favorable. Defaults to True.

  • sample_weight (Optional[Series], optional) – Sample weight series. Has the same length as group_data. Defaults to None.

  • residual_smd_denominator (Union[ResidualSMDDenominator, str], optional) – Residual Standardized Mean Difference denominator calculation. Defaults to ResidualSMDDenominator.POPULATION.

Returns

Object containing results of the disparity calculation.

Return type

Disparity
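
Example

The sketch below assembles arguments that match the signature above. Several details are assumptions rather than documented behavior: that group_data holds one membership-indicator column per group, that protected_groups, reference_groups and group_categories are aligned position by position, and the threshold value itself. The call is left commented out so the block runs without the library installed.

```python
import pandas as pd

# ASSUMPTION: one membership-indicator column per group (hypothetical layout).
group_data = pd.DataFrame({
    "Female": [1, 0, 1, 0],
    "Male":   [0, 1, 0, 1],
})
prediction = pd.Series([0.8, 0.3, 0.6, 0.9], name="prediction")
label = pd.Series([1.0, 0.0, 1.0, 1.0], name="label")

kwargs = dict(
    group_data=group_data,
    protected_groups=["Female"],
    reference_groups=["Male"],     # ASSUMPTION: paired with protected_groups by position
    group_categories=["Gender"],   # ASSUMPTION: category of each protected/reference pair
    prediction=prediction,
    label=label,
    residual_smd_threshold=0.3,    # illustrative value only
    lower_score_favorable=True,
    sample_weight=None,
)

# Sanity checks on the assumed input shapes before calling the library.
assert len(kwargs["protected_groups"]) == len(kwargs["reference_groups"])
assert len(kwargs["protected_groups"]) == len(kwargs["group_categories"])
assert len(prediction) == len(label) == len(group_data)

# With solas_disparity installed, the call would look like:
# import solas_disparity
# result = solas_disparity.residual_standardized_mean_difference(**kwargs)
```

Passing the arguments by keyword keeps the call readable given the number of required parameters; residual_smd_denominator is omitted so the documented default, ResidualSMDDenominator.POPULATION, applies.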