solas_disparity.residual_standardized_mean_difference
- solas_disparity.residual_standardized_mean_difference(group_data: pandas.core.frame.DataFrame, protected_groups: List[str], reference_groups: List[str], group_categories: List[str], prediction: pandas.core.series.Series, label: pandas.core.series.Series, residual_smd_threshold: float, lower_score_favorable: bool = True, sample_weight: Optional[pandas.core.series.Series] = None, residual_smd_denominator: Union[solas_disparity.types._residual_smd_denominator.ResidualSMDDenominator, str] = ResidualSMDDenominator.POPULATION)
Calculate the Standardized Mean Difference of residuals for a given set of protected and reference groups.
A residual is the label value minus the predicted value.
\[\text{Residual} = y - \hat{y}\]
The Residual SMD is the mean residual of a protected group minus the mean residual of a reference group, all divided by the standard deviation of residuals, \(s\).
\[\text{Residual SMD}_\text{Protected Group} = \frac{\text{Mean Residual}_{\text{Protected Group}} - \text{Mean Residual}_{\text{Reference Group}}}{s}\]
- A Residual SMD is considered practically significant if lower_score_favorable=True and the Residual SMD is greater in magnitude than a chosen residual_smd_threshold AND statistically significantly different from zero.
- Alternatively, a Residual SMD is considered practically significant if lower_score_favorable=False and the Residual SMD is lesser in magnitude than a chosen residual_smd_threshold AND statistically significantly different from zero.
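The formula above can be traced by hand on a small example. The following is a minimal sketch in plain Python, not the library's implementation: the toy data, the group names, and the threshold value are hypothetical, and it assumes that s is the population standard deviation of residuals over all observations (the denominator options offered by residual_smd_denominator may compute it differently). Only the magnitude check for the lower_score_favorable=True case is shown; the statistical significance test is omitted.

```python
from statistics import fmean, pstdev

# Hypothetical toy data: true labels, predictions, and group membership.
labels      = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
predictions = [0.6, 0.2, 0.7, 0.9, 0.4, 0.5, 0.3, 0.1]
groups      = ["protected"] * 4 + ["reference"] * 4

# Residual = y - y_hat
residuals = [y - y_hat for y, y_hat in zip(labels, predictions)]


def mean_residual(group):
    """Mean residual over the observations belonging to `group`."""
    return fmean([r for r, g in zip(residuals, groups) if g == group])


# Assumption: s is the population standard deviation of all residuals.
s = pstdev(residuals)

# Residual SMD = (mean residual, protected - mean residual, reference) / s
residual_smd = (mean_residual("protected") - mean_residual("reference")) / s

# Hypothetical threshold; magnitude check only (no significance test here).
residual_smd_threshold = 0.3
exceeds_threshold = abs(residual_smd) > residual_smd_threshold

print(f"Residual SMD: {residual_smd:.3f}, exceeds threshold: {exceeds_threshold}")
```

With weighted data, the means and the standard deviation would need to be weighted accordingly, which this sketch does not attempt.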
- Parameters
group_data (DataFrame) – Dataframe containing columns for group data.
protected_groups (List[str]) – List of protected groups.
reference_groups (List[str]) – List of reference groups.
group_categories (List[str]) – List of group categories.
prediction (Series) – Predictions series.
label (Series) – Label, true outcome, and/or target series evaluated alongside prediction.
residual_smd_threshold (float) – Residual Standardized Mean Difference threshold value.
lower_score_favorable (bool, optional) – Whether a lower value of label - prediction is favorable. Defaults to True.
sample_weight (Optional[Series]) – Sample weight series. Has the same length as group_data. Defaults to None.
residual_smd_denominator (Union[ResidualSMDDenominator, str], optional) – Residual Standardized Mean Difference denominator calculation. Defaults to ResidualSMDDenominator.POPULATION.
- Returns
Object containing results of the disparity calculation.
- Return type
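A call using the signature documented above might look like the following sketch. The data values, the column names, the "Sex" category, and the threshold are hypothetical, and the layout of group_data (one 0/1 membership column per group) is an assumption rather than something this page states. The call is wrapped in a function, with its imports inside, so the definition can be pasted into a module without requiring pandas or solas_disparity until it is actually invoked.

```python
def example_call():
    """Hypothetical usage sketch for residual_standardized_mean_difference."""
    import pandas as pd
    import solas_disparity as sd

    # Hypothetical data. Assumption: group_data holds one 0/1 membership
    # column per group named in protected_groups and reference_groups.
    df = pd.DataFrame(
        {
            "Female": [1, 0, 1, 0, 1, 0],
            "Male": [0, 1, 0, 1, 0, 1],
            "prediction": [0.6, 0.2, 0.7, 0.9, 0.4, 0.5],
            "label": [1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
        }
    )

    # Keyword names follow the signature shown on this page.
    return sd.residual_standardized_mean_difference(
        group_data=df[["Female", "Male"]],
        protected_groups=["Female"],
        reference_groups=["Male"],
        group_categories=["Sex"],  # assumed: one category per protected group
        prediction=df["prediction"],
        label=df["label"],
        residual_smd_threshold=0.3,  # hypothetical threshold
        lower_score_favorable=True,
    )
```

Calling example_call() returns the results object described under Returns; its attributes are not documented on this page, so none are accessed here.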