standardized_mean_difference
- solas_disparity.standardized_mean_difference(...)
Calculate the Standardized Mean Difference (SMD) for a given set of protected and reference groups.
The SMD is the mean outcome of a protected group minus the mean outcome of a reference group, all divided by the standard deviation of outcomes.
\[\text{SMD}_{\text{Protected Group}} = \frac{\text{Mean Outcome}_{\text{Protected Group}} - \text{Mean Outcome}_{\text{Reference Group}}}{s}\]
Here s is the standard deviation of outcomes.
An SMD is considered practically significant if lower_score_favorable=True and the SMD is:
- greater than a chosen smd_threshold
- AND statistically significantly different from zero.
Alternatively, an SMD is considered practically significant if lower_score_favorable=False and the SMD is:
- less than a chosen smd_threshold
- AND statistically significantly different from zero.
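The formula and the threshold rule can be illustrated with a minimal sketch in plain Python. It does not use solas_disparity, and the helper names smd and exceeds_threshold are hypothetical. It assumes that s is the population standard deviation of all outcomes, and it leaves out the statistical significance test, which the definition also requires.

```python
from statistics import mean, pstdev


def smd(protected, reference, all_outcomes):
    """Return (mean of protected - mean of reference) / s."""
    # Assumption: s is the population standard deviation of all outcomes.
    s = pstdev(all_outcomes)
    return (mean(protected) - mean(reference)) / s


def exceeds_threshold(smd_value, smd_threshold, lower_score_favorable=True):
    """Check only the threshold half of the rule (no significance test)."""
    if lower_score_favorable:
        # A higher outcome is unfavorable, so a large positive SMD is flagged.
        return smd_value > smd_threshold
    # A lower outcome is unfavorable, so an SMD below the threshold is flagged.
    return smd_value < smd_threshold


# Toy outcomes for illustration only.
protected = [3.0, 5.0]
reference = [1.0, 3.0]
value = smd(protected, reference, protected + reference)
flagged = exceeds_threshold(value, smd_threshold=0.3, lower_score_favorable=True)
```

With lower_score_favorable=False, the threshold would typically be negative, since the comparison is then "less than".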
- Parameters
group_data (DataFrame) – Dataframe containing columns for group data.
protected_groups (List[str]) – List of protected groups.
reference_groups (List[str]) – List of reference groups with the same length as protected_groups.
group_categories (List[str]) – List of group categories to which each protected and reference group pair belongs (e.g. race, gender, age, etc.). Has the same length as protected_groups.
outcome (Series) – Outcome series.
smd_threshold (float) – Standardized Mean Difference threshold value.
lower_score_favorable (bool, optional) – Whether a lower value of outcome is favorable. Defaults to True.
label (Optional[Series], optional) – Label, true outcome, and/or target series evaluated alongside outcome. Defaults to None.
sample_weight (Optional[Series], optional) – Sample weight series. Has the same length as group_data. Defaults to None.
smd_denominator (SMDDenominator, optional) – Standardized Mean Difference denominator calculation. Defaults to SMDDenominator.POPULATION.
- Returns
Object containing results of the disparity calculation.
- Return type