solas_disparity.standardized_mean_difference

solas_disparity.standardized_mean_difference(group_data: pandas.core.frame.DataFrame, protected_groups: List[str], reference_groups: List[str], group_categories: List[str], outcome: pandas.core.series.Series, smd_threshold: float, lower_score_favorable: bool = True, label: Optional[pandas.core.series.Series] = None, sample_weight: Optional[pandas.core.series.Series] = None, smd_denominator: solas_disparity.types._smd_denominator.SMDDenominator = SMDDenominator.POPULATION) → solas_disparity.types._disparity.Disparity

Calculate the Standardized Mean Difference (SMD) for a given set of protected and reference groups.

The SMD is the mean outcome of a protected group minus the mean outcome of its reference group, divided by the standard deviation of outcomes, s. How s is calculated is controlled by smd_denominator.

\[\text{SMD}_\text{Protected Group} = \frac{\text{Mean Outcome}_{\text{Protected Group}} - \text{Mean Outcome}_{\text{Reference Group}}}{s}\]
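As a rough illustration of the formula above, the following sketch computes an SMD by hand using only the Python standard library. The helper name smd_sketch and the sample values are hypothetical, and the choice of the population standard deviation over all outcomes as s is an assumption about what SMDDenominator.POPULATION means; it is not taken from the library's implementation.

```python
from statistics import fmean, pstdev


def smd_sketch(protected, reference, all_outcomes):
    """Hypothetical helper: difference in group means divided by s."""
    # Assumption: s is the population standard deviation of all outcomes.
    s = pstdev(all_outcomes)
    return (fmean(protected) - fmean(reference)) / s


# Made-up outcome values for one protected/reference pair.
protected = [0.6, 0.7, 0.8]
reference = [0.4, 0.5, 0.6]

smd = smd_sketch(protected, reference, protected + reference)
print(round(smd, 3))
```

A positive value here means the protected group's mean outcome is higher than the reference group's.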

When lower_score_favorable=True, an SMD is considered practically significant if it is:

  1. greater than the chosen smd_threshold

  2. AND statistically significantly different from zero.

When lower_score_favorable=False, an SMD is considered practically significant if it is:

  1. less than the chosen smd_threshold

  2. AND statistically significantly different from zero.
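The two rules above can be sketched as a single predicate. This is a minimal illustration, not the library's code: the function name is hypothetical, and the statistical test is represented by a boolean supplied by the caller because this page does not say which test is used. The comparisons follow the text literally, so the sign convention for smd_threshold when lower_score_favorable=False is whatever the caller passes in.

```python
def is_practically_significant(
    smd: float,
    smd_threshold: float,
    statistically_significant: bool,
    lower_score_favorable: bool = True,
) -> bool:
    """Hypothetical predicate mirroring the two documented rules."""
    # Both rules require the SMD to differ significantly from zero.
    if not statistically_significant:
        return False
    if lower_score_favorable:
        # Rule for lower_score_favorable=True: SMD above the threshold.
        return smd > smd_threshold
    # Rule for lower_score_favorable=False: SMD below the threshold.
    return smd < smd_threshold


# Made-up values for illustration only.
print(is_practically_significant(0.4, 0.3, True))
print(is_practically_significant(-0.4, -0.3, True, lower_score_favorable=False))
```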

Parameters
  • group_data (DataFrame) – Dataframe containing columns for group data.

  • protected_groups (List[str]) – List of protected groups.

  • reference_groups (List[str]) – List of reference groups with the same length as protected_groups.

  • group_categories (List[str]) – List of group categories to which each protected and reference group pair belongs (e.g. race, gender, age, etc.). Has the same length as protected_groups.

  • outcome (Series) – Outcome series.

  • smd_threshold (float) – Standardized Mean Difference threshold value.

  • lower_score_favorable (bool, optional) – Whether a lower value of outcome is favorable. Defaults to True.

  • label (Optional[Series], optional) – Label, true outcome, and/or target series evaluated alongside outcome. Defaults to None.

  • sample_weight (Optional[Series], optional) – Sample weight series. Has the same length as group_data. Defaults to None.

  • smd_denominator (SMDDenominator, optional) – Standardized Mean Difference denominator calculation. Defaults to SMDDenominator.POPULATION.
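Because protected_groups, reference_groups, and group_categories must all have the same length, it can help to think of them as parallel lists that are read position by position. The sketch below shows that pairing with a hypothetical helper and made-up group names; it does not call the library and says nothing about how group_data itself must be laid out.

```python
def pair_groups(protected, reference, categories):
    """Hypothetical helper: align the three parallel lists by position."""
    if not (len(protected) == len(reference) == len(categories)):
        raise ValueError("all three lists must have the same length")
    # Each tuple is (category, protected group, reference group).
    return list(zip(categories, protected, reference))


# Made-up group names for illustration only.
protected_groups = ["Black", "Female"]
reference_groups = ["White", "Male"]
group_categories = ["Race", "Gender"]

pairs = pair_groups(protected_groups, reference_groups, group_categories)
for category, protected, reference in pairs:
    print(f"{category}: {protected} compared against {reference}")
```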

Returns

Object containing results of the disparity calculation.

Return type

Disparity