Disparity
- class solas_disparity.types.Disparity(...)#
Dataclass for disparity objects.
Methods
__init__: Method generated by attrs for class Disparity.
show
to_excel: Export summary table as an XLSX file.
Attributes
affected_categories: Group categories that correspond to practically significant groups.
affected_groups: Protected groups that have practically significant adverse disparities.
affected_reference: Reference groups that correspond to practically significant groups.
plot
report: Data in NYC Department of Consumer and Worker Protection format.
disparity_type: Type of disparity calculation.
summary_table: Summary table of disparity calculation results.
protected_groups: Protected group names.
reference_groups: Reference group names.
group_categories: Group category names.
statistical_significance: StatSig object.
smd_threshold: Standardized mean difference threshold.
residual_smd_threshold: Residual standardized mean difference threshold.
smd_denominator: Standardized mean difference denominator.
residual_smd_denominator: Residual standardized mean difference denominator.
lower_score_favorable: Is a lower pre-transformation prediction favorable? If True, then the model's predictions are assumed to be more favorable the lower the value.
odds_ratio_threshold: Odds ratio threshold.
air_threshold: AIR threshold.
percent_difference_threshold: Percent difference threshold value.
max_for_fishers: Maximum sample size for which Fisher's exact test is used.
shortfall_method: Shortfall method.
fdr_threshold: False discovery rate threshold used when determining whether segment-level results are statistically significant under the Benjamini-Hochberg procedure.
metric: Metric requested.
difference_calculation: Difference calculation.
difference_threshold: Difference threshold as a float, set only for curated custom disparity metrics such as FDR.
ratio_calculation: Ratio calculation.
ratio_threshold: Ratio threshold as a float, set only for curated custom disparity metrics such as FDR.
statistical_significance_test: Statistical significance method.
p_value_threshold: Statistical significance p-value threshold.
shift_zeros: If True and any cell count in a contingency table used in the CMH and Breslow-Day tests is zero, 0.5 is added to every cell of that table so that an odds ratio can be calculated.
drop_small_groups: Whether to separate and return a table of groups that comprise less than 2% of individuals.
small_group_table: Groups comprising less than 2% of individuals.
unknown_table: Summary of individuals with unknown demographic information.
- property affected_categories: Optional[List[str]]#
Group categories that correspond to practically significant groups.
- property affected_groups: List[str]#
Protected groups that have practically significant adverse disparities.
- property affected_reference: List[str]#
Reference groups that correspond to practically significant groups.
- air_threshold: Optional[float]#
Adverse impact ratio (AIR) threshold. Set by the user, this takes a float giving the AIR level below which Solas flags a result as a practically significant disparity. Consult legal and compliance counsel for the appropriate AIR threshold in a given use case.
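The comparison against air_threshold can be sketched in plain Python. This is an illustration only, not the Solas implementation: the function names, the counts, and the 0.8 threshold are hypothetical, and it assumes the usual definition of AIR as the protected group's favorable-outcome rate divided by the reference group's.

```python
def adverse_impact_ratio(protected_favorable, protected_total,
                         reference_favorable, reference_total):
    """Favorable rate of the protected group divided by that of the reference group."""
    protected_rate = protected_favorable / protected_total
    reference_rate = reference_favorable / reference_total
    return protected_rate / reference_rate


def below_air_threshold(air, air_threshold):
    """True when the AIR falls below the user-supplied threshold."""
    return air < air_threshold


# Hypothetical counts: 30 of 100 protected and 50 of 100 reference
# individuals receive the favorable outcome.
air = adverse_impact_ratio(30, 100, 50, 100)
flagged = below_air_threshold(air, air_threshold=0.8)  # 0.8 is an illustrative value only
```

Here the ratio of the two rates (0.30 and 0.50) falls below the illustrative threshold, so the result would be flagged.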
- difference_calculation: Optional[solas_disparity.types._difference_calculation.DifferenceCalculation]#
Difference Calculation.
- difference_threshold: Optional[float]#
Difference threshold as a float set only for curated custom disparity metrics such as FDR.
- disparity_type: solas_disparity.types._disparity_calculation.DisparityCalculation#
Type of disparity calculation.
- drop_small_groups: bool#
Whether to separate and return a table of groups that comprise less than 2% of individuals.
- fdr_threshold: Optional[float]#
False discovery rate threshold used when determining whether segment-level results are statistically significant under the Benjamini-Hochberg procedure.
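For readers unfamiliar with the Benjamini-Hochberg procedure, the following is a minimal sketch of the standard step-up rule in plain Python. It is not taken from Solas; the function name and the p-values are hypothetical, and how Solas applies the procedure to segments is not shown here.

```python
def benjamini_hochberg(p_values, fdr_threshold):
    """Return one boolean per p-value: True if significant under the BH step-up rule."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr_threshold.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr_threshold:
            largest_rank = rank

    # Every p-value ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[idx] = True
    return significant


# Hypothetical segment-level p-values.
segment_p_values = [0.001, 0.035, 0.03, 0.20]
result = benjamini_hochberg(segment_p_values, fdr_threshold=0.05)
```

Note that the step-up rule can mark a p-value significant even when it misses its own rank's cutoff, as long as a larger p-value meets its cutoff. In this example 0.03 exceeds the rank-2 cutoff of 0.025 but is still significant because 0.035 meets the rank-3 cutoff of 0.0375.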
- group_categories: List[str]#
Group category names. Same length as protected_groups. Set by the user, this takes a list of strings giving the category of each protected group being analyzed. There must be a one-to-one correspondence between group categories and protected_groups. Note that the group categories and protected groups are aligned by index in the lists.
- lower_score_favorable: Optional[bool]#
Is a lower pre-transformation prediction favorable? If True, then the model’s predictions are assumed to be more favorable the lower the value. If False, then the model’s predictions are assumed to be more favorable the higher the value. Optional. If omitted, defaults to True.
- max_for_fishers: Optional[int]#
Maximum sample size for which Fisher's exact test is used. Set by the user, this takes an integer and defaults to const.MAX_FOR_FISHERS (100).
- metric: Callable[[...], Union[int, float]]#
Metric requested.
- odds_ratio_threshold: Optional[float]#
Odds ratio threshold.
- p_value_threshold: float#
Statistical significance p-value threshold.
- percent_difference_threshold: Optional[float]#
Percent difference threshold value. For example, if percent_difference_threshold = 0.2, then the difference in percent favorable will need to exceed 20% for a result to be practically significant.
- protected_groups: List[str]#
Protected group names. Set by the user, this takes a list of strings which represent the protected groups being analyzed. There can be as few as one protected group and there is no upper limit to the number of protected groups that can be analyzed.
- ratio_calculation: Optional[solas_disparity.types._ratio_calculation.RatioCalculation]#
Ratio Calculation.
- ratio_threshold: Optional[float]#
Ratio threshold as float set only for curated custom disparity metrics such as FDR.
- reference_groups: List[str]#
Reference group names. Same length as protected_groups. Set by the user, this takes a list of strings which represent the reference groups (also known as control groups) being analyzed. There must be a one-to-one correspondence between reference groups and protected_groups. Note that the protected groups and reference groups are aligned by index in the lists.
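The index alignment between the group lists can be illustrated as follows. The group names and the pair_groups helper are hypothetical and are not part of solas_disparity; the sketch only shows what "aligned by index" implies for equal-length lists.

```python
# Hypothetical group names; each position describes one comparison.
protected_groups = ["Black", "Hispanic", "Female"]
reference_groups = ["White", "White", "Male"]
group_categories = ["Race", "Race", "Gender"]


def pair_groups(protected_groups, reference_groups, group_categories):
    """Pair each protected group with its reference group and category by index."""
    if not (len(protected_groups) == len(reference_groups) == len(group_categories)):
        raise ValueError("All three lists must have the same length.")
    return list(zip(group_categories, protected_groups, reference_groups))


comparisons = pair_groups(protected_groups, reference_groups, group_categories)
for category, protected, reference in comparisons:
    print(f"{category}: {protected} vs. {reference}")
```

A reference group may appear more than once (here "White") because each entry belongs to the protected group at the same index.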
- property report: Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]#
Data in NYC Department of Consumer and Worker Protection format.
- Raises
ValueError – If the disparity calculation is not supported
- Returns
Gender, Race/Ethnicity, and Intersectional tables.
- Return type
Tuple[DataFrame, DataFrame, DataFrame]
- residual_smd_denominator: Optional[str]#
Residual standardized mean difference denominator. Defaults to ResidualSMDDenominator.POPULATION.
- residual_smd_threshold: Optional[float]#
Residual standardized mean difference threshold.
- shift_zeros: bool#
If True and any cell count in a contingency table used in the CMH and Breslow-Day tests is zero, 0.5 is added to every cell of that table so that an odds ratio can be calculated. Note: In large samples, this correction will not make a significant difference. In small samples, it can affect the significance determination. Defaults to True.
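The effect of the zero-cell correction on a single 2x2 table can be sketched as below. This is a simplified illustration, not the Solas code: the odds_ratio function and the counts are hypothetical, and the CMH and Breslow-Day tests themselves, which operate across several such tables, are not reproduced.

```python
def odds_ratio(table, shift_zeros=True):
    """Odds ratio of a 2x2 table [[a, b], [c, d]], computed as (a * d) / (b * c).

    If shift_zeros is True and any cell is zero, 0.5 is added to every cell first.
    """
    (a, b), (c, d) = table
    if shift_zeros and 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical counts with one empty cell.
table_with_zero = [[10, 0], [5, 5]]
shifted = odds_ratio(table_with_zero, shift_zeros=True)

# Without the shift the denominator b * c is zero and the ratio is undefined.
try:
    odds_ratio(table_with_zero, shift_zeros=False)
    undefined_without_shift = False
except ZeroDivisionError:
    undefined_without_shift = True
```

With small counts like these, the 0.5 shift changes the estimate substantially, which is why the correction can affect the significance determination in small samples.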
- shortfall_method: Optional[solas_disparity.types._shortfall_method.ShortfallMethod]#
Shortfall method. Set by the user. Determines the value of the const.SHORTFALL column in Disparity.summary_table. Defaults to ShortfallMethod.TO_REFERENCE_MEAN.
- small_group_table: pandas.core.frame.DataFrame#
Groups comprising less than 2% of individuals.
- smd_denominator: Optional[str]#
Standardized mean difference denominator. Defaults to SMDDenominator.POPULATION. This determines how the standard deviation of scores is computed. Options are “population” or “pooled”.
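The difference between the two denominator options can be sketched with the standard library. This is an assumption-laden illustration rather than the Solas definition: it treats "population" as the standard deviation of the two groups' scores combined and "pooled" as the usual pooled standard deviation of the two groups, and it takes the difference as protected minus reference. The function name and scores are hypothetical.

```python
import statistics


def standardized_mean_difference(protected, reference, denominator="population"):
    """Mean difference (protected minus reference) divided by a standard deviation."""
    mean_diff = statistics.fmean(protected) - statistics.fmean(reference)

    if denominator == "population":
        # Assumption: standard deviation over both groups' scores combined.
        sd = statistics.pstdev(list(protected) + list(reference))
    elif denominator == "pooled":
        # Pooled standard deviation from the two groups' sample variances.
        n_p, n_r = len(protected), len(reference)
        pooled_variance = (
            (n_p - 1) * statistics.variance(protected)
            + (n_r - 1) * statistics.variance(reference)
        ) / (n_p + n_r - 2)
        sd = pooled_variance ** 0.5
    else:
        raise ValueError(f"Unknown denominator: {denominator!r}")

    return mean_diff / sd


# Hypothetical scores.
protected_scores = [1.0, 2.0, 3.0]
reference_scores = [3.0, 4.0, 5.0]

smd_population = standardized_mean_difference(protected_scores, reference_scores, "population")
smd_pooled = standardized_mean_difference(protected_scores, reference_scores, "pooled")
```

Because the combined standard deviation also reflects the gap between the group means, the "population" denominator is larger here and yields a smaller SMD in absolute value than the "pooled" one.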
- smd_threshold: Optional[float]#
Standardized mean difference (SMD) threshold. Set by the user, this takes a float giving the SMD level that Solas treats as indicative of a practically significant disparity. Consult legal and compliance counsel for the appropriate SMD threshold in a given use case.
- statistical_significance: Optional[solas_disparity.types._stat_sig.StatSig]#
StatSig object. Contains the statistical significance information reported in Disparity.summary_table, and sometimes additional detail.
- statistical_significance_test: Optional[solas_disparity.types._stat_sig_test.StatSigTest]#
Statistical significance method.
- summary_table: pandas.core.frame.DataFrame#
Summary table of disparity calculation results, provided as a pandas DataFrame. This is a stand-alone version of the summary table provided by .disparity.
- to_excel(file_path: Union[str, pathlib.Path])#
Export summary table as an XLSX file.
- Parameters
file_path (Union[str, Path]) – Path to file.
- unknown_table: pandas.core.frame.DataFrame#
Summary of individuals with unknown demographic information.