Disparity#

class solas_disparity.types.Disparity(...)#

Dataclass for disparity objects.

Methods

__init__

Method generated by attrs for class Disparity.

show

to_excel

Export summary table as an XLSX file.

Attributes

affected_categories

Group categories that correspond to practically significant groups.

affected_groups

Protected groups that have practically significant adverse disparities.

affected_reference

Reference groups that correspond to practically significant groups.

plot

report

Data in NYC Department of Consumer and Worker Protection format.

disparity_type

Type of disparity calculation.

summary_table

Summary table of disparity calculation results.

protected_groups

Protected group names.

reference_groups

Reference group names.

group_categories

Group category names.

statistical_significance

StatSig object.

smd_threshold

Standardized mean difference threshold.

residual_smd_threshold

Residual standardized mean difference threshold.

smd_denominator

Standardized mean difference denominator.

residual_smd_denominator

Residual standardized mean difference denominator.

lower_score_favorable

Is a lower pre-transformation prediction favorable? If True, then the model's predictions are assumed to be more favorable the lower the value.

odds_ratio_threshold

Odds ratio threshold.

air_threshold

AIR threshold.

percent_difference_threshold

Percent difference threshold value.

max_for_fishers

Maximum sample size for which Fisher's exact test is used.

shortfall_method

Shortfall method.

fdr_threshold

False discovery rate threshold used when determining whether segment-level results are statistically significant under the Benjamini-Hochberg procedure.

metric

Metric requested.

difference_calculation

Difference Calculation.

difference_threshold

Difference threshold, as a float. Set only for curated custom disparity metrics such as FDR.

ratio_calculation

Ratio Calculation.

ratio_threshold

Ratio threshold, as a float. Set only for curated custom disparity metrics such as FDR.

statistical_significance_test

Statistical significance method.

p_value_threshold

Statistical significance p-value threshold.

shift_zeros

If True and any cell count in a contingency table used in the CMH and Breslow-Day tests is zero, 0.5 is added to every value in the table so that an odds ratio can be calculated.

drop_small_groups

Whether to separate and return a table of groups that comprise less than 2% of individuals.

small_group_table

Groups comprising less than 2% of individuals.

unknown_table

Summary of individuals with unknown demographic information.

property affected_categories: Optional[List[str]]#

Group categories that correspond to practically significant groups.

property affected_groups: List[str]#

Protected groups that have practically significant adverse disparities.

property affected_reference: List[str]#

Reference groups that correspond to practically significant groups.

air_threshold: Optional[float]#

AIR (adverse impact ratio) threshold. Set by the user, this takes a float representing the AIR level below which Solas treats a result as indicative of a practically significant disparity. Consult legal and compliance counsel to determine the appropriate AIR threshold for a given use case.
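As a rough illustration of how such a threshold might be applied, the sketch below computes an adverse impact ratio as the protected group's favorable-outcome rate divided by the reference group's rate, then checks it against a threshold. The function name, the counts, and the 0.8 value are assumptions made for this example; they are not part of the Solas API and are not a recommended setting.

```python
def adverse_impact_ratio(protected_favorable, protected_total,
                         reference_favorable, reference_total):
    """Ratio of the protected group's favorable rate to the reference group's."""
    protected_rate = protected_favorable / protected_total
    reference_rate = reference_favorable / reference_total
    return protected_rate / reference_rate


air_threshold = 0.8  # illustrative value only, not a recommendation

# Hypothetical counts: 30 of 100 protected and 50 of 100 reference
# individuals received the favorable outcome.
air = adverse_impact_ratio(30, 100, 50, 100)

# An AIR below the threshold would be flagged as practically significant.
below_threshold = air < air_threshold
```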

difference_calculation: Optional[solas_disparity.types._difference_calculation.DifferenceCalculation]#

Difference Calculation.

difference_threshold: Optional[float]#

Difference threshold, as a float. Set only for curated custom disparity metrics such as FDR.

disparity_type: solas_disparity.types._disparity_calculation.DisparityCalculation#

Type of disparity calculation.

drop_small_groups: bool#

Whether to separate and return a table of groups that comprise less than 2% of individuals.

fdr_threshold: Optional[float]#

False discovery rate threshold used when determining whether segment-level results are statistically significant under the Benjamini-Hochberg procedure.
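For readers unfamiliar with the procedure, here is a minimal sketch of the standard Benjamini-Hochberg step-up rule in plain Python. It shows the general technique only; how Solas applies it to segment-level results internally is not documented here, and the function name and sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr_threshold):
    """Return a list of booleans marking which p-values are significant."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr_threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr_threshold:
            max_rank = rank

    # Every p-value ranked at or below that cutoff is declared significant.
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values, one per segment.
segment_p_values = [0.001, 0.04, 0.03, 0.2]
flags = benjamini_hochberg(segment_p_values, fdr_threshold=0.05)
```

Note that the rule is applied to the ranked p-values as a set, so a p-value below the raw threshold (such as 0.03 here) is not necessarily flagged.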

group_categories: List[str]#

Group category names. Same length as protected_groups. Set by the user, this takes a list of strings naming the category to which each protected group and its reference group belong. There must be a one-to-one correspondence between group categories and protected_groups; the lists are aligned by index.

lower_score_favorable: Optional[bool]#

Is a lower pre-transformation prediction favorable? If True, then the model’s predictions are assumed to be more favorable the lower the value. If False, then the model’s predictions are assumed to be more favorable the higher the value. Optional. If omitted, defaults to True.

max_for_fishers: Optional[int]#

Maximum sample size for which Fisher's exact test is used. Set by the user, this takes an integer and defaults to const.MAX_FOR_FISHERS (100).

metric: Callable[[...], Union[int, float]]#

Metric requested.

odds_ratio_threshold: Optional[float]#

Odds ratio threshold.

p_value_threshold: float#

Statistical significance p-value threshold.

percent_difference_threshold: Optional[float]#

Percent difference threshold value. For example, if percent_difference_threshold = 0.2, then the difference in percent favorable will need to exceed 20% for a result to be practically significant.
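The comparison described above can be sketched as follows. This assumes the difference is taken as the reference group's favorable rate minus the protected group's favorable rate, which this page does not state explicitly; the function name and the rates are hypothetical.

```python
def exceeds_percent_difference(reference_rate, protected_rate, threshold):
    """True when the gap in favorable rates is larger than the threshold."""
    # Assumed direction: reference rate minus protected rate.
    return (reference_rate - protected_rate) > threshold


percent_difference_threshold = 0.2

# Gap of 25 percentage points: exceeds the 20% threshold.
large_gap = exceeds_percent_difference(0.60, 0.35, percent_difference_threshold)

# Gap of 15 percentage points: does not exceed the threshold.
small_gap = exceeds_percent_difference(0.60, 0.45, percent_difference_threshold)
```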

protected_groups: List[str]#

Protected group names. Set by the user, this takes a list of strings which represent the protected groups being analyzed. There can be as few as one protected group and there is no upper limit to the number of protected groups that can be analyzed.

ratio_calculation: Optional[solas_disparity.types._ratio_calculation.RatioCalculation]#

Ratio Calculation.

ratio_threshold: Optional[float]#

Ratio threshold, as a float. Set only for curated custom disparity metrics such as FDR.

reference_groups: List[str]#

Reference group names. Same length as protected_groups. Set by the user, this takes a list of strings which represent the reference groups (also known as control groups) being analyzed. There must be a one-to-one correspondence between reference groups and protected_groups. Note that the protected groups and reference groups are aligned by index in the lists.
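To make the index alignment concrete, the sketch below pairs up three parallel lists by position. The group and category names are illustrative values chosen for this example, not values required by Solas.

```python
# Each position describes one comparison: the protected group at index i is
# compared against the reference group at index i, within the category at index i.
protected_groups = ["Black", "Hispanic", "Female"]
reference_groups = ["White", "White", "Male"]
group_categories = ["Race", "Race", "Sex"]

# The lists are matched by position, so they must have the same length.
assert len(protected_groups) == len(reference_groups) == len(group_categories)

comparisons = list(zip(group_categories, protected_groups, reference_groups))
for category, protected, reference in comparisons:
    print(f"{category}: {protected} vs. {reference}")
```

A reference group may appear more than once, as "White" does here, because each entry only describes the comparison at its own index.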

property report: Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]#

Data in NYC Department of Consumer and Worker Protection format.

Raises

ValueError – If the disparity calculation is not supported.

Returns

Gender, Race/Ethnicity, and Intersectional tables.

Return type

Tuple[DataFrame, DataFrame, DataFrame]

residual_smd_denominator: Optional[str]#

Residual standardized mean difference denominator. Defaults to ResidualSMDDenominator.POPULATION.

residual_smd_threshold: Optional[float]#

Residual standardized mean difference threshold.

shift_zeros: bool#

If True and any cell count in a contingency table used in the CMH and Breslow-Day tests is zero, 0.5 is added to every value in the table so that an odds ratio can be calculated. Note: With large samples, this correction makes little difference. With small samples, it can affect the significance determination. Defaults to True.
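The effect of the correction can be seen in a small sketch that computes an odds ratio for a single 2x2 table. The function name and the table layout are assumptions made for illustration, and the CMH and Breslow-Day tests themselves are not shown.

```python
def odds_ratio(table, shift_zeros=True):
    """Odds ratio of a 2x2 table [[a, b], [c, d]], computed as (a*d) / (b*c)."""
    (a, b), (c, d) = table
    if shift_zeros and 0 in (a, b, c, d):
        # Add 0.5 to every cell so the ratio is defined.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical table with one empty cell.
table_with_zero = [[10, 0], [5, 5]]

# With the shift, the table becomes [[10.5, 0.5], [5.5, 5.5]].
shifted = odds_ratio(table_with_zero)
```

Without the shift, the same table leads to a division by zero, which is why the correction is needed for an odds ratio to be calculated at all.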

shortfall_method: Optional[solas_disparity.types._shortfall_method.ShortfallMethod]#

Shortfall method. Set by the user. Determines the value of the const.SHORTFALL column in Disparity.summary_table. Defaults to ShortfallMethod.TO_REFERENCE_MEAN.

small_group_table: pandas.core.frame.DataFrame#

Groups comprising less than 2% of individuals.

smd_denominator: Optional[str]#

Standardized mean difference denominator. Defaults to SMDDenominator.POPULATION. This defines how the standard deviation of scores is calculated. Options are “population” or “pooled”.
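The difference between the two options can be sketched in plain Python. This is an assumption-laden illustration, not the Solas implementation: it takes "population" to mean the standard deviation of all scores combined and "pooled" to mean the usual pooled standard deviation of the two groups, and it computes the difference as protected mean minus reference mean. The function name and scores are hypothetical.

```python
from statistics import mean, pstdev, variance


def standardized_mean_difference(protected, reference, denominator="population"):
    """Difference in group means divided by a standard deviation."""
    diff = mean(protected) - mean(reference)
    if denominator == "population":
        # Assumed meaning: standard deviation of all scores taken together.
        sd = pstdev(protected + reference)
    elif denominator == "pooled":
        # Pooled standard deviation, weighting each group's sample variance
        # by its degrees of freedom.
        n1, n2 = len(protected), len(reference)
        pooled_var = (
            (n1 - 1) * variance(protected) + (n2 - 1) * variance(reference)
        ) / (n1 + n2 - 2)
        sd = pooled_var ** 0.5
    else:
        raise ValueError(f"Unknown denominator: {denominator!r}")
    return diff / sd


# Hypothetical scores for two small groups.
protected_scores = [1.0, 2.0, 3.0]
reference_scores = [3.0, 4.0, 5.0]

smd_population = standardized_mean_difference(protected_scores, reference_scores)
smd_pooled = standardized_mean_difference(
    protected_scores, reference_scores, denominator="pooled"
)
```

When the group means differ, the combined standard deviation also absorbs the between-group spread, so the two denominators can give noticeably different SMD values for the same data.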

smd_threshold: Optional[float]#

Standardized mean difference (SMD) threshold. Set by the user, this takes a float representing the SMD level that Solas treats as indicative of a practically significant disparity. Consult legal and compliance counsel to determine the appropriate SMD threshold for a given use case.

statistical_significance: Optional[solas_disparity.types._stat_sig.StatSig]#

StatSig object. Contains the statistical significance information reported in Disparity.summary_table and, in some cases, additional detail.

statistical_significance_test: Optional[solas_disparity.types._stat_sig_test.StatSigTest]#

Statistical significance method.

summary_table: pandas.core.frame.DataFrame#

Summary table of disparity calculation results. Provided as a pandas DataFrame. This is a stand-alone version of the summary table provided by .disparity.

to_excel(file_path: Union[str, pathlib.Path])#

Export summary table as an XLSX file.

Parameters

file_path (Union[str, Path]) – Path to file.

unknown_table: pandas.core.frame.DataFrame#

Summary of individuals with unknown demographic information.