SolasAI Disparity Plots#

import solas_disparity as sd
import pandas as pd

Certain notebook environments have limited rendering functionality. Uncomment this cell as a potential workaround if plots are not displaying.

# import plotly.io as pio
# pio.renderers.default = "png"

It’s preferable to explicitly and specifically handle warnings. For the purposes of this notebook, we will filter out all warnings.

from warnings import simplefilter
simplefilter("ignore")
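As a rough sketch of the more targeted handling mentioned above, the standard-library warnings module can silence one category of warning, and only inside a with block. The function noisy_step is a hypothetical stand-in for any call that emits a warning; it is not part of solas_disparity.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits a warning.
    warnings.warn("this API will change", FutureWarning)
    return 42


# Suppress only FutureWarning, and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    quiet_result = noisy_step()

# Record warnings instead of printing them, to inspect what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud_result = noisy_step()

print(quiet_result, loud_result, [w.category.__name__ for w in caught])
```

Filters set inside catch_warnings are restored when the block exits, so the rest of the notebook is unaffected.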

Predictions have already been created using a tree model run on an HMDA dataset. They are stored in the Prediction column of the test data loaded below.

label = "Interest Rate"
data = pd.read_parquet("hmda_test.parquet")

Store commonly reused function arguments.

protected_groups = ["Black", "Asian", "Native American", "Hispanic", "Female"]
reference_groups = ["White", "White", "White", "Non-Hispanic", "Male"]
groups = sd.pgrg_ordered(
    protected_groups=protected_groups,
    reference_groups=reference_groups,
)
reused_arguments = dict(
    group_data=data[groups],
    protected_groups=protected_groups,
    reference_groups=reference_groups,
    group_categories=["Race", "Race", "Race", "Ethnicity", "Sex"],
    sample_weight=None,
)
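# Treat predictions at or below the median as the favorable outcome.
binary_outcome = data["Prediction"] <= data["Prediction"].quantile(0.5)
# Binarize the label the same way, at its own median.
binary_label = data[label] <= data[label].quantile(0.5)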

Single-Level Plots#

Certain disparity functions provide a result for each group. Their associated plots are single figures and are referred to as single-level plots.

Calculate Disparity#

Let’s use a result from the AIR function as an example for single-level plots.

air = sd.adverse_impact_ratio(
    outcome=binary_outcome,
    air_threshold=0.8,
    percent_difference_threshold=0.0,
    **reused_arguments
)

Output Results#

Displaying a disparity calculation result object shows its parameters, its summary table, and a default plot.

air

Disparity Calculation: Adverse Impact Ratio

┌───────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────┐
│ Protected Groups                          │ Black, Asian, Native American, Hispanic, Female                     │
│ Reference Groups                          │ White, White, White, Non-Hispanic, Male                             │
│ Group Categories                          │ Race, Race, Race, Ethnicity, Sex                                    │
│ AIR Threshold                             │ 0.8                                                                 │
│ Percent Difference Threshold              │ 0.0                                                                 │
│ Shortfall Method                          │ to_reference_mean                                                   │
│ Affected Groups                           │ Hispanic                                                            │
│ Affected Reference                        │ Non-Hispanic                                                        │
│ Affected Categories                       │ Ethnicity                                                           │
└───────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────┘

Adverse Impact Ratio Summary Table

* Percent Missing: Ethnicity: 13.68%, Race: 13.56%, Sex: 46.88%

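| Group           | Reference Group | Group Category | Total   | Favorable | Percent Favorable | Percent Difference Favorable | AIR   | P-Values | Practically Significant | Shortfall |
|-----------------|-----------------|----------------|---------|-----------|-------------------|------------------------------|-------|----------|-------------------------|-----------|
| Black           | White           | Race           | 340.0   | 141.0     | 41.47%            | 9.70%                        | 0.810 | 0.001    | No                      |           |
| Asian           | White           | Race           | 327.0   | 243.0     | 74.31%            | -23.14%                      | 1.452 | 0.000    | No                      |           |
| Native American | White           | Race           | 20.0    | 9.0       | 45.00%            | 6.17%                        | 0.879 | 0.657    | No                      |           |
| White           |                 | Race           | 3,623.0 | 1,854.0   | 51.17%            |                              |       |          |                         |           |
| Hispanic        | Non-Hispanic    | Ethnicity      | 508.0   | 167.0     | 32.87%            | 21.54%                       | 0.604 | 0.000    | Yes                     | 109.4     |
| Non-Hispanic    |                 | Ethnicity      | 3,808.0 | 2,072.0   | 54.41%            |                              |       |          |                         |           |
| Female          | Male            | Sex            | 1,034.0 | 414.0     | 40.04%            | 9.78%                        | 0.804 | 0.000    | No                      |           |
| Male            |                 | Sex            | 1,622.0 | 808.0     | 49.82%            |                              |       |          |                         |           |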
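The arithmetic behind the AIR, Percent Difference Favorable, and Shortfall columns can be checked by hand from the counts in the summary table. The sketch below is not the library's implementation. In particular, the shortfall formula is an assumption about what to_reference_mean means (the extra favorable outcomes the protected group would need to reach the reference group's favorable rate); it happens to reproduce the 109.4 reported for the Hispanic group. All function names are hypothetical.

```python
def favorable_rate(favorable, total):
    """Share of a group that received the favorable outcome."""
    return favorable / total


def adverse_impact_ratio(protected, reference):
    """Protected favorable rate divided by reference favorable rate.

    Each argument is a (favorable, total) pair.
    """
    return favorable_rate(*protected) / favorable_rate(*reference)


def percent_difference(protected, reference):
    """Reference rate minus protected rate, in percentage points."""
    return 100 * (favorable_rate(*reference) - favorable_rate(*protected))


def shortfall_to_reference_mean(protected, reference):
    """ASSUMED definition: favorable outcomes the protected group would
    need to gain to match the reference group's favorable rate."""
    favorable, total = protected
    return total * favorable_rate(*reference) - favorable


# (favorable, total) counts copied from the summary table.
black, white = (141, 340), (1854, 3623)
hispanic, non_hispanic = (167, 508), (2072, 3808)

print(f"Black AIR: {adverse_impact_ratio(black, white):.3f}")
print(f"Black percent difference: {percent_difference(black, white):.2f}%")
print(f"Hispanic AIR: {adverse_impact_ratio(hispanic, non_hispanic):.3f}")
print(f"Hispanic shortfall: {shortfall_to_reference_mean(hispanic, non_hispanic):.1f}")
```

The printed values agree with the corresponding table entries (0.810, 9.70%, 0.604, and 109.4). The P-Values and Practically Significant columns involve a statistical test that this sketch does not attempt to reproduce.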

The .plot() method on the result object returns the plotly figure directly.

figure = air.plot()
type(figure)
plotly.graph_objs._figure.Figure
figure

In the case of AIR, the plot function also takes a column argument, which selects a different column of the summary table to plot.

air.plot(column=sd.const.TOTAL)

The .plot() method is simply a convenience wrapper for the associated plot function in the solas_disparity.plots namespace. For further information, refer to this plot function in the rendered documentation. For stronger linting support, one can optionally call this function directly.

sd.plots.plot_adverse_impact_ratio(disparity=air)

Multi-Level Plots#

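Certain other disparity functions provide a result for each secondary level (for example, each quantile) within each group. Their associated plots contain one subplot per level and are referred to as multi-level plots.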

Calculate Disparity#

Use AIR by quantile as an example for multi-level plots.

airq = sd.adverse_impact_ratio_by_quantile(
    outcome=data["Prediction"],
    air_threshold=0.8,
    percent_difference_threshold=0.0,
    quantiles=[decile / 10 for decile in range(1, 11)],
    **reused_arguments,
)

Output Results#

The default output for a disparity calculation result object includes a default plot. Note that a new subplot is created for each quantile.

airq

Disparity Calculation: Adverse Impact Ratio By Quantile

┌───────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────┐
│ Protected Groups                          │ Black, Asian, Native American, Hispanic, Female                     │
│ Reference Groups                          │ White, White, White, Non-Hispanic, Male                             │
│ Group Categories                          │ Race, Race, Race, Ethnicity, Sex                                    │
│ AIR Threshold                             │ 0.8                                                                 │
│ Percent Difference Threshold              │ 0.0                                                                 │
│ Lower Score Favorable                     │ True                                                                │
│ Affected Groups                           │ Black, Hispanic, Female                                             │
│ Affected Reference                        │ White, Non-Hispanic, Male                                           │
│ Affected Categories                       │ Race, Ethnicity, Sex                                                │
└───────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────┘

Adverse Impact Ratio By Quantile Summary Table

| Group           | Quantile | Reference Group | Group Category | Quantile Cutoff | Observations | Percent Missing | Total   | Favorable | Percent Favorable | Percent Difference Favorable | AIR   | P-Values | Practically Significant |
|-----------------|----------|-----------------|----------------|-----------------|--------------|-----------------|---------|-----------|-------------------|------------------------------|-------|----------|-------------------------|
| Black           | 10.0%    | White           | Race           | 0.044761        | 4,322        | 13.56%          | 340.0   | 13.0      | 3.82%             | 4.93%                        | 0.437 | 0.001    | Yes                     |
| Asian           | 10.0%    | White           | Race           | 0.044761        | 4,322        | 13.56%          | 327.0   | 91.0      | 27.83%            | -19.08%                      | 3.181 | 0.000    | No                      |
| Native American | 10.0%    | White           | Race           | 0.044761        | 4,322        | 13.56%          | 20.0    | 1.0       | 5.00%             | 3.75%                        | 0.571 | 1.000    | No                      |
| White           | 10.0%    |                 | Race           | 0.044761        | 4,322        | 13.56%          | 3,623.0 | 317.0     | 8.75%             |                              |       |          |                         |
| Hispanic        | 10.0%    | Non-Hispanic    | Ethnicity      | 0.044761        | 4,316        | 13.68%          | 508.0   | 15.0      | 2.95%             | 7.79%                        | 0.275 | 0.000    | Yes                     |
| Non-Hispanic    | 10.0%    |                 | Ethnicity      | 0.044761        | 4,316        | 13.68%          | 3,808.0 | 409.0     | 10.74%            |                              |       |          |                         |
| Female          | 10.0%    | Male            | Sex            | 0.044761        | 2,656        | 46.88%          | 1,034.0 | 66.0      | 6.38%             | 3.05%                        | 0.677 | 0.006    | Yes                     |
| Male            | 10.0%    |                 | Sex            | 0.044761        | 2,656        | 46.88%          | 1,622.0 | 153.0     | 9.43%             |                              |       |          |                         |
| Black           | 20.0%    | White           | Race           | 0.045863        | 4,322        | 13.56%          | 340.0   | 37.0      | 10.88%            | 9.85%                        | 0.525 | 0.000    | Yes                     |
| Asian           | 20.0%    | White           | Race           | 0.045863        | 4,322        | 13.56%          | 327.0   | 132.0     | 40.37%            | -19.64%                      | 1.947 | 0.000    | No                      |
| Native American | 20.0%    | White           | Race           | 0.045863        | 4,322        | 13.56%          | 20.0    | 2.0       | 10.00%            | 10.73%                       | 0.482 | 0.403    | No                      |
| White           | 20.0%    |                 | Race           | 0.045863        | 4,322        | 13.56%          | 3,623.0 | 751.0     | 20.73%            |                              |       |          |                         |
| Hispanic        | 20.0%    | Non-Hispanic    | Ethnicity      | 0.045863        | 4,316        | 13.68%          | 508.0   | 42.0      | 8.27%             | 14.95%                       | 0.356 | 0.000    | Yes                     |
| Non-Hispanic    | 20.0%    |                 | Ethnicity      | 0.045863        | 4,316        | 13.68%          | 3,808.0 | 884.0     | 23.21%            |                              |       |          |                         |
| Female          | 20.0%    | Male            | Sex            | 0.045863        | 2,656        | 46.88%          | 1,034.0 | 155.0     | 14.99%            | 4.92%                        | 0.753 | 0.002    | Yes                     |
| Male            | 20.0%    |                 | Sex            | 0.045863        | 2,656        | 46.88%          | 1,622.0 | 323.0     | 19.91%            |                              |       |          |                         |
| Black           | 30.0%    | White           | Race           | 0.046427        | 4,322        | 13.56%          | 340.0   | 62.0      | 18.24%            | 11.30%                       | 0.617 | 0.000    | Yes                     |
| Asian           | 30.0%    | White           | Race           | 0.046427        | 4,322        | 13.56%          | 327.0   | 175.0     | 53.52%            | -23.98%                      | 1.812 | 0.000    | No                      |
| Native American | 30.0%    | White           | Race           | 0.046427        | 4,322        | 13.56%          | 20.0    | 4.0       | 20.00%            | 9.53%                        | 0.677 | 0.464    | No                      |
| White           | 30.0%    |                 | Race           | 0.046427        | 4,322        | 13.56%          | 3,623.0 | 1,070.0   | 29.53%            |                              |       |          |                         |
| Hispanic        | 30.0%    | Non-Hispanic    | Ethnicity      | 0.046427        | 4,316        | 13.68%          | 508.0   | 69.0      | 13.58%            | 19.14%                       | 0.415 | 0.000    | Yes                     |
| Non-Hispanic    | 30.0%    |                 | Ethnicity      | 0.046427        | 4,316        | 13.68%          | 3,808.0 | 1,246.0   | 32.72%            |                              |       |          |                         |
| Female          | 30.0%    | Male            | Sex            | 0.046427        | 2,656        | 46.88%          | 1,034.0 | 225.0     | 21.76%            | 5.74%                        | 0.791 | 0.001    | Yes                     |
| Male            | 30.0%    |                 | Sex            | 0.046427        | 2,656        | 46.88%          | 1,622.0 | 446.0     | 27.50%            |                              |       |          |                         |
| Black           | 40.0%    | White           | Race           | 0.046703        | 4,322        | 13.56%          | 340.0   | 103.0     | 30.29%            | 16.38%                       | 0.649 | 0.000    | Yes                     |
| Asian           | 40.0%    | White           | Race           | 0.046703        | 4,322        | 13.56%          | 327.0   | 238.0     | 72.78%            | -26.11%                      | 1.559 | 0.000    | No                      |
| Native American | 40.0%    | White           | Race           | 0.046703        | 4,322        | 13.56%          | 20.0    | 8.0       | 40.00%            | 6.67%                        | 0.857 | 0.656    | No                      |
| White           | 40.0%    |                 | Race           | 0.046703        | 4,322        | 13.56%          | 3,623.0 | 1,691.0   | 46.67%            |                              |       |          |                         |
| Hispanic        | 40.0%    | Non-Hispanic    | Ethnicity      | 0.046703        | 4,316        | 13.68%          | 508.0   | 139.0     | 27.36%            | 22.30%                       | 0.551 | 0.000    | Yes                     |
| Non-Hispanic    | 40.0%    |                 | Ethnicity      | 0.046703        | 4,316        | 13.68%          | 3,808.0 | 1,891.0   | 49.66%            |                              |       |          |                         |
| Female          | 40.0%    | Male            | Sex            | 0.046703        | 2,656        | 46.88%          | 1,034.0 | 380.0     | 36.75%            | 7.27%                        | 0.835 | 0.000    | No                      |
| Male            | 40.0%    |                 | Sex            | 0.046703        | 2,656        | 46.88%          | 1,622.0 | 714.0     | 44.02%            |                              |       |          |                         |
| Black           | 50.0%    | White           | Race           | 0.047009        | 4,322        | 13.56%          | 340.0   | 141.0     | 41.47%            | 9.70%                        | 0.810 | 0.001    | No                      |
| Asian           | 50.0%    | White           | Race           | 0.047009        | 4,322        | 13.56%          | 327.0   | 243.0     | 74.31%            | -23.14%                      | 1.452 | 0.000    | No                      |
| Native American | 50.0%    | White           | Race           | 0.047009        | 4,322        | 13.56%          | 20.0    | 9.0       | 45.00%            | 6.17%                        | 0.879 | 0.657    | No                      |
| White           | 50.0%    |                 | Race           | 0.047009        | 4,322        | 13.56%          | 3,623.0 | 1,854.0   | 51.17%            |                              |       |          |                         |
| Hispanic        | 50.0%    | Non-Hispanic    | Ethnicity      | 0.047009        | 4,316        | 13.68%          | 508.0   | 167.0     | 32.87%            | 21.54%                       | 0.604 | 0.000    | Yes                     |
| Non-Hispanic    | 50.0%    |                 | Ethnicity      | 0.047009        | 4,316        | 13.68%          | 3,808.0 | 2,072.0   | 54.41%            |                              |       |          |                         |
| Female          | 50.0%    | Male            | Sex            | 0.047009        | 2,656        | 46.88%          | 1,034.0 | 414.0     | 40.04%            | 9.78%                        | 0.804 | 0.000    | No                      |
| Male            | 50.0%    |                 | Sex            | 0.047009        | 2,656        | 46.88%          | 1,622.0 | 808.0     | 49.82%            |                              |       |          |                         |
| Black           | 60.0%    | White           | Race           | 0.047266        | 4,322        | 13.56%          | 340.0   | 161.0     | 47.35%            | 13.62%                       | 0.777 | 0.000    | Yes                     |
| Asian           | 60.0%    | White           | Race           | 0.047266        | 4,322        | 13.56%          | 327.0   | 260.0     | 79.51%            | -18.54%                      | 1.304 | 0.000    | No                      |
| Native American | 60.0%    | White           | Race           | 0.047266        | 4,322        | 13.56%          | 20.0    | 11.0      | 55.00%            | 5.97%                        | 0.902 | 0.648    | No                      |
| White           | 60.0%    |                 | Race           | 0.047266        | 4,322        | 13.56%          | 3,623.0 | 2,209.0   | 60.97%            |                              |       |          |                         |
| Hispanic        | 60.0%    | Non-Hispanic    | Ethnicity      | 0.047266        | 4,316        | 13.68%          | 508.0   | 214.0     | 42.13%            | 21.53%                       | 0.662 | 0.000    | Yes                     |
| Non-Hispanic    | 60.0%    |                 | Ethnicity      | 0.047266        | 4,316        | 13.68%          | 3,808.0 | 2,424.0   | 63.66%            |                              |       |          |                         |
| Female          | 60.0%    | Male            | Sex            | 0.047266        | 2,656        | 46.88%          | 1,034.0 | 520.0     | 50.29%            | 7.60%                        | 0.869 | 0.000    | No                      |
| Male            | 60.0%    |                 | Sex            | 0.047266        | 2,656        | 46.88%          | 1,622.0 | 939.0     | 57.89%            |                              |       |          |                         |
| Black           | 80.0%    | White           | Race           | 0.048018        | 4,322        | 13.56%          | 340.0   | 248.0     | 72.94%            | 7.96%                        | 0.902 | 0.001    | No                      |
| Asian           | 80.0%    | White           | Race           | 0.048018        | 4,322        | 13.56%          | 327.0   | 308.0     | 94.19%            | -13.29%                      | 1.164 | 0.000    | No                      |
| Native American | 80.0%    | White           | Race           | 0.048018        | 4,322        | 13.56%          | 20.0    | 14.0      | 70.00%            | 10.90%                       | 0.865 | 0.250    | No                      |
| White           | 80.0%    |                 | Race           | 0.048018        | 4,322        | 13.56%          | 3,623.0 | 2,931.0   | 80.90%            |                              |       |          |                         |
| Hispanic        | 80.0%    | Non-Hispanic    | Ethnicity      | 0.048018        | 4,316        | 13.68%          | 508.0   | 364.0     | 71.65%            | 10.83%                       | 0.869 | 0.000    | No                      |
| Non-Hispanic    | 80.0%    |                 | Ethnicity      | 0.048018        | 4,316        | 13.68%          | 3,808.0 | 3,141.0   | 82.48%            |                              |       |          |                         |
| Female          | 80.0%    | Male            | Sex            | 0.048018        | 2,656        | 46.88%          | 1,034.0 | 765.0     | 73.98%            | 4.44%                        | 0.943 | 0.010    | No                      |
| Male            | 80.0%    |                 | Sex            | 0.048018        | 2,656        | 46.88%          | 1,622.0 | 1,272.0   | 78.42%            |                              |       |          |                         |
| Black           | 90.0%    | White           | Race           | 0.048694        | 4,322        | 13.56%          | 340.0   | 288.0     | 84.71%            | 5.41%                        | 0.940 | 0.003    | No                      |
| Asian           | 90.0%    | White           | Race           | 0.048694        | 4,322        | 13.56%          | 327.0   | 321.0     | 98.17%            | -8.05%                       | 1.089 | 0.000    | No                      |
| Native American | 90.0%    | White           | Race           | 0.048694        | 4,322        | 13.56%          | 20.0    | 17.0      | 85.00%            | 5.12%                        | 0.943 | 0.441    | No                      |
| White           | 90.0%    |                 | Race           | 0.048694        | 4,322        | 13.56%          | 3,623.0 | 3,265.0   | 90.12%            |                              |       |          |                         |
| Hispanic        | 90.0%    | Non-Hispanic    | Ethnicity      | 0.048694        | 4,316        | 13.68%          | 508.0   | 428.0     | 84.25%            | 6.85%                        | 0.925 | 0.000    | No                      |
| Non-Hispanic    | 90.0%    |                 | Ethnicity      | 0.048694        | 4,316        | 13.68%          | 3,808.0 | 3,469.0   | 91.10%            |                              |       |          |                         |
| Female          | 90.0%    | Male            | Sex            | 0.048694        | 2,656        | 46.88%          | 1,034.0 | 887.0     | 85.78%            | 4.04%                        | 0.955 | 0.002    | No                      |
| Male            | 90.0%    |                 | Sex            | 0.048694        | 2,656        | 46.88%          | 1,622.0 | 1,457.0   | 89.83%            |                              |       |          |                         |
| Black           | 100.0%   | White           | Race           | 0.058530        | 4,322        | 13.56%          | 340.0   | 340.0     | 100.00%           | 0.00%                        | 1.000 | 1.000    | No                      |
| Asian           | 100.0%   | White           | Race           | 0.058530        | 4,322        | 13.56%          | 327.0   | 327.0     | 100.00%           | 0.00%                        | 1.000 | 1.000    | No                      |
| Native American | 100.0%   | White           | Race           | 0.058530        | 4,322        | 13.56%          | 20.0    | 20.0      | 100.00%           | 0.00%                        | 1.000 | 1.000    | No                      |
| White           | 100.0%   |                 | Race           | 0.058530        | 4,322        | 13.56%          | 3,623.0 | 3,623.0   | 100.00%           |                              |       |          |                         |
| Hispanic        | 100.0%   | Non-Hispanic    | Ethnicity      | 0.058530        | 4,316        | 13.68%          | 508.0   | 508.0     | 100.00%           | 0.00%                        | 1.000 | 1.000    | No                      |
| Non-Hispanic    | 100.0%   |                 | Ethnicity      | 0.058530        | 4,316        | 13.68%          | 3,808.0 | 3,808.0   | 100.00%           |                              |       |          |                         |
| Female          | 100.0%   | Male            | Sex            | 0.058530        | 2,656        | 46.88%          | 1,034.0 | 1,034.0   | 100.00%           | 0.00%                        | 1.000 | 1.000    | No                      |
| Male            | 100.0%   |                 | Sex            | 0.058530        | 2,656        | 46.88%          | 1,622.0 | 1,622.0   | 100.00%           |                              |       |          |                         |
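To illustrate what the by-quantile table represents, here is a minimal sketch that recomputes an AIR at each score cutoff, treating scores at or below the cutoff as favorable (matching Lower Score Favorable: True). It is an illustration under stated assumptions, not the library's implementation: the function name air_by_cutoff is hypothetical, the scores are toy values rather than HMDA data, and the cutoffs are passed in directly instead of being derived from quantiles of the predictions.

```python
def air_by_cutoff(protected_scores, reference_scores, cutoffs):
    """Return {cutoff: AIR}, where a score <= cutoff counts as favorable."""
    results = {}
    for cutoff in cutoffs:
        protected_rate = sum(s <= cutoff for s in protected_scores) / len(protected_scores)
        reference_rate = sum(s <= cutoff for s in reference_scores) / len(reference_scores)
        # AIR is undefined when no reference observation is favorable.
        results[cutoff] = protected_rate / reference_rate if reference_rate else float("nan")
    return results


# Toy scores, for illustration only.
protected = [0.2, 0.4, 0.6, 0.8]
reference = [0.1, 0.3, 0.5, 0.7]

results = air_by_cutoff(protected, reference, cutoffs=[0.3, 0.5, 0.8])
for cutoff, air_value in results.items():
    print(f"cutoff={cutoff}: AIR={air_value:.3f}")
```

At the highest cutoff every observation is favorable, so the AIR is 1.0, which is the pattern in the 100.0% rows of the table. Likewise, the 50.0% rows repeat the single-level AIR results because binary_outcome was defined with the same median cutoff.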

The .plot() method on the result object returns the plotly figure directly.

type(airq.plot())
plotly.graph_objs._figure.Figure

The .plot() method also takes an optional column argument, just like a single-level plot.

airq.plot(column=sd.const.PERCENT_DIFFERENCE_FAVORABLE)

Pass the group argument to plot the levels for a single group only.

airq.plot(group="Black")
airq.plot(group="Black", column=sd.const.PERCENT_DIFFERENCE_FAVORABLE)

.plot() also has a quantile argument to return a figure for a single quantile. The quantile argument is specific to AIR by quantile. For example, the equivalent argument for a categorical AIR calculation would be category.

airq.plot(quantile=0.1)
airq.plot(quantile=0.5)

Another argument exposed by multi-level plots is separate. It splits a single plotly figure containing multiple subplots into a list of separate plotly figures, one per level. It is a convenience argument equivalent to calling .plot() with the quantile argument for every quantile.

airq_figures = airq.plot(separate=True)
type(airq_figures)
list
airq_figures[0]
airq_figures[4]
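The equivalence between separate=True and per-quantile calls can be sketched as follows. The helper plot_each_quantile is hypothetical and not part of solas_disparity; it relies only on the quantile argument described earlier, and assumes the quantiles passed in are ones that appear in the result's summary table.

```python
def plot_each_quantile(result, quantiles):
    """Return one figure per quantile by calling result.plot(quantile=...).

    Hypothetical helper: `result` is any object whose plot() method accepts
    a `quantile` keyword, such as the AIR-by-quantile result.
    """
    return [result.plot(quantile=quantile) for quantile in quantiles]


# Usage with the result computed earlier (left commented so the sketch runs
# without solas_disparity installed):
# figures = plot_each_quantile(airq, [0.1, 0.5])
# figures[1]  # expected to show the same quantile as airq.plot(quantile=0.5)
```

In practice separate=True is the shorter route; the explicit loop is mainly useful when only a subset of levels is needed.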

As with any other plot, the full documentation and typing support can be found in the solas_disparity.plots namespace.

sd.plots.plot_adverse_impact_ratio_by_quantile
<cyfunction plot_adverse_impact_ratio_by_quantile at 0x7f25fa61c860>

More Plot Functionality#

Since the figures returned by plot functions are plotly figures, reference the plotly documentation for more functionality. https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html#plotly.graph_objects.Figure

Use the update_layout method to modify overall attributes of the figure, including its height and width. https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html#plotly.graph_objects.Figure.update_layout

air.plot().update_layout(height=500, width=500)

Plots can be saved as images using the write_image method. Here’s an example saving a plot as an svg file. https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html#plotly.graph_objects.Figure.write_image

air.plot().write_image("air.svg")

Or as a png…

air.plot().write_image("air.png")

The size of a plot can also be set when saving it to an image, without affecting the original figure object.

air.plot().write_image("air_resized.svg", height=800, width=1100)

Clean up files.

from pathlib import Path

to_clean = ["air.svg", "air_resized.svg", "air.png"]
for name in to_clean:
    if Path(name).exists():
        Path(name).unlink()