SolasAI Disparity Plots
import solas_disparity as sd
import pandas as pd
Certain notebook environments have limited rendering functionality. Uncomment this cell as a potential workaround if plots are not displaying.
# import plotly.io as pio
# pio.renderers.default = "png"
It’s preferable to explicitly and specifically handle warnings. For the purposes of this notebook, we will filter out all warnings.
from warnings import simplefilter
simplefilter("ignore")
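As a rough illustration of the more specific handling mentioned above, the sketch below silences one warning category while leaving the others visible. It uses only the standard library `warnings` module; `noisy_step` is a hypothetical stand-in for any call that emits warnings, not part of SolasAI.

```python
import warnings


def noisy_step():
    """Hypothetical function that emits two different kinds of warnings."""
    warnings.warn("old API", DeprecationWarning)
    warnings.warn("check your data", UserWarning)
    return 42


# catch_warnings restores the previous filters on exit, so the change is local.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                      # record everything by default
    warnings.simplefilter("ignore", DeprecationWarning)  # then silence one category only
    result = noisy_step()

# Only the UserWarning survives the targeted filter.
remaining = [str(w.message) for w in caught]
print(remaining)
```

Scoping the filter with a context manager keeps unrelated warnings visible in the rest of the notebook.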
Some predictions have already been created using a tree model run on an HMDA dataset.
label = "Interest Rate"
data = pd.read_parquet("hmda_test.parquet")
Store commonly reused function arguments.
protected_groups = ["Black", "Asian", "Native American", "Hispanic", "Female"]
reference_groups = ["White", "White", "White", "Non-Hispanic", "Male"]
groups = sd.pgrg_ordered(
    protected_groups=protected_groups,
    reference_groups=reference_groups,
)
reused_arguments = dict(
    group_data=data[groups],
    protected_groups=protected_groups,
    reference_groups=reference_groups,
    group_categories=["Race", "Race", "Race", "Ethnicity", "Sex"],
    sample_weight=None,
)
binary_outcome = data["Prediction"] <= data["Prediction"].quantile(0.5)
binary_label = data[label] <= data[label].quantile(0.5)
Single-Level Plots#
Certain disparity functions provide a result for each group. Their associated plots are single figures and are referred to as single-level plots.
Calculate Disparity#
Let’s use a result from the AIR function as an example for single-level plots.
air = sd.adverse_impact_ratio(
    outcome=binary_outcome,
    air_threshold=0.8,
    percent_difference_threshold=0.0,
    **reused_arguments,
)
Output Results#
The default output for a disparity calculation result object includes a default plot.
air
Disparity Calculation: Adverse Impact Ratio
┌──────────────────────────────┬─────────────────────────────────────────────────┐
│ Protected Groups             │ Black, Asian, Native American, Hispanic, Female │
│ Reference Groups             │ White, White, White, Non-Hispanic, Male         │
│ Group Categories             │ Race, Race, Race, Ethnicity, Sex                │
│ AIR Threshold                │ 0.8                                             │
│ Percent Difference Threshold │ 0.0                                             │
│ Shortfall Method             │ to_reference_mean                               │
│ Affected Groups              │ Hispanic                                        │
│ Affected Reference           │ Non-Hispanic                                    │
│ Affected Categories          │ Ethnicity                                       │
└──────────────────────────────┴─────────────────────────────────────────────────┘
Adverse Impact Ratio Summary Table
* Percent Missing: Ethnicity: 13.68%, Race: 13.56%, Sex: 46.88%
Group | Reference Group | Group Category | Total | Favorable | Percent Favorable | Percent Difference Favorable | AIR | P-Values | Practically Significant | Shortfall |
---|---|---|---|---|---|---|---|---|---|---|
Black | White | Race | 340.0 | 141.0 | 41.47% | 9.70% | 0.810 | 0.001 | No | |
Asian | White | Race | 327.0 | 243.0 | 74.31% | -23.14% | 1.452 | 0.000 | No | |
Native American | White | Race | 20.0 | 9.0 | 45.00% | 6.17% | 0.879 | 0.657 | No | |
White | | Race | 3,623.0 | 1,854.0 | 51.17% | | | | | |
Hispanic | Non-Hispanic | Ethnicity | 508.0 | 167.0 | 32.87% | 21.54% | 0.604 | 0.000 | Yes | 109.4 |
Non-Hispanic | | Ethnicity | 3,808.0 | 2,072.0 | 54.41% | | | | | |
Female | Male | Sex | 1,034.0 | 414.0 | 40.04% | 9.78% | 0.804 | 0.000 | No | |
Male | | Sex | 1,622.0 | 808.0 | 49.82% | | | | | |
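The AIR, percent difference, and shortfall reported for the Hispanic row can be reproduced by hand from the Total and Favorable counts. The sketch below is plain Python rather than SolasAI code: the helper names are hypothetical, and the shortfall formula is an assumed reading of the to_reference_mean method (the additional favorable outcomes the protected group would need to match the reference group's favorable rate) that agrees with the reported 109.4.

```python
def favorable_rate(favorable, total):
    """Share of a group that received the favorable outcome."""
    return favorable / total


def adverse_impact_ratio(prot_fav, prot_total, ref_fav, ref_total):
    """Protected group's favorable rate divided by the reference group's."""
    return favorable_rate(prot_fav, prot_total) / favorable_rate(ref_fav, ref_total)


def shortfall_to_reference_mean(prot_fav, prot_total, ref_fav, ref_total):
    """Assumed formula: favorable outcomes missing relative to the reference rate."""
    return prot_total * favorable_rate(ref_fav, ref_total) - prot_fav


# Counts copied from the Hispanic and Non-Hispanic rows of the summary table.
hispanic_fav, hispanic_total = 167, 508
non_hispanic_fav, non_hispanic_total = 2072, 3808

air_value = adverse_impact_ratio(
    hispanic_fav, hispanic_total, non_hispanic_fav, non_hispanic_total
)
percent_difference = (
    favorable_rate(non_hispanic_fav, non_hispanic_total)
    - favorable_rate(hispanic_fav, hispanic_total)
)
shortfall = shortfall_to_reference_mean(
    hispanic_fav, hispanic_total, non_hispanic_fav, non_hispanic_total
)

print(round(air_value, 3))                 # 0.604, the AIR column
print(round(percent_difference * 100, 2))  # 21.54, the Percent Difference Favorable column
print(round(shortfall, 1))                 # 109.4, the Shortfall column
```

The same arithmetic gives 0.810 for the Black row and 0.804 for the Female row, both just above the 0.8 AIR threshold.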
The .plot() method on the result object returns the plotly figure directly.
figure = air.plot()
type(figure)
plotly.graph_objs._figure.Figure
figure
In the case of AIR, the plot function also takes a column argument, which specifies a different column of the summary table to plot.
air.plot(column=sd.const.TOTAL)
The .plot() method is simply a convenience wrapper for the associated plot function in the solas_disparity.plots namespace. For further information, refer to this plot function in the rendered documentation. For stronger linting support, you can optionally call this function directly.
sd.plots.plot_adverse_impact_ratio(disparity=air)
Multi-Level Plots#
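Certain other disparity functions provide a result for each secondary level (for example, each quantile) within each group. Their plots contain one subplot per level and are referred to as multi-level plots.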
Calculate Disparity#
Use AIR by quantile as an example for multi-level plots.
airq = sd.adverse_impact_ratio_by_quantile(
    outcome=data["Prediction"],
    air_threshold=0.8,
    percent_difference_threshold=0.0,
    quantiles=[decile / 10 for decile in range(1, 11)],
    **reused_arguments,
)
Output Results#
The default output for a disparity calculation result object includes a default plot. Note that a new subplot is created for each quantile.
airq
Disparity Calculation: Adverse Impact Ratio By Quantile
┌──────────────────────────────┬─────────────────────────────────────────────────┐
│ Protected Groups             │ Black, Asian, Native American, Hispanic, Female │
│ Reference Groups             │ White, White, White, Non-Hispanic, Male         │
│ Group Categories             │ Race, Race, Race, Ethnicity, Sex                │
│ AIR Threshold                │ 0.8                                             │
│ Percent Difference Threshold │ 0.0                                             │
│ Lower Score Favorable        │ True                                            │
│ Affected Groups              │ Black, Hispanic, Female                         │
│ Affected Reference           │ White, Non-Hispanic, Male                       │
│ Affected Categories          │ Race, Ethnicity, Sex                            │
└──────────────────────────────┴─────────────────────────────────────────────────┘
Adverse Impact Ratio By Quantile Summary Table
Group | Quantile | Reference Group | Group Category | Quantile Cutoff | Observations | Percent Missing | Total | Favorable | Percent Favorable | Percent Difference Favorable | AIR | P-Values | Practically Significant |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Black | 10.0% | White | Race | 0.044761 | 4,322 | 13.56% | 340.0 | 13.0 | 3.82% | 4.93% | 0.437 | 0.001 | Yes |
Asian | 10.0% | White | Race | 0.044761 | 4,322 | 13.56% | 327.0 | 91.0 | 27.83% | -19.08% | 3.181 | 0.000 | No |
Native American | 10.0% | White | Race | 0.044761 | 4,322 | 13.56% | 20.0 | 1.0 | 5.00% | 3.75% | 0.571 | 1.000 | No |
White | 10.0% | | Race | 0.044761 | 4,322 | 13.56% | 3,623.0 | 317.0 | 8.75% | | | | |
Hispanic | 10.0% | Non-Hispanic | Ethnicity | 0.044761 | 4,316 | 13.68% | 508.0 | 15.0 | 2.95% | 7.79% | 0.275 | 0.000 | Yes |
Non-Hispanic | 10.0% | | Ethnicity | 0.044761 | 4,316 | 13.68% | 3,808.0 | 409.0 | 10.74% | | | | |
Female | 10.0% | Male | Sex | 0.044761 | 2,656 | 46.88% | 1,034.0 | 66.0 | 6.38% | 3.05% | 0.677 | 0.006 | Yes |
Male | 10.0% | | Sex | 0.044761 | 2,656 | 46.88% | 1,622.0 | 153.0 | 9.43% | | | | |
Black | 20.0% | White | Race | 0.045863 | 4,322 | 13.56% | 340.0 | 37.0 | 10.88% | 9.85% | 0.525 | 0.000 | Yes |
Asian | 20.0% | White | Race | 0.045863 | 4,322 | 13.56% | 327.0 | 132.0 | 40.37% | -19.64% | 1.947 | 0.000 | No |
Native American | 20.0% | White | Race | 0.045863 | 4,322 | 13.56% | 20.0 | 2.0 | 10.00% | 10.73% | 0.482 | 0.403 | No |
White | 20.0% | | Race | 0.045863 | 4,322 | 13.56% | 3,623.0 | 751.0 | 20.73% | | | | |
Hispanic | 20.0% | Non-Hispanic | Ethnicity | 0.045863 | 4,316 | 13.68% | 508.0 | 42.0 | 8.27% | 14.95% | 0.356 | 0.000 | Yes |
Non-Hispanic | 20.0% | | Ethnicity | 0.045863 | 4,316 | 13.68% | 3,808.0 | 884.0 | 23.21% | | | | |
Female | 20.0% | Male | Sex | 0.045863 | 2,656 | 46.88% | 1,034.0 | 155.0 | 14.99% | 4.92% | 0.753 | 0.002 | Yes |
Male | 20.0% | | Sex | 0.045863 | 2,656 | 46.88% | 1,622.0 | 323.0 | 19.91% | | | | |
Black | 30.0% | White | Race | 0.046427 | 4,322 | 13.56% | 340.0 | 62.0 | 18.24% | 11.30% | 0.617 | 0.000 | Yes |
Asian | 30.0% | White | Race | 0.046427 | 4,322 | 13.56% | 327.0 | 175.0 | 53.52% | -23.98% | 1.812 | 0.000 | No |
Native American | 30.0% | White | Race | 0.046427 | 4,322 | 13.56% | 20.0 | 4.0 | 20.00% | 9.53% | 0.677 | 0.464 | No |
White | 30.0% | | Race | 0.046427 | 4,322 | 13.56% | 3,623.0 | 1,070.0 | 29.53% | | | | |
Hispanic | 30.0% | Non-Hispanic | Ethnicity | 0.046427 | 4,316 | 13.68% | 508.0 | 69.0 | 13.58% | 19.14% | 0.415 | 0.000 | Yes |
Non-Hispanic | 30.0% | | Ethnicity | 0.046427 | 4,316 | 13.68% | 3,808.0 | 1,246.0 | 32.72% | | | | |
Female | 30.0% | Male | Sex | 0.046427 | 2,656 | 46.88% | 1,034.0 | 225.0 | 21.76% | 5.74% | 0.791 | 0.001 | Yes |
Male | 30.0% | | Sex | 0.046427 | 2,656 | 46.88% | 1,622.0 | 446.0 | 27.50% | | | | |
Black | 40.0% | White | Race | 0.046703 | 4,322 | 13.56% | 340.0 | 103.0 | 30.29% | 16.38% | 0.649 | 0.000 | Yes |
Asian | 40.0% | White | Race | 0.046703 | 4,322 | 13.56% | 327.0 | 238.0 | 72.78% | -26.11% | 1.559 | 0.000 | No |
Native American | 40.0% | White | Race | 0.046703 | 4,322 | 13.56% | 20.0 | 8.0 | 40.00% | 6.67% | 0.857 | 0.656 | No |
White | 40.0% | | Race | 0.046703 | 4,322 | 13.56% | 3,623.0 | 1,691.0 | 46.67% | | | | |
Hispanic | 40.0% | Non-Hispanic | Ethnicity | 0.046703 | 4,316 | 13.68% | 508.0 | 139.0 | 27.36% | 22.30% | 0.551 | 0.000 | Yes |
Non-Hispanic | 40.0% | | Ethnicity | 0.046703 | 4,316 | 13.68% | 3,808.0 | 1,891.0 | 49.66% | | | | |
Female | 40.0% | Male | Sex | 0.046703 | 2,656 | 46.88% | 1,034.0 | 380.0 | 36.75% | 7.27% | 0.835 | 0.000 | No |
Male | 40.0% | | Sex | 0.046703 | 2,656 | 46.88% | 1,622.0 | 714.0 | 44.02% | | | | |
Black | 50.0% | White | Race | 0.047009 | 4,322 | 13.56% | 340.0 | 141.0 | 41.47% | 9.70% | 0.810 | 0.001 | No |
Asian | 50.0% | White | Race | 0.047009 | 4,322 | 13.56% | 327.0 | 243.0 | 74.31% | -23.14% | 1.452 | 0.000 | No |
Native American | 50.0% | White | Race | 0.047009 | 4,322 | 13.56% | 20.0 | 9.0 | 45.00% | 6.17% | 0.879 | 0.657 | No |
White | 50.0% | | Race | 0.047009 | 4,322 | 13.56% | 3,623.0 | 1,854.0 | 51.17% | | | | |
Hispanic | 50.0% | Non-Hispanic | Ethnicity | 0.047009 | 4,316 | 13.68% | 508.0 | 167.0 | 32.87% | 21.54% | 0.604 | 0.000 | Yes |
Non-Hispanic | 50.0% | | Ethnicity | 0.047009 | 4,316 | 13.68% | 3,808.0 | 2,072.0 | 54.41% | | | | |
Female | 50.0% | Male | Sex | 0.047009 | 2,656 | 46.88% | 1,034.0 | 414.0 | 40.04% | 9.78% | 0.804 | 0.000 | No |
Male | 50.0% | | Sex | 0.047009 | 2,656 | 46.88% | 1,622.0 | 808.0 | 49.82% | | | | |
Black | 60.0% | White | Race | 0.047266 | 4,322 | 13.56% | 340.0 | 161.0 | 47.35% | 13.62% | 0.777 | 0.000 | Yes |
Asian | 60.0% | White | Race | 0.047266 | 4,322 | 13.56% | 327.0 | 260.0 | 79.51% | -18.54% | 1.304 | 0.000 | No |
Native American | 60.0% | White | Race | 0.047266 | 4,322 | 13.56% | 20.0 | 11.0 | 55.00% | 5.97% | 0.902 | 0.648 | No |
White | 60.0% | | Race | 0.047266 | 4,322 | 13.56% | 3,623.0 | 2,209.0 | 60.97% | | | | |
Hispanic | 60.0% | Non-Hispanic | Ethnicity | 0.047266 | 4,316 | 13.68% | 508.0 | 214.0 | 42.13% | 21.53% | 0.662 | 0.000 | Yes |
Non-Hispanic | 60.0% | | Ethnicity | 0.047266 | 4,316 | 13.68% | 3,808.0 | 2,424.0 | 63.66% | | | | |
Female | 60.0% | Male | Sex | 0.047266 | 2,656 | 46.88% | 1,034.0 | 520.0 | 50.29% | 7.60% | 0.869 | 0.000 | No |
Male | 60.0% | | Sex | 0.047266 | 2,656 | 46.88% | 1,622.0 | 939.0 | 57.89% | | | | |
Black | 80.0% | White | Race | 0.048018 | 4,322 | 13.56% | 340.0 | 248.0 | 72.94% | 7.96% | 0.902 | 0.001 | No |
Asian | 80.0% | White | Race | 0.048018 | 4,322 | 13.56% | 327.0 | 308.0 | 94.19% | -13.29% | 1.164 | 0.000 | No |
Native American | 80.0% | White | Race | 0.048018 | 4,322 | 13.56% | 20.0 | 14.0 | 70.00% | 10.90% | 0.865 | 0.250 | No |
White | 80.0% | | Race | 0.048018 | 4,322 | 13.56% | 3,623.0 | 2,931.0 | 80.90% | | | | |
Hispanic | 80.0% | Non-Hispanic | Ethnicity | 0.048018 | 4,316 | 13.68% | 508.0 | 364.0 | 71.65% | 10.83% | 0.869 | 0.000 | No |
Non-Hispanic | 80.0% | | Ethnicity | 0.048018 | 4,316 | 13.68% | 3,808.0 | 3,141.0 | 82.48% | | | | |
Female | 80.0% | Male | Sex | 0.048018 | 2,656 | 46.88% | 1,034.0 | 765.0 | 73.98% | 4.44% | 0.943 | 0.010 | No |
Male | 80.0% | | Sex | 0.048018 | 2,656 | 46.88% | 1,622.0 | 1,272.0 | 78.42% | | | | |
Black | 90.0% | White | Race | 0.048694 | 4,322 | 13.56% | 340.0 | 288.0 | 84.71% | 5.41% | 0.940 | 0.003 | No |
Asian | 90.0% | White | Race | 0.048694 | 4,322 | 13.56% | 327.0 | 321.0 | 98.17% | -8.05% | 1.089 | 0.000 | No |
Native American | 90.0% | White | Race | 0.048694 | 4,322 | 13.56% | 20.0 | 17.0 | 85.00% | 5.12% | 0.943 | 0.441 | No |
White | 90.0% | | Race | 0.048694 | 4,322 | 13.56% | 3,623.0 | 3,265.0 | 90.12% | | | | |
Hispanic | 90.0% | Non-Hispanic | Ethnicity | 0.048694 | 4,316 | 13.68% | 508.0 | 428.0 | 84.25% | 6.85% | 0.925 | 0.000 | No |
Non-Hispanic | 90.0% | | Ethnicity | 0.048694 | 4,316 | 13.68% | 3,808.0 | 3,469.0 | 91.10% | | | | |
Female | 90.0% | Male | Sex | 0.048694 | 2,656 | 46.88% | 1,034.0 | 887.0 | 85.78% | 4.04% | 0.955 | 0.002 | No |
Male | 90.0% | | Sex | 0.048694 | 2,656 | 46.88% | 1,622.0 | 1,457.0 | 89.83% | | | | |
Black | 100.0% | White | Race | 0.058530 | 4,322 | 13.56% | 340.0 | 340.0 | 100.00% | 0.00% | 1.000 | 1.000 | No |
Asian | 100.0% | White | Race | 0.058530 | 4,322 | 13.56% | 327.0 | 327.0 | 100.00% | 0.00% | 1.000 | 1.000 | No |
Native American | 100.0% | White | Race | 0.058530 | 4,322 | 13.56% | 20.0 | 20.0 | 100.00% | 0.00% | 1.000 | 1.000 | No |
White | 100.0% | | Race | 0.058530 | 4,322 | 13.56% | 3,623.0 | 3,623.0 | 100.00% | | | | |
Hispanic | 100.0% | Non-Hispanic | Ethnicity | 0.058530 | 4,316 | 13.68% | 508.0 | 508.0 | 100.00% | 0.00% | 1.000 | 1.000 | No |
Non-Hispanic | 100.0% | | Ethnicity | 0.058530 | 4,316 | 13.68% | 3,808.0 | 3,808.0 | 100.00% | | | | |
Female | 100.0% | Male | Sex | 0.058530 | 2,656 | 46.88% | 1,034.0 | 1,034.0 | 100.00% | 0.00% | 1.000 | 1.000 | No |
Male | 100.0% | | Sex | 0.058530 | 2,656 | 46.88% | 1,622.0 | 1,622.0 | 100.00% | | | | |
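The 50% rows of this table repeat the single-level AIR results, because that binary outcome was defined as a prediction at or below the median. A minimal sketch of the by-quantile idea follows, assuming that a score at or below the quantile cutoff counts as favorable (consistent with Lower Score Favorable being True above). It is not SolasAI's implementation: the helper names and toy data are hypothetical, and it uses a nearest-rank quantile instead of the interpolated quantile pandas computes by default.

```python
import math


def nearest_rank_quantile(values, q):
    """Smallest value with at least a fraction q of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def air_at_quantile(scores, groups, protected, reference, q):
    """AIR when 'favorable' means a score at or below the q-th quantile cutoff."""
    cutoff = nearest_rank_quantile(scores, q)

    def favorable_rate(group):
        member_scores = [s for s, g in zip(scores, groups) if g == group]
        return sum(s <= cutoff for s in member_scores) / len(member_scores)

    return favorable_rate(protected) / favorable_rate(reference)


# Hypothetical toy data: ten scores, five per group.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
groups = ["ref", "ref", "prot", "ref", "prot", "ref", "prot", "ref", "prot", "prot"]

air_median = air_at_quantile(scores, groups, "prot", "ref", 0.5)
air_full = air_at_quantile(scores, groups, "prot", "ref", 1.0)

print(round(air_median, 3))  # 0.667: 2 of 5 protected vs 3 of 5 reference are favorable
print(air_full)              # 1.0: at the 100% quantile every observation is favorable
```

The second result mirrors the 100.0% rows above, where every group is fully favorable and each AIR is 1.000.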
The .plot() method on the result object returns the plotly figure directly.
type(airq.plot())
plotly.graph_objs._figure.Figure
The .plot() method also takes an optional column argument, just like a single-level plot.
airq.plot(column=sd.const.PERCENT_DIFFERENCE_FAVORABLE)
You can also pass a single group to get the by-level plot for that group alone.
airq.plot(group="Black")
airq.plot(group="Black", column=sd.const.PERCENT_DIFFERENCE_FAVORABLE)
.plot() also has a quantile argument that returns a figure for a single quantile. The quantile argument is specific to AIR by quantile. For example, the equivalent argument for a categorical AIR calculation would be category.
airq.plot(quantile=0.1)
airq.plot(quantile=0.5)
Another argument exposed by multi-level plots is separate. It splits a single plotly figure containing multiple subplots into a list of separate plotly figures, one per level. It is a convenience argument equivalent to calling .plot() with the quantile argument for every quantile.
airq_figures = airq.plot(separate=True)
type(airq_figures)
list
airq_figures[0]
airq_figures[4]
As with any other plot, the full documentation and typing support can be found in the solas_disparity.plots namespace.
sd.plots.plot_adverse_impact_ratio_by_quantile
<cyfunction plot_adverse_impact_ratio_by_quantile at 0x7f25fa61c860>
More Plot Functionality#
Since the figures returned by plot functions are plotly figures, reference the plotly documentation for more functionality. https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html#plotly.graph_objects.Figure
Use the update_layout method to modify overall attributes of the figure, including its height and width.
https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html#plotly.graph_objects.Figure.update_layout
air.plot().update_layout(height=500, width=500)
Plots can be saved as images using the write_image method. Here’s an example saving a plot as an SVG file.
https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html#plotly.graph_objects.Figure.write_image
air.plot().write_image("air.svg")
Or as a PNG file:
air.plot().write_image("air.png")
The size of the plot when it is saved to an image can also be controlled without affecting the original figure object.
air.plot().write_image("air_resized.svg", height=800, width=1100)
Clean up files.
from pathlib import Path
to_clean = ["air.svg", "air_resized.svg", "air.png"]
for name in to_clean:
    if Path(name).exists():
        Path(name).unlink()
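One alternative to cleaning up by hand is to write images into a temporary directory, which is deleted when its context exits. The sketch below uses only the standard library and writes a placeholder SVG string where a real notebook would call the figure's write_image method; the file name is illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    target = Path(scratch) / "air.svg"
    # Stand-in for figure.write_image(str(target)); plain text keeps the sketch dependency-free.
    target.write_text("<svg xmlns='http://www.w3.org/2000/svg'/>")
    existed_inside = target.exists()

# The directory and everything in it are removed once the block exits.
existed_after = target.exists()
print(existed_inside, existed_after)
```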