How do I get the partial dependence values from the PDP plot in Python?

I have trained the following XGBClassifier:

import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.inspection import partial_dependence
from matplotlib import pyplot as plt
from pdpbox import pdp, get_dataset, info_plots

model = XGBClassifier(
    objective='binary:logistic',
    base_score=0.5, 
    booster='gbtree', 
    colsample_bylevel=1,
    colsample_bynode=1, 
    colsample_bytree=0.8,
    enable_categorical=False, 
    gamma=0.5, 
    gpu_id=-1,
    importance_type=None, 
    learning_rate=0.2, 
    max_delta_step=0,
    max_depth=6,
    min_child_weight=3, 
    n_estimators=8, 
    n_jobs=1, 
    nthread=1, 
    num_parallel_tree=1,
    predictor='auto',
    random_state=0, 
    reg_alpha=0, 
    reg_lambda=1,
    scale_pos_weight=1, 
    silent=True, 
    subsample=1,
    tree_method='exact',
    validate_parameters=1, 
    pred_contribs=True,  
    verbosity=None)


model.fit(X, Y)

where X is the dataframe containing the predictive features and Y is the binary target.

Here is the list of predictive features:

col_list = ['avgBalanceLast90Days','rechargeAmtLAst90DaysMainAcct','medianAmtRechargeLast30Days','networkAge','numRechargesLast90DaysMainAcct','avgDailySpendLast90Days','avgDailySpendLast30Days','numLoansLast90Days','numRechargesLast30DaysMainAcct','amtLoanLast30Days','avgBalanceLast30Days','amtLoanLast90Days','amtLastRechargeLast30Days','numLoanLast30Days','medianBalanceBeforeRechargeLast20Days','medianBalanceBeforeRechargeLast90DaysUserLevel']

I then plot a Partial Dependence Plot for the first feature:

    pdp_goals = pdp.pdp_isolate(model=model, dataset=X, model_features=col_list, feature=col_list[0])
    print(pdp_goals)
    # plot it
    pdp.pdp_plot(pdp_goals, col_list[0])
    plt.show()

The plot looks like this:

[Image: partial dependence plot for avgBalanceLast90Days]

I was curious to see the partial dependence value for each record, so I wrote this:

y = partial_dependence(model, features=[0], X=X, percentiles=(0, 1), grid_resolution=2)

print(y)

And the function seems to return the PD values for the minimum and maximum values of the predictive feature:

(array([[0.8010992, 0.8048475]], dtype=float32), [array([-12912.4 , 200148.11])])
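
For context, here is a minimal sketch of how a brute-force partial dependence value is commonly computed: for each grid value, the chosen column is overwritten in every row and the predictions are averaged. The names manual_partial_dependence, toy_predict and X_toy are hypothetical stand-ins, not part of scikit-learn, pdpbox or my model. The toy model only depends on NumPy so the snippet runs on its own. With percentiles=(0, 1) and grid_resolution=2, I assume the grid reduces to the feature's minimum and maximum, which matches the two values printed above.

```python
import numpy as np

def manual_partial_dependence(predict_fn, X, feature_idx, grid):
    """Brute-force PD: overwrite one column with each grid value, then average the predictions."""
    X = np.asarray(X, dtype=float)
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value      # every row gets the same feature value
        averaged.append(predict_fn(X_mod).mean())
    return np.asarray(averaged)

# Hypothetical stand-in for something like model.predict_proba(X)[:, 1]
def toy_predict(X):
    return 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - X[:, 1])))

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))

# Two grid points: the minimum and maximum of feature 0 (assumed grid for this setting)
grid = np.array([X_toy[:, 0].min(), X_toy[:, 0].max()])
pd_values = manual_partial_dependence(toy_predict, X_toy, 0, grid)

print(grid)       # x-axis values
print(pd_values)  # y-axis values (one averaged prediction per grid point)
```

Under these assumptions, passing a wrapper around the real model as predict_fn and a finer grid should give the x and y values of a PDP line, although the result may differ from a library's output if that library uses a different grid or response.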

I then tried to reproduce these results by iterating over every record in the dataframe X:

for i in range(len(X)):
    y = partial_dependence(model, features=[0], X=X[i:i+1], percentiles=(0, 1), grid_resolution=2)

    print(y)

What I get is one PD value per record, paired with that record's feature value:

(array([[0.7965402]], dtype=float32), [array([3691.26])])
(array([[0.7831624]], dtype=float32), [array([159.42])])
(array([[0.8669701]], dtype=float32), [array([380.13])])
(array([[0.8731023]], dtype=float32), [array([2287.5])])
(array([[0.700277]], dtype=float32), [array([612.96])])
(array([[0.8800472]], dtype=float32), [array([240.41])])
(array([[0.8826335]], dtype=float32), [array([208.8])])
(array([[0.67712134]], dtype=float32), [array([40.])])
(array([[0.84342635]], dtype=float32), [array([1660.96])])
(array([[0.74027264]], dtype=float32), [array([969.12])])
... and so on
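
A possible explanation for this output, sketched with a hypothetical toy model rather than my XGBClassifier: when the data passed in is a single row, the "average" runs over that one row only, so the result is simply that row's own prediction. Rows that share the same value in the chosen feature can still differ in the other features, and then they get different per-record values. The names toy_predict and X_toy are assumptions for illustration.

```python
import numpy as np

# Hypothetical model whose prediction depends on both columns
def toy_predict(X):
    return 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - X[:, 1])))

# Three records with the same value (0.0) in feature 0 but different values in feature 1
X_toy = np.array([[0.0, -1.0],
                  [0.0,  0.0],
                  [0.0,  2.0]])

# One-row subsets: averaging over a single row returns that row's prediction
per_record = np.array([toy_predict(X_toy[i:i + 1]).mean() for i in range(len(X_toy))])

# Full data: set feature 0 to 0.0 in every row, then average over all rows
X_mod = X_toy.copy()
X_mod[:, 0] = 0.0
pd_at_zero = toy_predict(X_mod).mean()

print(per_record)   # three different values for the same feature value
print(pd_at_zero)   # a single value: the mean over all rows
```

In this sketch the single full-data value equals the mean of the per-record values, because every row already has 0.0 in feature 0. In general the per-record values correspond to individual (ICE-style) predictions, and the partial dependence value at a grid point is their average once every row has been set to that grid value.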

I have taken a look at the Partial Dependence values for the case in which the feature value is 0:

[Image: partial dependence values for records where the feature value is 0]

Why are there multiple partial dependence values for a single feature value (in this case 0)? How is the plot above generated? Does the partial dependence plot take the mean of all partial dependence values for each feature value?

Is there a quick way in Python to get the x-axis and y-axis values from the plot shown above?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
