Greedy Algorithm to Determine the 3 Best Quantile Points for Quantile Regression

I ran a quantile regression on a multimodal dataset and generated a reasonable model with sufficient accuracy. However, I had to hand-pick the quantile points, and I need to replicate this process across multiple products.

The first image shows the quantiles I hand-picked. I'd like to automate this process so that the machine returns the three best points for me. I picked my quantiles from the table below, which shows the average Mean Absolute Error (MAE) of each quantile column within each 0.05 quantile/percentile band. I started from the leftmost column and gradually selected the quantile points that yielded the smallest average MAE.
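
One possible reading of this manual step is: for each observation band (row of the table), take the quantile column with the smallest average MAE. A minimal stdlib sketch, assuming the table is available as a list of rows (for example via `DataFrame.to_numpy().tolist()`) plus its column labels; `best_quantile_per_band` is a hypothetical helper and the numbers are made up:

```python
from typing import List, Sequence


def best_quantile_per_band(mae: Sequence[Sequence[float]],
                           labels: Sequence[str]) -> List[str]:
    """Return, for each band (row), the label of the column with the lowest MAE.

    Hypothetical helper: mae[r][c] is the average MAE of quantile column c
    within observation band r.
    """
    best = []
    for row in mae:
        # Position of the smallest MAE in this row (the first one wins on ties)
        col = min(range(len(row)), key=row.__getitem__)
        best.append(labels[col])
    return best


# Toy table: 3 bands x 3 quantile columns (made-up numbers)
table = [
    [1.0, 5.0, 9.0],
    [4.0, 2.0, 6.0],
    [8.0, 6.0, 1.0],
]
print(best_quantile_per_band(table, ["0.05", "0.50", "0.95"]))
```

On a real 20-band table this per-band rule can return more than three distinct labels, so on its own it does not settle which three to keep.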

I'm familiar with greedy algorithms that scan a single list and yield a maximum/minimum solution by slicing the list/array and tracking the first and last index. However, how can such an algorithm be applied in a multi-dimensional space like the table below?
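
A greedy algorithm can carry over to a 2-D table if the objective is defined over sets of columns rather than over a single list. A minimal sketch, assuming (an assumption, not something stated above) that a set of quantile columns is scored by letting each band use its best column in the set and averaging over the bands. Greedy forward selection then adds, one at a time, the column that lowers this score the most. `subset_cost` and `greedy_select` are hypothetical names:

```python
from typing import List, Sequence


def subset_cost(mae: Sequence[Sequence[float]], cols: Sequence[int]) -> float:
    """Assumed score: average over bands of the lowest MAE among the chosen columns."""
    return sum(min(row[c] for c in cols) for row in mae) / len(mae)


def greedy_select(mae: Sequence[Sequence[float]], k: int = 3) -> List[int]:
    """Greedy forward selection of k column indices (a heuristic, not guaranteed optimal)."""
    chosen: List[int] = []
    remaining = list(range(len(mae[0])))
    for _ in range(k):
        # Add the column whose inclusion gives the lowest score right now
        best = min(remaining, key=lambda c: subset_cost(mae, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


# Toy table (made-up): column 0 is uniformly mediocre,
# columns 1 and 2 are each excellent on half of the bands
table = [
    [3.0, 0.0, 10.0],
    [3.0, 0.0, 10.0],
    [3.0, 10.0, 0.0],
    [3.0, 10.0, 0.0],
]
picked = greedy_select(table, k=2)
print(picked, subset_cost(table, picked))
```

On this toy table greedy picks columns 0 and 1 with a score of 1.5, while columns 1 and 2 together score 0. Greedy commits to the best single column first, so it is a fast heuristic rather than an exact method.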

In summary, the goal is to find the three quantile points with the minimum average MAE to represent the observations. What approach can be used to solve this problem?
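
Because the table is small (19 quantile columns give C(19, 3) = 969 candidate triples), an exhaustive search is cheap and returns the exact optimum under the same assumed scoring as above (each band uses its best column in the triple, averaged over bands). A minimal sketch using `itertools.combinations`; `subset_cost` and `best_subset` are hypothetical names and the table values are made up:

```python
from itertools import combinations
from typing import Sequence, Tuple


def subset_cost(mae: Sequence[Sequence[float]], cols: Sequence[int]) -> float:
    """Assumed score: average over bands of the lowest MAE among the chosen columns."""
    return sum(min(row[c] for c in cols) for row in mae) / len(mae)


def best_subset(mae: Sequence[Sequence[float]],
                labels: Sequence[str], k: int = 3) -> Tuple[Tuple[str, ...], float]:
    """Score every k-column combination and return the cheapest labels and score."""
    best_cols = min(combinations(range(len(labels)), k),
                    key=lambda cols: subset_cost(mae, cols))
    return tuple(labels[c] for c in best_cols), subset_cost(mae, best_cols)


# Toy table (made-up): 4 bands x 4 quantile columns
labels = ["0.05", "0.25", "0.50", "0.95"]
table = [
    [1.0, 4.0, 6.0, 9.0],
    [5.0, 2.0, 3.0, 8.0],
    [7.0, 4.0, 2.0, 5.0],
    [9.0, 6.0, 4.0, 1.0],
]
print(best_subset(table, labels, k=3))
```

With the table built by the code below, the rows could be passed as `[list(r) for r in mae_aggregator]` and the labels as `list(df_mae.columns)`. The number of combinations grows quickly with k, which is where a greedy heuristic becomes more attractive.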

High-level overview of how I created the table below:

```python
import numpy as np
import pandas as pd
from statsmodels.regression.quantile_regression import QuantReg

# Setup Flow: quantile levels 0.05, 0.10, ..., 0.95 (19 levels)
quantiles = np.arange(0.05, 1.0, 0.05)
predictions = {}

# Run Regression
# general_linear_model_preprocessor, df_btb and df_set are defined elsewhere (not shown)
X_ = general_linear_model_preprocessor.fit(df_btb)
X_columns = X_.get_feature_names_out()
X = pd.DataFrame(X_.transform(df_set['product_train']), columns=X_columns, index=df_set['product_train'].index)
# The test design matrix does not depend on the quantile, so build it once
X_test = pd.DataFrame(X_.transform(df_set['product_test']), columns=X_columns, index=df_set['product_test'].index)
mod = QuantReg(df_set['product_train']['btb_total'], X)

# Go through each quantile, fit a regression model and predict on the test set
for quantile in quantiles:
    res = mod.fit(q=quantile, kernel='epa')
    predictions["{:.2f}".format(quantile)] = res.predict(X_test)

# Record the actual data into a new DataFrame
# and concatenate with the quantile results (one column per quantile)
df_y = df_set['product_test']['btb_total'].copy()
df_y = pd.concat([df_y, pd.DataFrame(predictions, index=df_y.index)], axis=1)

# Sort by the actual value so that consecutive rows form percentile bands
df_y_sort = df_y.sort_values(by=['btb_total'])

# Absolute error of every quantile model for every observation
df_mae = df_y_sort.drop(columns=['btb_total']).sub(df_y_sort['btb_total'], axis=0).abs()

# Split the sorted rows into len(quantiles) + 1 = 20 bands of about 5% each
bands = np.array_split(np.arange(len(df_mae)), len(quantiles) + 1)

# Average MAE of each quantile column within each band
mae_aggregator = [df_mae.iloc[band].mean(axis=0).to_numpy() for band in bands]

pd.DataFrame(mae_aggregator, columns=df_mae.columns)
```

[Figure: Quantile Regression Output against Observation]



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
