How to cluster a dataframe with numeric and categorical data?
My data has a numeric column, amount, and a categorical column, 'payment type', whose values are cash, cheque and online. I am trying to cluster the data on these two columns. Because the data types are mixed, I tried k-prototypes with different numbers of clusters and plotted an elbow graph to choose the optimal k. Before running k-prototypes, I cleaned the dataset and removed all outliers.
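For context on what the elbow curve measures: k-prototypes scores a row against a cluster prototype by adding the squared Euclidean distance over the numeric columns to a weight `gamma` times the number of categorical columns whose values differ, and `cost_` in the `kmodes` package is the sum of these distances over all rows. The plain-Python sketch below illustrates that calculation for an amount column and a payment type column. The function names `kproto_dissim` and `total_cost`, the toy rows and `gamma=0.5` are illustrative assumptions, not the `kmodes` API; `kmodes` derives a default `gamma` from the spread of the numeric data when none is given.

```python
def kproto_dissim(row, prototype, num_idx, cat_idx, gamma):
    """Mixed dissimilarity in the style of k-prototypes (illustrative sketch).

    Numeric part: squared Euclidean distance over the numeric columns.
    Categorical part: count of categorical columns whose values differ,
    weighted by gamma.
    """
    numeric = sum((row[i] - prototype[i]) ** 2 for i in num_idx)
    categorical = sum(row[j] != prototype[j] for j in cat_idx)
    return numeric + gamma * categorical


def total_cost(rows, prototypes, labels, num_idx, cat_idx, gamma):
    """Sum of each row's dissimilarity to the prototype it is assigned to."""
    return sum(
        kproto_dissim(r, prototypes[lab], num_idx, cat_idx, gamma)
        for r, lab in zip(rows, labels)
    )


# Hypothetical rows: (amount, payment type), with amount already scaled.
rows = [(0.2, "cash"), (0.3, "cheque"), (1.5, "online")]
prototypes = [(0.25, "cash"), (1.5, "online")]
labels = [0, 0, 1]
print(total_cost(rows, prototypes, labels, num_idx=[0], cat_idx=[1], gamma=0.5))
```

Because `gamma` trades off the two terms, the scale of the amount column changes how much the payment type influences the clusters, and therefore the shape of the cost curve.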
Below is the elbow graph I got. I cannot see a clear elbow in it, so I am not sure which k to choose.

Below is the code I used to produce the plot:
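```python
import matplotlib.pyplot as plt
import pandas as pd
from kmodes.kprototypes import KPrototypes

# trans_data: the cleaned data; column 1 (payment type) is categorical
# Checking the cost function for k = 4 to 9
cst = []
for k in range(4, 10):
    kproto = KPrototypes(n_clusters=k, max_iter=10)
    kproto.fit(trans_data, categorical=[1])
    cst.append([k, kproto.cost_])
temp = pd.DataFrame(cst)
plt.plot(temp[0], temp[1], marker="o")
plt.xlabel("k")
plt.ylabel("cost")
plt.show()
```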
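When the curve bends only gently, one common heuristic for reading it (not part of `kmodes`, and not the asker's method) is to rescale both axes to [0, 1] and pick the k whose point lies farthest from the straight line joining the first and last points, similar in spirit to the Kneedle method. `find_elbow` is a hypothetical helper, and the costs in the example are made up.

```python
import math


def find_elbow(ks, costs):
    """Return the k farthest from the chord joining the first and last points.

    Assumes ks is sorted in increasing order. Both axes are rescaled to
    [0, 1] first so that large cost values do not dominate the small k
    values. Returns None if the curve is flat.
    """
    k_min, k_max = min(ks), max(ks)
    c_min, c_max = min(costs), max(costs)
    if k_max == k_min or c_max == c_min:
        return None
    xs = [(k - k_min) / (k_max - k_min) for k in ks]
    ys = [(c - c_min) / (c_max - c_min) for c in costs]
    # Chord from the first to the last (rescaled) point of the curve.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    best_k, best_dist = None, -1.0
    for k, x, y in zip(ks, xs, ys):
        # Perpendicular distance from (x, y) to the chord.
        dist = abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k


# Made-up costs for k = 4..9, for illustration only.
example_ks = [4, 5, 6, 7, 8, 9]
example_costs = [100.0, 20.0, 15.0, 12.0, 10.0, 9.0]
print(find_elbow(example_ks, example_costs))  # prints 5
```

Applied to the loop's results, this would be `find_elbow(temp[0].tolist(), temp[1].tolist())`. The heuristic always returns some k, so a small maximum distance is itself a hint that the curve has no pronounced elbow, and the answer depends on the range of k that was plotted.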
Q1: Is something wrong with my code?
Q2: Does it mean that my dataset has no clear clusters?
Q3: Could it mean that my data needs more clusters, and that I should plot the graph for a higher number of clusters?
Q4: Most importantly, is there any other clustering technique I can use for this kind of data?
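One alternative often suggested for mixed data (not part of the original question) is to build a Gower distance matrix, which averages a range-scaled absolute difference for numeric columns with a 0/1 mismatch for categorical ones, and then cluster it with any method that accepts precomputed distances. The average silhouette width on the same matrix gives a check on cluster separation that does not rely on spotting an elbow. The stdlib-only sketch below assumes hypothetical helper names `gower_matrix` and `mean_silhouette` and made-up toy rows.

```python
def gower_matrix(rows, num_idx, cat_idx):
    """Pairwise Gower distances for rows that mix numeric and categorical values.

    Numeric columns contribute |a - b| / (column range), categorical columns
    contribute 0 for a match and 1 for a mismatch; the distance is the mean
    over all columns, so it always lies in [0, 1].
    """
    n = len(rows)
    ranges = {i: max(r[i] for r in rows) - min(r[i] for r in rows) for i in num_idx}
    n_cols = len(num_idx) + len(cat_idx)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for i in num_idx:
                if ranges[i] > 0:  # a constant column adds nothing
                    total += abs(rows[a][i] - rows[b][i]) / ranges[i]
            for j in cat_idx:
                total += 0.0 if rows[a][j] == rows[b][j] else 1.0
            dist[a][b] = dist[b][a] = total / n_cols
    return dist


def mean_silhouette(dist, labels):
    """Average silhouette width for a precomputed distance matrix.

    Needs at least two distinct labels. Values near 1 mean well-separated
    clusters, values near 0 mean overlapping ones, and negative values mean
    points sit closer to another cluster. A point alone in its cluster scores 0.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == other)
            / sum(1 for j in range(n) if labels[j] == other)
            for other in set(labels)
            if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy rows: (amount, payment type); values are made up.
rows = [(10.0, "cash"), (12.0, "cash"), (90.0, "online"), (100.0, "online")]
d = gower_matrix(rows, num_idx=[0], cat_idx=[1])
print(mean_silhouette(d, [0, 0, 1, 1]))  # separated groups: close to 1
print(mean_silhouette(d, [0, 1, 0, 1]))  # mixed-up groups: negative
```

To cluster with it, the matrix could, for example, be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage(..., method="average")`, then cut with `fcluster`; a k-medoids implementation is another option. A full n × n matrix is only practical for moderately sized data.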
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
