Categorical Data Preprocessing for Regression
I have a training dataset in which the value of the "Output" column depends on three input columns, all of which are categorical (with no ordering).
| Inp1 | Inp2 | Inp3 | Output |
|---|---|---|---|
| A,B,C | AI,UI,JI | Apple,Bat,Dog | Animals |
| L,M,N | LI,DO,LI | Lawn, Moon, Noon | Noun |
| X,Y,Z | LI,AI,UI | Xmas,Yemen,Zombie | Extras |
Based on this training data, I need an ML algorithm that predicts the output for any incoming row: the row should be assigned the output of the training row it is most similar to.
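To make the "most similar row" requirement concrete, here is a minimal sketch of one possible reading of it. It assumes each cell is a comma-separated set of values and that similarity means the mean per-column Jaccard overlap; the helper names `tokens`, `jaccard`, and `predict` are made up for illustration and are not from any library.

```python
# Training rows copied from the sample table above.
TRAIN = [
    ("A,B,C", "AI,UI,JI", "Apple,Bat,Dog", "Animals"),
    ("L,M,N", "LI,DO,LI", "Lawn, Moon, Noon", "Noun"),
    ("X,Y,Z", "LI,AI,UI", "Xmas,Yemen,Zombie", "Extras"),
]


def tokens(cell):
    """Split a comma-separated cell into a set of stripped, non-empty values."""
    return {t.strip() for t in cell.split(",") if t.strip()}


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def predict(inp1, inp2, inp3):
    """Return the Output of the training row with the highest mean column similarity."""
    query = [tokens(inp1), tokens(inp2), tokens(inp3)]

    def score(row):
        # Compare each query column with the matching training column.
        return sum(jaccard(q, tokens(c)) for q, c in zip(query, row[:3])) / 3

    best = max(TRAIN, key=score)
    return best[3]


print(predict("A,B", "AI,UI", "Apple,Dog"))  # prints "Animals"
```

This compares against every training row on each call, so it is only meant to illustrate the similarity idea, not to scale to a growing dataset.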
The number of rows keeps increasing, so get_dummies creates too many columns to be feasible. There is also no ordering or priority among the categories. Which encoding of the categorical Inp columns is needed for a multiple regression model to work?
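One candidate that keeps the column count fixed as new categories arrive is the hashing trick. The sketch below is an assumption about what might fit here, not a confirmed answer: the bucket count `N_BUCKETS` and the helpers `bucket` and `hash_encode` are hypothetical, and different values can collide in the same bucket. scikit-learn's `FeatureHasher` is a library implementation of the same idea.

```python
import hashlib

N_BUCKETS = 16  # hypothetical fixed width; larger values mean fewer collisions


def bucket(column, value, n_buckets=N_BUCKETS):
    """Map a (column, value) pair to a stable bucket index."""
    key = f"{column}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_encode(row, n_buckets=N_BUCKETS):
    """Encode a dict of comma-separated cells as a fixed-length count vector."""
    vec = [0] * n_buckets
    for column, cell in row.items():
        for value in cell.split(","):
            value = value.strip()
            if value:
                vec[bucket(column, value, n_buckets)] += 1
    return vec


row = {"Inp1": "A,B,C", "Inp2": "AI,UI,JI", "Inp3": "Apple,Bat,Dog"}
encoded = hash_encode(row)
print(len(encoded), sum(encoded))  # prints "16 9": fixed width, 9 values counted
```

Unlike get_dummies, a previously unseen category still lands in one of the existing buckets, so the feature matrix never grows wider. The cost is that the buckets are no longer interpretable as individual categories.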
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
