Parameter tuning and feature engineering, which one should be first?
I'm trying to train an SVM classifier but I'm quite new to ML. I know there are two steps here: parameter tuning and feature engineering, but which one goes first? It seems this answer suggests doing feature engineering first. Is that correct? If so, do I just pick a random set of SVM parameters while doing the feature engineering?
Solution 1:[1]
SVMs (and most other ML methods) accept input in the form of a 2-dimensional numeric feature matrix, so you will have to transform your data into that format to even use the SVM. So while you want to do some feature engineering before parameter tuning to confirm that your pipeline works the way you think it should, you don't necessarily need to separate the two completely.
If you use an automated or parameterized feature engineering method, then that method can be part of your hyperparameter tuning procedure.
One way to do this is to use Featuretools, an open-source automated feature engineering library for Python, in conjunction with a machine-learning library like scikit-learn.
Here's a pipeline using a demo dataset in Featuretools that does hyperparameter tuning and feature engineering in the same step:
    import featuretools as ft
    from featuretools.primitives import (Sum, Max, Mean, Min,
                                         Percentile, Day, Weekend, Weekday)
    from featuretools.selection import remove_low_information_features
    from itertools import combinations
    from sklearn.impute import SimpleImputer
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from sklearn.svm import SVC

    # Note: the Featuretools calls below use the pre-1.0 API
    # (target_entity, ignore_variables, entity.df).
    retail_data = ft.demo.load_retail(nrows=1000)

    # predict each customer's country
    labels = LabelEncoder().fit_transform(retail_data['customers'].df['Country'])


    def score_pipeline(max_depth, agg_primitives, trans_primitives, C):
        feature_matrix, feature_defs = ft.dfs(entityset=retail_data,
                                              target_entity='customers',
                                              ignore_variables={'customers': ['Country']},
                                              max_depth=max_depth,
                                              agg_primitives=list(agg_primitives),
                                              trans_primitives=list(trans_primitives),
                                              verbose=True)
        # one-hot encode to transform to numeric
        feature_matrix, feature_defs = ft.encode_features(feature_matrix, feature_defs)
        # remove features that are all NaN or hold a single value
        feature_matrix, feature_defs = remove_low_information_features(feature_matrix, feature_defs)
        # impute missing values with the column mean
        # (SimpleImputer replaces the Imputer class removed from scikit-learn)
        imputer = SimpleImputer(strategy='mean')
        feature_matrix = imputer.fit_transform(feature_matrix)

        model = SVC(C=C, verbose=True)
        X_train, X_test, y_train, y_test = train_test_split(feature_matrix,
                                                            labels, test_size=0.1)
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        return f1_score(y_test, predictions, average='macro')


    poss_agg_primitives = [Sum, Max, Mean, Min]
    poss_trans_primitives = [Percentile, Day, Weekend, Weekday]

    # try every combination of feature primitives, depth, and SVM penalty C
    scores = []
    for agg_primitives in combinations(poss_agg_primitives, 2):
        for trans_primitives in combinations(poss_trans_primitives, 2):
            for max_depth in range(1, 3):
                for C in [0.01, 0.1, 1.0]:
                    score = score_pipeline(max_depth,
                                           agg_primitives,
                                           trans_primitives,
                                           C)
                    scores.append(score)

    print("Best score: {:.3f}".format(max(scores)))
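For comparison, the same idea can be sketched with scikit-learn alone. This is a minimal sketch and not part of the original answer: `PolynomialFeatures` stands in for a parameterized feature engineering step, the data is synthetic, and the grid values are arbitrary assumptions. The point is only that the feature step's parameter and the SVM's `C` sit in one search grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# Synthetic data stands in for a real feature matrix (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),  # parameterized feature step
    ("scale", StandardScaler()),
    ("svm", SVC()),
])

# Feature-engineering and model hyperparameters share one search grid.
param_grid = {
    "poly__degree": [1, 2],
    "svm__C": [0.01, 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated macro F1: {:.3f}".format(search.best_score_))
```

Because the feature step lives inside the pipeline, it is refit on each training fold, so the cross-validated score is not contaminated by statistics computed on held-out rows.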
Solution 2:[2]
Feature engineering should be done first. Follow this sequence:
- missing values imputation
- variables encoding
- handle outliers
- check linear model assumptions to select features
- select the features most correlated with the labels
These are some of the basic steps of feature engineering. Beyond this, it depends a lot on what kind of dataset you are working with.
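The sequence above could look roughly like this with pandas. It is only a sketch on a made-up toy table: the column names, the median imputation, the 5th to 95th percentile clipping, and `k = 3` are all assumptions chosen for illustration, and the linear-model-assumption check is left out. In a real project the statistics (medians, percentiles, correlations) should be computed on the training split only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two numeric columns, one categorical column, binary labels.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 47, 51, 38, 29, 300],   # one NaN and one outlier
    "income": [30, 42, 39, np.nan, 80, 55, 33, 60],
    "city":   ["a", "b", "a", "c", "b", "a", "c", "b"],
})
labels = pd.Series([0, 0, 0, 1, 1, 1, 0, 1], name="label")

numeric_cols = ["age", "income"]

# 1. Missing-value imputation: fill numeric gaps with the column median.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 2. Variable encoding: one-hot encode the categorical column.
df = pd.get_dummies(df, columns=["city"], dtype=float)

# 3. Outlier handling: clip numeric columns to their 5th-95th percentile range.
lower = df[numeric_cols].quantile(0.05)
upper = df[numeric_cols].quantile(0.95)
df[numeric_cols] = df[numeric_cols].clip(lower=lower, upper=upper, axis=1)

# 4. Feature selection: keep the k features most correlated with the labels.
k = 3
correlations = df.corrwith(labels).abs().sort_values(ascending=False)
selected = correlations.index[:k].tolist()
X = df[selected]

print(correlations)
print("Selected features:", selected)
```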
Solution 3:[3]
The performance of any machine learning model depends on how well distinctive features have been created from the available data. So the order is:
- Feature engineering, then a correlation check between features to remove the correlated ones
- Parameter tuning
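One possible way to do the correlation check is sketched below. The helper name `drop_correlated_features`, the 0.9 threshold, and the synthetic columns are assumptions for illustration, not something the answer specifies.

```python
import numpy as np
import pandas as pd


def drop_correlated_features(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later feature of every pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)


rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
features = pd.DataFrame({
    "a": a,
    "a_scaled": 2.0 * a + 0.01 * rng.normal(size=100),  # nearly duplicates "a"
    "b": b,
})

reduced = drop_correlated_features(features)
print(list(reduced.columns))
```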
Solution 4:[4]
You need to create your features and your training set before you train your model, so the first iteration of feature engineering must come before parameter tuning. However, both feature engineering and parameter tuning are iterative processes. For instance, you may use your first version of your features to train your model using grid search (brute force search for best parameters), and then you can use those parameters to try out different permutations of your features. For instance, you may try to use some variations of feature X, such as log(X), sqrt(X), X^2, etc. to see if this gives you better results.
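As a rough illustration of trying variations of a feature X, the sketch below holds the SVM hyperparameters fixed and compares cross-validated accuracy across a few transforms. The data is synthetic and deliberately built so that the label depends on log(X); the value `best_C = 1.0` is a placeholder for whatever an earlier grid search returned.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical data: one positive, skewed feature X and a label driven by log(X).
X = rng.lognormal(mean=0.0, sigma=1.0, size=(300, 1))
y = (np.log(X[:, 0]) + 0.3 * rng.normal(size=300) > 0).astype(int)

transforms = {
    "X": lambda x: x,
    "log(X)": np.log,
    "sqrt(X)": np.sqrt,
    "X^2": np.square,
}

# Hyperparameters stay fixed while the feature variants are compared.
best_C = 1.0

scores = {}
for name, transform in transforms.items():
    model = make_pipeline(StandardScaler(), SVC(C=best_C))
    scores[name] = cross_val_score(model, transform(X), y, cv=5).mean()

for name, score in scores.items():
    print("{:8s} mean accuracy: {:.3f}".format(name, score))
```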
My typical process is:
- Feature brainstorming
- Feature creation
- Correlation analysis
- Feature selection
- Feature transformation (to make them as linearly correlated with the target as possible)
- Feature scaling to zero mean and unit variance
- Grid search to find initial hyperparameters for the algorithm
- Iterative process for testing alternative feature transforms
- Iterative process for testing more fine-tuned hyperparameters
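The scaling and grid search steps of this process might be sketched as follows: a coarse grid to find initial hyperparameters, then a finer grid around the best coarse values. The synthetic data, the RBF kernel, and all grid values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data stands in for the engineered feature matrix.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),   # zero mean, unit variance
    ("svm", SVC(kernel="rbf")),
])

# Coarse search: powers of ten to find initial hyperparameters.
coarse_grid = {
    "svm__C": [0.1, 1.0, 10.0, 100.0],
    "svm__gamma": [0.001, 0.01, 0.1, 1.0],
}
coarse = GridSearchCV(pipeline, coarse_grid, cv=3).fit(X, y)
best_C = coarse.best_params_["svm__C"]
best_gamma = coarse.best_params_["svm__gamma"]

# Finer search: a narrower grid centered on the coarse optimum.
fine_grid = {
    "svm__C": [best_C / 3, best_C, best_C * 3],
    "svm__gamma": [best_gamma / 3, best_gamma, best_gamma * 3],
}
fine = GridSearchCV(pipeline, fine_grid, cv=3).fit(X, y)

print("Coarse best:", coarse.best_params_, "{:.3f}".format(coarse.best_score_))
print("Fine best:  ", fine.best_params_, "{:.3f}".format(fine.best_score_))
```

The fine grid includes the coarse optimum itself, so the second search cannot score worse than the first on the same folds.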
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | bschreck |
| Solution 2 | Mridul Pandey |
| Solution 3 | Vit |
| Solution 4 | Morten Jorgensen |
