What is the optimal way to split the available cores (n_jobs) between an sklearn model and RandomizedSearchCV?
I am trying to figure out how to use my available cores most effectively for a random forest that is being tuned with RandomizedSearchCV. Is it enough to specify n_jobs = -1 in the RandomizedSearchCV, should I also set it in the model itself, or should I divide the cores between the two (e.g., with 32 cores available, n_jobs = 16 in the model and n_jobs = 16 in the RandomizedSearchCV)?
This is an excerpt of my code:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rf_model = RandomForestClassifier()

# Note: 'auto' (equivalent to 'sqrt' for classifiers) was removed in
# sklearn 1.3, and min_samples_split must be at least 2.
tuning_grid_rf = {'n_estimators': [500, 1000, 1500],
                  'max_features': ['sqrt', 'log2'],
                  'max_depth': [10, 20, 30],
                  'min_samples_split': [2, 3, 5],
                  'min_samples_leaf': [1, 3, 5],
                  'max_samples': [0.5, 0.75, None]}

rf_tuning = RandomizedSearchCV(estimator=rf_model,
                               param_distributions=tuning_grid_rf,
                               n_iter=50,
                               cv=5,
                               verbose=2,
                               n_jobs=-1,
                               random_state=42)

history_rf = rf_tuning.fit(X_train, y_train)
```
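For concreteness, the three core-allocation strategies the question contrasts can be sketched as follows. This is an illustration, not a verdict on which is fastest; the parameter grid is trimmed to keep the example short, and nothing is fitted here:

```python
import os
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

small_grid = {'n_estimators': [500, 1000]}  # trimmed grid for illustration

# Option 1: parallelize the search only. Each candidate forest is fit
# single-threaded, and the search spreads candidate/fold fits across
# all available cores.
rf_search_parallel = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_jobs=1),
    param_distributions=small_grid,
    n_iter=2, cv=3, n_jobs=-1, random_state=42)

# Option 2: parallelize the model only. Candidates are evaluated one
# at a time, but each forest grows its trees on all cores.
rf_model_parallel = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_jobs=-1),
    param_distributions=small_grid,
    n_iter=2, cv=3, n_jobs=1, random_state=42)

# Option 3: split the cores between the two levels (the 16/16 idea
# from the question, adapted to whatever machine this runs on).
n_cores = os.cpu_count() or 1
half = max(1, n_cores // 2)
rf_split = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_jobs=half),
    param_distributions=small_grid,
    n_iter=2, cv=3, n_jobs=half, random_state=42)
```

As a rough rule of thumb, parallelizing only the outer search avoids oversubscribing cores with nested worker pools, but the fastest split in practice depends on the machine, the data size, and how long each individual fit takes.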
This is my first question here, so please let me know if I have left out any relevant information.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
