How to retrieve the random_state of sklearn.model_selection.train_test_split?
I split my dataset with train_test_split without setting random_state. Because the machine learning model trained on that split performs quite well, I want to retrieve the random_state that was used to split the dataset. Is there something like numpy.random.get_state()?
Solution 1:[1]
What do you mean?
If you want to know which random_state you are using, you have to pass random_state when calling the function, for example:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42)
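By default it is set to None; see the docs. In that case the split draws from NumPy's global random state, so no seed is stored that you could read back afterwards.
Here is also some further information on random_state.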
Or do you mean this?
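On the numpy.random.get_state() part of the question: because an unseeded call falls back to NumPy's global random state, a snapshot taken before the split should be enough to replay it later. The following is only a sketch with toy data standing in for the real X and y; it does not help once the split has already been made without a snapshot.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the real X and y (hypothetical).
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Snapshot NumPy's global RNG state *before* the unseeded split.
saved_state = np.random.get_state()
first = train_test_split(X, y, test_size=0.33)

# Restore the snapshot and split again with the same arguments.
np.random.set_state(saved_state)
second = train_test_split(X, y, test_size=0.33)

# Compare all four returned arrays pairwise.
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)
```

The saved state is a plain tuple, so it could be pickled next to the trained model if you want to keep it around.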
Solution 2:[2]
If you only have an old notebook showing a slice of one or more of the train/test subsets (e.g. X_test[0:5], y_train[-5:]), but you know the other parameters of the train_test_split() call (e.g. test_size or train_size, shuffle, stratify) and can perfectly recreate X and y, you could try brute-forcing it. Generate new splits with different random_state seeds, compare each split to your known subset slice, and record any random_state values that produce matching slice values (or values close enough that the differences could just be floating-point weirdness).
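    import numpy as np
    from sklearn.model_selection import train_test_split

    # Known slice of y_train copied from the old notebook output
    target_y_train = np.array([-5.482, -11.165, -13.926, -7.534, -8.323])
    possible_random_state_values = []
    for i in range(0, 1000):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=i)
        # np.isclose tolerates small floating-point differences
        if all(np.isclose(y_train[0:5], target_y_train)):
            possible_random_state_values.append(i)
            print(f"Possible random state value found: {i}")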
If you don't get any possible seeds from the [0, 1000) range, increase the upper bound. When you do get values, you can plug them into train_test_split(), compare other subset slices if you have any, rerun your model training pipeline, and compare your output metrics.
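To check that the search behaves as expected before pointing it at real data, it can be wrapped in a function and run against a split whose seed is known. This is a rough sketch: find_seeds, the synthetic X and y, and the seed 123 are all made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def find_seeds(X, y, target, max_seed=1000, **split_kwargs):
    """Return seeds in [0, max_seed) whose y_train starts with `target`."""
    target = np.asarray(target)
    matches = []
    for seed in range(max_seed):
        _, _, y_train, _ = train_test_split(X, y, random_state=seed, **split_kwargs)
        # Compare only the leading slice, allowing for floating-point noise.
        if np.allclose(np.asarray(y_train)[: len(target)], target):
            matches.append(seed)
    return matches


# Synthetic data (hypothetical), split with a known seed to imitate
# the slice that would be copied from an old notebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.normal(size=200)
_, _, known_y_train, _ = train_test_split(X, y, test_size=0.3, random_state=123)

found = find_seeds(X, y, known_y_train[:5], max_seed=500, test_size=0.3)
print(found)
```

The extra keyword arguments are passed straight through to train_test_split(), so test_size, shuffle, or stratify can be set to whatever the original call used.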
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | MattTriano |
