Could anyone explain why these two snippets of code produce different results? (Cross-validation)

I am currently working on a machine learning project to predict default. I decided to apply logistic regression and managed to get pretty decent results, but then I decided to apply cross-validation followed by hyperparameter tuning to see if I could find better results. I wrote these two blocks of code:

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

accuracytrain = []
accuracytest = []

recalltrain = []
recalltest = []

precisiontrain = []
precisiontest = []


f1scoretrain = []
f1scoretest = []


KF = StratifiedKFold(n_splits = 5, random_state= 42, shuffle = True)
KF.get_n_splits(X, y)

for train_index, test_index in KF.split(X, y):
    print("Train:", train_index, "Validation:", test_index)
    X_train_scaled, X_test_scaled = X.iloc[train_index], X.iloc[test_index] 
    y_train, y_test = y.iloc[train_index], y.iloc[test_index] 
    
    sgd_LogisticRegression = SGDClassifier(loss='log', n_jobs = -1, warm_start = True, class_weight = "balanced").fit(X_train_scaled, y_train)    
    
    y_pred_train = sgd_LogisticRegression.predict(X_train_scaled) 

    y_pred_test = sgd_LogisticRegression.predict(X_test_scaled) 
    
    accuracy_train = accuracy_score(y_train, y_pred_train)
    accuracy_test = accuracy_score(y_test, y_pred_test)
    
    recall_train = recall_score(y_train, y_pred_train)
    recall_test = recall_score(y_test, y_pred_test)
    
    precision_train = precision_score(y_train, y_pred_train)
    precision_test = precision_score(y_test, y_pred_test)
    
    f1score_train = f1_score(y_train, y_pred_train)
    f1score_test = f1_score(y_test, y_pred_test)
    
    accuracytrain.append(accuracy_train)
    accuracytest.append(accuracy_test)
    
    recalltrain.append(recall_train)
    recalltest.append(recall_test)
    
    precisiontrain.append(precision_train)
    precisiontest.append(precision_test)
    
    f1scoretrain.append(f1score_train)
    f1scoretest.append(f1score_test)

And

from sklearn.model_selection import cross_val_score

clf = SGDClassifier(loss='log', n_jobs = -1, warm_start = True, class_weight = "balanced")

scores = cross_val_score(clf, X, y, cv = KF, scoring = "accuracy")

I am getting vastly different results. The first snippet was meant simply to show the cross-validation training step by step while I explained in my notebook what I was doing.

But I cannot figure out why these two snippets of code provide different results. Any ideas?

Thank you!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
