Showing the predicted data with scikit-learn in Python

I am doing a project and trying to show some basic elements of scikit-learn in Python. My goal is to create about three simple examples that show how a model learns and predicts. I am applying a simple sine-wave pattern and have been working from a good example online: https://mclguide.readthedocs.io/en/latest/sklearn/regression.html

My problem is that, since I am new to this library and to ML in general, I don't understand what I have in front of me or how to transform it into the output I am going for. The two problems I am struggling with are a linear regression on a sine wave and a Gaussian regression on a more complicated wave. The output I get by following the article is the accuracy, and that works as intended. What I am trying to get to is a plot of the predicted output on top of (or as an extension of) the training data, to show visually how the model did. I think the data is in here: either I am using the wrong methods to return the appropriate information, or I don't understand how to extract it from what is already being returned.

Here are some additional questions:

I do not completely understand the "features = x[:, np.newaxis]" line.
When plotting, what do '-*' and '-o' do? I looked through the documentation and they appear to be format strings, but I couldn't find these two examples exactly.
What do I need to do to access the predicted values for the 20% test split so that I can plot them against the original data?
Is there a simple way to reuse most of this code for both the simple and the Gaussian examples?
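
For the first question, a minimal sketch of what that indexing does, assuming only NumPy and a shortened five-point version of `x`: `np.newaxis` adds an axis of length 1, turning a 1-D array of shape `(N,)` into a column of shape `(N, 1)`. scikit-learn estimators expect features in that 2-D `(n_samples, n_features)` layout. The name `same` is only for illustration.

```python
import numpy as np

x = np.linspace(0, 12, 5)      # 1-D array of 5 values
features = x[:, np.newaxis]    # same values as a single column

print(x.shape)         # (5,)
print(features.shape)  # (5, 1)

# reshape(-1, 1) is an equivalent spelling: "as many rows as needed, 1 column"
same = x.reshape(-1, 1)
print(np.array_equal(features, same))  # True
```

Each row of `features` is then one sample with one feature, which is what `fit` and `predict` take as input.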

Here is the skeletal code. Most of the scikit-learn code from the article is unchanged.


    import numpy as np
    import matplotlib.pyplot as plt
    
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    
    import random
    from operator import add
    
    
    N = 200  # number of samples
    randomlist = []
    x = np.linspace(0, 12, N) 
    sine_wave =  np.sin(1*x)
    
    #plot the source data
    plt.figure(figsize=(20,5)) 
    plt.plot(x, sine_wave, 'o')
    plt.show()
    
    # convert features to 2D, shape (N, 1), as scikit-learn expects
    # print('Before: ', x.shape)
    features = x[:, np.newaxis]
    # print('After: ', features.shape)
    
    # save sine wave in variable 'targets'
    targets = sine_wave
    
    # split the training and test data
    train_features, test_features, train_targets, test_targets = train_test_split(
            features, targets,
            train_size=0.8,
            test_size=0.2,
            # fixed seed, so the split is the same on every run; the score
            # depends on which samples end up in the test set
            random_state=23,
            # stratify keeps class proportions equal in train and test data;
            # it is meant for classification labels, not continuous targets
            # stratify=targets
        )
    
    # training using 'training data'
    regressor = LinearRegression()
    regressor.fit(train_features, train_targets) # fit the model for training data
    
    # predict the 'target' for 'training data'
    prediction_training_targets = regressor.predict(train_features)
    
    # note: for regressors, 'score' takes features and true targets and
    # returns R^2 (coefficient of determination), not an accuracy;
    # 'accuracy_score' (classification) instead compares true targets
    # with predicted targets
    self_accuracy = regressor.score(train_features, train_targets)
    print("Accuracy for training data (self accuracy):", self_accuracy)
    
    # predict the 'target' for 'test data'
    prediction_test_targets = regressor.predict(test_features)
    test_accuracy = regressor.score(test_features, test_targets)
    print("Accuracy for test data:", test_accuracy)
    
    # plot the predicted and actual target for test data
    plt.figure(figsize=(20,5)) 
    plt.plot(test_targets, color = "red")
    plt.show()
    
    plt.plot(prediction_test_targets, '-*', color = "red")
    plt.plot(test_targets, '-o' )
    plt.show()
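
One possible way to get the plot described above, as a sketch rather than the article's method. The helper `fit_and_plot` is a hypothetical name; the RBF kernel and `alpha=1e-6` for the Gaussian process are assumptions chosen for a noise-free sine wave. The point is that `regressor.predict(test_features)` already returns the predictions for the 20% split, and row `i` of that result belongs to the x value in `test_features[i, 0]`, so the predictions should be plotted against those x values instead of against their index.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_and_plot(regressor, x, targets, title):
    """Hypothetical helper: fit any scikit-learn regressor and plot its predictions."""
    features = x[:, np.newaxis]  # shape (N,) -> (N, 1)
    train_X, test_X, train_y, test_y = train_test_split(
        features, targets, train_size=0.8, random_state=23
    )
    regressor.fit(train_X, train_y)

    # predictions for the held-out 20%, in the same (shuffled) order as test_X
    test_pred = regressor.predict(test_X)
    # predictions over the whole x range, used to draw the model as a curve
    curve = regressor.predict(features)

    fig, ax = plt.subplots(figsize=(20, 5))
    # format strings: 'o' = circle markers, '*' = star markers, '-' = solid line
    # ('-o' or '-*' would draw a solid line with markers on it)
    ax.plot(train_X[:, 0], train_y, 'o', color='lightgray', label='training data')
    ax.plot(test_X[:, 0], test_y, 'o', label='test data (actual)')
    ax.plot(test_X[:, 0], test_pred, '*', color='red', label='test data (predicted)')
    ax.plot(x, curve, '-', color='red', label='model over full range')
    ax.set_title(f"{title}: test R^2 = {regressor.score(test_X, test_y):.3f}")
    ax.legend()
    return test_X[:, 0], test_pred


x = np.linspace(0, 12, 200)
sine_wave = np.sin(x)

# same helper, two different regressors
lin_x, lin_pred = fit_and_plot(LinearRegression(), x, sine_wave, "LinearRegression")

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp_x, gp_pred = fit_and_plot(gpr, x, sine_wave, "GaussianProcessRegressor")

plt.show()
```

Because `train_test_split` shuffles the rows, plotting `test_targets` with a connecting line against the index gives a zigzag; using markers against `test_X[:, 0]` avoids that. A straight line cannot follow a sine wave, so the linear model should be expected to score poorly here, while the Gaussian process should track the curve closely.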

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
