'x and y must be the same size'

Using Python, I'm trying to plot a sine wave and a random distribution, then show where the ratio of the two is greater than or equal to 3.

I think I'm 90% of the way there, but I keep getting the error message 'x and y must be the same size' when I try to plot it. I've been racking my brains but can't figure out what I'm missing.

Any help or pointers gratefully received.

import numpy as np
import math
import matplotlib.pyplot as plt

r= 2*math.pi
dev = 0.1
x = np.array(np.arange(0, r, dev))
y1 = np.array(np.sin(x))
y2 = np.array(np.random.normal(loc=0, scale=0.1, size=63))

mask = y1//y2 >= 3

fit = np.array(x[mask])

print(fit)


plt.plot(x, y1)
plt.scatter(x, fit)
plt.scatter(x, y2, marker=".")
plt.show()


Solution 1:[1]

Not sure if this is what you want, but this will scatter dots on the sine curve at the points selected by your mask.

import numpy as np
import matplotlib.pyplot as plt

# x runs from 0 to just under 2*pi in steps of 0.1 (63 points)
x = np.arange(0, 2 * np.pi, 0.1)
y1 = np.sin(x)
# size=x.size keeps y2 the same length as x without hard-coding 63
y2 = np.random.normal(loc=0, scale=0.1, size=x.size)

# True where the ratio y1 / y2 is at least 3
mask = y1 / y2 >= 3

# Select x AND y with the same mask so both arrays keep the same length
fit_x = x[mask]
fit_y = y1[mask]

plt.plot(x, y1)
plt.scatter(fit_x, fit_y)
plt.scatter(x, y2, marker=".")
plt.show()
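If you would rather keep a call of the form `plt.scatter(x, ...)`, another option (a sketch, not part of the original answer) is to make the highlighted series the same length as `x` by filling the unselected positions with NaN via `np.where`. The helper name `highlight_ratio` and the fixed seed are illustrative assumptions; the seed only makes the random values repeatable.

```python
import numpy as np


def highlight_ratio(threshold=3.0, seed=0):
    """Build the question's data plus a NaN-padded copy of y1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.arange(0, 2 * np.pi, 0.1)
    y1 = np.sin(x)
    # Size the noise from x instead of hard-coding 63, so the lengths always agree.
    y2 = rng.normal(loc=0, scale=0.1, size=x.size)
    # True division gives the ratio itself; the question's y1 // y2 floors it first.
    mask = y1 / y2 >= threshold
    # Same length as x: selected points keep their sine value, the rest become NaN.
    highlighted = np.where(mask, y1, np.nan)
    return x, y1, y2, mask, highlighted


x, y1, y2, mask, highlighted = highlight_ratio()
print(len(x), len(highlighted), int(mask.sum()))
```

With both arrays `len(x)` long, `plt.scatter(x, highlighted)` should no longer raise the size error, and matplotlib skips points whose coordinates are NaN when drawing.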

Solution 2:[2]

Insert this line into your code, just before the point of error:

print(len(x), len(fit))

Output:

63 28

You explicitly removed elements from your sequence and then expected the result to be the same size as the original. You still have 63 x values, but now only 28 y values. Since you didn't trace the problem or explain what you intend this scatter plot to show, I have no way of knowing what the right "fix" is. Perhaps make a list of points (x-y pairs), and then filter that for the appropriate y1/y2 ratio?
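The pairs-then-filter idea could look roughly like this in plain Python. This is a sketch: the helper name `points_above_ratio` is hypothetical, and the short hand-made `y2s` list stands in for the random noise values.

```python
import math


def points_above_ratio(xs, y1s, y2s, threshold=3.0):
    """Keep (x, y1) pairs whose ratio y1 / y2 is at least threshold."""
    pairs = []
    for x, a, b in zip(xs, y1s, y2s):
        # Skip zero denominators instead of dividing by zero.
        if b != 0 and a / b >= threshold:
            pairs.append((x, a))
    return pairs


xs = [0.5, 1.0, 1.5, 2.0]
y1s = [math.sin(v) for v in xs]
y2s = [0.1, 0.5, -0.2, 0.2]  # hypothetical noise values
selected = points_above_ratio(xs, y1s, y2s)
fit_x = [p[0] for p in selected]
fit_y = [p[1] for p in selected]
```

Because each x travels together with its y value, `fit_x` and `fit_y` always have the same length and can be passed to `plt.scatter(fit_x, fit_y)`. The list comprehensions also handle the case where no point passes the filter, which unpacking with `zip(*selected)` would not.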

Solution 3:[3]

In your line plt.scatter(x, fit) you are trying to scatter your x values against your fit values. However, fit has only 25 elements in my run (the count depends on the random y2 values), while x has 63, as do y1 and y2; that's why the other plotting calls work.

mask is a boolean array of True and False values. Indexing with it, as in np.array(x[mask]), creates an array containing only the values of x at positions where mask is True, which seems to be what you want. But you can only scatter that against an array of the same length, such as np.sin(fit); otherwise the sizes are incompatible.
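A small deterministic example of the indexing behaviour described above (the values are made up for illustration): the masked array has `mask.sum()` elements, and any array paired with it must either be computed from it or selected with the same mask.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
mask = np.array([True, False, True, False, True])

# Only the elements of x where mask is True survive.
fit = x[mask]
print(x.shape, fit.shape)  # (5,) (3,)

# Two ways to build a y array that lines up with fit.
fit_y_from_fit = np.sin(fit)
fit_y_from_mask = np.sin(x)[mask]
print(np.allclose(fit_y_from_fit, fit_y_from_mask))  # True
```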

Solution 4:[4]

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# X (features, one row per sample) and y (targets) must already be defined.

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

# Check that every feature array has a matching target array before plotting
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Training the Simple Linear Regression model on the Training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Visualising the Training set results
plt.scatter(X_train, y_train, color='green')
plt.plot(X_train, regressor.predict(X_train), color='yellow')
plt.title('Doctor visits (Training set)')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

# Visualising the Test set results
plt.scatter(X_test, y_test, color='red')
# The fitted line is the same model, so it is drawn from the training data again
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Doctor visits (Test set)')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
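The shape check in this answer works because X and y are split with the same row indices, so each feature row stays paired with its target. A numpy-only sketch of that idea, with made-up data and a hypothetical `split_same_indices` helper (scikit-learn's `train_test_split` does this for you and applies its own rounding to the test size):

```python
import numpy as np


def split_same_indices(X, y, test_fraction=1/3, seed=0):
    """Shuffle row indices once and apply them to both X and y (hypothetical helper)."""
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} values")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(np.ceil(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Made-up data: 30 samples with one feature column and an exact linear target.
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
X_train, X_test, y_train, y_test = split_same_indices(X, y)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Raising early when the lengths disagree surfaces the mismatch before matplotlib reports it as 'x and y must be the same size'.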

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Martin Gustafsson
Solution 2: Prune
Solution 3: Jonas J.
Solution 4: Dhivya Bharkavi