TypeError: expected x and y to have same length

%matplotlib inline
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import io
#from tabulate import tabulate
df=pd.read_csv(io.BytesIO(dados['GramadoBIN.csv']))

x=x[~np.isnan(x)]
y=df['Frac']
n=y.size
y=y[~np.isnan(y)]

yerror=2*df['errory']
#yerror=0
p, cov=np.polyfit(x,y,1,cov=True)

#regression
yfit=np.polyval(p, x)
perr=np.sqrt(np.diag(cov))
R2=np.corrcoef(x, y)[0, 1]**2
resid=y-yfit
chi2red=np.sum((resid/yerror)**2)/(n-2)
s_err = np.sqrt(np.sum(resid**2)/(n - 2)) 
t = stats.t.ppf(0.975, n - 2)
ci = t * s_err * np.sqrt(    1/n + (x - np.mean(x))**2/np.sum((x-np.mean(x))**2))

# Prediction interval for the linear fit:
pi = t * s_err * np.sqrt(1 + 1/n + (x - np.mean(x))**2/np.sum((x-np.mean(x))**2))
plt.plot(x,y,'bo')
plt.errorbar(x, y, yerr=yerror, fmt = 'bo', ecolor='b', capsize=0)
plt.plot(x,yfit,'r', linewidth=3,color=[1,0,0,.5])
plt.fill_between(x, yfit+pi, yfit-pi, color=[1, 0, 0, 0.1], edgecolor='')
plt.fill_between(x, yfit+ci, yfit-ci, color=[1, 0, 0, 0.15], edgecolor='')
plt.title('$y = %.2f \pm %.2f + (%.2f \pm %.2f)x \; [R^2=%.2f,\, \chi^2_{red}=%.1f]$'
          %(p[1], perr[1], p[0], perr[0], R2, chi2red), fontsize=20, color=[0, 0, 0]) 
ax=plt.subplot()
#ax.set_xlim(2,8
#ax.set_ylim(0,1.2)
#plt.savefig(f"{images_dir}/ColumbiaFZr.eps", format='eps')
plt.show()
print(yerror)

Hello, I'm trying to run a linear fit on some data to analyze the error. However, when I run the code, the error "expected x and y to have same length" appears. I'm confused because the same code ran perfectly on other data.

The error is raised on the line p, cov=np.polyfit(x,y,1,cov=True)

  1. Why is this error occurring? It did not happen with another .csv file...


Solution 1:[1]

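It tells you that np.polyfit needs x and y to be the same length, which they are not in this case. I can't find where x is defined in the code you posted. Note that x and y are each filtered with their own NaN mask, so if the two columns contain a different number of NaN values, the filtered arrays end up with different lengths. That would also explain why another .csv file worked. If x is defined earlier in some other code, I would suggest checking the values and lengths of the variables at the following lines: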

x=x[~np.isnan(x)]
y=df['Frac']
n=y.size
y=y[~np.isnan(y)]
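
As a rough sketch of that check and one possible fix: the DataFrame below is made-up data standing in for GramadoBIN.csv, and the column name "x" is an assumption, since the question does not show where x comes from. The idea is to build a single mask from both columns and apply it to x, y and the error column alike, so every array keeps the same rows.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for GramadoBIN.csv. The column name "x" is an
# assumption: the question does not show where x comes from.
df = pd.DataFrame({
    "x":      [1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 7.0, 8.0],
    "Frac":   [1.1, 1.9, np.nan, 4.2, 5.0, 6.1, np.nan, 7.9],
    "errory": [0.1] * 8,
})

x_raw = df["x"].to_numpy(dtype=float)
y_raw = df["Frac"].to_numpy(dtype=float)
err_raw = df["errory"].to_numpy(dtype=float)

# Filtering each array with its own NaN mask, as in the question,
# leaves arrays of different lengths.
x_bad = x_raw[~np.isnan(x_raw)]
y_bad = y_raw[~np.isnan(y_raw)]
print(x_bad.size, y_bad.size)

try:
    np.polyfit(x_bad, y_bad, 1, cov=True)
    raised = False
except TypeError as exc:
    raised = True
    print("polyfit failed:", exc)

# One shared mask keeps only the rows where both x and y are valid.
mask = ~np.isnan(x_raw) & ~np.isnan(y_raw)
x = x_raw[mask]
y = y_raw[mask]
yerror = 2 * err_raw[mask]
n = y.size  # count the points after filtering, not before

p, cov = np.polyfit(x, y, 1, cov=True)
print(n, p)
```

In the posted code, n is also taken before the NaN values are removed and yerror is never filtered, so both are worth recomputing from the masked data as above. If your other .csv file had no missing values, or had them on the same rows in both columns, filtering x and y separately would have happened to work there.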

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
