ValueError: Data cardinality is ambiguous

I'm trying to learn the basics of TensorFlow by writing code that predicts student performance scores from this CSV from Kaggle, but I get the following error.

The error is:

ValueError: Data cardinality is ambiguous:
  x sizes: 1000
  y sizes: 3
Make sure all arrays contain the same number of samples.

File "C:\Users\w1234\algorithm.py\tensor\tensorflow\students_performance.py", line 30, in <module>
    model.fit(np.array(x_data), np.array(y_data), epochs = 100)

Could you help me? How can I change the sample size?

The code:

from sklearn import metrics
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
import os
import numpy as np
import pandas as pd

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

data = pd.read_csv("C:/Users/w1234/algorithm.py/tensor/tensorflow/students_performance.csv")
data = data.dropna()
x_data = []
y_data = [data['math score'].values,
          data['reading score'].values,
          data['writing score']]


for i, row in data.iterrows() :
    x_data.append([row['gender'],
                  row['parental level of education'],
                  row['lunch'],
                  row['test preparation course']])

model = Sequential([Dense(64, activation='relu'),
                    Dense(32, activation='relu'),
                    Dense(1, activation='sigmoid', name = 'output')])

model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = 'accuracy')
model.fit(np.array(x_data), np.array(y_data), epochs = 100)


Solution 1:[1]

Machine learning algorithms typically work with numeric matrices or tensors, so most feature engineering techniques convert raw data into a numeric representation that these algorithms can consume.

The code in the question passes y_data as a list of three score columns, which is why Keras reports 3 samples for y against 1000 for x. The code below instead uses race/ethnicity as the output variable, with one row of y per row of x.

The columns gender, parental level of education, lunch, and test preparation course are all categorical, with dtype object. They must be converted to numeric columns, so I have used one-hot encoding.
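As a small illustration of what one-hot encoding does, here is a minimal sketch on a made-up three-row frame. The rows are hypothetical and only mimic two of the Kaggle columns; they are not taken from the real CSV.

```python
import pandas as pd

# Hypothetical sample rows (not the real data), using two of the column names.
sample = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "lunch": ["standard", "free/reduced", "standard"],
})

# Each category value becomes its own 0/1 column. dtype=float keeps the
# result numeric rather than boolean.
encoded = pd.get_dummies(sample, columns=["gender", "lunch"], dtype=float)

print(encoded.columns.tolist())
print(encoded.shape)
```

Two text columns with two distinct values each turn into four numeric columns, while the number of rows stays the same.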

Please find the working code below:

import os

# Set the log level before importing TensorFlow so that it takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

data = pd.read_csv("/content/StudentsPerformance.csv")
data = data.dropna()

# y_data is the output variable
y_data = data.pop("race/ethnicity")

# x_data holds the input features that y_data depends on
x_data = data

categorical_cols = ['gender', 'parental level of education', 'lunch', 'test preparation course']

# One-hot encode the categorical features, then cast to float.
# astype returns a new object, so the result has to be assigned.
x_data = pd.get_dummies(x_data, columns=categorical_cols).astype('float32')

# One-hot encode the target: one column per race/ethnicity group
y_data = pd.get_dummies(y_data).astype('float32')

# 5 output units, one per group. softmax with categorical_crossentropy
# suits mutually exclusive classes.
model = Sequential([Dense(64, activation='relu'),
                    Dense(32, activation='relu'),
                    Dense(5, activation='softmax', name='output')])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# x and y now have the same number of rows, so the cardinality check passes
model.fit(x_data.to_numpy(), y_data.to_numpy(), epochs=100)
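If the three score columns are the intended target, as in the question, the shape mismatch can be reproduced and fixed with NumPy alone. This is a minimal sketch: the random integers stand in for the real score columns, and the variable names are made up for illustration.

```python
import numpy as np

n_samples = 1000  # matches "x sizes: 1000" in the error message

# Stand-ins for the three score columns (random values, not the real data).
rng = np.random.default_rng(0)
math_score = rng.integers(0, 101, n_samples)
reading_score = rng.integers(0, 101, n_samples)
writing_score = rng.integers(0, 101, n_samples)

# A list of three columns becomes an array with 3 rows. Keras reads the
# first axis as the sample count, hence "y sizes: 3".
y_wrong = np.array([math_score, reading_score, writing_score])

# Stacking the columns side by side gives one row per student.
y_right = np.column_stack([math_score, reading_score, writing_score])

print(y_wrong.shape)  # (3, 1000)
print(y_right.shape)  # (1000, 3)
```

With the real DataFrame, selecting the columns together, for example data[['math score', 'reading score', 'writing score']].to_numpy(), should give the same (1000, 3) layout. Scores are continuous values, so a model trained on them would likely want three linear output units and a regression loss such as mean squared error rather than a sigmoid output with binary_crossentropy.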

Let us know if the issue still persists. Thanks!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tfer3