Splitting Array of Lists into named subarrays

Splitting Arrays for Test Train

Essentially, I am attempting to convert a pandas dataframe into numpy arrays so that I can run it through a train/test split.

My goal here is to split the columns into two groups, dependent and independent variables, on which to run the train/test split.

I am able to convert the dataframe into an array of lists with

x = df.values

This effectively gives me a list of lists: one inner list per row, holding every value in that row.

If I were to use np.split() on this array to try to divide into groups, it would only group certain rows together, and not split by the column values.
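The row-wise behaviour described here comes from np.split() defaulting to axis=0. As a minimal sketch, assuming a small toy array in place of df.values (the array x below is hypothetical), passing axis=1 makes the same function cut between columns instead:

```python
import numpy as np

# Toy 2-D array standing in for df.values: 4 rows, 3 columns.
x = np.arange(12).reshape(4, 3)

# Default axis=0: the pieces are groups of rows.
top, bottom = np.split(x, 2)

# axis=1 with an index list: cut between column 0 and column 1.
first_col, rest = np.split(x, [1], axis=1)

print(top.shape, bottom.shape)      # (2, 3) (2, 3)
print(first_col.shape, rest.shape)  # (4, 1) (4, 2)
```

Note that first_col keeps two dimensions, shape (4, 1), because np.split() returns sub-arrays with the same number of dimensions as its input.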

The simplest example of what I aim to do (Using the already sectored iris dataset as opposed to mine) looks like this:

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)

with data and target being sub-arrays of the iris dataset. How can I turn my one array of lists into multiple named sub-arrays of lists?
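One possible way to get named sub-arrays like iris.data and iris.target out of a single 2-D array is plain column slicing. This is only a sketch: the array values and the assumption that the target sits in the first column are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for df.values: the first column is the target,
# the remaining columns are features (this layout is an assumption).
x = np.array([
    [0, 5.1, 3.5],
    [1, 4.9, 3.0],
    [0, 6.2, 2.9],
    [1, 5.9, 3.2],
])

target = x[:, 0]   # every row, column 0 only
data = x[:, 1:]    # every row, columns 1 onward

print(target.shape)  # (4,)
print(data.shape)    # (4, 2)
```

The two names can then be passed to train_test_split in the same positions as iris.data and iris.target.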



Solution 1:[1]

I ended up keeping it as a pandas data frame and just broke up the columns into two separate new data frames:

from sklearn.model_selection import train_test_split

import xgboost as xgb

df2 = df.iloc[:, 1:]

features = list(df2.columns[1:18])

df2 = df2.dropna()

# Note the naming: df_x holds the target column, df_y holds the features.

df_x = df2[['Vehicle']]

df_y = df2[features]

target = df_x.values

data = df_y.values


X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)


train = xgb.DMatrix(X_train, label=y_train)

test = xgb.DMatrix(X_test, label=y_test)

I was overcomplicating things. Thank you, everyone, for your help.
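For anyone who wants to try the same idea without the original data, here is a self-contained sketch of the approach above. Only the 'Vehicle' column name comes from the answer; the other column names, the values, and random_state=0 are assumptions for illustration, and the xgb.DMatrix step is left out so the example needs only pandas and scikit-learn.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 11 rows, one of which has a missing feature value.
df = pd.DataFrame({
    "Vehicle": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "speed": [50.0, 62.5, 48.0, 70.0, 55.0, 66.0, 47.5, 72.0, 51.0, 68.0, None],
    "weight": [1.2, 2.4, 1.1, 2.6, 1.3, 2.5, 1.0, 2.7, 1.2, 2.3, 1.4],
})

# Every column except the target is treated as a feature.
feature_cols = [c for c in df.columns if c != "Vehicle"]

# Drop incomplete rows before splitting, as in the answer above.
df = df.dropna()

data = df[feature_cols].to_numpy()   # 2-D array of features
target = df["Vehicle"].to_numpy()    # 1-D array of labels

X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Selecting the target with df["Vehicle"] rather than df[["Vehicle"]] gives a 1-D label array, which is the shape most scikit-learn estimators expect.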

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jeremy Caney