'Convert Select Columns in Pandas Dataframe to Numpy Array
I would like to convert everything but the first column of a pandas dataframe into a numpy array. For some reason using the columns= parameter of DataFrame.to_matrix() is not working.
df:
viz a1_count a1_mean a1_std
0 n 3 2 0.816497
1 n 0 NaN NaN
2 n 2 51 50.000000
I tried X=df.as_matrix(columns=[df[1:]]) but this yields an array of all NaNs
Solution 1:[1]
the easy way is the "values" property df.iloc[:,1:].values
a=df.iloc[:,1:]
b=df.iloc[:,1:].values
print(type(df))
print(type(a))
print(type(b))
so, you can get type
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>
Solution 2:[2]
Please use the Pandas to_numpy() method. Below is an example--
>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1, 2], "B":[3, 4], "C":[5, 6]})
>>> df
A B C
0 1 3 5
1 2 4 6
>>> s_array = df[["A", "B", "C"]].to_numpy()
>>> s_array
array([[1, 3, 5],
[2, 4, 6]])
>>> t_array = df[["B", "C"]].to_numpy()
>>> print (t_array)
[[3 5]
[4 6]]
Hope this helps. You can select any number of columns using
columns = ['col1', 'col2', 'col3']
df1 = df[columns]
Then apply to_numpy() method.
Solution 3:[3]
Hope this easy one liner helps:
cols_as_np = df[df.columns[1:]].to_numpy()
Solution 4:[4]
The best way for converting to Numpy Array is using '.to_numpy(self, dtype=None, copy=False)'. It is new in version 0.24.0.Refrence
You can also use '.array'.Refrence
Pandas .as_matrix deprecated since version 0.23.0.
Solution 5:[5]
Instead of .as_matrix(), use .values, because the first one was deprecated. Here is the contribution:
Solution 6:[6]
The fastest and easiest way is to use .as_matrix(). One short line:
df.iloc[:,[1,2,3]].as_matrix()
Gives:
array([[3, 2, 0.816497],
[0, 'NaN', 'NaN'],
[2, 51, 50.0]], dtype=object)
By using indices of the columns, you can use this code for any dataframe with different column names.
Here are the steps for your example:
import pandas as pd
columns = ['viz', 'a1_count', 'a1_mean', 'a1_std']
index = [0,1,2]
vals = {'viz': ['n','n','n'], 'a1_count': [3,0,2], 'a1_mean': [2,'NaN', 51], 'a1_std': [0.816497, 'NaN', 50.000000]}
df = pd.DataFrame(vals, columns=columns, index=index)
Gives:
viz a1_count a1_mean a1_std
0 n 3 2 0.816497
1 n 0 NaN NaN
2 n 2 51 50
Then:
x1 = df.iloc[:,[1,2,3]].as_matrix()
Gives:
array([[3, 2, 0.816497],
[0, 'NaN', 'NaN'],
[2, 51, 50.0]], dtype=object)
Where x1 is numpy.ndarray.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | 176coding |
| Solution 2 | Suvo |
| Solution 3 | Kamen Tsvetkov |
| Solution 4 | amir |
| Solution 5 | vcg |
| Solution 6 | amc |
