For loop to list all row data of each column as a list - pandas

I want to write a for loop that builds a list from each column. There are a lot of columns, so can I use df[i] instead of writing out each column name?

ex:

df = {
    'A': ['apple', 'hello', 'carrot'],
    'B': [4, 5, 6],
    'C': [7, 8, 9]}

for i in df:
    df[i] = list(df.select(df[i]).toPandas()[df[i]])

I want this output:

a: apple, hello, carrot
b: 4,5,6
c: 7,8,9


Solution 1:[1]

From the functions you're using (e.g. toPandas()), it seems like you may be using PySpark, but if so you should make that clear in your question.

I'm going to ignore the PySpark part and assume we're just talking about a Pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':['apple', 'hello', 'carrot'], 'B':[4, 5, 6], 'C':[7, 8, 9] })
>>> df
        A  B  C
0   apple  4  7
1   hello  5  8
2  carrot  6  9

DataFrames have three primary ways to access rows, columns, and cells.

The first way is to index the DataFrame directly with a column name. Example:

>>> df['A']
0     apple
1     hello
2    carrot
Name: A, dtype: object

The second is with .loc[rowindexvalue, colname]. To select the 'A' column, put : in the row position, which tells pandas to select all rows. Example:

>>> df.loc[:, 'A']
0     apple
1     hello
2    carrot
Name: A, dtype: object

The third way is with .iloc[rowindex, colindex]. .iloc accepts only integer positions, not column names. So to select the first column and all rows in our example, you'd do this:

>>> df.iloc[:, 0]
0     apple
1     hello
2    carrot
Name: A, dtype: object
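
Since the question asks whether columns can be reached without typing their names, here is a minimal sketch that checks the three access styles against each other and then loops over columns by integer position with .iloc. The variable names (by_name, columns_by_position, and so on) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'hello', 'carrot'], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# The three access styles should return the same Series for column 'A'.
by_name = df['A']
by_loc = df.loc[:, 'A']
by_iloc = df.iloc[:, 0]
same = by_name.equals(by_loc) and by_name.equals(by_iloc)

# Positional loop: df.shape[1] is the number of columns, so no names are needed.
columns_by_position = [df.iloc[:, i].tolist() for i in range(df.shape[1])]
```

Note that df[0] would look for a column labelled 0, which does not exist here, so positional access has to go through .iloc.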

To convert the result of any of the above examples into a Python list, pass it to list(). Using our first example above, that would be:

>>> list(df['A'])
['apple', 'hello', 'carrot']
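
A Series also has its own conversion methods, tolist() and its alias to_list(). As a small sketch, assuming the same example data, all three spellings should give the same list:

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'hello', 'carrot'], 'B': [4, 5, 6], 'C': [7, 8, 9]})

via_builtin = list(df['A'])      # built-in list() iterates over the Series
via_tolist = df['A'].tolist()    # Series method
via_to_list = df['A'].to_list()  # alias of tolist()
```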

Finally, you can iterate over the columns like this:

>>> for c in df.columns:
...     print(f"{c}: {list(df[c])}")
... 
A: ['apple', 'hello', 'carrot']
B: [4, 5, 6]
C: [7, 8, 9]
>>> 
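
If you want to keep the lists rather than just print them, one option is to collect them into a dict keyed by column name. The sketch below uses DataFrame.to_dict(orient='list') and then formats each entry the way the question asks for (lowercase name, comma-separated values). The names columns_as_lists and lines are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'hello', 'carrot'], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# One dict entry per column: {column name: list of that column's values}.
columns_as_lists = df.to_dict(orient='list')

# Lowercase the column name and join the values with ", ".
lines = [
    f"{name.lower()}: {', '.join(str(v) for v in values)}"
    for name, values in columns_as_lists.items()
]
print("\n".join(lines))
```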

Solution 2:[2]

To obtain the list of column names, one option is:

iterable = df.columns.to_list() 

Then you can iterate through that list that you have just created.
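
As a rough sketch of that iteration, assuming the same example DataFrame used in Solution 1, you could look each column up by name and store its values. DataFrame.items() is shown as an alternative that yields the name and the column together. The names result and result_from_items are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'hello', 'carrot'], 'B': [4, 5, 6], 'C': [7, 8, 9]})

column_names = df.columns.to_list()

# Look each column up by name and collect its values as a list.
result = {}
for name in column_names:
    result[name] = df[name].to_list()

# Alternative: items() yields (column name, Series) pairs directly.
result_from_items = {name: series.to_list() for name, series in df.items()}
```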

Solution 3:[3]

Why loop when you can go functional? With .apply() you can apply a function to each row (axis=1) or each column (axis=0). Given:

import pandas as pd

df = pd.DataFrame({'A':['apple', 'hello', 'carrot'], 'B':[4, 5, 6], 'C':[7, 8, 9] })

You can produce the printed output you asked for with a fairly simple one-liner:


df.apply(lambda col: print(f"{col.name.lower()}: {list(col)}") , axis=0)

Of course, you are free to pass any function you like. A lambda is nothing more than a shortcut that saves you from defining a named function first. If you are not familiar with lambdas, you can define the function explicitly instead, which is equivalent to the example above:

def print_column(c):
    print(f"{c.name.lower()}: {list(c)}")

df.apply(print_column, axis=0)

BTW: c here and col above are just the function's parameter names. pandas passes each column (or row) to the function as a Series, so you can name the parameter anything you like and work with it like any other pandas Series.
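
Because the functions above only print, apply() gets None back for every column. If you want the values themselves, a variation is to return something from the function. The sketch below, with a hypothetical helper join_values, returns one comma-separated string per column, which apply() should collect into a Series indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'hello', 'carrot'], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def join_values(col):
    # col is a Series holding one column; convert each value to str and join.
    return ", ".join(str(v) for v in col)

# axis=0 calls join_values once per column.
joined = df.apply(join_values, axis=0)

for name, text in joined.items():
    print(f"{name.lower()}: {text}")
```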

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3