Process each row item and delete the processed row from a DataFrame in Python

I have the following DataFrame:

       number               msg
0  1833181080  this is 1st test
1  1865585030  this is 2nd test
2    11111111         faul test

I want to process each row and drop it from the DataFrame after processing. I can access each row as follows:

import pandas as pd

df = pd.read_csv('test.csv')

print(df)

for row in df.iterrows():
    
    print(row[1].number)
    print(row[1].msg)

Output:

       number               msg
0  1833181080  this is 1st test
1  1865585030  this is 2nd test
2    11111111         faul test
1833181080
this is 1st test
1865585030
this is 2nd test
11111111
faul test

How can I do that?

Desired output:

After the 1st iteration, df would look like this (1st row deleted):

       number               msg
1  1865585030  this is 2nd test
2    11111111         faul test

after 2nd iteration:

2    11111111         faul test

etc



Solution 1:[1]

Firstly, for your initial question: you can delete each row after processing it by reassigning df to everything from row 1 onwards (df.iloc[1:]):

i = 0
while len(df)>0:
    print(df.iloc[0].number)
    print(df.iloc[0].msg)
    df = df.iloc[1:]
    # Create .csv files with fewer and fewer rows.
    i += 1
    name = "test" + str(i) + ".csv"
    df.to_csv(name, index=False)

The .csv files get different names so that they don't overwrite your initial file. You could use name = "test.csv" instead if you want each row to disappear from the original file once it has been processed (or name = "unprocessed.csv" to keep a separate .csv containing only the unprocessed rows).
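
If you want to try the slicing idea without touching any files, here is a minimal sketch. It assumes the sample data from the question built in memory; the processed and remaining lists are hypothetical stand-ins for the real per-row work and are not part of the original code.

```python
import pandas as pd

# Sample data copied from the question, built in memory so no CSV file is needed.
df = pd.DataFrame({
    "number": [1833181080, 1865585030, 11111111],
    "msg": ["this is 1st test", "this is 2nd test", "faul test"],
})

processed = []  # hypothetical stand-in for the real per-row work
remaining = []  # number of rows left after each iteration

while len(df) > 0:
    first = df.iloc[0]  # always handle the current first row
    processed.append((int(first["number"]), first["msg"]))
    df = df.iloc[1:]    # rebind df to everything after that row
    remaining.append(len(df))

print(processed)
print(remaining)  # [2, 1, 0]
```

Because df is rebound to a shorter slice on every pass, the loop ends as soon as no rows are left.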

Following your comment on your answer, you could instead add a "processed" column and set it to "Yes" after printing each row.

df2 = pd.DataFrame({"number": [1833181080, 1865585030, 11111111], "msg": ["this is 1st test", "this is 2nd test", "faul test"]}, index=[0, 1, 2])
df2["processed"] = "No"

while len(df2[df2["processed"] == "No"])>0:
    print(df2[df2["processed"] == "No"].iloc[0].number)
    print(df2[df2["processed"] == "No"].iloc[0].msg)
    df2.loc[df2[df2["processed"] == "No"].iloc[0].name, "processed"] = "Yes"
    print(df2[df2["processed"] == "No"])

#Output:
#1833181080
#this is 1st test
#       number               msg processed
#1  1865585030  this is 2nd test        No
#2    11111111         faul test        No
#1865585030
#this is 2nd test
#     number        msg processed
#2  11111111  faul test        No
#11111111
#faul test
#Empty DataFrame
#Columns: [number, msg, processed]
#Index: []

This filters the DataFrame down to the rows whose "processed" value is "No", then prints the first row's data. The penultimate line then looks up the index label of that row and changes its "processed" value to "Yes", so that the next iteration no longer includes that row when filtering for "No". It uses name because df2[df2["processed"] == "No"].iloc[0] returns a Series whose name is the row's index label:

df2[df2["processed"] == "No"].iloc[0]
#Out: 
#number             1833181080
#msg          this is 1st test
#processed                  No
#Name: 0, dtype: object

Note that I created the DataFrame in code each time. I also tried both approaches with the .csv file instead, and they work as well.
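
The same flag idea can be written with a boolean column so the filter is computed only once per iteration. This is just a sketch under that assumption (True/False instead of the "Yes"/"No" strings above); the handled list is a hypothetical placeholder for the real processing.

```python
import pandas as pd

df2 = pd.DataFrame({
    "number": [1833181080, 1865585030, 11111111],
    "msg": ["this is 1st test", "this is 2nd test", "faul test"],
})
df2["processed"] = False  # boolean flag instead of "Yes"/"No" strings

handled = []  # hypothetical record of the numbers processed so far

while not df2["processed"].all():
    pending = df2.loc[~df2["processed"]]  # rows still waiting to be processed
    label = pending.index[0]              # index label of the first pending row
    handled.append(int(df2.at[label, "number"]))
    df2.at[label, "processed"] = True     # mark the row instead of deleting it

print(handled)
print(df2)
```

No rows are removed here, so df2 keeps all three rows and the flag column tells you which ones are done.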

Solution 2:[2]

I have solved it:

import pandas as pd
from datetime import datetime

df = pd.read_csv('test.csv')

df2=df.copy()


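dd = int(datetime.now().strftime("%H"))  # current hour (0-23)

def deleterows(number):
    # Drop every row of df whose 'number' column equals the given value.
    df.drop(df[df['number'] == number].index, inplace=True)

# Iterate over the copy so that dropping rows from df does not disturb the loop.
for index, row in df2.iterrows():
    n = row.number  # assigned unconditionally so it is always defined below
    m = row.msg
    if dd > 14:
        #print("it's more than 15 and you can start now")
        if dd == 16:
            print("it's 16 now")
    #print("{} is delivered at {}".format(n, datetime.now()))
    deleterows(n)
    print("row: {} is deleted".format(n))
    print(df)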

print(df)
print(df2)

output:

row: 1833181080 is deleted
       number  com               msg
1  1865585030  new  this is 2nd test
2    11111111  new         faul test
row: 1865585030 is deleted
     number  com        msg
2  11111111  new  faul test
row: 11111111 is deleted
Empty DataFrame
Columns: [number, com, msg]
Index: []
Empty DataFrame
Columns: [number, com, msg]
Index: []
       number  com               msg
0  1833181080  new  this is 1st test
1  1865585030  new  this is 2nd test
2    11111111  new         faul test
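
For comparison, here is a small sketch of the same drop-after-processing idea without the time check. It assumes the two-column sample from the question (not the file with the extra com column shown in the output above), and the names original and sizes are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "number": [1833181080, 1865585030, 11111111],
    "msg": ["this is 1st test", "this is 2nd test", "faul test"],
})
original = df.copy()  # untouched copy, like df2 in the answer above

sizes = []  # illustrative: rows left in df after each deletion

# list(df.index) takes a snapshot of the labels before any row is dropped,
# so removing rows inside the loop does not affect the iteration.
for label in list(df.index):
    row = df.loc[label]
    print(row["number"], row["msg"])    # the per-row processing goes here
    df.drop(index=label, inplace=True)  # delete the row that was just processed
    sizes.append(len(df))

print(sizes)  # [2, 1, 0]
```

Dropping by index label rather than by the number value also avoids deleting several rows at once if the same number appears more than once.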

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rawson
Solution 2 Bonomali