Process each row item and delete the processed row from dataframe in python
I have following dataframe:
number msg
0 1833181080 this is 1st test
1 1865585030 this is 2nd test
2 11111111 faul test
I want to process each row and drop it from the DataFrame after processing. I can access each row as shown below:
import pandas as pd
df = pd.read_csv('test.csv')
print(df)
for row in df.iterrows():
    print(row[1].number)
    print(row[1].msg)
Output:
number msg
0 1833181080 this is 1st test
1 1865585030 this is 2nd test
2 11111111 faul test
1833181080
this is 1st test
1865585030
this is 2nd test
11111111
faul test
How can I do that?
Desired output:
after 1st iteration, df would look like this (1st row deleted):
number msg
1 1865585030 this is 2nd test
2 11111111 faul test
after 2nd iteration:
2 11111111 faul test
etc
Solution 1:[1]
First, for your initial question: you can delete each row after processing it by reassigning df to everything from row 1 onwards (df.iloc[1:]):
i = 0
while len(df)>0:
    print(df.iloc[0].number)
    print(df.iloc[0].msg)
    df = df.iloc[1:]
    # Creating .csv files with fewer and fewer rows.
    i += 1
    name = "test" + str(i) + ".csv"
    df.to_csv(name, index=False)
The .csv files have different names so that they don't overwrite your initial file. You could change this to name = "test.csv" if you want each item removed from the original file once it has been processed (or name = "unprocessed.csv" to create a .csv that contains only the unprocessed items).
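As a variation on the loop above, here is a minimal sketch that drops the processed row by its index label instead of slicing with iloc. The function name process_and_drop and the handle_row callback are hypothetical names chosen for illustration, and the DataFrame is built inline rather than read from test.csv so that the example is self-contained.

```python
import pandas as pd


def process_and_drop(df, handle_row):
    """Process rows top to bottom, dropping each one once it has been handled."""
    while not df.empty:
        label = df.index[0]          # index label of the next unprocessed row
        handle_row(df.loc[label])    # hypothetical per-row callback
        df = df.drop(index=label)    # returns a new frame without that row
    return df


frame = pd.DataFrame({
    "number": [1833181080, 1865585030, 11111111],
    "msg": ["this is 1st test", "this is 2nd test", "faul test"],
})

seen = []
remaining = process_and_drop(frame, lambda row: seen.append(int(row["number"])))
print(seen)
print(remaining)
```

Because df.drop returns a new DataFrame, the frame passed in is left untouched; only the local variable inside the function shrinks. This assumes the index labels are unique.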
Following your comment on your answer, you could instead keep every row and set a "processed" column to "Yes" after printing each row:
df2 = pd.DataFrame({"number": [1833181080, 1865585030, 11111111], "msg": ["this is 1st test", "this is 2nd test", "faul test"]}, index=[0, 1, 2])
df2["processed"] = "No"
while len(df2[df2["processed"] == "No"])>0:
    print(df2[df2["processed"] == "No"].iloc[0].number)
    print(df2[df2["processed"] == "No"].iloc[0].msg)
    df2.loc[df2[df2["processed"] == "No"].iloc[0].name, "processed"] = "Yes"
    print(df2[df2["processed"] == "No"])
#Output:
#1833181080
#this is 1st test
# number msg processed
#1 1865585030 this is 2nd test No
#2 11111111 faul test No
#1865585030
#this is 2nd test
# number msg processed
#2 11111111 faul test No
#11111111
#faul test
#Empty DataFrame
#Columns: [number, msg, processed]
#Index: []
This filters the DataFrame to only items with "processed" column value as "No", then prints the first row's data. Then in the penultimate line it matches the index of that row and changes the "processed" value to "Yes", so that the next iteration will not include that row when filtering for "No". It uses name because df2[df2["processed"] == "No"].iloc[0] returns data with name as the index value:
df2[df2["processed"] == "No"].iloc[0]
#Out:
#number 1833181080
#msg this is 1st test
#processed No
#Name: 0, dtype: object
Note that I created the DataFrame each time. I also tried these with the .csv file instead and it also works.
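The same "processed" flag idea can probably be written with less repeated filtering. The sketch below is one possible variant, not the code used above: it collects the index labels of the pending rows once and then flips the flag with df2.at. The names handled, pending_labels and unprocessed are assumptions made for this example.

```python
import pandas as pd

df2 = pd.DataFrame({
    "number": [1833181080, 1865585030, 11111111],
    "msg": ["this is 1st test", "this is 2nd test", "faul test"],
})
df2["processed"] = "No"

handled = []
# Snapshot the labels of the pending rows once, then flip the flag row by row.
pending_labels = df2.index[df2["processed"] == "No"].tolist()
for label in pending_labels:
    handled.append((int(df2.at[label, "number"]), df2.at[label, "msg"]))
    df2.at[label, "processed"] = "Yes"

# Rows that are still waiting to be processed (none at this point).
unprocessed = df2[df2["processed"] == "No"]
print(handled)
print(unprocessed)
```

This keeps every row in the DataFrame, so the processed rows remain available for later inspection, at the cost of filtering whenever only the pending rows are needed.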
Solution 2:[2]
I have solved it:
import pandas as pd
from datetime import datetime
df = pd.read_csv('test.csv')
df2=df.copy()
dd=int(datetime.now().strftime("%H"))
def deleterows(row):
    # Drop every row of df whose 'number' equals the given value.
    df.drop(df[df['number'] == row].index, inplace = True)
for row in df.iterrows():
    m=row[1].msg
    if dd > 14:
        n=row[1].number
        #print("it's more than 15 and you can start now")
        if dd==16:
            print("it's 16 now")
        #print(df)
        #print("{} is delivered at {}".format(row[1].number,datetime.now()))
        deleterows(n)
        print("row: {} is deleted".format(n))
        print(df)
# df is empty once every row has been deleted; df2 still holds the original data.
print(df)
print(df2)
Output:
row: 1833181080 is deleted
number com msg
1 1865585030 new this is 2nd test
2 11111111 new faul test
row: 1865585030 is deleted
number com msg
2 11111111 new faul test
row: 11111111 is deleted
Empty DataFrame
Columns: [number, com, msg]
Index: []
Empty DataFrame
Columns: [number, com, msg]
Index: []
number com msg
0 1833181080 new this is 1st test
1 1865585030 new this is 2nd test
2 11111111 new faul test
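The approach above drops rows from df while looping over that same df. A more defensive variant, sketched here under the assumption that the hour check is not needed, loops over an untouched copy and drops by index label. Dropping by label also avoids removing several rows at once if two rows share the same number. The names original and deleted are hypothetical, and the data is built inline instead of being read from test.csv.

```python
import pandas as pd

df = pd.DataFrame({
    "number": [1833181080, 1865585030, 11111111],
    "msg": ["this is 1st test", "this is 2nd test", "faul test"],
})
original = df.copy()   # untouched copy, also used as the thing we iterate over

deleted = []
# Loop over the copy so that dropping rows from df cannot disturb the iteration.
for row in original.itertuples():
    df = df.drop(index=row.Index)   # remove by index label, not by value
    deleted.append(row.number)
    print("row: {} is deleted".format(row.number))

print(df)
print(original)
```

itertuples is used here instead of iterrows because it exposes each column as an attribute (row.number, row.msg) and the index label as row.Index.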
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rawson |
| Solution 2 | Bonomali |
