'Sum all datas in those rows with matching keywords in pandas dataframe using str.contains
Find the 'awarded_amt' by keywords in each of the keywords listed.
The original 'awarded_amt' (column name) has more than 20000 numeric datas. There is another column 'description' in the dataframe has data type as strings. Now we need to find the keywords from 'description', sum up all the rows on 'awarded_amt' with the same keywords found in 'description'. I am tasked to use str.contains to get the solution. Please help!
keywords_list = ['insurance','aircon','food','event management','transport','furniture']
Output (in DataFrame format):

My wrong coding:
keywords_ser = pd.Series(['insurance','aircon','food','event management','transport','furniture'])
keywords_df = pd.DataFrame(keywords_ser,columns=['keywords'])
keywords_list = ['insurance','aircon','food','event management','transport','furniture']
for keyword in keywords_list:
pd.options.display.float_format = '{:,.0f}'.format
bool_finding = gebiz_df['tender_description'].str.contains(keyword)
total_df = gebiz_df[['awarded_amt']][bool_finding].sum()
df = pd.concat([keywords_df,total_df])
display(df)
Wrong output: enter image description here
Amended coding (total value is correct now but output not right):
keywords_ser = pd.Series(['insurance','aircon','food','event management','transport','furniture'])
keywords_df = pd.DataFrame(keywords_ser,columns=['keywords'])
keywords_list = ['insurance','aircon','food','event management','transport','furniture']
for keyword in keywords_list:
pd.options.display.float_format = '{:,.0f}'.format
sum_ser = gebiz_df['awarded_amt'][gebiz_df['tender_description'].str.lower().str.contains(keyword)]
sum_df = pd.DataFrame(sum_ser,columns=['awarded_amt'])
df1 = pd.DataFrame(sum_df.sum(),columns=['awarded_amt'])
df2 = pd.concat([keywords_df,df1])
display(df2)
Revised Output (instead of 1 dataframetable, it yielded 6 dataframes:

Finally, I managed to crack the code!
pd.options.display.float_format = '{:,.0f}'.format
keywords_list = ['insurance','aircon','food','event management','transport','furniture']
keywords_df = pd.DataFrame(keywords_list,columns=['keywords'])
sum_list = []
for keyword in keywords_list:
sum_ser = gebiz_df['awarded_amt'][gebiz_df['tender_description'].str.lower().str.contains(keyword)].sum()
sum_list.append(sum_ser)
sum_df = pd.DataFrame(sum_list, columns=['awarded_amt'] )
final_df = pd.concat([keywords_df,sum_df], axis=1)
display(final_df)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
