'Sum all datas in those rows with matching keywords in pandas dataframe using str.contains

Find the 'awarded_amt' by keywords in each of the keywords listed.

The original 'awarded_amt' (column name) has more than 20000 numeric datas. There is another column 'description' in the dataframe has data type as strings. Now we need to find the keywords from 'description', sum up all the rows on 'awarded_amt' with the same keywords found in 'description'. I am tasked to use str.contains to get the solution. Please help!

keywords_list = ['insurance','aircon','food','event management','transport','furniture']

Output (in DataFrame format):

Output

My wrong coding:

keywords_ser = pd.Series(['insurance','aircon','food','event management','transport','furniture'])
keywords_df = pd.DataFrame(keywords_ser,columns=['keywords'])
keywords_list = ['insurance','aircon','food','event management','transport','furniture']

for keyword in keywords_list:
    pd.options.display.float_format = '{:,.0f}'.format 
    bool_finding = gebiz_df['tender_description'].str.contains(keyword)
    total_df = gebiz_df[['awarded_amt']][bool_finding].sum()
    df = pd.concat([keywords_df,total_df]) 
    display(df)  

Wrong output: enter image description here

Amended coding (total value is correct now but output not right):

keywords_ser = pd.Series(['insurance','aircon','food','event management','transport','furniture'])
keywords_df = pd.DataFrame(keywords_ser,columns=['keywords'])
keywords_list = ['insurance','aircon','food','event management','transport','furniture']

for keyword in keywords_list:
    pd.options.display.float_format = '{:,.0f}'.format
    sum_ser = gebiz_df['awarded_amt'][gebiz_df['tender_description'].str.lower().str.contains(keyword)]
    sum_df = pd.DataFrame(sum_ser,columns=['awarded_amt'])
    df1 = pd.DataFrame(sum_df.sum(),columns=['awarded_amt'])
    df2 = pd.concat([keywords_df,df1]) 
    display(df2)

Revised Output (instead of 1 dataframetable, it yielded 6 dataframes: enter image description here

Finally, I managed to crack the code!

pd.options.display.float_format = '{:,.0f}'.format
keywords_list = ['insurance','aircon','food','event management','transport','furniture']
keywords_df = pd.DataFrame(keywords_list,columns=['keywords'])
sum_list = []

for keyword in keywords_list:
    sum_ser = gebiz_df['awarded_amt'][gebiz_df['tender_description'].str.lower().str.contains(keyword)].sum()
    sum_list.append(sum_ser)
    
sum_df = pd.DataFrame(sum_list, columns=['awarded_amt'] )
final_df = pd.concat([keywords_df,sum_df], axis=1) 
display(final_df)


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source