How to create variables by searching for characters (a substring) in a string variable, with a loop and an if statement

Hi, everyone. This variable holds alphanumeric strings. If a value contains the character 'm' it is in millions, and if it contains 'Th.' it is in thousands.

df['mkt_value']

0       €15.00m
1        €1.00m
2       €100Th.
3        €3.00m
4        €900Th.
5        Free
 

I intend to i) identify whether each string is in millions (m) or thousands (Th.) by creating a dummy variable, and then ii) use this dummy to build a new numeric variable in which millions are expressed in thousands.

# Desired output
df['mi']

0       15000
1        1000
2         100
3        3000
4         900
5         nan
 

So I first do some setup, then create the dummy with a loop, and finally create the integer variable in thousands:

m = 'm'
th = 'Th'
dtype = {"money": "category"}
l_MKV = df['mkt_value'].tolist()
df['mi'] = df['mkt_value'].str.strip('mTh.€')

#loop for var dummy
for x in l_MKV:
    if m in x:
        df["money"]= 1
else:
       df["money"]= 0

# var integer for thousands: 1 million , 0 thousand
if df["money"] == 1:
        df["miles"] = int(df['mi']) * 100
        else:
                ALL['mi']

The loop (for the dummy variable) is not working. I get:

df["money"]

0       0
1       0
2       0
3       0
4       0

And I get a syntax error for the integer variable, without further detail.

What have I missed?

Thanks for any help.



Solution 1:[1]

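The issue with your code is the way you try to modify a single row of a Series. For example, df["money"] = 0 assigns 0 to every row, not only to the row the loop is currently on. In addition, as posted, the else is aligned with the for rather than with the if, so it runs once after the loop finishes and resets the whole column to 0, which is the output you see.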
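To see this behaviour in isolation, here is a minimal sketch. The sample frame is rebuilt from the values shown in the question (an assumption about the real data), and `str.contains` is just one possible way to build the dummy without a loop.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real column).
df = pd.DataFrame(
    {"mkt_value": ["€15.00m", "€1.00m", "€100Th.", "€3.00m", "€900Th.", "Free"]}
)

# Assigning a scalar overwrites the whole column, not a single row.
df["money"] = 1
all_ones = df["money"].tolist()

# A vectorized test produces one value per row instead.
# regex=False makes "m" a literal substring rather than a pattern.
df["money"] = df["mkt_value"].str.contains("m", regex=False).astype(int)
print(df)
```

Note that a 0 in this dummy means "no 'm' found", so "Free" gets the same flag as the thousands rows and would need separate handling.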

Rather than messing around with dummy columns, I would create a separate function for parsing the values and use DataFrame.apply():

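import pandas as pd

def parse_amount(x):
    # Multiplier for each unit suffix; the result is in euros.
    factor = {
        'm': 1000000,
        'Th': 1000
        }
    # Drop the currency sign and the unit suffix, leaving the number.
    s = x['mkt_value'].strip('mTh.€')
    try:
        number = float(s)
        for f in factor.keys():
            if f in x['mkt_value']:
                number *= factor[f]
        return number
    except ValueError:
        # Non-numeric entries such as 'Free' end up here.
        return 0

# Sample data from the question
df = pd.DataFrame({'mkt_value': ['€15.00m', '€1.00m', '€100Th.', '€3.00m', '€900Th.', 'Free']})

# axis=1 passes each row to parse_amount
df['money'] = df.apply(parse_amount, axis=1)
print(df)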
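If the goal is exactly the output shown in the question (values in thousands, with NaN for "Free"), a vectorized variant may be simpler. The following is a sketch under assumptions: the regular expression and the group names `number` and `unit` are illustrative choices, and every value is assumed to look like the samples in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real column).
df = pd.DataFrame(
    {"mkt_value": ["€15.00m", "€1.00m", "€100Th.", "€3.00m", "€900Th.", "Free"]}
)

# Capture the numeric part and the unit suffix ("m" or "Th") in one pass.
parts = df["mkt_value"].str.extract(r"(?P<number>\d+(?:\.\d+)?)(?P<unit>m|Th)")

# Multipliers expressed in thousands: 1 million = 1000 thousand.
factor = parts["unit"].map({"m": 1000, "Th": 1})

# Rows that do not match the pattern (e.g. "Free") stay NaN.
df["mi"] = pd.to_numeric(parts["number"]) * factor
print(df)
```

The column is float rather than integer because NaN cannot be stored in a plain integer column.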

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1