How to write a for-loop/if-statement for a dataframe (integer) column
I have a dataframe with a column of integers that represent birth years. Each row has 20xx or 19xx in it, but some rows have only the xx part.
What I want to do is add 19 in front of the numbers that have only 2 digits if the integer is bigger than 22 (starting from 0), and add 20 in front of those that are smaller than or equal to 22.
This is what I wrote:
for x in DF.loc[DF["Year"] >= 2022]:
x + 1900
if:
x >= 22
else:
x + 2000
You can also change the code completely, I would just like you to maybe explain what exactly your code does.
Thanks to everybody who takes the time to answer this.
Solution 1:[1]
Instead of iterating through the rows, use where to change the whole column:
y = df["Year"] # just to save typing
df["Year"] = y.where(y > 99, (y + 1900).where(y > 22, y + 2000))
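or boolean indexing (this is chained assignment, which pandas may flag with a SettingWithCopyWarning and which does not modify df when copy-on-write is enabled, so the loc version below is the safer choice):
df["Year"][df["Year"].between(0, 22)] += 2000
df["Year"][df["Year"].between(23, 99)] += 1900
or loc:
df.loc[df["Year"].between(0, 22), "Year"] += 2000
df.loc[df["Year"].between(23, 99), "Year"] += 1900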
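As a quick check, here is a minimal sketch that runs the where and loc variants on a small made-up column and compares the results. The sample values are an assumption for illustration only, and the cut-off follows the question (22 and below become 20xx, 23 to 99 become 19xx).

```python
import pandas as pd

# Hypothetical sample data: a mix of four-digit and two-digit birth years.
df = pd.DataFrame({"Year": [2002, 95, 1998, 3, 56, 1947, 22]})

# Variant A: nested where. Series.where keeps a value where the condition
# is True and takes the replacement where it is False.
y = df["Year"]
via_where = y.where(y > 99, (y + 1900).where(y > 22, y + 2000))

# Variant B: boolean masks with loc, applied to a copy so df stays untouched.
# between() includes both end points by default.
via_loc = df.copy()
via_loc.loc[via_loc["Year"].between(0, 22), "Year"] += 2000
via_loc.loc[via_loc["Year"].between(23, 99), "Year"] += 1900

print(via_where.tolist())
print(via_loc["Year"].tolist())
```

Both variants should print the same list, with the four-digit years left unchanged.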
Solution 2:[2]
You can do it in one line with the apply method.
Example:
df = pd.DataFrame({'date': [2002, 95, 1998, 3, 56, 1947]})
print(df)
date
0 2002
1 95
2 1998
3 3
4 56
5 1947
Then:
df['date'] = df['date'].apply(lambda x: x + 1900 if 22 < x < 100 else (x + 2000 if x <= 22 else x))
print(df)
date
0 2002
1 1995
2 1998
3 2003
4 1956
5 1947
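If the one-line lambda is hard to read, the same rule can be written as a small named function and passed to apply. This is only a sketch: the function name expand_year and its pivot parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

def expand_year(year, pivot=22):
    """Return a four-digit year (hypothetical helper, not a pandas function).

    Values above 99 are treated as already complete. Two-digit values above
    the pivot become 19xx, the rest become 20xx.
    """
    if year > 99:
        return year
    return year + 1900 if year > pivot else year + 2000

df = pd.DataFrame({"date": [2002, 95, 1998, 3, 56, 1947]})

# apply calls expand_year once per value and builds a new column from the results.
df["date"] = df["date"].apply(expand_year)
print(df["date"].tolist())
```

Keeping the cut-off in a parameter means it can be changed in one place if the data is extended later.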
Solution 3:[3]
It is basically what you did, an if inside a for:
new_list_of_years = []
for year in DF["Year"]:
    if year > 99:  # already a four-digit year, keep it as it is
        full_year = year
    else:
        full_year = year + 1900 if year > 22 else year + 2000
    new_list_of_years.append(full_year)
DF["Year"] = new_list_of_years
Edit: You can also do that with a list comprehension:
DF["Year"] = [year if year > 99 else (year + 1900 if year > 22 else year + 2000) for year in DF["Year"]]
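For context on why the loop in the question has no effect, here is a small sketch with made-up values. It shows that iterating over a DataFrame yields column labels, that iterating over one column yields its values, and that a bare expression such as year + 1900 computes a result without storing it anywhere.

```python
import pandas as pd

# Hypothetical sample data for illustration.
DF = pd.DataFrame({"Year": [95, 3, 2002]})

# Iterating over a DataFrame (or a DataFrame slice) yields its column labels.
labels = list(DF)
print(labels)

# Iterating over a single column yields the values in that column.
values = list(DF["Year"])
print(values)

# A bare expression inside the loop is evaluated and then thrown away,
# so the column is unchanged afterwards.
for year in DF["Year"]:
    year + 1900
print(DF["Year"].tolist())
```

This is why the versions above collect the results in a list and assign that list back to the column.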
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | gioarma |
| Solution 3 | |
