Pandas - replace multiple characters from rows in a dataframe
I have addresses stored in the "address" column of a store dataframe. I would like to create a new column that applies the following corrections to the existing addresses:
{"ST": "STREET",
"RD": "ROAD",
"AVE": "AVENUE",
"N": "NORTH",
"W": "WEST",
"S": "SOUTH",
"E": "EAST",
"STE": "SUITE",
"HWY": "HIGHWAY",
"DR": "DRIVE",
"NW": "NORTH WEST",
"NE": "NORTH EAST",
"SW": "SOUTH WEST",
"SE": "SOUTH EAST",
"LN": "LANE",
"WAY": "WAY"}
How should I move forward with this?
Expected output:
101 ST LN -> 101 STREET LANE
Here is R code that does the same for the compass directions:
terms <- c("W","WEST","E","EAST","N","NORTH","S","SOUTH")
terms <- split(terms,rep(1:2,times = length(terms) / 2))
terms[[1]] <- paste0("\\b",terms[[1]],"(\\.|\\b|\\,)")
terms[[1]]
stri_replace_all_regex(data$address,pattern = terms[[1]], replacement = terms[[2]],vectorize_all = FALSE)
Solution 1:[1]
Late to the party, but I had the same mapping dictionary and used the following function. I avoided regex because it is difficult to explain to business folks.
add_abv is the dictionary from the question.
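def add_abv_rep(string):
    # Expand every space-separated token found in add_abv; keep the rest as-is.
    str_ex = []
    # Upper-case first so that lower-case input such as "st" also matches the keys.
    for i in string.upper().split(' '):
        if i in add_abv:
            str_ex.append(add_abv[i])
        else:
            str_ex.append(i)
    return ' '.join(str_ex)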
Finally, apply this function to the series.
df['Address'] = df['Address'].apply(add_abv_rep)
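The token-based function above only matches tokens that are exactly equal to a key, so "ST." or "ST," would be left alone, whereas the R snippet in the question also accepts a trailing period or comma. If regex is acceptable, a rough Python counterpart could look like the sketch below. It is not the answerer's method. The names ADD_ABV and expand_abbreviations are made up for illustration, the identity entry "WAY": "WAY" is left out because it changes nothing, and the choice to swallow a trailing period (but keep a comma) is an assumption about the desired output.

```python
import re

# Mapping from the question, minus the no-op "WAY": "WAY" entry.
ADD_ABV = {
    "ST": "STREET", "RD": "ROAD", "AVE": "AVENUE",
    "N": "NORTH", "W": "WEST", "S": "SOUTH", "E": "EAST",
    "STE": "SUITE", "HWY": "HIGHWAY", "DR": "DRIVE",
    "NW": "NORTH WEST", "NE": "NORTH EAST",
    "SW": "SOUTH WEST", "SE": "SOUTH EAST",
    "LN": "LANE",
}

# One alternation of all keys, longest first, wrapped in word boundaries so
# that "ST" never matches inside "STONE". An optional trailing "." is
# consumed together with the abbreviation (assumption, see lead-in).
_KEYS = sorted(ADD_ABV, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in _KEYS) + r")\b\.?"
)


def expand_abbreviations(address):
    """Upper-case an address string and expand whole-word abbreviations."""
    return _PATTERN.sub(lambda m: ADD_ABV[m.group(1)], address.upper())


print(expand_abbreviations("101 ST LN"))  # 101 STREET LANE
```

To get the new column the question asks for, something like df["address_clean"] = df["address"].map(expand_abbreviations, na_action="ignore") should work, where address_clean is a hypothetical column name and na_action="ignore" skips missing values instead of calling the function on them.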
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Akshay Gupta |
