'Python: strip pair-wise column names
I have a DataFrame with columns that look like this:
df=pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])
df:
(NYSE_close, close) (NYSE_close, open) (NYSE_close, volume) (NASDAQ_close, close) (NASDAQ_close, open) (NASDAQ_close, volume)
I want to remove everything after the underscore and append whatever comes after the comma to get the following:
df:
NYSE_close NYSE_open NYSE_volume NASDAQ_close NASDAQ_open NASDAQ_volume
I tried to strip the column name but it replaced it with nan. Any suggestions on how to do that?
Thank you in advance.
Solution 1:[1]
You could use re.sub to extract the appropriate parts of the column names to replace them with:
import re
df=pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])
df.columns = [re.sub(r'\(([^_]+_)\w+, (\w+)\)', r'\1\2', c) for c in df.columns]
Output:
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
Solution 2:[2]
You could:
import re
def cvt_col(x):
s = re.sub('[()_,]', ' ', x).split()
return s[0] + '_' + s[2]
df.rename(columns = cvt_col)
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
Solution 3:[3]
Use a list comprehension, twice:
step1 = [ent.strip('()').split(',') for ent in df]
df.columns = ["_".join([left.split('_')[0], right.strip()])
for left, right in step1]
df
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | jch |
| Solution 3 | sammywemmy |
