'How to return columns with with similar names in a dataframe using pandas

Say I have a table that looks something like:

+----------------------------------+-------------------------------------+----------------------------------+
| ExperienceModifier|ApplicationId | ExperienceModifier|RatingModifierId | ExperienceModifier|ActionResults |
+----------------------------------+-------------------------------------+----------------------------------+
|                                  |                                     |                                  |
+----------------------------------+-------------------------------------+----------------------------------+ 

I would like to grab all of the columns that lead with 'ExperienceModifier' and stuff the results of that into its own dataframe. How would I accomplish this with pandas?



Solution 1:[1]

You can try pandas.DataFrame.filter

df.filter(like='ExperienceModifier')

If you want to get columns only contains ExperienceModifier at the beginning.

df.filter(regex='^ExperienceModifier')

Solution 2:[2]

Ynjxsjmh's answer will get all columns that contain "ExperienceModifier". If you literally want columns that start with that string, rather merely contain it, you can do new_df = df[[col for col in df.columns if col[:18] == 'ExperienceModifier']]. If all of the desired columns have | after "ExperienceModifier", you could also do new_df = df[[col for col in df.columns if col.split('|')[0] == 'ExperienceModifier']]. All of these will create a view of the dataframe. If you want a completely separate dataframe, you should copy it, like this: new_df = df[[col for col in df.columns if col.split('|')[0] == 'ExperienceModifier']].copy(). You also might want to create a multi-index by splitting the column names on | rather than creating a separate dataframe.

Solution 3:[3]

The accepted answer does easly the job but I still attach my "hand made version" that works:

import pandas as pd
import numpy as np
import re

lst = [[1, 2, 3, 4],[1, 2, 3, 4],[1, 2, 3, 4]]
column_names = [['ExperienceModifier|ApplicationId', 'ExperienceModifier|RatingModifierId', 'ExperienceModifier|ActionResults','OtherName|ActionResults']] 

data = pd.DataFrame(lst, columns = column_names) 
data


old_and_dropped_dataframes = []
new_dataframes=[]    
for i in np.arange(0,len(column_names[0])):
    column_names[0][i].split("|")
    splits=re.findall(r"[\w']+", column_names[0][i])
    if "ExperienceModifier" in splits:
        new_dataframe = data.iloc[:,[i]]
        new_dataframes.append(new_dataframe)
    else:
        old_and_dropped_dataframe =  data.iloc[:,[i]]
        old_and_dropped_dataframes.append(old_and_dropped_dataframe)



ExperienceModifier_dataframe = pd.concat(new_dataframes,axis=1)
ExperienceModifier_dataframe

OtherNames_dataframe = pd.concat(old_and_dropped_dataframes,axis=1)
OtherNames_dataframe

This script creates two new dataframes starting from the initial dataframe: one that contains the columns whose names start with ExperienceModifier and an other one that contains the columns that do not start with ExperienceModifier.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Acccumulation
Solution 3 Student.py