Split DataFrame column by content

How can I split this data into separate DataFrames by the letter in the marker column ('A', 'B', ...)? The first column must be retained as the index.

import pandas as pd

df = pd.DataFrame(data)
df = df[['seconds', 'marker', 'data1', 'data2', 'data3']]

seconds,marker,data1,data2,data3
00001,A,3,3,0,42,0
00002,B,3,3,0,34556,0
00003,C,3,3,0,42,0
00004,A,3,3,0,1833,0
00004,B,3,3,0,6569,0
00005,C,3,3,0,2454,0
00006,C,3,3,0,3256,0
00007,C,3,3,0,5423,0
00008,A,3,3,0,569,0


Solution 1:[1]

You can get the unique values in the letter column (that's the name I gave it) and then filter the full DataFrame once for each of those values.

I store the resulting DataFrames in a dictionary keyed by letter, but a list or another container would work too. I used the input you provided, but named the first two columns index and letter because they were unnamed in your .csv.

import pandas as pd

df = pd.DataFrame({
    'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8},
    'letter': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'B', 5: 'C', 6: 'C', 7: 'C', 8: 'A'},
    'seconds': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3},
    'marker': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3},
    'data1': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
    'data2': {0: 42, 1: 34556, 2: 42, 3: 1833, 4: 6569, 5: 2454, 6: 3256, 7: 5423, 8: 569},
    'data3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0}
})

# get unique values
unique_values = df["letter"].unique()
# filter the "big" DataFrame using one unique value at a time
split_dfs = {value: df[df["letter"] == value] for value in unique_values}

print(split_dfs["A"])
print(split_dfs["B"])
print(split_dfs["C"])

Expected output:

   index letter  seconds  marker  data1  data2  data3
0      1      A        3       3      0     42      0
3      4      A        3       3      0   1833      0
8      8      A        3       3      0    569      0
   index letter  seconds  marker  data1  data2  data3
1      2      B        3       3      0  34556      0
4      4      B        3       3      0   6569      0
   index letter  seconds  marker  data1  data2  data3
2      3      C        3       3      0     42      0
5      5      C        3       3      0   2454      0
6      6      C        3       3      0   3256      0
7      7      C        3       3      0   5423      0

As you can see, the original index is preserved in each of the split DataFrames.
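A common alternative to filtering once per unique value is `DataFrame.groupby`, which scans the frame once and yields `(key, sub-DataFrame)` pairs. The sketch below uses a reduced version of the example frame (only the `index`, `letter`, and `data2` columns); it is a swapped-in technique, not the method used above.

```python
import pandas as pd

# Reduced version of the example frame: only three of its columns.
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 4, 5, 6, 7, 8],
    "letter": ["A", "B", "C", "A", "B", "C", "C", "C", "A"],
    "data2": [42, 34556, 42, 1833, 6569, 2454, 3256, 5423, 569],
})

# Each group is a sub-DataFrame that keeps the original row labels.
split_dfs = {letter: group for letter, group in df.groupby("letter")}

print(split_dfs["A"])
```

By default `groupby` sorts the keys. Pass `sort=False` to keep the order in which the letters first appear.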

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Solution 1. Source: Stack Overflow