groupby columns in awk

Hello, I'd like to convert a Python script to awk. How can I do a group-by on a column of a data frame?

import pandas as pd
df = pd.read_csv("data.csv") 
res0 = df.groupby("genes").agg({'start':'count'}).reset_index()
res0

How can I do this using awk or sh?



Solution 1:[1]

Without more details it's difficult to help you; does this solve your problem?

Minimal reproducible example:

cat test.csv
genes,timepoint,value
P53,1,3.1
P53,2,3.2
P53,3,4.5
P53,4,5.1
P53,5,6.6
TRIM43,1,44
TRIM43,2,50
TRIM43,3,55
TRIM43,4,60
TRIM43,5,67
GAPDH,1,0.1
GAPDH,2,0.1
GAPDH,3,0.1
GAPDH,4,0.1
GAPDH,5,0.1

Run the Python script

cat test.py
#!/usr/bin/env python3

import pandas as pd
df = pd.read_csv("test.csv")
res0 = df.groupby("genes").agg({'value':'count'}).reset_index()
print(res0)

./test.py
    genes  value
0   GAPDH      5
1     P53      5
2  TRIM43      5

Replicate it with awk

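awk 'BEGIN{FS=","; OFS="\t"}            # read comma-separated, write tab-separated
     NR==1 {print "genes","value"}      # print a header in place of the input header
     NR>1 {genes[$1]++}                 # count data rows per gene (column 1)
     END {for (i in genes)              # iteration order is unspecified in awk
              print i, genes[i]
     }' test.csv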
genes   value
GAPDH   5
TRIM43  5
P53     5
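
Two differences from the pandas result are worth noting. The rows above come out in a different order, because awk's `for (i in genes)` loop has no defined order, whereas `groupby` sorts the group keys by default. Also, pandas `count` only counts non-missing cells, while `genes[$1]++` counts every row. Here is a minimal sketch that handles both, assuming the group key is column 1, the counted column is column 3, and fields never contain quoted commas. The file name `sample.csv` and the function name `count_by_gene` are made up for illustration:

```shell
# Hypothetical sample file with one empty "value" cell to show the difference.
cat > sample.csv <<'EOF'
genes,timepoint,value
P53,1,3.1
P53,2,
TRIM43,1,44
GAPDH,1,0.1
GAPDH,2,0.1
EOF

count_by_gene() {
    # $1: comma-separated file with a header row.
    # Column 1 is the group key; column 3 is the column being counted.
    printf 'genes\tvalue\n'
    awk -F, -v OFS='\t' '
        NR > 1 {
            count[$1] += 0             # register the group even when the cell is empty
            if ($3 != "") count[$1]++  # count only non-empty cells
        }
        END { for (g in count) print g, count[g] }
    ' "$1" | LC_ALL=C sort             # sort the body so the row order is deterministic
}

count_by_gene sample.csv
```

The header is printed separately so that `sort` only touches the data rows. For CSV files with quoted fields, a plain `-F,` split is not enough and a real CSV parser is the safer choice.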

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jared_mamrot