groupby columns in awk
Hello, I'd like to convert a Python script to awk. How do I do a group-by on a column of a data frame and count the rows in each group?
import pandas as pd
df = pd.read_csv("data.csv")
res0 = df.groupby("genes").agg({'start':'count'}).reset_index()
res0
How can I do this using awk or sh?
Solution 1:[1]
Without more details it's difficult to help you; does this solve your problem?
Minimal reproducible example:
cat test.csv
genes,timepoint,value
P53,1,3.1
P53,2,3.2
P53,3,4.5
P53,4,5.1
P53,5,6.6
TRIM43,1,44
TRIM43,2,50
TRIM43,3,55
TRIM43,4,60
TRIM43,5,67
GAPDH,1,0.1
GAPDH,2,0.1
GAPDH,3,0.1
GAPDH,4,0.1
GAPDH,5,0.1
Run the Python script:
cat test.py
#!/usr/bin/env python3
import pandas as pd
df = pd.read_csv("test.csv")
res0 = df.groupby("genes").agg({'value':'count'}).reset_index()
print(res0)
./test.py
genes value
0 GAPDH 5
1 P53 5
2 TRIM43 5
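Replicate it with awk (the iteration order of "for (i in genes)" is unspecified, so the rows may come out in a different order than in the pandas output):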
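awk 'BEGIN{FS=","; OFS="\t"}    # read comma-separated input, write tab-separated output
NR==1 {print "genes","value"}   # header row: print the output column names
NR>1 {genes[$1]++}              # data rows: add one to the count for the gene in column 1
END {for (i in genes)           # after the last row: print each gene with its count
  print i, genes[i]
}' test.csv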
genes value
GAPDH 5
TRIM43 5
P53 5
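If the rows need to come out in a stable order like the pandas output, one possible variation is to sort the counts outside awk. The sketch below also adds 1 only when the third column is non-empty, which is closer to what pandas' count does (it counts non-missing values). This is only an approximation: it assumes an empty field is the only kind of missing value, whereas pandas also treats strings such as NA as missing by default. The function name count_by_gene is hypothetical, and the heredoc just recreates the sample test.csv from above so the snippet runs on its own.

```shell
# Recreate the sample data from the answer so the sketch is self-contained.
cat > test.csv <<'EOF'
genes,timepoint,value
P53,1,3.1
P53,2,3.2
P53,3,4.5
P53,4,5.1
P53,5,6.6
TRIM43,1,44
TRIM43,2,50
TRIM43,3,55
TRIM43,4,60
TRIM43,5,67
GAPDH,1,0.1
GAPDH,2,0.1
GAPDH,3,0.1
GAPDH,4,0.1
GAPDH,5,0.1
EOF

# Hypothetical helper: group a CSV file ($1) by its first column and
# count the non-empty values in its third column.
count_by_gene() {
  # Print the header separately so that it is not sorted with the data rows.
  printf 'genes\tvalue\n'
  awk -F',' '
    NR > 1 { count[$1] += ($3 != "") }   # adds 1 for a non-empty field, 0 for an empty one
    END    { for (g in count) printf "%s\t%d\n", g, count[g] }
  ' "$1" | LC_ALL=C sort                 # byte-wise sort gives a deterministic row order
}

count_by_gene test.csv
```

With the sample data this should print the header followed by GAPDH, P53 and TRIM43, each with a count of 5, in that order.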
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jared_mamrot |
