How to calculate median per group in big dataframe?
I imported a CSV file from my desktop and first applied a cube root transformation. I would now like to calculate the median per group. The table looks like this (i.e. 1 character column + 149 numeric columns):
My code is as follows:
# Import data frame
df <- read.csv("C:/Users/yinc1/Desktop/test_R.csv", header = TRUE)
head(df)
# Perform cube root transformation
df_cube_root <- cbind(df[1], df[,2:149]^(1/3))
df_cube_root
# Take the median per group
library(dplyr)
df_cube_root %>%
group_by(Group) %>%
summarize(Med_per_group = median(as.numeric(df_cube_root)))
medium_per_group
When I run this chunk of code, it fails with:
Column `Group` is not found
How should I change the code?
Here is the output of dput(df[1:20,1:5]):
structure(list(ï..Group = c("12AD_F", "12AD_F", "12AD_F", "12AD_F",
"12AD_F", "12AD_F", "12AD_F", "12AD_F", "12AD_M", "12AD_M", "12AD_M",
"12AD_M", "12AD_M", "12AD_M", "12AD_M", "12AD_M", "12AD_M", "12WT_F",
"12WT_F", "12WT_F"), ATG_PE16.0_22.5 = c(0.02084415, 0.170488266,
0.032702913, 0.040343933, 0.043272897, 0.051219846, 0.027884681,
0.064906247, 0.053067268, 0.077767998, 0.140123352, 0.080211375,
0.101552477, 0.112449923, 0.064881822, 0.090127597, 0.06552084,
0.054710809, 0.050431982, 0.0427724), Phosphatidylethanolamine..16.0_16.0. = c(0.03193568,
0.109490593, 0.043206657, 0.041405041, 0.057716584, 0.052915294,
0.035309818, 0.058013016, 0.041524004, 0.049855731, 0.089821153,
0.059233229, 0.093928705, 0.046509845, 0.04415071, 0.065380665,
0.057015153, 0.048773789, 0.045162392, 0.053227769), Phosphatidylethanolamine..16.0_16.1. = c(0.00468836,
0.048401312, 0.011536122, 0.007585562, 0.011125738, 0.01611854,
0.010694161, 0.014169938, 0.009804969, 0.013677251, 0.039440742,
0.011876313, 0.02945088, 0.022079965, 0.01218537, 0.011572354,
0.011805721, 0.016142917, 0.005502517, 0.007498949), Phosphatidylethanolamine..16.0_18.1. = c(0.094810122,
0.682954208, 0.17729218, 0.128228583, 0.232379304, 0.214266287,
0.176554235, 0.213738303, 0.133376481, 0.179952952, 0.591409132,
0.238990631, 0.46130994, 0.274109352, 0.189368586, 0.203585912,
0.231057661, 0.340129184, 0.173689226, 0.211021092)), row.names = c(NA,
20L), class = "data.frame")
Solution 1:[1]
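I can see the issue in your dput(). The first column is not called Group, but rather ï..Group, so group_by(Group) cannot find it. The extra characters are most likely a UTF-8 byte-order mark at the start of the CSV file that read.csv() turned into part of the column name.
Use rename() to rename the first column by position: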
df <- df %>% rename("Group" = 1)
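Once the column is renamed, the rest of the pipeline can be sketched as below. This goes beyond the answer above and is only one possible approach: it assumes dplyr 1.0.0 or later (for across() and where()), and the small data frame with columns lipid_a and lipid_b is a made-up stand-in for the real 149 numeric columns. It applies the transformation and the median to every numeric column rather than passing the whole data frame to median(), which is what the summarize() call in the question attempts.

```r
library(dplyr)

# Hypothetical stand-in for the imported data; the first column name
# mimics the mangled "ï..Group" header seen in the dput() output.
df <- data.frame(
  grp     = c("12AD_F", "12AD_F", "12AD_M", "12AD_M", "12AD_M"),
  lipid_a = c(0.008, 0.027, 0.064, 0.125, 0.216),
  lipid_b = c(0.001, 0.008, 0.027, 0.064, 0.125)
)
names(df)[1] <- "\u00ef..Group"

# Rename the first column by position, as suggested in the answer
df <- df %>% rename(Group = 1)

medians_per_group <- df %>%
  # Cube root transformation of every numeric column
  mutate(across(where(is.numeric), ~ .x^(1/3))) %>%
  group_by(Group) %>%
  # One median per group for each numeric column
  summarise(across(where(is.numeric), median), .groups = "drop")

medians_per_group
```

If the prefix does come from a byte-order mark, reading the file with read.csv(..., fileEncoding = "UTF-8-BOM") may avoid the problem at import time, so that no rename is needed.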
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ben Norris |

