'Why is the summarySE function not recognizing my column?

I am attempting to run a for loop that creates a bar plot for each genotype in my dataset, and I am using the summarySE function (http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/) to generate summary stats for my plots.

setwd("C:/Users/aundr/Documents/Salk/Asahina Lab/2022_GSAposter")

#Load in dataset
allData <- read.csv("SimpleNewASMantiTdc2CellCount.csv")

#Convert Genotype column from chr to factor format
allData <- data.frame(Expt = allData$Expt,
                      Genotype = as.factor(allData$Genotype),
                      Slide.position = allData$Slide.position,
                      Region = allData$Region,
                      Labeling = allData$Labeling,
                      Total = allData$Total)

str(allData)

library("plyr")
library("ggplot2")
library("EnvStats")
library("tidyverse")

#Make list of genotype factor levels
Genotype_list <- list(levels(allData$Genotype))

## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
##   data: a data frame.
##   measurevar: the name of a column that contains the variable to be summarized
##   groupvars: a vector containing names of columns that contain grouping variables
##   na.rm: a boolean that indicates whether to ignore NA's
##   conf.interval: the percent range of the confidence interval (default is 95%)

##   My notes: see http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_%28ggplot2%29/
##   I also added max

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
  #library(plyr)
  
  # New version of length which can handle NA's: if na.rm==T, don't count them
  length2 <- function (x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
  }
  
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop=.drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm=na.rm),
                     mean = mean   (xx[[col]], na.rm=na.rm),
                     sd   = sd     (xx[[col]], na.rm=na.rm),
                     maximum=max   (xx[[col]], na.rm=na.rm)
                   )
                 },
                 measurevar
  )
  
  # Rename the "mean" column    
  datac <- rename(datac, c("mean" = measurevar))
  
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval: 
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N-1)
  datac$ci <- datac$se * ciMult
  
  return(datac)
}

#duplicating and factoring regions into order of importance/location in brain
allData1 <- allData
allData1$Region <-factor(allData1$Region,
                   levels = c("ASM","AL1","AL2","VL","VM","Other"))

#for loop generating plots for each genotype 
for(i in Genotype_list) {
  
#calculating mean, se, max
allDataStat <- summarySE(allData1, measurevar="Total", groupvars=c("Labeling","Region"))
allDataStat

#calculating tick marks and max limit

halfMax <- max(allDataStat$maximum)/2
halfMax

evenMaxTick <- 2*ceiling(halfMax)
evenMaxTick

halfMaxTick <- evenMaxTick/2
halfMaxTick

tickMarks <- c(0,halfMaxTick, evenMaxTick)
tickMarks

maxLimit <- evenMaxTick + 0.5
maxLimit

#bar plot
print(ggplot(NULL, aes(Region, Total, fill= Labeling, color = Labeling)) +
  geom_bar(allDataStat, mapping = aes(), position = "dodge", stat = "summary") +  
    scale_fill_manual(values = c("#248491","#C8503B")) + scale_colour_manual(values = c("#248491","#C8503B")) +
      geom_errorbar(allDataStat, mapping = aes(ymin= Total, ymax= Total+se), width=.3, size = .6) +
        theme(axis.ticks.x = element_blank(),panel.background = element_blank(),
          panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
            panel.border = element_blank(), axis.title = element_text(size = 12),
              axis.text = element_text(size = 9)) + labs(x = "Region", y = "Cell number per brain") +
                scale_y_continuous(breaks = c(tickMarks), limits = c(0,maxLimit)) +
                  annotate(geom = "segment",x = 0, xend = 0, y = 0, yend = evenMaxTick, size = .75))

#end of for loop
}

I have used the summarySE function previously to generate a similar plot and did not run into any issues, but am now getting the following error:

Error in stop_subscript(): ! Can't rename columns that don't exist. x Column Total doesn't exist.

It looks like the function is getting caught up on measurevar = "Total" in allDataStat <- summarySE(allData1, measurevar="Total", groupvars=c("Labeling","Region")) allDataStat and not recognizing my 'Total' column, even though it is there:

str(allData)
'data.frame':   300 obs. of  6 variables:
 $ Expt          : chr  "anti Tdc2 SG4 (sim ASM) expt 3" "anti Tdc2 SG4 (sim ASM) expt 3" "anti Tdc2 SG4 (sim ASM) expt 3" "anti Tdc2 SG4 (sim ASM) expt 3" ...
 $ Genotype      : Factor w/ 11 levels "ALK 2-07 (m2p1)",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Slide.position: chr  "L" "L" "L" "L" ...
 $ Region        : chr  "ASM" "VL" "AL1" "AL2" ...
 $ Labeling      : chr  "Tdc2+ SG4+" "Tdc2+ SG4+" "Tdc2+ SG4+" "Tdc2+ SG4+" ...
 $ Total         : int  9 0 1 0 0 8 8 0 2 0 ...

After diving into the error tracing, it looks like the problem is coming from datac <- rename(datac, c("mean" = measurevar)). I don't know why this was not an issue previously and now is, so any help would be much appreciated!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source