Averaging control group conditions into a new output column that contains the associated control average values

Not new, but still a beginner at R.

This is a snippet of my data (the numbers are randomised, so the Average columns are not correct in this example).

> head(data)
                  1          2          3   Average   HKAverage        dC
Neg CNTRL        NA         NA         NA        NA          NA        NA
NEG CNTRL        NA         NA         NA        NA  0.80393767        NA
POS CNTRL 0.1836139 0.11392904 0.02925255 0.1089318  0.72559250 0.6165367
WT 1      0.5091585 0.15929057 0.51686195 0.3951037  0.26582395 0.5877941
WT 2      0.1924527 0.05267426 0.77929719 0.3414747  0.48798007 0.2600975
WT AA 1   0.2525962 0.97503047 0.62913683 0.6189212  0.03930599 0.9048247
> tail(data)
                  1         2         3   Average   HKAverage         dC
T AB 4    0.3425330 0.1698632 0.3100509 0.2741490   0.2312321 0.39589730
T C 1     0.8170886 0.8202081 0.1487331 0.5953433   0.1268834 0.99938496
T C 2     0.4374555 0.1926919 0.2847973 0.3049816   0.8647057 0.00970199
T C 3     0.3194017 0.2683773 0.8150882 0.4676224   0.8750478 0.73646663
T C 4     0.1091098 0.1547485 0.9696392 0.4111658   0.9897441 0.18335950
Pos CNTRL        NA        NA        NA        NA          NA         NA

I'm doing some calculations with these values, and each result is added to the data frame as a new column. Before running any calculations, I run this:

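data <- as.data.frame(input.data)
data[data == "Undetermined"] <- NA                                # "Undetermined" calls become NA
data[] <- lapply(data, function(x) as.numeric(as.character(x)))  # as.character() avoids factor level codes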
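A caution about the conversion step, sketched under an assumption: if the columns of input.data are factors (the default for read.csv() before R 4.0.0, or whenever stringsAsFactors = TRUE), as.numeric() returns the internal level codes rather than the values. The hypothetical column `x` below shows the difference; converting through as.character() first gives the actual numbers.

```r
# Hypothetical column as read.csv() would produce it with stringsAsFactors = TRUE
x <- factor(c("0.5", "Undetermined", "0.25"))
x[x == "Undetermined"] <- NA   # same replacement as in the question

as.numeric(x)                  # 2 NA 1: the level codes, not the values
as.numeric(as.character(x))    # 0.50 NA 0.25: the actual numbers
```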

Ignoring the four CNTRL rows (I should probably just remove them), there are WT and T rows for the same conditions. Each condition is repeated 2 or 4 times (hence WT 1, WT 2, T 1, T 2, etc.). I want to make a new column that contains the average of each WT condition, and in the rows of the matching T condition I want that same WT average to show up.
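Averaging by condition first needs a grouping key, which can be derived from the row names. This is a minimal sketch under two assumptions: the replicate number is always the trailing digits of a row name, and the pairing of T conditions with their WT controls (T V with WT, T AA with WT AA, as in the example output further down) is supplied by hand through a hypothetical lookup vector `control.for`.

```r
# Row names in the style of the question (control rows left out)
ids <- c("WT 1", "WT 2", "WT AA 1", "WT AA 2", "T V1", "T V2", "T AA 1", "T AA 2")

# Condition = row name minus its trailing replicate number ("T V1" -> "T V")
condition <- sub("\\s*[0-9]+$", "", ids)

# Hypothetical lookup: which WT condition is the control for each T condition
control.for <- c("T V" = "WT", "T AA" = "WT AA")

# WT rows are their own control; T rows are mapped through the lookup
control <- ifelse(condition %in% names(control.for),
                  control.for[condition], condition)
```

Rows that share a value in `control` are exactly the rows that should receive the same ControlAv.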

This is an example of the output I want (Av means average):

> head(newdata)
          X        X1         X2         X3   Average  HKAverage        dC ControlAv
1 Neg CNTRL        NA         NA         NA        NA         NA        NA        NA
2 NEG CNTRL        NA         NA         NA        NA 0.80393767        NA        NA
3 POS CNTRL 0.1836139 0.11392904 0.02925255 0.1089318 0.72559250 0.6165367        NA
4      WT 1 0.5091585 0.15929057 0.51686195 0.3951037 0.26582395 0.5877941   WT1:2Av
5      WT 2 0.1924527 0.05267426 0.77929719 0.3414747 0.48798007 0.2600975   WT1:2Av
6   WT AA 1 0.2525962 0.97503047 0.62913683 0.6189212 0.03930599 0.9048247 WTAA1:4Av
> tail(newdata)
        X        X1        X2          X3   Average HKAverage        dC ControlAv
10   T V1 0.4568928 0.5566606 0.610042142 0.5411985 0.8372219 0.9200497   WT1:2Av
11   T V2 0.8633715 0.3191596 0.483468638 0.5553332 0.8860817 0.9486309   WT1:2Av
12 T AA 1 0.1587924 0.2986826 0.005692643 0.1543892 0.1064064 0.7750263 WTAA1:4Av
13 T AA 2 0.3665066 0.9289861 0.143083833 0.4795255 0.4543861 0.9992564 WTAA1:4Av
14 T AA 3 0.5580805 0.4041877 0.411612593 0.4579603 0.8457465 0.9380688 WTAA1:4Av
15 T AA 4 0.8149501 0.1642240 0.229479382 0.4028845 0.7638992 0.6026836 WTAA1:4Av

I'm currently trying to use within(), without success:

> data$wt.av <- within(data, mean(dC["WT 1" & "WT 2"]))
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'mean': operations are possible only for numeric, logical or complex types

My data frame is numeric, but the row names obviously are not.
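The error itself comes from `"WT 1" & "WT 2"`: `&` is a logical AND, and applying it to two strings raises exactly this "operations are possible only for numeric, logical or complex types" message. Two further points apply even once that is fixed: inside within(), `dC` is a plain vector without the row names, so `dC["WT 1"]` would return NA; and within() returns the modified data frame, not the value of the expression. A minimal sketch of selecting rows by name instead, using a hypothetical three-row data frame built from the question's dC values:

```r
# Hypothetical toy data with the same row-name style as the question
data <- data.frame(dC = c(0.5877941, 0.2600975, 0.9048247),
                   row.names = c("WT 1", "WT 2", "WT AA 1"))

# Select rows by name with a character vector, then take the mean of dC
wt.av <- mean(data[c("WT 1", "WT 2"), "dC"])
```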

aggregate() doesn't work here either, because the row names within a condition are not identical (WT 1 vs WT 2).
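aggregate() does work once the condition is stored as a column instead of being buried in the row names. A minimal sketch, assuming a hypothetical `control` column (for example, one built from the row names as described above) that names each row's WT control group; the dC values are made up:

```r
# Hypothetical toy data; 'control' names the WT group each row is compared with
data <- data.frame(
  dC      = c(0.2, 0.4, 0.1, 0.3, 0.9, 0.6, 0.8, 0.4),
  control = c("WT", "WT", "WT AA", "WT AA", "WT", "WT", "WT AA", "WT AA"),
  row.names = c("WT 1", "WT 2", "WT AA 1", "WT AA 2",
                "T V1", "T V2", "T AA 1", "T AA 2")
)

# Average dC over the WT rows only, one mean per control group
wt.rows  <- data[startsWith(rownames(data), "WT"), ]
wt.means <- aggregate(dC ~ control, data = wt.rows, FUN = mean)

# Look up each row's control group in the table of means
data$ControlAv <- wt.means$dC[match(data$control, wt.means$control)]
```

Using match() rather than merge() keeps the original row order and row names.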



Solution 1

I've done this using a very long-winded method that is not very elegant, but it works!

I split the data frame into one subset per control group, for example:

dat.V <- data[4:5, ]

Then I used cbind() to attach each control group's average to the matching non-control data frames, and finally used bind_rows() (from dplyr) to bring everything back together into one data frame.
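The same split, add-a-column, recombine idea can be compressed with base R's split() and unsplit(), which also restores the original row order. This is a swapped-in alternative, not the answerer's code: the toy data and the `group` vector (one control-group label per row) are hypothetical, and the CNTRL rows are assumed to be removed first, since an NA group would not survive split() and unsplit().

```r
# Hypothetical toy data: control rows already removed, values made up
data <- data.frame(
  dC = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7, 0.9, 0.6),
  row.names = c("WT 1", "WT 2", "WT AA 1", "WT AA 2",
                "T V1", "T V2", "T AA 1", "T AA 2")
)
# Hypothetical group labels: the control group each row belongs to
group <- c("V", "V", "AA", "AA", "V", "V", "AA", "AA")

pieces <- split(data, group)           # one data frame per group (like dat.V)
pieces <- lapply(pieces, function(d) {
  is.wt <- startsWith(rownames(d), "WT")
  d$ControlAv <- mean(d$dC[is.wt])     # WT average copied to every row of the group
  d
})
newdata <- unsplit(pieces, group)      # reassemble in the original row order
```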

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution source: Solution 1, geom