Averaging control group conditions into new output column. Output column contains the associated control average values
I'm not new to R, but I'm still a beginner.
This is a snippet of my data (numbers randomised so the Average columns are not correct in this example).
> head(data)
1 2 3 Average HKAverage dC
Neg CNTRL NA NA NA NA NA NA
NEG CNTRL NA NA NA NA 0.80393767 NA
POS CNTRL 0.1836139 0.11392904 0.02925255 0.1089318 0.72559250 0.6165367
WT 1 0.5091585 0.15929057 0.51686195 0.3951037 0.26582395 0.5877941
WT 2 0.1924527 0.05267426 0.77929719 0.3414747 0.48798007 0.2600975
WT AA 1 0.2525962 0.97503047 0.62913683 0.6189212 0.03930599 0.9048247
> tail(data)
1 2 3 Average HKAverage dC
T AB 4 0.3425330 0.1698632 0.3100509 0.2741490 0.2312321 0.39589730
T C 1 0.8170886 0.8202081 0.1487331 0.5953433 0.1268834 0.99938496
T C 2 0.4374555 0.1926919 0.2847973 0.3049816 0.8647057 0.00970199
T C 3 0.3194017 0.2683773 0.8150882 0.4676224 0.8750478 0.73646663
T C 4 0.1091098 0.1547485 0.9696392 0.4111658 0.9897441 0.18335950
Pos CNTRL NA NA NA NA NA NA
I'm doing some calculations with these values and the outputs are generated as a new column. I'm running this before running any calculations:
data <- as.data.frame(input.data)
data[data == "Undetermined"] <- NA
data[] <- sapply(data, as.numeric)
Ignoring the 4 CNTRL rows (I should probably just remove them!), there are WT... and T... rows for the same conditions. Each condition is repeated 2 or 4 times (hence WT 1, WT 2, T 1, T 2, etc.). I want to make a new column that contains the average of each WT condition across its replicates. The rows for the matching T condition should show that same WT average.
This would be an example of my output: (Av meaning average)
> head(newdata)
X X1 X2 X3 Average HKAverage dC ControlAv
1 Neg CNTRL NA NA NA NA NA NA NA
2 NEG CNTRL NA NA NA NA 0.80393767 NA NA
3 POS CNTRL 0.1836139 0.11392904 0.02925255 0.1089318 0.72559250 0.6165367 NA
4 WT 1 0.5091585 0.15929057 0.51686195 0.3951037 0.26582395 0.5877941 WT1:2Av
5 WT 2 0.1924527 0.05267426 0.77929719 0.3414747 0.48798007 0.2600975 WT1:2Av
6 WT AA 1 0.2525962 0.97503047 0.62913683 0.6189212 0.03930599 0.9048247 WTAA1:4Av
> tail(newdata)
X X1 X2 X3 Average HKAverage dC ControlAv
10 T V1 0.4568928 0.5566606 0.610042142 0.5411985 0.8372219 0.9200497 WT1:2Av
11 T V2 0.8633715 0.3191596 0.483468638 0.5553332 0.8860817 0.9486309 WT1:2Av
12 T AA 1 0.1587924 0.2986826 0.005692643 0.1543892 0.1064064 0.7750263 WTAA1:4Av
13 T AA 2 0.3665066 0.9289861 0.143083833 0.4795255 0.4543861 0.9992564 WTAA1:4Av
14 T AA 3 0.5580805 0.4041877 0.411612593 0.4579603 0.8457465 0.9380688 WTAA1:4Av
15 T AA 4 0.8149501 0.1642240 0.229479382 0.4028845 0.7638992 0.6026836 WTAA1:4Av
I'm currently trying to use the within() function but not finding success:
> data$wt.av <- within(data, mean(dC["WT 1" & "WT 2"]))
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'mean': operations are possible only for numeric, logical or complex types
My data frame is numeric, but the row names are obviously not.
aggregate() doesn't work in this case because the row names do not match.
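A likely cause of the error above: `"WT 1" & "WT 2"` applies the logical AND operator to two character strings, which R rejects, and within() returns a whole modified data frame rather than a single value. Below is a minimal sketch of selecting the rows by name instead. It uses a three-row toy data frame built from the dC values shown in head(data); the name wt_mean is only for illustration.

```r
# Toy data: three rows taken from head(data) above, dC column only.
data <- data.frame(
  dC = c(0.5877941, 0.2600975, 0.9048247),
  row.names = c("WT 1", "WT 2", "WT AA 1")
)

# Index rows with a character vector of row names, then average the dC values.
# na.rm = TRUE skips any NA left over from "Undetermined" entries.
wt_mean <- mean(data[c("WT 1", "WT 2"), "dC"], na.rm = TRUE)
```

This gives one number for one control group; it still has to be written into the matching WT and T rows.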
Solution 1:[1]
I've done this using a very long-winded method that is not very elegant, but it's done!
I took apart the data frame...
dat.V <- data[4:5, ]
for each control group, and averaged each of these subsets. I then used cbind to attach these averages to the non-control group data frames, and bind_rows (from dplyr) to bring all of this back together into one data frame.
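A more compact alternative to splitting and rebinding can be sketched in base R. This is not the method used above, only one possible shortcut. It assumes you add a grouping column by hand, because the row names alone do not say which WT rows belong to which T rows (for example, WT 1 and WT 2 pair with T V1 and T V2). The columns group and is_control and the toy values are hypothetical; dC and ControlAv follow the question.

```r
# Toy data standing in for the real data frame (values are made up).
data <- data.frame(
  dC = c(0.60, 0.20, 0.90, 0.50, 0.70, 0.30, 0.10, 0.80),
  row.names = c("WT 1", "WT 2", "WT AA 1", "WT AA 2",
                "T V1", "T V2", "T AA 1", "T AA 2")
)

# Hypothetical helper columns: the condition each row belongs to,
# and whether the row is a control (WT) row.
data$group <- c("V", "V", "AA", "AA", "V", "V", "AA", "AA")
data$is_control <- startsWith(rownames(data), "WT")

# Mean dC of the control rows within each group, named by group.
control_means <- tapply(data$dC[data$is_control],
                        data$group[data$is_control],
                        mean, na.rm = TRUE)

# Look up each row's group so WT and T rows receive the same control average.
data$ControlAv <- as.numeric(control_means[data$group])
```

The CNTRL rows would need to be dropped or given an NA group first, since they have no matching WT condition.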
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | geom |
