'Plotting value from matching complete pairs in rows

I have data of drug concentrations (DV) that include a parent drug and its metabolite, coded DVID 1 and 2, respectively. They are provided to me as separate rows, identified by study (STUDYID), subject (ID), and the time after the dose (TAD). I have created with paste an index column that I hope is unique to each study+ID+time. I would like to generate a data frame to plot the concentrations of the parent drug and metabolite against one another:

Index  Drug   Metabolite

Here are a few lines of data. Note that there is not a match for the drug and metabolite at time 0.5 and 1, so I would like to exclude these unpaired records. dcast wants an aggregation function - I just want to match up the two concentrations from the same sample. Also, can this be done without the concatenated Index column?

   STUDYID ID     TAD       DV DVID         Index
1        2  1 0.50000  0.80871    1     2SUBJ10.5
2        2  1 1.00000  2.53052    1       2SUBJ11
3        2  1 1.50000  2.64830    1     2SUBJ11.5
4        2  1 1.50000 -0.94495    2     2SUBJ11.5
5        2  1 2.01667  2.74019    1 2SUBJ12.01667
6        2  1 2.01667 -0.69375    2 2SUBJ12.01667
7        2  1 2.53333  2.91831    1 2SUBJ12.53333
8        2  1 2.53333 -0.30612    2 2SUBJ12.53333
9        2  1 3.00000  2.88926    1       2SUBJ13
10       2  1 3.00000 -0.12863    2       2SUBJ13
11       2  1 4.00000  2.78932    1       2SUBJ14
12       2  1 4.00000  0.01882    2       2SUBJ14


Solution 1:[1]

You can first remove the rows that are unpaired, then we can plot using ggplot.

library(tidyverse)

df %>% 
  group_by(STUDYID, ID, TAD) %>% 
  filter(n() > 1) %>% 
  ggplot() +
  geom_line(aes(x = TAD, y = DV, group = factor(DVID), color = factor(DVID)))

enter image description here

Alternatively, if you were wanting to plot the values of the pairs, then you could do something like this:

output <- df %>% 
  group_by(STUDYID, ID, TAD) %>% 
  filter(n() > 1) %>% 
  mutate(row = as.integer(gl(n(), 2, n()))) %>%
  pivot_wider(names_from = "DVID", values_from = "DV") %>% 
  rename("Drug" = `1`, "Metabolite" = `2`) %>% 
  select(-row)

ggplot(output) +
  geom_point(aes(x = Drug, y = Metabolite, color = factor(TAD)))

enter image description here

Data

df <- structure(list(STUDYID = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), TAD = c(0.5, 1, 1.5, 1.5, 2.01667, 2.01667, 2.53333, 2.53333, 
3, 3, 4, 4), DV = c(0.80871, 2.53052, 2.6483, -0.94495, 2.74019, 
-0.69375, 2.91831, -0.30612, 2.88926, -0.12863, 2.78932, 0.01882
), DVID = c(1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), 
    Index = c("2SUBJ10.5", "2SUBJ11", "2SUBJ11.5", "2SUBJ11.5", 
    "2SUBJ12.01667", "2SUBJ12.01667", "2SUBJ12.53333", "2SUBJ12.53333", 
    "2SUBJ13", "2SUBJ13", "2SUBJ14", "2SUBJ14")), class = "data.frame", row.names = c(NA, 
-12L))

Solution 2:[2]

Not really sure what you want to dcast, you could simply plot the data directly.

plot(DV ~ TAD, df1, type='n', main='My Plot')  ## initialize
lines(DV ~ TAD, df1[df1$DVID == 1, ], type='b', pch=16, col=2)
lines(DV ~ TAD, df1[df1$DVID == 2, ], type='b', pch=17, col=4)
legend('bottomleft', legend=1:2, title='DVID', col=c(2, 4), pch=16:17)

enter image description here


Data:

df1 <- structure(list(STUDYID = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
), ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), TAD = c(0.5, 1, 
1.5, 1.5, 2.01667, 2.01667, 2.53333, 2.53333, 3, 3, 4, 4), DV = c(0.80871, 
2.53052, 2.6483, -0.94495, 2.74019, -0.69375, 2.91831, -0.30612, 
2.88926, -0.12863, 2.78932, 0.01882), DVID = c(1, 1, 1, 2, 1, 
2, 1, 2, 1, 2, 1, 2), Index = c(" 2SUBJ10.5", "   2SUBJ11", " 2SUBJ11.5", 
"  2SUBJ11.5", "2SUBJ12.01667", "2SUBJ12.01667", "2SUBJ12.53333", 
"2SUBJ12.53333", "  2SUBJ13", "  2SUBJ13", " 2SUBJ14", " 2SUBJ14"
)), class = "data.frame", row.names = c(NA, -12L))

Solution 3:[3]

@AndrewGB: Using a bigger hammer, I created a new column with a unique identifier ("Index" in original post). I then used your code to remove the unmatched concentrations, and then used lag to look for the previous instance of the unique Index, since sample times, ID, TAD were all not unique through the entire data set. It generated NAs that I removed to yield a data.table that I could plot It works, but the need to create the Index column is cludgy. I figure there is a better way.

df$Index <- paste0(df$STUDYID, df$REGIMEN,"SUBJ", df$ID, df$TAD)
PairedData <- df  %>% 
group_by(STUDYID, REGIMEN, ID, TAD) %>%
filter(n() >1) 
PairedData.dt <- data.table(PairedData)
PairedData.dt [, lag := c(NA, DV[-.N]), by = Index]
PairedData.paired <- na.omit(PairedData.dt)
rename(PairedData.paired, c("Metabolite"="DV" , "Drug"="lag"))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 jay.sf
Solution 3 PHutson