Compare values present in two data frames using a sliding window function in R
We have two data frames
Data frame 1
sl no. Segment_name Segment
1 Segment1 AACG
2 Segment2 ACTG
3 Segment3 GTCA
Data frame 2
sl no. Dinucleotides Free energy Values
1 AA -1.0
2 AC -1.76
3 CG -1.5
4 CT -1.23
5 TG -1.67
6 GT -1.82
7 TC -1.43
8 CA -1.98
We want to compare the column 'Segment' of Data frame 1 against the dinucleotides of Data frame 2 and sum the matching 'Free energy Values'. Sliding a window of width two along a segment gives its dinucleotides (AA, AC, CG for Segment1 = AACG), and their free energy values sum to -4.26 for Segment1. We want to repeat the same for the rest of the segments and store the sum of the free energy values in a separate column of Data frame 1, as
sl no. Segment_name Segment Free energy
1 Segment1 AACG -4.26
2 Segment2 ACTG -4.66
3 Segment3 GTCA -5.23
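To make the sliding window concrete, here is a minimal base R sketch of the window extraction only. The helper name `dinucleotides` is hypothetical and not part of the question; it assumes each segment is a plain character string.

```r
# Hypothetical helper: all overlapping 2-character windows of a string
dinucleotides <- function(segment) {
  n <- nchar(segment)
  if (n < 2) return(character(0))
  substring(segment, 1:(n - 1), 2:n)
}

segments <- c(Segment1 = "AACG", Segment2 = "ACTG", Segment3 = "GTCA")
windows <- lapply(segments, dinucleotides)
print(windows)
# Segment1: AA AC CG
# Segment2: AC CT TG
# Segment3: GT TC CA
```

Each segment of length n yields n - 1 windows, and these are the dinucleotides whose free energy values are to be summed.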
Solution 1:[1]
I used my own sample data (see bottom), since column names with spaces in them are a pain to work with.
The [] at the end of each line is there to print the intermediate result. You can omit it in your production code.
library(data.table)
# set to data.table format (run the sample data at the bottom first)
dt1 <- as.data.table(df1); dt2 <- as.data.table(df2)
# cut Segment into two parts
dt1[, c("from", "to") := tstrsplit(Segment, "(?<=..)(?=..)", perl = TRUE)][]
# find index
dt1[dt2, from.sl := i.sl, on = .(from = Dinucleotides)][]
dt1[dt2, to.sl := i.sl, on = .(to = Dinucleotides)][]
# now, sum FEV over the rows of dt2 from index from.sl to index to.sl
setkey(dt1, sl)
dt1[dt1, Free_energy := sum(dt2[i.from.sl:i.to.sl, FEV]), by = .EACHI][]
# drop temp columns
dt1[, `:=`(from = NULL, to = NULL, from.sl = NULL, to.sl = NULL)][]
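# sl Segment_name Segment Free_energy
# 1: 1 Segment1 AACG -4.26
# 2: 2 Segment2 ACTG -6.16
# 3: 3 Segment3 GTCA -5.23
# note: Segment2 is -6.16 here, not the -4.66 expected in the question,
# because rows 2:5 of dt2 (AC, CG, CT, TG) are summed and CG is not a window of ACTG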
#sample data
df1 <- fread("sl Segment_name Segment
1 Segment1 AACG
2 Segment2 ACTG
3 Segment3 GTCA")
df2 <- fread("sl Dinucleotides FEV
1 AA -1.0
2 AC -1.76
3 CG -1.5
4 CT -1.23
5 TG -1.67
6 GT -1.82
7 TC -1.43
8 CA -1.98")
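The row-range sum above depends on the row order of df2, which is why Segment2 comes out as -6.16 rather than the -4.66 asked for in the question. As an alternative, here is a minimal base R sketch that looks up each overlapping dinucleotide directly. It is not the method of the original answer, the helper name `window_energy` is hypothetical, and it assumes every dinucleotide in a segment is present in df2.

```r
# Sample data mirroring the tables in the question
df1 <- data.frame(
  sl = 1:3,
  Segment_name = c("Segment1", "Segment2", "Segment3"),
  Segment = c("AACG", "ACTG", "GTCA"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  sl = 1:8,
  Dinucleotides = c("AA", "AC", "CG", "CT", "TG", "GT", "TC", "CA"),
  FEV = c(-1.0, -1.76, -1.5, -1.23, -1.67, -1.82, -1.43, -1.98),
  stringsAsFactors = FALSE
)

# Named lookup vector: dinucleotide -> free energy value
fev <- setNames(df2$FEV, df2$Dinucleotides)

# Hypothetical helper: sum the free energy over all overlapping
# 2-character windows of one segment
window_energy <- function(segment, lookup) {
  n <- nchar(segment)
  if (n < 2) return(0)
  windows <- substring(segment, 1:(n - 1), 2:n)
  sum(lookup[windows])  # NA if a window is missing from the lookup
}

df1$Free_energy <- vapply(df1$Segment, window_energy, numeric(1),
                          lookup = fev, USE.NAMES = FALSE)
print(df1)
```

With the sample data this gives -4.26, -4.66 and -5.23 for the three segments, matching the table in the question. Because the lookup is by name, the result does not depend on the order of the rows in df2.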
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
