'Pivot table in R: create a new data frame and populate it from an existing data frame using logical statements
Rank newbie, I appreciate any pointers. I need to construct an "encounter history" for input to program MARK. It must be a text file that looks like 150 rows of this:
/*aaam*/ 100101111101000000000 0 1 1;
/*ayym*/ 100000001101000000000 0 1 1;
/*bbbm*/ 100000000001111111110 1 0 1;
the four letter codes at left are individual identifiers for birds in the study. The string of zeros and ones in the middle is the critical part: a 1 indicates that bird was encountered in a given time period, a 0 says it wasn't encountered. The trailing "0 1 1;" or "1 0 1;" mean "female" or "male" respectively, and the spaces and semicolon, slashes and asterisks are necessary. Translated into English, the top line says "the bird called aaam is a female who was encountered in the first time interval (March to June 2018), not encountered the second time interval (July to October 2018), not encountered in the third time interval (November 2018 to February 2019), and so on for 21 total time intervals.
I have an existing data frame called "resightings" roughly 5000 lines and 24 columns, looks like this (many columns omitted), and I need to extract the encounter histories from it:
| BIRD | YEAR | MONTH | DAY | SURVEY_TYPE | OUTCOME |
|---|---|---|---|---|---|
| aaam | 2018 | 3 | 5 | 4 | 2 |
| aaam | 2018 | 6 | 12 | 1 | 1 |
| aaam | 2019 | 1 | 12 | 1 | 0 |
| ayym | 2018 | 4 | 4 | 4 | 3 |
| ayym | 2018 | 6 | 16 | 1 | 1 |
| bbbm | 2018 | 6 | 19 | 1 | 1 |
Again, the four letter codes for BIRD identify unique individuals. Each line is one observation; there are 1 to 100+ lines per bird. Those observations tell me when each bird was encountered: for example, if there exists a line where BIRD is aaam, and YEAR is 2018, and MONTH is 3 and OUTCOME is 1, that means aaam was encountered March of 2018 and the first digit in aaam's encounter history is a 1, but if no lines in the resightings database match the appropriate criteria, then the bird was not encountered, so that time period gets a zero.
To get from the large data frame to the "encounter history" I need to:
- construct a new data frame where the first column is again bird names but only one row per bird.
- add about 21 additional columns, most of which will be populated with a 0 or 1 based on a logical test like the "if" sentence above (see the code below starting with "any").
- add the various spaces, slashes, asterisks, codes for sex, to each row as in the example at top,
- concatenate each row, all those ones and zeros and spaces and letters and characters, into a text string,
- save as a text file (preferably with the suffix ".inp" but I can also rename it outside of R).
What I have tried:
unique(resightings$BIRD)
returns a vector with each unique bird name.
A statement like the below returns "TRUE" or "FALSE", and if I can get it to spit out "1" or "0" instead I think I can use something like this to fill in the other columns, editing how I specify month and year for each time interval.
any(resightings$BIRD=="ayym" & resightings$YEAR==2018 & resightings$MONTH==12|11|10|9 & resightings$OUTCOME==1)
I think "paste" is going to be how I convert each row of the final data frame to strings of text, probably with sep="" to avoid spaces getting inserted.
But I'm very stuck on the rest of how to construct the new smaller data frame from the old one, even on how to conceptualize it. Do I construct a blank table and then populate it? Do I use the vector that results from unique(resightings$BIRD) to construct additional vectors for each new column, then assemble all those vectors into a data frame? I apologize for my newby-ness. I have read answers to most of the previous questions asked about pivot tables or "constructing a new table from data in an existing table in R" but I am still stuck.
Solution 1:[1]
Interesting task... Here's one possible solution:
library(tidyverse)
# Test data
data <- tribble(
~BIRD,~YEAR,~MONTH,~DAY,~SURVEY_TYPE,~OUTCOME,~SEX,
"aaam",2018,3,5,4,2,"male",
"aaam",2018,6,12,1,1,"male",
"aaam",2019,1,12,1,0,"male",
"ayym",2018,4,4,4,3,"female",
"ayym",2018,6,16,1,1,"female",
"bbbm",2018,6,19,1,1,"male",
"bbbm",2019,2,19,1,1,"male"
)
data2 <- data %>% group_by(BIRD) %>% summarise(
PERIOD_1 = as.integer(any(YEAR==2018 & MONTH %in% c(1,2,3,4) & OUTCOME==1)),
PERIOD_2 = as.integer(any(YEAR==2018 & MONTH %in% c(5,6,7,8) & OUTCOME==1)),
PERIOD_3 = as.integer(any(YEAR==2018 & MONTH %in% c(9,10,11,12) & OUTCOME==1)),
PERIOD_4 = as.integer(any(YEAR==2019 & MONTH %in% c(1,2,3,4) & OUTCOME==1)),
SEX = if (unique(SEX)=="male") "1 0 1" else "0 1 1"
)
data2
data3 <- data2 %>% unite("HISTORY", PERIOD_1:PERIOD_4, sep="")
data4 <- paste0("/*", data3$BIRD, "*/ ", data3$HISTORY, " ", data3$SEX, ";")
write_lines(data4, "test.inp")
data4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kevin Dialdestoro |


