Summarizing repeated items by student attempt
Problem and explanation of the expected output
I have raw data (the input) recording how many attempts each student made at an exam and which items they responded to on each attempt. In this example, there is a pool of 5 items, but students respond to only 3 of them per attempt. They can take the exam multiple times, so I would like to summarize the raw data so that I can see all of the following:
- Trial number (e.g., 1st trial, 2nd trial, 3rd trial...): trial column.
- How many items they got correct (total correct items).
- How many items they responded to in total (in this case, it should be 3 for everyone).
- Percent correct: percent column.
- Main problem: how many of the items they had seen before (e.g., if they responded to item 1 in the first trial and also responded to item 1 in the second trial, I want to know this): repeated column.
- If there are repeated items, which items were repeated: repeated_items column.
Input
As input, I have the raw data, with the student's name, the number of the trial (e.g., 1, 2, 3...), and all items from the pool, i1:i5. The item values show whether the student got the item right or wrong (1 = correct, 0 = wrong). Missing values show that the student didn't take that item on that attempt.
library(dplyr)

df <- tibble(name  = c("John", "John", "Mary", "Mary"),
             trial = c(1, 2, 1, 2),
             i1 = c(1, 0, 0, NA),
             i2 = c(NA, NA, 1, 1),
             i3 = c(NA, 1, NA, 1),
             i4 = c(0, 1, 1, NA),
             i5 = c(0, NA, NA, 1))

# # A tibble: 4 × 7
#   name  trial    i1    i2    i3    i4    i5
#   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 John      1     1    NA    NA     0     0
# 2 John      2     0    NA     1     1    NA
# 3 Mary      1     0     1    NA     1    NA
# 4 Mary      2    NA     1     1    NA     1
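Since NA marks an item that was not taken on that attempt, one way to make this visible is to list the items each attempt actually contained. The following is only a sketch: it assumes the tidyr package is available, and the names seen, item, score, and items are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  name  = c("John", "John", "Mary", "Mary"),
  trial = c(1, 2, 1, 2),
  i1 = c(1, 0, 0, NA),
  i2 = c(NA, NA, 1, 1),
  i3 = c(NA, 1, NA, 1),
  i4 = c(0, 1, 1, NA),
  i5 = c(0, NA, NA, 1)
)

# Reshape to one row per administered item; NA cells (items not taken
# on that attempt) are dropped by values_drop_na = TRUE.
seen <- df %>%
  pivot_longer(i1:i5, names_to = "item", values_to = "score",
               values_drop_na = TRUE) %>%
  group_by(name, trial) %>%
  summarise(items = paste(item, collapse = ", "), .groups = "drop")

seen
```

Each row of seen then holds the three items a student responded to on one attempt, which is the information the repeated and repeated_items columns are built from.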
Expected Output
As output, I want to see a summary table like the one below, showing all the points from the description above. I've created this table manually, and now I'm trying to build it with functions so I can speed up the process. My main problem is generating the columns repeated and repeated_items. The first trial of each student will always have NA for repeated_items and zero for repeated, because on the first trial they have never seen any of the items before.
The repeated column counts how many items the participant had already seen in a previous attempt. For instance, in John's first attempt, he responded to items i1, i4, and i5. In his second attempt, he responded to items i1, i3, and i4. Hence, in his second attempt, he responded to two items that he had seen previously (i1 and i4). So I want to use the repeated column to count how many items were repeated from the previous attempt.

The repeated_items column tracks which specific items were "repeated" in this new trial. In John's case, items i1 and i4 were repeated in the second trial. In Mary's second trial, only item i2 was repeated when compared to her first trial.

# # A tibble: 4 × 7
#   name  trial correct n_items percent repeated repeated_items
#   <chr> <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <chr>
# 1 John      1       1       3   0.333        0 NA
# 2 John      2       2       3   0.667        2 i1, i4
# 3 Mary      1       2       3   0.667        0 NA
# 4 Mary      2       3       3   1            1 i2
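The definition of repeated and repeated_items amounts to a set intersection between the items of the current attempt and those of the earlier attempt. A minimal worked example for John, using base R only (the variable names john_trial1, john_trial2, and john_repeated are made up for illustration):

```r
# Items John responded to on each attempt (taken from the non-NA cells).
john_trial1 <- c("i1", "i4", "i5")
john_trial2 <- c("i1", "i3", "i4")

# Items in the second attempt that also appeared in the first.
john_repeated <- intersect(john_trial2, john_trial1)

length(john_repeated)                  # value for the repeated column
paste(john_repeated, collapse = ", ")  # value for the repeated_items column
```

For John this gives 2 and "i1, i4", matching the manually built table.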
Solution 1:[1]
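library(tidyverse)

df %>%
  # Row-wise totals over the item columns (i1 to i5)
  mutate(correct = rowSums(across(starts_with("i")), na.rm = TRUE),
         n_items = rowSums(!is.na(across(starts_with("i")))),
         percent = correct / n_items) %>%
  group_by(name) %>%
  # Within each student, an item is repeated when the difference between
  # its two trials is not NA, i.e. it was answered on both trials
  mutate(repeated_items = across(starts_with("i")) %>%
           imap_chr(~ ifelse(!is.na(diff(.)), .y, NA)) %>%
           na.omit() %>%
           str_flatten(","),
         # The first trial has no earlier attempt to compare against
         repeated_items = ifelse(row_number() == 1, NA, repeated_items),
         # Number of repeated items = number of commas + 1, or 0 when NA
         repeated = replace_na(str_count(repeated_items, ",") + 1, 0)) %>%
  ungroup()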
Output
  name  trial    i1    i2    i3    i4    i5 correct n_items percent repeated_items repeated
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl> <chr>             <dbl>
1 John      1     1    NA    NA     0     0       1       3   0.333 NA                    0
2 John      2     0    NA     1     1    NA       2       3   0.667 i1,i4                 2
3 Mary      1     0     1    NA     1    NA       2       3   0.667 NA                    0
4 Mary      2    NA     1     1    NA     1       3       3   1     i2                    1
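Note that the diff() step appears to rely on each student having exactly two trials, since imap_chr() expects one value per item column. If some students have more attempts, a long-format approach may be easier to extend. The sketch below is not part of the original answer. It assumes tidyr is available, treats an item as repeated when it appeared in any earlier attempt by the same student, and uses made-up data (df3, with a hypothetical third attempt for John) and made-up names (summary_tbl, seen_before).

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the example above plus a third attempt for John.
df3 <- tibble(
  name  = c("John", "John", "John", "Mary", "Mary"),
  trial = c(1, 2, 3, 1, 2),
  i1 = c(1, 0, NA, 0, NA),
  i2 = c(NA, NA, 1, 1, 1),
  i3 = c(NA, 1, 1, NA, 1),
  i4 = c(0, 1, NA, 1, NA),
  i5 = c(0, NA, 0, NA, 1)
)

summary_tbl <- df3 %>%
  # One row per administered item; items not taken (NA) are dropped.
  pivot_longer(i1:i5, names_to = "item", values_to = "score",
               values_drop_na = TRUE) %>%
  # Within each student and item, every occurrence after the first
  # (in trial order) counts as already seen.
  group_by(name, item) %>%
  arrange(trial, .by_group = TRUE) %>%
  mutate(seen_before = row_number() > 1) %>%
  # Collapse back to one row per student and trial.
  group_by(name, trial) %>%
  summarise(
    correct = sum(score),
    n_items = n(),
    percent = correct / n_items,
    repeated = sum(seen_before),
    repeated_items = if (any(seen_before)) {
      paste(sort(item[seen_before]), collapse = ", ")
    } else {
      NA_character_
    },
    .groups = "drop"
  )

summary_tbl
```

For the two-trial rows this reproduces the values in the expected output, and an attempt with no repeated items gets repeated = 0 and NA rather than an empty string. To compare only against the immediately preceding attempt instead, the seen_before condition would need to check the previous trial number explicitly.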
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | LMc |
