How do I count the occurrences of a data point in a data frame?
I have this table, and I want to keep and count only the id in which the strings A and D are most represented. For example, A and D are more represented in id "abc" than in id "hil".
| string | id | start | end |
|---|---|---|---|
| A | abc | 0 | 1 |
| A | abc | 2 | 3 |
| B | efg | 1 | 3 |
| A | hil | 5 | 6 |
| A | abc | 6 | 7 |
| D | abc | 7 | 8 |
| D | abc | 1 | 2 |
| D | hil | 3 | 4 |
How can I obtain the id in which those strings are most represented?
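Both answers below refer to the table as a data frame named `df`. The following is a minimal sketch for rebuilding it in R so the answers can be run as-is. The column types are an assumption, because the question does not say whether `start` and `end` are integers or doubles.

```r
# Example data from the table above, stored under the name `df`
# (the name both answers assume)
df <- data.frame(
  string = c("A", "A", "B", "A", "A", "D", "D", "D"),
  id     = c("abc", "abc", "efg", "hil", "abc", "abc", "abc", "hil"),
  start  = c(0, 2, 1, 5, 6, 7, 1, 3),
  end    = c(1, 3, 3, 6, 7, 8, 2, 4),
  stringsAsFactors = FALSE
)
str(df)
```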
Solution 1
In base R, you can get the most common id for each string like this:
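```r
# Cross-tabulate ids (rows) against strings (columns)
tab <- table(df$id, df$string)
# For each string (column), return the id with the highest count
apply(tab, 2, function(x) rownames(tab)[which.max(x)])
#>     A     B     D
#> "abc" "efg" "abc"
```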
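The code above finds the most common id for each string separately. To rank ids by A and D together, as the question asks, the same cross-table can be summed across its `A` and `D` columns. This is a minimal base R sketch, assuming the data sits in a data frame `df` as in the question; the names `tab`, `ad_counts` and `best_id` are illustrative.

```r
# Example data from the question
df <- data.frame(
  string = c("A", "A", "B", "A", "A", "D", "D", "D"),
  id     = c("abc", "abc", "efg", "hil", "abc", "abc", "abc", "hil"),
  start  = c(0, 2, 1, 5, 6, 7, 1, 3),
  end    = c(1, 3, 3, 6, 7, 8, 2, 4),
  stringsAsFactors = FALSE
)

# Cross-tabulate ids (rows) against strings (columns)
tab <- table(df$id, df$string)

# Combined number of "A" and "D" rows for each id
ad_counts <- rowSums(tab[, c("A", "D")])

# Id with the highest combined count (which.max picks the first on ties)
best_id <- names(which.max(ad_counts))
best_id
#> [1] "abc"
```

Here `ad_counts` holds 5 for "abc", 0 for "efg" and 2 for "hil", so "abc" is returned.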
Solution 2
You can use this code:
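```r
library(dplyr)

df %>%
  # keep only the rows for strings "A" and "D"
  filter(string %in% c("A", "D")) %>%
  # count rows per id; sort = TRUE orders them from most to least frequent
  count(id, sort = TRUE) %>%
  # keep the id with the highest count
  slice(1)
```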
Output:
```
# A tibble: 1 × 2
  id        n
  <chr> <int>
1 abc       5
```
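`slice(1)` keeps exactly one row, so if two ids tied for the highest count, one of them would be dropped silently. If all tied ids should be kept, dplyr's `slice_max()` (available from dplyr 1.0.0) keeps ties by default. A minimal sketch, assuming dplyr is installed and the data is in `df`; the name `top_ids` is illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  string = c("A", "A", "B", "A", "A", "D", "D", "D"),
  id     = c("abc", "abc", "efg", "hil", "abc", "abc", "abc", "hil"),
  start  = c(0, 2, 1, 5, 6, 7, 1, 3),
  end    = c(1, 3, 3, 6, 7, 8, 2, 4),
  stringsAsFactors = FALSE
)

top_ids <- df %>%
  # keep only the rows for strings "A" and "D"
  filter(string %in% c("A", "D")) %>%
  # one row per id, with its row count in column `n`
  count(id) %>%
  # keep every id whose count equals the maximum, ties included
  slice_max(n, n = 1, with_ties = TRUE)

top_ids
```

With the question's data there is no tie, so `top_ids` contains only "abc" with a count of 5.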
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Allan Cameron |
| Solution 2 | |
