Extract div class text and sub tables in rvest
I am trying to recreate a table from this website under "Battle Pass Rewards." The final result is a data.frame with each of the following areas as different columns:
The table has three "tr" tags, but rvest is merging the 2nd and 3rd on scrape. I'm not sure why.
library(rvest)

fnite_s2 <- read_html("https://fortnite.fandom.com/wiki/Season_2")
fnite_s2 %>%
  html_table(fill = TRUE) %>%
  .[2]
For example, "Blue Squire Outfit" is scraped as a single string even though "Blue Squire" is in a separate td tag from "Outfit".
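One way to keep the name and the type apart is to skip html_table() for the nested cells and read each td yourself. The following is only a sketch: the inline HTML is a hypothetical stand-in for the wiki markup (the real page structure may differ), and the column names content_name and content_type are borrowed from the code later in this question.

```r
library(rvest)

# Hypothetical markup: each reward sits in its own nested table,
# with the name and the type in separate td cells.
html <- read_html('
<table class="listing">
  <tr>
    <td><table><tr><td>Blue Squire</td></tr><tr><td>Outfit</td></tr></table></td>
    <td><table><tr><td>Royale Knight</td></tr><tr><td>Outfit</td></tr></table></td>
  </tr>
</table>')

# Select the nested tables only, then read the text of each td separately
rewards <- html_elements(html, "table.listing table")
parts <- lapply(rewards, function(tbl) {
  html_text(html_elements(tbl, "td"), trim = TRUE)
})

# First cell is the name, second cell is the type (NA if a cell is missing)
reward_df <- data.frame(
  content_name = vapply(parts, `[`, character(1), 1),
  content_type = vapply(parts, `[`, character(1), 2)
)
reward_df
```

Because html_table() collapses all text inside an outer cell into one string, selecting the inner cells directly avoids having to split "Blue Squire Outfit" afterwards.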
The other issue is that the rarity (shown as the colored background, e.g. blue) is not text at all; it is set in the class of a div tag such as the following:
<div class="rarity-background uncommon">
I need to be able to scrape the "uncommon" part of the div-tag and add it as another column as well.
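A possible approach, sketched here on a hypothetical inline snippet rather than the live page, is to select the div nodes and read their class attribute with html_attr(), then strip the shared "rarity-background" part. Only the class string itself comes from the question; the surrounding markup and the second reward are assumptions.

```r
library(rvest)
library(stringr)

# Hypothetical markup built around the div from the question
html <- read_html('
<table class="listing">
  <tr>
    <td><div class="rarity-background uncommon">Blue Squire</div></td>
    <td><div class="rarity-background rare">Royale Knight</div></td>
  </tr>
</table>')

rarity_divs <- html_elements(html, "div.rarity-background")

# The class attribute holds both class names; drop the common one
rarity <- rarity_divs %>%
  html_attr("class") %>%
  str_remove("rarity-background") %>%
  str_squish()

rarity_df <- data.frame(
  content_name = html_text(rarity_divs, trim = TRUE),
  rarity = rarity
)
rarity_df
```

The resulting rarity column could then be joined or bound to the scraped table, provided the divs appear in the same order as the rewards.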
EDIT: I was able to grab most things, but I'm still stuck on grabbing the div tag information.
library(rvest)
library(tidyverse)

fnite_bp <-
  read_html("https://fortnite.fandom.com/wiki/Season_2") %>%
  html_nodes(".listing") %>%
  html_table(fill = TRUE) %>%
  # Convert to table
  as_tibble(.name_repair = "unique") %>%
  # Transpose to long
  t() %>%
  # Convert back to table
  as_tibble() %>%
  # Add tier number column
  rownames_to_column(var = "tier") %>%
  # Convert to long to mutate content types for both tiers
  pivot_longer(-tier, names_to = "type", values_to = "content_string") %>%
  mutate(
    type = if_else(type == "V1", "free", "paid"),
    content_name = str_extract(content_string, '[^\n]+'),
    content_type = str_replace(content_string, content_name, ""),
    content_type = str_replace(content_type, "Free ", ""),
    amount = as.integer(str_extract(content_string, "\\d+")),
    amount = if_else(type == "paid" | (type == "free" & content_string != ""), replace_na(amount, 1), NA_integer_)
  )
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
