Using xml_find_first in R to extract a group of tags

This is my XML file:

<Games>
    <Game id = "1">
    <Q>1</Q>
    <Q>Rick</Q>
    <Q>623.3</Q>
    <Q>1/1/2012</Q>
    <Q>IT</Q>
    </Game>
    
    <Game id = "2">
    <Q>2</Q>
    <Q>Dan</Q>
    <Q>515.2</Q>
    <Q>9/23/2013</Q>
    <Q>Operations</Q>
    </Game>
    
    <Game id = "3">
    <Q>3</Q>
    <Q>Michelle</Q>
    <Q>611</Q>
    <Q>11/15/2014</Q>
    <Q>IT</Q>
    </Game>
    
</Games>

I need to extract all the Q tags while keeping them associated with the id of their Game tag.

When I use xml_find_first(xmlfile, xpath = ".//Game"), I only get the Q tags associated with id 1.

How can I get the other Q tags without risking losing their associated ids?



Solution 1:[1]

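xml_find_first() returns only the first node that matches the XPath, which is why you only see the Game with id 1. Use xml_find_all() to get every Game node, then take the id attribute and the text of the children from each one: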

library(xml2)
library(dplyr)

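my_xml %>%
  # select every Game node, not just the first match
  xml_find_all(xpath = "//Game") %>%
  # one character vector per Game: its id followed by the text of each child
  lapply(function(x) c(xml_attr(x, "id"), xml_text(xml_children(x)))) %>%
  # stack the vectors into a matrix, one row per Game
  do.call(rbind, .) %>%
  as.data.frame() %>%
  # name the columns Game, Q1, Q2, ...
  setNames(c("Game", paste0("Q", seq(length(.) - 1))))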
#>   Game Q1       Q2    Q3         Q4         Q5
#> 1    1  1     Rick 623.3   1/1/2012         IT
#> 2    2  2      Dan 515.2  9/23/2013 Operations
#> 3    3  3 Michelle   611 11/15/2014         IT

Created on 2022-03-25 by the reprex package (v2.0.1)
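
The wide table above relies on every Game having the same number of Q children, because rbind() needs vectors of equal length. If that might not hold, a long layout with one row per Q tag is one possible alternative. The following is only a sketch: the inline document `doc` is a trimmed, hypothetical variant of the data (the second Game has an extra Q), and the column names `Q` and `value` are illustrative choices, not part of the original answer.

```r
library(xml2)

# Hypothetical, trimmed data: the two Games have different numbers of Q tags
doc <- read_xml('<Games>
  <Game id="1"><Q>1</Q><Q>Rick</Q></Game>
  <Game id="2"><Q>2</Q><Q>Dan</Q><Q>515.2</Q></Game>
</Games>')

games <- xml_find_all(doc, "//Game")

# Build one small data frame per Game, then stack them
long <- do.call(rbind, lapply(games, function(g) {
  q <- xml_text(xml_find_all(g, "./Q"))    # text of the Q children of this Game
  data.frame(
    Game  = rep(xml_attr(g, "id"), length(q)),  # repeat the id once per Q
    Q     = seq_along(q),                       # position of the Q within the Game
    value = q,
    stringsAsFactors = FALSE
  )
}))

print(long)
```

Each row carries the id of the Game it came from, so the association survives even when the Games are uneven.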


Reproducible data

xml <- charToRaw(
'<Games>
    <Game id = "1">
    <Q>1</Q>
    <Q>Rick</Q>
    <Q>623.3</Q>
    <Q>1/1/2012</Q>
    <Q>IT</Q>
    </Game>
    
    <Game id = "2">
    <Q>2</Q>
    <Q>Dan</Q>
    <Q>515.2</Q>
    <Q>9/23/2013</Q>
    <Q>Operations</Q>
    </Game>
    
    <Game id = "3">
    <Q>3</Q>
    <Q>Michelle</Q>
    <Q>611</Q>
    <Q>11/15/2014</Q>
    <Q>IT</Q>
    </Game>
    
</Games>')

my_xml <- read_xml(x = rawConnection(xml))
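
To see why the question's call only returned the first Game, the two functions can be compared side by side. This is a minimal sketch on a shortened, hypothetical copy of the data (named `doc` here so it runs on its own); the same calls should behave the same way on `my_xml`.

```r
library(xml2)

# Shortened, hypothetical copy of the data with one Q per Game
doc <- read_xml('<Games>
  <Game id="1"><Q>Rick</Q></Game>
  <Game id="2"><Q>Dan</Q></Game>
  <Game id="3"><Q>Michelle</Q></Game>
</Games>')

first_game <- xml_find_first(doc, ".//Game")  # a single node: the first match only
all_games  <- xml_find_all(doc, ".//Game")    # a nodeset holding every match

xml_attr(first_game, "id")
#> [1] "1"
xml_attr(all_games, "id")
#> [1] "1" "2" "3"
length(all_games)
#> [1] 3
```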

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Allan Cameron