Using xml_find_first in R to extract a group of tags
This is my XML file:
<Games>
<Game id = "1">
<Q>1</Q>
<Q>Rick</Q>
<Q>623.3</Q>
<Q>1/1/2012</Q>
<Q>IT</Q>
</Game>
<Game id = "2">
<Q>2</Q>
<Q>Dan</Q>
<Q>515.2</Q>
<Q>9/23/2013</Q>
<Q>Operations</Q>
</Game>
<Game id = "3">
<Q>3</Q>
<Q>Michelle</Q>
<Q>611</Q>
<Q>11/15/2014</Q>
<Q>IT</Q>
</Game>
</Games>
I need to extract all the Q tags while keeping them associated with the ids of their Game tags.
When I use xml_find_first(xmlfile, xpath = ".//Game") I only get the Q tags associated with id 1.
How can I get the other Q tags without risking losing their associated ids?
Solution 1:[1]
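You can use xml_find_all() to get every Game node, then combine each node's id attribute with the text of its children: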
library(xml2)
library(dplyr)
my_xml %>%
xml_find_all(xpath = "//Game") %>%
lapply(function(x) c(xml_attr(x, "id"), xml_text(xml_children(x)))) %>%
do.call(rbind, .) %>%
as.data.frame() %>%
setNames(c("Game", paste0("Q", seq(length(.) - 1))))
#>   Game Q1       Q2    Q3         Q4         Q5
#> 1    1  1     Rick 623.3   1/1/2012         IT
#> 2    2  2      Dan 515.2  9/23/2013 Operations
#> 3    3  3 Michelle   611 11/15/2014         IT
Created on 2022-03-25 by the reprex package (v2.0.1)
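For context on why the call in the question returned only the first Game: xml_find_first() returns just the first node matching the XPath, while xml_find_all() returns every match. Here is a minimal sketch of the difference, assuming a shortened copy of the XML (the name `doc` and the trimmed data are illustrative only, not part of the original answer):

```r
library(xml2)

# Shortened, hypothetical copy of the question's XML (two games, two Q tags each)
doc <- read_xml('<Games>
  <Game id="1"><Q>1</Q><Q>Rick</Q></Game>
  <Game id="2"><Q>2</Q><Q>Dan</Q></Game>
</Games>')

# A single node: only the first <Game> that matches
first_game <- xml_find_first(doc, ".//Game")

# A nodeset containing every <Game> that matches
all_games <- xml_find_all(doc, ".//Game")

xml_attr(first_game, "id")  # id of the first Game only
xml_attr(all_games, "id")   # one id per Game
length(all_games)           # number of Game nodes found
```

Because each element of `all_games` is still a Game node, reading its id and its children together, as the answer does inside lapply(), keeps the two associated.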
Reproducible data
xml <- charToRaw(
'<Games>
<Game id = "1">
<Q>1</Q>
<Q>Rick</Q>
<Q>623.3</Q>
<Q>1/1/2012</Q>
<Q>IT</Q>
</Game>
<Game id = "2">
<Q>2</Q>
<Q>Dan</Q>
<Q>515.2</Q>
<Q>9/23/2013</Q>
<Q>Operations</Q>
</Game>
<Game id = "3">
<Q>3</Q>
<Q>Michelle</Q>
<Q>611</Q>
<Q>11/15/2014</Q>
<Q>IT</Q>
</Game>
</Games>')
my_xml <- read_xml(x = rawConnection(xml))
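If the Game nodes might not all contain the same number of Q tags, a long format with one row per Q may be easier to work with than the wide table. The following is only a sketch of that variant, not part of the original answer. It assumes a shortened copy of the data, and the names `doc`, `rows`, `long`, `game_id`, `q_index` and `value` are made up for illustration:

```r
library(xml2)

# Shortened, hypothetical copy of the data (two games, two Q tags each)
doc <- read_xml('<Games>
  <Game id="1"><Q>1</Q><Q>Rick</Q></Game>
  <Game id="2"><Q>2</Q><Q>Dan</Q></Game>
</Games>')

games <- xml_find_all(doc, ".//Game")

# Build one small data frame per Game, repeating its id for each Q child
rows <- lapply(games, function(g) {
  q <- xml_text(xml_find_all(g, "./Q"))
  data.frame(
    game_id = rep(xml_attr(g, "id"), length(q)),
    q_index = seq_along(q),
    value   = q,
    stringsAsFactors = FALSE
  )
})

# Stack the per-Game pieces into a single long table
long <- do.call(rbind, rows)
long
```

Each row carries the id of the Game it came from, so the association survives even when some games have more or fewer Q tags than others.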
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Allan Cameron |
