Error when using the html_table function from the rvest package
I am attempting to scrape a webpage behind a login using rvest. I have successfully connected to the page and can access the HTML. (For those interested, I am scraping fantasy rugby player statistics.)
I am trying to pass the data into a data frame using this code:
loginsession %>%
  read_html() %>%
  html_elements('.general') %>%
  html_table(fill = TRUE) %>%
  data.frame()
But I am met with this error:
Error in matrix(unlist(values), ncol = width, byrow = TRUE) : 'data' must be of a vector type, was 'NULL'
The HTML reads like this:
[1] <div class="item hider general club" style="text-align: left"><strong>Club</strong></div>\n
[2] <div class="item hider general nationality" style="text-align: left"><strong>Nat</strong></div>\n
[3] <div class="item hider general salary"><strong>Salary</strong></div>\n
[4] <div class="item hider general points"><strong>Points</strong></div>\n
[5] <div class="item hider general selectionCount"><strong>Selected</strong></div>\n
[6] <div class="item hider general internationalCaps"><strong>Caps</strong></div>\n
[7] <div class="item hider general age"><strong>Age</strong></div>\n
[8] <div class="item hider general recommendation scout-report"><strong>Recm</strong></div>\n
[9] <div class="item hider general form-display" style="text-align: left"><strong>Form</strong></div>\n
[10] <div class="item hider general averageRating"><strong>Avg</strong></div>\n
[11] <div class="item hider general minutesPlayed"><strong>Mins</strong></div>\n
[12] <div class="item hider general pointsPerGame"><strong>Pts/80</strong></div>\n
[13] <div class="item hider general attackingPointsPerGame"><strong>Att/80</strong></div>\n
[14] <div class="item hider general defensivePointsPerGame"><strong>Def/80</strong></div>\n
[15] <div class="item hider general kickingPointsPerGame"><strong>K/80</strong></div>\n
[16] <div class="item hider general club" style="text-align: left">\n<div class="logo full"><a class="popup" data-url="/fant ...
[17] <div class="item hider general nationality" style="text-align: left">\n<i class="flag GB-ENG"></i><span class="hide-for ...
[18] <div class="item hider general salary">\n ...
[19] <div class="item hider general points">\n ...
[20] <div class="item hider general selectionCount">\n 13%\n ...
Solution 1:[1]
html_table requires an HTML table definition (with tags TABLE, TH, TR, TD …). Your HTML is a series of divisions (DIV) presumably styled with CSS (class = "…") to look like an HTML table.
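To illustrate the difference, here is a minimal sketch using two made-up snippets (the club name and points value are invented, not taken from your page). It assumes a recent rvest that provides minimal_html() and html_element(). The first snippet is a genuine table and parses; the second mimics your div layout and contains no rows for html_table to work with, which is a plausible reason for the 'NULL' in your error.

```r
library(rvest)

# A real <table>: html_table() can parse this into a data frame.
table_page <- minimal_html('
  <table>
    <tr><th>Club</th><th>Points</th></tr>
    <tr><td>Bath</td><td>42</td></tr>
  </table>')
table_df <- table_page %>%
  html_element("table") %>%
  html_table()

# A div-based "table" (hypothetical, modelled on the question's markup):
# there are no <tr> nodes inside these elements for html_table() to find.
div_page <- minimal_html('
  <div class="item hider general club"><strong>Club</strong></div>
  <div class="item hider general points"><strong>Points</strong></div>')
n_rows <- div_page %>%
  html_elements(".general") %>%
  html_elements("tr") %>%
  length()
```

Here table_df has one row with columns Club and Points, while n_rows is 0 for the div version.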
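You can instead extract the desired elements as vectors using CSS selectors. For example, this returns the text of every header cell (html_node would return only the first match):
loginsession %>%
  read_html() %>%
  html_elements("div.item.hider.general > strong") %>%
  html_text()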
Tutorials on CSS selectors are widely available online.
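Going one step further, the extracted vectors can be reshaped into a data frame by hand. The following is only a sketch under assumptions: the page content is a made-up two-column stand-in for yours, the header cells come first, and every player row contains exactly one cell per header in the same order. If your real page breaks any of those assumptions, the reshaping will misalign.

```r
library(rvest)

# Hypothetical stand-in for the real page: two columns, two players.
page <- minimal_html('
  <div class="item hider general club"><strong>Club</strong></div>
  <div class="item hider general points"><strong>Points</strong></div>
  <div class="item hider general club">Bath</div>
  <div class="item hider general points">42</div>
  <div class="item hider general club">Leinster</div>
  <div class="item hider general points">57</div>')

# Every cell, header or data, in document order.
cells <- page %>% html_elements("div.item.hider.general")

# Header cells are the ones that wrap their label in <strong>.
headers <- page %>%
  html_elements("div.item.hider.general > strong") %>%
  html_text(trim = TRUE)

# Text of all cells, with the leading header cells dropped.
values <- html_text(cells, trim = TRUE)
body <- values[-seq_along(headers)]

# Fill row by row: one row per player, one column per header.
players <- as.data.frame(
  matrix(body, ncol = length(headers), byrow = TRUE,
         dimnames = list(NULL, headers)),
  stringsAsFactors = FALSE
)
```

All columns come back as character, so numeric fields such as points or salary would still need converting, for example with as.numeric() after stripping symbols like "%".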
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | I_O |
