How to identify CSS or XPath selectors and convert HTML tables into data frames?
Hi, I'm trying to scrape https://www.coingecko.com/en/exchanges/binance using RSelenium and rvest. I'm really interested in this project even though I have only a little coding experience, and I'm hoping someone can point me in the right direction.
I was able to make it work using rvest with different code, but I'm limited to 50 rows because I can't find a workaround for the "show more" button.
Here's the process flow (after navigating to the website):
1. Using a loop, click the "show more" button on the first table until there are no more rows to load.
2. Extract the data from the table.
3. Convert the HTML table into a data frame.
4. Save to CSV.
Challenges:
1. I can't make the loop work. I think I'm using the wrong class/XPath (I'm confused about how to identify it).
2. I want to extract the first column but can't figure out what class/XPath I should put in the code.
3. I was able to turn HTML into data frames using rvest and xml2 (from a stored URL), but I have no idea how to make it work with RSelenium. Any links to tutorials would be appreciated. Thank you!
library(RSelenium)
library(rvest)
library(xml2)

driver <- remoteDriver()
driver$open()
driver$navigate("https://www.coingecko.com/en/exchanges/binance")

ShowMore ({
  Sys.sleep(5)
  suppressMessages({
    showmore_btn <- driver$findElement("Class", "btn btn-primary btn-sm mt-1")
    while (showmore_btn$isElementDisplayed()[[1]]) {
      showmore_btn$clickElement()
      Sys.sleep(10)
      showmore_btn <- driver$findElement("class", "btn btn-primary btn-sm mt-1")
    }
  })
},
error = function(e) {
  NA_character_
})

html_data <- driver$getPageSource()[[1]]
htmldata %>%
  read_html() %>%
  html_nodes(".") %>%
  html_attr("href")

# converts html tables into dataframes
write.csv(html_data, "Coingecko Latest Volume")
Solution 1:[1]
I would do:
library(rvest)
library(dplyr)
url <- "https://www.coingecko.com/en/exchanges/binance"
read_html(url) %>%
  html_element(xpath = '//*[@id="markets"]/div/div[2]') %>%
  html_table()
Output is:
# A tibble: 50 × 12
`#` Coin `Market Cap` Pair Price Spread `+2% Depth` `-2% Depth` `24h Volume` `Volume %` `Last Traded` `Trust Score`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
1 1 "Bitcoin\n/\nTether" $818,318,162,545.68 BTC/USDT "$43,092.10\n\n43103 USDT" 0.01% $25,515,001 $21,867,327 "$1,903,904,039\n \n4… 10.36% Recently NA
2 2 "Ethereum\n/\nTether" $364,805,278,069.74 ETH/USDT "$3,038.39\n\n3040.19 USDT" 0.01% $26,850,657 $11,691,663 "$1,531,387,527\n \n5… 8.34% Recently NA
3 3 "Binance...\n/\nTether" $17,649,975,489.72 BUSD/USDT "$1.00\n\n0.9995 USDT" 0.01% $88,473,714 $66,661,930 "$764,952,254\n \n765… 4.16% Recently NA
4 4 "ApeCoin\n /\nTether" $2,415,880,396.94 APE/USDT "$14.34\n\n14.3415 USDT" 0.03% $3,666,651 $3,492,905 "$592,369,004\n \n413… 3.22% Recently NA
5 5 "Bitcoin\n/\nBinance..." $818,318,162,545.68 BTC/BUSD "$43,064.38\n\n43119.77 BUSD" 0.01% $11,392,485 $13,300,407 "$542,515,491\n \n125… 2.95% Recently NA
6 6 "Loopring\n/\nTether" $1,380,968,843.56 LRC/USDT "$1.10\n\n1.1052 USDT" 0.02% $731,068 $774,477 "$483,094,499\n \n437… 2.63% Recently NA
7 7 "Cardano\n/\nTether" $36,353,319,987.05 ADA/USDT "$1.14\n\n1.138 USDT" 0.09% $2,418,833 $2,506,293 "$437,868,499\n \n384… 2.39% Recently NA
8 8 "Ethereum\n/\nBinance..." $364,805,278,069.74 ETH/BUSD "$3,044.90\n\n3045.63 BUSD" 0.01% $5,781,558 $10,258,913 "$407,185,464\n \n133… 2.22% Recently NA
9 9 "Ethereu...\n/\nTether" $5,907,151,680.36 ETC/USDT "$44.03\n\n44.04 USDT" 0.02% $791,333 $920,964 "$349,260,416\n \n793… 1.90% Recently NA
10 10 "Solana\n/\nTether" $31,369,956,754.28 SOL/USDT "$97.23\n\n97.25 USDT" 0.01% $3,050,523 $2,598,698 "$330,473,923\n \n339… 1.80% Recently NA
# … with 40 more rows
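Most columns come back as character strings with embedded line breaks and currency formatting, so some tidying is usually needed before saving. Here is a minimal sketch of one way to do that with base R only. The helper names `squish` and `parse_usd` are made up for this example, and the sample values are copied from the output above rather than scraped live:

```r
# Hypothetical helpers for illustration; they are not part of rvest.

# Collapse runs of whitespace (including "\n") into single spaces.
squish <- function(x) {
  gsub("[[:space:]]+", " ", trimws(x))
}

# Turn strings such as "$818,318,162,545.68" into numbers.
parse_usd <- function(x) {
  as.numeric(gsub("[$,]", "", x))
}

# Two sample rows copied from the printed table above.
markets <- data.frame(
  Coin = c("Bitcoin\n/\nTether", "ApeCoin\n /\nTether"),
  `Market Cap` = c("$818,318,162,545.68", "$2,415,880,396.94"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

markets$Coin <- squish(markets$Coin)
markets[["Market Cap"]] <- parse_usd(markets[["Market Cap"]])
print(markets)
```

The same two helpers could be applied to the other dollar columns, and the cleaned data frame could then be saved with `write.csv(markets, "coingecko_binance.csv", row.names = FALSE)` (the file name is only an example).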
You can identify the XPath in Chrome, for example: open the menu, go to "More tools" -> "Developer tools", and move the cursor over the table. Sometimes you can't select it properly; in that case, just hover over the HTML where you believe the table is.
Right-click on it and select "Copy" -> "Copy XPath".
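The rvest call above only returns the first 50 rows, because the rest load after clicking "show more". For the RSelenium part of the question, here is a rough sketch of a click loop. The helper name `click_until_gone` is hypothetical, and the default CSS selector is only an assumption built from the class names in the question, so check it against the live page in the developer tools. Note that a value with several space-separated classes is not a valid "class name" locator, which is why a CSS selector is used instead:

```r
# Hypothetical helper: click a "show more" button until it disappears.
# `driver` is expected to behave like an RSelenium remoteDriver.
# The default selector is an assumption taken from the question's class names.
click_until_gone <- function(driver,
                             css = ".btn.btn-primary.btn-sm.mt-1",
                             wait = 2,
                             max_clicks = 100) {
  clicks <- 0
  while (clicks < max_clicks) {
    # findElements() returns an empty list when nothing matches,
    # whereas findElement() raises an error.
    buttons <- driver$findElements(using = "css selector", value = css)
    if (length(buttons) == 0) break
    button <- buttons[[1]]
    # Stop when the button exists but is hidden.
    if (!isTRUE(button$isElementDisplayed()[[1]])) break
    button$clickElement()
    clicks <- clicks + 1
    Sys.sleep(wait)  # give the page time to load the extra rows
  }
  clicks  # number of clicks performed
}
```

With a real session this might be called as `click_until_gone(driver)` after `driver$navigate(...)`. The rendered page can then be handed to rvest with `read_html(driver$getPageSource()[[1]])`, followed by the same `html_element()` and `html_table()` calls as above. The `max_clicks` cap is there so the loop cannot run forever if the button never goes away.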
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stephan |

