How to identify CSS or XPath and convert HTML tables into data frames?

Hi, I'm trying to scrape https://www.coingecko.com/en/exchanges/binance using RSelenium and rvest. I'm really interested in this project, even though I have only a little coding experience. I'm hoping someone can point me in the right direction.

I was able to make it work using rvest with different code, but I'm limited to the first 50 rows because I can't find a workaround for the "show more" button.

Here's the process flow (after navigating to the website):

1.) Using a loop, click the "show more" button on the first table until it no longer appears.

2.) Extract the data from the table.

3.) Convert the HTML table into a data frame.

4.) Save to CSV.

Challenges:

1.) I can't make the loop work. I think I'm using the wrong class/XPath (I'm confused about how to identify it).

2.) I want to extract the first column but can't figure out which class/XPath to put in the code.

3.) I was able to turn the HTML into data sets using rvest and xml2 (from a stored URL), but I have no idea how to make that work with RSelenium. Any links to tutorials would be appreciated. Thank you!

library(RSelenium)
library(rvest)
library(xml2)

driver <- remoteDriver()
driver$open()


driver$navigate("https://www.coingecko.com/en/exchanges/binance")



ShowMore ({
    Sys.sleep(5)
  suppressMessages ({

      showmore_btn <- driver$findElement("Class", "btn btn-primary btn-sm mt-1")
    while(showmore_btn$isElementDisplayed()[[1]]){
      showmore_btn$clickElement()
      Sys.sleep(10)
      showmore_btn <- driver$findElement("class", "btn btn-primary btn-sm mt-1")
    }
  })
},
error = function(e) {
  NA_character_
})

html_data <- driver$getPageSource()[[1]]
htmldata %>%
read_html() %>%
html_nodes(".") %>%
html_attr("href")

#converts html tables into dataframes


write.csv(html_data, "Coingecko Latest Volume")


Solution 1:[1]

I would do:

library(rvest)
library(dplyr)
url <- "https://www.coingecko.com/en/exchanges/binance"


read_html(url) %>% 
  html_element(xpath = '//*[@id="markets"]/div/div[2]') %>% 
  html_table()

Output is:

# A tibble: 50 × 12
     `#` Coin                      `Market Cap`        Pair      Price                         Spread `+2% Depth` `-2% Depth` `24h Volume`           `Volume %` `Last Traded` `Trust Score`
   <int> <chr>                     <chr>               <chr>     <chr>                         <chr>  <chr>       <chr>       <chr>                  <chr>      <chr>         <lgl>        
 1     1 "Bitcoin\n/\nTether"      $818,318,162,545.68 BTC/USDT  "$43,092.10\n\n43103 USDT"    0.01%  $25,515,001 $21,867,327 "$1,903,904,039\n \n4… 10.36%     Recently      NA           
 2     2 "Ethereum\n/\nTether"     $364,805,278,069.74 ETH/USDT  "$3,038.39\n\n3040.19 USDT"   0.01%  $26,850,657 $11,691,663 "$1,531,387,527\n \n5… 8.34%      Recently      NA           
 3     3 "Binance...\n/\nTether"   $17,649,975,489.72  BUSD/USDT "$1.00\n\n0.9995 USDT"        0.01%  $88,473,714 $66,661,930 "$764,952,254\n \n765… 4.16%      Recently      NA           
 4     4 "ApeCoin\n /\nTether"     $2,415,880,396.94   APE/USDT  "$14.34\n\n14.3415 USDT"      0.03%  $3,666,651  $3,492,905  "$592,369,004\n \n413… 3.22%      Recently      NA           
 5     5 "Bitcoin\n/\nBinance..."  $818,318,162,545.68 BTC/BUSD  "$43,064.38\n\n43119.77 BUSD" 0.01%  $11,392,485 $13,300,407 "$542,515,491\n \n125… 2.95%      Recently      NA           
 6     6 "Loopring\n/\nTether"     $1,380,968,843.56   LRC/USDT  "$1.10\n\n1.1052 USDT"        0.02%  $731,068    $774,477    "$483,094,499\n \n437… 2.63%      Recently      NA           
 7     7 "Cardano\n/\nTether"      $36,353,319,987.05  ADA/USDT  "$1.14\n\n1.138 USDT"         0.09%  $2,418,833  $2,506,293  "$437,868,499\n \n384… 2.39%      Recently      NA           
 8     8 "Ethereum\n/\nBinance..." $364,805,278,069.74 ETH/BUSD  "$3,044.90\n\n3045.63 BUSD"   0.01%  $5,781,558  $10,258,913 "$407,185,464\n \n133… 2.22%      Recently      NA           
 9     9 "Ethereu...\n/\nTether"   $5,907,151,680.36   ETC/USDT  "$44.03\n\n44.04 USDT"        0.02%  $791,333    $920,964    "$349,260,416\n \n793… 1.90%      Recently      NA           
10    10 "Solana\n/\nTether"       $31,369,956,754.28  SOL/USDT  "$97.23\n\n97.25 USDT"        0.01%  $3,050,523  $2,598,698  "$330,473,923\n \n339… 1.80%      Recently      NA           
# … with 40 more rows
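
The static read above only returns the first 50 rows, because the remaining rows are loaded by the "show more" button. If you need all rows, a rough sketch of the RSelenium side could look like the code below. Treat it as an outline under assumptions, not a tested solution for this site: `click_until_gone()` and `table_from_source()` are helper names I made up, the CSS selector `.btn.btn-primary.btn-sm.mt-1` is only derived from the class string in the question, and the page markup may have changed. Note that `findElement()`/`findElements()` expect `using` values such as "css selector" or "xpath", and a string with several space-separated classes has to be written as a CSS selector with dots.

```r
library(rvest)

# Hypothetical helper: click the element matched by `css` until it is gone.
# `driver` is assumed to be an already opened RSelenium remoteDriver.
click_until_gone <- function(driver, css, pause = 2, max_clicks = 100) {
  for (i in seq_len(max_clicks)) {
    # findElements() returns an empty list when nothing matches,
    # so no tryCatch() is needed to detect the end of the loop
    buttons <- driver$findElements(using = "css selector", value = css)
    if (length(buttons) == 0) break
    btn <- buttons[[1]]
    if (!isTRUE(btn$isElementDisplayed()[[1]])) break
    btn$clickElement()
    Sys.sleep(pause)  # give the additional rows time to load
  }
  invisible(driver)
}

# Hypothetical helper: parse the rendered page source with rvest and
# return the table found at `xpath` as a tibble.
table_from_source <- function(page_source, xpath = "//table") {
  page_source %>%
    read_html() %>%
    html_element(xpath = xpath) %>%
    html_table()
}

# Small offline demo of the parsing step (made-up HTML, not the real page)
demo_source <- "<html><body><div id='markets'><table>
  <tr><th>#</th><th>Pair</th></tr>
  <tr><td>1</td><td>BTC/USDT</td></tr>
  <tr><td>2</td><td>ETH/USDT</td></tr>
</table></div></body></html>"
demo <- table_from_source(demo_source, "//*[@id='markets']/table")

# Intended use with a live browser session (not run here):
# driver <- RSelenium::remoteDriver(); driver$open()
# driver$navigate("https://www.coingecko.com/en/exchanges/binance")
# click_until_gone(driver, ".btn.btn-primary.btn-sm.mt-1")
# markets <- table_from_source(driver$getPageSource()[[1]],
#                              '//*[@id="markets"]/div/div[2]')
# write.csv(markets, "coingecko_binance_markets.csv", row.names = FALSE)
```

The button is looked up again on every pass because the page may re-render it after each click, and `max_clicks` is only a safety limit so the loop cannot run forever.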

You can identify the XPath in Chrome, for example: go to "Settings" -> "More tools" -> "Developer tools", then move the cursor over the table. Sometimes you can't select it properly; in that case, just hover over the code where you believe the table is.

Then right-click it and select "Copy" -> "Copy XPath".
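
To show how a copied XPath (or an equivalent CSS selector) is used in rvest, here is a small sketch on made-up HTML. The markup only imitates the idea of a table nested under an element with `id="markets"`; it is an assumption, not the real page structure. The last step pulls out just the first column, which was one of the points in the question.

```r
library(rvest)

# Made-up HTML standing in for the real page
page <- read_html("<html><body>
  <div id='markets'><div>
    <table class='table'>
      <tr><th>#</th><th>Coin</th></tr>
      <tr><td>1</td><td>Bitcoin</td></tr>
      <tr><td>2</td><td>Ethereum</td></tr>
    </table>
  </div></div>
</body></html>")

# Select the table with an XPath expression ...
by_xpath <- page %>%
  html_element(xpath = '//*[@id="markets"]//table') %>%
  html_table()

# ... or with a CSS selector that points at the same node
by_css <- page %>%
  html_element("#markets table") %>%
  html_table()

# First column only: the first <td> of every row inside #markets
first_col <- page %>%
  html_elements(xpath = '//*[@id="markets"]//tr/td[1]') %>%
  html_text2()
```

Both selectors return the same tibble here. In practice I would use whichever is shorter and less tied to the exact nesting, since copied XPaths with positions such as `div[2]` tend to break when the page layout changes.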

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Stephan