Error in UseMethod("xml_find_first"): no applicable method for 'xml_find_first' applied to an object of class "character"

I am trying to get coordinates from the following webpage: https://nominatim.openstreetmap.org/ui/search.html?q=

However, while trying to find the <p> element, I get the error above.

Yet the <p> element does exist in the page's HTML when I inspect it in the browser.

Code I am using to find the <p> element:

geocode <- function(record_id, address, city, state, zipcode){

  # NOMINATIM SEARCH API URL
  src_url <- "https://nominatim.openstreetmap.org/ui/search.html?q="
  
  ### INPUTS PREPARATION ###
  
  city <- str_replace_all(string = city, 
                          pattern = "\\s|,", 
                          replacement = "+")
  
  # CREATE A FULL ADDRESS
  addr <- paste(address, city, state, zipcode, sep = "%2C")
  
  # CREATE A SEARCH URL BASED ON NOMINATIM API TO RETURN GEOJSON
  requests <- paste0(src_url, addr, "&format=geojson")
  
  # ITERATE OVER THE URLS AND MAKE REQUEST TO THE SEARCH API
  for (i in 1:length(requests)) {
    
    # MAKE HTML REQUEST TO API AND TRANSFORM HTML RESPONSE TO JSON
    response <- read_html(requests[i]) %>%
      html_node("p") %>%
      html_text() %>%
      fromJSON()
    
    # FROM THE RESPONSE EXTRACT LATITUDE AND LONGITUDE COORDINATES
    lon <- response$features$geometry$coordinates[[1]][1]
    lat <- response$features$geometry$coordinates[[1]][2]
    
    # CREATE A COORDINATES DATAFRAME
    if (TRUE && i == 1) {
      loc <- tibble(record_id = record_id[i],
                    address = str_replace_all(addr[i], "%2C", ","),
                    latitude = lat, longitude = lon)
    }else{
      df <- tibble(record_id = record_id[i],
                    address = str_replace_all(addr[i], "%2C", ","),
                    latitude = lat, longitude = lon)
      loc <- bind_rows(loc, df)
    }
  }
  return(loc)
}

Recreating the problem through minimal code:

geocode <- function(record_id, address, city, state, zipcode){
  src_url <- "https://nominatim.openstreetmap.org/ui/search.html?q="
  city <- str_replace_all(string = city, 
                          pattern = "\\s|,", 
                          replacement = "+")
  addr <- paste(address, city, state, zipcode, sep = "%2C")
  requests <- paste0(src_url, addr, "&format=geojson")
  
  return(requests)
  
}

geocode(record_id = 1,
        address = 123,
        city = "New York",
        state = "NY", zipcode = "1006")

Output: "https://nominatim.openstreetmap.org/ui/search.html?q=123%2CNew+York%2CNY%2C1006&format=geojson"

request <- "https://nominatim.openstreetmap.org/ui/search.html?q=123%2CNew+York%2CNY%2C1006&format=geojson"

read_html(request)

Output:

{html_document}
<html lang="en">
[1] <head>\n<meta http-equiv="Content-Type" content="text/h ...
[2] <body>\n</body>

read_html(request) %>%
  html_nodes('p')

Which results in the above output. What seems to be the problem?



Solution 1:[1]

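You are requesting the UI page (ui/search.html) rather than the endpoint that the browser, running JavaScript, actually calls. read_html() does not execute JavaScript, so it only sees the static markup, which is why your output shows an empty <body> with no <p> element in it. You can confirm this by watching the network tab of the browser's developer tools while refreshing the target webpage.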
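As a rough offline illustration of that point, the sketch below parses a hand-written HTML string with an empty body (a stand-in I made up for what read_html() received, not the real page) and shows that selecting "p" yields a missing node whose text is NA.

library(rvest)

# Assumed stand-in for the static markup: the <body> is empty because the
# real page fills it in with JavaScript after loading.
page <- read_html("<html><head><title>search</title></head><body></body></html>")

node <- html_node(page, "p")  # no <p> in the static markup, so a missing node
text <- html_text(node)       # the text of a missing node is NA

print(node)
print(is.na(text))

Nothing downstream of html_text() can recover coordinates from that NA, so the fix has to happen at the URL rather than at the selector.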

Below, I show an amended function to generate the correct endpoint URI, as well as an example call.

library(httr2)
library(stringr)  # provides str_replace_all()

geocode <- function(record_id, address, city, state, zipcode) {
  src_url <- "https://nominatim.openstreetmap.org/search.php?q="
  city <- str_replace_all(
    string = city,
    pattern = "\\s|,",
    replacement = "+"
  )
  addr <- paste(address, city, state, zipcode, sep = "%2C")
  requests <- paste0(src_url, addr, "&polygon_geojson=1&format=jsonv2")

  return(requests)
}

url <- geocode(
  record_id = 1,
  address = 123,
  city = "New York",
  state = "NY", zipcode = "1006"
)


# Send a User-Agent header with the request
data <- request(url) |>
  req_headers("User-Agent" = "Mozilla/5.0") |>
  req_perform() |>
  resp_body_json()

print(data[[1]]$lat)
print(data[[1]]$lon)
print(data[[1]]$geojson$coordinates)
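To get back to the one-row-per-record table the question was building, one possible sketch follows. The helper names build_request() and geocode_one(), the limit = 1 parameter, and the User-Agent string are my own assumptions for illustration; only the search.php endpoint and format=jsonv2 come from the code above. It lets req_url_query() do the percent-encoding instead of pasting "+" and "%2C" by hand.

library(httr2)

# Hypothetical helper: build (but do not send) a search request for one address.
# req_url_query() percent-encodes the values, so no manual escaping is needed.
build_request <- function(address, city, state, zipcode) {
  query <- paste(address, city, state, zipcode, sep = ", ")
  request("https://nominatim.openstreetmap.org/search.php") |>
    req_url_query(q = query, format = "jsonv2", limit = 1) |>
    req_user_agent("geocode-example (placeholder contact)")  # assumed value
}

# Hypothetical helper: send the request and return one row of coordinates.
geocode_one <- function(record_id, address, city, state, zipcode) {
  hits <- build_request(address, city, state, zipcode) |>
    req_perform() |>
    resp_body_json()

  # No match: keep the record but leave the coordinates missing.
  if (length(hits) == 0) {
    return(data.frame(record_id = record_id,
                      latitude = NA_real_, longitude = NA_real_))
  }

  # lat and lon arrive as strings in the JSON, so convert them to numbers.
  data.frame(record_id = record_id,
             latitude  = as.numeric(hits[[1]]$lat),
             longitude = as.numeric(hits[[1]]$lon))
}

For several records, calling geocode_one() in a loop with Sys.sleep(1) between calls and combining the rows with rbind() should be enough; the public Nominatim server asks clients to stay at or below one request per second.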

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 QHarr