NAs introduced by coercion when using as.numeric

library(curl)   # curl()
library(XML)    # readHTMLTable()
library(dplyr)  # %>%, select(), mutate()

theurl <- "https://cryptoslam.io/#sales-rankings-24h"
url <- curl(theurl, "rb")
urldata <- readLines(url, warn = FALSE)
data <- readHTMLTable(urldata, stringsAsFactors = FALSE)
close(url)
data.2 <- data.frame(Reduce(rbind, data[1]))

data.3 <- data.2 %>% dplyr::select(Collection, Sales, Change..24h.) %>%
  head(10) %>% mutate(Sales.numeric = as.numeric(gsub('[$,]', '', Sales))) %>%
  mutate(Change.numeric = as.numeric(gsub('%', '', Change..24h.)))

I keep getting the "NAs introduced by coercion" warning: even after removing the % from the column, I am still unable to convert it to numeric.



Solution 1:[1]

We can use readr::parse_number(), which drops any non-numeric characters before and after the first number:

library(dplyr)
data.2 %>%
   dplyr::select(Collection, Sales, Change..24h.) %>%
   head(10) %>%
   mutate(Sales.numeric = as.numeric(gsub('[$,]', '', Sales))) %>% 
   mutate(Change.numeric = readr::parse_number(Change..24h.))

-output

                                   Collection      Sales Change..24h. Sales.numeric Change.numeric
1            Bored Ape Yacht ClubBored Ape YC $9,241,122       33.87%       9241122          33.87
2  Mutant Ape Yacht ClubMutant Ape Yacht Club $8,068,976       27.42%       8068976          27.42
3                      CryptoPunksCryptoPunks $3,067,042       70.91%       3067042          70.91
4                                CloneXCloneX $2,781,643       41.75%       2781643          41.75
5                      RTFKT MNLTHRTFKT MNLTH $2,478,028       29.55%       2478028          29.55
6                                  AzukiAzuki $2,418,388       30.29%       2418388          30.29
7                              CrabadaCrabada $2,128,350       20.20%       2128350          20.20
8  Bored Ape Kennel ClubBored Ape Kennel Club $2,112,681        2.23%       2112681           2.23
9                World Of WomenWorld Of Women $1,703,430       41.22%       1703430          41.22
10                   NBA Top ShotNBA Top Shot $1,695,039       73.66%       1695039          73.66

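The reason is that there is a whitespace character before the number, which remains after the % is removed and prevents the conversion to numeric. (as.numeric() ignores ordinary leading spaces, so it is probably a special one, such as a non-breaking space.)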

> data.2 %>%
  dplyr::select(Collection, Sales, Change..24h.) %>%
   head(10) %>%
   mutate(Sales.numeric = as.numeric(gsub('[$,]', '', Sales))) %>% 
   pull(Change..24h.)
[1] " 33.87%" " 27.42%" " 70.91%" " 41.75%" " 29.55%" " 30.29%" " 20.20%" " 2.23%"  " 41.22%" " 73.66%"
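
To see which character that actually is, one option is to look at the code points of a value. This is a minimal sketch, assuming the leading character is a non-breaking space (U+00A0); the sample string x is made up for illustration and is not taken from the scraped page.

```r
# Hypothetical value mimicking one entry of Change..24h.
# ASSUMPTION: the leading character is a non-breaking space ("\u00a0").
x <- "\u00a033.87%"

# Code point of every character in the string:
# an ordinary space is 32, a non-breaking space is 160.
codes <- utf8ToInt(x)
print(codes)

# For comparison: an ordinary leading space does not stop as.numeric()
plain <- as.numeric(" 33.87")
print(plain)
```

On the real data, the same check would be utf8ToInt(data.2$Change..24h.[1]); a first value other than 32 confirms that the character is not a plain space.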

So, if we remove everything except digits and the decimal point, which also strips that leading character, the conversion works

data.2 %>%
   dplyr::select(Collection, Sales, Change..24h.) %>%
   head(10) %>%
   mutate(Sales.numeric = as.numeric(gsub('[$,]', '', Sales))) %>%
   mutate(Change.numeric = as.numeric(gsub("[^0-9.]+", "", Change..24h.)))

-output

                                   Collection      Sales Change..24h. Sales.numeric Change.numeric
1            Bored Ape Yacht ClubBored Ape YC $9,241,122       33.87%       9241122          33.87
2  Mutant Ape Yacht ClubMutant Ape Yacht Club $8,068,976       27.42%       8068976          27.42
3                      CryptoPunksCryptoPunks $3,067,042       70.91%       3067042          70.91
4                                CloneXCloneX $2,781,643       41.75%       2781643          41.75
5                      RTFKT MNLTHRTFKT MNLTH $2,478,028       29.55%       2478028          29.55
6                                  AzukiAzuki $2,418,388       30.29%       2418388          30.29
7                              CrabadaCrabada $2,128,350       20.20%       2128350          20.20
8  Bored Ape Kennel ClubBored Ape Kennel Club $2,112,681        2.23%       2112681           2.23
9                World Of WomenWorld Of Women $1,703,430       41.22%       1703430          41.22
10                   NBA Top ShotNBA Top Shot $1,695,039       73.66%       1695039          73.66
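
One caveat with the pattern "[^0-9.]+" is that it also deletes a minus sign, so a negative 24h change would come out positive. Below is a small sketch that keeps the sign by allowing "-" inside the character class. The sample values are hypothetical, and whether the site ever reports negative changes in this form is an assumption.

```r
# Hypothetical inputs: a value with a leading non-breaking space,
# a negative percentage, and a dollar amount with thousands separators.
x <- c("\u00a033.87%", " -2.23%", "$9,241,122")

# Pattern from the answer: keeps only digits and ".", so the sign is lost
unsigned <- as.numeric(gsub("[^0-9.]+", "", x))

# Variant that also keeps "-" (placed last in the class so it is literal)
signed <- as.numeric(gsub("[^0-9.-]+", "", x))

print(unsigned)
print(signed)
```

If the column can contain negative values, the second pattern is the safer choice; for the ten rows shown above both give the same result.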

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1