R losing levels when importing a CSV file
I have a data frame with 3686 rows and 34 columns. When I save it with write.csv2(data, file = "folder/data.csv2") and then load it back into R with read.csv2("folder/data.csv2"), it still has the same number of rows (3686). But when I check the number of species (a factor) with unique(data$Species), the data frame in the Environment has 708 levels, while the imported one shows only 554 levels.
str(imported_dataframe$Species)
Output: Factor w/ 554 levels
str(Data_in_Environment$Species)
Output: Factor w/ 708 levels
Can anyone help me?
Solution 1:[1]
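The levels attribute is lost when you write to CSV: the file stores only the values in each row, so levels that no row uses cannot be recovered on import. One option is to export the levels separately and reapply them to your data.frame after import.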
# Species is a factor with three levels
all_levels <- levels(iris$Species)
all_levels
# [1] "setosa" "versicolor" "virginica"
# export table where not all levels are present
write.csv2(head(iris), file = "iris_tmp.csv", row.names = FALSE)
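# also export the complete list of levels, one per line,
# so that levels containing spaces survive the round trip
writeLines(all_levels, "iris_levels_tmp.txt")
# import both levels and data
all_levs <- readLines("iris_levels_tmp.txt")
# stringsAsFactors = TRUE is needed in R >= 4.0.0, where the default is FALSE
iris6 <- read.csv2("iris_tmp.csv", stringsAsFactors = TRUE)
# unrepresented levels have been lost
levels(iris6$Species)
# [1] "setosa"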
# define Species as factor with all levels
iris6$Species <- factor(iris6$Species, levels = all_levs)
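To check that the missing levels are simply unused ones (levels defined on the factor with no rows behind them), you can compare the levels attribute against the values actually present. This is a minimal sketch on `iris`; the names `x`, `unused`, and `x_dropped` are illustrative and not taken from the question.

```r
# A factor whose levels attribute lists more levels than actually occur
x <- head(iris)$Species

# Number of levels stored in the attribute
nlevels(x)
# [1] 3

# Number of distinct values actually present in the data
length(unique(as.character(x)))
# [1] 1

# Levels that a CSV round trip cannot preserve
unused <- setdiff(levels(x), unique(as.character(x)))
unused
# [1] "versicolor" "virginica"

# If the unused levels are not needed, drop them explicitly
x_dropped <- droplevels(x)
levels(x_dropped)
# [1] "setosa"
```

If the difference between the two counts matches the number of "lost" levels (708 versus 554 in the question), the import itself is behaving as expected.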
Alternatively, you could export the R object itself using save() and load(), which keeps all attributes, including the levels.
iris5 <- head(iris, n = 5)
save("iris5", file = "iris5.rda")
# load back iris5
load(file = "iris5.rda")
levels(iris5$Species)
# [1] "setosa" "versicolor" "virginica"
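For a single object, `saveRDS()` and `readRDS()` are a closely related option. Unlike `load()`, `readRDS()` returns the object, so you can assign it to any name. A short sketch, where the temporary path `rds_path` and the name `iris5_restored` are chosen purely for illustration:

```r
iris5 <- head(iris, n = 5)

# Illustrative temporary location; use your own path in practice
rds_path <- tempfile(fileext = ".rds")

# Serialize one object, attributes included
saveRDS(iris5, file = rds_path)

# Read it back under a different name
iris5_restored <- readRDS(rds_path)
levels(iris5_restored$Species)
# [1] "setosa"     "versicolor" "virginica"
```

The trade-off compared with CSV is that the file is readable only from R.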
Solution 2:[2]
Alternatively, you can use the csvy package and export a CSV file with a YAML header that contains the factor levels:
# library load
library(csvy)
library(dplyr)
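# reorder factor levels (relevel() accepts only a single reference level,
# so the full order is set with factor(levels = ...))
iris_releveled = iris %>% mutate(Species = factor(Species, levels = c("virginica", "setosa", "versicolor")))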
# write csv file
write.csv2(iris_releveled,"iris_releveled.csv")
# load exported dataset
iris_relevel_loaded = read.csv2("iris_releveled.csv", stringsAsFactors = TRUE)
# now the custom level order is lost (levels come back in alphabetical order)
iris_relevel_loaded$Species %>% levels()
# write CSVy file from dataset with releveled factors
write_csvy(iris_releveled, file = "iris_releveled.csvy")
# read CSVy file with original factor levels
iris_relevel_loaded = read_csvy("iris_releveled.csvy", stringsAsFactors = TRUE)
# now factor levels are kept
iris_relevel_loaded$Species %>% levels()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
