R losing levels when importing csv file

I have a data frame with 3686 rows and 34 columns. When I save it with write.csv2(data, file = "folder/data.csv2") and then load it into R again with read.csv2("folder/data.csv2"), it still has the same number of rows (3686). But when I check the number of species (a factor) with unique(data$Species), the data frame in the environment has 708 levels, while the imported one shows only 554 levels.

str(imported_dataframe$Species)

Output: Factor w/ 554 levels

str(Data_in_Environment$Species)

Output: Factor w/ 708 levels

Can anyone help me?



Solution 1:[1]

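A CSV file stores only the values, so the levels attribute is lost when you write to CSV. On import, R rebuilds the levels from the values that actually occur, and any level without a matching row disappears. One option is to export the levels separately and set them again in your data.frame after import.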

# Species is a factor with three levels
all_levels <- levels(iris$Species)
all_levels
# [1] "setosa"     "versicolor" "virginica" 

# export table where not all levels are present
write.csv2(head(iris), file = "iris_tmp.csv", row.names = FALSE)

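# also export the complete list of levels, one per line
# (so that levels containing spaces survive the round trip)
writeLines(all_levels, "iris_levels_tmp.txt")

# import both levels and data
all_levs <- readLines("iris_levels_tmp.txt")
# stringsAsFactors = TRUE is needed in R >= 4.0.0 to get a factor back
iris6 <- read.csv2("iris_tmp.csv", stringsAsFactors = TRUE)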

# unrepresented levels have been lost
levels(iris6$Species)
# [1] "setosa"

# define Species as factor with all levels
iris6$Species <- factor(iris6$Species, levels = all_levs)
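The mismatch in the question (708 versus 554 levels) is what you would expect if the original factor carries levels that no longer occur in any row, for example after subsetting. A minimal sketch of how to check this, using iris as a stand-in for the real data (the object names here are only illustrative):

```r
# subsetting keeps all levels, even those without any rows left
sub <- iris[iris$Species != "virginica", ]

# number of levels stored in the factor
n_stored <- nlevels(sub$Species)

# number of levels that actually occur in the data
n_used <- length(unique(as.character(sub$Species)))

# levels that would not survive a CSV round trip
unused <- setdiff(levels(sub$Species), unique(as.character(sub$Species)))

# droplevels() removes the unused levels if they are not needed
sub_clean <- droplevels(sub)
n_clean <- nlevels(sub_clean$Species)
```

If n_stored is larger than n_used, the difference is the number of levels a CSV export will drop. Whether to restore them or remove them with droplevels() depends on whether the empty levels matter for later analysis.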

Alternatively, you could export an R data object using save/load, which keeps the factor with all of its levels.

iris5 <- head(iris, n = 5)
save("iris5", file = "iris5.rda")
# load back iris5
load(file = "iris5.rda")
levels(iris5$Species)
# [1] "setosa"     "versicolor" "virginica"
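A related option in base R is saveRDS()/readRDS(), which stores a single object and lets you choose its name when reading it back. A short sketch along the same lines (the file and object names are just examples, and the file is written to a temporary directory):

```r
iris5 <- head(iris, n = 5)

# write the object to an .rds file in a temporary directory
rds_file <- file.path(tempdir(), "iris5.rds")
saveRDS(iris5, file = rds_file)

# read it back under a new name; the factor keeps all of its levels
iris5_restored <- readRDS(rds_file)
restored_levels <- levels(iris5_restored$Species)
```

Unlike load(), readRDS() returns the object instead of restoring it under its original name, so it cannot silently overwrite something in the workspace.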

Solution 2:[2]

Alternatively, you can use the csvy package and export a CSV file with a YAML header that contains the factor levels:

# library load
library(csvy)
library(dplyr)

# reorder factor levels
iris_releveled = iris %>% mutate(Species = factor(Species, levels = c("virginica", "setosa", "versicolor")))
# write csv file
write.csv2(iris_releveled, "iris_releveled.csv", row.names = FALSE)
# load exported dataset
iris_relevel_loaded = read.csv2("iris_releveled.csv", stringsAsFactors = TRUE)
# now the custom level order is lost (levels come back in alphabetical order)
iris_relevel_loaded$Species %>% levels()

# write CSVy file from dataset with releveled factors
write_csvy(iris_releveled, file = "iris_releveled.csvy")
# read CSVy file with original factor levels
iris_relevel_loaded = read_csvy("iris_releveled.csvy", stringsAsFactors = TRUE)
# now factor levels are kept
iris_relevel_loaded$Species %>% levels()
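If adding a package is not an option, a similar "sidecar" idea can be sketched in base R for every factor column at once, which may be handy for a data frame with many columns like the one in the question. This is only an illustration under assumed file names (written to a temporary directory), not what csvy does internally:

```r
df <- head(iris)
csv_file <- file.path(tempdir(), "iris_tmp.csv")
lev_file <- file.path(tempdir(), "iris_levels.rds")

# collect the levels of every factor column in a named list
factor_levels <- lapply(Filter(is.factor, df), levels)

# export the data and, separately, the levels
write.csv2(df, file = csv_file, row.names = FALSE)
saveRDS(factor_levels, file = lev_file)

# import the data as plain character columns, then rebuild each factor
df_in <- read.csv2(csv_file, stringsAsFactors = FALSE)
stored <- readRDS(lev_file)
for (col in names(stored)) {
  df_in[[col]] <- factor(df_in[[col]], levels = stored[[col]])
}
```

After the loop, every column named in the stored list is a factor again with its full, original set of levels in the original order.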

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2