typeof returns integer for something that is clearly a factor
Create a variable:
a_variable <- c("a","b","c")
Check type:
typeof(a_variable)
I want a factor - change to factor:
a_variable <- as.factor(a_variable)
Check type:
typeof(a_variable)
Says that it's integer!? As an R newb, this is confusing. I just told R to make a factor not an integer.
Test to see if it somehow magically did create an integer:
a_variable * 1
Hmm... I get an error message saying "*" isn't meaningful for factors. This seems weird to me since R just told me it was an integer!?
Clearly it's me who is confused, can someone more enlightened help make sense of this madness for me?
Solution 1:[1]
More on str: what surprised me is that it is an abbreviation of "structure", not "string". As the last example below shows, the str command describes the object more readably (to my eye) than dput does, labelling it "Factor w/ N levels":
str(head(abalone$Age, 5))
Factor w/ 3 levels "Mid","Old","Yng": 2 3 1 1 3
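To tie this back to the question, here is a minimal sketch that runs the same checks on the a_variable from the question. It only uses base R; the point is that typeof reports how the values are stored, while class and str report what the object is treated as.

```r
# Same vector as in the question, converted to a factor
a_variable <- as.factor(c("a", "b", "c"))

typeof(a_variable)     # "integer": the internal storage type
class(a_variable)      # "factor": the class attribute R dispatches on
is.factor(a_variable)  # TRUE
str(a_variable)        # Factor w/ 3 levels "a","b","c": 1 2 3
```

So both answers are right at once: the factor is stored as integers, but it is not an integer vector as far as functions like * are concerned.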
Thank you for asking this question. I've found data types in R confusing too, and ran into the same issue while processing the Abalone dataset from the UCI Machine Learning Repository. I continued the research following the reply by IRTFM, which eventually helped me understand the typing, and hopefully it can help someone else. I found this resource helpful for understanding R data types: R-supp-data-structures
What I've observed while processing the data.frame from the Abalone dataset:
- building the "Age" column of the data.frame with lapply results in a "list" of "character" type objects, because lapply always returns a list, even when the result could be an atomic vector
- applying unlist to the "Age" column then yields an "atomic vector" of "character" type
- after encoding the vector as a factor we get a "factor" class object
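The first two observations can be sketched in isolation. This is only an illustration: the rings values are made up rather than read from the dataset, and to_age is a hypothetical helper name. It also shows vapply, a base R alternative that returns an atomic vector directly, so the unlist step is not needed.

```r
# Made-up ring counts for illustration (not read from abalone.data)
rings <- c(15, 7, 9, 10, 7)

# Hypothetical helper: same thresholds as in the full example
to_age <- function(x) {
  if (x > 11.5) "Old" else if (x > 9.5) "Mid" else "Yng"
}

as_list   <- lapply(rings + 1.5, to_age)                # always a list
as_vector <- vapply(rings + 1.5, to_age, character(1))  # atomic character vector

typeof(as_list)    # "list"
typeof(as_vector)  # "character"
identical(unlist(as_list), as_vector)  # TRUE
```

vapply also checks that every result is a single character string, which catches mistakes that unlist would silently flatten.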
The code example:
#
# Understanding datatypes while processing Abalone dataset
#
download.file('http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', 'abalone.data')
abalone = read.table("abalone.data", header = FALSE, sep=",", na.strings= "*")
# name columns of a data.frame object
colnames(abalone) <- c('Sex', 'Length','Diameter','Height','Whole w.', 'Shucked w.', 'Viscera w.','Shell w.','Rings')
dput(head(abalone, 1))
# discretize numeric rings to three ranges of an abalone age
additiveRingsToAgeConst = 1.5
abalone$Age = lapply(abalone[,'Rings'] + additiveRingsToAgeConst, function (x) {
  if (x > 11.5) {"Old"}
  else if (x > 9.5) {"Mid"}
  else {"Yng"}
})
# 1. running lapply function on the "Age" column of the data.frame is resulting in a "list" of "character" type objects
dput(head(abalone$Age, 5))
str(head(abalone$Age, 5))
# 2. further applying unlist function on the "Age" column of the data.frame is resulting in an "atomic vector" of "character" type object
abalone$Age = unlist(abalone$Age);
dput(head(abalone$Age, 5))
str(head(abalone$Age, 5))
# 3. after encoding vector as a factor we get a "factor" class object
abalone$Age = as.factor(abalone$Age)
dput(head(abalone$Age, 5))
str(head(abalone$Age, 5))
Code execution results:
> # 1. running lapply function on the "Age" column of
# the data.frame is resulting in a "list" of "character" type objects
> dput(head(abalone$Age, 5))
list("Old", "Yng", "Mid", "Mid", "Yng")
> str(head(abalone$Age, 5))
List of 5
$ : chr "Old"
$ : chr "Yng"
$ : chr "Mid"
$ : chr "Mid"
$ : chr "Yng"
> # 2. further applying unlist function on the "Age" column of the data.frame
# is resulting in an "atomic vector" of "character" type object
> abalone$Age = unlist(abalone$Age);
> dput(head(abalone$Age, 5))
c("Old", "Yng", "Mid", "Mid", "Yng")
> str(head(abalone$Age, 5))
chr [1:5] "Old" "Yng" "Mid" "Mid" "Yng"
> # 3. after encoding vector as a factor we get a "factor" class object
> abalone$Age = as.factor(abalone$Age)
> dput(head(abalone$Age, 5))
structure(c(2L, 3L, 1L, 1L, 3L), .Label = c("Mid", "Old", "Yng"
), class = "factor")
> str(head(abalone$Age, 5))
Factor w/ 3 levels "Mid","Old","Yng": 2 3 1 1 3
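The dput output above already hints at the answer to the original question: a factor is an integer vector of codes plus a levels attribute and a "factor" class. A minimal sketch of taking it apart, using the same five values (the variable name age is just for illustration):

```r
age <- factor(c("Old", "Yng", "Mid", "Mid", "Yng"))

typeof(age)        # "integer"
unclass(age)       # integer codes 2 3 1 1 3 plus a "levels" attribute
as.integer(age)    # 2 3 1 1 3: the codes alone
levels(age)        # "Mid" "Old" "Yng"
as.character(age)  # "Old" "Yng" "Mid" "Mid" "Yng"

# Arithmetic on the factor itself is refused: R warns that '*' is not
# meaningful for factors and returns NA for every element
result <- suppressWarnings(age * 1)

# Arithmetic on the extracted codes works, though it is rarely what you want
codes <- as.integer(age) * 1
```

This is why typeof says integer while a_variable * 1 is rejected: the class attribute makes R use the factor-specific behaviour for operators instead of plain integer arithmetic.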
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | IRTFM |
