What is the difference between astype('category') and astype(CategoricalDtype())?
I'm learning data science and need to convert 'object' values to 'categorical' values. While studying someone else's code, I noticed there are two ways to do that. When should each one be used?
from pandas.api.types import CategoricalDtype
df[name] = df[name].astype('category')
df[name] = df[name].astype(CategoricalDtype(levels, ordered=True))
Solution 1:[1]
I was wondering the same thing, and here's what I found:
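When you use astype(CategoricalDtype(levels, ordered=True)), the column is converted to a categorical whose categories are exactly the ones you list, in the order you list them (['b', 'a'] in the example below). ordered=True marks that order as meaningful: "b" is treated as lower than "a" in comparisons, sorting, and min()/max(), which is what you want for ordinal features in an ML model. Values that are not in the list become NaN. With astype('category'), on the other hand, the categories are inferred from the unique values in the data (sorted where possible) and have no order. The string 'category' is shorthand for a default CategoricalDtype(), so astype('category') and astype(CategoricalDtype()) give the same result; the explicit form is only needed when you want to fix the categories, their order, or both.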
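A minimal sketch of the equivalence described above, assuming a recent pandas version. The sample Series and the `levels` list are made up for illustration; the question does not show what `levels` contains:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

raw = pd.Series(["low", "high", "medium", "low"])

# String alias: categories inferred from the data, unordered.
a = raw.astype("category")
# Default CategoricalDtype(): same behaviour as the string alias.
b = raw.astype(CategoricalDtype())
# Hypothetical explicit levels: categories and their order fixed in advance.
levels = ["low", "medium", "high"]
c = raw.astype(CategoricalDtype(levels, ordered=True))

print(a.dtype == b.dtype)            # True
print(list(a.cat.categories))        # ['high', 'low', 'medium']
print(a.cat.ordered, c.cat.ordered)  # False True
print(list(c.cat.categories))        # ['low', 'medium', 'high']
```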
Input:
import pandas as pd
t = pd.CategoricalDtype(categories=['b', 'a'], ordered=True)
pd.Series(['a', 'b', 'a', 'c'], dtype=t)
Output:
0      a
1      b
2      a
3    NaN
dtype: category
Categories (2, object): ['b' < 'a']
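One practical reason to pass explicit categories is to keep the integer codes consistent across datasets, for example a training set and a test set. The sketch below uses hypothetical `train`/`test` Series and made-up size levels: it reuses a single dtype for both and contrasts that with letting astype('category') infer categories from the test data alone:

```python
import pandas as pd

# Hypothetical ordinal levels shared by both datasets.
size_type = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)

train = pd.Series(["small", "large", "medium"]).astype(size_type)
test = pd.Series(["large", "large", "huge"]).astype(size_type)

# Same dtype, so the same label always maps to the same code.
print(train.cat.codes.tolist())  # [0, 2, 1]
# 'huge' is not a declared category, so it becomes NaN (code -1).
print(test.cat.codes.tolist())   # [2, 2, -1]

# Inferring categories from the test data alone gives different codes:
# the categories become ['huge', 'large'], so 'large' is coded 1, not 2.
inferred = pd.Series(["large", "large", "huge"]).astype("category")
print(inferred.cat.codes.tolist())  # [1, 1, 0]
```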
Input:
s = pd.Series(["a", "b", "c", "a"], dtype="category")
s
Output:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
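To see what ordered=True changes in practice, here is a sketch that rebuilds the two Series from the examples above (assuming a recent pandas version) and applies comparisons, min()/max(), and .cat.codes to each:

```python
import pandas as pd

t = pd.CategoricalDtype(categories=["b", "a"], ordered=True)
ordered = pd.Series(["a", "b", "a", "c"], dtype=t)
unordered = pd.Series(["a", "b", "c", "a"], dtype="category")

# Ordered: comparisons follow the declared order, so 'b' < 'a'.
# The NaN (from 'c') compares as False.
print((ordered > "b").tolist())      # [True, False, True, False]
print(ordered.min(), ordered.max())  # b a

# Codes follow the category order in both cases (-1 marks NaN).
print(ordered.cat.codes.tolist())    # [1, 0, 1, -1]
print(unordered.cat.codes.tolist())  # [0, 1, 2, 0]

# Unordered: only == and != are allowed; < and > raise TypeError.
try:
    unordered > "a"
except TypeError as exc:
    print("TypeError:", exc)
```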
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
