Do I need to change any of my data types to analyze this data frame?
I'm working on a case study and have reached the stage where I analyze my cleaned data set. I want to find the average age and the sex breakdown of drivers at fault in car accidents. When I summarize the data to view some basic statistics, this is what I get:
> summary(crash_demographics_2016)
Case_Individual_ID       Year           Case_ID              Sex               Age
Min.   :17475366     Min.   :2016   Min.   :17475366   Length:60053       Min.   :16.00
1st Qu.:17817391     1st Qu.:2016   1st Qu.:17817391   Class :character   1st Qu.:23.00
Median :18141624     Median :2016   Median :18141624   Mode  :character   Median :31.00
Mean   :18133987     Mean   :2016   Mean   :18133987                      Mean   :36.41
3rd Qu.:18445424     3rd Qu.:2016   3rd Qu.:18445424                      3rd Qu.:47.00
Max.   :19486782     Max.   :2016   Max.   :19486782                      Max.   :95.00
The area of concern is the column "Sex". Do I need to change the data type to get any analytical information from this column besides length?
Solution 1:[1]
Most likely, yes. Sex is stored as a character vector, which is why summary() reports only its length, class, and mode. The column most likely holds a small set of categories, so convert it to a factor, as Phil suggested in the comments:
crash_demographics_2016$Sex <- as.factor(crash_demographics_2016$Sex)
You will then be able to rerun summary() and get a frequency table, or use a more complex function such as summarytools::freq() to also get percentages.
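As a rough illustration of the steps above, here is a minimal base R sketch. It runs on a small made-up data frame named crash_demo, a hypothetical stand-in for crash_demographics_2016; the Sex codes "M", "F", and "U" and all Age values are assumptions, not taken from the real data. It also shows one possible way to get the mean age per sex with tapply(), which the original answer does not cover.

```r
# Hypothetical toy data standing in for crash_demographics_2016.
# The Sex codes and Age values are invented for illustration only.
crash_demo <- data.frame(
  Sex = c("M", "F", "M", "M", "F", "U"),
  Age = c(23, 31, 47, 19, 52, 36),
  stringsAsFactors = FALSE
)

# While Sex is character, summary() reports only Length / Class / Mode.
print(summary(crash_demo$Sex))

# Convert to a factor so each distinct value becomes a level.
crash_demo$Sex <- as.factor(crash_demo$Sex)

# summary() on a factor returns a count per level (a frequency table).
sex_counts <- summary(crash_demo$Sex)

# Share of rows in each level, as proportions that sum to 1.
sex_props <- prop.table(table(crash_demo$Sex))

# Mean age within each level of Sex.
mean_age_by_sex <- tapply(crash_demo$Age, crash_demo$Sex, mean)

print(sex_counts)
print(round(sex_props, 2))
print(mean_age_by_sex)
```

On the real data frame the same calls should apply once crash_demo is replaced by crash_demographics_2016, provided Sex contains only a handful of distinct codes; it is worth checking levels(crash_demographics_2016$Sex) after the conversion for unexpected or misspelled categories.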
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrea M |
