Do I need to change any of my data types to analyze this data frame?

I'm working on a case study and have reached the stage where I analyze my cleaned data set. I want to find the average age and the sex breakdown of drivers at fault for car accidents. When I summarize the data to view some basic stats, this is what I get:

> summary(crash_demographics_2016)
 Case_Individual_ID      Year         Case_ID             Sex                 Age       
 Min.   :17475366   Min.   :2016   Min.   :17475366   Length:60053       Min.   :16.00  
 1st Qu.:17817391   1st Qu.:2016   1st Qu.:17817391   Class :character   1st Qu.:23.00  
 Median :18141624   Median :2016   Median :18141624   Mode  :character   Median :31.00  
 Mean   :18133987   Mean   :2016   Mean   :18133987                      Mean   :36.41  
 3rd Qu.:18445424   3rd Qu.:2016   3rd Qu.:18445424                      3rd Qu.:47.00  
 Max.   :19486782   Max.   :2016   Max.   :19486782                      Max.   :95.00  

The area of concern is the column "Sex". Do I need to change the data type to get any analytical information from this column besides length?

r


Solution 1:[1]

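Most likely, yes. For a character column, summary() only reports length, class and mode, which is why Sex shows no counts. Sex is most likely a categorical variable with a small number of distinct values, so convert it to a factor, as Phil suggested in the comments: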

crash_demographics_2016$Sex <- as.factor(crash_demographics_2016$Sex)

You can then rerun summary() to get a count for each level (a frequency table), or use a function such as summarytools::freq() to get percentages as well.
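
As a rough illustration, here is a minimal sketch of that workflow using only base R. The data frame `crashes` and its values are made up for the example and simply stand in for crash_demographics_2016; the "U" category is an assumption that the real column may hold more than two values.

```r
# Toy data standing in for crash_demographics_2016 (values are invented).
crashes <- data.frame(
  Sex = c("M", "F", "M", "M", "F", "U"),
  Age = c(23, 31, 47, 19, 36, 52),
  stringsAsFactors = FALSE
)

# As a character column, summary() only reports length, class and mode.
summary(crashes$Sex)

# Convert to a factor so R treats the column as categorical.
crashes$Sex <- as.factor(crashes$Sex)

# summary() on a factor returns the count for each level.
sex_counts <- summary(crashes$Sex)
print(sex_counts)

# Base R alternative for percentages: proportions from a frequency table.
sex_pct <- round(100 * prop.table(table(crashes$Sex)), 1)
print(sex_pct)

# Mean age within each Sex category.
age_by_sex <- tapply(crashes$Age, crashes$Sex, mean)
print(age_by_sex)
```

Once Sex is a factor, the same column can also be used for grouping, as in the tapply() call, which may be closer to what the question is after than a single overall mean age.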

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrea M