'How to create a simple crosstab across multiple fields in R
I feel like there's a simple answer to this, since there are so many similar questions on this forum, but none of them seem to apply for some reason.
I have survey data along with segment assignments uploaded via excel into R. The structure of my data is basically this (around 50 fields, including a segment assignment, for each of 330 respondents):
| RespondentID | Household.Size | Children.in.household | Segment |
|---|---|---|---|
| 1 | 3 | Yes | Segment 1 |
| 2 | 2 | No | Segment 3 |
| 3 | 1 | No | Segment 2 |
| 4 | 2 | No | Segment 5 |
| 5 | 4 | Yes | Segment 1 |
As you see above, some variables are numeric, some are categorical. I want to create a simple crosstab, no stat testing needed, that summarizes each field per segment. Ie. for each segment, what % of respondents in that segment have a household size of 1, 2, 3, etc; what % have children in the household, etc.
I've experimented with both CrossTable and crosstab (I realize the latter is just a wrapper for the former), but I can't get anything to work. Most of the solutions posted in forums are for individual fields, and I'd like to avoid generating individual tables for each of 50 fields, if possible.
Please let me know if there's a straightforward way to approach this!
Solution 1:[1]
Using the data shown reproducibly in the Note at the end and assuming we have a small number of segments (viz. 5) and a large number of variables try the table1 or similar packages. (Convert all columns to factor first if you want to treat numeric columns as discrete items.)
library(table1)
tab1 <- table1(~ . | Segment, DF[-1])
as.data.frame(tab1)
giving:
Segment 1 Segment 2 Segment 3 Segment 5 Overall
1 (N=2) (N=1) (N=1) (N=1) (N=5)
2 Household.Size
3 Mean (SD) 3.50 (0.707) 1.00 (NA) 2.00 (NA) 2.00 (NA) 2.40 (1.14)
4 Median [Min, Max] 3.50 [3.00, 4.00] 1.00 [1.00, 1.00] 2.00 [2.00, 2.00] 2.00 [2.00, 2.00] 2.00 [1.00, 4.00]
5 Children.in.household
6 Yes 2 (100%) 0 (0%) 0 (0%) 0 (0%) 2 (40.0%)
7 No 0 (0%) 1 (100%) 1 (100%) 1 (100%) 3 (60.0%)
Note
DF <-
structure(list(RespondentID = 1:5, Household.Size = c(3L, 2L,
1L, 2L, 4L), Children.in.household = c("Yes", "No", "No", "No",
"Yes"), Segment = c("Segment 1", "Segment 3", "Segment 2", "Segment 5",
"Segment 1")), class = "data.frame", row.names = c(NA, -5L))
Solution 2:[2]
Another package that might work for this is janitor, then the tabyl function within that package:
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | G. Grothendieck |
| Solution 2 | arachne591 |
