'How do I create a data table in code in R
I have a data table as a CSV file that I use to create metrics for a dashboard. The data table includes Metric IDs and associates these with field names. This table--this definition of metrics--is largely static, and I'd like to include it within R code rather than, for example, importing a CSV file containing these headings.
The table looks something like this:
| Metric_ID | Metric_Name | Numerator | Denominator |
|---|---|---|---|
| AB0001 | Number_of_Customers | No_of_Customers | |
| AB0002 | Percent_New_Customers | No_of_New_Customers | No_of_Customers |
This has about 40 rows of data, and I'd like to set this table up in code so that it is created at the time the R query is run. I'll then use it to associate metric IDs with measures I retrive through SQL queries. Sometimes this table may change -- for example, new metrics might be added or existing metrics modified. This would need some modificatoin in the code to incorporate these metrics.
The closet way I could find was to create a data table, along the lines described in the query below.
dt<-data.table(x=c(1,2,3),y=c(2,3,4),z=c(3,4,5))
dt
x y z
1: 1 2 3
2: 2 3 4
3: 3 4 5
cbind with data table and data frame
This works for a table with a few rows or columns, but will be unwieldy for tables with 40+ rows. For example, if I wanted to modify a metric 20 rows down, I'd have to go 20 rows down in each column, and then test the table to ensure I switched the metric at the right place in each column -- especially where some metrics have empty cells. for example, I may correct the metric ID in row 20, but accidentally put the definition (a separate column) in row 19.
Is there a more straightforward way of, in essence, creating a table in code?
(I appreciate the most straightforward way would be to keep a CSV file accessible and use read_csv to import it into R. However, this doesn't work so well if colleagues are running this query on their machine and have a different file path to the CSV -- it also raises the risk of them running the query with an out-of-date metrics table, as they may not have the latest version in their files).
Thanks in advance for any guidance you might have!
Tony
Solution 1:[1]
Here are two options (examples taken from respective help pages):
data.table::fread()
fread("A,B
1,2
3,4
")
#> A B
#> <int> <int>
#> 1: 1 2
#> 2: 3 4
https://rdatatable.gitlab.io/data.table/reference/fread.html
tibble::tribble()
tribble(
~colA, ~colB,
"a", 1,
"b", 2,
"c", 3
)
#> # A tibble: 3 × 2
#> colA colB
#> <chr> <dbl>
#> 1 a 1
#> 2 b 2
#> 3 c 3
https://tibble.tidyverse.org/reference/tribble.html
Other options:
- If you already have the data.frame from somewhere, you can also use
dput()to get astructure()code you can paste into the files you are distributing. - use the
reprexpackage https://reprex.tidyverse.org/
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
