Most appropriate data file format for columns with multiple inputs
I am new to data science, and I have this data from the SO survey (.xlsx/.csv):
| DevType |
|---|
| full-stack |
| Developer; desktop or enterprise applications;Developer; full-stack |
| Designer;Developer; back-end |
| Designer; full-stack |
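A cell like this holds a list of answers packed into one string. Below is a minimal sketch, using only the four sample cells above, of turning each cell into a proper list. It assumes every `;` marks a new answer, which is how the split table further down treats it. `split_devtype` is a hypothetical helper, not part of any survey tooling.

```python
# The four DevType cells from the sample above.
raw_devtypes = [
    "full-stack",
    "Developer; desktop or enterprise applications;Developer; full-stack",
    "Designer;Developer; back-end",
    "Designer; full-stack",
]


def split_devtype(cell: str, sep: str = ";") -> list[str]:
    """Hypothetical helper: split one multi-answer cell on `sep` and trim stray spaces."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


parsed = [split_devtype(cell) for cell in raw_devtypes]
for answers in parsed:
    print(answers)
```

In pandas, `df["DevType"].str.split(";")` gives a similar list-valued column, though the surrounding spaces still need stripping.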
I have heard that it is bad practice to split the column into several columns like this:
| DevType | DevType_1 | DevType_2 | DevType_3 |
|---|---|---|---|
| full-stack | NA | NA | NA |
| Developer | desktop or enterprise applications | Developer | full-stack |
| Designer | Developer | back-end | NA |
| Designer | full-stack | NA | NA |
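To make the "unused cells" concern concrete, the sketch below rebuilds the wide layout from the same four sample cells by padding every row to the longest answer list, then counts the padding. It covers only the sample rows, not the full survey.

```python
raw_devtypes = [
    "full-stack",
    "Developer; desktop or enterprise applications;Developer; full-stack",
    "Designer;Developer; back-end",
    "Designer; full-stack",
]

# Split each cell on ";" and trim the spaces around each answer.
parsed = [[p.strip() for p in cell.split(";") if p.strip()] for cell in raw_devtypes]

# Wide layout: every row gets as many columns as the longest answer list,
# and the shortfall is filled with None (the NA cells in the table above).
width = max(len(answers) for answers in parsed)
wide = [answers + [None] * (width - len(answers)) for answers in parsed]

total_cells = len(wide) * width
empty_cells = sum(cell is None for row in wide for cell in row)
print(f"{empty_cells} of {total_cells} cells are empty")
```

For the sample this reports 6 of 16 cells empty. The column count is set by the single respondent with the most answers, so one long answer list widens every row.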
Basically, many cells are left unused because they hold null values. From what I have heard, this is an inefficient use of columns: it increases the file size, which in turn slows down processing of this kind of data. (The SO survey has approximately 80,000 respondents.)
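A common alternative to the wide layout is a long ("tidy") table with one row per respondent/answer pair, so no padding is needed at all. The sketch below writes that table as CSV from the four sample cells. `RespondentId` is a hypothetical key generated here for illustration; with the real data you would use the survey's own respondent ID column instead.

```python
import csv
import io

raw_devtypes = [
    "full-stack",
    "Developer; desktop or enterprise applications;Developer; full-stack",
    "Designer;Developer; back-end",
    "Designer; full-stack",
]

# Long layout: one (RespondentId, DevType) row per answer.
# RespondentId is a hypothetical 1-based key standing in for the survey's real ID.
long_rows = [
    (respondent_id, answer.strip())
    for respondent_id, cell in enumerate(raw_devtypes, start=1)
    for answer in cell.split(";")
    if answer.strip()
]

# Write the long table as plain CSV (in memory here; a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["RespondentId", "DevType"])
writer.writerows(long_rows)
print(buffer.getvalue())
```

Every row holds a real answer, and the file stays an ordinary CSV. In pandas, `df["DevType"].str.split(";").explode()` produces the same shape.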
What is the best file format for this kind of data? I researched .json; is that the right choice?
I need it for a smooth workflow in my data visualization tools (Power BI/Tableau/Qlik Sense).
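JSON can store each respondent's answers as a real list instead of padded columns. The sketch below serializes the four sample cells that way with the standard `json` module. Note that this only changes how the file is stored: a BI tool will generally still need to expand the list into rows once it loads the data.

```python
import json

raw_devtypes = [
    "full-stack",
    "Developer; desktop or enterprise applications;Developer; full-stack",
    "Designer;Developer; back-end",
    "Designer; full-stack",
]

# One JSON object per respondent; DevType becomes an array of answers.
# RespondentId is a hypothetical key used only for illustration.
records = [
    {
        "RespondentId": rid,
        "DevType": [a.strip() for a in cell.split(";") if a.strip()],
    }
    for rid, cell in enumerate(raw_devtypes, start=1)
]

text = json.dumps(records, indent=2)
print(text)
```

Arrays of different lengths need no null padding, but the same information fits in the long CSV layout without any nesting.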
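If a report needs exactly one row per respondent, another option is one 0/1 indicator column per distinct answer. The sketch below builds that from the four sample cells. `RespondentId` is again a hypothetical key, and duplicate answers within one cell are collapsed into a single flag.

```python
raw_devtypes = [
    "full-stack",
    "Developer; desktop or enterprise applications;Developer; full-stack",
    "Designer;Developer; back-end",
    "Designer; full-stack",
]

# Set of distinct, trimmed answers per respondent (duplicates collapse).
parsed = [{a.strip() for a in cell.split(";") if a.strip()} for cell in raw_devtypes]

# One column per distinct answer, sorted for a stable column order.
categories = sorted(set().union(*parsed))

# 1 if the respondent chose that answer, else 0.
indicator_rows = [
    {"RespondentId": rid, **{cat: int(cat in answers) for cat in categories}}
    for rid, answers in enumerate(parsed, start=1)
]

for row in indicator_rows:
    print(row)
```

Here the column count depends on the number of distinct answers rather than on the longest answer list. pandas offers `Series.str.get_dummies(sep=";")` for a similar result once the stray spaces are stripped. Which layout fits better depends on whether the report counts respondents or individual answers.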
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
