'Should I normalize or not? If yes how?
Currently I have a table with a column containing CSVs. I am not sure whether to normalize the whole table or not. The problem is this column, configuration, may contain up to 50 or more different types of values. For example in the table shown below it's 18, 20, but for other data in the same column it may be 0, 20, 21, 22, 23, 25, 26, 27, 40, 52, 54, 55 and so on, however these values are unique. They will never repeat.
I do not know what is the maximum number for it(it may vary) so that's why I kept it in CSV. I am currently have trouble normalizing it, or rather I am not sure whether I should even normalize it. Any help here?
id tester_type device_id board_id configuration
75946 UFLEX 997 220
44570 UFLEX 450 220 18,20
44569 UFLEX 449 220 18,20
44568 UFLEX 448 220 18,20
44567 UFLEX 447 220 18
Note: Configuration column does also contain empty values or empty spaces.
Solution 1:[1]
I disagree with both "must" and "must not" normalize stands. My 2 cents:
- Do not normalize "continuous" values such as prices, numbers, dates, floats, etc.
- Do not normalize values that are unique or nearly so.
Do not normalize fields that are narrow. For example, don't replace a 2-letter country code with a 4-byte country_id.
"Normalize for simplicity": Do normalize things that are used in multiple tables and are subject to change. Sometimes names, addresses, company names, etc fall into this category. This is so you can change the value in exactly one place, not lots of places.
"Normalize for space": Do normalize things that would save a significant amount of overall space for the dataset. (This applies to gigabyte tables much more so than to kilobyte tables.)
Normalize, but don't "over-normalize". You will figure out what I mean when you have over-normalized and a nasty
JOINcan't be optimized.
If you would like further specific advice, let's see SHOW CREATE TABLE and sample values for any un-obvious columns.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rick James |
