How do I add multiple characters using sed? [duplicate]
I have files of about 6 million lines each, with dates and times formatted like this:
20211231233710;20211231233713;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse
How can I make it so that the dates are formatted in this way?
2021-12-05 18:05:03;2021-12-05 18:05:04;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse
Each line starts in this way.
I have tried something like
sed -e 's/\(....\)\(..\)\(..\)\(..\)\(..\)\(..\);\(.*\)/\1-\2-\3 \4:\5:\6;\7/' -e 's/\(.*\);\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1;\2-\3-\4 \5:\6:\7/'
and
sed -E 's/(....)(..)(..)(..)(..)(..)(;?)/\1-\2-\3 \4:\5:\6\7/g'
but both of these methods change the text past the first 2 fields.
TIA
Solution 1:[1]
There are multiple ways to approach the problem with sed, but all of the best ones will involve using s commands with capturing groups in the pattern and back references in the replacement. However, sed provides only nine back references (\1 through \9), and the two date codes together contain twelve separate fields, too many to associate with separate back references in a single s command. One fairly simple way to accommodate that is to split the work across two s commands, one for the first date code and one for the second. For example:
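# The second expression uses ^\([^;]*\); instead of a greedy \(.*\); so that it can only match the second field.
sed -e 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\);\(.*\)/\1-\2-\3 \4:\5:\6;\7/' \
-e 's/^\([^;]*\);\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1;\2-\3-\4 \5:\6:\7/' \
< input > output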
The first sed expression matches each of the first 14 characters with its own . wildcard, capturing them in six separate groups as specified by the escaped parentheses \( ... \), and captures everything after the first semicolon, including the still unformatted second date code, with one additional group matching the tail of the line. It then replaces everything matched (that is, the whole line) with an intermediate form in which the first date code is formatted but the second is not, something like so:
2021-12-05 18:05:03;20211205180504
In the replacement part of that s command, the \1 represents the data captured by the first group, the \2 represents that captured by the second group, and so on.
The second sed expression is similar, but it handles the second date code, producing the wanted overall formatting as the end result.
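To see the back references in isolation, here is a minimal sketch on a much shorter input. The function name dash_date is hypothetical and used only for illustration; it splits an 8-character date into three groups and joins them with dashes.

```shell
# Hypothetical helper: insert dashes into an 8-character date using
# three capturing groups and the back references \1, \2 and \3.
dash_date() {
    sed 's/^\(....\)\(..\)\(..\)$/\1-\2-\3/'
}

echo 20211231 | dash_date   # prints 2021-12-31
```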
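As a quick check, the two-step approach can be tried on a single sample line before running it over the full files. This is only a sketch: the wrapper name reformat_dates is hypothetical, and the second expression here is anchored with ^\([^;]*\); so that it can only touch the second field. With a greedy \(.*\); at that position, the match could slide to a later semicolon and alter the trailing fields.

```shell
# Hypothetical wrapper around a two-step substitution.
reformat_dates() {
    # Step 1: format the first 14 characters; \7 keeps the rest of the line.
    # Step 2: skip the first field ([^;]* cannot cross a semicolon),
    #         then format the 14 characters of the second field.
    sed -e 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\);\(.*\)/\1-\2-\3 \4:\5:\6;\7/' \
        -e 's/^\([^;]*\);\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1;\2-\3-\4 \5:\6:\7/'
}

printf '%s\n' '20211231233710;20211231233713;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse' |
    reformat_dates
# prints 2021-12-31 23:37:10;2021-12-31 23:37:13;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse
```

The remaining fields pass through unchanged because neither expression can match anything beyond the second semicolon-separated field.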
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
