How do I add multiple characters using sed? [duplicate]

I have files containing about 6 million lines, with dates and times formatted like this:

20211231233710;20211231233713;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse

How can I make it so that the dates are formatted in this way?

2021-12-05 18:05:03;2021-12-05 18:05:04;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse

Each line starts in this way. I have tried something like

sed -e 's/\(....\)\(..\)\(..\)\(..\)\(..\)\(..\);\(.*\)/\1-\2-\3 \4:\5:\6;\7/' -e 's/\(.*\);\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1;\2-\3-\4 \5:\6:\7/'

and

sed -E 's/(....)(..)(..)(..)(..)(..)(;?)/\1-\2-\3 \4:\5:\6\7/g'

but both of these methods change the text past the first 2 fields. TIA



Solution 1:[1]

There are multiple ways to approach the problem with sed, but all of the best ones will involve using s commands with capturing groups in the pattern and back references in the replacement. However, any solution based on that tool will need to work around the problem that each line of data contains too many separate fields (12, six per date code) to associate all of them with separate back references in a single s command, because sed provides only nine back references (\1 through \9). One fairly simple way to accommodate that is to split the work across two s commands, one for the first date code and one for the second. For example:

sed -e 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\);\(.*\)/\1-\2-\3 \4:\5:\6;\7/' \
    -e 's/^\([^;]*\);\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1;\2-\3-\4 \5:\6:\7/' \
    < input > output

The first sed expression matches each of the first 14 characters with its own . wildcard, capturing them in six separate groups as specified by the escaped parentheses \( ... \), and captures everything after the following semicolon, starting with the second date code, with one additional capturing group matching the tail of the line. It then replaces everything matched (that is, the whole line) with an intermediate form in which the first date code is formatted but the second is not, so that the line begins something like so:

2021-12-05 18:05:03;20211205180504

In the replacement part of that s command, \1 represents the data captured by the first group, \2 represents that captured by the second group, and so on.
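To make the numbering of back references concrete, here is a minimal sketch that formats only the date half of a date code. The function name format_date and the sample value are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# format_date is a hypothetical helper used only for illustration.
# It takes an 8-character date such as 20211231 as its first argument.
format_date() {
    # Group 1 captures four characters (the year); groups 2 and 3
    # capture two characters each (month and day). The replacement
    # reuses them as \1, \2 and \3 with hyphens in between.
    printf '%s\n' "$1" | sed 's/\(....\)\(..\)\(..\)/\1-\2-\3/'
}

format_date 20211231
```

Run as written, this prints 2021-12-31. The groups are numbered by the position of their opening parenthesis, left to right.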

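The second sed expression works the same way on the second date code: its first group captures what precedes that date code, and the remaining six groups capture the pieces of the date code itself, producing the wanted overall formatting as the end result.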
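Since the lines in the question carry further fields after the two date codes, it may help to tie each expression to the start of the line so that nothing past the second field can match. The following is a sketch under that assumption rather than the answer's own command: it uses sed -E (extended regular expressions, as in the question's second attempt), assumes the date codes consist of exactly 14 digits, and wraps the command in a hypothetical function named format_line.

```shell
#!/bin/sh
# format_line is a hypothetical wrapper; it filters stdin to stdout.
format_line() {
    # Expression 1: the 14 digits at the very start of the line.
    # Expression 2: group 1 is the first field plus its semicolon
    # ([^;]* cannot cross a field boundary); groups 2 to 7 are the
    # 14 digits of the second field.
    sed -E \
        -e 's/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2});/\1-\2-\3 \4:\5:\6;/' \
        -e 's/^([^;]*;)([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2});/\1\2-\3-\4 \5:\6:\7;/'
}

printf '%s\n' '20211231233710;20211231233713;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse' | format_line
```

On the sample line from the question this prints 2021-12-31 23:37:10;2021-12-31 23:37:13;SomeHEXID;SomeIDNumber;Whatever;0;SomethingElse, leaving the remaining fields as they were. Lines that do not begin with two 14-digit fields pass through unchanged. For the real data the function would be used as format_line < input > output.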

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
