How can I replace certain characters (and not others) between delimiters and across newlines, using Regex in PowerShell?
Here is a sample:
: [
{
"yearGroupId": 13,
"educationPhaseEnum": 2,
"name": "Year Group 12",
"label": "YG 12"
},
{
"yearGroupId": 14,
"educationPhaseEnum": 2,
"name": "Year Group 13",
"label": "YG 13"
}
]
I want to remove the line breaks, and all quotes. I only want to do this between the strings ': [' and ' ]'. So the desired output would look like this:
[ { yearGroupId: 13, educationPhaseEnum: 2, name: Year Group 12, label: YG 12 }, { yearGroupId: 14, educationPhaseEnum: 2, name: Year Group 13, label: YG 13 } ]
I've tried Powershell -NoProfile "(Get-Content -Raw .\allacts.txt) -replace '(?<=\u003a\u0020\u005b).*[\n\r\u0022].*(?=\u0020\u0020\u0020\u0020\u005d)', '' | Out-File -FilePath allacts.txt -Force -Encoding ASCII"
and about a hundred other things, but I can't get my head around how it's meant to work. What do I have to do to get PowerShell to replace these characters within these bounds? Elsewhere in the file I need to keep the line breaks.
Thanks.
Edit: Yep, this is JSON data. The issue is that there are duplicate keys (I can't change that). Converting it to a CSV results in PowerShell ignoring duplicate keys and picking one of them to go into the output CSV. Directly importing the JSON into Excel (where I need it to go) results in Excel rejecting it, as it can't handle duplicate keys.
So, I decided to just glom everything into one value and use Power Query to sort it out at the other end (using the commas as delimiters).
Solution 1:[1]
You can use either of the following two regex replacements:
(Get-Content -Raw .\allacts.txt) -replace '(?s)(?<=: \[.*?)[\r\n"](?=.*? ])' | Out-File -FilePath allacts.txt -Force -Encoding ASCII
See this regex demo. Details:
(?s) - the RegexOptions.Singleline option, which lets . match any character, including newline characters
(?<=: \[.*?) - a positive lookbehind that matches a location preceded by the string : [ and then any zero or more characters, as few as possible
[\r\n"] - a CR, LF, or " character
(?=.*? ]) - a positive lookahead that requires a space followed by ] somewhere to the right of the current location, after any zero or more characters, as few as possible
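To see the effect without touching a file, the same pattern can be tried on an in-memory string. This is a minimal sketch: the sample text in $text is made up for illustration and merely stands in for the contents of allacts.txt.

```powershell
# Hypothetical sample standing in for the file contents (CRLF line endings).
$text = "items: [`r`n  {`r`n    `"a`": 1`r`n  }`r`n ]`r`nkeep `"this`"`r`n"

# Remove every CR, LF and double quote that has ': [' somewhere before it
# and ' ]' somewhere after it. With no replacement operand, -replace deletes.
$pattern = '(?s)(?<=: \[.*?)[\r\n"](?=.*? ])'
$result = $text -replace $pattern

# The bracketed part collapses onto one line; the text after ' ]' keeps
# its line breaks and quotes because no ' ]' follows it.
$result
```

Note that only line breaks and quotes are removed. The indentation inside the block stays, so getting the single-spaced output shown in the question would need an extra whitespace cleanup step.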
Or, if the input contains strings like : [.."...".: [ ... ] and you want to remove the characters only between the closest : [ and ], you will need to use
(Get-Content -Raw .\allacts.txt) -replace '(?s)(?<=: \[(?:(?!: \[).)*?)[\r\n"](?=.*? ])' | Out-File -FilePath allacts.txt -Force -Encoding ASCII
See this regex demo (see Context tab). Details:
(?s) - the RegexOptions.Singleline option, which lets . match any character, including newline characters
(?<=: \[(?:(?!: \[).)*?) - a positive lookbehind that matches a location preceded by:
    : \[ - the string : [
    (?:(?!: \[).)*? - any character, zero or more times but as few as possible, that does not start a : [ character sequence
[\r\n"] - a CR, LF, or " character
(?=.*? ]) - a positive lookahead that requires a space followed by ] somewhere to the right of the current location, after any zero or more characters, as few as possible
Matches are removed here.
Or,
(Get-Content -Raw .\allacts.txt) -replace '(?s)(\G(?!^)|: \[)(.*?)[\r\n"](?=.*? ])', '$1$2' | Out-File -FilePath allacts.txt -Force -Encoding ASCII
or
(Get-Content -Raw .\allacts.txt) -replace '(?s)(\G(?!^)|: \[)((?:(?!: \[).)*?)[\r\n"](?=.*? ])', '$1$2' | Out-File -FilePath allacts.txt -Force -Encoding ASCII
See this regex demo (do not forget to click Context tab there). Here
(?s) - . now matches any character, including newline characters
(\G(?!^)|: \[) - Group 1 ($1): the end of the previous match, or the string : [
((?:(?!: \[).)*?) - Group 2 ($2): any character, zero or more times but as few as possible, that does not start a : [ character sequence
[\r\n"] - a CR, LF, or " character
(?=.*? ]) - a check that a space followed by ] occurs somewhere to the right
In this case, matches are replaced with Group 1 + Group 2 values.
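As a quick check that the \G-based version behaves like the lookbehind version on simple input, both can be run against the same string. This is only a sketch: the sample in $text is invented for illustration and contains a single bracketed block.

```powershell
# Hypothetical sample with one ': [' ... ' ]' block and some trailing text.
$text = "data: [`r`n  `"x`": 1,`r`n  `"y`": 2`r`n ]`r`n`"tail`"`r`n"

$lookbehind = '(?s)(?<=: \[.*?)[\r\n"](?=.*? ])'
$chained    = '(?s)(\G(?!^)|: \[)(.*?)[\r\n"](?=.*? ])'

# The lookbehind version deletes each matched character outright.
$viaLookbehind = $text -replace $lookbehind

# The \G version consumes the text leading up to each unwanted character,
# so that text has to be put back through $1$2.
$viaChained = $text -replace $chained, '$1$2'

$viaChained
$viaLookbehind -ceq $viaChained
```

The single quotes around '$1$2' matter: in a double-quoted string PowerShell would try to expand $1 and $2 as variables before the regex engine sees them.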
Replace literal spaces with \s* (zero or more whitespaces) or \s+ (one or more whitespaces) in the pattern if you mean to match any (amount of) whitespaces.
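If the file holds several such blocks, a different technique may be easier to reason about: match each whole : [ ... ] block and clean it inside a match evaluator. This is not one of the patterns above but a swapped-in alternative, sketched under the assumption that blocks are not nested and that every block ends with a space followed by ]. The sample text and the variable names are hypothetical. The lookarounds only check that : [ occurs somewhere earlier and space + ] somewhere later, so text lying between two blocks can qualify as well, which the comparison at the end illustrates for the first pattern.

```powershell
# Hypothetical sample with two blocks and ordinary text between them.
$text = "a: [`r`n  `"x`": 1`r`n ]`r`nbetween `"blocks`"`r`nb: [`r`n  `"y`": 2`r`n ]`r`n"

# Lazily match one block at a time, from ': [' to the nearest ' ]'.
$blockPattern = '(?s): \[.*? \]'

# For each block: drop double quotes, then collapse every whitespace run
# (including line breaks and indentation) into a single space.
$cleanBlock = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    ($m.Value -replace '"') -replace '\s+', ' '
}

$perBlock = [regex]::Replace($text, $blockPattern, $cleanBlock)
$perBlock

# For comparison: the first lookaround pattern also strips the line breaks
# and quotes located between the two blocks.
$lookaround = $text -replace '(?s)(?<=: \[.*?)[\r\n"](?=.*? ])'
$lookaround
```

Collapsing whitespace also shortens runs of spaces inside values, so this step should be dropped or narrowed if such runs are significant in the data.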
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
