Replace Comma Outside Double Quote - Python - Regex
I want to open a CSV file using open() and read it line by line. For various reasons, I'm not using Pandas.
I want to replace each comma , with _XXX_, but I want to avoid replacing commas inside double quotes " because those commas are not separators. That means I can't simply use:
string_ = string_.replace(',', '_XXX_')
How can I do this? Use a regex, maybe?
I've found "replace comma inside quotation" and "Python regex: find and replace commas between quotation marks", but I need to replace commas OUTSIDE the quotation marks.
Solution 1:[1]
You may use re.sub with a simple "[^"]*" regex (or (?s)"[^"\\]*(?:\\.[^"\\]*)*" if you also need to handle escape sequences between the double quotes) to match strings between double quotes. Capture that pattern into Group 1, and match a comma in all other contexts with a second alternative. Then pass a callable as the replacement argument: it receives the match object, returns Group 1 unchanged when a quoted string matched, and otherwise returns whatever should replace the comma.
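import re

# Simple case: quoted strings contain no escaped quotes.
# Group 1 (a quoted string) is returned as is; a bare comma is replaced
# with "" here (change the second argument of replace() to use another string).
print(re.sub(r'("[^"]*")|,',
             lambda x: x.group(1) if x.group(1) else x.group().replace(",", ""),
             '1,2,"test,3,7","4, 5,6, ... "'))
# => 12"test,3,7""4, 5,6, ... "

# Quoted strings may contain backslash-escaped characters such as \".
print(re.sub(r'(?s)("[^"\\]*(?:\\.[^"\\]*)*")|,',
             lambda x: x.group(1) if x.group(1) else x.group().replace(",", ""),
             r'1,2,"test, \"a,b,c\" ,03","4, 5,6, ... "'))
# => 12"test, \"a,b,c\" ,03""4, 5,6, ... "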
Regex details
("[^"]*")|, consists of:
- ("[^"]*") - Capturing group 1: a ", then any 0 or more chars other than ", and then a "
- | - or
- , - a comma
The other one, (?s)("[^"\\]*(?:\\.[^"\\]*)*")|, consists of:
- (?s) - the inline version of the re.S / re.DOTALL flag
- ("[^"\\]*(?:\\.[^"\\]*)*") - Group 1: a ", then any 0 or more chars other than " and \, then 0 or more sequences of a \ and any one char followed by 0 or more chars other than " and \, and then a "
- | - or
- , - a comma
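The examples above replace the comma with an empty string, while the question asks for _XXX_. A minimal sketch of the same technique with that replacement could look like the following. The helper name replace_outside_quotes and the constant names are hypothetical and not part of the original answer; the two patterns are the ones described above.

```python
import re

# Both patterns come from the answer; the constant names are hypothetical.
SIMPLE = re.compile(r'("[^"]*")|,')
WITH_ESCAPES = re.compile(r'(?s)("[^"\\]*(?:\\.[^"\\]*)*")|,')


def replace_outside_quotes(line, replacement="_XXX_", pattern=SIMPLE):
    """Replace commas outside double quotes (hypothetical helper)."""
    def _swap(match):
        # Group 1 participated: the match is a quoted string, keep it as is.
        if match.group(1) is not None:
            return match.group(1)
        # Otherwise the match is a comma outside quotes.
        return replacement

    return pattern.sub(_swap, line)


result = replace_outside_quotes('1,2,"test,3,7","4, 5,6, ... "')
print(result)
# => 1_XXX_2_XXX_"test,3,7"_XXX_"4, 5,6, ... "

escaped = replace_outside_quotes(
    r'1,2,"test, \"a,b,c\" ,03","4, 5,6, ... "', pattern=WITH_ESCAPES
)
print(escaped)
# => 1_XXX_2_XXX_"test, \"a,b,c\" ,03"_XXX_"4, 5,6, ... "
```

When reading a file line by line with open(), the helper could be applied to each line in turn. Note that this sketch assumes a quoted string never spans more than one line.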
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
