How to remove space from number followed by unit or dimensions?
Here are the input strings:
string1 = 0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML
string2 = 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM
string3 = 0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm
These are the three strings I'm working on. First, I want to remove the space between a number and the unit/dimension/measurement (or %) that follows it, e.g. 10 ML => 10ML, but a result like 8290306544FLUSH is wrong. Second, if there is a 10-digit number, format it as 4 digits - 4 digits - 2 digits, e.g. 8290-3065-44. If there is a 9-digit number, add a zero at the front first and then apply the same format, e.g. 290306544 => 0290306544 => 0290-3065-44.
I want output like
string1 = 0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
string2 = 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
string3 = 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
How can I write a Python function for this?
Solution 1:[1]
This code may help you.
# pip install quantities
from quantities import units

string1 = '0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML'
string2 = '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM'
string3 = '0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm'

def number_formater(num):
    num = list(num)
    num.insert(4, '-')
    num.insert(9, '-')
    return ''.join(num)  # return the formatted number with dashes ('-')

def string_formater(string):
    # lower-cased symbols of all units defined by the quantities package
    unit_symbols = {u.symbol.lower() for u in vars(units).values() if isinstance(u, type(units.deg))}
    tokens = string.strip().split(' ')  # strip removes surrounding spaces, split makes a list
    merged = []
    for token in tokens:
        # if the token is a unit, join it to the previous value: '10', 'cm' becomes '10cm'
        # (a new list is built so the list is not modified while iterating over it)
        if merged and token.lower() in unit_symbols:
            merged[-1] += token
        else:
            merged.append(token)
    for index, token in enumerate(merged):
        if token.isdigit() and len(token) in (9, 10):
            # a 9-digit number gets a leading zero before the dashes are inserted
            merged[index] = number_formater(token.zfill(10))
    return ' '.join(merged)

print(string_formater(string1))  # intended output: 0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
print(string_formater(string2))  # intended output: 0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
print(string_formater(string3))  # intended output: 0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
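As a hedged aside, the padding-and-dash step on its own can be sketched with str.zfill and slicing instead of list insertion. The helper name format_code is hypothetical (it is not part of the answer above), and it assumes that only all-digit tokens of length 9 or 10 should be reformatted.

```python
def format_code(token):
    """Format a 9- or 10-digit string as 4-4-2, left-padding with a zero.

    Hypothetical helper for illustration; any other token is returned unchanged.
    """
    if token.isdigit() and len(token) in (9, 10):
        padded = token.zfill(10)  # '290306544' -> '0290306544'
        return f"{padded[:4]}-{padded[4:8]}-{padded[8:]}"
    return token


print(format_code("8290306544"))  # 8290-3065-44
print(format_code("290306544"))   # 0290-3065-44
print(format_code("SYRINGE"))     # SYRINGE
```

Slicing keeps the 4-4-2 layout visible in one line, which makes it easier to spot an off-by-one in the group boundaries.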
Solution 2:[2]
One other way:
import re

string1 = '0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML'
string2 = '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM'
string3 = '0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm'

def repl(x):
    s = x.group(1)
    if s is not None:
        # 9- or 10-digit number: pad to 10 digits, then format as 4-4-2
        t = '0' + s if len(s) == 9 else s
        return f'{t[:4]}-{t[4:8]}-{t[8:]}'
    # otherwise group 2 matched (number, space, unit), so drop the space
    return x.group(2).replace(' ', '')

def my_fun(string):
    # the \b after the letters stops the unit part from matching the start
    # of a longer word (e.g. '44 FL' in '44 FLUSH')
    return re.sub(r'(\b\d{9,10}\b)|(\d{1,3} (?:%|[a-zA-Z]{1,2}\b))', repl, string)

my_fun(string1)
Out[]: '0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML'
my_fun(string2)
Out[]: '0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM'
my_fun(string3)
Out[]: '0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm'
Solution 3:[3]
You could use a specific pattern to capture either 9 or 10 digits with capture groups, or match digits followed by a percentage sign or units.
Then you can make use of re.sub with a callback function that checks for the existence of the capture groups. If they are there, return the number formatted with the hyphens, else remove the whitespace chars from the match.
(?i)\b(\d{3,4})(\d{4})(\d{2})\b|\b\d+\s+(?:M[ML]|cm|%)
Explanation
(?i) Inline modifier for a case-insensitive match
\b(\d{3,4}) A word boundary to prevent a partial word match, then capture 3-4 digits in group 1
(\d{4})(\d{2}) Capture 4 digits in group 2 and 2 digits in group 3
\b A word boundary
| Or
\b\d+ A word boundary, then match 1+ digits
\s+(?:M[ML]|cm|%) Match 1+ whitespace chars followed by either a unit or a percentage sign (you can extend the alternation of the units with the ones you want to allow)
Example code
import re

pattern = r"(?i)\b(\d{3,4})(\d{4})(\d{2})\b|\b\d+\s+(?:M[ML]|cm|%)"
s = ("0.9% SODIUM CHLORIDE 8290306544 FLUSH 0.9 % SYRINGE 10 ML\n"
     "0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9 % SYRINGE 10 MM\n"
     "0.9% SODIUM CHLORIDE 290306544 FLUSH 0.9 % SYRINGE 10 cm\n")

def replacement(m):
    if m.group(1):
        # 9 or 10 digits matched: join the three groups as 4-4-2,
        # adding a leading zero when group 1 has only 3 digits
        nrs = "-".join(m.groups())
        return "0" + nrs if len(m.group(1)) == 3 else nrs
    # number followed by a unit or %: remove the whitespace in between
    return re.sub(r"\s+", "", m.group())

print(re.sub(pattern, replacement, s))
Output
0.9% SODIUM CHLORIDE 8290-3065-44 FLUSH 0.9% SYRINGE 10ML
0.9% SODIUM CHLORIDE 8290-3071-44 FLUSH 0.9% SYRINGE 10MM
0.9% SODIUM CHLORIDE 0290-3065-44 FLUSH 0.9% SYRINGE 10cm
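The note about extending the unit alternation can be sketched by building that part of the pattern from a list. This is only an illustration under stated assumptions: the UNITS list, the unit_pattern name and the join_units helper are hypothetical, and the (?!\w) lookahead is an addition here to keep a short unit such as L from matching the start of a longer word.

```python
import re

# Assumed unit list for illustration; extend it with the units you need.
UNITS = ["ML", "MM", "CM", "MG", "L", "%"]

# Longest units first so that e.g. "ML" is tried before "L".
unit_alternation = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)

# (?!\w) rejects a unit that is only the start of a longer word ("L" in "LARGE").
unit_pattern = re.compile(
    rf"\b(\d+)\s+({unit_alternation})(?!\w)", re.IGNORECASE
)


def join_units(text):
    """Remove the whitespace between a number and a following unit."""
    return unit_pattern.sub(r"\1\2", text)


print(join_units("FLUSH 0.9 % SYRINGE 10 ML"))  # FLUSH 0.9% SYRINGE 10ML
print(join_units("8290-3071-44 FLUSH 10 cm"))   # 8290-3071-44 FLUSH 10cm
print(join_units("PACK 2 LARGE"))               # PACK 2 LARGE
```

Keeping the units in a list means a new unit is a one-line change, and re.escape guards against units that contain regex metacharacters.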
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | The fourth bird |
