Splitting string in multiple places Python
I am trying to get a set of numbers out of a string. The numbers are nestled between characters.
Here is an example: NC123456Sarah Von Winkle
- NC is the only part of the string that is guaranteed
- 123456 is the number I want to extract
- Sarah Von Winkle is the name; it can be anything
So I cannot just split at 'S' and 'C' to try and grab the digits.
Code
Nothing tried so far.
Problem
I have no idea how to approach this.
How can I split the string to get only the digits in the middle?
Solution 1:[1]
You can use Regex for this:
import re
s='NC123456Sarah Von Winkle'
m=''.join(re.findall(r'NC(\d+).*',s))
print(int(m))
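The join works because re.findall returns only the content of the capturing group when the pattern has exactly one group. A minimal sketch of that behaviour, where the last input (without the NC prefix) is a hypothetical example and not from the question:

```python
import re

s = 'NC123456Sarah Von Winkle'

# No capturing group: findall returns the whole matched text.
whole_match = re.findall(r'NC\d+', s)
print(whole_match)  # ['NC123456']

# One capturing group: findall returns only the group's content.
group_only = re.findall(r'NC(\d+)', s)
print(group_only)  # ['123456']

# Hypothetical input without the NC prefix: the list is empty,
# so ''.join(...) would give '' and int('') would raise ValueError.
no_match = re.findall(r'NC(\d+)', 'Sarah Von Winkle')
print(no_match)  # []
```

So the int(m) call above is only safe when the string is known to start with NC followed by digits.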
Solution 2:[2]
You can try re, which is part of Python's standard library.
import re
sample_string = "NC123456Sarah Von Winkle"
result_digits = re.findall(r"\d+", sample_string, flags=0)
Then your result should be ['123456']. If you want just an integer instead of a string, you can convert it with int(result_digits[0]).
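Keep in mind that \d+ on its own matches every run of digits in the string, not only the one after NC. A minimal sketch, assuming a hypothetical input whose name part also contains a digit:

```python
import re

# Hypothetical input: the name part happens to contain a digit as well.
sample_with_digit_name = "NC123456Sarah Von Winkle 3rd"

# findall returns every run of digits, in order of appearance.
all_digit_runs = re.findall(r"\d+", sample_with_digit_name)
print(all_digit_runs)  # ['123456', '3']

# The number after the guaranteed "NC" prefix is the first run.
first_number = int(all_digit_runs[0])
print(first_number)  # 123456
```

Taking index 0 still gives the wanted number here, because the NC prefix itself contains no digits.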
Solution 3:[3]
Use the regex module re:
import re
s = "NC123456Sarah Von Winkle"
t = re.findall("[0-9]+",s)
print(t)
This will give:
['123456']
The regular-expression (pattern) is composed of:
- character-range [0-9] will find all occurrences of any digit from 0 to 9 in the string
- quantifier + indicates that we are searching for at least one occurrence of the preceding pattern (here [0-9]).
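To see what the quantifier contributes, here is a small sketch comparing the pattern with and without +, using the same input string:

```python
import re

s = "NC123456Sarah Von Winkle"

# Without the quantifier, each digit is a separate match.
single_digits = re.findall("[0-9]", s)
print(single_digits)  # ['1', '2', '3', '4', '5', '6']

# With +, consecutive digits are grouped into one match.
digit_runs = re.findall("[0-9]+", s)
print(digit_runs)  # ['123456']
```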
Solution 4:[4]
To match and capture (= extract) the number, you can use a regular-expression.
TL;DR: I would recommend re.match(r'NC(\d+)', s).group(1) (details in the last section).
Regex to match a number
To match a number with a minimum length of 1 digit, use the regular-expression (pattern) \d+ for one or more digits, optionally inside a capturing-group as (\d+), where:
- \d is a character class (meta-character) for digits (of range 0-9)
- + is a quantifier matching if at least one occurrence of the preceding pattern was found
- ( and ) form a capturing-group of the enclosed sub-regex
Test your regex on regex101 or regexplanet and choose the right flavor/language/engine (here: Python).
In Python use the built-in regex module re. Define the regex as raw-string like r'\d+'.
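A short sketch of why the raw-string form is convenient: it lets you write the backslash once instead of escaping it. The variable names here are only illustrative.

```python
import re

# In a raw string the backslash is kept as-is, so the regex engine
# receives the three characters \, d and +.
raw_pattern = r'\d+'

# The same pattern as a normal string needs the backslash escaped by hand.
escaped_pattern = '\\d+'

print(raw_pattern == escaped_pattern)  # True
print(len(raw_pattern))                # 3

print(re.findall(raw_pattern, 'NC123456Sarah Von Winkle'))  # ['123456']
```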
Find to extract only the number or empty list
Either use the function re.findall to find a list of occurrences:
import re
s = 'NC123456Sarah Von Winkle'
pattern = r'\d+'
occurrences = re.findall(pattern, s)
print(occurrences)
Prints:
['123456']
The first number occurrences[0] is yours if not empty:
if len(occurrences) == 0:
    print('no number found in: ' + s)
else:
    number = occurrences[0]
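The same check can be wrapped in a small helper. This is only a sketch: the function name first_number and the choice to return None for "not found" are assumptions, not part of the original answer.

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if absent."""
    occurrences = re.findall(r'\d+', text)
    if not occurrences:
        return None
    return int(occurrences[0])

print(first_number('NC123456Sarah Von Winkle'))  # 123456
print(first_number('NCSarah Von Winkle'))        # None
```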
Split to get all parts
Or use the function re.split to split the string into parts:
import re
s = 'NC123456Sarah Von Winkle'
pattern = r'(\d+)'
parts = re.split(pattern, s)
print(parts)
Prints:
['NC', '123456', 'Sarah Von Winkle']
Note: without the capture-group (i.e. without parentheses ()) the output would be just: ['NC', 'Sarah Von Winkle'] (excluding the splitter-pattern)
Here you get the number as the second part, parts[1], as long as a non-numeric prefix like "NC" is guaranteed and followed by a number.
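If the name itself may contain digits, every run of digits acts as a splitter. A minimal sketch, assuming a hypothetical input with a digit in the name, that uses the maxsplit argument of re.split to stop after the first number:

```python
import re

# Hypothetical input whose name part contains a digit.
s = 'NC123456Sarah Von Winkle 3rd'

# Without a limit, every run of digits splits the string.
all_parts = re.split(r'(\d+)', s)
print(all_parts)  # ['NC', '123456', 'Sarah Von Winkle ', '3', 'rd']

# maxsplit=1 stops after the first number and keeps the rest intact.
prefix, number, name = re.split(r'(\d+)', s, maxsplit=1)
print(prefix, number, name, sep=' | ')  # NC | 123456 | Sarah Von Winkle 3rd
```

The three-way unpacking assumes the string contains at least one number; without one, re.split returns a single-element list and the unpacking raises ValueError.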
Extract with a capturing-group
Use the group method of the match object together with a regex containing a capturing-group:
import re
s = 'NC123456Sarah Von Winkle'
capture_number_pattern = re.compile(r'NC(\d+)')
extracted = capture_number_pattern.match(s).group(1)
print(extracted)
Prints:
123456
Note: re.compile returns a compiled pattern. This can improve performance when the pattern is reused multiple times, and it can make the code more readable.
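As a sketch of that reuse, assuming a few hypothetical inputs of the same shape (only the first one is from the question):

```python
import re

# Compile once, then reuse the pattern object for every input.
capture_number_pattern = re.compile(r'NC(\d+)')

# Hypothetical sample inputs; only the first one appears in the question.
samples = ['NC123456Sarah Von Winkle', 'NC42John Doe', 'NC7Jane Roe']

numbers = []
for sample in samples:
    match = capture_number_pattern.match(sample)
    # match is None when the string does not start with NC plus digits.
    numbers.append(match.group(1) if match else None)

print(numbers)  # ['123456', '42', '7']
```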
Pay attention: to make your matching robust and defensive, test whether there is a match. Otherwise an error is raised at runtime, as this Python shell session shows:
>>> extracted = capture_number_pattern.match('NCHelloWorld2022').group(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
You can test if a match was found or fail-fast if match is None:
s = 'NCHelloWorld2022'
match = capture_number_pattern.match(s)
if not match:
    print('No number found in:' + s)
else:
    print(match.group(1))
Prints:
No number found in:NCHelloWorld2022
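For the fail-fast variant, one option is to raise an exception instead of printing. A minimal sketch, where the helper name extract_number and the choice of ValueError are assumptions for illustration:

```python
import re

capture_number_pattern = re.compile(r'NC(\d+)')

def extract_number(text):
    """Return the digits after the NC prefix; raise ValueError if missing."""
    match = capture_number_pattern.match(text)
    if match is None:
        raise ValueError('No number found in: ' + text)
    return match.group(1)

print(extract_number('NC123456Sarah Von Winkle'))  # 123456

try:
    extract_number('NCHelloWorld2022')
except ValueError as error:
    print(error)  # No number found in: NCHelloWorld2022
```

Raising early keeps the None case from surfacing later as an AttributeError far from where the bad input entered.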
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wasif |
| Solution 2 | Dharman |
| Solution 3 | hc_dev |
| Solution 4 | |
