Need to extract KB numbers from a text file using regular expressions in PowerShell or Python
I need to extract the KB numbers from the text below using regular expressions in PowerShell or Python.
Example input to evaluate:
"KB4565628 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565628 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565628 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed
Solution 1:[1]
Use Select-String:
$strings = @(
"KB4565628 is not installed
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565628 is not installed
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565628 is not installed
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565588 or KB4565635 is not installed
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565588 or KB4565635 is not installed
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565588 or KB4565635 is not installed
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
)
$KBIDs = $strings |Select-String 'KB\d{5,}' -AllMatches |ForEach-Object Matches |ForEach-Object Value
The regular expression pattern KB\d{5,} describes a string consisting of the literal characters K and B, followed by 5 or more digits.
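For comparison, here is a minimal Python sketch of the same KB\d{5,} pattern. Python's re module is case-sensitive by default, while Select-String is not, so re.IGNORECASE is passed here to mirror the PowerShell behavior; drop it if only an upper-case "KB" should match. The sample text is abbreviated from the question, and the "KB123" entry is made up purely to show what the {5,} quantifier rejects.

```python
import re

# Literal "KB" followed by five or more digits, as in the Select-String example.
# IGNORECASE mirrors Select-String, which matches case-insensitively by default.
KB_PATTERN = re.compile(r"KB\d{5,}", re.IGNORECASE)

sample = (
    "KB4565628 is not installed "
    "KB4565588 or KB4565635 is not installed "
    "KB123 is too short to match "  # made-up entry: only three digits
    r"%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0"
)

kb_ids = KB_PATTERN.findall(sample)
print(kb_ids)  # ['KB4565628', 'KB4565588', 'KB4565635']
```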
As a result, $KBIDs will now contain the KBXXXXXX identifiers from the input strings.
To remove duplicates, use Sort-Object -Unique:
$UniqueKBIDs = $KBIDs |Sort-Object -Unique
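If you go the Python route instead, duplicates can be removed the same way. This sketch, assuming the IDs are consistently upper-case as in the sample, shows two options: a sorted unique list, comparable to Sort-Object -Unique, and a unique list that keeps the order in which the IDs first appeared.

```python
import re

# Abbreviated from the question's input; each record repeats several times.
text = (
    '"KB4565628 is not installed" "KB4565628 is not installed" '
    '"KB4565588 or KB4565635 is not installed" '
    '"KB4565588 or KB4565635 is not installed"'
)

kb_ids = re.findall(r"KB\d{5,}", text)

# Sorted and de-duplicated, comparable to Sort-Object -Unique.
unique_sorted = sorted(set(kb_ids))

# De-duplicated in first-seen order (dicts preserve insertion order).
unique_in_order = list(dict.fromkeys(kb_ids))

print(unique_sorted)    # ['KB4565588', 'KB4565628', 'KB4565635']
print(unique_in_order)  # ['KB4565628', 'KB4565588', 'KB4565635']
```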
Solution 2:[2]
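Here is a regex that matches every "KB" followed by its digits (tested on your provided text string): KB\d+
Using \d+ rather than \d* requires at least one digit after "KB", so a bare "KB" elsewhere in the text is not returned as a match.
Example:
import re
res = re.findall(r'KB\d+', '0#" "KB4565588 or KB4565635 is not installed...')
print(res)
# ['KB4565588', 'KB4565635']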
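Since the question is about a text file rather than a string literal, here is a minimal sketch of applying the same kind of pattern to a file's contents. The function name extract_kb_ids and the file name scan_results.txt are hypothetical, and the file is assumed to be UTF-8; if yours uses a different encoding, change the encoding argument.

```python
import re
import tempfile
from pathlib import Path

# \d+ requires at least one digit, so a bare "KB" is not matched.
KB_PATTERN = re.compile(r"KB\d+")


def extract_kb_ids(path):
    """Return every KB identifier found in the text file at `path` (hypothetical helper)."""
    # Assumes UTF-8; errors="replace" substitutes undecodable bytes instead of raising.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return KB_PATTERN.findall(text)


# Demo: write a small sample to a temporary file, then scan it.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = Path(tmp) / "scan_results.txt"  # hypothetical file name
    sample_file.write_text(
        '"KB4565628 is not installed"\n"KB4565588 or KB4565635 is not installed"\n',
        encoding="utf-8",
    )
    found = extract_kb_ids(sample_file)

print(found)  # ['KB4565628', 'KB4565588', 'KB4565635']
```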
If you want only the numbers without the "KB" prefix, you could do this:
filtered_res = [elem[2:] for elem in res]
print(filtered_res)
# ['4565588', '4565635']
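An alternative to slicing off the first two characters is a capturing group: when a pattern contains exactly one group, re.findall returns only the text captured by that group. A short sketch on the same sample string:

```python
import re

text = '0#" "KB4565588 or KB4565635 is not installed...'

# One capturing group around the digits: findall returns just the group,
# so the "KB" prefix is dropped without any slicing.
numbers = re.findall(r"KB(\d+)", text)
print(numbers)  # ['4565588', '4565635']

# Optional: convert to int if the IDs will be compared numerically.
as_ints = [int(n) for n in numbers]
print(as_ints)  # [4565588, 4565635]
```

This keeps the "what to match" and "what to return" decisions in the pattern itself, so the code does not depend on the prefix being exactly two characters long.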
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mathias R. Jessen |
| Solution 2 | LUKASANUKVARI |
