Need to extract KBs off a text file using regular expressions in PowerShell or Python

Example input to evaluate:

I need to extract the KB numbers from the text below, using regular expressions in PowerShell or Python.

"KB4565628 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565628 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565628 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed %windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0 %windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#" "KB4565588 or KB4565635 is not installed



Solution 1:[1]

Use Select-String:

$strings = @(
"KB4565628 is not installed 
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565628 is not installed 
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565628 is not installed 
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565588 or KB4565635 is not installed 
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565588 or KB4565635 is not installed 
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
"KB4565588 or KB4565635 is not installed 
%windir%\Microsoft.NET\Framework64\v4.0.30319\System.Data.dll Version is 4.8.3761.0
%windir%\Microsoft.NET\Framework\v4.0.30319\System.Data.dll Version is 4.8.3761.0#"
)

$KBIDs = $strings |Select-String 'KB\d{5,}' -AllMatches |ForEach-Object Matches |ForEach-Object Value

The regular expression pattern KB\d{5,} describes a string consisting of the literal characters K and B, followed by 5 or more digits.

As a result, $KBIDs will now contain the KBXXXXXX identifiers from the input strings.

To remove duplicates, use Sort-Object -Unique:

$UniqueKBIDs = $KBIDs |Sort-Object -Unique
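Since the question also allows Python, the same two steps (match, then de-duplicate) can be sketched there as well. This is only an illustrative sketch: the helper name extract_unique_kbs and the shortened sample string are assumptions for the example, not part of the original answer.

```python
import re

# Same pattern as the PowerShell example: "KB" followed by 5 or more digits.
KB_PATTERN = re.compile(r"KB\d{5,}")


def extract_unique_kbs(text):
    """Return the distinct KB identifiers found in text, sorted (hypothetical helper)."""
    return sorted(set(KB_PATTERN.findall(text)))


# Shortened stand-in for the input from the question.
sample = (
    '"KB4565628 is not installed ..." '
    '"KB4565588 or KB4565635 is not installed ..." '
    '"KB4565628 is not installed ..."'
)

unique_kbs = extract_unique_kbs(sample)
print(unique_kbs)  # ['KB4565588', 'KB4565628', 'KB4565635']
```

To run this against a file, the file's contents could be read into a string first and passed to the same function.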

Solution 2:[2]

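Here is a regex that matches every KB followed by digits (tested on the text string you provided): KB\d+

Using \d+ rather than \d* requires at least one digit after KB, so a bare "KB" (as in "200KB") is not matched.

Example:

import re


res = re.findall(r'KB\d+', '0#" "KB4565588 or KB4565635 is not installed...')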

print(res)

>>> ['KB4565588', 'KB4565635']

So, if you would like to get only numbers without "KB", then you could do this:

filtered_res = [elem[2:] for elem in res]

print(filtered_res)

>>> ['4565588', '4565635']
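As an alternative to slicing off the prefix afterwards, a capture group can return the digits directly. The sketch below is one possible variant rather than the answer's own method; it relies on the documented behaviour that re.findall returns the group contents when the pattern contains exactly one group, and it uses \d+ so that at least one digit is required.

```python
import re

text = '0#" "KB4565588 or KB4565635 is not installed...'

# The parentheses form a capture group, so findall returns only the digits
# that follow "KB" instead of the whole match.
numbers = re.findall(r"KB(\d+)", text)
print(numbers)  # ['4565588', '4565635']
```

This avoids depending on the prefix being exactly two characters long.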

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mathias R. Jessen
Solution 2 LUKASANUKVARI