Extracting multiple values from text file

I'm new to PowerShell and struggling to extract some values from a text file.

Here is sample data:

RDFC1111 Z MED 22 23:18:39 MPHSFHFKD OF THE AAAOAAAY GAAAAAA~ PAGE; 1 
DEFSDLSERD FSGHS CONRFGL CEERTE 
ASDF DFGF ASDA ERDFG REEVHT 
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS 
JHGTR ------- ASDV---------- FIELD IN ERROR ERROR DESCRIPTION 
TYPE JINYNGT HJUBGD N[ID7BER CRT JUR YR INFORMATION NJHGJVGFW -------------------- --------------------- 
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH 
„ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH 
AKASHDEEP „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH 
AMANDEEP „ ERROR MESSAGE LOPJUT PKJH IS MFiNDATORY 
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH 
AJAYKARPN „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
~#~ 3~ ~•~5~ 9~4 -~o aa<a•„webnuai u~,~~~L Lt~«<.uc ~, 
- w 
J, _ 
si~_o_. ,.._,~r~':L _ ~r.;~~n< r~~ n~in:~r~~P t+m5'1' P~ EQ~rc~. 'r0 ~S~ta`~I~ 7 
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH 

I'm trying to extract the bold fields from each line.

The output I'm looking for:

MK12001 C2100123
MK13103 C2100124
MRDOOP C21005237
JPPK C2123133

What I'm getting currently:

MK12001 3160
C2100123
MK13103 3160
C2100124
MRDOOP 3160
C21005237
JPPK 3160
C2123133

Problem:

Because I'm using "3160" as the match criterion for my first field, it shows up in the results as well. My second field is a ticket # (C1234567), and because I add it as a second match criterion with the "pipe" operator, it ends up on the next line. If someone can help me keep the ticket # on the same line, then I guess I can live with having "3160" in between, so it would look like

MK12001 3160 C2100123

Or, if someone can suggest how to display only the bold fields (i.e. the field before the 3160, plus the ticket #), that would be awesome.

MK12001 C2100123

P.S.: My script already changes the "0" to "C" in the ticket # field (C1234567).

Here is the code so far:

#Location of original file
$Location = "C:\Temp\Ap8.txt"
#Location of file where the "0" is replaced with "C"
$Location2 = "C:\Temp\results9.txt"
#Final results
$Location3 = "C:\Temp\tickets9.txt"

#get the original file
$Change = Get-Content $Location

# replace 0 with C
$Change | ForEach-Object {$_ -Replace "3160 311 50 0",  "3160 311 50 C"} | 

#write the results to staging file
Set-Content $Location2

#get the staging/updated file
Get-Content $Location2 -Raw |

#look up the specific fields; I have to fetch two fields from each line, therefore using the pipe (|) operator in between
Select-String "\s\w{1,8}\s3160| C\d{7}" -AllMatches |


  % { $_.Matches.Groups.Value } |
  Out-File $Location3 -Encoding ascii -Force


Solution 1:[1]

i am truly bad with complex regex patterns, so this is done with string operators and only very simple regex patterns. [grin]

the code ...

#region >>> fake reading in a text file
#    when ready to do this with real data, use Get-Content
$InStuff = @'
RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME PETER „ ERROR MESSAGE SERVER NAME IS MANDATORY 
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME SUNNY „ ERROR MESSAGE SERVER NAME IS MFiNDATORY
'@ -split '\r?\n'
#endregion >>> fake reading in a text file

$Result = foreach ($IS_Item in $InStuff)
    {
    $TempBlock = ($IS_Item -split 'server')[0].trim()
    $First = (($TempBlock -split '3160')[0].Trim().Split())[-1]
    $Second = (($TempBlock -split '3160')[1].Trim().Split())[-1] -replace '\d{3}$' -replace '^0', 'C'

    '{0} {1}' -f $First, $Second
    }

$Result

output ...

MK12001 C2100123
MK13103 C2100124
MRDOOP C21005237
JPPK C2123133

what it does ...

  • fakes reading in a text file
    when doing this with real data, use Get-Content.
  • iterates thru the collection of lines
  • grabs the block that has the wanted data
  • splits out, trims, and saves the 1st target value
  • splits out, trims, and saves the 2nd target value
  • builds the output string from the 2 above values
  • sends that to the $Result collection
  • shows that collection on screen
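
The steps above can be traced on a single line. This is only an illustrative sketch: `$Line`, `$Parts`, and `$Out` are names introduced here, the line is one of the fake input lines from the answer, and it assumes, as the answer does, that the wanted block ends at the word "server" and that "3160" occurs only once in that block.

```powershell
# One of the fake input lines from the answer above
$Line = 'RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME PETER'

# grab the block that has the wanted data (-split is case-insensitive)
$TempBlock = ($Line -split 'server')[0].Trim()

# split that block around "3160"
$Parts = $TempBlock -split '3160'

# 1st target: the last whitespace-separated token left of "3160"
$First = $Parts[0].Trim().Split()[-1]

# 2nd target: the last token right of "3160", minus its 3 trailing digits,
# with the leading 0 swapped for a C
$Second = $Parts[1].Trim().Split()[-1] -replace '\d{3}$' -replace '^0', 'C'

$Out = '{0} {1}' -f $First, $Second
$Out    # MRDOOP C21005237
```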

Solution 2:[2]

I suggest a regex approach based on the -match and -replace operators:

# The substring that the lines of interest must contain
# Note:
#  [regex]::Escape() escapes the literal string so that the regex
#  engine uses it literally - which isn't strictly necessary in this case.
#  Alternatively, omit [regex]::Escape() and formulate the string
#  *as a regex* to begin with.
$searchStr = [regex]::Escape(' 3160 311 50 ')

# Filter the lines down to those of interest with -match,
# then use -replace to extract the tokens on either side of the search string.
@(Get-Content $Location) `
  -match $searchStr `
  -replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2'

The above outputs to the display; pipe to Out-File (or, with text input, preferably, Set-Content) as needed.

For an explanation of the regex and the ability to experiment with it, see this regex101.com page.

Output with your sample data:

MK56502 C2156334
CD15622 C3214114
AK12102 C2652224
HJPOOL C6585527
RAAK C2197133
NUMJ13 C5501950
AY51619 C2177107
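
To see what each capture group holds, the same pattern can be tried against a single line with -match and the automatic $Matches variable. A minimal sketch: `$pattern`, `$line`, `$token`, and `$ticket` are names introduced here, and the line is copied from the sample data.

```powershell
$searchStr = [regex]::Escape(' 3160 311 50 ')
$pattern   = "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$"
# ^.+          everything up to the space in front of the first wanted token
# (\w+)        group 1: the token right before the search string
# searchStr    the escaped literal ' 3160 311 50 '
# 0            the leading zero, dropped from the result
# (\w+)\d{3}   group 2: the ticket characters, without the 3 trailing digits
#  .+$         a space and the rest of the line

$line = 'PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH '

if ($line -match $pattern) {
    $token  = $Matches[1]          # HJPOOL
    $ticket = 'C' + $Matches[2]    # C6585527
    "$token $ticket"
}
```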

Alternatively, use a switch statement, which requires only one regex operation, but requires calling via a script block (& { ... }) in order to be able to pipe to a file-writing cmdlet:

& {
  switch -Regex -File $Location {
    ' (\w+) 3160 311 50 0(\w+)\d{3} ' { '{0} C{1}' -f $Matches[1], $Matches[2] }
  }
} # | Set-Content ...
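
A switch statement also accepts a collection in place of -File, and its output can be assigned to a variable, which makes it easy to try the pattern without touching a file. A small sketch: `$lines` and `$result` are names introduced here, and the three lines are copied from the sample data (the second one contains 3160 but should not match).

```powershell
$lines = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH '
    'QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS '
    'PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH '
)

# switch enumerates the array; only matching lines run the script block
$result = switch -Regex ($lines) {
    ' (\w+) 3160 311 50 0(\w+)\d{3} ' { '{0} C{1}' -f $Matches[1], $Matches[2] }
}

$result
# MK56502 C2156334
# RAAK C2197133
```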

Solution 3:[3]

Thanks, everyone, for your help. Obviously, it's sometimes hard to post the original data (privacy) and to fully express the problem, but I'm so glad to be part of this community, as everyone tries their best to help.

Original Problem: We receive a hard copy of a PDF, scan it, and convert it to a text file using the OCR functionality of the printer. During this process, some content is lost and some typos are introduced. I want to fetch two fields out of all these text files; the fields appear in random files.

  1. The first solution fetches all the well-formed (matching) tickets but doesn't include the ones with typos in the output.

  2. The second solution gives me all the data, i.e. the well-formed tickets as well as the tickets that don't match, but it doesn't let me add an extra condition to reduce the number of bad/typo tickets.

#Solution 1:

    #Location of original file
    $Location = "C:\Temp\Ap8.txt"
    #Location of file where the "0" is replaced with "C"
    $Location2 = "C:\Temp\file2.txt"
    #Location of the file where the in-between string ' 3160 311 50 ' is replaced with nothing so that you are left with only the fields you need.
    $Location3 = "C:\Temp\file3.txt"
    #Add the final results
    $Location4 = "C:\Temp\file4.txt"
  
    
    #get the original file
    $Change = Get-Content $Location
    
    # replace 0 with C
    $Change | ForEach-Object {$_ -Replace "3160 311 50 0",  "3160 311 50 C"} | 
    
    #write the results to staging file
    Set-Content $Location2
    
    #get the staging/updated file
    Get-Content $Location2 -Raw |
    
    #look up the specific fields; both fields are fetched from each line by a single pattern that spans the text in between
    Select-String "\s\w{1,8} 3160 311 50 \w{1,8}" -AllMatches |
    #Select-String "\s\w{1,8} 3160 311 50 \w{7}" -AllMatches |
    
    
      % { $_.Matches.Groups.Value } |
    
      #Set-Content $Location3
    
      # to write the content to $Location3, the line just below (starting with Out-File) also works
    
      Out-File $Location3 -Encoding ascii -Force
    
    
    $PlateTicket =  Get-Content $Location3  
    
    $PlateTicket | ForEach-Object {$_ -Replace " 3160 311 50",  ""} | 
    
    Set-Content $Location4
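
The staging files can probably be avoided by letting Select-String capture both fields in one pass and building the output from the capture groups. This is a sketch under assumptions, not the method used above: `$text` and `$pairs` are names introduced here, the two lines are copied from the sample data, and the "0" to "C" swap is folded into the format string instead of a separate -replace step.

```powershell
# Stand-in for: Get-Content $Location -Raw
$text = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH '
    'PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH '
) -join "`n"

$pairs = $text |
    # group 1: field before the fixed string; group 2: up to 7 characters after the leading 0
    Select-String '\s(\w{1,8}) 3160 311 50 0(\w{1,7})' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { '{0} C{1}' -f $_.Groups[1].Value, $_.Groups[2].Value }

$pairs
# MK56502 C2156334
# RAAK C2197133
```

Piping `$pairs` to Set-Content would then write the final file directly.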

#Solution 2:

# The substring that the lines of interest must contain
$Location = "C:\Temp\Ap8.txt"
$Location2 = "C:\Temp\Results.txt"
$searchStr = [regex]::Escape(' 3160 311 50 ')


@(Get-Content $Location) `
  -match $searchStr `
  -replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2'|

    Set-Content $Location2
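
If an extra condition is needed to cut down the bad or typo tickets, one option is to tighten the regex itself rather than add a separate step. The sketch below assumes that a valid ticket is exactly seven digits and that a valid first field is one to eight word characters (assumptions based on the C1234567 example and the \w{1,8} used earlier). `$strict`, `$lines`, `$good`, and `$rejected` are names introduced here, and the second input line is a made-up OCR-damaged line.

```powershell
$searchStr = [regex]::Escape(' 3160 311 50 ')

# Stricter than (\w+) ... 0(\w+)\d{3}: bounded first field, digits-only ticket
$strict = "^.+ (\w{1,8})${searchStr}0(\d{7})\d{3} .+$"

$lines = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH '    # from the sample data
    'PD31 MK565O2 3160 311 50 0215G334001 LOPJUT SURPKJH '    # hypothetical OCR typo in the ticket
    'QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS '     # from the sample data
)

# Lines that satisfy the strict pattern, reduced to the two fields
$good = @($lines -match $strict -replace $strict, '$1 C$2')

# Everything else, kept for manual review instead of being silently dropped
$rejected = @($lines -notmatch $strict)

$good              # MK56502 C2156334
$rejected.Count    # 2
```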

Credits to all the contributors/ original posters.

Thanks again, everyone. Just posting a summary, it's not my original work.

Solution 4:[4]

You could use a "simpler" one-line command.

So, given both of your examples combined as input.txt:

RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET 
„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY 
„ ERROR MESSAGE SERVER LOCAL IS MANDATORY 
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME 
PETER „ ERROR MESSAGE SERVER NAME IS MANDATORY 
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME 
SUNNY „ ERROR MESSAGE SERVER NAME IS MFiNDATORY
RDFC1111 Z MED 22 23:18:39 MPHSFHFKD OF THE AAAOAAAY GAAAAAA~ PAGE; 1 
DEFSDLSERD FSGHS CONRFGL CEERTE 
ASDF DFGF ASDA ERDFG REEVHT 
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS 
JHGTR ------- ASDV---------- FIELD IN ERROR ERROR DESCRIPTION 
TYPE JINYNGT HJUBGD N[ID7BER CRT JUR YR INFORMATION NJHGJVGFW -------------------- --------------------- 
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH 
„ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY 
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY 
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH 
AKASHDEEP „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH 
AMANDEEP „ ERROR MESSAGE LOPJUT PKJH IS MFiNDATORY 
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH 
AJAYKARPN „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY 
~#~ 3~ ~•~5~ 9~4 -~o aa<a•„webnuai u~,~~~L Lt~«<.uc ~, 
- w 
J, _ 
si~_o_. ,.._,~r~':L _ ~r.;~~n< r~~ n~in:~r~~P t+m5'1' P~ EQ~rc~. 'r0 ~S~ta`~I~ 7 
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH 

We can use a cmd file. You could pre-filter out the bad lines, but here I am assuming that every line with a valid 3160 is wanted. You don't need the second line (the one starting with echo:); it is only there to explain the search result.

@echo off & SETLOCAL EnableDelayedExpansion
echo: & echo      Filtered lines with 3160 & echo: & findstr /c:3160 input.txt & echo: & echo      Modified filtered output & echo:
for /f "tokens=1,2,3,4,5,6,7 usebackq" %%A in (`type input.txt ^|findstr " 3160"`) do @if %%C==3160 (set "num=%%F" & echo %%B C!num:~1,10! ) ELSE ( if %%D==3160 (set "num=%%G" & echo %%C C!num:~1,10!))

Result (including what seems to be one rogue line and one bad one). NOTE: I left the substring length in both cases as 10 (the ",10!" part) to highlight the bad one, but you will need to change both to 7. :-)


     Filtered lines with 3160

RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH

     Modified filtered output

MK12001 C2100123001
MK13103 C2100124001
MRDOOP C2100523700
JPPK C2123133001
SDFGTE: CGHJINCIAL
MK56502 C2156334001
CD15622 C3214114001
AK12102 C2652224001
HJPOOL C6585527001
RAAK C2197133001
NUMJ13 C5501950001
AY51619 C2177107001
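
The same token-position idea can be sketched in PowerShell. This is a loose analogue, not a translation of the batch file: it locates the 3160 token instead of testing fixed positions, takes 7 characters instead of 10, and adds a digits-only check (an assumption) that drops the rogue line. `$lines`, `$tokens`, `$i`, `$num`, and `$result` are names introduced here.

```powershell
$lines = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH'
    'PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH'
    'QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS'
)

$result = foreach ($line in $lines) {
    $tokens = -split $line                    # unary -split: split on whitespace
    $i = [array]::IndexOf($tokens, '3160')    # position of the 3160 token, -1 if absent

    # need one token before 3160 and three tokens after it
    if ($i -lt 1 -or ($i + 3) -ge $tokens.Count) { continue }

    $num = $tokens[$i + 3]
    if ($num -notmatch '^\d{8,}$') { continue }   # skips the rogue "QWERTY" line

    # token before 3160, then characters 2 to 8 of the number prefixed with C
    '{0} C{1}' -f $tokens[$i - 1], $num.Substring(1, 7)
}

$result
# MK56502 C2156334
# HJPOOL C6585527
```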

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Lee_Dailey
Solution 2
Solution 3 G R
Solution 4