Extracting multiple values from text file
New to PowerShell and trying hard to get something out of the text file.
Here is sample data:
RDFC1111 Z MED 22 23:18:39 MPHSFHFKD OF THE AAAOAAAY GAAAAAA~ PAGE; 1
DEFSDLSERD FSGHS CONRFGL CEERTE
ASDF DFGF ASDA ERDFG REEVHT
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS
JHGTR ------- ASDV---------- FIELD IN ERROR ERROR DESCRIPTION
TYPE JINYNGT HJUBGD N[ID7BER CRT JUR YR INFORMATION NJHGJVGFW -------------------- ---------------------
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH
„ ERROR MESSAGE LOPJUT PKJH IS MANDATORY
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH
AKASHDEEP „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH
AMANDEEP „ ERROR MESSAGE LOPJUT PKJH IS MFiNDATORY
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH
AJAYKARPN „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY
~#~ 3~ ~•~5~ 9~4 -~o aa<a•„webnuai u~,~~~L Lt~«<.uc ~,
- w
J, _
si~_o_. ,.._,~r~':L _ ~r.;~~n< r~~ n~in:~r~~P t+m5'1' P~ EQ~rc~. 'r0 ~S~ta`~I~ 7
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH
I'm trying to extract the bold fields from each line.
The output I'm looking for:
MK12001 C2100123
MK13103 C2100124
MRDOOP C21005237
JPPK C2123133
What I'm getting currently:
MK12001 3160
C2100123
MK13103 3160
C2100124
MRDOOP 3160
C21005237
JPPK 3160
C2123133
Problem:
Because I use "3160" as the match criterion for my first field, it shows up in the results as well. My second field, the ticket # (C1234567), comes from the second search/match criterion after the "pipe" operator, so it ends up on the next line. If someone can help me keep the ticket # on the same line, then I guess I can live with having "3160" in between, so it will look like
MK12001 3160 C2100123
Or, if someone can suggest a way to display only the bold fields (i.e. the field before the 3160 and the ticket #), that would be awesome.
MK12001 C2100123
P.S.: My script already changes the leading "0" to "C" in the ticket # field (C1234567).
Here is the code so far:
#Location of original file
$Location = "C:\Temp\Ap8.txt"
#Location of file where the "0" is replaced with "C"
$Location2 = "C:\Temp\results9.txt"
#Final results
$Location3 = "C:\Temp\tickets9.txt"
#get the original file
$Change = Get-Content $Location
# replace the leading 0 of the ticket # with C
$Change | ForEach-Object {$_ -Replace "3160 311 50 0", "3160 311 50 C"} |
#write the results to staging file
Set-Content $Location2
#get the staging/updated file
Get-Content $Location2 -Raw |
#look up the specific fields; I have to fetch two fields from each line, therefore using the pipe operator in between
Select-String "\s\w{1,8}\s3160| C\d{7}" -AllMatches |
% { $_.Matches.Groups.Value } |
Out-File $Location3 -Encoding ascii -Force
Solution 1:[1]
i am truly bad with complex regex patterns, so this is done with string operators and only very simple regex patterns. [grin]
the code ...
#region >>> fake reading in a text file
# when ready to do this with real data, use Get-Content
$InStuff = @'
RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET„ ERROR MESSAGE SERVER LOCAL IS MANDATORY
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY„ ERROR MESSAGE SERVER LOCAL IS MANDATORY
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME PETER „ ERROR MESSAGE SERVER NAME IS MANDATORY
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME SUNNY „ ERROR MESSAGE SERVER NAME IS MFiNDATORY
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
$Result = foreach ($IS_Item in $InStuff)
    {
    $TempBlock = ($IS_Item -split 'server')[0].trim()
    $First = (($TempBlock -split '3160')[0].Trim().Split())[-1]
    $Second = (($TempBlock -split '3160')[1].Trim().Split())[-1] -replace '\d{3}$' -replace '^0', 'C'
    '{0} {1}' -f $First, $Second
    }
$Result
output ...
MK12001 C2100123
MK13103 C2100124
MRDOOP C21005237
JPPK C2123133
what it does ...
- fakes reading in a text file
  when doing this with real data, use Get-Content.
- iterates thru the collection of lines
- grabs the block that has the wanted data
- splits out, trims, and saves the 1st target value
- splits out, trims, and saves the 2nd target value
- builds the output string from the 2 above values
- sends that to the $Result collection
- shows that collection on screen
Solution 2:[2]
I suggest a regex approach based on the -match and -replace operators:
# The substring that the lines of interest must contain
# Note:
# [regex]::Escape() escapes the literal string so that the regex
# engine uses it literally - which isn't strictly necessary in this case.
# Alternatively, omit [regex]::Escape() and formulate the string
# *as a regex* to begin with.
$searchStr = [regex]::Escape(' 3160 311 50 ')
# Filter the lines down to those of interest with -match,
# then use -replace to extract the tokens on either side of the search string.
@(Get-Content $Location) `
-match $searchStr `
-replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2'
The above outputs to the display; pipe to Out-File (or, with text input, preferably, Set-Content) as needed.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Output with your sample data:
MK56502 C2156334
CD15622 C3214114
AK12102 C2652224
HJPOOL C6585527
RAAK C2197133
NUMJ13 C5501950
AY51619 C2177107
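To see how the -replace pattern above picks out the two tokens, here is a minimal sketch that applies the same pattern to a single line from the question's sample data with -match. The group names plate and ticket are labels chosen for this illustration only; the original answer uses numbered groups.

```powershell
# One line taken from the question's sample data
$line = 'PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH'

# Same shape as the -replace pattern, with named groups (names are illustrative):
#   ^.+                 everything up to the last space before the first token
#   (?<plate>\w+)       the token directly before ' 3160 311 50 '
#   0(?<ticket>\w+)     the ticket digits after the leading 0 ...
#   \d{3}               ... minus the trailing three digits
$pattern = '^.+ (?<plate>\w+) 3160 311 50 0(?<ticket>\w+)\d{3} .+$'

$extracted = if ($line -match $pattern) {
    '{0} C{1}' -f $Matches['plate'], $Matches['ticket']
}
$extracted   # HJPOOL C6585527
```

Because \w+ is greedy, the ticket group first swallows all ten remaining digits and then backtracks just far enough for \d{3} and the following space to match, which is what strips the trailing 001.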
Alternatively, use a switch statement, which requires only one regex operation, but requires calling via a script block (& { ... }) in order to be able to pipe to a file-writing cmdlet:
& {
  switch -Regex -File $Location {
    ' (\w+) 3160 311 50 0(\w+)\d{3} ' { '{0} C{1}' -f $Matches[1], $Matches[2] }
  }
} # | Set-Content ...
Solution 3:[3]
Thanks, everyone for your help. Obviously, sometimes it's hard to post the original data (privacy) and also to fully express the problem but I'm so glad to be part of this community as everyone tries their best to help.
Original Problem: We receive a hard copy of a pdf, then we scan it and convert it using the OCR functionality of the printer and then convert it into a text file. During the whole process, some content is lost and some typos are introduced, and I want to fetch two fields out of the text files, which are available in random files.
The first solution fetches all the well-formed tickets but doesn't include the ones with typos in the output.
The second solution gives me all the data, both the well-formed tickets and the ones that don't match, but it doesn't let me add an extra condition to reduce the number of bad/typo tickets.
#Solution 1:
#Location of original file
$Location = "C:\Temp\Ap8.txt"
#Location of file where the "0" is replaced with "C"
$Location2 = "C:\Temp\file2.txt"
#Location of the file where the in-between string ' 3160 311 50 ' is replaced with nothing so that you are left with only the fields you need.
$Location3 = "C:\Temp\file3.txt"
#Add the final results
$Location4 = "C:\Temp\file4.txt"
#get the original file
$Change = Get-Content $Location
# replace the leading 0 of the ticket # with C
$Change | ForEach-Object {$_ -Replace "3160 311 50 0", "3160 311 50 C"} |
#write the results to staging file
Set-Content $Location2
#get the staging/updated file
Get-Content $Location2 -Raw |
#look up the specific fields; I have to fetch two fields from each line, therefore using the pipe operator in between
Select-String "\s\w{1,8} 3160 311 50 \w{1,8}" -AllMatches |
#Select-String "\s\w{1,8} 3160 311 50 \w{7}" -AllMatches |
% { $_.Matches.Groups.Value } |
#Set-Content $Location3
# Set-Content (commented out above) also works here; Out-File is used instead to write the content to $Location3
Out-File $Location3 -Encoding ascii -Force
$PlateTicket = Get-Content $Location3
$PlateTicket | ForEach-Object {$_ -Replace " 3160 311 50", ""} |
Set-Content $Location4
#Solution 2:
# The substring that the lines of interest must contain
$Location = "C:\Temp\Ap8.txt"
$Location2 = "C:\Temp\Results.txt"
$searchStr = [regex]::Escape(' 3160 311 50 ')
@(Get-Content $Location) `
-match $searchStr `
-replace "^.+ (\w+)${searchStr}0(\w+)\d{3} .+$", '$1 C$2'|
Set-Content $Location2
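The "extra condition" mentioned above could be sketched roughly as follows: capture the two tokens with a loose pattern, then check the raw ticket against a stricter one and flag anything that fails. This is only a sketch under assumptions: the strict rule (a leading 0 followed by ten digits) is inferred from the question's sample data, the third input line is a made-up OCR typo, and the names $loose, $strict, Plate, Ticket and Suspect are illustrative.

```powershell
# Two lines from the question's sample data plus one made-up line
# (hypothetical OCR typo: letter "O" instead of the leading digit "0").
$lines = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH'
    'QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS'
    'PD31 XX00001 3160 311 50 O2156334001 LOPJUT SURPKJH'
)

$loose  = ' (?<plate>\S+) 3160 311 50 (?<raw>\S+)'   # grab whatever sits on either side
$strict = '^0\d{10}$'                                 # assumed shape of a clean ticket

$results = foreach ($line in $lines) {
    if ($line -match $loose) {
        # Copy the captures first: the next -match overwrites $Matches
        $plate  = $Matches['plate']
        $raw    = $Matches['raw']
        $isGood = $raw -match $strict
        [pscustomobject]@{
            Plate   = $plate
            Ticket  = if ($isGood) { 'C' + $raw.Substring(1, 7) } else { $raw }
            Suspect = -not $isGood
        }
    }
}

# Clean tickets in the requested format; suspects can be reviewed separately
$clean = $results | Where-Object { -not $_.Suspect } |
    ForEach-Object { '{0} {1}' -f $_.Plate, $_.Ticket }
$clean
```

Keeping the suspect rows as objects instead of dropping them means they can be written to a separate file for manual checking, and tightening or loosening $strict changes how many tickets land there.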
Credits to all the contributors/ original posters.
Thanks again, everyone. Just posting a summary, it's not my original work.
Solution 4:[4]
You could use a "simpler" one-line command.
So, given both of your examples as input.txt:
RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET
„ ERROR MESSAGE SERVER LOCAL IS MANDATORY
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY
„ ERROR MESSAGE SERVER LOCAL IS MANDATORY
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME
PETER „ ERROR MESSAGE SERVER NAME IS MANDATORY
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME
SUNNY „ ERROR MESSAGE SERVER NAME IS MFiNDATORY
RDFC1111 Z MED 22 23:18:39 MPHSFHFKD OF THE AAAOAAAY GAAAAAA~ PAGE; 1
DEFSDLSERD FSGHS CONRFGL CEERTE
ASDF DFGF ASDA ERDFG REEVHT
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS
JHGTR ------- ASDV---------- FIELD IN ERROR ERROR DESCRIPTION
TYPE JINYNGT HJUBGD N[ID7BER CRT JUR YR INFORMATION NJHGJVGFW -------------------- ---------------------
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH
„ ERROR MESSAGE LOPJUT PKJH IS MANDATORY
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY
„ ERROR MESSAGE LOPJUT ADDRESS IS MANDATORY
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH
AKASHDEEP „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH
AMANDEEP „ ERROR MESSAGE LOPJUT PKJH IS MFiNDATORY
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH
AJAYKARPN „ ERROR MESSAGE LOPJUT PKJH IS MANDATORY
~#~ 3~ ~•~5~ 9~4 -~o aa<a•„webnuai u~,~~~L Lt~«<.uc ~,
- w
J, _
si~_o_. ,.._,~r~':L _ ~r.;~~n< r~~ n~in:~r~~P t+m5'1' P~ EQ~rc~. 'r0 ~S~ta`~I~ 7
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH
We can use a cmd file. You could pre-filter out the bad lines, but here I am accepting every line with a valid 3160. You don't need the second line (the one starting with echo:); it's just there to explain the search result.
@echo off & SETLOCAL EnableDelayedExpansion
echo: & echo Filtered lines with 3160 & echo: & findstr /c:3160 input.txt & echo: & echo Modified filtered output & echo:
for /f "tokens=1,2,3,4,5,6,7 usebackq" %%A in (`type input.txt ^|findstr " 3160"`) do @if %%C==3160 (set "num=%%F" & echo %%B C!num:~1,10! ) ELSE ( if %%D==3160 (set "num=%%G" & echo %%C C!num:~1,10!))
Result (including what seems to be one rogue line and one bad one). NOTE: I left the length in both !num:~1,10! substrings as 10 to highlight the erroneous one, but you will need to change both to 7 :-)
Filtered lines with 3160
RD32 MK12001 3160 211 50 02100123001 SERVER LOCAL - STREET
RD32 MK13103 3160 211 50 02100124001 SERVER LOCAL - CITY
RD32 J4834-00009-92051 MRDOOP 3160 211 50 021005237001 SERVER GIVEN NAME
RD32 B5509-00000-00522 JPPK 3160 211 50 02123133001 SERVER GIVEN NAME
QWERTY SDFGTE: 3160 - ASDMPBVC FGHJINCIAL OFGTFDSS
PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH
PD31 CD15622 3160 311 50 03214114001 LOPJUT ADDRESS - STREET
PD31 AK12102 3160 311 50 02652224001 LOPJUT ADDRESS - CITY
PD31 A4833-00009-61001 HJPOOL 3160 311 50 06585527001 LOPJUT GIVEN PKJH
PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH
PD31 A4781-00009-90503 NUMJ13 3160 311 50 05501950001 LOPJUT GIVEN PKJH
PD31 50394-00008-80406 AY51619 3160 311 50 02177107001 LOPJUT GIVEN PKJH
Modified filtered output
MK12001 C2100123001
MK13103 C2100124001
MRDOOP C2100523700
JPPK C2123133001
SDFGTE: CGHJINCIAL
MK56502 C2156334001
CD15622 C3214114001
AK12102 C2652224001
HJPOOL C6585527001
RAAK C2197133001
NUMJ13 C5501950001
AY51619 C2177107001
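For comparison, the same token-position idea can be sketched in PowerShell. This is not part of the cmd answer, just a rough equivalent under the assumption that the wanted value is the token right before 3160 and the ticket is the third token after it; the variable names are illustrative.

```powershell
# Two lines from the sample data: one without and one with the extra ID column
$lines = @(
    'PD31 MK56502 3160 311 50 02156334001 LOPJUT SURPKJH'
    'PD31 A5709-00000-00322 RAAK 3160 311 50 02197133001 LOPJUT GIVEN PKJH'
)

$out = foreach ($line in $lines) {
    $tokens = -split $line                      # unary -split: split on whitespace
    $i = [array]::IndexOf($tokens, '3160')      # position of the anchor token
    if ($i -ge 1 -and $tokens.Count -gt $i + 3) {
        # Skip the leading 0 and keep 7 characters, like !num:~1,7! in the cmd file
        '{0} C{1}' -f $tokens[$i - 1], $tokens[$i + 3].Substring(1, 7)
    }
}
$out
```

Locating 3160 by index avoids the two separate cases (third or fourth token) that the for /f loop has to test, but like the cmd version it will also pick up rogue lines that merely contain a standalone 3160.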
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Lee_Dailey |
| Solution 2 | |
| Solution 3 | G R |
| Solution 4 | |
