How can I filter out text twice in PowerShell?
I have a Powershell script that returned an output that's close to what I want, however there are a few lines and HTML-style tags I need to remove. I already have the following code to filter out:
get-content "atxtfile.txt" | select-string -Pattern '<fields>' -Context 1
However, if I attempt to pipe that output into a second "select-string", I won't get any results back. I was looking at the REGEX examples online, but most of what I've seen involves the use of coding loops to achieve their objective. I'm more used to the Linux shell where you can pipe output into multiple greps to filter out text. Is there a way to achieve the same thing or something similar with PowerShell? Here's the file I'm working with as requested:
<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<actionOverrides>
<actionName>Accept</actionName>
<type>Default</type>
</actionOverrides>
<actionOverrides>
<actionName>CancelEdit</actionName>
<type>Default</type>
</actionOverrides>
<actionOverrides>
<actionName>Today</actionName>
<type>Default</type>
</actionOverrides>
<actionOverrides>
<actionName>View</actionName>
<type>Default</type>
</actionOverrides>
<compactLayoutAssignment>SYSTEM</compactLayoutAssignment>
<enableFeeds>false</enableFeeds>
<fields>
<fullName>ActivityDate</fullName>
</fields>
<fields>
<fullName>ActivityDateTime</fullName>
</fields>
<fields>
<fullName>Guid</fullName>
</fields>
<fields>
<fullName>Description</fullName>
</fields>
</CustomObject>
So, I only want the text between the <fullName> tags, and I have the following so far:
get-content "txtfile.txt" | select-string -Pattern '<fields>' -Context 1
This gives me everything between the <fields> tags; however, what I actually need is the content of the <fullName> line without the XML tags.
Solution 1:[1]
The simplest PSv3+ solution is to use PowerShell's built-in XML DOM support, which makes an XML document's nodes accessible as a hierarchy of objects with dot notation:
PS> ([xml] (Get-Content -Raw txtfile.txt)).CustomObject.fields.fullName
ActivityDate
ActivityDateTime
Guid
Description
Note how, even though .fields is an array (representing all child <fields> elements of the top-level element <CustomObject>), .fullName was applied to it directly and returned the values of the child <fullName> elements across all array elements (the <fields> elements) as an array.
This ability to access a property on a collection and have it implicitly applied to the collection's elements, with the results getting collected in an array, is a generic PSv3+ feature called member-access enumeration.
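As a minimal sketch of member-access enumeration outside of XML, the following uses made-up sample objects ($people, with Name and Age properties, is hypothetical and not part of the question's data) and compares the shorthand with its explicit pipeline equivalent:

```powershell
# Hypothetical sample data, not taken from the question's file.
$people = @(
    [pscustomobject]@{ Name = 'Ann'; Age = 31 }
    [pscustomobject]@{ Name = 'Bob'; Age = 27 }
)

# Member-access enumeration: the array itself has no .Name property,
# so PowerShell (v3+) looks it up on each element and collects the results.
$names = $people.Name

# The explicit pipeline equivalent.
$namesExplicit = $people | ForEach-Object { $_.Name }

$names
```

Both $names and $namesExplicit should hold the same two strings; the shorthand only applies when the collection itself has no property of that name.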
As an alternative, consider using the Select-Xml cmdlet (available in PSv2 too), which supports XPath queries that generally allow for more complex extraction logic (though not strictly needed here); Select-Xml is a high-level wrapper around the [xml] .NET type's .SelectNodes() method.
The following is the equivalent of the solution above:
$namespaces = @{ ns="http://soap.force.com/2006/04/metadata" }
$xpathQuery = '/ns:CustomObject/ns:fields/ns:fullName'
(Select-Xml -LiteralPath txtfile.txt $xpathQuery -Namespace $namespaces).Node.InnerText
Note:
Unlike with dot notation, XML namespaces must be considered when using Select-Xml.
Given that <CustomObject> and all its descendants are in the default namespace (declared via the xmlns attribute), identified by the URI http://soap.force.com/2006/04/metadata, you must:
- define this namespace in a hashtable you pass as the -Namespace argument
  - Caveat: Default namespace xmlns is special in that it cannot be used as the key in the hashtable; instead, choose an arbitrary key name such as ns, but be sure to use that chosen key name as the node-name prefix (see next point).
- prefix all node names in the XPath query with the namespace name followed by :; e.g., ns:CustomObject
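To illustrate why the prefix matters, here is a small sketch that runs the same query with and without a namespace mapping. It uses a trimmed, inline stand-in for the file ($xmlText is an assumption for illustration) and Select-Xml's -Content parameter instead of a file path:

```powershell
# Trimmed inline stand-in for the question's file (assumption for illustration).
$xmlText = @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
  <fields><fullName>ActivityDate</fullName></fields>
  <fields><fullName>Guid</fullName></fields>
</CustomObject>
'@

# Without a namespace prefix, the query matches no nodes.
$unqualified = @(Select-Xml -Content $xmlText -XPath '//fullName')

# With the prefix mapped to the namespace URI, the nodes are found.
$ns = @{ ns = 'http://soap.force.com/2006/04/metadata' }
$qualified = Select-Xml -Content $xmlText -XPath '//ns:fullName' -Namespace $ns

$unqualified.Count
$qualified.Node.InnerText
```

The unqualified query fails silently rather than raising an error, which is why a missing prefix is easy to overlook.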
Solution 2:[2]
Ok. So if you have that file then:
[xml]$xml = Get-Content atxtfile.txt
$xml.CustomObject.fields | select fullname
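Note that select fullname returns objects that have a fullname property, which display as a table with a header. If only the bare strings are wanted, a possible variation is Select-Object's -ExpandProperty parameter; the sketch below uses a trimmed inline stand-in for the file's content (an assumption for illustration):

```powershell
# Trimmed inline stand-in for the file's content (assumption for illustration).
[xml]$xml = @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
  <fields><fullName>ActivityDate</fullName></fields>
  <fields><fullName>Guid</fullName></fields>
</CustomObject>
'@

# -ExpandProperty emits the property values themselves rather than
# wrapper objects that carry a fullName property.
$names = $xml.CustomObject.fields | Select-Object -ExpandProperty fullName
$names
```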
Solution 3:[3]
mklement0 has provided the best solution to the problem, but here is an answer to the question as asked: how to filter text twice using Select-String.
If we pipe the results of Select-String into Out-String -Stream we can pass it to Select-String again.
This can all be done on one line but I used a variable to try and make it more readable.
$Match = Get-Content "atxtfile.txt" | Select-String -Pattern '<fields>' -Context 1
$Match | Out-String -Stream | Select-String -Pattern "Guid"
If we pipe $match to Get-Member, we will find a couple of interesting properties.
$Match.Matches.Value
This will display all the instances of <fields> (the pattern match).
$Match.Context.PostContext
$Match.Context.PreContext
This will contain the lines before and after <fields> (the context before and after).
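Building on the context properties above, one way to get the <fullName> text without the tags while staying with Select-String is to take the post-context line and strip the tags with -replace. This is only a sketch: $lines is an inline stand-in for a few lines of the file, and the XML-based approach in Solution 1 remains more robust.

```powershell
# Inline stand-in for a few lines of the file (assumption for illustration).
$lines = @(
    '<fields>'
    '    <fullName>Guid</fullName>'
    '</fields>'
)

# -Context 0,1 captures no lines before and one line after each match.
$found = $lines | Select-String -Pattern '<fields>' -Context 0,1

# Strip the opening and closing tags, then any surrounding whitespace.
$names = $found.Context.PostContext -replace '</?fullName>', '' |
    ForEach-Object { $_.Trim() }
$names
```

This relies on <fullName> always being the line directly after <fields>, which holds for the sample file but is not guaranteed for XML in general.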
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | EBGreen |
| Solution 3 | mklement0 |
