PowerShell slowness in "Get-ChildItem . -Directory -Recurse" when there are lots of files
I run:
PS F:\> gci F:\logs\PRV_RequestLogs\inbound -r -directory | %{ $_.fullname }
and it shows:
F:\logs\PRV_RequestLogs\inbound\2020-02-03
F:\logs\PRV_RequestLogs\inbound\2020-02-04
...
F:\logs\PRV_RequestLogs\inbound\2022-05-09
F:\logs\PRV_RequestLogs\inbound\2022-05-10
then it "hangs" there.
Then I run the following command in another window to find out what is going on:
PS F:\> C:\temp\handle64.exe -p 3204
and I found:
ACC: File (RWD) F:\logs\PRV_RequestLogs\inbound\2020-04-28
...
F08: File (RWD) F:\logs\PRV_RequestLogs\inbound\2020-04-28
and the directory keeps changing. So it traverses each directory trying to find subdirectories; there are none, but there are lots of files.
It took hours to complete. I never thought the process would be so slow. It looks like it is going through each file and testing whether it is a directory. Is there a quicker way to do this? I am using PowerShell 5.0 on Windows Server 2012 R2.
Solution 1:[1]
Get-ChildItem is known to be slow when traversing directories recursively, but it is what is built into PowerShell, and it is a very handy, easy-to-use cmdlet. If you're looking for speed and efficiency, you may need to fall back to the .NET System.IO APIs (IO.Directory / IO.DirectoryInfo).
I haven't tested this extensively, but I believe it works: in this case it finds every directory that is empty and older than 90 days.
Worth noting that this code requires .NET Framework 4+ if running Windows PowerShell.
$queue = [Collections.Generic.Queue[IO.DirectoryInfo]]::new()
$olderThan = [datetime]::Now.AddDays(-90) # => Set limit date here!
$queue.Enqueue('F:\logs\PRV_RequestLogs\inbound') # => Starting path here!
while($queue.Count) {
    $dir = $queue.Dequeue()
    try {
        $isEmpty = $true
        foreach ($i in $dir.EnumerateDirectories()) {
            if($i.Attributes -band [IO.FileAttributes]::ReparsePoint) {
                # skip if it is ReparsePoint
                continue
            }
            $isEmpty = $false
            $queue.Enqueue($i)
        }
        if($isEmpty -and $dir.CreationTime -lt $olderThan -and -not $dir.EnumerateFiles().GetEnumerator().MoveNext()) {
            # output the directory if all 3 conditions are `$true`:
            # no subfolders, older than the limit date, and no files
            $dir
        }
    }
    catch {
        # if we were unable to enumerate this directory, go next
        continue
    }
}
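The same queue-based traversal can be reduced to the plain listing asked for in the question. The following is a minimal sketch, not part of the original answer: the function name Get-DirectoryPathFast is hypothetical, and the choice to emit reparse points without descending into them is an assumption. Unlike Get-ChildItem -Recurse, it walks the tree breadth-first, so the output order differs.

```powershell
# Hypothetical helper (name is an assumption, not from the original answer).
# Lists the full paths of all subdirectories below $Path.
function Get-DirectoryPathFast {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # Resolve to a full path first, so .NET does not interpret it
    # relative to its own working directory.
    $root = [IO.DirectoryInfo]::new((Convert-Path -LiteralPath $Path))

    $queue = [Collections.Generic.Queue[IO.DirectoryInfo]]::new()
    $queue.Enqueue($root)

    while ($queue.Count) {
        $dir = $queue.Dequeue()
        try {
            foreach ($sub in $dir.EnumerateDirectories()) {
                # Emit the path of every subdirectory, including reparse points...
                $sub.FullName
                # ...but do not descend into reparse points (assumed behavior).
                if ($sub.Attributes -band [IO.FileAttributes]::ReparsePoint) {
                    continue
                }
                $queue.Enqueue($sub)
            }
        }
        catch {
            # An inaccessible directory only skips that directory,
            # instead of aborting the whole enumeration.
            Write-Warning "Skipping '$($dir.FullName)': $($_.Exception.Message)"
        }
    }
}
```

It would be called as Get-DirectoryPathFast 'F:\logs\PRV_RequestLogs\inbound'.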
Solution 2:[2]
To complement Santiago Squarzon's helpful answer:
In addition to the implementation inefficiency you have discovered, there is another reason Get-ChildItem is slow: it decorates each System.IO.FileInfo / System.IO.DirectoryInfo output object with instance-level ETS (Extended Type System) provider properties - see GitHub issue #7501, which explains the problem and suggests a streamlined class-level solution.
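To see the instance-level decoration for yourself, you can compare an object returned by a provider cmdlet with one constructed directly through .NET. This is only an illustrative sketch; $PSHOME is used merely because it is a directory that always exists, and the variable names are made up for the example.

```powershell
# Same directory, obtained two ways.
$viaCmdlet = Get-Item -LiteralPath $PSHOME
$viaDotNet = [System.IO.DirectoryInfo]::new($PSHOME)

# NoteProperty members are attached per instance by the provider cmdlets.
$cmdletNoteProps = @(
    $viaCmdlet.PSObject.Properties |
        Where-Object MemberType -eq 'NoteProperty' |
        ForEach-Object Name
)
$dotNetNoteProps = @(
    $viaDotNet.PSObject.Properties |
        Where-Object MemberType -eq 'NoteProperty' |
        ForEach-Object Name
)

$cmdletNoteProps   # includes names such as PSPath, PSParentPath, PSIsContainer
$dotNetNoteProps   # the directly constructed object should carry none of these
```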
Using .NET APIs directly is indeed needed to work around the performance problems. A general caveat: always pass full paths when calling .NET methods, because .NET's working directory usually differs from PowerShell's.
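A quick way to observe the difference, and to resolve a relative path before handing it to .NET, is sketched below. It assumes that the current PowerShell location is a file-system location; the variable name $fullPath is just for illustration.

```powershell
# PowerShell's current location and .NET's working directory are tracked separately.
$PWD.ProviderPath                  # PowerShell's current file-system location
[Environment]::CurrentDirectory    # .NET's process-wide working directory

# Resolve a relative path against PowerShell's location first,
# then pass the resulting full path to the .NET method.
$fullPath = Convert-Path -LiteralPath '.'
[System.IO.Directory]::EnumerateDirectories($fullPath)
```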
The fast equivalent of:
Get-ChildItem F:\logs\PRV_RequestLogs\inbound -Recurse -Directory | % { $_.fullname }
is the following, which uses the [System.IO.Directory]::EnumerateDirectories() method:
[System.IO.Directory]::EnumerateDirectories(
    'F:\logs\PRV_RequestLogs\inbound',
    '*',
    'AllDirectories' # Equivalent of -Recurse, a [System.IO.SearchOption] enum value
)
Caveats:
- Inaccessible directories:
  - In Windows PowerShell / .NET Framework, they invariably cause an exception that aborts the enumeration.
  - In PowerShell (Core) 7+ / .NET, you now have the option to ignore inaccessible directories, via a new method overload that accepts a [System.IO.EnumerationOptions] instance.
- Symlinks / reparse points:
  - In Windows PowerShell / .NET Framework, they are invariably followed (including by Get-ChildItem -Recurse).
  - In PowerShell (Core) 7+ / .NET, you now have the option to opt out (and Get-ChildItem now requires opt-in via -FollowSymlink).
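For PowerShell (Core) 7+ only, the overload that takes a [System.IO.EnumerationOptions] instance could be used roughly as follows. Treat this as a sketch under stated assumptions: the function name Get-AccessibleDirectoryPath is hypothetical, and skipping reparse points entirely via AttributesToSkip is one possible way to opt out of following them.

```powershell
# Requires PowerShell (Core) 7+: [System.IO.EnumerationOptions] does not exist
# in .NET Framework, so this does not work in Windows PowerShell.
# The function name is hypothetical.
function Get-AccessibleDirectoryPath {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    $options = [System.IO.EnumerationOptions] @{
        RecurseSubdirectories = $true    # equivalent of 'AllDirectories'
        IgnoreInaccessible    = $true    # skip directories that cannot be read
        # Skip reparse points altogether (neither returned nor followed).
        # Note that this replaces the default, which skips Hidden and System entries.
        AttributesToSkip      = [System.IO.FileAttributes]::ReparsePoint
    }

    # Pass a full path, resolved against PowerShell's current location.
    [System.IO.Directory]::EnumerateDirectories(
        (Convert-Path -LiteralPath $Path),
        '*',
        $options
    )
}
```

Usage would then be Get-AccessibleDirectoryPath 'F:\logs\PRV_RequestLogs\inbound'.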
The [System.IO.DirectoryInfo] class too has a .EnumerateDirectories() method, albeit as an instance method; the major difference is that [System.IO.DirectoryInfo] instances rather than path strings are returned:
([System.IO.DirectoryInfo] 'F:\logs\PRV_RequestLogs\inbound').EnumerateDirectories(
    '*',
    'AllDirectories'
)
You can leverage the above to perform the desired filtering - finding only empty directories that are older than N days - with a minimum of PowerShell code:
$cutOffDate = (Get-Date).Date.AddDays(-30) # adjust as needed.
([System.IO.DirectoryInfo] 'F:\logs\PRV_RequestLogs\inbound').EnumerateDirectories(
    '*',
    'AllDirectories'
) | Where-Object {
    $_.LastWriteTime -lt $cutOffDate -and
    0 -eq ($_.EnumerateFileSystemInfos() | Select-Object -First 1).Count
}
Note the use of the .EnumerateFileSystemInfos() method to enumerate all files and subdirectories, followed by selecting - at most - one via Select-Object -First 1, which stops the enumeration as soon as one item has been enumerated. A .Count value of 0 implies that the directory at hand is empty.
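If you prefer to avoid the extra pipeline for the emptiness check, the same "stop after the first item" idea can be expressed with the enumerator directly, as Solution 1 does with .MoveNext(). A minimal sketch, with the hypothetical function name Test-DirectoryEmpty; pass a full path, since a relative one would be resolved against .NET's working directory:

```powershell
# Hypothetical helper: returns $true if the directory contains
# neither files nor subdirectories.
function Test-DirectoryEmpty {
    param(
        [Parameter(Mandatory)]
        [System.IO.DirectoryInfo] $Directory
    )

    $enumerator = $Directory.EnumerateFileSystemInfos().GetEnumerator()
    try {
        # MoveNext() returns $false right away when there is nothing to enumerate,
        # and stops after the first entry otherwise.
        -not $enumerator.MoveNext()
    }
    finally {
        # Release the underlying directory handle.
        $enumerator.Dispose()
    }
}
```

In the Where-Object filter above, the second condition could then be written as (Test-DirectoryEmpty $_).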
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
