How to use PowerShell to replace, in the names of all files in a folder, every character whose ASCII code is greater than 128 with "_"

An example file name is PO 2171 Cresco REVISED.pdf. There are many such files; the names are not standardized and the position of the spaces is not fixed. The spaces in the middle are actually characters with an ASCII code greater than 128, and I want to replace every character with a code greater than 128 with "_" in one go.

I haven't learned PowerShell yet. Thank you very much.



Solution 1:[1]

Theo's answer is effective, but there's a simpler, more direct solution, using the .NET regex Unicode code block \p{IsBasicLatin}, which directly matches any ASCII-range Unicode character (all .NET strings are Unicode strings, internally composed of UTF-16 code units).

Its negation, \P{IsBasicLatin} (note the uppercase P), matches any character outside the ASCII range, so that you can use the following to replace all non-ASCII-range characters with _, with the help of the regex-based -replace operator:

(Get-ChildItem -File) |  # Get all files in the current dir.
  Rename-Item -NewName { $_.Name -replace '\P{IsBasicLatin}', '_' } -WhatIf

Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
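Before touching any files, the pattern can be tried on a plain string. The following is a minimal sketch; the sample name and the assumption that the odd "spaces" are no-break spaces (U+00A0) are hypothetical, since the question doesn't say which characters are actually involved.

```powershell
# Hypothetical sample name: the first two "spaces" are no-break spaces (U+00A0),
# which look like regular spaces but lie outside the ASCII range.
$nbsp = [char] 0x00A0
$name = "PO${nbsp}2171${nbsp}Cresco REVISED.pdf"

# Replace every non-ASCII-range character with '_'.
$newName = $name -replace '\P{IsBasicLatin}', '_'
$newName   # PO_2171_Cresco REVISED.pdf (the regular ASCII space is kept)

# An explicit negated character range covers the same code points (U+0080 and above).
$altName = $name -replace '[^\x00-\x7F]', '_'
$altName   # PO_2171_Cresco REVISED.pdf
```

Both patterns leave ordinary ASCII spaces alone; only characters at code point 128 (0x80) and above are replaced.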

Note:

  • Enclosing the Get-ChildItem call in (...) ensures that all matching files are collected first, before renaming is performed. This prevents problems that could arise from already-renamed files re-entering the enumeration of files.

  • Since only files (-File) are to be renamed, you needn't worry about file names that do not contain non-ASCII-range characters: Rename-Item quietly ignores attempts to rename files to the name they already have.

    • Unfortunately, this is not true for directories, where such an attempt causes an error; this unfortunate discrepancy, present as of PowerShell 7.2.4, is the subject of GitHub issue #14903.
  • Strictly speaking, .NET characters ([char] (System.Char) instances) are 16-bit Unicode code units (UTF-16), which can individually only represent a complete Unicode character in the so-called BMP (Basic Multilingual Plane), i.e. in the code-point range 0x0-0xFFFF. Unicode characters beyond that range, notably emoji such as 👍, require representation by two .NET [char] instances, so-called surrogate pairs. Therefore, the above solution replaces such characters with two _ characters, as the following example demonstrates:

      PS> 'A 👍!' -replace '\P{IsBasicLatin}', '_'
    
      A __!        # !! *two* '_' chars.
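
If a single _ per character is preferred, and directories should be renamed too, one possible variation is sketched below. It is not part of the original answer: the $pattern and $sample variable names are illustrative, and the approach assumes that matching a complete surrogate pair before falling back to \P{IsBasicLatin} is acceptable for your names.

```powershell
# Build the sample from a code point to avoid script-encoding issues;
# U+1F44D is an emoji outside the BMP, stored as a surrogate pair.
$sample = 'A ' + [char]::ConvertFromUtf32(0x1F44D) + '!'

# Try a full surrogate pair (high + low) first, then any single non-ASCII code unit.
$pattern = '[\uD800-\uDBFF][\uDC00-\uDFFF]|\P{IsBasicLatin}'

$result = $sample -replace $pattern, '_'
$result   # A _!  (one '_' for the emoji)

# Applied to files *and* directories in the current dir: the Where-Object filter
# passes on only items whose names contain a non-ASCII-range character, so no
# directory is ever "renamed" to the name it already has.
(Get-ChildItem | Where-Object Name -match '\P{IsBasicLatin}') |
  Rename-Item -NewName { $_.Name -replace $pattern, '_' } -WhatIf
```

As before, -WhatIf only previews the renames; remove it once the preview looks right.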
    

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1