How to replace every character with an ASCII code greater than 128 with "_" in all file names in a folder, using PowerShell
An example file name is PO 2171 Cresco REVISED.pdf. There are many such files; their names are not standardized, and the position of the spaces is not fixed. The "spaces" in the middle are actually characters with ASCII codes greater than 128, and I want to replace every character with an ASCII code greater than 128 with "_" in one go.
I haven't learned PowerShell yet. Thank you very much.
Solution 1:[1]
Theo's answer is effective, but there's a simpler, more direct solution, using the .NET regex Unicode code block \p{IsBasicLatin}, which directly matches any ASCII-range Unicode character (all .NET strings are Unicode strings, internally composed of UTF-16 code units).
Its negation, \P{IsBasicLatin} (note the uppercase P), matches any character outside the ASCII range, so that you can use the following to replace all non-ASCII-range characters with _, with the help of the regex-based -replace operator:
(Get-ChildItem -File) | # Get all files in the current dir.
Rename-Item -NewName { $_.Name -replace '\P{IsBasicLatin}', '_' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
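To see what the -replace operation does without touching any files, it can be tried on a plain string first. The following is a minimal sketch; it assumes that the odd "spaces" in the example name are NO-BREAK SPACE characters (U+00A0), which the question does not confirm, and the variable names are purely illustrative.

```powershell
# Assumption: the non-ASCII "spaces" are NO-BREAK SPACE (U+00A0).
$nbsp = [char] 0x00A0

# Build a sample name that looks like the one in the question.
$name = "PO${nbsp}2171${nbsp}Cresco${nbsp}REVISED.pdf"

# Replace every character outside the ASCII range (U+0000-U+007F) with '_'.
$newName = $name -replace '\P{IsBasicLatin}', '_'

$newName   # PO_2171_Cresco_REVISED.pdf
```

Ordinary ASCII spaces (U+0020) are inside the BasicLatin block, so they would be left untouched.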
Note:

- Enclosing the Get-ChildItem call in (...) ensures that all matching files are collected first, before renaming is performed. This prevents problems that could arise from already-renamed files re-entering the enumeration of files.

- Since only files (-File) are to be renamed, you needn't worry about file names that do not contain non-ASCII-range characters: Rename-Item quietly ignores attempts to rename files to the name they already have.
  - Unfortunately, this is not true for directories, where such an attempt causes an error; this unfortunate discrepancy, present as of PowerShell 7.2.4, is the subject of GitHub issue #14903.

- Strictly speaking, .NET characters ([char] (System.Char) instances) are 16-bit Unicode code units (UTF-16), which can individually only represent a complete Unicode character in the so-called BMP (Basic Multilingual Plane), i.e. in the code-point range 0x0-0xFFFF. Unicode characters beyond that range, notably emoji such as 👍, require representation by two .NET [char] instances, so-called surrogate pairs. Therefore, the above solution replaces such characters with two _ characters, as the following example demonstrates:

    PS> 'A 👍!' -replace '\P{IsBasicLatin}', '_'
    A __! # !! *two* '_' chars.
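If a single _ per character outside the BMP is preferred, one possible variation (not part of the original answer) is to match a complete surrogate pair first and only then fall back to \P{IsBasicLatin}. The sketch below assumes that well-formed pairs are all that needs handling; an unpaired surrogate would still be replaced on its own by the fallback.

```powershell
# Build the sample string from a code point, so the script itself stays ASCII-only.
$thumbsUp = [char]::ConvertFromUtf32(0x1F44D)   # an emoji outside the BMP
$sample = "A ${thumbsUp}!"

# First alternative: a high surrogate followed by a low surrogate (one character).
# Second alternative: any other single non-ASCII-range code unit.
$pattern = '[\uD800-\uDBFF][\uDC00-\uDFFF]|\P{IsBasicLatin}'

$result = $sample -replace $pattern, '_'

$result   # A _!
```

The same pattern can be used in place of '\P{IsBasicLatin}' in the Rename-Item command above.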
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
