How to regex Chinese characters in C#?
I am trying to use a regex in C# to match Chinese characters.
\p{Han}+
However, the code fails at run time with the error: Unknown property 'Han'.
Solution 1:[1]
In theory, a Unicode script property in the regular expression would meet this requirement.
However, the .NET regex engine used by C# does not support Unicode scripts such as Han (Unicode general categories are fine).
The Regex constructor throws an ArgumentException like this:
[System.ArgumentException: parsing "\p{Han}+" - Unknown property 'Han'.]
at System.Text.RegularExpressions.RegexCharClass.SetFromProperty(String capname, Boolean invert, String pattern)
at System.Text.RegularExpressions.RegexCharClass.AddCategoryFromName(String categoryName, Boolean invert, Boolean caseInsensitive, String pattern)
at System.Text.RegularExpressions.RegexParser.ScanBackslash()
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, TimeSpan matchTimeout, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern)
Further details are referenced here.
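The difference between a script name and a general category can be checked directly. The following is a minimal sketch, assuming C# 9 or later (top-level statements); the helper name `IsPatternSupported` is hypothetical and only wraps the constructor call that produced the stack trace above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true if the Regex constructor accepts the pattern.
static bool IsPatternSupported(string pattern)
{
    try
    {
        _ = new Regex(pattern);
        return true;
    }
    catch (ArgumentException)
    {
        // Thrown at construction time for an unknown \p{...} name.
        return false;
    }
}

Console.WriteLine(IsPatternSupported(@"\p{Han}+")); // False: Han is a script name
Console.WriteLine(IsPatternSupported(@"\p{Lo}+"));  // True: Lo is a general category
```

Note that `\p{Lo}` ("Letter, other") is accepted but is broader than Chinese: it also covers letters from many other scripts, so it is not a drop-in replacement for `\p{Han}`.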
Solution 2:[2]
In .NET, you need to prepend Is to Unicode block names.
Han is a Unicode script rather than a block, so I don't know which block corresponds to it, or whether it's supported, but you can try:
\p{IsHan}+
See MSDN for the list of supported categories and named blocks.
This works for other alphabets. See an example for Greek and Latin.
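As a small illustration of the Is prefix with a named block, here is a sketch using the Greek block. It assumes C# 9 or later (top-level statements); the helper name `FirstMatch` and the sample string are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first run of text matching the pattern,
// or an empty string if nothing matches.
static string FirstMatch(string input, string pattern)
{
    return Regex.Match(input, pattern).Value;
}

// \p{IsGreek} is the named block U+0370 to U+03FF.
string greekRun = FirstMatch("alpha αβγ beta", @"\p{IsGreek}+");
Console.WriteLine(greekRun); // the matched value is "αβγ"
```

The same pattern shape applies to any block name the engine recognizes; an unrecognized name fails with the same "Unknown property" ArgumentException shown in the question.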
Solution 3:[3]
On the .NET platform, this regex matches Chinese characters:
\p{IsCJKUnifiedIdeographs}+
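A minimal sketch of using this block pattern to pull Chinese runs out of mixed text follows. It assumes C# 9 or later (top-level statements); the function name `FindChineseRuns` and the sample input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every run of characters from the
// CJK Unified Ideographs block (U+4E00 to U+9FFF).
static List<string> FindChineseRuns(string input)
{
    var runs = new List<string>();
    foreach (Match m in Regex.Matches(input, @"\p{IsCJKUnifiedIdeographs}+"))
    {
        runs.Add(m.Value);
    }
    return runs;
}

// The returned runs are "世界" and "你好"; the Latin text is skipped.
foreach (string run in FindChineseRuns("Hello 世界, 你好 world!"))
{
    Console.WriteLine(run);
}
```

This block covers the common ideographs only. Rarer characters that live in other CJK blocks (for example the Extension A block) are not matched by this pattern and would need their own block names added to a character class.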
Solution 4:[4]
This might work:
\p{L}
That allows letters from any alphabet. If you want only Chinese characters (no English ones), I may need more time.
I am also assuming you are using Regex correctly. Test this code with \p{Han}+ to see whether it still fails:
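using System;
using System.Text.RegularExpressions;

// Pattern from the question. The Regex constructor throws an
// ArgumentException if it does not recognize the property name.
Regex regex = new Regex(@"\p{Han}+");
Match match = regex.Match("YourString");
if (match.Success)
{
    Console.WriteLine("MATCH VALUE: " + match.Value);
}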
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | marc_s |
| Solution 2 | alelom |
| Solution 3 | H.M Keh |
| Solution 4 | Jacob Cummins |
