How to regex Chinese characters in C#?

I am trying to use a regex in C# to match Chinese characters.

\p{Han}+

However, C# throws an exception at run time, saying: Unknown property 'Han'.



Solution 1:[1]

In theory, a Unicode script property such as \p{Han} in the regular expression does exactly what you want.

However, the .NET regex engine does not support Unicode scripts. It only supports Unicode general categories (and named blocks).

The pattern therefore throws an ArgumentException like this:

[System.ArgumentException: parsing "\p{Han}+" - Unknown property 'Han'.]

at System.Text.RegularExpressions.RegexCharClass.SetFromProperty(String capname, Boolean invert, String pattern)
at System.Text.RegularExpressions.RegexCharClass.AddCategoryFromName(String categoryName, Boolean invert, Boolean caseInsensitive, String pattern)
at System.Text.RegularExpressions.RegexParser.ScanBackslash()
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, TimeSpan matchTimeout, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern)

Detailed information is referenced here.
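To see the difference between a script name and a general category, here is a minimal sketch that only checks whether the regex parser accepts a pattern. The class and method names (PropertySupportDemo, IsValidPattern) are made up for illustration, and \p{Lo} ("Letter, other") is just one example of a supported category.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PropertySupportDemo
{
    // Hypothetical helper: returns true if the .NET regex parser accepts the pattern.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown for unknown property names, such as a Unicode script name.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPattern(@"\p{Han}+")); // script name: rejected
        Console.WriteLine(IsValidPattern(@"\p{Lo}+"));  // general category: accepted
    }
}
```

Note that \p{Lo} is much broader than Chinese: it also matches letters from many other scripts that have no upper/lower case.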

Solution 2:[2]

In .NET, you need to prepend Is to Unicode block names.

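Han is a Unicode script rather than a block, so there is no IsHan name to use. The block that covers the most common Chinese characters is CJK Unified Ideographs, so you can try:

\p{IsCJKUnifiedIdeographs}+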

See MSDN for the list of supported named blocks.

This works for other alphabets. See an example for Greek and Latin.
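As a rough sketch of the named-block approach for Chinese text: .NET has a block named IsCJKUnifiedIdeographs (U+4E00 to U+9FFF), which covers the most common Chinese characters. The class and method names below are made up for illustration, and the assumption is that this single block is enough for your input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedBlockDemo
{
    // Named block for U+4E00..U+9FFF. Rarer ideographs live in other blocks
    // (for example IsCJKUnifiedIdeographsExtensionA) and are not matched here.
    private static readonly Regex Cjk = new Regex(@"\p{IsCJKUnifiedIdeographs}+");

    // Hypothetical helper: returns the first run of CJK ideographs, or "" if none.
    public static string FirstCjkRun(string input)
    {
        Match m = Cjk.Match(input);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Finds the two-character run between the English words.
        Console.WriteLine(FirstCjkRun("Hello 你好 world"));
    }
}
```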

Solution 3:[3]

Solution 4:[4]

This might work:

\p{L}

That matches letters from any alphabet. If you want only Chinese characters (no English ones), I may need more time.

Also, I am assuming you are using Regex correctly. Test this code with \p{Han}+ to see whether it still fails.

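        using System;
        using System.Text.RegularExpressions;

        // Pattern from the question. In .NET the constructor throws ArgumentException here,
        // because \p{Han} is not a supported property name.
        Regex regex = new Regex(@"\p{Han}+");
        Match match = regex.Match("YourString");
        if (match.Success)
        {
            Console.WriteLine("MATCH VALUE: " + match.Value);
        }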

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 marc_s
Solution 2 alelom
Solution 3 H.M Keh
Solution 4 Jacob Cummins