C++: Clang v12 erroneous AST column numbering; non-ASCII string characters converted to two bytes

I have generated an AST for the following test code:

```cpp
#include <stdexcept>
int main()
{
throw std::runtime_error("ÎÄ¼þ");
throw std::runtime_error("1234");
}
```

My questions concern the four AST lines below. The first two belong to the first `throw` statement and the last two to the second:

```
|   |-ExprWithCleanups 0x217cd583800 <line:4:1, col:36> 'void'
|   |               `-StringLiteral 0x217cd5836e8 <col:26> 'const char [9]' lvalue "\303\216\303\204\302\274\303\276"

|   `-ExprWithCleanups 0x217cd5839f0 <line:5:1, col:32> 'void'
|                   `-StringLiteral 0x217cd5838d8 <col:26> 'const char [5]' lvalue "1234"
```

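The array types in the dump can be checked by counting bytes. The escapes Clang printed (`\303\216` and so on) are the UTF-8 encoding of U+00CE, U+00C4, U+00BC and U+00FE. UTF-8 uses two bytes for every code point from U+0080 to U+07FF, so values between 128 and 255 also take two bytes each. The sketch below (C++11 or later; the names `kUtf8`, `kOneBytePerChar` and `kDigits` are illustrative) reproduces the two sizes. For comparison, it also spells the same four values with hex escapes, each of which produces exactly one `char` regardless of how the source file is encoded.

```cpp
// The literal from the first throw, written with the octal escapes Clang
// printed so the result does not depend on how this snippet itself is saved.
// Read as UTF-8, these 8 bytes are U+00CE U+00C4 U+00BC U+00FE ("ÎÄ¼þ").
constexpr char kUtf8[] = "\303\216\303\204\302\274\303\276";

// The same four values as single bytes (their Latin-1 encoding), spelled with
// hex escapes: each escape yields exactly one char.
constexpr char kOneBytePerChar[] = "\xCE\xC4\xBC\xFE";

// The literal from the second throw: plain ASCII, one byte per character.
constexpr char kDigits[] = "1234";

// Array sizes include the terminating NUL.
static_assert(sizeof(kUtf8) == 9, "4 chars x 2 bytes + NUL, as in 'const char [9]'");
static_assert(sizeof(kOneBytePerChar) == 5, "4 chars x 1 byte + NUL");
static_assert(sizeof(kDigits) == 5, "4 ASCII chars + NUL, as in 'const char [5]'");
```

Whether single-byte Latin-1 values are acceptable depends on whoever consumes the string; they are not valid UTF-8 on their own.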
  1. The AST reports that the expression on line 4 ends at column 36, but that the one on line 5 ends at column 32. Counting characters, however, both `throw` expressions end at column 32 (the closing parenthesis). This makes it difficult to locate items in the source code from the AST's location information. Is this a bug?

  2. In the first case, each character of the string literal appears to occupy two bytes, even though none of their values exceeds 255 (decimal). Is there a way to force characters with values <= 255 to occupy only one byte?
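The two column numbers are consistent with Clang counting columns in bytes rather than characters: the four characters in the first literal take two bytes each in UTF-8, which adds four bytes, exactly the gap between 36 and 32. If that is the cause, a tool that maps AST locations back into the source text has to translate byte columns into character columns. Below is a minimal sketch, assuming the source line is valid UTF-8 and the columns are 1-based byte offsets; `byte_col_to_char_col` is a hypothetical helper, not a Clang API, and it needs C++17 for `std::string_view`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of Clang): converts a 1-based byte column,
// which is what the AST dump appears to report, into a 1-based code point
// column. Assumes `line` is valid UTF-8 and `byte_col` points at the first
// byte of a character (or just past the end of the line).
constexpr std::size_t byte_col_to_char_col(std::string_view line,
                                           std::size_t byte_col) {
    std::size_t chars = 0;
    // Walk the bytes that precede the target column.
    for (std::size_t i = 0; i + 1 < byte_col && i < line.size(); ++i) {
        // UTF-8 continuation bytes look like 10xxxxxx; every other byte
        // starts a new code point, so only those are counted.
        if ((static_cast<unsigned char>(line[i]) & 0xC0) != 0x80) {
            ++chars;
        }
    }
    return chars + 1;
}
```

Applied to the first `throw` line, byte column 36 becomes character column 32 (the closing parenthesis), matching the second statement. For the ASCII-only line the two numberings are identical. Counting code points is itself an assumption about what "column" should mean; an editor that counts UTF-16 units or display width would need a different rule.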



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
