Friendly Error Report on Parse using Nearley.js
Is it possible to render user-friendly parser errors using Nearley.js?
const parser = new nearley.Parser(bracketexpr_grammar);

parse(): void {
  try {
    parser.feed(this._sql);
    this._rawData = parser.results[0];
  } catch (error) {
    console.log(error);
    this._errors.push(error.offset);
  }
}
What I tried:
error.offset: only displays the line where the error happened (not what I want).
error: gives me a giant error message, for example:
Invalid syntax at line 1 col 1:
b4d455
^
Unexpected "b"
Instead of a "b", I was expecting to see one of the following:
A "#" based on:
csscolor → ● "#" hexdigit hexdigit hexdigit hexdigit hexdigit hexdigit
A "#" based on:
csscolor → ● "#" hexdigit hexdigit hexdigit
A "r" based on:
csscolor$string$1 → ● "r" "g" "b"
csscolor → ● csscolor$string$1 _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ ")"
A "h" based on:
csscolor$string$2 → ● "h" "s" "l"
csscolor → ● csscolor$string$2 _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ ")"
A "r" based on:
csscolor$string$3 → ● "r" "g" "b" "a"
csscolor → ● csscolor$string$3 _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ "," _ decimal _ ")"
A "h" based on:
csscolor$string$4 → ● "h" "s" "l" "a"
csscolor → ● csscolor$string$4 _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ "," _ decimal _ ")"
Instead of that giant error message, I want something simpler and cleaner, like this:
Invalid syntax at line 1 col 1:
b4d455
^
Unexpected "b"
Is it possible?
Solution 1:[1]
This extended error messaging comes from a change to Nearley titled "User-friendly Error Reporting Feature v1". It does not appear to be configurable at present. It might be worth opening an issue with Nearley if you are interested in requesting this feature.
I had the same problem when upgrading to a version with this new feature. As a workaround, it is possible to override the error reporting to use the prior implementation, although this is not documented and could break, especially when upgrading Nearley.
const parser = new Parser(...);
// Nearley error message has been extended to show all possible correct parses.
// As a workaround, previous implementation has been overwritten here instead.
// When Nearley allows turning off this extended error messaging, remove this workaround.
parser.reportError = function(token) {
  var message = this.lexer.formatError(token, 'invalid syntax') + '\n';
  message += 'Unexpected ' + (token.type ? token.type + ' token: ' : '');
  message += JSON.stringify(token.value !== undefined ? token.value : token) + '\n';
  return message;
};
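If overriding an undocumented method feels too fragile, a less invasive option is to trim the message after it has been thrown. The sketch below is only an illustration: the helper name shortenNearleyMessage is hypothetical (it is not part of Nearley), and it assumes the verbose section always begins with the word "Instead", as it does in the message shown in the question.

```javascript
// Hypothetical helper (not part of Nearley): keep only the part of the
// error message that precedes the "Instead ..." explanation.
function shortenNearleyMessage(message) {
  // Assumption: the verbose section starts with "Instead of" or "Instead,".
  const cut = message.search(/\s*Instead(?: of|,)/);
  // If the marker is absent, return the message unchanged.
  return cut === -1 ? message : message.slice(0, cut);
}

// Sample text shaped like the message shown in the question.
const sampleMessage = [
  'Invalid syntax at line 1 col 1:',
  '',
  '  b4d455',
  '  ^',
  'Unexpected "b"',
  'Instead of a "b", I was expecting to see one of the following:',
  '',
  'A "#" based on:',
  '    csscolor → ● "#" hexdigit hexdigit hexdigit',
].join('\n');

console.log(shortenNearleyMessage(sampleMessage));
```

In the question's code this would be called from the catch block, for example as shortenNearleyMessage(error.message). Because it depends on the wording of the message, it may need adjusting if a Nearley upgrade changes that wording.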
Solution 2:[2]
In version 2.20.1 of Nearley, the Error object has a token attribute that can be used to simplify the message. In the example below we use a RegExp to scan the message attribute of the error and add the expected tokens to the new message.
The RegExp is based on the observation that the Nearley error message, as you can see in the question above, repeats the pattern A "<something>" based on: many times (for named tokens it changes to A <something> token based on:).
const fs = require('fs');
const nearley = require('nearley');
// `grammar` is assumed to be the compiled Nearley grammar module, loaded elsewhere.

function parseFromFile(origin) {
  try {
    const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
    const source = fs.readFileSync(origin, 'utf8');
    parser.feed(source);
    const results = parser.results;
    if (results.length > 1) throw new Error(`Language Design Error: Ambiguous Grammar! Generated ${results.length} ASTs`);
    if (results.length === 0) {
      console.error("Unexpected end of Input error. Incomplete Egg program. Expected more input");
      process.exit(1);
    }
    return results[0];
  } catch (e) {
    // Errors without a token (I/O failures, the ambiguity error above) are rethrown unchanged.
    if (!e.token) throw e;
    const token = e.token;
    // Collect every `A <something> based on:` entry; match() returns null when there is none.
    const expected = (e.message.match(/(?<=A ).*(?= based on:)/g) || []).map(s => s.replace(/\s+token/i, ''));
    let newMessage = `Unexpected ${token.type} token "${token.value}" ` +
      `at line ${token.line} col ${token.col}.`;
    if (expected.length) newMessage += ` Tokens expected: ${[...new Set(expected)]}`;
    throw new Error(newMessage);
  }
}
When executed with an erroneous input the message is simplified to:
➜ egg-oop-parser-solution git:(master) ✗ bin/eggc.js test/errors/unexpected-token.egg
Unexpected LCB token "{" at line 1 col 2. Tokens expected: "(","[",".",EOF
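To see what the RegExp extracts on its own, the extraction step can be isolated into a small function and run against a sample message. This is a minimal sketch: the name expectedTokens is hypothetical, the first sample lines are taken from the message in the question, and the final "EOF token" line is made up here to illustrate the named-token form.

```javascript
// Hypothetical helper: list the distinct tokens the error message says were expected.
function expectedTokens(message) {
  // match() returns null when nothing matches, hence the fallback to [].
  const found = message.match(/(?<=A ).*(?= based on:)/g) || [];
  // Named tokens appear as `NAME token`; drop the trailing word, then de-duplicate.
  return [...new Set(found.map(s => s.replace(/\s+token/i, '')))];
}

const verboseMessage = [
  'Unexpected "b"',
  'A "#" based on:',
  '    csscolor → ● "#" hexdigit hexdigit hexdigit hexdigit hexdigit hexdigit',
  'A "#" based on:',
  '    csscolor → ● "#" hexdigit hexdigit hexdigit',
  'A "r" based on:',
  '    csscolor$string$1 → ● "r" "g" "b"',
  'A EOF token based on:', // invented line showing the named-token form
].join('\n');

console.log(expectedTokens(verboseMessage));
```

The Set removes the repeated "#" entry, which is why the simplified message lists each expected token only once.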
A related idea for error management is to add production rules to your grammar for specific error situations, each with an associated semantic action that deals with the error. For more info see this section.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | MatthewG |
Solution 2 |