Is there a better way to write this incredibly long regex, or perform this error check?

I was trying to find a way to check a non-standard custom menu file from a decade-old video game for errors and style problems. I'm editing it in Notepad++, and this was the best I could come up with.

The regex below returns any curly bracket that isn't followed by an EOL character or that is preceded by anything other than line start and 1-4 tabs. It works fine, but it seems like it could be a lot more elegant. Any returned bracket is incorrect unless it's the first or last in the file. More tabs are technically okay but highly unlikely.

    (?<!^\t)(?<!^\t\t)(?<!^\t\t\t)(?<!^\t\t\t\t)[{}]|[{}](?!\R)

Properly formatted:

    Menu "MenuName"
    {
        Menu "SubMenuName"
        {
            Option "OptionName" "Command"
            Option "OptionName" "Command"
        }
    }
    // This is a comment line
    // [ curly brackets in comment lines are made square so they don't get counted as balancing

All curly brackets should be on a separate line by themselves with nothing but preceding tabs. They should also be paired but I've got a plugin handling that.

Improperly formatted:

    Menu "MenuName"{
        Menu "SubMenuName"
        {
            Option "OptionName" "Command"
            Option "OptionName" "Command"   }
    }Menu "That bracket would be wrong since the line should end afterwards.
    {   //this would also be wrong
    // Nothing should ever follow a bracket except a End Of Line character.

Is there some better way to implement this search/check, considering Notepad++ uses Boost regex and doesn't allow variable-length lookbehinds? Also perhaps keeping in mind that I learned everything I know about regex last night.

The expression also returns the first bracket in the file (no preceding tab) and the last one (no EOL character after it), but I'm okay with that particular behavior.


The full content of a file I use as a template (it loads from a loose file in the data folder, completely as is):

//DO NOT DELETE, needs a line here for some reason.
Menu "MenuNameTEMPLATE"
{
    Title "TitleName"
    Option "OptionName" "Command"
    Divider
    LockedOption
    {
        DisplayName "OptionName"
        Command "Command"
        Icon "IconName"
        PowerReady "PowerIdentifiers"
    }
    LockedOption
    {
        DisplayName "OptionName"
        Command "Command"
        Icon "IconName"
        Badge "BadgeIdentifiers"
    }
    LockedOption
    {
        DisplayName "OptionName"
        Command "Command"
        Icon "IconName"
    }
    Menu "SubMenuName"
    {
        Title "TitleName"
        Option "OptionName" "Command"
        Option "OptionName" "Command"
    }
}


Solution 1:[1]

  • Ctrl+F
  • Find what: ^\h+[{}](?=\h*\R)(*SKIP)(*FAIL)|[{}]
  • CHECK Wrap around
  • CHECK Regular expression
  • Find All in Current Document

Explanation:

    ^                   # beginning of line
    \h+                 # 1 or more horizontal whitespace characters; use \t{1,4} if you only want tabs
    [{}]                # opening or closing brace
    (?=                 # positive lookahead, make sure what follows is:
        \h*                 # 0 or more horizontal whitespace characters
        \R                  # any kind of line break
    )                   # end of lookahead
    (*SKIP)(*FAIL)      # skip past what was matched so far and force this alternative to fail
|                   # OR
    [{}]                # any other opening or closing brace
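
Outside Notepad++, the same skip-then-report idea can be sketched in C++. This is only an illustration under stated assumptions: `std::regex` has neither `(*SKIP)(*FAIL)` nor `\h`, so the sketch swaps in a different technique, an alternation whose first branch consumes a well-formed brace line without capturing anything and whose second branch captures any other brace. The name `findStrayBraces` is hypothetical. The text is processed line by line, so a final brace with no trailing line break is accepted here, unlike in the Notepad++ search.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the answer above): returns 1-based
// (line, column) pairs for every brace that is not alone on its line
// behind leading spaces or tabs.
std::vector<std::pair<int, int>> findStrayBraces(const std::string& text) {
    // Branch 1 consumes a well-formed brace line and captures nothing.
    // Branch 2 captures any brace that branch 1 did not consume.
    static const std::regex pattern(R"(^[ \t]+[{}][ \t]*$|([{}]))");

    std::vector<std::pair<int, int>> hits;
    std::istringstream in(text);
    std::string line;
    for (int lineNo = 1; std::getline(in, line); ++lineNo) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // CRLF files
        for (std::sregex_iterator it(line.begin(), line.end(), pattern), end;
             it != end; ++it) {
            if ((*it)[1].matched) // only branch 2 sets group 1
                hits.emplace_back(lineNo, static_cast<int>(it->position(1)) + 1);
        }
    }
    return hits;
}
```

Given a first line of `Menu "A"{`, this would report line 1, column 9. A brace with no indentation at all is reported too, which mirrors the `\h+` requirement in the pattern above.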

Solution 2:[2]

I want to start by saying that regex is 100% the wrong tool for this; you want a custom parser that both validates your file and parses it into a model you can then use.

However, with the limitations imposed by your question, the following should do it:

^(?:[^{}]*|\t{1,4}[{}])$

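Rather than worry about look-arounds, simply match what you expect to find: a line either contains no braces at all, or consists of 1-4 tabs followed by a single brace. Any line the pattern does not match is malformed. See it in action here: https://regex101.com/r/nYNqHw/1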
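
If you ever run this check outside the editor, the whitelist approach carries over to C++ almost unchanged. The following is a minimal sketch, not a definitive implementation: `findMalformedLines` is a hypothetical name, and the sketch assumes the file is checked one line at a time with `std::regex_match`, which must consume the whole line, so the `^` and `$` anchors are dropped.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): returns the 1-based
// numbers of lines that fit neither allowed shape, i.e. "no braces at all"
// or "1-4 tabs followed by a single brace".
std::vector<int> findMalformedLines(const std::string& text) {
    // Same alternation as the pattern above, minus the anchors.
    static const std::regex allowed(R"([^{}]*|\t{1,4}[{}])");

    std::vector<int> bad;
    std::istringstream in(text);
    std::string line;
    for (int lineNo = 1; std::getline(in, line); ++lineNo) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // CRLF files
        if (!std::regex_match(line, allowed)) bad.push_back(lineNo);
    }
    return bad;
}
```

An empty result means every line fits one of the two shapes. As with the pattern above, a brace with no leading tab or with trailing whitespace is treated as malformed.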

Solution 3:[3]

Since you tagged this Boost, and others rightly remarked that regexes combined with editor facilities are the wrong tool here, here's a starting point for a proper parser using Boost Spirit.

We'll parse into some types:

struct Option { std::string name, command; };
struct Divider { };

struct LockedOption {
    struct Property { std::string key, value; };
    using Properties = std::vector<Property>;
    Properties properties; // DisplayName, Command, Icon, PowerReady, Badge...?
};

struct Menu;
using MenuItem = boost::variant<Option, Divider, LockedOption,
                                boost::recursive_wrapper<Menu>>;
struct Menu {
    std::string                  id;
    boost::optional<std::string> title;
    std::vector<MenuItem>        items;
};

Now, we can define a parser:

namespace Parser {
    using namespace boost::spirit::x3;
    rule<struct menu_rule, Menu> const menu{"menu"};

    auto const qstring = rule<void, std::string>{"quoted string"} = //
        lexeme['"' > *('"' >> char_('"') | ~char_('"')) > '"'];

    auto const option = rule<void, Option>{"option"} = //
        "Option" > qstring > qstring;

    auto property   = lexeme[+graph] > qstring;
    auto properties = rule<void, LockedOption::Properties>{"properties"} =
        *((!lit('}')) > property > eol);

    auto const lockedoption = rule<void, LockedOption>{"lockedoption"} = //
        "LockedOption" > eol                                             //
        > '{' > eol                                                      //
        > properties                                                     //
        > '}';

    auto divider = rule<void, Divider>{"divider"} = //
        lit("Divider") >> attr(Divider{});

    auto item = rule<void, MenuItem>{"menu|option|lockedoption|divider"} =
        menu | option | lockedoption | divider;

    auto title = "Title" > qstring;

    auto menu_def =                   //
        "Menu" > qstring > eol        //
        > '{' > eol                   //
        > -(title > eol)              //
        > *((!lit('}')) > item > eol) //
        > '}';

    auto ignore = blank | "//" >> *~char_("\r\n") >> (eol|eoi);

    BOOST_SPIRIT_DEFINE(menu)

    Menu parseMenu(std::string const& text) try {
        Menu result;
        parse(begin(text), end(text), skip(ignore)[expect[menu] > *eol > eoi],
              result);
        return result;
    } catch (expectation_failure<std::string::const_iterator> const& ef) {
        throw std::runtime_error(
            "At " + std::to_string(std::distance(begin(text), ef.where())) +
            ": Expected " + ef.which() + " (Got '" +
            std::string(ef.where(), std::find(ef.where(), end(text), '\n')) +
            "')");
    }
} // namespace Parser

All we need to use is the parseMenu function, which returns the parsed Menu. So if we wire up the samples from your question:

for (std::string const& sample :
     {
        // ... 
     })
{
    static int i = 0;
    fmt::print("----- {}\n", ++i);
    try {
        fmt::print("Parsed: {}\n", Parser::parseMenu(sample));
    } catch (std::exception const& e) {
        std::cout << "Parse failed: " << e.what() << "\n";
    }
}

We can get the output (see live demo below):

----- 1
Parsed: Menu "MenuName"
{
    Title ""
    Menu "SubMenuName"
{
    Title ""
    Option "OptionName" "Command"
    Option "OptionName" "Command"
}
}
----- 2
Parse failed: At 19: Expected eol (Got '{')
----- 3
Parsed: Menu "MenuNameTEMPLATE"
{
    Title "TitleName"
    Option "OptionName" "Command"
    Divider
    LockedOption
{
    DisplayName "OptionName"
    Command "Command"
    Icon "IconName"
    PowerReady "PowerIdentifiers"
}
    LockedOption
{
    DisplayName "OptionName"
    Command "Command"
    Icon "IconName"
    Badge "BadgeIdentifiers"
}
    LockedOption
{
    DisplayName "OptionName"
    Command "Command"
    Icon "IconName"
}
    Menu "SubMenuName"
{
    Title "TitleName"
    Option "OptionName" "Command"
    Option "OptionName" "Command"
}
}
----- 4
Parse failed: At 70: Expected menu|option|lockedoption|divider (Got '                    Road Rage')

LIVE DEMO

On Compiler Explorer: https://godbolt.org/z/sW3Y5z9nq

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <map>
#include <fmt/ranges.h>

struct Option { std::string name, command; };
struct Divider { };

struct LockedOption {
#if 1
    struct Property { std::string key, value; };
    using Properties = std::vector<Property>;
#else
    using Properties = std::map<std::string, std::string>;
#endif
    Properties properties; // DisplayName, Command, Icon, PowerReady, Badge...?
};

struct Menu;
using MenuItem = boost::variant<Option, Divider, LockedOption,
                                boost::recursive_wrapper<Menu>>;
struct Menu {
    std::string                  id;
    boost::optional<std::string> title;
    std::vector<MenuItem>        items;
};

#ifdef BOOST_SPIRIT_X3_DEBUG
    [[maybe_unused]] std::ostream& operator<<(std::ostream& os, Option)       { return os << "Option";       }
    [[maybe_unused]] std::ostream& operator<<(std::ostream& os, Divider)      { return os << "Divider";      }
    [[maybe_unused]] std::ostream& operator<<(std::ostream& os, LockedOption) { return os << "LockedOption"; }
    [[maybe_unused]] std::ostream& operator<<(std::ostream& os, Menu)         { return os << "Menu";         }
#endif

struct MenuItemFormatter : fmt::formatter<std::string> {
    template <typename... Ts>
    auto format(boost::variant<Ts...> const& var, auto& ctx) const {
        return boost::apply_visitor(
            [&](auto const& el) {
                return format_to(ctx.out(), fmt::runtime("{}"), el);
            }, var);
    }

    auto format(LockedOption const& lo, auto& ctx) const {
        auto out = fmt::format_to(ctx.out(), "LockedOption\n{{\n");
        for (auto const& [k, v] : lo.properties)
            out = fmt::format_to(out, "    {} \"{}\"\n", k, v);

        return fmt::format_to(out, "}}");
    }

    auto format(Divider const&, auto& ctx) const {
        return fmt::format_to(ctx.out(), "Divider");
    }

    auto format(Option const& o, auto& ctx) const {
        return fmt::format_to(ctx.out(), "Option \"{}\" \"{}\"", o.name, o.command);
    }

    auto format(Menu const& m, auto& ctx) const {
        return fmt::format_to(
            ctx.out(), "Menu \"{}\"\n{{\n    Title \"{}\"\n    {}\n}}", m.id,
            m.title.value_or(""), fmt::join(m.items, "\n    "));
    }
};

template <> struct fmt::formatter<MenuItem>     : MenuItemFormatter{};
template <> struct fmt::formatter<LockedOption> : MenuItemFormatter{};
template <> struct fmt::formatter<Divider>      : MenuItemFormatter{};
template <> struct fmt::formatter<Option>       : MenuItemFormatter{};
template <> struct fmt::formatter<Menu>         : MenuItemFormatter{};

BOOST_FUSION_ADAPT_STRUCT(Option, name, command)
BOOST_FUSION_ADAPT_STRUCT(LockedOption, properties)
BOOST_FUSION_ADAPT_STRUCT(LockedOption::Property, key, value)
BOOST_FUSION_ADAPT_STRUCT(Menu, id, title, items)

    namespace Parser {
        using namespace boost::spirit::x3;
        rule<struct menu_rule, Menu> const menu{"menu"};

        auto const qstring = rule<void, std::string>{"quoted string"} = //
            lexeme['"' > *('"' >> char_('"') | ~char_('"')) > '"'];

        auto const option = rule<void, Option>{"option"} = //
            "Option" > qstring > qstring;

        auto property   = lexeme[+graph] > qstring;
        auto properties = rule<void, LockedOption::Properties>{"properties"} =
            *((!lit('}')) > property > eol);

        auto const lockedoption = rule<void, LockedOption>{"lockedoption"} = //
            "LockedOption" > eol                                             //
            > '{' > eol                                                      //
            > properties                                                     //
            > '}';

        auto divider = rule<void, Divider>{"divider"} = //
            lit("Divider") >> attr(Divider{});

        auto item = rule<void, MenuItem>{"menu|option|lockedoption|divider"} =
            menu | option | lockedoption | divider;

        auto title = "Title" > qstring;

        auto menu_def =                   //
            "Menu" > qstring > eol        //
            > '{' > eol                   //
            > -(title > eol)              //
            > *((!lit('}')) > item > eol) //
            > '}';

        auto ignore = blank | "//" >> *~char_("\r\n") >> (eol|eoi);

        BOOST_SPIRIT_DEFINE(menu)

        Menu parseMenu(std::string const& text) try {
            Menu result;
            parse(begin(text), end(text), skip(ignore)[expect[menu] > *eol > eoi],
                  result);
            return result;
        } catch (expectation_failure<std::string::const_iterator> const& ef) {
            throw std::runtime_error(
                "At " + std::to_string(std::distance(begin(text), ef.where())) +
                ": Expected " + ef.which() + " (Got '" +
                std::string(ef.where(), std::find(ef.where(), end(text), '\n')) +
                "')");
        }
    } // namespace Parser

int main() {
    for (std::string const& sample :
         {
             R"(    Menu "MenuName"
                        {
                            Menu "SubMenuName"
                            {
                                Option "OptionName" "Command"
                                Option "OptionName" "Command"
                            }
                        }
                    // This is a comment line
                    // // [ curly brackets in comment lines are made square so they don't get counted as balancing)",
             R"(    Menu "MenuName"{
                            Menu "SubMenuName"
                            {
                                Option "OptionName" "Command"
                                Option "OptionName" "Command"   }
                        }Menu "That bracket would be wrong since the line should end afterwards.
                        {   //this would also be wrong
                    // Nothing should ever follow a bracket except a End Of Line character.)",
             R"(//DO NOT DELETE, needs a line here for some reason.
                    Menu "MenuNameTEMPLATE"
                    {
                        Title "TitleName"
                        Option "OptionName" "Command"
                        Divider
                        LockedOption
                        {
                            DisplayName "OptionName"
                            Command "Command"
                            Icon "IconName"
                            PowerReady "PowerIdentifiers"
                        }
                        LockedOption
                        {
                            DisplayName "OptionName"
                            Command "Command"
                            Icon "IconName"
                            Badge "BadgeIdentifiers"
                        }
                        LockedOption
                        {
                            DisplayName "OptionName"
                            Command "Command"
                            Icon "IconName"
                        }
                        Menu "SubMenuName"
                        {
                            Title "TitleName"
                            Option "OptionName" "Command"
                            Option "OptionName" "Command"
                        }
             })",
             R"(Menu "Not So Good"
                {
                    Title "Uhoh"
                    Road Rage
                }
             )",
         }) //
    {
        static int i = 0;
        fmt::print("----- {}\n", ++i);
        try {
            fmt::print("Parsed: {}\n", Parser::parseMenu(sample));
        } catch (std::exception const& e) {
            std::cout << "Parse failed: " << e.what() << "\n";
        }
    }
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Blindy
Solution 3