Nom parser that skips escaped terminator characters

I've checked the other SO answers for nom parser combinator questions, but this one doesn't seem to have been asked yet.

I am trying to parse delimited regular expressions. They are always delimited as /...../, possibly with modifiers at the end (which are out of scope for the data I need to parse right now). However, if there is an escaped \/ in the middle of the string, my parser stops prematurely at the first /, even when it is preceded by a \.

I have this parser:

use nom::bytes::complete::{tag, take_until};
use nom::{combinator::map_res, sequence::tuple, IResult};
use regex::Regex;

pub fn regex(input: &str) -> IResult<&str, Regex> {
    map_res(
        tuple((tag("/"), take_until("/"), tag("/"))),
        |(_, re, _)| Regex::new(re),
    )(input)
}

Naturally, take_until stops at the first / without noticing that the previous character was a \. I've looked at peek, recognize, map and a whole bunch of other things, but I keep coming up short. I feel like what I want is take_until("/") with some kind of escaping awareness. In any case, I am using map_res to hand off to Rust's regex crate to do the actual parsing.

I also tried something like this using the escaped combinator, but the examples are somewhat unclear and I couldn't make it work:

pub fn regex(input: &str) -> IResult<&str, Regex> {
    map_res(
        tuple((
            tag("/"),
            escaped(many1(anychar), '\\', one_of(r"/")),
            tag("/"),
        )),
        |(_, re, _)| {
            println!("mapres {}", re);
            Regex::new(re)
        },
    )(input)
}

My test cases are as follows (the .unwrap().as_str() is just to keep the example small, since regex::Regex doesn't implement PartialEq):

#[cfg(test)]
mod tests {
    use super::regex;
    use super::Regex;
    #[test]
    fn test_parse_regex_simple() {
        assert_eq!(
            Regex::new(r#"hello world"#).unwrap().as_str(),
            regex("/hello world/").unwrap().1.as_str()
        );
    }
    #[test]
    fn test_parse_regex_with_escaped_forwardslash() {
        assert_eq!(
            Regex::new(r#"hello /world"#).unwrap().as_str(),
            regex(r"/hello \/world/").unwrap().1.as_str(),
        );
    }
}


Solution 1:[1]

The accepted answer from Chayim Friedman is correct. However, I was able to extend it to also handle \w, \d and other such escape sequences. It is simply an extension of Chayim's idea in the escaped_transform version:


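use nom::branch::alt;
use nom::bytes::complete::{escaped_transform, tag};
use nom::character::complete::none_of;
use nom::combinator::{map_res, value};
use nom::sequence::delimited;
use nom::IResult;
use regex::Regex;

pub fn regex(input: &str) -> IResult<&str, Regex> {
    map_res(
        delimited(
            tag("/"),
            escaped_transform(
                // Ordinary characters: anything except a backslash or the delimiter.
                none_of("\\/"),
                '\\',
                // What each escaped character expands to in the resulting String.
                alt((
                    // \/ becomes a plain /, since the delimiter no longer needs protecting.
                    value(r"/", tag("/")),
                    // The remaining escapes are re-emitted unchanged for the regex crate.
                    value(r"\d", tag("d")),
                    value(r"\W", tag("W")),
                    value(r"\w", tag("w")),
                    value(r"\b", tag("b")),
                    value(r"\B", tag("B")),
                )),
            ),
            tag("/"),
        ),
        // escaped_transform yields an owned String, so borrow it here.
        |re| Regex::new(&re),
    )(input)
}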

Note that this list is also incomplete; https://docs.rs/regex/1.5.6/regex/#escape-sequences gives the complete set of escapes, and https://github.com/Geal/nom/blob/main/examples/string.rs gives a more detailed explanation of how to handle \u{....}-style escape sequences.
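Since the list of escapes above has to be kept in sync with the regex crate, one alternative is to unescape only \/ and pass every other backslash sequence through untouched. The following is a minimal, dependency-free sketch of that scanning logic, not the nom-based approach from the accepted answer. The name delimited_regex_body is hypothetical, and the (remaining input, output) return order is chosen only to mirror the tuple inside nom's IResult.

```rust
/// Hypothetical helper, not part of nom: parses `/body/` and returns
/// `(remaining input, unescaped body)`.
/// Returns `None` if the opening or the closing unescaped `/` is missing,
/// or if the input ends right after a backslash.
pub fn delimited_regex_body(input: &str) -> Option<(&str, String)> {
    // The input must start with the opening delimiter.
    let rest = input.strip_prefix('/')?;
    let mut body = String::new();
    let mut chars = rest.char_indices();

    while let Some((i, c)) = chars.next() {
        match c {
            // An unescaped `/` closes the regex; everything after it is left over.
            '/' => return Some((&rest[i + 1..], body)),
            // A backslash always consumes the character that follows it.
            '\\' => match chars.next()? {
                // `\/` only protects the delimiter, so the backslash is dropped.
                (_, '/') => body.push('/'),
                // Any other escape (`\d`, `\w`, `\\`, ...) is kept as written.
                (_, other) => {
                    body.push('\\');
                    body.push(other);
                }
            },
            _ => body.push(c),
        }
    }

    // Ran out of input without seeing the closing delimiter.
    None
}
```

For the input r"/hello \/world/" this yields the body hello /world with nothing left over, which could then be handed to Regex::new just as the parsers above do. The trade-off is that invalid escapes are no longer rejected during parsing; they are only caught when the regex crate compiles the pattern.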

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Lee Hambley