Regex to exclude a string between two capture groups

I am trying to capture the username and channel ID of each user from an API response string using regex.

Unfortunately, I cannot use a JSON parser on this data, so I (a beginner) am stuck with regex.

My solution finds the username and captures it, then finds the channel ID and captures that as well. Because the quantifiers are non-greedy, it finds the shortest possible match and produces a separate match for each user when multiple people are connected.

But a problem arises when multiple users of the server are online but some are not connected to a channel. The regex then finds the first username and consumes everything in between until it reaches the channel ID of the next user. The result is a valid channel ID paired with the wrong user.

At one point I excluded the character "{" from the in-between text, because it separates different users, and that worked. Unfortunately, "{" can also occur inside a user's parameters, so some users were no longer captured.

Now I am trying to ban the string "id" from the text allowed between the two capture groups instead.

But I can't get it to work. Do you have any suggestions?

This example captures User 1 and User 3 correctly but pairs the username User 2 with the channel ID of Bot 1. I don't know much about regex flavors, but the test site said PCRE (PHP), and so far that has worked in my program. I shortened the avatar links and the beginning of the string with "...".

Regular Expression:

username": "((?!Bot 1).*?)".*?channel_id": "([0-9]*?)"

String snippet:

"members": [{"id": "0", "username": "User 1", "discriminator": "0000", "avatar": null, "status": "online", "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "0123456789", "avatar_url": "https://..."}, {"id": "1", "username": "User 2", "discriminator": "0000", "avatar": null, "status": "online", "game": {"name": "pls help"}, "avatar_url": "https://..."}, {"id": "2", "username": "Bot 1", "discriminator": "0000", "avatar": null, "status": "online", "game": {"name": "music | ;;help"}, "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "1234567890", "avatar_url": "https://..."}, {"id": "3", "username": "User 3", "discriminator": "0000", "avatar": null, "status": "online", "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "2345678901", "avatar_url": "https://..."}], "presence_count": 4}

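The mismatch can be reproduced outside the test site. The sketch below uses Python's re module as a stand-in for PCRE (an assumption; the constructs in this pattern are supported by both) and a trimmed copy of the snippet with some fields dropped for brevity.

```python
import re

# Trimmed copy of the snippet above; fields that do not affect the match are dropped.
SAMPLE = (
    '"members": ['
    '{"id": "0", "username": "User 1", "status": "online", '
    '"channel_id": "0123456789", "avatar_url": "https://..."}, '
    '{"id": "1", "username": "User 2", "status": "online", '
    '"game": {"name": "pls help"}, "avatar_url": "https://..."}, '
    '{"id": "2", "username": "Bot 1", "status": "online", '
    '"game": {"name": "music | ;;help"}, '
    '"channel_id": "1234567890", "avatar_url": "https://..."}, '
    '{"id": "3", "username": "User 3", "status": "online", '
    '"channel_id": "2345678901", "avatar_url": "https://..."}'
    '], "presence_count": 4}'
)

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r'username": "((?!Bot 1).*?)".*?channel_id": "([0-9]*?)"')

pairs = ORIGINAL.findall(SAMPLE)
for username, channel_id in pairs:
    print(username, channel_id)
```

User 2 has no channel_id, so the lazy .*? keeps expanding across the object boundary until it reaches the channel_id that belongs to Bot 1.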


Solution 1:[1]

Don't allow {"id" between the username and the channel ID:

username": "((?!Bot 1)[^"]*)"(?:(?!\{"id").)*channel_id": "(\d+)"

See live demo.

Username and channel ID are captured in groups 1 and 2.

A few other minor adjustments are included: [^"]* instead of .*? for the username, and \d+ instead of [0-9]*? for the channel ID.
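As a rough check of this pattern, here is a minimal sketch in Python. Python's re module is used in place of PCRE (an assumption; the lookaheads used here are supported by both), and the sample is a trimmed copy of the string from the question. The function name find_connected is made up for this example.

```python
import re

# Trimmed copy of the question's snippet; fields that do not affect the match are dropped.
SAMPLE = (
    '"members": ['
    '{"id": "0", "username": "User 1", "status": "online", '
    '"channel_id": "0123456789", "avatar_url": "https://..."}, '
    '{"id": "1", "username": "User 2", "status": "online", '
    '"game": {"name": "pls help"}, "avatar_url": "https://..."}, '
    '{"id": "2", "username": "Bot 1", "status": "online", '
    '"game": {"name": "music | ;;help"}, '
    '"channel_id": "1234567890", "avatar_url": "https://..."}, '
    '{"id": "3", "username": "User 3", "status": "online", '
    '"channel_id": "2345678901", "avatar_url": "https://..."}'
    '], "presence_count": 4}'
)

# (?:(?!\{"id").)* consumes one character at a time, but only while the
# text ahead does not start a new member object with {"id".
GUARDED = re.compile(
    r'username": "((?!Bot 1)[^"]*)"(?:(?!\{"id").)*channel_id": "(\d+)"'
)


def find_connected(text):
    """Return (username, channel_id) pairs for members that have a channel_id."""
    return GUARDED.findall(text)


for username, channel_id in find_connected(SAMPLE):
    print(username, channel_id)
```

User 2 is skipped because no channel_id appears before the next {"id", and Bot 1 is skipped by the (?!Bot 1) lookahead carried over from the question.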

Solution 2:[2]

As others suggested, plan A should be to parse the object. For plan B, your regex might look like this:

"username": "([^"]+)"

It gets a bit trickier if you allow escapes, for example if a username is "User says \"hi\" always". In that case you could use the unrolled-loop pattern (described in "Unroll Loop, when to use"), which has the general form normal*(?:special normal*)*.

Here the normal case is [^"\\] (neither a double quote nor the escape character), and the special case is \\" (an escaped double quote).
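A minimal sketch of that idea in Python (an assumption; the answer does not name a language). It uses \\. as the special case, which is a slight generalization of \\" so that other escapes such as \\\\ are also stepped over. The sample object and its id are made up for illustration.

```python
import re

# Unrolled loop: normal* (special normal*)*
#   normal  = [^"\\]  any character except a double quote or a backslash
#   special = \\.     a backslash followed by any single character
USERNAME = re.compile(r'"username": "([^"\\]*(?:\\.[^"\\]*)*)"')

# The simple pattern from above, for comparison.
NAIVE = re.compile(r'"username": "([^"]+)"')

# Made-up member object whose username contains escaped quotes.
text = r'{"id": "9", "username": "User says \"hi\" always", "status": "online"}'

naive_name = NAIVE.search(text).group(1)    # stops at the first escaped quote
full_name = USERNAME.search(text).group(1)  # keeps going past \" pairs

print(naive_name)
print(full_name)
```

The captured value still contains the backslashes; unescaping it is a separate step.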

To add the channel_id, and assuming each object starts with {"id": ..., you could then extend the pattern accordingly (the original answer showed it as an image, which is not reproduced here).
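One possible way to combine the pieces is sketched below. This is a reconstruction under stated assumptions, not the pattern from the original answer: it pairs the unrolled-loop username with a guard that refuses to cross into the next {"id" object, and it uses Python's re module with made-up sample data.

```python
import re

# Hypothetical combined pattern (not taken from the original answer).
PAIR = re.compile(
    r'"username": "([^"\\]*(?:\\.[^"\\]*)*)"'  # username, escaped quotes allowed
    r'(?:(?!\{"id").)*?'                       # stay inside the current object
    r'"channel_id": "(\d+)"'                   # channel ID of the same object
)

# Made-up sample: one escaped username, one member without a channel_id.
text = (
    r'"members": [{"id": "0", "username": "User says \"hi\" always", '
    r'"channel_id": "0123456789"}, '
    r'{"id": "1", "username": "User 2", "game": {"name": "pls help"}}, '
    r'{"id": "3", "username": "User 3", "channel_id": "2345678901"}]'
)

pairs = PAIR.findall(text)
for username, channel_id in pairs:
    print(username, channel_id)
```

The guard relies on {"id" appearing only at the start of a member object, which holds for the sample in the question but is not guaranteed in general.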

Hope it helps. Needless to say, this is pretty overkill! I'd simplify it a bit, or rather get rid of the regex entirely if you can. Good luck!
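For comparison, this is roughly what plan A looks like if a parser ever becomes available. The sketch uses Python's standard json module and assumes the fragment from the question is wrapped in braces to form a complete object; the fields are trimmed, and the Bot 1 filter mirrors the lookahead in the question.

```python
import json

# Trimmed payload; the outer braces are an assumption, since the question
# shows only a fragment starting at "members".
PAYLOAD = (
    '{"members": ['
    '{"id": "0", "username": "User 1", "avatar": null, "deaf": false, '
    '"channel_id": "0123456789"}, '
    '{"id": "1", "username": "User 2", "avatar": null, '
    '"game": {"name": "pls help"}}, '
    '{"id": "2", "username": "Bot 1", "avatar": null, '
    '"game": {"name": "music | ;;help"}, "channel_id": "1234567890"}, '
    '{"id": "3", "username": "User 3", "avatar": null, '
    '"channel_id": "2345678901"}'
    '], "presence_count": 4}'
)

members = json.loads(PAYLOAD)["members"]

# Keep only members that are in a channel, and skip the bot.
connected = [
    (m["username"], m["channel_id"])
    for m in members
    if "channel_id" in m and m["username"] != "Bot 1"
]

for username, channel_id in connected:
    print(username, channel_id)
```

Because each member is a separate dictionary, a missing channel_id cannot be confused with the next member's value.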

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2