How to combine non-adjacent groups without using branch resets or capturing inside lookarounds?

Suppose I have the following text:

# Should match

- [ ] Some task
- [ ] Some task | [[link]]
- [ ] Some task ^abcdef
- [ ] Some task | [[link]] ^abcdef
- [ ] ! Some task
- [ ] ! Some task | [[link]]
- [ ] ! Some task ^abcdef
- [ ] ! Some task | [[link]] ^abcdef
- [ ] Task one | [ ] ! Task two | [ ] Task three ^abcdef

|     Tracker | Task                    | Backlog  |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ] Task item           | [[linK]] |
| 00:00-00:00 | [ ] Task item ^abcdef   | [[link]] |
| 00:00-00:00 | [ ] [[task-item]]       | [[link]] |
| 00:00-00:00 | [ ] ! Task item         | [[linK]] |
| 00:00-00:00 | [ ] ! Task item ^abcdef | [[link]] |
| 00:00-00:00 | [ ] ! [[task-item]]     | [[link]] |

# Should not match

- [ ] 
- [ ]
- [ ]  
- [ ] ! 
- [ ] !
- [ ] !  

|     Tracker | Task                    | Backlog  |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ]                     | [[linK]] |
| 00:00-00:00 | [ ] !                   | [[linK]] |

I am interested in several capture groups as follows:

  • group $1:

    • match: [ and ]
  • group $2:

    • match: any single character (e.g., \s) between [ and ]
  • group $3:

    • match: !, ?, or * that follows after [ ]
  • group $4:

    • match: task text after [ ] without modifier present
  • group $5:

    • match: task text after [ ] ! with modifier present

I came up with the following regex (i.e., see demo here):

(?<= \s )
  # Match opening bracket (i.e., `[`).
  ( \[ )

  # Match any single character (e.g., `x`).
  ( . )

  # Match closing bracket (i.e., `]`).
  ( \] )
(?= \s* [?!*]? \s* )

# Exclude entries without text (i.e., incl. in tables).
(?! 
  \s* [?!*]? \s* \|
  |
  \s* [?!*]? \s* $
)

# Match the text (i.e., capture based on modifier presence).
(?:
  # Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
  \s* ( [!?*] ) \s* ( .*? )
  |
  # Match the text that does not follow a modifier.
  \s* (?! [!?*]) \s* ( .*? )
)
# Match until any of the stops that follow is met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)

Which seems to work (i.e., see the picture below), with one exception. The [ and ] notation brackets are captured in separate groups (i.e., [ in the group $1 and ] in the group $3). How can I capture [ and ] as part of the same group (i.e., $1)?

[Image: demo for regex mentioned]
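To make the group numbering concrete, here is a minimal sketch that runs the first pattern from this question through Python's `re` module in verbose mode. Python is an assumption made purely for illustration: the grammar itself targets Oniguruma, whose behavior may differ. The name `PATTERN` and the sample line are likewise only for this example.

```python
import re

# The first regex from the question, compiled with Python's `re` (an assumption;
# the real target is Oniguruma). Group numbers are noted in the comments.
PATTERN = re.compile(
    r"""
    (?<= \s )
      ( \[ )      # group 1: opening bracket
      ( . )       # group 2: any single character
      ( \] )      # group 3: closing bracket
    (?= \s* [?!*]? \s* )
    (?!
      \s* [?!*]? \s* \|
      |
      \s* [?!*]? \s* $
    )
    (?:
      \s* ( [!?*] ) \s* ( .*? )      # group 4: modifier, group 5: text after it
      |
      \s* (?! [!?*] ) \s* ( .*? )    # group 6: text without a modifier
    )
    (?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$ )
    """,
    re.VERBOSE,
)

match = PATTERN.search("- [ ] ! Some task | [[link]] ^abcdef")
print(match.groups())
# -> ('[', ' ', ']', '!', 'Some task', None)

# Lines without task text are rejected by the negative lookahead.
print(PATTERN.search("- [ ] !"))
# -> None
```

The output shows the problem described above: the two brackets end up in groups 1 and 3, with the inner character in group 2 between them.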

I am using this regex for a TextMate grammar in VS Code and according to the documentation the expression needs to be a valid Oniguruma regular expression. Based on some attempts, I noticed that the following are not supported:

  • branch reset groups (i.e., (?|...))
  • capturing inside lookarounds
  • named capture groups

Edit

The fourth bird indicated in the comments that with the /J flag enabled the regex below works (i.e., see demo):

(?<= \s )
  # Match opening bracket (i.e., `[`).
  (?<g1> \[)

  # Match any single character (e.g., `x`).
  (?<g2> .)

  # Match closing bracket (i.e., `]`).
  (?<g1> \])
(?= \s* [?!*]? \s* )

# Exclude entries without text (i.e., incl. in tables).
(?! 
  \s* [?!*]? \s* \|
  |
  \s* [?!*]? \s* $
)

# Match the text (i.e., capture based on modifier presence).
(?:
  # Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
  \s* (?<g3>[!?*]) \s* (?<g4>.*?)
  |
  # Match the text that does not follow a modifier.
  \s* (?! [!?*]) \s* (?<g5>.*?)
)
# Match until any of the stops that follow is met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)

It does. However, as I just discovered, it seems that I cannot use named capture groups for TextMate grammars and, therefore, I need a different solution.
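One direction that might be worth exploring, sketched here under the assumption that nested captures are acceptable in the grammar: wrap the whole `[x]` in one outer group and keep the inner character in a nested group. This does not capture the brackets alone; $1 spans both brackets plus the character between them, and $2 covers only that character. Whether a scope assigned to $2 would then take precedence over the scope of $1 for that character in VS Code is an unverified assumption. The sketch uses Python's `re` purely for illustration (the real target is Oniguruma), and the name `NESTED` is hypothetical.

```python
import re

# Variant of the question's regex with one outer group around the checkbox.
# Only numbered groups are used, since named groups are not an option here.
NESTED = re.compile(
    r"""
    (?<= \s )
      (           # group 1: the whole checkbox, e.g. `[x]`
        \[
        ( . )     # group 2: the single character between the brackets
        \]
      )
    (?= \s* [?!*]? \s* )
    (?!
      \s* [?!*]? \s* \|
      |
      \s* [?!*]? \s* $
    )
    (?:
      \s* ( [!?*] ) \s* ( .*? )      # group 3: modifier, group 4: text after it
      |
      \s* (?! [!?*] ) \s* ( .*? )    # group 5: text without a modifier
    )
    (?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$ )
    """,
    re.VERBOSE,
)

match = NESTED.search("- [x] ? Some task ^abcdef")
print(match.groups())
# -> ('[x]', 'x', '?', 'Some task', None)
```

With this layout the total number of groups drops to five, although the order of the two text groups ($4 with a modifier, $5 without) follows the alternation order of the regex rather than the list of desired groups.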



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow