Perl splitting on multiple occurrences of the same pattern

I wrote the following Perl script to split on multiple occurrences of the same pattern.

The pattern is: (some text)

This is what I've tried:

foreach my $line (@input) {

  if ($line =~ /(\(.*\))+/g) {

    my @splitted = split(/(\(.*\))/, $line);

    foreach my $data (@splitted) {
      print $data, "\n";
    }
  }
}

For the given input text:

Non-rapid eye movement sleep (NREMS).
Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).

I'm getting the following output:

Non-rapid eye movement sleep
(NREMS).
Cytokines such as interleukin-1
(IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).

The code doesn't split the text on the second and third occurrences of the pattern in line 2 of the text. I can't figure out what I'm doing wrong.



Solution 1:[1]

Split on this instead:

(\([^(]*\))

Your regex is greedy, so either use the pattern above or make it non-greedy: (\(.*?\)).

See demo.

https://regex101.com/r/dU7oN5/14

The problem with your regex can be seen here: https://regex101.com/r/dU7oN5/15

Your regex matches ( and then greedily looks for the last ) on the line, not the first ) it encounters. So everything from the first ( to the last ) on the last line is captured as a single match.
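
To see the difference directly, here is a minimal sketch that splits the second input line from the question with both patterns. The variable names ($line, @greedy, @lazy) are only illustrative and not part of the original code.

```perl
use strict;
use warnings;

my $line = 'Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, '
         . 'acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).';

# Greedy: .* runs to the last ")" on the line, so there is a single match.
my @greedy = split /(\(.*\))/, $line;

# Non-greedy: .*? stops at the first ")" after each "(".
my @lazy = split /(\(.*?\))/, $line;

print scalar(@greedy), " pieces with the greedy pattern\n";      # 3
print scalar(@lazy),   " pieces with the non-greedy pattern\n";  # 7
print "$_\n" for @lazy;
```

With the greedy pattern the middle piece runs from "(IL-1)" all the way to "(IFN-alpha)"; with the non-greedy one each parenthesised group becomes its own piece.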

Solution 2:[2]

You haven't described your purpose, but I suggest that you use a regular expression match instead of split. Bear in mind, though, that you appear to be processing free-form text, which will never work properly in the general case.

This program finds every run of text in the input data, together with the bracketed term that follows it.

use strict;
use warnings;

while (<DATA>) {
  while ( / ( [^()]* ) \( ( [^()]* ) \) /xg ) {
    my ($defn, $abbr) = ($1, $2);
    print "$defn\n";
    print "-- $abbr\n\n";
  }
}

__DATA__
Non-rapid eye movement sleep (NREMS).
Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).

output

Non-rapid eye movement sleep 
-- NREMS

Cytokines such as interleukin-1 
-- IL-1

, tumor necrosis factor, acidic fibroblast growth factor 
-- FGF

, and interferon-alpha 
-- IFN-alpha

Solution 3:[3]

Try this:

foreach my $line (@input) {
    if($line =~/\(.*?\)/) { # modifier g can be removed here
        my @splitted = split(/(\(.+?\))/, $line); # make the match non greedy
        foreach my $data (@splitted) { 
            print $data, "\n"; 
        }
    }
}
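
Note that the parentheses around the split pattern matter: when the pattern contains a capturing group, split also returns the captured delimiters. A small sketch of the difference, using the first input line (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $line = 'Non-rapid eye movement sleep (NREMS).';

# With a capturing group, the matched "(NREMS)" is kept in the result.
my @with_group = split /(\(.+?\))/, $line;

# Without it, the matched text is discarded like any other separator.
my @without_group = split /\(.+?\)/, $line;

print scalar(@with_group),    " pieces with the capturing group\n";     # 3
print scalar(@without_group), " pieces without the capturing group\n";  # 2
```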

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 U. Windl
Solution 2 Borodin
Solution 3 U. Windl