Apply negative lookbehind to the entire group before it

I want to capture the model of a phone but not the storage in the title. So I don't want the regex to match xxxGB.

I am expecting to match:
iphone 13 from: "iphone 13 256gb - midnight"
iphone 13 pro max from "iphone 13 pro max 256gb - sierra blue"
iphone 13 pro from "iphone 13 pro 128gb - graphite"
galaxy tab a8 from "galaxy tab a8 wifi 128gb - grey"

The regular expression I have is

r'[A-Za-z]+\s?[A-Za-z\+\.\d]*((\spro|\smax|\slight|\smini|\splus|\sultra|\[A-Za-z]?\d+(?!gb)))*|$'

but the negative lookahead (?!gb) only applies to the last digit before "gb", not to the entire number after the space:

apple iphone 13 256gb - midnight
<re.Match object; span=(6, 18), match='iphone 13 25'>
<re.Match object; span=(32, 32), match=''>
apple iphone 13 pro 128gb - graphite
<re.Match object; span=(6, 22), match='iphone 13 pro 12'>
<re.Match object; span=(36, 36), match=''>
apple iphone 13 pro max 256gb - sierra blue
<re.Match object; span=(6, 26), match='iphone 13 pro max 25'>
<re.Match object; span=(43, 43), match=''>
samsung galaxy tab a8 wifi 128gb - grey
<re.Match object; span=(8, 21), match='galaxy tab a8'>
<re.Match object; span=(39, 39), match=''>
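
The truncated matches above ('25', '12') can be reproduced in isolation. The following is a minimal sketch, assuming Python's re module as in the question; the variable names are only illustrative. It shows that \d+ backtracks by one digit until the lookahead no longer sees "gb", and that also forbidding a following digit is one possible way to make the check cover the whole number.

```python
import re

# \d+ first takes "256"; (?!gb) fails there, so the engine backtracks
# to "25", where the next two characters are "6g" rather than "gb".
backtracked = re.search(r'\d+(?!gb)', '256gb')
print(backtracked.group())  # 25

# Also rejecting a following digit leaves no shorter number to fall back on.
whole_number = re.search(r'\d+(?!\d|gb)', '256gb')
print(whole_number)  # None
```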

The testing template can be found here: https://regex101.com/r/dn0Hyr/1

Many thanks!!



Solution 1:[1]

You may use this regex to match phone models:

^[A-Za-z]+(?: (?!wifi|\d*gb)[\dA-Za-z]+)*

RegEx Details:

  • ^: Start of the string
  • [A-Za-z]+: Match 1+ letters
  • (?: (?!wifi|\d*gb)[\dA-Za-z]+)*: Match a space followed by a word of 1+ letters or digits, as long as that word does not start with wifi or with optional digits followed by gb. Repeat this group 0 or more times
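
As a quick check, the pattern can be run against the four titles from the question. This is a minimal sketch in Python; the names MODEL_RE, titles and models are only illustrative.

```python
import re

# Pattern from this answer, anchored at the start of each title.
MODEL_RE = re.compile(r'^[A-Za-z]+(?: (?!wifi|\d*gb)[\dA-Za-z]+)*')

titles = [
    "iphone 13 256gb - midnight",
    "iphone 13 pro max 256gb - sierra blue",
    "iphone 13 pro 128gb - graphite",
    "galaxy tab a8 wifi 128gb - grey",
]

# re.match only tries the pattern at position 0 of each string.
models = [MODEL_RE.match(title).group() for title in titles]
print(models)
# ['iphone 13', 'iphone 13 pro max', 'iphone 13 pro', 'galaxy tab a8']

# A title with a leading brand keeps that brand in the match.
with_brand = MODEL_RE.match("apple iphone 13 256gb - midnight").group()
print(with_brand)  # apple iphone 13
```

Because the pattern is anchored with ^, a title that starts with a brand name (as in the output shown in the question) includes the brand in the match.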

Solution 2:[2]

An alternation between two positive lookaheads:

Figure I - RegEx A

/^.*(?=\swifi\s\d{3})|^.*(?=\s\d{3})/gm

Figure II - RegEx A

Segment              Meaning
^.*                  Starting at the beginning of a line, anything BUT a newline occurring zero or more times...
(?=\swifi\s\d{3})    ...is a match if it is followed by a space, literal "wifi", a space, and 3 digits...
|                    OR
^.*                  ...starting at the beginning of a line, anything BUT a newline occurring zero or more times...
(?=\s\d{3})          ...is a match if it is followed by a space and 3 digits.
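
RegEx A can be tried in Python roughly as follows. This is a sketch, assuming the /gm flags map to re.MULTILINE plus findall; the names REGEX_A, text and matches_a are only illustrative.

```python
import re

# The m flag becomes re.MULTILINE; the g flag corresponds to findall.
REGEX_A = re.compile(r'^.*(?=\swifi\s\d{3})|^.*(?=\s\d{3})', re.MULTILINE)

text = "\n".join([
    "iphone 13 256gb - midnight",
    "iphone 13 pro max 256gb - sierra blue",
    "iphone 13 pro 128gb - graphite",
    "galaxy tab a8 wifi 128gb - grey",
])

# No capture groups, so findall returns the full match for each line.
matches_a = REGEX_A.findall(text)
print(matches_a)
# ['iphone 13', 'iphone 13 pro max', 'iphone 13 pro', 'galaxy tab a8']
```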

Or a shortened version without the alternation that matches both 2 and 3 digits, as per The fourth bird's comment below. Rather than an alternation, a non-capturing group (?:wifi\s)? is nested inside the lookahead; the quantifier ? makes "wifi" and its trailing space optional rather than required:

Figure III - RegEx B

/^.*?(?=\s(?:wifi\s)?\d{2,3}gb)/gm

Figure IV - RegEx B

Segment                Meaning
^.*?                   Starting at the beginning of a line, anything BUT a newline occurring as few times as possible until...
(?=\s(?:wifi\s)?...    ...there's a space, then literal "wifi" and a space occurring once or not at all...
...\d{2,3}gb)          ...followed by 2 or 3 digits and literal "gb".
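
RegEx B can be checked the same way. This is a sketch under the same assumption (re.MULTILINE with findall standing in for /gm); the names are illustrative, and the "iphone se 64gb - red" title is a hypothetical example that is not in the question, added only to show the 2-digit case.

```python
import re

# Lazy .*? stops at the first position where the lookahead succeeds.
REGEX_B = re.compile(r'^.*?(?=\s(?:wifi\s)?\d{2,3}gb)', re.MULTILINE)

text = "\n".join([
    "iphone 13 256gb - midnight",
    "iphone 13 pro max 256gb - sierra blue",
    "iphone 13 pro 128gb - graphite",
    "galaxy tab a8 wifi 128gb - grey",
    "iphone se 64gb - red",  # hypothetical title with 2-digit storage
])

matches_b = REGEX_B.findall(text)
print(matches_b)
# ['iphone 13', 'iphone 13 pro max', 'iphone 13 pro', 'galaxy tab a8', 'iphone se']
```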

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 anubhava
Solution 2