Matching numeric substring in a URL to flag
I've been scouring the net and Stack Overflow for the past week and have tried everything I know, but I can't find my error.
I want to flag these URLs (in PySpark) as Brand:
- https://aaa.com/en-GB/GB/c10092.html
- https://aaa.com/en-GB/GB/c10040-p0.html
- https://aaa.aaa.com/en-GB/GB/p/100713
The fixed pattern I saw here was that "/c" and "p/" are always followed by at least 3 digits, so I wrote this:
f1 = df1.withColumn("Flag", when((col("uni_referer").rlike("%https://aaa.aaa.com/en-GB/GB/p/\d{3}%")),'Brand'))
But it's not flagging anything. Can someone please help? Thanks
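A likely cause, offered as a sketch rather than a confirmed diagnosis: `rlike` takes a Java regular expression and already matches anywhere in the string, so the `%` characters (SQL `LIKE` wildcards) are treated as literal percent signs that never appear in the URLs. The pattern is also hard-coded to the `aaa.aaa.com/.../p/` form, so even without the `%` it would only cover the third URL shape. The snippet below assumes the host is either `aaa.com` or `aaa.aaa.com` and that the column is `uni_referer` as in the question. `BRAND_PATTERN`, `is_brand`, and `flag_brand` are hypothetical names introduced for illustration. The plain-Python `re` check is only there to sanity-check the pattern without a Spark session; the constructs used behave the same way in Java regex.

```python
import re

# Assumed pattern: optional "aaa." subdomain, then "/c" or "/p/" followed by
# at least three digits. No "%" wildcards, since rlike does a partial match.
BRAND_PATTERN = r"https://(?:aaa\.)?aaa\.com/en-GB/GB/(?:c|p/)\d{3}"


def is_brand(url):
    """Plain-Python check that mirrors the partial-match behaviour of rlike."""
    return re.search(BRAND_PATTERN, url) is not None


def flag_brand(df, url_col="uni_referer"):
    """Add a Flag column: 'Brand' where the URL matches, null otherwise."""
    # Imported here so the regex check above can run without Spark installed.
    from pyspark.sql.functions import col, when

    return df.withColumn("Flag", when(col(url_col).rlike(BRAND_PATTERN), "Brand"))
```

With a DataFrame like the one in the question, this would be called as `f1 = flag_brand(df1)`. Rows that do not match get a null `Flag`; chaining `.otherwise("Other")` onto the `when(...)` would give them an explicit value instead.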
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
