Convert PySpark URL Decoder into Scala
I have created a PySpark UDF by doing the following:
from urllib.parse import urljoin, urlparse
import unicodedata
from pyspark.sql.functions import col, udf, count, substring
from pyspark.sql.types import StringType
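# Normalize to NFKC, then rebuild the URL from its path only (drops the query string and fragment).
decode_udf = udf(
    lambda val: urljoin(unicodedata.normalize('NFKC', val), urlparse(unicodedata.normalize('NFKC', val)).path),
    StringType(),
)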
For reference, the code above takes a URL like this:
https://www.dagens.dk/udland/steve-irwins-soen-taet-paa-miste-livet-ny-video-viser-flugt-fra-kaempe-krokodille?utm_medium=Social&utm_source=Facebook#Echobox=1644308898
and transforms it into
https://www.dagens.dk/udland/steve-irwins-soen-taet-paa-miste-livet-ny-video-viser-flugt-fra-kaempe-krokodille
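To make the behavior to be replicated concrete, here is a minimal sketch of the same logic as a plain Python function, without Spark. The function name `strip_query_and_fragment` is only illustrative and is not part of the code above; the sketch assumes the input is a non-null absolute URL.

```python
import unicodedata
from urllib.parse import urljoin, urlparse


def strip_query_and_fragment(val: str) -> str:
    """Same logic as the lambda in decode_udf (hypothetical helper name)."""
    # NFKC normalization folds compatibility characters; plain ASCII is unchanged.
    normalized = unicodedata.normalize('NFKC', val)
    # Joining the URL with its own absolute path keeps scheme and host,
    # and discards the query string and the fragment.
    return urljoin(normalized, urlparse(normalized).path)


url = (
    "https://www.dagens.dk/udland/steve-irwins-soen-taet-paa-miste-livet-"
    "ny-video-viser-flugt-fra-kaempe-krokodille"
    "?utm_medium=Social&utm_source=Facebook#Echobox=1644308898"
)
result = strip_query_and_fragment(url)
print(result)
```

Any Scala version would need to reproduce these two steps: Unicode NFKC normalization, then rebuilding the URL from scheme, host, and path only.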
How can I convert this into Scala? I have tried many ways to replicate the code, but without success. Thanks in advance.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
