Extract urlparse netloc from pandas dataframe

I have a pandas data frame containing URL strings which I'd like to clean since I need them for a data matching exercise. Examples for df["website"] are as follows:

http://www.example.com
https://www.example.com
www3.cde.com
www.efg.com/en
ww.aaa.com
abcde.com
en.aers.com

I would like to extract subdomain + domain + TLD, which I tried with the following code:

from urllib.parse import urlparse
df["website_clean"]=""
df["website_clean"]=urlparse(df["website"]).netloc

However, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can someone assist?
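A likely explanation: `urllib.parse.urlparse` is written for a single string. When it receives the whole `df["website"]` Series, CPython's implementation ends up testing the argument's truth value internally, and pandas refuses to reduce a Series to one boolean. So the parsing has to be applied element by element. There is a second catch in this data: `urlparse` only fills `netloc` when the string has a scheme (`http://`) or starts with `//`. Bare entries such as `www3.cde.com` otherwise land in `path`, leaving `netloc` empty. The sketch below handles both issues. Prefixing `//` to scheme-less strings, and the helper name `extract_netloc`, are illustrative assumptions rather than anything from the question. It also assumes every non-missing entry is a string.

```python
from urllib.parse import urlparse

import pandas as pd


def extract_netloc(url: str) -> str:
    """Hypothetical helper: return the netloc of one URL string.

    Assumption: a string without "://" is a bare host (optionally with a
    path), so "//" is prepended to make urlparse treat it as a netloc
    instead of a path.
    """
    if "://" not in url:
        url = "//" + url
    return urlparse(url).netloc


df = pd.DataFrame({"website": [
    "http://www.example.com",
    "https://www.example.com",
    "www3.cde.com",
    "www.efg.com/en",
    "ww.aaa.com",
    "abcde.com",
    "en.aers.com",
]})

# Apply the scalar parser to each cell; na_action="ignore" leaves NaN as NaN
# instead of calling extract_netloc on a float.
df["website_clean"] = df["website"].map(extract_netloc, na_action="ignore")
print(df["website_clean"].tolist())
```

For the sample values this should give `www.example.com`, `www3.cde.com`, `www.efg.com`, and so on. Note that `netloc` keeps any port or `user@` prefix verbatim. If those can appear in the real data, `urlparse(url).hostname` is an alternative: it drops them and lowercases the host. However, it returns `None` when there is no host.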

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
