Extract urlparse netloc from pandas dataframe
I have a pandas data frame containing URL strings which I'd like to clean since I need them for a data matching exercise. Examples for df["website"] are as follows:
- http://www.example.com
- https://www.example.com
- www3.cde.com
- www.efg.com/en
- ww.aaa.com
- abcde.com
- en.aers.com
I would like to extract subdomain + domain + TLD, which I tried with the following code:
from urllib.parse import urlparse
df["website_clean"]=""
df["website_clean"]=urlparse(df["website"]).netloc
However, I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can someone assist?
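One possible approach, offered as a sketch rather than a definitive answer: `urlparse` expects a single string, so passing the whole `df["website"]` Series makes Python evaluate the Series in a boolean context, which is what raises the `ValueError`. Applying the parser to each element avoids that. Note also that `urlparse` only fills `netloc` when the string has a scheme or starts with `//`, so values such as `www.efg.com/en` would otherwise come back with an empty `netloc`. The helper name `extract_netloc` and the sample frame below are assumptions made for illustration.

```python
from urllib.parse import urlparse

import pandas as pd


def extract_netloc(url: str) -> str:
    """Return the netloc (subdomain + domain + TLD) of a single URL string.

    Hypothetical helper, not part of pandas or urllib.
    """
    url = url.strip()
    # Without a scheme or a leading "//", urlparse treats the whole string
    # as a path and leaves netloc empty, so prefix scheme-less values.
    if "://" not in url:
        url = "//" + url
    return urlparse(url).netloc


# Sample data taken from the examples in the question.
df = pd.DataFrame(
    {
        "website": [
            "http://www.example.com",
            "https://www.example.com",
            "www3.cde.com",
            "www.efg.com/en",
            "ww.aaa.com",
            "abcde.com",
            "en.aers.com",
        ]
    }
)

# Apply the parser element by element instead of to the whole Series.
df["website_clean"] = df["website"].apply(extract_netloc)
print(df)
```

This assumes every value in the column is a string; missing values (NaN) would need to be filled or skipped first. If a URL can carry a port or credentials, `urlparse(url).hostname` may be a better fit than `.netloc`, since `netloc` keeps those parts.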
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
