numpy.where does not work for empty strings
As an SEO manager, I am using this Python code to check whether the H1 tags are the same on the desktop and mobile versions of different pages of a website:
##Print the path of your current working directory
import os
print(os.getcwd())
#What you get here is where you should save your CSV crawls
##Import the pandas and NumPy libraries
import pandas as pd
import numpy
##Load the crawls to Pandas
dfTextonly = pd.DataFrame(pd.read_csv('mobile.csv', low_memory=False, header=0))
dfTextonly = dfTextonly[['Address', 'H1-1']].copy()
dfJS = pd.DataFrame(pd.read_csv('desktop.csv', low_memory=False, header=0))
dfJS = dfJS[['Address','H1-1']].copy()
#Combine the two crawls into one dataframe
df = pd.merge(dfTextonly, dfJS, left_on='Address', right_on='Address', how='outer')
##Check the differences
df["H1s are equal"] = numpy.where((df["H1-1_y"] == df["H1-1_x"]), "yes", "no")
##Export in Excel
df.to_excel("test-results.xlsx")
However, the problem is that numpy.where in this code returns "no" whenever H1-1_y and H1-1_x are both NaN (empty cells in the crawl, which pandas loads as NaN rather than as empty strings), while it should return "yes" because in this case they are the same. Can somebody help me with this?
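The behaviour described above can be reproduced with a small sketch. The two-row CSV text and the example.com addresses are made up for illustration and stand in for the real crawl files:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row crawl; the second page has an empty H1 cell.
csv_text = "Address,H1-1\nhttps://example.com/a,Home\nhttps://example.com/b,\n"
mobile = pd.read_csv(io.StringIO(csv_text))
desktop = pd.read_csv(io.StringIO(csv_text))

# The empty cell is loaded as NaN, not as the empty string "".
print(mobile["H1-1"].isna().tolist())

df = pd.merge(mobile, desktop, on="Address", how="outer")

# NaN never compares equal to anything, including another NaN,
# so the row with two missing H1s is labelled "no".
equal = df["H1-1_x"] == df["H1-1_y"]
labels = np.where(equal, "yes", "no")
print(labels.tolist())
```

So numpy.where itself is behaving as designed here; the "no" comes from the element-wise comparison that feeds it.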
Solution 1:[1]
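If it is about handling NaNs, as you've mentioned in a comment, note that NaN == NaN evaluates to False in pandas as well, so the equality check alone is not enough. You can combine it with an explicit check for the case where both values are missing and pass the result to pandas where. The code looks a bit hackish, so you can decide whether you want that, but you could try:
both_missing = df["H1-1_y"].isna() & df["H1-1_x"].isna()
df["H1s are equal"] = pd.Series(["yes"]*len(df["H1-1_y"])).where((df["H1-1_y"]==df["H1-1_x"]) | both_missing, "no")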
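Because NaN == NaN is False in an element-wise comparison, one way to treat two missing H1s as equal is to OR the comparison with a "both missing" mask. The following is a minimal sketch on made-up data; the helper name `h1s_match` and the sample rows are assumptions for illustration, not part of the original code:

```python
import numpy as np
import pandas as pd

# Hypothetical merged crawl result; np.nan stands in for an empty H1 cell.
df = pd.DataFrame({
    "Address": ["/a", "/b", "/c", "/d"],
    "H1-1_x": ["Home", np.nan, "Shop", np.nan],
    "H1-1_y": ["Home", np.nan, "Store", "Blog"],
})


def h1s_match(left, right):
    """Return a boolean Series: True where values are equal or both missing."""
    both_missing = left.isna() & right.isna()
    return (left == right) | both_missing


df["H1s are equal"] = np.where(h1s_match(df["H1-1_x"], df["H1-1_y"]), "yes", "no")
print(df["H1s are equal"].tolist())  # ['yes', 'yes', 'no', 'no']
```

This keeps numpy.where from the question and only changes the condition. An alternative with the same effect is to replace missing values with empty strings via `fillna("")` on both columns before comparing, at the cost of no longer distinguishing a missing H1 from an empty one.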
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Simon Hawe |
