Ghost NaN values in Pandas DataFrame, strange behaviour with Numpy
This is a very strange problem. I tried a lot of things but I can't find a way to solve it.
I have a DataFrame with data collected from an API, and there is no problem with that. Then I use the pandas-ta library (https://github.com/twopirllc/pandas-ta), which adds new columns to the DataFrame.
Of course, there are sometimes NaN values in the new columns (there are many reasons, but the main one is that some indicators are length-based).
Basic problem, so basic solution: just type df.fillna(0, inplace=True) and it works!
But when I check df.values (or the result of to_numpy()), there are still nan values.
Properties of the problem:
- The NaNs are not found with np.where() in the array, with either np.nan or pandas-ta.npNaN
- df.isna().any().any() returns False
- The NaNs are float values, not strings
- The array has dtype object
- I tried various methods to replace the NaNs, not only fillna, but since they are not recognized, none of them work
- I also thought it was because of large numbers, but using to_numpy(dtype='float64') gives the same problem
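As a side note on the np.where() point: an equality test against np.nan can never match, because NaN is not equal to anything, itself included. The sketch below shows one possible way to locate float NaNs in an object-dtype array with pd.isna instead. The array is made up for illustration and is not the data from the question, so this is a diagnostic idea under that assumption, not a confirmed explanation of the behaviour described.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype array standing in for df.values (not the real data).
arr = np.array([[1.0, 2.0], [float("nan"), 4.0]], dtype=object)

# NaN != NaN, so searching with == finds nothing, even after casting to float64.
as_float = arr.astype("float64")
eq_hits = np.argwhere(as_float == np.nan)

# pd.isna accepts object arrays and returns a boolean mask of the same shape.
mask = pd.isna(arr)
nan_positions = np.argwhere(mask)  # (row, column) index of each NaN

print(arr.dtype, len(eq_hits), nan_positions.tolist())
```

The column index of each hit can then be mapped back to df.columns to see which indicator column the value comes from.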
So these values appear only after conversion to a NumPy array, and they are not recognized as NaN.
They also show up when I apply PCA to my dataset, where I get an error message because of the NaNs.
Thanks a lot for your time, and sorry for the mistakes, I'm not a native speaker.
Have a good day y'all.
Edit:
Here is a screenshot of the operations I'm doing and the printed result; you can see one NaN value.

Solution 1:[1]
You will want to convert the input to an int before comparing:
    students = {11111: "A+", 22222: "B+", 33333: "D+"}
    ID = int(input("please enter the student ID:"))
    for key in students:
        if ID == key:
            print(students[int(ID)])
            break
    else:
        print("ID not found")
    if len(str(ID)) < 5:
        print("invalid Id")
    elif len(str(ID)) > 5:
        print("invalid Id")
That will let you compare correctly, but I think a better version would be the following:
    students = {11111: "A+", 22222: "B+", 33333: "D+"}
    ID = int(input("please enter the student ID:"))
    found = False
    if ID in students:
        print(students[ID])
        found = True
    if not found:
        print("ID not found")
    if len(str(ID)) != 5:
        print("invalid Id")
Solution 2:[2]
I would suggest this code instead:
    students = {11111: "A+", 22222: "B+", 33333: "D+"}
    ID = int(input("please enter the student ID: "))
    if len(str(ID)) == 5:
        if ID in students:
            print(students[ID])
        else:
            print("ID not found")
    else:
        print("invalid ID")
It first checks the length of the input; then, if the input is in the dictionary, it prints the corresponding value, otherwise it prints "ID not found".
Solution 3:[3]
First of all, for your input loop you should follow the approach in "Asking the user for input until they give a valid response". It should be something like this:
    while True:
        ID = input("please enter the student ID:")
        if len(ID) != 5:
            print("invalid Id")
        else:
            # check the id
            break
Now, regarding how you check the keys - it is not necessary to loop over a dict to check if a key exists. The whole advantage of dicts is that they are hash tables and give an O(1) look-up time. So you can simply do:
    if ID in students:
        print(students[ID])
    else:
        print("ID not found")
But since you're just printing, this can all be simplified using the get method which has a default argument that is returned if the key is not found. So an if/else is not even necessary:
    print(students.get(ID, "ID not found"))
Lastly, remember that input always returns a string. Your keys are ints. So you will have to convert the ID to an int before using it as a key.
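To make the string-versus-int point concrete, here is a small sketch. The helper name to_student_id is hypothetical and only for illustration; it uses try/except so that non-numeric text does not crash the lookup.

```python
students = {11111: "A+", 22222: "B+", 33333: "D+"}

raw = "11111"  # the kind of value input() would return

# The string never matches an int key, even though it "looks" the same.
string_match = raw in students
int_match = int(raw) in students


def to_student_id(text):
    """Hypothetical helper: return text as an int, or None if it is not numeric."""
    try:
        return int(text)
    except ValueError:
        return None


print(string_match, int_match, to_student_id("abcde"))
```

Returning None for bad input lets the caller decide what to print, instead of letting int() raise in the middle of the loop.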
Putting it all together, your code could be:
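    students = {11111: "A+", 22222: "B+", 33333: "D+"}

    while True:
        ID = input("please enter the student ID:")
        # int() raises ValueError on non-numeric text, so reject that here too
        if len(ID) != 5 or not ID.isdecimal():
            print("invalid Id")
        else:
            print(students.get(int(ID), "ID not found"))
            break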
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Ali SHOKOUH ABDI |
| Solution 3 | |
