Persisting "nan values" from SQLite database

I am trying to drop all rows and columns with nan values, because my code breaks on them.

Before you ask: yes, I asked Google, and I have what should be the correct code block to drop all nan values.

# ____________________________________________________________________________________ SQLite3 Integration

import sqlite3
import pandas as pd

# Read sqlite query results into a pandas DataFrame
con = sqlite3.connect("Sensors Database.db")  # Name of database
df = pd.read_sql_query("SELECT * FROM Hollow_Data_1", con)
con.close()

# Drop all rows with nan values in the GPS columns
df = df.dropna(subset=['latitude', 'longitude', 'alt'])
# Print the table in the console to verify the result
print(df)

# df = pd.read_csv('DutchHollow_07222018_non_nan.csv')

The commented-out last line loads a test CSV with the nan values already removed, to confirm the rest of my code works when they are gone.

I see the same thing when I run print(df.head()). It's like it never even drops them.

Database snippet:


This database is designed to gather data from sensors on drones, and the nan values are junk data (the drone is powering on and programming the flight, not yet in the air). My dashboard plots these points on a Mapbox map.

This could also be handled by a NOT NULL constraint in the database schema in real time, so the table never contains these nan values in the first place.
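For reference, a NOT NULL constraint makes SQLite reject such rows at insert time. A minimal sketch (the column names match the post, but the full schema here is a guess):

```python
import sqlite3

# Hypothetical schema: declaring the GPS columns NOT NULL means SQLite
# rejects any insert with missing values at write time.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Hollow_Data_1 (
        gps_date  TEXT,
        latitude  REAL NOT NULL,
        longitude REAL NOT NULL,
        alt       REAL NOT NULL
    )
""")

# A normal in-flight row inserts fine:
con.execute("INSERT INTO Hollow_Data_1 VALUES ('7/8/2018 20:41', 37.71, -120.96, 30.4)")

# A row logged while the drone is still on the ground has no GPS fix:
try:
    con.execute("INSERT INTO Hollow_Data_1 VALUES ('7/8/2018 20:17', NULL, NULL, NULL)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # NOT NULL constraint failed: Hollow_Data_1.latitude

con.close()
```

Note this only helps if the sensor logger actually writes SQL NULL for missing readings; it won't catch placeholder strings.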

Essentially, I want it to drop all rows and columns with NAN values.

I have tried copying the dataframe and every dropna variant I can think of:

nan_df = df.dropna()
df.dropna(how='all')
df.dropna(subset=['latitude', 'longitude', 'alt'])
df.dropna(inplace=True)
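One thing worth ruling out: dropna returns a new DataFrame unless inplace=True is passed, so the result has to be assigned back. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"alt": [30.4, np.nan, 38.1]})

df.dropna()        # returns a copy; df itself is unchanged
print(len(df))     # still 3 rows

df = df.dropna()   # reassign (or use df.dropna(inplace=True))
print(len(df))     # 2 rows
```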

Every time I print(df.head()), the nan values persist. It's like the dropna isn't there.

This code will be open source: https://github.com/coxchris859?tab=repositories



Solution 1:[1]

Messing around with your data, it appears that your csv is just kinda scuffed.

Instead of true nans, your nans are coming in as nan <-- yes, with a space before it.
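This is easy to demonstrate on a toy frame (made-up values): dropna only recognizes real NaN floats, not the string ' nan'.

```python
import numpy as np
import pandas as pd

# The "missing" values are the literal string ' nan' (leading space),
# not real NaN floats, so dropna doesn't touch them.
df = pd.DataFrame({"latitude": ["37.71", " nan", "37.70"]})

print(df.dropna().shape[0])                          # 3 -- nothing dropped
print(df.replace(" nan", np.nan).dropna().shape[0])  # 2 -- after converting
```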

Given your df:

             gps_date     latitude     longitude   alt  ... Temperature  Humidity  Pressure        Voc
0      7/8/2018 20:41   37.7122685   -120.961832  30.4  ...       39.55   1011.68    27.130  1277076.0
1      7/8/2018 20:17            0             0   nan  ...       39.66   1014.00    28.967    10943.0
2      7/8/2018 20:17          nan           nan   nan  ...       41.19   1014.02    28.633    15895.0
3      7/8/2018 20:17          nan           nan   nan  ...       42.05   1014.04    27.901    21403.0
4      7/8/2018 20:17          nan           nan   nan  ...       42.49   1014.05    27.169    27909.0
...               ...          ...           ...   ...  ...         ...       ...       ...        ...
4060  7/22/2018 21:50   37.7085305   -121.072975  38.1  ...       42.54   1014.45    22.296   995778.0
4061  7/22/2018 21:50  37.70852517  -121.0729798  38.1  ...       42.53   1014.45    22.305   998589.0
4062  7/22/2018 21:50   37.7085225  -121.0729787  38.2  ...       42.54   1014.44    22.307   999294.0
4063  7/22/2018 21:50  37.70852533   -121.072976  38.4  ...       42.54   1014.45    22.323  1000000.0
4064  7/22/2018 21:50  37.70853217  -121.0729735  38.6  ...       42.54   1014.46    22.323   999294.0

[4065 rows x 21 columns]

Doing:

df = df.replace(' nan', np.nan)
df = df.dropna()

Output:

             gps_date     latitude     longitude       alt  ... Temperature  Humidity  Pressure        Voc
0      7/8/2018 20:41   37.7122685   -120.961832      30.4  ...       39.55   1011.68    27.130  1277076.0
101    7/8/2018 20:19  37.72486737  -120.9415272  -179.979  ...       42.33    999.77    22.664   511798.0
103    7/8/2018 20:19   37.7193156  -120.9505354    10.642  ...       42.22    999.79    22.596   521619.0
104    7/8/2018 20:19  37.71908237  -120.9503735     1.043  ...       42.12    999.88    22.523   524717.0
105    7/8/2018 20:19  37.71871426  -120.9502485    -11.66  ...       42.03    999.80    22.539   528246.0
...               ...          ...           ...       ...  ...         ...       ...       ...        ...
4060  7/22/2018 21:50   37.7085305   -121.072975      38.1  ...       42.54   1014.45    22.296   995778.0
4061  7/22/2018 21:50  37.70852517  -121.0729798      38.1  ...       42.53   1014.45    22.305   998589.0
4062  7/22/2018 21:50   37.7085225  -121.0729787      38.2  ...       42.54   1014.44    22.307   999294.0
4063  7/22/2018 21:50  37.70852533   -121.072976      38.4  ...       42.54   1014.45    22.323  1000000.0
4064  7/22/2018 21:50  37.70853217  -121.0729735      38.6  ...       42.54   1014.46    22.323   999294.0

[3939 rows x 21 columns]

Alternatively, you can run something like df.latitude = df.latitude.astype(float) on each scuffed column; this auto-fixes the scuffed nans and leaves you with the correct dtype for those columns.
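That works because float(' nan') parses to a real NaN. A sketch on the same kind of toy data (column names from the post, values made up):

```python
import pandas as pd

df = pd.DataFrame({"latitude": ["37.71", " nan"], "longitude": ["-120.96", " nan"]})

# astype(float) turns the ' nan' strings into real NaNs and fixes the
# dtype in one step:
for col in ["latitude", "longitude"]:
    df[col] = df[col].astype(float)

print(df.dtypes)             # both columns are now float64
print(df.dropna().shape[0])  # 1 row survives

# pd.to_numeric(df[col], errors='coerce') is a more defensive variant:
# it also coerces any other unparseable string to NaN instead of raising.
```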

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BeRT2me