Plotly Choropleth Map Showing Wrong Colors

I am making a choropleth map showing cumulative West Nile virus cases from 2006 to 2015 for each county in California.

The problem is that the map displays the wrong colors for a few counties. [Output screenshot]

The counties shown in black should be as light as the other counties. The coloring is also inconsistent: Inyo is the same color as Fresno, but Inyo had 0 cases in 2006 and Fresno had 11.

Here is the code that I used to generate the plot:

fig = px.choropleth(df, geojson=counties,
                    locations='id',
                    color='Cumulative_Cases',
                    color_continuous_scale='purples',
                    featureidkey="id",
                    range_color=(0, df['Cumulative_Cases'].max()),
                    scope='usa',
                    animation_frame="Year",
                    animation_group='id',
                    labels={'Cumulative_Cases':'West Nile Cases'},  
                    hover_data=['County','Cumulative_Cases'],
                    )
fig.update_geos(fitbounds='locations',visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Here is what my dataframe looks like: [Data screenshot]

You can download my dataset here: https://www.dropbox.com/s/una3ztqs2lp5ngf/df.csv?dl=0



Solution 1:[1]

import pandas as pd
import requests
import plotly.express as px

# data dictionary (its first column supplies the column names for the cases file)
df_dd = pd.read_csv(
    "https://data.chhs.ca.gov/dataset/3205b420-3f62-4a02-8d2e-9a9ed34c49f4/resource/03bc3b8f-98f8-4d7a-bb12-ffcfe19446ca/download/westnileviruscases2006-present-dd.csv"
)

# cases
df = pd.read_csv(
    "https://data.chhs.ca.gov/dataset/3205b420-3f62-4a02-8d2e-9a9ed34c49f4/resource/6ef33c1b-9f54-49f2-a92e-51a1b78f0a06/download/vendor.csv",
    names=[c.replace(" ", "_") for c in df_dd.iloc[:, 0].values],
).sort_values(["County", "Year", "Week_Reported"])

# aggregate weekly counts to yearly totals per county
df = df.groupby(["County", "Year"], as_index=False).agg({"Positive_Cases": "sum"})

# geojson
counties = requests.get(
    "https://raw.githubusercontent.com/codeforgermany/click_that_hood/main/public/data/california-counties.geojson"
).json()

# fill in missing county/year combinations with zero; not strictly needed but improves the visualisation
df = (
    pd.merge(
        df,
        pd.MultiIndex.from_product(
            [
                pd.json_normalize(counties["features"])["properties.name"].values,
                df["Year"].unique(),
            ],
            names=["County", "Year"],
        ).to_frame(),
        left_on=["County", "Year"],
        right_index=True,
        how="right",
    )
    .fillna(0)
    .sort_values(["County", "Year"])
)

# calculate cumulative cases
df["Cumulative_Cases"] = df.groupby("County")["Positive_Cases"].cumsum()

fig = px.choropleth(
    df,
    geojson=counties,
    locations="County",
    color="Cumulative_Cases",
    color_continuous_scale="purples",
    featureidkey="properties.name",
    range_color=(0, df["Cumulative_Cases"].max()),
    scope="usa",
    animation_frame="Year",
    animation_group="County",
    labels={"Cumulative_Cases": "West Nile Cases"},
    hover_data=["County", "Cumulative_Cases"],
)
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
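
The key data-preparation step in the code above is to give every county a row for every year (filled with zero where no cases were reported) before taking the cumulative sum, so that each animation frame contains the same set of locations. Here is a minimal sketch of that idea on toy data. The county names and case counts are made up for illustration, and it uses `reindex` on a `MultiIndex` rather than the `merge` shown above, which is an alternative way to get the same result, not the answer's exact method.

```python
import pandas as pd

# Toy yearly totals (hypothetical data): county "B" has no 2007 row,
# and county "C" has no rows at all.
yearly = pd.DataFrame(
    {
        "County": ["A", "A", "B"],
        "Year": [2006, 2007, 2006],
        "Positive_Cases": [3, 2, 1],
    }
)

# Every county/year combination the animation should contain.
full_index = pd.MultiIndex.from_product(
    [["A", "B", "C"], [2006, 2007]], names=["County", "Year"]
)

# Reindex onto the full grid, treating missing combinations as zero cases.
filled = (
    yearly.set_index(["County", "Year"])
    .reindex(full_index, fill_value=0)
    .reset_index()
    .sort_values(["County", "Year"])
)

# Running total per county, in year order.
filled["Cumulative_Cases"] = filled.groupby("County")["Positive_Cases"].cumsum()
print(filled)
```

With the grid filled in this way, a county that reported nothing in a given year keeps its previous cumulative total instead of dropping out of that frame.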


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rob Raymond