Trying to create a bubble chart with Plotly using a clean dataframe - Python
So below is my flex_df.head(10). What I'm trying to do is create a bubble chart that has salary on the x-axis and the count of the role (Role, not OriginalTitle) on the y-axis. Then, to wrap it all up, I need it to have different bubble sizes to show the different source files that brought in this data.
I'm trying to use Plotly Express, but none of the code I've tried works, so I have nothing viable to post.
Role            IsRemote  Country  Salary  SourceFile   OriginalTitle
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Engineer   TRUE      USA      56      hired.com    Data Engineer
Data Engineer   TRUE      USA      56      simplyhired  Data Engineer
Data Scientist  TRUE      Poland   100     hired.com    Data Science Consultant
Data Scientist  TRUE      Wrocław  100     indeed       Data Science Consultant
Data Engineer   TRUE      USA      56      indeed       Data Engineer
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Scientist  TRUE      USA      15      Flex-Jobs    Data Science Engineer
Data Scientist  TRUE      USA      20      Flex-Jobs    Manager, Data Science
Data Analyst    TRUE      USA      56      Flex-Jobs    Senior Data Science Analyst
Solution 1:[1]
You have described the x, y and size arguments of a scatter plot.
- size needs to be numeric, so https://pandas.pydata.org/docs/reference/api/pandas.factorize.html is used to convert the categorical SourceFile values to numbers
- you have not defined how to deal with roles that are contributed to by multiple sources, so a simple aggregation (a row count per Role and SourceFile) is assumed
- with the data structured this way, generating the scatter is very simple
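As a quick illustration of the first point, here is a minimal sketch of what pd.factorize returns. The sources series is made up for the demo and is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample of source names, only for demonstration.
sources = pd.Series(["Flex-Jobs", "hired.com", "simplyhired", "hired.com", "indeed"])

# factorize returns integer codes (in order of first appearance) and the unique values.
codes, uniques = pd.factorize(sources)

# Codes start at 0, so add 1 to avoid a bubble of size zero.
bubblesize = codes + 1

print(list(bubblesize))  # [1, 2, 3, 2, 4]
print(list(uniques))     # ['Flex-Jobs', 'hired.com', 'simplyhired', 'indeed']
```

Note that these numbers are only labels: a larger bubble means a source that appeared later in the data, not a larger quantity.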
import io
import pandas as pd
import plotly.express as px

# Rebuild the sample frame; columns are separated by two or more spaces
# because some values (e.g. "Data Engineer") contain single spaces.
flex_df = pd.read_csv(
    io.StringIO(
        """Role            IsRemote  Country  Salary  SourceFile   OriginalTitle
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Engineer   TRUE      USA      56      hired.com    Data Engineer
Data Engineer   TRUE      USA      56      simplyhired  Data Engineer
Data Scientist  TRUE      Poland   100     hired.com    Data Science Consultant
Data Scientist  TRUE      Wrocław  100     indeed       Data Science Consultant
Data Engineer   TRUE      USA      56      indeed       Data Engineer
Data Engineer   TRUE      USA      56      Flex-Jobs    Data Engineer
Data Scientist  TRUE      USA      15      Flex-Jobs    Data Science Engineer
Data Scientist  TRUE      USA      20      Flex-Jobs    Manager, Data Science
Data Analyst    TRUE      USA      56      Flex-Jobs    Senior Data Science Analyst"""
    ),
    sep=r"\s\s+",  # raw string so the regex escapes are not treated as string escapes
    engine="python",
)

# Count rows per Role and SourceFile (column "size"), then give each
# source a numeric code (1, 2, 3, ...) to use as the bubble size.
px.scatter(
    flex_df.groupby(["Role", "SourceFile"], as_index=False)
    .size()
    .assign(bubblesize=lambda df: pd.factorize(df["SourceFile"])[0] + 1),
    x="size",
    y="Role",
    size="bubblesize",
    hover_data=["SourceFile"],
)
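The chart above puts the count on the x-axis and Role on the y-axis. If you want something closer to the layout in the question (salary on x, count on y), one possible reading is sketched below. It is an assumption, not part of the original answer: it counts postings per Role and Salary and sizes each bubble by the number of distinct source files. The names postings, n_sources and bubble_chart are made up for this sketch, and the frame is re-typed from the sample with only the three columns it needs.

```python
import pandas as pd

# Sample rows re-typed from the question (only the columns used here).
flex_df = pd.DataFrame(
    {
        "Role": ["Data Engineer"] * 3 + ["Data Scientist"] * 2
        + ["Data Engineer"] * 2 + ["Data Scientist"] * 2 + ["Data Analyst"],
        "Salary": [56, 56, 56, 100, 100, 56, 56, 15, 20, 56],
        "SourceFile": ["Flex-Jobs", "hired.com", "simplyhired", "hired.com",
                       "indeed", "indeed", "Flex-Jobs", "Flex-Jobs",
                       "Flex-Jobs", "Flex-Jobs"],
    }
)

# One row per (Role, Salary): how many postings, and from how many distinct sources.
counts = (
    flex_df.groupby(["Role", "Salary"])
    .agg(postings=("SourceFile", "size"), n_sources=("SourceFile", "nunique"))
    .reset_index()
)


def bubble_chart(df):
    """Salary on x, posting count on y, bubble size = number of distinct sources."""
    # Imported here so the aggregation above can be used without plotly installed.
    import plotly.express as px

    return px.scatter(
        df,
        x="Salary",
        y="postings",
        size="n_sources",
        color="Role",
        hover_data=["n_sources"],
    )
```

Calling bubble_chart(counts).show() should then display the figure. Unlike the factorize approach, the bubble size here is an actual quantity, but it no longer tells you which source a row came from, so pick whichever fits what you want to show.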
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rob Raymond |

