How to sort ndarray with multiple columns?
My goal is to suggest a set of phone number ids in order of priority based on two criteria: the number of calls made and the status recorded after the last call.
The number of calls is the most important criterion. The more calls a phone number has, the lower its priority.
Each phone number is also linked to a status, which is the second sort key.
class PhoneCallOutcomeEnum(Enum):
    """ Class PhoneCallOutcomeEnum """
    cancelled = "cancelled"
    hung_up = "hung_up"
    voicemail = "voicemail"
    answered = "answered"
The order is: answered, hang_up, voicemail and cancelled.
For example, a phone number called 0 times with a status of "answered" will be the top item in the list, while a number already called 6 times that was not answered will be at the bottom.
I used pandas (dataframe) but I'm not sure if it's suitable for my needs. I was recommended to use numpy.
My goal is to sort by nb_calls and outcome_status. Example:
| id | nb_calls | outcome_status |
|---|---|---|
| 4 | 0 | dial-up |
| 1 | 1 | answered |
| 2 | 1 | answered |
| 3 | 1 | hang_up |
| 7 | 1 | voicemail |
| 8 | 2 | answered |
In the example, the first criterion is the number of calls (nb_calls) and the second is the status (outcome_status), in the order answered, hang_up, voicemail, cancelled.
What is the best way to sort like this?
Solution 1:[1]
Let's first create a dataframe with random data and an extra Name column, just to experiment
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=["nb_calls", "outcome_status"]).reset_index(drop=True)
df['nb_calls'] = np.random.randint(0,7,size=50)
df['outcome_status'] = np.random.choice(["answered", "hang_up", "voicemail", "cancelled"], size=50)
df['Name'] = np.random.choice(["John", "Jane", "Mary", "Bob", "Tom", "Jack", "Sue", "Linda", "Peter"], size=50)
Next sort the dataframe by nb_calls
df.sort_values(by=['nb_calls'], inplace=True)
Next create a sort order dictionary
sort_dic = {'answered': 0, 'hang_up': 1, 'voicemail': 2, 'cancelled': 3}
Create a new column outcome_status_label
df['outcome_status_label'] = df['outcome_status'].map(sort_dic)
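Group by nb_calls and sort each group by outcome_status_label using apply with a lambda. Assign the result back to df, because apply returns a new dataframe and does not modify df in place
df = df.groupby(['nb_calls']).apply(lambda x: x.sort_values(by=['outcome_status_label'])).reset_index(drop=True)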
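The sort by nb_calls followed by a per-group sort can also be written as a single sort_values call on both keys. The following is a minimal sketch under assumptions, not the answer's own code: the rows are taken from the question's example table (the "dial-up" row is left out because that status has no rank in sort_dic), and the names ranked and result are illustrative.

```python
import pandas as pd

# Rows from the question's example, deliberately shuffled.
df = pd.DataFrame({
    "id": [8, 3, 1, 7, 2],
    "nb_calls": [2, 1, 1, 1, 1],
    "outcome_status": ["answered", "hang_up", "answered", "voicemail", "answered"],
})

# Rank of each status: lower rank means higher priority.
sort_dic = {"answered": 0, "hang_up": 1, "voicemail": 2, "cancelled": 3}

# Add the helper rank column, sort on both keys at once, then drop the helper.
ranked = df.assign(outcome_status_label=df["outcome_status"].map(sort_dic))
result = (
    ranked.sort_values(by=["nb_calls", "outcome_status_label"])
    .drop(columns="outcome_status_label")
    .reset_index(drop=True)
)
print(result)
```

A status that is missing from sort_dic maps to NaN, and sort_values places NaN values last by default (na_position="last").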
Finally, drop the outcome_status_label column since you don't need it anymore
df.drop(columns=['outcome_status_label'], inplace=True)
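Since the question mentions numpy, the same two-key ordering can be sketched with np.lexsort, which performs a stable indirect sort over several keys. This is an illustrative sketch only: the arrays reuse rows from the question's example table, and the names rank_of, status_rank, order and sorted_ids are assumptions made for the example.

```python
import numpy as np

# Parallel arrays holding the example rows, deliberately shuffled.
ids = np.array([8, 3, 1, 7, 2])
nb_calls = np.array([2, 1, 1, 1, 1])
status = np.array(["answered", "hang_up", "answered", "voicemail", "answered"])

# Rank of each status: lower rank means higher priority.
rank_of = {"answered": 0, "hang_up": 1, "voicemail": 2, "cancelled": 3}
status_rank = np.array([rank_of[s] for s in status])

# np.lexsort treats the LAST key as the primary key,
# so nb_calls goes last and status_rank breaks ties.
order = np.lexsort((status_rank, nb_calls))
sorted_ids = ids[order]
print(sorted_ids)  # [1 2 3 7 8]
```

Note the key order: lexsort reads its keys from last to first, which is the reverse of the by=[...] list in sort_values.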
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kyriakos |
