How to sort ndarray with multiple columns?

My goal is to suggest a set of phone number ids in order of priority, based on two criteria: the number of calls made and the status recorded after the last call.

The number of calls is the most important criterion. The more calls a phone number has received, the lower its priority.

Each phone number also has a status, which serves as the second sort key.

from enum import Enum


class PhoneCallOutcomeEnum(Enum):
    """ Class PhoneCallOutcomeEnum """
    cancelled = "cancelled"
    hung_up = "hung_up"
    voicemail = "voicemail"
    answered = "answered"

The order is: answered, hang_up, voicemail and cancelled.

For example, a phone number called 0 times with a status of "answered" will be the top item in the list, while a number already called 6 times that was not answered will be at the bottom.

I used a pandas DataFrame, but I'm not sure it's suitable for my needs. I was advised to use NumPy instead.

My goal is to sort by nb_calls and outcome_status. Example:

id  nb_calls  outcome_status
4   0         dial-up
1   1         answered
2   1         answered
3   1         hang_up
7   1         voicemail
8   2         answered

In the example, the first criterion is the number of calls (nb_calls), and the second is the status (outcome_status) in this order: answered, hang_up, voicemail and cancelled.

What is the best way to sort like this?



Solution 1:[1]

Let's first create a DataFrame with random data and an extra Name column, just to experiment

import numpy as np
import pandas as pd

# Start from an empty frame, then fill each column with 50 random values
df = pd.DataFrame(columns=["nb_calls", "outcome_status"]).reset_index(drop=True)
df['nb_calls'] = np.random.randint(0,7,size=50)
df['outcome_status'] = np.random.choice(["answered", "hang_up", "voicemail", "cancelled"], size=50)
df['Name'] = np.random.choice(["John", "Jane", "Mary", "Bob", "Tom", "Jack", "Sue", "Linda", "Peter"], size=50)

Next sort the dataframe by nb_calls

df.sort_values(by=['nb_calls'], inplace=True)

Next create a sort order dictionary

sort_dic = {'answered': 0, 'hang_up': 1, 'voicemail': 2, 'cancelled': 3}

Create a new column outcome_status_label

df['outcome_status_label'] = df['outcome_status'].map(sort_dic)
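One thing worth checking at this step: Series.map with a dictionary leaves NaN for any status that is not a key of the dictionary, for instance the hung_up spelling used in the enum from the question. The sketch below uses a made-up statuses Series (not data from the question) to show how such values could be spotted before sorting.

```python
import pandas as pd

sort_dic = {'answered': 0, 'hang_up': 1, 'voicemail': 2, 'cancelled': 3}

# Hypothetical sample data; 'hung_up' is deliberately not a key of sort_dic.
statuses = pd.Series(['answered', 'hung_up', 'voicemail', 'cancelled'])

labels = statuses.map(sort_dic)

# Statuses that are missing from the dictionary end up as NaN after map().
unmapped = list(statuses[labels.isna()].unique())
print(unmapped)
```

By default sort_values places NaN labels last, so unmapped statuses would silently sink to the bottom of each group rather than raise an error.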

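Group by nb_calls and sort each group by outcome_status_label using apply with a lambda. Assign the result back to df, because apply returns a new DataFrame instead of modifying df in place

df = df.groupby(['nb_calls']).apply(lambda x: x.sort_values(by=['outcome_status_label'])).reset_index(drop=True)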
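Since the frame only needs to be ordered, the same ordering by both keys can probably be reached without groupby, by passing both columns to sort_values in one call. This is a swapped-in alternative, not the answer's own method. A minimal sketch on a small hand-made sample (the rows are hypothetical; the column names follow the answer):

```python
import pandas as pd

sort_dic = {'answered': 0, 'hang_up': 1, 'voicemail': 2, 'cancelled': 3}

# Hypothetical sample rows, using the same column names as the answer.
df = pd.DataFrame({
    'nb_calls': [2, 1, 0, 1, 1, 1],
    'outcome_status': ['answered', 'voicemail', 'cancelled',
                       'hang_up', 'answered', 'answered'],
})
df['outcome_status_label'] = df['outcome_status'].map(sort_dic)

# Sort by number of calls first, then by status rank among equal call counts.
result = (
    df.sort_values(by=['nb_calls', 'outcome_status_label'])
      .drop(columns=['outcome_status_label'])
      .reset_index(drop=True)
)
print(result)
```

Listing the columns in by from most to least important gives the "first criterion, then second criterion" behaviour described in the question.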

Done. Drop the outcome_status_label column since you don't need it anymore

df.drop(columns=['outcome_status_label'], inplace=True)
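The question also mentions NumPy. As a rough sketch of how the same two-key ordering might look with plain arrays, np.lexsort can be used; note that it takes its keys in reverse order of importance, so the primary key goes last. The ids, nb_calls and statuses arrays below are made-up sample data, and the status-to-rank dictionary is the one from this answer.

```python
import numpy as np

sort_dic = {'answered': 0, 'hang_up': 1, 'voicemail': 2, 'cancelled': 3}

# Hypothetical sample data, one entry per phone number.
ids = np.array([8, 7, 4, 3, 1, 2])
nb_calls = np.array([2, 1, 0, 1, 1, 1])
statuses = np.array(['answered', 'voicemail', 'cancelled',
                     'hang_up', 'answered', 'answered'])

# Translate each status string into its numeric rank.
ranks = np.array([sort_dic[s] for s in statuses])

# np.lexsort treats the LAST key as the primary one:
# nb_calls decides first, ranks break ties.
order = np.lexsort((ranks, nb_calls))

sorted_ids = ids[order]
print(sorted_ids)
```

Indexing any other per-phone array with order (for example statuses[order]) rearranges it consistently with sorted_ids.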

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kyriakos