How do I extract specific values from a %%timeit report for plotting with matplotlib?

How do I pick specific values out of the result of running %%timeit? I'm struggling to find a way to visualize loop times. I found an excellent YouTube video showing how to significantly speed up pandas, essentially comparing vectorization to apply and iterrows. The results are astonishing, but I need a way to load the mean/std of each method's run times directly into a df instead of entering the values manually.

I tried adding '-o' after %%timeit, but that does not appear to give me the level of detail I need.

This is the result from running %%timeit:

2.56 ms ± 72.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is the result from running %%timeit -o:

<TimeitResult : 2.56 ms ± 72.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>

Subsequently, I'd like to load the mean and std values directly into the below df:

results = pd.DataFrame(
[['Loop', mean_std_values_from_loop_timeit],
['Apply', mean_std_values_from_apply_timeit],
['Vectorized', mean_std_values_from_vectorized_timeit]],
columns=['Type', 'Mean'],
)

results.set_index('Type')['Mean'].plot(kind='bar', title='Time to run Reward Calc')

I did import numpy, pandas and, of course, matplotlib.pyplot, so this is not a visualization problem but one of extracting details from %%timeit. Any support appreciated ;o)



Solution 1:[1]

import pandas as pd
import numpy as np
import timeit
import matplotlib.pyplot as plt

def get_data(size=10000):
    df = pd.DataFrame()
    df['col1'] = np.random.randint(0, 100, size)
    df['col2'] = np.random.randint(0, 100, size)
    df['col3'] = np.random.rand(size)*100
    df['col4'] = np.random.choice(['one', 'two', 'three'], size)
    df['col5'] = np.random.choice(['four', 'five', 'six'], size)
    return df

print(get_data(size=10000))

def reward_calc(row):
    if row['col1'] >= 90:
        return row['col4']
    if row['col2'] > 5 and row['col3'] > 0.5:
        return row['col4']
    return row['col5']

# Instead of %%timeit, time each approach once with timeit.default_timer()

df = get_data()

start_time = timeit.default_timer()
for index, row in df.iterrows():
    df.loc[index, 'reward'] = reward_calc(row)
time_1 = timeit.default_timer() - start_time

print('Function 1 took', time_1)

df = get_data()

start_time = timeit.default_timer()
df['reward'] = df.apply(reward_calc, axis=1)
time_2 = timeit.default_timer() - start_time

print('Function 2 took', time_2)

df = get_data()

start_time = timeit.default_timer()
# Vectorized: default to col5, then overwrite the rows that meet the reward condition
df['reward'] = df['col5']
# Use >= 90 so the condition matches reward_calc above
df.loc[((df['col3'] > 0.5) & (df['col2'] > 5)) | (df['col1'] >= 90), 'reward'] = df['col4']
time_3 = timeit.default_timer() - start_time

print('Function 3 took', time_3)

results = pd.DataFrame(
[['Loop', time_1],
['Apply', time_2],
['Vectorized', time_3]],
columns=['Type', 'Mean'],
)

results.set_index('Type')['Mean'].plot(kind='bar', title='Time to run Reward Calc')
plt.show()
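
The code above times each approach only once, so the 'Mean' column holds a single measurement and there is no std. If you want mean and std without typing them in by hand, one option is timeit.repeat from the standard library, sketched below. The helper name time_stats and the three stand-in workloads are hypothetical and only there to keep the example self-contained; swap in your own loop/apply/vectorized functions. As far as I know, in IPython the object returned by %timeit -o also exposes average and stdev attributes, which could fill the same columns.

```python
import statistics
import timeit

import pandas as pd


def time_stats(func, repeat=7, number=100):
    """Return (mean, std) of the per-loop run time of func, in seconds."""
    # timeit.repeat returns the total time of `number` calls, once per run
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    # Population standard deviation across the runs
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Hypothetical stand-in workloads (not the reward calculation itself)
def loop_version():
    total = 0
    for i in range(1000):
        total += i
    return total


def builtin_version():
    return sum(range(1000))


def formula_version():
    return 999 * 1000 // 2


candidates = [
    ('Loop', loop_version),
    ('Built-in', builtin_version),
    ('Formula', formula_version),
]

# Collect one row per approach: name, mean and std of the per-loop time
rows = []
for name, func in candidates:
    mean, std = time_stats(func, repeat=5, number=50)
    rows.append([name, mean, std])

results = pd.DataFrame(rows, columns=['Type', 'Mean', 'Std'])
print(results)
```

From there the plot call from the question works unchanged on the 'Mean' column, and passing the 'Std' column as yerr to plot should add error bars.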

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Newbie