How do I extract specific values from a timeit report for visualization in plt?
How do I pick specific values from the result of running %%timeit? I'm struggling to find a way to visualize loop times. I found an excellent YouTube video showing how to significantly speed up pandas performance, essentially comparing vectorization with apply and iterrows. While the result remains astonishing, I need a way to load the mean and std of each method's run time directly into a df, instead of having to enter the values in the df manually.
I tried adding '-o' after %%timeit, but on its own that does not appear to give me the level of detail I need.
This is the result from running %%timeit:
2.56 ms ± 72.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This is the result from running %%timeit -o:
<TimeitResult : 2.56 ms ± 72.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)>
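What `-o` adds is that the `TimeitResult` shown above is returned rather than only printed. It can be kept with the line magic (for example `res = %timeit -o df.apply(reward_calc, axis=1)`) or picked up as `_` in the cell after a `%%timeit -o` cell. Its `.average` and `.stdev` attributes hold the mean and standard deviation in seconds per loop, `.all_runs` holds the total time of each run and `.loops` the number of loops per run. Outside IPython, the same two numbers can be computed with the standard-library `timeit` module. The sketch below assumes IPython's convention (per-loop time = run total / loops, population standard deviation); `time_stats` and `namespace` are hypothetical names.

```python
import statistics
import timeit


def time_stats(stmt, repeat=7, number=100, namespace=None):
    """Return (mean, std) of the per-loop time in seconds, like %timeit's summary line.

    `stmt` is executed `number` times in each of `repeat` runs; a run's per-loop
    time is its total time divided by `number`.
    """
    totals = timeit.repeat(stmt, repeat=repeat, number=number, globals=namespace)
    per_loop = [total / number for total in totals]
    # pstdev (population std. dev.) matches how IPython's TimeitResult.stdev is computed
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


mean, std = time_stats("sum(range(1000))")
print(f"{mean * 1e6:.2f} µs ± {std * 1e6:.2f} µs per loop "
      f"(mean ± std. dev. of 7 runs, 100 loops each)")
```

Passing `namespace=globals()` lets `stmt` refer to names such as `df` and `reward_calc` that are defined in the notebook.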
I'd then like to load the mean and std values directly into the df below:
results = pd.DataFrame(
    [['Loop', mean_std_values_from_loop_timeit],
     ['Apply', mean_std_values_from_apply_timeit],
     ['Vectorized', mean_std_values_from_vectorized_timeit]],
    columns=['Type', 'Mean'],
)
results.set_index('Type')['Mean'].plot(kind='bar', title='Time to run Reward Calc')
I did import numpy, pandas and, of course, matplotlib.pyplot, so this is not a visualization problem but a matter of extracting the details from %%timeit. Any support appreciated ;o)
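Once the mean and standard deviation are available as plain numbers, they can go into the DataFrame as two columns, and the std. dev. can be drawn as error bars through the `yerr` argument of pandas' `plot`. A minimal sketch, assuming each method's result is a (mean, std) pair in seconds, for example `(res.average, res.stdev)` from a `TimeitResult`; `results_frame` and `plot_results` are hypothetical helper names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import pandas as pd


def results_frame(stats):
    """Turn {method name: (mean, std)} into a DataFrame with Type, Mean and Std columns."""
    return pd.DataFrame(
        [[name, mean, std] for name, (mean, std) in stats.items()],
        columns=["Type", "Mean", "Std"],
    )


def plot_results(results):
    """Bar chart of the mean time per method, with the std. dev. as error bars."""
    indexed = results.set_index("Type")
    ax = indexed["Mean"].plot(kind="bar", yerr=indexed["Std"], capsize=4,
                              title="Time to run Reward Calc")
    ax.set_ylabel("seconds per loop")
    return ax
```

In the notebook this would be called as `plot_results(results_frame({'Loop': (res_loop.average, res_loop.stdev), 'Apply': (res_apply.average, res_apply.stdev), 'Vectorized': (res_vec.average, res_vec.stdev)}))` followed by `plt.show()`, where `res_loop`, `res_apply` and `res_vec` are hypothetical names for the three `-o` results.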
Solution 1:[1]
import pandas as pd
import numpy as np
import timeit
import matplotlib.pyplot as plt
def get_data(size=10000):
    # Build a random test DataFrame with `size` rows
    df = pd.DataFrame()
    df['col1'] = np.random.randint(0, 100, size)
    df['col2'] = np.random.randint(0, 100, size)
    df['col3'] = np.random.rand(size) * 100
    df['col4'] = np.random.choice(['one', 'two', 'three'], size)
    df['col5'] = np.random.choice(['four', 'five', 'six'], size)
    return df

print(get_data(size=10000))
def reward_calc(row):
    # Row-wise rule: col4 if col1 >= 90, or if col2 > 5 and col3 > 0.5; otherwise col5
    if row['col1'] >= 90:
        return row['col4']
    if row['col2'] > 5 and row['col3'] > 0.5:
        return row['col4']
    return row['col5']
# Each approach is timed once with timeit.default_timer (a single run, so no std. dev.)

# 1) Loop with iterrows
df = get_data()
start_time = timeit.default_timer()
for index, row in df.iterrows():
    df.loc[index, 'reward'] = reward_calc(row)
time_1 = timeit.default_timer() - start_time
print('Function 1 took', time_1)

# 2) apply along rows
df = get_data()
start_time = timeit.default_timer()
df['reward'] = df.apply(reward_calc, axis=1)
time_2 = timeit.default_timer() - start_time
print('Function 2 took', time_2)

# 3) Vectorized boolean mask (same rule as reward_calc, including col1 >= 90)
df = get_data()
start_time = timeit.default_timer()
df['reward'] = df['col5']
df.loc[((df['col3'] > 0.5) & (df['col2'] > 5)) | (df['col1'] >= 90), 'reward'] = df['col4']
time_3 = timeit.default_timer() - start_time
print('Function 3 took', time_3)

results = pd.DataFrame(
    [['Loop', time_1],
     ['Apply', time_2],
     ['Vectorized', time_3]],
    columns=['Type', 'Mean'],
)
results.set_index('Type')['Mean'].plot(kind='bar', title='Time to run Reward Calc')
plt.show()
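The answer above times each approach once, so there is no standard deviation to plot. A minimal sketch of extending the same `timeit.default_timer` approach to several runs, assuming each approach is wrapped in a function that takes the input DataFrame. A fresh input is built before every run, outside the timed region, so that data creation is not timed and every run starts from untouched data (the loop and apply versions write a `reward` column into `df`). `benchmark`, `funcs` and `make_input` are hypothetical names, and `pstdev` is the population standard deviation.

```python
import statistics
import timeit

import pandas as pd


def benchmark(funcs, make_input, repeat=7):
    """Time each function `repeat` times, each time on freshly built input.

    funcs: {label: callable taking the input}; make_input: zero-argument callable.
    Returns a DataFrame with Type, Mean and Std columns (seconds per run).
    """
    rows = []
    for label, func in funcs.items():
        times = []
        for _ in range(repeat):
            data = make_input()  # built outside the timed region
            start = timeit.default_timer()
            func(data)
            times.append(timeit.default_timer() - start)
        rows.append([label, statistics.mean(times), statistics.pstdev(times)])
    return pd.DataFrame(rows, columns=["Type", "Mean", "Std"])
```

With the answer's code this could be called as `results = benchmark({'Loop': loop_version, 'Apply': apply_version, 'Vectorized': vectorized_version}, get_data)`, where the three `*_version` functions are hypothetical wrappers around the timed snippets, each taking `df` as its argument. Because the result keeps the `Type` and `Mean` columns, the answer's plotting line works on it unchanged.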
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Newbie |
