'How to make a python program that calculates a result for each row of the input table?
I am trying to make a python program that will calculate a result based on a formula, given factors and input data table. I am having troubles making it and I couldn't find a solution on my own nor a similar problem on SO. I hope someone can help!
Below I am showing my sample data in order to understand if I am doing something wrong.
My input data looks like this. I have a number of cars (N_cars) on a given length of the road (l) and their average speed (v).
input_columns = ['l', 'N_cars', 'v']
input_data = [[3.5, 1000, 100], [5.7, 500, 110],
[10, 367, 110], [11.1, 1800, 95],
[2.8, 960, 105], [4.7, 800, 120],
[10.4, 103, 111], [20.1, 1950, 115]]
input_df = pd.DataFrame(input_data, columns=input_columns)
input_df
l N_cars v
0 3.5 1000 100
1 5.7 500 110
2 10.0 367 110
3 11.1 1800 95
4 2.8 960 105
5 4.7 800 120
6 10.4 103 111
7 20.1 1950 115
I also know the factors needed for the formula for each category of cars, and I know the % of each category of total category. I also have different options for each category (3 options that I have here are just an example, there are many more options).
factors_columns = ['category', 'category %', 'option', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
factors_data = [['A', 58, 'opt_1', 0.000011, 0.23521, 0.93847, 0.39458, 0.00817, 0.24566, 0.0010, 0],
['A', 58, 'opt_2', 0.000011, 0.23521, 0.93145, 0.39458, 0.00467, 0.24566, 0.0010, 0],
['A', 58, 'opt_3', 0.000011, 0.23521, 0.93145, 0.39458, 0.00467, 0.24566, 0.0010, 0],
['B', 22, 'opt_1', 0.002452, 0.48327, 0.83773, 0.92852, 0.00871, 0.29568, 0.0009, 0.02],
['B', 22, 'opt_2', 0.002899, 0.49327, 0.83773, 0.92852, 0.00871, 0.30468, 0.0009, 0.02],
['B', 22, 'opt_3', 0.002452, 0.48327, 0.83773, 0.92852, 0.00771, 0.29568, 0.0119, 0.01],
['C', 17, 'opt_1', 0.082583, 0.39493, 0.02462, 0.82714, 0.00918, 0.28572, 0.0012, 0],
['C', 17, 'opt_2', 0.072587, 0.35493, 0.02852, 0.82723, 0.00912, 0.29572, 0.0018, 0],
['C', 17, 'opt_3', 0.082583, 0.39493, 0.02852, 0.82714, 0.00962, 0.28572, 0.0012, 0.01],
['D', 3, 'opt_1', 0.018327, 0.32342, 0.82529, 0.92752, 0.00988, 0.21958, 0.0016, 0],
['D', 3, 'opt_2', 0.014427, 0.32342, 0.82729, 0.92752, 0.00968, 0.22558, 0.0026, 0],
['D', 3, 'opt_3', 0.018327, 0.32342, 0.82729, 0.94452, 0.00988, 0.21258, 0.0016, 0]]
factors_df = pd.DataFrame(factors_data, columns=factors_columns)
factors_df
category category % option a b c d e f g h
0 A 58 opt_1 0.000011 0.23521 0.93847 0.39458 0.00817 0.24566 0.0010 0.00
1 A 58 opt_2 0.000011 0.23521 0.93145 0.39458 0.00467 0.24566 0.0010 0.00
2 A 58 opt_3 0.000011 0.23521 0.93145 0.39458 0.00467 0.24566 0.0010 0.00
3 B 22 opt_1 0.002452 0.48327 0.83773 0.92852 0.00871 0.29568 0.0009 0.02
4 B 22 opt_2 0.002899 0.49327 0.83773 0.92852 0.00871 0.30468 0.0009 0.02
5 B 22 opt_3 0.002452 0.48327 0.83773 0.92852 0.00771 0.29568 0.0119 0.01
6 C 17 opt_1 0.082583 0.39493 0.02462 0.82714 0.00918 0.28572 0.0012 0.00
7 C 17 opt_2 0.072587 0.35493 0.02852 0.82723 0.00912 0.29572 0.0018 0.00
8 C 17 opt_3 0.082583 0.39493 0.02852 0.82714 0.00962 0.28572 0.0012 0.01
9 D 3 opt_1 0.018327 0.32342 0.82529 0.92752 0.00988 0.21958 0.0016 0.00
10 D 3 opt_2 0.014427 0.32342 0.82729 0.92752 0.00968 0.22558 0.0026 0.00
11 D 3 opt_3 0.018327 0.32342 0.82729 0.94452 0.00988 0.21258 0.0016 0.00
GOAL: For each option (opt_1, opt_2, opt_3), I have to calculate the result based on this formula (factors are taken from the factors table, but v is coming from the input table):
formula = ( (a*v*v) + (b*v) + c + (d/v) ) / ( (e*v*v) + (f*v) + g) * (1 - h)
result = l * N_cars * formula
BUT, I have to take into account the % of each category of cars. For each row of the input_df I have to make calculations 3 times, for each of the 3 options. For example, for the index 0 of input_df, I have 1000 cars, v=100 and l=3.5, the output should be something like this:
for opt_1:
result = 3.5 * 1000 * ( (58% of category A {formula for index 0 of factors_df}) +
(22% of category B {formula for index 3 of factors_df) +
(17% of category C {formula for index 6 of factors_df}) +
(3% of category D {formula for index 9 of factors_df}) )
for opt_2:
result = 3.5 * 1000 * ( (58% of category A {formula for index 1 of factors_df}) +
(22% of category B {formula for index 4 of factors_df) +
(17% of category C {formula for index 7 of factors_df}) +
(3% of category D {formula for index 10 of factors_df}) )
for opt_3:
result = 3.5 * 1000 * ( (58% of category A {formula for index 2 of factors_df}) +
(22% of category B {formula for index 5 of factors_df) +
(17% of category C {formula for index 8 of factors_df}) +
(3% of category D {formula for index 11 of factors_df}) )
So, as an output, for each of the rows in input_df, I should have 3 results, each for one of the 3 options.
I can do the calculation manually for each step, but what I am having troubles with is to make a loop that does it automatically for each input row and all 3 options and then passes to the next input row and so on until the last input row.
If anyone has an idea how to solve this easily, please do help!
Solution 1:[1]
Here is the code I wrote. It's somewhat long but it works. May be you (or someone) can modify and make it shorter.
# Transforming factors_df
df = factors_df.pivot(columns=["category", "option"])
df.reset_index(inplace=True)
# Renaming column names for each combination of option and category
df.columns = [s3 + s2 + s1 for (s1, s2, s3) in df.columns.to_list()]
df.drop(columns=["index"], inplace=True)
# Flattening to a single row to be able to apply formula
df = pd.DataFrame(df.max()).T
# Merging input with transformed factors data
input_df["tmp"] = 1
df["tmp"] = 1
df = pd.merge(input_df, df, on="tmp", how="left")
df.drop("tmp", axis=1, inplace=True)
# Calculating values for opt_1 using the formula
df["opt_1_value"] = (
df["l"]
* df["N_cars"]
* (
(
df["opt_1Acategory %"]
/ 100
* (
df["opt_1Aa"] * df["v"] * df["v"]
+ df["opt_1Ab"] * df["v"]
+ df["opt_1Ac"]
+ df["opt_1Ad"] / df["v"]
)
/ (
(
df["opt_1Ae"] * df["v"] * df["v"]
+ df["opt_1Af"] * df["v"]
+ df["opt_1Ag"]
)
* (1 - df["opt_1Ah"])
)
)
+ (
df["opt_1Bcategory %"]
/ 100
* (
df["opt_1Ba"] * df["v"] * df["v"]
+ df["opt_1Bb"] * df["v"]
+ df["opt_1Bc"]
+ df["opt_1Bd"] / df["v"]
)
/ (
(
df["opt_1Be"] * df["v"] * df["v"]
+ df["opt_1Bf"] * df["v"]
+ df["opt_1Bg"]
)
* (1 - df["opt_1Bh"])
)
)
+ (
df["opt_1Ccategory %"]
/ 100
* (
df["opt_1Ca"] * df["v"] * df["v"]
+ df["opt_1Cb"] * df["v"]
+ df["opt_1Cc"]
+ df["opt_1Cd"] / df["v"]
)
/ (
(
df["opt_1Ce"] * df["v"] * df["v"]
+ df["opt_1Cf"] * df["v"]
+ df["opt_1Cg"]
)
* (1 - df["opt_1Ch"])
)
)
)
)
# Calculating values for opt_2 using the formula
df["opt_2_value"] = (
df["l"]
* df["N_cars"]
* (
(
df["opt_2Acategory %"]
/ 100
* (
df["opt_2Aa"] * df["v"] * df["v"]
+ df["opt_2Ab"] * df["v"]
+ df["opt_2Ac"]
+ df["opt_2Ad"] / df["v"]
)
/ (
(
df["opt_2Ae"] * df["v"] * df["v"]
+ df["opt_2Af"] * df["v"]
+ df["opt_2Ag"]
)
* (1 - df["opt_2Ah"])
)
)
+ (
df["opt_2Bcategory %"]
/ 100
* (
df["opt_2Ba"] * df["v"] * df["v"]
+ df["opt_2Bb"] * df["v"]
+ df["opt_2Bc"]
+ df["opt_2Bd"] / df["v"]
)
/ (
(
df["opt_2Be"] * df["v"] * df["v"]
+ df["opt_2Bf"] * df["v"]
+ df["opt_2Bg"]
)
* (1 - df["opt_2Bh"])
)
)
+ (
df["opt_2Ccategory %"]
/ 100
* (
df["opt_2Ca"] * df["v"] * df["v"]
+ df["opt_2Cb"] * df["v"]
+ df["opt_2Cc"]
+ df["opt_2Cd"] / df["v"]
)
/ (
(
df["opt_2Ce"] * df["v"] * df["v"]
+ df["opt_2Cf"] * df["v"]
+ df["opt_2Cg"]
)
* (1 - df["opt_2Ch"])
)
)
)
)
# Calculating values for opt_3 using the formula
df["opt_3_value"] = (
df["l"]
* df["N_cars"]
* (
(
df["opt_3Acategory %"]
/ 100
* (
df["opt_3Aa"] * df["v"] * df["v"]
+ df["opt_3Ab"] * df["v"]
+ df["opt_3Ac"]
+ df["opt_3Ad"] / df["v"]
)
/ (
(
df["opt_3Ae"] * df["v"] * df["v"]
+ df["opt_3Af"] * df["v"]
+ df["opt_3Ag"]
)
* (1 - df["opt_3Ah"])
)
)
+ (
df["opt_3Bcategory %"]
/ 100
* (
df["opt_3Ba"] * df["v"] * df["v"]
+ df["opt_3Bb"] * df["v"]
+ df["opt_3Bc"]
+ df["opt_3Bd"] / df["v"]
)
/ (
(
df["opt_3Be"] * df["v"] * df["v"]
+ df["opt_3Bf"] * df["v"]
+ df["opt_3Bg"]
)
* (1 - df["opt_3Bh"])
)
)
+ (
df["opt_3Ccategory %"]
/ 100
* (
df["opt_3Ca"] * df["v"] * df["v"]
+ df["opt_3Cb"] * df["v"]
+ df["opt_3Cc"]
+ df["opt_3Cd"] / df["v"]
)
/ (
(
df["opt_3Ce"] * df["v"] * df["v"]
+ df["opt_3Cf"] * df["v"]
+ df["opt_3Cg"]
)
* (1 - df["opt_3Ch"])
)
)
)
)
# Removing unnecessary columns
df_final = df[["l", "N_cars", "v", "opt_1_value", "opt_2_value", "opt_3_value"]]
print(df_final)
Output:
l N_cars v opt_1_value opt_2_value opt_3_value
0 3.5 1000 100 1496.002370 1420.656629 1534.748740
1 5.7 500 110 750.997279 710.944885 767.411691
2 10.0 367 110 551.157686 521.754019 562.906668
3 11.1 1800 95 2685.551348 2554.477141 2756.164589
4 2.8 960 105 1439.467965 1364.815604 1475.082027
5 4.7 800 120 1206.116125 1138.614075 1229.225287
6 10.4 103 111 154.744048 146.445615 157.990346
7 20.1 1950 115 2933.825622 2773.297776 2990.828374
Solution 2:[2]
Another way to do it, not nearly as elegand as @ddejhon 's solution, tho:
def formula(input_index, factors_index):
formula = ((factors_df.loc[factors_index,'a']*input_df['v'][input_index]**2)+
(factors_df.loc[factors_index,'b']*input_df['v'][input_index])+
(factors_df.loc[factors_index,'c'])+
(factors_df.loc[factors_index,'d']/input_df['v'][input_index])
)/(
(factors_df.loc[factors_index,'e']*input_df['v'][input_index]**2)+
(factors_df.loc[factors_index,'f']*input_df['v'][input_index])+
(factors_df.loc[factors_index,'g'])
)*(1-factors_df.loc[factors_index,'h'])
return formula
index_list = [factors_df[factors_df['option'] == opt].index.tolist() for opt in factors_df['option'].unique().tolist()]
Edit1: got rid of that ugly nested for structure and replaced it with list comprehension
output_df = pd.DataFrame(np.repeat(input_df.values, len(factors_df['option'].unique()), axis=0))
output_df.columns = input_df.columns
output_df['option'] = factors_df['option'].unique().tolist()*len(input_df.index)
output_df['formula'] = [n for sub_list in [[sum(factors_df['category %'].unique()[k]/100 * formula(i,j[k])
for k in range(len(factors_df['category'].unique())))
for j in index_list] for i in input_df.index] for n in sub_list]
output_df['result'] = output_df['l'] * output_df['N_cars'] * output_df['formula']
Output:
output_df
l N_cars v option formula result
0 3.5 1000.0 100.0 opt_1 1.546196 5411.685077
1 3.5 1000.0 100.0 opt_2 1.461442 5115.048256
2 3.5 1000.0 100.0 opt_3 1.571710 5500.985916
3 5.7 500.0 110.0 opt_1 1.552751 4425.339734
4 5.7 500.0 110.0 opt_2 1.463121 4169.893681
5 5.7 500.0 110.0 opt_3 1.573192 4483.595803
6 10.0 367.0 110.0 opt_1 1.552751 5698.595376
7 10.0 367.0 110.0 opt_2 1.463121 5369.652565
8 10.0 367.0 110.0 opt_3 1.573192 5773.612841
9 11.1 1800.0 95.0 opt_1 1.542578 30820.717985
10 11.1 1800.0 95.0 opt_2 1.460466 29180.106606
11 11.1 1800.0 95.0 opt_3 1.570810 31384.785443
12 2.8 960.0 105.0 opt_1 1.549580 4165.270216
13 2.8 960.0 105.0 opt_2 1.462324 3930.726187
14 2.8 960.0 105.0 opt_3 1.572499 4226.877893
15 4.7 800.0 120.0 opt_1 1.558526 5860.057879
16 4.7 800.0 120.0 opt_2 1.464497 5506.509637
17 4.7 800.0 120.0 opt_3 1.574334 5919.496692
18 10.4 103.0 111.0 opt_1 1.553361 1663.960420
19 10.4 103.0 111.0 opt_2 1.463271 1567.455541
20 10.4 103.0 111.0 opt_3 1.573319 1685.339848
21 20.1 1950.0 115.0 opt_1 1.555727 60976.735053
22 20.1 1950.0 115.0 opt_2 1.463842 57375.300546
23 20.1 1950.0 115.0 opt_3 1.573800 61685.075902
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ddejohn |
| Solution 2 |
