'Power BI Python Script truncates dataframe after first use
When I run a script normally on Python, it runs smoothly, however, when running the same script on Power Bi, it does not work as expected.
Hi, I have the following python script that runs perfectly on VS Code:
# 'dataset' holds the input data for this script
import pandas as pd
import numpy as np
dataset = pd.read_csv('PBI_Test.csv')
used = dataset[dataset.PPSNU_1D_Qty > 0].sort_values(by='luci_price',ascending=False)[['service_order_id','item_product_id','Part_Type']].groupby(by='service_order_id')[['item_product_id','Part_Type']].agg(' | '.join).reset_index()
used.columns = ['service_order_id','used_item_id','used_part_type']
req = dataset[dataset.PPSNR_1D_Qty > 0].sort_values(by='luci_price',ascending=False)[['service_order_id','item_product_id','Part_Type']].groupby(by='service_order_id')[['item_product_id','Part_Type']].agg(' | '.join).reset_index()
req.columns = ['service_order_id','req_item_id','req_part_type']
final = pd.merge(left=used,right=req,on='service_order_id')
final
however, when I use the same exact script on Power Bi to manipulate the dataset, it gives me the following error:
ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
I know it happens because "req" does not have a Part_Type column but I don't understand why.
req.columns = ['service_order_id','req_item_id','req_part_type']
Any help would be greatly appreciated.
Solution 1:[1]
I figured it out. I suppose the group by join with a column with null values which this dataframe had:
req = dataset[dataset.PPSNR_1D_Qty > 0]
was causing issues. so I dropped any null values in Part_Type and it worked.
dataset = dataset[dataset.part_type.notna()]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rodriguez |
