'Power BI Python Script truncates dataframe after first use

When I run a script normally on Python, it runs smoothly, however, when running the same script on Power Bi, it does not work as expected.

Hi, I have the following python script that runs perfectly on VS Code:

# 'dataset' holds the input data for this script

import pandas as pd
import numpy as np

dataset = pd.read_csv('PBI_Test.csv')

used = dataset[dataset.PPSNU_1D_Qty > 0].sort_values(by='luci_price',ascending=False)[['service_order_id','item_product_id','Part_Type']].groupby(by='service_order_id')[['item_product_id','Part_Type']].agg(' | '.join).reset_index()
used.columns = ['service_order_id','used_item_id','used_part_type']

req = dataset[dataset.PPSNR_1D_Qty > 0].sort_values(by='luci_price',ascending=False)[['service_order_id','item_product_id','Part_Type']].groupby(by='service_order_id')[['item_product_id','Part_Type']].agg(' | '.join).reset_index()
req.columns = ['service_order_id','req_item_id','req_part_type']

final = pd.merge(left=used,right=req,on='service_order_id')
final

however, when I use the same exact script on Power Bi to manipulate the dataset, it gives me the following error:

ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements

I know it happens because "req" does not have a Part_Type column but I don't understand why.

req.columns = ['service_order_id','req_item_id','req_part_type']

Any help would be greatly appreciated.



Solution 1:[1]

I figured it out. I suppose the group by join with a column with null values which this dataframe had:

req = dataset[dataset.PPSNR_1D_Qty > 0]

was causing issues. so I dropped any null values in Part_Type and it worked.

dataset = dataset[dataset.part_type.notna()]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rodriguez