Category "pandas"

When one of my DataFrame columns contains nested lists, how should I transform it into a multi-dimensional np.array?

I have the following DataFrame:

    test = {"a": [[[1, 2], [3, 4]], [[1, 2], [3, 4]]], "b": [[[1, 2], [3, 6]], [[1, 2], [3, 4]]]}
    df = pd.DataFrame(test)
    df
       a  b
    0…
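A minimal sketch, assuming every cell in a column holds a nested list of the same shape (2 x 2 here). Turning the column into a plain Python list and passing it to np.array gives an array whose first axis indexes the rows; stacking the per-column arrays gives one array for the whole frame.

```python
import numpy as np
import pandas as pd

test = {
    "a": [[[1, 2], [3, 4]], [[1, 2], [3, 4]]],
    "b": [[[1, 2], [3, 6]], [[1, 2], [3, 4]]],
}
df = pd.DataFrame(test)

# One column -> shape (n_rows, 2, 2); only works if all cells share a shape.
a = np.array(df["a"].tolist())

# All columns -> shape (n_rows, n_cols, 2, 2), columns on axis 1.
full = np.stack([np.array(df[col].tolist()) for col in df.columns], axis=1)

print(a.shape)     # (2, 2, 2)
print(full.shape)  # (2, 2, 2, 2)
```

If the cells have ragged shapes, np.array cannot build a regular numeric array, so the shapes need to be checked or padded first.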

Filter rows in dataframe based on value counts [duplicate]

I have a large DataFrame/questionnaire df (871 x 24) containing a column named "Identifier", which stores a unique ID for each of the participa…
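The filtering rule is cut off in the excerpt, so this sketch assumes the goal is to keep only rows whose "Identifier" occurs at least min_count times; the stand-in data and the threshold are hypothetical. Mapping each ID to its count produces a boolean mask aligned with the rows.

```python
import pandas as pd

# Hypothetical stand-in for the questionnaire; only "Identifier" matters here.
df = pd.DataFrame({
    "Identifier": ["p1", "p2", "p1", "p3", "p2", "p1"],
    "answer": [5, 3, 4, 2, 1, 5],
})

min_count = 2  # assumed threshold: keep IDs that occur at least twice

# Replace each ID by how often it occurs, then filter with the resulting mask.
counts = df["Identifier"].map(df["Identifier"].value_counts())
filtered = df[counts >= min_count]

print(filtered)
```

An equivalent mask is df.groupby("Identifier")["Identifier"].transform("size") >= min_count.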

Save multiple/distinct .CSV files after for loop execution

I have 65 XML files that I need to convert to CSV, saving each converted file as a separate .csv file. I have tried using a for loop but am not having any lu…
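One way to get a distinct output file per input is to derive the CSV name from each XML file's stem inside the loop. This sketch assumes pandas 1.3 or newer (for pd.read_xml) and that each XML file is a flat list of records that read_xml can parse with its default XPath; convert_folder and the folder names are hypothetical.

```python
from pathlib import Path

import pandas as pd


def convert_folder(src_dir, dst_dir):
    """Convert every *.xml file in src_dir to a CSV with the same stem in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for xml_path in sorted(src.glob("*.xml")):
        # parser="etree" uses the standard library, so lxml is not required.
        df = pd.read_xml(xml_path, parser="etree")
        out_path = dst / f"{xml_path.stem}.csv"  # distinct name per input file
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

Usage: convert_folder("xml_in", "csv_out") would write xml_in/report1.xml to csv_out/report1.csv, and so on for all 65 files.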

Testing in the pandas library: why is function-style testing chosen over class-based testing?

How does function-style testing make testing easier than class-based testing? Is this just library-specific functionality, or are there any ge…
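For contrast, here is the same check written in both styles. pandas runs its test suite with pytest, which collects plain test_* functions and rewrites bare assert statements into detailed failure messages; unittest requires a TestCase subclass and its assert* methods. The function add_total is a hypothetical example under test.

```python
import unittest

import pandas as pd


def add_total(df):
    """Hypothetical function under test: adds a row-wise total column."""
    out = df.copy()
    out["total"] = out.sum(axis=1)
    return out


# Function style (collected by pytest): a plain function with bare asserts.
def test_add_total():
    result = add_total(pd.DataFrame({"x": [1, 2], "y": [3, 4]}))
    expected = pd.DataFrame({"x": [1, 2], "y": [3, 4], "total": [4, 6]})
    pd.testing.assert_frame_equal(result, expected)


# Class style (unittest): the same check wrapped in a TestCase subclass.
class TestAddTotal(unittest.TestCase):
    def test_add_total(self):
        result = add_total(pd.DataFrame({"x": [1, 2], "y": [3, 4]}))
        self.assertEqual(result["total"].tolist(), [4, 6])
```

The function version has less boilerplate and combines directly with pytest features such as fixtures and @pytest.mark.parametrize, which inject inputs as function arguments.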

Groupby and create a dummy that is 1 if a group's column values do not contain 0, and 0 otherwise

My df:

    id  var1
    A   9
    A   0
    A   2
    A   1
    B   2
    B   5
    B   2
    B   1
    C   1
    C   9
    D   7
    D   2
    D   0
    ..

The desired output will ha…
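A minimal sketch using only the rows visible above. The desired output is cut off, so it shows both a per-row dummy, where the group result is broadcast to every row, and a per-group result. The key idea is that "all values are non-zero" is the same as "the group contains no 0".

```python
import pandas as pd

df = pd.DataFrame({
    "id":   ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
    "var1": [9, 0, 2, 1, 2, 5, 2, 1, 1, 9, 7, 2, 0],
})

nonzero = df["var1"].ne(0)  # True where var1 != 0

# Per row: 1 if the row's whole group has no 0, else 0.
df["dummy"] = nonzero.groupby(df["id"]).transform("all").astype(int)

# Per group: one value per id.
per_group = nonzero.groupby(df["id"]).all().astype(int)
print(per_group)
```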

Importing pandas_profiling

I am automating EDA, and when I try to import pandas_profiling, I get an error: ImportError: cannot import name 'soft_unicode' from 'mark…

Pandas Lookup to be deprecated - elegant and efficient alternative

The pandas lookup function will be deprecated in a future version. As the warning suggests, .melt and .loc are recommended as alternatives. df =…
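Besides melt plus loc, a vectorized replacement along the lines of the one in the pandas indexing documentation uses pd.factorize and NumPy fancy indexing. The frame below is an illustrative example: for each row, it picks the value from the column named in "col".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col": ["A", "A", "B", "B"],
    "A": [80, 23, np.nan, 22],
    "B": [80, 55, 76, 67],
})

# Replacement for the deprecated df.lookup(df.index, df["col"]):
# factorize returns, per row, the position of its label among the unique labels.
idx, cols = pd.factorize(df["col"])
# Reorder the columns to match those unique labels, then take one cell per row.
values = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(values)  # [80. 23. 76. 67.]
```

This stays fully vectorized, avoiding the reshaping cost of melting a large frame.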

Use a list of function names to apply iteratively to a DataFrame column

Context: I'm allowing a user to add specific methods to a cleaning pipeline (each choice is appended to a main list of all the chosen methods). Each element from th…
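A common pattern is to keep a registry that maps each user-facing name to a callable, then fold the chosen names over the column in order. The step names and the CLEANERS dictionary below are hypothetical; they stand in for whatever methods the pipeline offers.

```python
import pandas as pd

# Hypothetical registry of cleaning steps, keyed by the names a user can pick.
CLEANERS = {
    "strip": lambda s: s.str.strip(),
    "lower": lambda s: s.str.lower(),
    "collapse_spaces": lambda s: s.str.replace(r"\s+", " ", regex=True),
}


def run_pipeline(series, step_names):
    """Apply the selected cleaning steps to a Series, in the given order."""
    for name in step_names:
        series = CLEANERS[name](series)  # raises KeyError for an unknown name
    return series


df = pd.DataFrame({"city": ["  New   York ", "LONDON", " paris"]})
chosen = ["strip", "collapse_spaces", "lower"]
df["city_clean"] = run_pipeline(df["city"], chosen)
print(df)
```

A dictionary avoids eval or getattr on arbitrary strings, so users can only select steps that were explicitly registered.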

AttributeError: 'numpy.ndarray' object has no attribute 'columns' even after using pandas dataframe

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.tree import DecisionTreeCl…
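The rest of the code is cut off, so the exact cause is unknown. A frequent cause of this error is that some step returns a plain numpy.ndarray, such as .to_numpy() or, by default, a scikit-learn transformer's fit_transform, and .columns is then called on that result. This sketch assumes that situation: it keeps the column names from the original DataFrame and re-wraps the array. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"age": [25, 32, 47], "income": [40, 52, 61]})

# Stand-in for a step that returns a bare ndarray (e.g. a transformer output).
X_arr = (X - X.mean()).to_numpy()
print(hasattr(X_arr, "columns"))  # False: ndarrays have no column labels

# Re-attach the labels from the original DataFrame.
X_df = pd.DataFrame(X_arr, columns=X.columns, index=X.index)
print(X_df.columns.tolist())  # ['age', 'income']
```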

Append new data to an existing DataFrame and upload it to Sheets in Python

I connected to my API's client, sent the credentials, made the request, asked the API for data, and put it into a DataFrame. Then I have to upload this data to a sh…
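A sketch of the pandas side only: combining the existing rows with the newly fetched ones and converting the result to a header-plus-rows list of lists, a shape spreadsheet APIs commonly accept. The actual upload call depends on the client library and is not shown. The column names and values are hypothetical.

```python
import pandas as pd

existing = pd.DataFrame({"date": ["2024-01-01"], "clicks": [10]})
new = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "clicks": [12, 9]})

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported way.
combined = pd.concat([existing, new], ignore_index=True)

# Header row followed by data rows; astype(str) avoids NumPy types that
# JSON-based APIs may refuse to serialize.
values = [combined.columns.tolist()] + combined.astype(str).values.tolist()
```

If the sheet already contains the old rows, uploading only new (converted the same way) and appending below the last row avoids rewriting the whole sheet.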

Python - Hours between two dates, excluding weekends

I'm taking my first steps in the Python programming language. I want to create a script that opens an Excel file and adds an extra column that will be the hourl…
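A minimal sketch, assuming weekends are excluded in full (Saturday and Sunday, midnight to midnight) and that no holidays or working-hour windows apply. It walks the interval one calendar day at a time and clips each weekday to [start, end). The helper name weekday_hours and the sample timestamps are hypothetical; in the script, the two columns would come from pd.read_excel.

```python
import pandas as pd


def weekday_hours(start, end):
    """Hours between start and end, ignoring Saturdays and Sundays entirely."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    if end <= start:
        return 0.0
    total = pd.Timedelta(0)
    day = start.normalize()  # midnight of the start date
    while day < end:
        next_day = day + pd.Timedelta(days=1)
        if day.dayofweek < 5:  # Monday=0 ... Friday=4
            # Only the part of this weekday that lies inside [start, end).
            total += min(end, next_day) - max(start, day)
        day = next_day
    return total / pd.Timedelta(hours=1)


df = pd.DataFrame({
    "start": pd.to_datetime(["2024-01-05 18:00", "2024-01-08 09:00"]),
    "end": pd.to_datetime(["2024-01-08 06:00", "2024-01-08 17:30"]),
})
df["weekday_hours"] = [weekday_hours(s, e) for s, e in zip(df["start"], df["end"])]
print(df["weekday_hours"].tolist())  # [12.0, 8.5]
```

The first row runs from Friday 18:00 to Monday 06:00, so only 6 hours on Friday and 6 on Monday count.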

Pandas: calculate the difference between a row and all other rows and create a column with the name

We have data as below:

       Name     value1  Value2  finallist
    0  cosmos   10      20      [10,20]
    1  network  30      40      [30,40]
    2  unab     20      40      [20,40]
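The exact target is not stated, so this sketch assumes the goal is, for every row, the difference between its value1 and each other row's value1, stored in one column per Name (diff_cosmos, and so on). NumPy broadcasting builds the whole pairwise matrix at once. The finallist column is left out because it only repeats value1 and Value2.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["cosmos", "network", "unab"],
    "value1": [10, 30, 20],
    "Value2": [20, 40, 40],
})

v = df["value1"].to_numpy()
# v[:, None] - v[None, :] is an n x n matrix: entry (i, j) = value1[i] - value1[j].
diffs = pd.DataFrame(v[:, None] - v[None, :],
                     columns=("diff_" + df["Name"]).tolist())
result = pd.concat([df, diffs], axis=1)
print(result)
```

The same pattern works for Value2. For very many rows, the n x n matrix grows quadratically in memory.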

Updating a value of a pandas DataFrame with a function

I have a function which updates a DataFrame that I have passed in:

    def update_df(df, x, i):
        for i in range(x):
            list = ['name' + str(i), i + 2, i - 1…
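In the visible code, the parameter i is immediately overwritten by the loop variable, and the name list shadows the built-in type. One way to avoid surprises about what the caller sees is to build the new rows in a list and return a new DataFrame. The function name add_rows and the column names below are hypothetical, since the original row layout is cut off.

```python
import pandas as pd


def add_rows(df, n):
    """Return a copy of df with n new rows appended (hypothetical columns)."""
    rows = []
    for i in range(n):
        # Named `row`, not `list`, so the built-in type is not shadowed.
        row = {"name": f"name{i}", "plus_two": i + 2, "minus_one": i - 1}
        rows.append(row)
    return pd.concat([df, pd.DataFrame(rows)], ignore_index=True)


df = pd.DataFrame({"name": ["start"], "plus_two": [0], "minus_one": [0]})
df = add_rows(df, 3)  # reassign: the function returns a new DataFrame
print(df)
```

Alternatively, df.loc[len(df)] = [...] inside the function mutates the caller's DataFrame in place (assuming a default RangeIndex), but it is slower when adding many rows one at a time.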

ModuleNotFoundError: pandas 1.3.5 with pyinstaller 4.10

I'm trying to compile a Python script using PyInstaller, and PyInstaller says "10230 INFO: Building EXE from EXE-00.toc completed successfully", but when I execu…

Dataframe Operation Splicing

I have a single-column DataFrame without headers, and I want to split it into multiple columns as follows. The current DataFrame:

    1
    2
    3
    4
    5
    .
    .
    100

I want to re…
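The target layout is cut off, so this sketch assumes the goal is 10 columns filled row by row (1 to 10 in the first row, 11 to 20 in the second, and so on). Reshaping the underlying NumPy array does this in one step. The width n_cols is an assumption and must divide the number of rows evenly.

```python
import pandas as pd

# Single unnamed column holding 1..100, as in the question.
df = pd.DataFrame(list(range(1, 101)))

n_cols = 10  # assumed target width; must divide len(df) evenly
wide = pd.DataFrame(df[0].to_numpy().reshape(-1, n_cols))
print(wide)
```

To fill column by column instead (1 to 10 down the first column), pass order="F" to reshape.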

Python: how to use string values for a custom sort?

I have a DataFrame like this:

       time_posted
    0  5 days ago
    1  an hour ago
    2  a day ago
    3  6 hours ago
    4  4 hours ago

I tried this: df.sort_values(by='time_p…
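Sorting the raw strings compares them alphabetically. One option, assuming pandas 1.1 or newer for the key argument of sort_values, is to convert each string into a Timedelta and sort by that. This sketch assumes every value has the form "<number or a/an> <unit> ago" with units pandas understands (hour(s), day(s), ...). The helper name to_age is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"time_posted": ["5 days ago", "an hour ago", "a day ago",
                                   "6 hours ago", "4 hours ago"]})


def to_age(s):
    """Turn strings like '5 days ago' or 'an hour ago' into Timedeltas."""
    cleaned = (s.str.replace(" ago", "", regex=False)
                .str.replace(r"^an? ", "1 ", regex=True))  # 'a'/'an' -> '1'
    return pd.to_timedelta(cleaned)


# key= receives the column as a Series and must return a Series to sort by.
newest_first = df.sort_values(by="time_posted", key=to_age)
print(newest_first)
```

Use ascending=False to list the oldest posts first.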

How to correctly generate training data based on percentages?

I am currently generating training data for my Bayesian network as follows (also as code below): infected stands for people who a…
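One standard way to generate data that matches given percentages is ancestral (forward) sampling: draw each parent variable with rng.random(n) < p, then draw each child with the probability selected by its parents' sampled values. The variable test_positive and all probabilities below are hypothetical placeholders for the network's actual tables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 10_000

# Hypothetical probabilities; replace with the network's actual tables.
p_infected = 0.05                        # P(infected)
p_test_pos = {True: 0.90, False: 0.08}   # P(positive test | infected)

# Parent first: each row is infected with probability p_infected.
infected = rng.random(n) < p_infected
# Child second: per-row probability chosen by the parent's sampled value.
p_pos = np.where(infected, p_test_pos[True], p_test_pos[False])
positive = rng.random(n) < p_pos

data = pd.DataFrame({"infected": infected, "test_positive": positive})
print(data.mean())  # empirical rates should be close to the chosen probabilities
```

Sampling row by row like this makes the empirical frequencies converge to the specified percentages, unlike fixing exact counts per group.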

Improve the performance of LineString creation, which is currently done with a lambda function

I have a DataFrame like this (this example has only four rows, but in practice it has O(10^6) rows):

    DF:
       nodeid  lon   lat  wayid
    0  1       1.70…
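A sketch assuming Shapely is the LineString library and that rows are already ordered by node within each way. It builds one LineString per wayid from that group's coordinate array, so geometry is created once per way instead of once per row. The coordinates below are hypothetical toy values.

```python
import pandas as pd
from shapely.geometry import LineString

# Hypothetical stand-in for the large node table (one row per node).
df = pd.DataFrame({
    "nodeid": [1, 2, 3, 4],
    "lon": [0.0, 1.0, 5.0, 6.0],
    "lat": [0.0, 1.0, 5.0, 7.0],
    "wayid": [100, 100, 200, 200],
})

# One LineString per way, built from an (n_nodes, 2) array of lon/lat pairs.
lines = {
    way: LineString(group[["lon", "lat"]].to_numpy())
    for way, group in df.groupby("wayid", sort=False)
}
print(lines[100])
```

With Shapely 2.x, shapely.linestrings(coords, indices=...) can build all lines in a single vectorized call, which helps further at O(10^6) rows.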

When using read_sql_query in pandas, how to write the SQL across multiple lines?

My question is pretty much what it sounds like: is it possible to write my SQL across multiple lines for readability when using the read_sql_query method pl…
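Yes: the SQL is just a Python string, so a triple-quoted string can span as many lines as needed, and the database ignores the newlines and indentation. This sketch uses an in-memory SQLite database with a hypothetical sales table so it runs on its own. It also passes a value through params rather than formatting it into the string.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10), ("south", 5), ("north", 7)])

# Multi-line SQL in a triple-quoted string; whitespace is irrelevant to SQL.
query = """
    SELECT region,
           SUM(amount) AS total
    FROM sales
    WHERE amount > ?
    GROUP BY region
    ORDER BY region
"""
df = pd.read_sql_query(query, conn, params=(4,))
print(df)
```

Adjacent string literals inside parentheses, such as ("SELECT region " "FROM sales"), are another option; remember the trailing spaces between pieces.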

Unable to read an Excel column by column name using pandas

In my Excel sheet, I want to read the values of the column 'Site Name', but in this sheet the position of this column is not fixed. I tried: df = pd.read_excel('TestFile.xlsx…
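If the header row can move, one approach is to read the sheet with header=None, locate the first row that contains the label, and promote that row to the header. The helper promote_header and the sample layout are hypothetical. The match is exact, so stray spaces in the cell would need stripping first. Reading .xlsx files with pd.read_excel also requires the openpyxl package.

```python
import pandas as pd


def promote_header(raw, label):
    """Use the first row containing `label` as the header; keep the rows below it."""
    hits = raw.eq(label).any(axis=1).to_numpy()  # True for rows containing label
    if not hits.any():
        raise KeyError(f"{label!r} not found in sheet")
    header_pos = int(hits.argmax())  # position of the first matching row
    body = raw.iloc[header_pos + 1:].copy()
    body.columns = raw.iloc[header_pos].tolist()
    return body.reset_index(drop=True)


# In practice: raw = pd.read_excel("TestFile.xlsx", header=None)
raw = pd.DataFrame([
    ["Report", None, None],
    [None, None, None],
    ["ID", "Site Name", "Region"],
    [1, "Alpha", "EU"],
    [2, "Beta", "US"],
])
sites = promote_header(raw, "Site Name")["Site Name"].tolist()
print(sites)  # ['Alpha', 'Beta']
```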