Category "pandas"

Pandas approximating/rounding large numbers from csv

I am reading numbers from a csv file into a pandas dataframe. When the numbers I am reading are approximately >1E12, pandas will approximate the number to 3

How to create ratios using value counts and separate fields in Python?

Using the data frame shown below, I'd like to create manager-to-assistant and manager-to-associate percentages/ratios per location. I'm looking for the

Searching a value within range between columns in pandas (not date columns and no sql)

Thanks in advance for your help. I have two dataframes as given below. I need to create a column category in the sold frame based on information in the size frame. It should c

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I want to filter my dataframe with an or condition to keep rows with a particular column's values that are outside the range [-0.25, 0.25]. I tried: df = df[(df
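A minimal sketch of the kind of filter described, assuming a hypothetical column named `value` (the original column name is cut off in the excerpt). The error in the title typically appears when Python's `or` is applied to whole Series; pandas expects the element-wise `|` operator with each comparison in its own parentheses.

```python
import pandas as pd

# Hypothetical sample data; the column name "value" is an assumption.
df = pd.DataFrame({"value": [-0.5, -0.1, 0.0, 0.25, 0.3]})

# Keep rows strictly outside [-0.25, 0.25].
# "|" is the element-wise OR; Python's "or" would raise the ambiguity error.
outside = df[(df["value"] < -0.25) | (df["value"] > 0.25)]
print(outside)
```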

Sort multiIndex table based on other table

I have a multiIndex data frame like this probe_names PLAGL1 GRB10 MEST H19 KCNQ1OT1 MEG3 MEG8 SNRPN \ Patient_1 0 0.55 0.53 0.53

Compare two excel files for the difference using pandas with multiple tabs

I found this nice script online which does a great job comparing the differences between 2 excel sheets but there's an issue - it doesn't work if the excel file

I have a dataframe with a JSON substring in one of the columns. I want to extract variables and make columns for them

imports json df = pd.read_json("C:/xampp/htdocs/PHP code/APItest.json", orient='records') print(df) I would like to create three columns extra: ['name','l

How to "transpose" data from one date to another in Python

Sorry, I had a lot of trouble explaining my problem in the title, but I hope it will be more understandable with this example: I have a data source that tells me

Pandas rolling window cumsum, with incomplete series

I have a pandas df as follows: YEAR MONTH USERID TRX_COUNT 2020 1 1 1 2020 2 1 2 2020 3 1 1 2020 12

When one of the columns in my dataframe is a nested list, how should I transform it to a multi-dimensional np.array?

I have the following data frame. test = { "a": [[[1,2],[3,4]],[[1,2],[3,4]]], "b": [[[1,2],[3,6]],[[1,2],[3,4]]] } df = pd.DataFrame(test) df a b 0

Filter rows in dataframe based on value counts [duplicate]

I have a large dataframe/Questionnaire df (871 x 24) containing a column named "Identifier" which stores a unique ID for each of the participa

Save multiple/distinct .CSV files after for loop execution

I have 65 xml files that I need to convert to .CSV, and save each converted file as a separate .CSV file. I have tried using a for loop but am not having any lu

Testing in pandas library: Why is function style chosen over class based testing?

Why does function-style testing facilitate testing compared to class-based testing? Is this just additional library-specific functionality, or are there any ge

Groupby and create a dummy =1 if column values do not contain 0, =0 otherwise

My df id var1 A 9 A 0 A 2 A 1 B 2 B 5 B 2 B 1 C 1 C 9 D 7 D 2 D 0 .. desired output will ha
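One possible way to build such a flag, sketched using only the rows visible in the excerpt. The output column name `dummy` is an assumption, since the desired output is cut off. The idea is to test `var1 != 0` row by row and then ask, per `id` group, whether that holds for all rows.

```python
import pandas as pd

# Rows visible in the excerpt above; any further rows are omitted.
df = pd.DataFrame({
    "id": list("AAAABBBBCCDDD"),
    "var1": [9, 0, 2, 1, 2, 5, 2, 1, 1, 9, 7, 2, 0],
})

# True where var1 is non-zero; a group gets 1 only if every row is non-zero.
# "dummy" is a hypothetical column name.
df["dummy"] = (
    df["var1"].ne(0)
    .groupby(df["id"])
    .transform("all")
    .astype(int)
)
print(df)
```

Here groups A and D each contain a 0, so their rows get 0, while B and C get 1.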

Importing pandas_profiling

I am working on automating EDA. When I try to import pandas_profiling, I get an error: ImportError: cannot import name 'soft_unicode' from 'mark

Pandas Lookup to be deprecated - elegant and efficient alternative

The Pandas lookup function is to be deprecated in a future version. As suggested by the warning, it is recommended to use .melt and .loc as an alternative. df =

Use a list with function names to iteratively apply over a dataframe column

Context: I'm allowing a user to add specific methods for a cleaning process pipeline (appended to a main list with all the methods chosen). Each element from th

AttributeError: 'numpy.ndarray' object has no attribute 'columns' even after using pandas dataframe

import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split,cross_val_score from sklearn.tree import DecisionTreeCl

Append new data into an existing frame and upload to sheets Python

I connected to my API client, sent the credentials, made the request, and put the returned data into a DataFrame. Then, I have to upload this data to a sh

Python - Hours between two dates, excluding weekends

I'm taking my first steps in the Python programming language. I want to create a script that opens an Excel file and adds an extra column that will be the hourl