Category "pandas"

How to do point biserial correlation for multiple columns in one iteration

I am trying to calculate a point-biserial correlation for a set of columns in my dataset. I am able to do it for an individual variable; however, if I need to calcu
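
One way to cover several columns in a single pass, sketched under assumptions: the point-biserial coefficient is numerically the Pearson correlation between a 0/1 variable and a continuous one, so pandas' `DataFrame.corrwith` can compute it column by column. The column names `target`, `x1`, `x2` and the data are hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical data: one binary column and two continuous columns.
df = pd.DataFrame({
    "target": [0, 0, 1, 1, 0, 1],
    "x1": [1.0, 2.0, 3.0, 4.0, 1.5, 3.5],
    "x2": [5.0, 3.0, 4.0, 1.0, 2.0, 6.0],
})

continuous_cols = ["x1", "x2"]

# Pearson correlation of each continuous column with the binary column;
# with a 0/1 variable this equals the point-biserial coefficient.
r_pb = df[continuous_cols].corrwith(df["target"])
print(r_pb)
```

This gives the coefficients only. If p-values are needed as well, `scipy.stats.pointbiserialr` could be called once per column instead.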

using statsmodels with a groupby

Consider this simple example import pandas as pd import statsmodels.formula.api as sm df = pd.DataFrame({'Y' : [1,2,3,4,5,6,7], 'X' : [2,3,4

Pandas and scikit-learn: KeyError: [....] not in index

I do not understand why I get the error KeyError: '[ 1351 1352 1353 ... 13500 13501 13502] not in index' when I run this code: cv = KFold(n_splits=10) fo

Flatten a nested JSON? [duplicate]

I am trying to flatten the following JSON hierarchically: https://justpaste.it/6e60p I am using the pandas json_normalize function

Convert pandas.groupby to dict

Consider the dataframe d: d = pd.DataFrame({'a': [0, 2, 1, 1, 1, 1, 1], 'b': [2, 1, 0, 1, 0, 0, 2], 'c': [1, 0, 2, 1, 0, 2, 2]
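
The excerpt is cut off before the desired output, so the following is only a sketch of two common readings of "groupby to dict". It uses the three columns visible above; grouping on `a` and collecting column `b` are assumptions made for illustration.

```python
import pandas as pd

# The three columns visible in the excerpt; anything cut off is not reproduced.
d = pd.DataFrame({
    "a": [0, 2, 1, 1, 1, 1, 1],
    "b": [2, 1, 0, 1, 0, 0, 2],
    "c": [1, 0, 2, 1, 0, 2, 2],
})

# Reading 1: map each group key to its sub-DataFrame.
groups = {key: group for key, group in d.groupby("a")}

# Reading 2: map each group key to the list of values in one column.
b_by_a = d.groupby("a")["b"].apply(list).to_dict()
print(b_by_a)
```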

Python: How to create multi line cells in excel when exporting a pandas dataframe

I have the following pandas Dataframe df = pd.DataFrame([ [['First Line', 'Second line']], [['First line', 'second line', 'third line']], [['first l

Load Pandas Dataframe to S3 passing s3_additional_kwargs

Please excuse my ignorance / lack of knowledge in this area! I'm looking to upload a dataframe to S3, but I need to pass 'ACL':'bucket-owner-full-control'. i

Python plotly Scattermapbox define colors by category

I want to draw some colored areas on a map. The coordinates are defined in a dataframe and I want each area to have a different color depending on the test_type

Python dictionary, how can I create a key with a string and the actual key combined?

I hope this is quite an easy question, but without much Python background I can't find an answer. df = pd.DataFrame( {'Messung': ['10bar','10bar',

OptionError:'Pattern matched multiple keys' pandas

I am trying to read an Excel file. import requests url = 'http://www.nepalstock.com/todaysprice/export' r = requests.get(url, allow_redirects=True) open('todaypr

Search and filter text from a column using Pyspark

I am new to Data Scraping. I am reading the data from a file having JSON objects as one row {"name": "Soul Sweet \u2018Taters (Step-by-Step!)", "ingredients":

How do I melt a pandas with custom nam

I have a table like this device_type version pool testMean testP50 testP90 testP99 testStd WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd PNB0Q

Percent change using Pandera for Pandas DataFrame

I have the following DataFrame. I need to validate balance and other numeric measures over a date range. I want to check if for any group and date, the ba

Python Script to find file names from CSV will not concatenate

I am writing a script that will allow me to extract a segment of image files from a large folder. I put the image file names into a dataframe. I am having prob

Network Flow Dataframe - Merging Memory Error - Unable to allocate array with shape and data type

I have 3 big CSV files, all with the same 76 columns. The numbers of rows differ: 17809 rows, 124262 rows, and 108779 rows. I am trying to merge these 3 d

Accumulate 1 and Reset to 0 once condition is met

I have the dataset below and am trying to accumulate a value while ColA is 0, resetting it to 0 (restarting the count) whenever ColA is 1 again. Col
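
A minimal sketch of one way to build such a counter, assuming it should increase by 1 on each row where ColA is 0 and drop back to 0 on each row where ColA is 1. The sample values and the output column name `Count` are hypothetical, since the original data is cut off.

```python
import pandas as pd

# Hypothetical sample; the question's own data is truncated.
df = pd.DataFrame({"ColA": [1, 0, 0, 1, 0, 0, 0, 1]})

# Every row where ColA == 1 starts a new block.
block = df["ColA"].eq(1).cumsum()

# Within each block, count the rows where ColA == 0.
df["Count"] = df["ColA"].eq(0).astype(int).groupby(block).cumsum()
print(df)
```

The cumulative sum over `ColA == 1` gives each run its own label, so the per-group `cumsum` restarts whenever a 1 appears.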

pandas non equi join in range

I need to do a non-equi join in pandas, where the first table is joined with the second table on a range. first_table EMPLOYEE_ID SALARY 100 3000.00 101 17000.00 102

Creating New columns from other pandas column

I would like to create new columns from the genres column. The genres column contains one or multiple genres and I would like to create a column for each genre

Can't take value from series in python

I'm trying to get a value from a pandas Series. As with arrays, I'm trying to get the 3rd value with tempArray[3], but the code gives me where the value inside the