Category "pandas"

How to keep the top 500 rows of each CSV in a loop (Python) and overwrite each file

I am trying to read more than 100 CSV files in Python and keep only the top 500 rows of each (they each have more than 55,0000 rows). So far I know how to do that, but I need
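A minimal sketch of one way to do this with pandas, assuming all files sit in one folder, have a single header row, and that overwriting in place is acceptable. `truncate_csvs` and the folder argument are hypothetical names. `nrows` makes `read_csv` stop after the first rows, so the large remainder of each file is never parsed.

```python
import glob
import os

import pandas as pd


def truncate_csvs(folder: str, n_rows: int = 500) -> list:
    """Keep only the first n_rows data rows of every CSV in folder, overwriting each file."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    for path in paths:
        # nrows stops parsing after n_rows data rows (the header is read separately)
        top = pd.read_csv(path, nrows=n_rows)
        # index=False avoids writing an extra unnamed index column
        top.to_csv(path, index=False)
    return paths
```

Re-saving through pandas can change details such as quoting or float formatting; if the files must stay byte-identical apart from the dropped rows, copying the first 501 lines as plain text is an alternative.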

How can I add a path to the CSV files created?

I'm splitting a CSV file based on column "ColumnName". How can I save all of the created CSV files into a specified path? data = pd.read_csv(r'C:\Users\...\O
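A minimal sketch, assuming each distinct value of "ColumnName" should become one file named after that value (and that the values are valid file names). `split_by_column` and `out_dir` are hypothetical names; the key step is building each output path with `os.path.join` instead of passing a bare file name to `to_csv`.

```python
import os

import pandas as pd


def split_by_column(data: pd.DataFrame, column: str, out_dir: str) -> list:
    """Write one CSV per distinct value of `column` into out_dir; return the written paths."""
    os.makedirs(out_dir, exist_ok=True)  # create the target folder if it does not exist
    written = []
    for value, group in data.groupby(column):
        # os.path.join builds a path with the current OS's separator
        path = os.path.join(out_dir, f"{value}.csv")
        group.to_csv(path, index=False)
        written.append(path)
    return written
```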

Pandas: return rows that have two matching columns (commonality)

I am trying to write a commonality script that returns rows in a pandas DataFrame that have two matching columns and also sums up the number of rows w
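One possible reading, sketched under the assumption that "two matching columns" means rows whose pair of values in two given columns occurs more than once. `common_pairs` and the `pair_count` column are hypothetical names. `groupby(...).transform(len)` broadcasts each group's size back onto its rows, so the count and the filter come from one pass.

```python
import pandas as pd


def common_pairs(df: pd.DataFrame, col_a: str, col_b: str) -> pd.DataFrame:
    """Return rows whose (col_a, col_b) pair occurs more than once, with a count column."""
    # transform(len) gives every row the size of the group it belongs to
    counts = df.groupby([col_a, col_b])[col_a].transform(len)
    out = df.assign(pair_count=counts)
    return out[out["pair_count"] > 1]
```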

import pandas throws TypeError: expected string or bytes-like object

After pip-installing a private repo in my Conda environment, I now get the error TypeError: expected string or bytes-like object when trying to import pandas. I

How to select top-level columns in a multi-header pandas DataFrame

I have a multi-header DataFrame that looks like this: SPY ARKW Open Hig
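A short sketch of the usual selection patterns on a two-level column header. The frame below is hypothetical data shaped like the question (tickers on level 0, price fields on level 1). Selecting a single top-level label drops that level, a list keeps both levels, and `xs` selects by a lower level instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: tickers on level 0, fields on level 1
columns = pd.MultiIndex.from_product([["SPY", "ARKW"], ["Open", "High", "Close"]])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

spy = df["SPY"]                                    # one top-level label: plain field columns
subset = df[["SPY"]]                               # list selection keeps the two-level header
opens = df.xs("Open", axis=1, level=1)             # one field across every ticker
tickers = df.columns.get_level_values(0).unique()  # the distinct top-level names
```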

Creating custom colourmap for geopandas.explore plot

all code: def rgb2hex(r,g,b): return '#{:02x}{:02x}{:02x}'.format(r,g,b) def rg(num): num = int(np.round((num / 100) * 124)) r = (124 - num) g

Convert JSON format column to new columns

I have a subset of the Yelp dataset in CSV, and the attributes column is in JSON format. I'm trying to convert that column to new columns, but none of the relevant code on di
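A minimal sketch, assuming each cell holds a valid JSON object string (missing cells are treated as empty). `expand_json_column` is a hypothetical helper. Note that some CSV exports of Yelp attributes use Python-style single quotes, which `json.loads` rejects; in that case `ast.literal_eval` would be the parser to try instead.

```python
import json

import pandas as pd


def expand_json_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Parse a column of JSON strings and join its keys back as new columns."""
    # Non-string cells (e.g. NaN) become empty dicts so every row yields a record
    parsed = df[column].apply(lambda s: json.loads(s) if isinstance(s, str) else {})
    # Nested objects become dotted column names such as "Parking.lot"
    expanded = pd.json_normalize(parsed.tolist())
    expanded.index = df.index  # json_normalize returns a fresh RangeIndex; realign
    return df.drop(columns=column).join(expanded)
```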

BigQuery Results to Pandas DataFrame in Chunks

I am trying to save the results of a BigQuery query to a pandas DataFrame using bigquery.Client.query.to_dataframe(). This query can return millions of rows. Gi

How to compute point-biserial correlation for multiple columns in one iteration

I am trying to calculate a point-biserial correlation for a set of columns in my datasets. I am able to do it on an individual variable; however, if I need to calcu
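A sketch that relies on a known identity: the point-biserial coefficient is the Pearson correlation computed with the binary variable coded 0/1, so `DataFrame.corrwith` yields all coefficients in one call. `point_biserial_all` is a hypothetical name. This returns coefficients only; for p-values, `scipy.stats.pointbiserialr` would still need to be called per column.

```python
import pandas as pd


def point_biserial_all(df: pd.DataFrame, binary_col: str, columns: list) -> pd.Series:
    """Point-biserial r between a 0/1 column and each listed continuous column."""
    # Point-biserial r equals Pearson r when one variable is coded 0/1,
    # so corrwith computes every coefficient in a single vectorised call.
    return df[columns].corrwith(df[binary_col])
```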

Using statsmodels with a groupby

Consider this simple example: import pandas as pd import statsmodels.formula.api as sm df = pd.DataFrame({'Y' : [1,2,3,4,5,6,7], 'X' : [2,3,4

Pandas and scikit-learn: KeyError: [....] not in index

I do not understand why I get the error KeyError: '[ 1351 1352 1353 ... 13500 13501 13502] not in index' when I run this code: cv = KFold(n_splits=10) fo
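A likely cause, assuming the truncated loop indexes the DataFrame with the fold arrays: `KFold.split` yields positional indices, while `df[...]` and `df.loc[...]` look up labels, so any frame whose index is not exactly 0..n-1 (or a `df[array]` call, which selects columns) raises "not in index". The sketch below selects with `.iloc`. `contiguous_folds` is a hypothetical stand-in that mimics unshuffled `KFold` fold sizes so the example runs without scikit-learn; in real code the same `split_frame` call would go inside `for train, test in KFold(n_splits=10).split(X)`.

```python
import numpy as np
import pandas as pd


def contiguous_folds(n_samples: int, n_splits: int):
    """Yield (train, test) positional arrays, like KFold(n_splits) without shuffling."""
    positions = np.arange(n_samples)
    for test in np.array_split(positions, n_splits):
        train = np.setdiff1d(positions, test)
        yield train, test


def split_frame(X: pd.DataFrame, y: pd.Series, train, test):
    # Fold indices are positions, not labels: .iloc works for any index,
    # whereas .loc or [] would raise KeyError for labels that do not exist.
    return X.iloc[train], X.iloc[test], y.iloc[train], y.iloc[test]
```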

Flatten a nested JSON? [duplicate]

I am trying to flatten the following JSON hierarchically: https://justpaste.it/6e60p I am using pandas json_normalize function
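The linked JSON is not reproduced here, so this sketch uses a hypothetical nested structure to show the two main `pd.json_normalize` modes: flattening nested objects into dotted columns, and exploding a nested list with `record_path` while carrying parent fields along via `meta` (a list inside `meta` names a nested path).

```python
import pandas as pd

# Hypothetical nested records; the question's real JSON is only linked
data = [
    {"id": 1, "name": "A", "info": {"city": "X"},
     "items": [{"sku": "s1", "qty": 2}, {"sku": "s2", "qty": 1}]},
    {"id": 2, "name": "B", "info": {"city": "Y"},
     "items": [{"sku": "s3", "qty": 5}]},
]

# One row per top-level record; nested dicts become "info.city"-style columns
top = pd.json_normalize(data, sep=".")

# One row per element of each "items" list, with parent fields repeated via meta
items = pd.json_normalize(data, record_path="items",
                          meta=["id", "name", ["info", "city"]])
```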

Convert pandas.groupby to dict

Consider DataFrame d: d = pd.DataFrame({'a': [0, 2, 1, 1, 1, 1, 1], 'b': [2, 1, 0, 1, 0, 0, 2], 'c': [1, 0, 2, 1, 0, 2, 2]
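The excerpt is cut off before the desired dict shape is stated, so this sketch shows three common conversions on the columns visible in the question (the original frame may have further columns). Which one fits depends on whether each key should map to a sub-DataFrame, a list of one column's values, or a list of row records.

```python
import pandas as pd

d = pd.DataFrame({'a': [0, 2, 1, 1, 1, 1, 1],
                  'b': [2, 1, 0, 1, 0, 0, 2],
                  'c': [1, 0, 2, 1, 0, 2, 2]})

# key -> sub-DataFrame for each distinct value of 'a'
frames = dict(tuple(d.groupby('a')))

# key -> list of 'b' values in that group
b_lists = d.groupby('a')['b'].apply(list).to_dict()

# key -> list of row dicts (without the grouping column)
records = {k: g.drop(columns='a').to_dict('records') for k, g in d.groupby('a')}
```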

Python: How to create multi line cells in excel when exporting a pandas dataframe

I have the following pandas Dataframe df = pd.DataFrame([ [['First Line', 'Second line']], [['First line', 'second line', 'third line']], [['first l

Load Pandas Dataframe to S3 passing s3_additional_kwargs

Please excuse my ignorance / lack of knowledge in this area! I'm looking to upload a dataframe to S3, but I need to pass 'ACL':'bucket-owner-full-control'. i

Python plotly Scattermapbox define colors by category

I want to draw some colored areas on a map. The coordinates are defined in a dataframe and I want each area to have a different color depending on the test_type

Python dictionary, how can I create a key with a string and the actual key combined?

I hope this is a quite easy question, but for me without a lot of python background I can't find an answer. df = pd.DataFrame( {'Messung': ['10bar','10bar',

OptionError: 'Pattern matched multiple keys' in pandas

I am trying to read an Excel file. import requests url = 'http://www.nepalstock.com/todaysprice/export' r = requests.get(url, allow_redirects=True) open('todaypr

Search and filter text from a column using PySpark

I am new to Data Scraping. I am reading the data from a file having JSON objects as one row {"name": "Soul Sweet \u2018Taters (Step-by-Step!)", "ingredients":

How do I melt a pandas with custom nam

I have a table like this device_type version pool testMean testP50 testP90 testP99 testStd WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd PNB0Q
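A sketch assuming the goal is a long table with one row per id combination, metric prefix (e.g. `test`, `Widget`) and statistic (`Mean`, `P50`, ..., `Std`). `melt_metrics` and the output names `metric`, `stat`, `value` are hypothetical. `melt` accepts custom names through `var_name` and `value_name`; the regex then splits each original column name, assuming the suffixes follow the `Mean`/`Std`/`P<number>` pattern seen in the header.

```python
import pandas as pd


def melt_metrics(df: pd.DataFrame, id_cols: list) -> pd.DataFrame:
    """Long format: one row per id combination, metric prefix and statistic."""
    long = df.melt(id_vars=id_cols, var_name="column", value_name="value")
    # Split e.g. "testP90" into metric="test", stat="P90" (assumed suffix pattern)
    parts = long["column"].str.extract(r"^(?P<metric>.+?)(?P<stat>Mean|Std|P\d+)$")
    return pd.concat([long.drop(columns="column"), parts], axis=1)
```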