Category "pandas"

How to get the Isoweek from DatetimeIndex

I have a simple pandas dataframe with a date as index: import pandas as pd data = {'date': ['2010-01-04','2014-03-15','2017-07-15','2019-12-28','2005-01-03'],
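
One possible approach, sketched under assumptions: the dates below are the ones from the excerpt, while the `value` column and the `isoweek` column name are made up for illustration. `DatetimeIndex.isocalendar()` (available in pandas 1.1 and later) returns a DataFrame with `year`, `week` and `day` columns.

```python
import pandas as pd

# Dates taken from the question excerpt; the 'value' column is a placeholder.
dates = ['2010-01-04', '2014-03-15', '2017-07-15', '2019-12-28', '2005-01-03']
df = pd.DataFrame({'value': range(len(dates))}, index=pd.to_datetime(dates))
df.index.name = 'date'

# isocalendar() yields year/week/day columns indexed by the original dates.
iso = df.index.isocalendar()

# Store the ISO week as a plain integer column (hypothetical column name).
df['isoweek'] = iso['week'].astype(int)
print(df)
```

Note that the ISO year can differ from the calendar year around New Year, so `iso['year']` may be worth keeping alongside the week number.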

Select previous row every hour in pandas

I am trying to obtain the closest previous data point every hour in a pandas data frame. For example: time value 0 14:59:58 15 1 15:00:10 2

How to match Datetimeindex for all but the year?

I have a dataset with missing values and a Datetimeindex. I would like to fill these values with the mean values of other values reported at the same month, day

pandas equivalent to mutate across

I would like to perform following operation in Pandas: library(tidyverse) df <- tibble(mtcars) df %>% select(ends_with('t')) %>% head(3) # A

Pyinstaller - app without needed library on macOS

I've prepared a Python script (using PyCharm on both OSes, projects with venv, pyinstaller command run in the PyCharm terminal) which begins with 'import pandas' and wa

importing data from csv - could not convert string to float

I am having difficulties importing some data from a csv file. Input from csv file (extract): Speed;A [rpm];[N.m] 700;-72,556 800;-58,9103 900;-73,1678
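
A "could not convert string to float" error on data like this is often caused by the decimal comma. The following is a minimal sketch, not the asker's actual file: the header names are hypothetical because the header in the excerpt is flattened, and only the three data rows are taken from it. `read_csv` accepts `sep` and `decimal` arguments for this case.

```python
import io
import pandas as pd

# Inline stand-in for the CSV file; header names are assumptions.
raw = (
    "Speed [rpm];Torque [N.m]\n"
    "700;-72,556\n"
    "800;-58,9103\n"
    "900;-73,1678\n"
)

# sep=';' splits on semicolons, decimal=',' parses '-72,556' as -72.556.
df = pd.read_csv(io.StringIO(raw), sep=';', decimal=',')
print(df.dtypes)
```

With a real file, the `io.StringIO(raw)` argument would be replaced by the file path.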

convert float64 (from excel import) to str using pandas

Although the same question has been asked multiple times, I can't seem to make it work. I use Python 3.8 and I read an Excel file like this: df = pd.read_excel(r"

Filter Pandas Dataframe based on List of substrings

I have a Pandas Dataframe containing multiple columns of strings. I would now like to check a certain column against a list of allowed substrings and then get a new su
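
A rough sketch of one way to do this, assuming a hypothetical column `name` and a hypothetical list `allowed` (neither appears in the excerpt): build a regular expression from the escaped substrings and pass it to `Series.str.contains`.

```python
import re
import pandas as pd

# Made-up example data; column names and values are placeholders.
df = pd.DataFrame({
    'name': ['apple pie', 'banana split', 'cherry tart', None],
    'price': [3, 4, 5, 6],
})
allowed = ['apple', 'cherry']

# Escape each substring so characters like '.' or '+' are matched literally.
pattern = '|'.join(re.escape(s) for s in allowed)

# na=False treats missing values as non-matching instead of propagating NaN.
mask = df['name'].str.contains(pattern, na=False)
subset = df[mask]
print(subset)
```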

How to extract the query result from a Hive job output logs using DataprocHiveOperator?

I am trying to build a data migration pipeline using Airflow, source being a Hive table on a Dataproc cluster and the destination is BigQuery. I'm using Datapro

Deleting multiple rows under same App Name but with different number of reviews

I have a dataframe having many columns, 2 of them being 'App' and 'Reviews'. I discovered that for the same app there are multiple rows because they differ in t

How to create dummy variable for specific values in a column?

I want to create a dummy variable for a specific value in a column. Let's say my database looks like this : I want a dummy variable just for the museums. pd.ge

Pandas combining slices and list to select columns

Let us assume that a DataFrame df has the following columns: ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'] We can use a slice or a list to select some columns: Wit

Perform a merge by date field without creating an auxiliary column in the DataFrame

Given the following DataFrames in python pandas: | date | counter | |-----------------------------|------------------| | 2022-01-0

iterating different length arrays and replace values

I have a dataframe that looks like this: df = pd.DataFrame({'col1': [[[1,5,3],[0,0,0]], [[1,2,3],[0,0,0], [1,2,3]]]}) # which looks like this: col1 0 [[1

How to plot distribution of missing values in a dataframe

I have a data frame with hundreds of columns and would like to investigate the proportion of missing values by plotting a graph. I'm able to get the proportion using

removing columns with pandas from csv - not found in axis

I'm trying to remove 1 column from .csv but I'm receiving an error. import pandas as pd df.drop("First Invoice #", axis = 1, inplace= True) KeyError: "['First

Concat null columns data with actual data in pandas?

I have a set of columns that need to be merged into a single column, where some columns have data and some don't, where it should be joined with the data to single co

pandas, creating dataframes based on tuple

I have a tuple that has data for several categories. Now I want to extract small dataframes from this tuple for each category based on a list I created. I want