Category "dataframe"

Pandas Dataframe: Replacing NaN with row average

I am trying to learn pandas but I have been puzzled by the following. I want to replace NaNs in a DataFrame with the row average. Hence something like df.fil
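
A minimal sketch of one possible approach, assuming a small all-numeric frame (the column names and values are made up for illustration). It computes each row's mean with `mean(axis=1)`, which skips NaN by default, then fills every column from that Series so the values align on the row index.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; not taken from the question.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, 4.0, 5.0],
        "c": [3.0, 6.0, np.nan],
    }
)

# Mean of each row, ignoring NaN (skipna=True is the default).
row_means = df.mean(axis=1)

# Fill each column from the row means; fillna aligns on the row index.
filled = df.apply(lambda col: col.fillna(row_means))

print(filled)
```

A row that is entirely NaN has no mean, so it would stay NaN under this approach.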

DATAFRAME TO BIGQUERY - Error: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp1yeitxcu_job_4b7daa39.parquet'

I am uploading a dataframe to a bigquery table. df.to_gbq('Deduplic.DailyReport', project_id=BQ_PROJECT_ID, credentials=credentials, if_exists='append') And I

What is the difference between combine_first and fillna?

These two functions seem equivalent to me. You can see that they accomplish the same goal in the code below, as columns c and d are equal. So when should I use
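
One difference that is easy to demonstrate is index handling. The sketch below uses made-up Series (not the code from the question): `fillna` keeps the caller's index, while `combine_first` returns the union of both indexes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: s2 has an extra label "z" that s1 lacks.
s1 = pd.Series([1.0, np.nan], index=["x", "y"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["x", "y", "z"])

# fillna only patches holes in s1; the result keeps s1's index.
filled = s1.fillna(s2)

# combine_first also patches holes, but the result covers the union of indexes.
combined = s1.combine_first(s2)

print(filled)
print(combined)
```

When both objects share exactly the same index, the two calls give the same values, which may be why they look equivalent in a simple test.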

Grouping by multiple columns to find duplicate rows pandas

I have a df id val1 val2 1 1.1 2.2 1 1.1 2.2 2 2.1 5.5 3 8.8 6.2 4 1.1 2.2 5 8.8 6.2 I want t

How can I merge an empty data frame and a data frame in R

I'm trying to merge two data frames like this: data1 <- data.frame(hola = as.numeric(), toma = as.character()) data2 <- data.frame(hola = as.numeric(1), t

Pandas - dataframe groupby - how to get sum of multiple columns

This should be an easy one, but somehow I couldn't find a solution that works. I have a pandas dataframe which looks like this: index col1 col2 col3 col4

Python for Google Sheets: write dataframes to different sheets in the same workbook

Using the code below, I am able to write the dataframe df1 to the default first sheet (starting at cell ‘B7’) of the Google Sheet workbook. In the s

Python Pandas - Concat dataframes with different columns ignoring column names

I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings
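
A rough sketch of one way to stack two such frames by position, assuming the columns really do correspond one to one (the frames and column names below are invented). It copies the second frame, overwrites its column labels with those of the first, and then concatenates.

```python
import pandas as pd

# Hypothetical frames with the same number of columns but different headings.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# Relabel a copy of df2 so its columns match df1 by position.
df2_renamed = df2.copy()
df2_renamed.columns = df1.columns

# Stack the rows and rebuild a clean 0..n-1 index.
result = pd.concat([df1, df2_renamed], ignore_index=True)

print(result)
```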

How to split a DataFrame based on consecutive index?

I have a DataFrame 'work' with non consecutive index, here is an example: Index Column1 Column2 4464 10.5 12.7 4465 11.3 12.8 4466 10.3 22.8 5123 1
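
A possible sketch, assuming an integer index where "consecutive" means a step of exactly 1. The first three index labels follow the excerpt; the remaining labels and all values past the third row are made up. A new group starts wherever the difference from the previous index label is not 1.

```python
import pandas as pd

# Partly hypothetical data: the rows from 5123 on are invented for illustration.
work = pd.DataFrame(
    {"Column1": [10.5, 11.3, 10.3, 1.0, 2.0], "Column2": [12.7, 12.8, 22.8, 3.0, 4.0]},
    index=[4464, 4465, 4466, 5123, 5124],
)

# Mark the start of each run: True where the step from the previous label is not 1.
idx = work.index.to_series()
group_id = (idx.diff() != 1).cumsum()

# One DataFrame per run of consecutive index labels.
parts = [group for _, group in work.groupby(group_id)]

for part in parts:
    print(part, end="\n\n")
```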

Passing dataframe and using its name to create the csv file

I have a requirement where I need to pass different dataframes and print the rows in dataframes to the csv file and the name of the file needs to be the datafram

How to reorder indexed rows based on a list in Pandas data frame

I have a data frame that looks like this: company Amazon Apple Yahoo name A 0 130 0 C 173 0 0 Z 0 0
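
A small sketch of reordering rows by index label, assuming the desired order is held in a plain list (the list and the values in row Z are hypothetical). `reindex` is used here because it returns the rows in the order given.

```python
import pandas as pd

# Frame modeled on the excerpt; the Z row values are invented.
df = pd.DataFrame(
    {"Amazon": [0, 173, 0], "Apple": [130, 0, 0], "Yahoo": [0, 0, 150]},
    index=pd.Index(["A", "C", "Z"], name="name"),
)

# Hypothetical target order of the index labels.
order = ["Z", "C", "A"]

# Rows come back in the order of the list.
reordered = df.reindex(order)

print(reordered)
```

Labels in the list that are missing from the index would produce all-NaN rows with `reindex`, whereas `df.loc[order]` would raise a `KeyError` instead.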

Why does lm generate NA for each independent variable?

I tried to make a linear regression with the lm function, but the output is NA for every independent variable. The dataframe is numeric. I have already tried t

Find the column name which has the maximum value for each row

I have a DataFrame like this one: In [7]: frame.head() Out[7]: Communications and Search Business General Lifestyle 0 0.745763 0.050847 0.118644
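
One common way to get this is `DataFrame.idxmax(axis=1)`, sketched below on invented values (the column names follow the excerpt, the numbers do not). It returns, for each row, the label of the column holding that row's maximum.

```python
import pandas as pd

# Hypothetical values; only the column names come from the excerpt.
frame = pd.DataFrame(
    {
        "Communications and Search": [0.70, 0.10, 0.20],
        "Business": [0.05, 0.80, 0.30],
        "General": [0.25, 0.10, 0.50],
    }
)

# Column label of the largest value in each row.
max_col = frame.idxmax(axis=1)

print(max_col)
```

If a row has ties, `idxmax` reports the first column that reaches the maximum.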

How To Solve KeyError: u"None of [Index([..], dtype='object')] are in the [columns]"

I'm trying to create a SVM model from what I found in github here, but it keeps returning this error. Traceback (most recent call last): File "C:\Users\Me\Do

Filter rows in csv file based on another csv file and save the filtered data in a new file

Good day all. I was trying to filter file2 based on file1, where file1 is a subset of file2. But file2 has a description column that I need to be able to an

Spark Scala Split dataframe into equal number of rows

I have a Dataframe and wish to divide it into an equal number of rows. In other words, I want a list of dataframes where each one is a disjointed subset of the

Sum over previous periods for each period for each subject - R

A MWE is as follows: library(dplyr) Period <- c(1, 1, 1, 2, 2, 2, 3, 3, 3) Subject <- c(1, 2, 3, 1, 2, 3, 1, 2, 3) set.seed(1) Values <- round(rnor

How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?

I have a dataframe which looks like this: A B C 1 red78 square big235 2 green circle small123 3 blue45 triangle big657
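
A minimal sketch using the values visible in the excerpt, assuming column C is the one to clean (the question does not say which column in the visible part). `Series.str.replace` with the regex `\D+` drops every run of non-digit characters.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["red78", "green", "blue45"],
        "B": ["square", "circle", "triangle"],
        "C": ["big235", "small123", "big657"],
    },
    index=[1, 2, 3],
)

# Remove everything that is not a digit; the result is still a string column.
df["C"] = df["C"].str.replace(r"\D+", "", regex=True)

print(df)
```

Values with no digits at all, such as "green" in column A, would become empty strings, so a conversion with `pd.to_numeric(..., errors="coerce")` may be needed afterwards.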

removing NA values from a DataFrame in Python 3.4

import pandas as pd import statistics df=print(pd.read_csv('001.csv',keep_default_na=False, na_values=[""])) print(df) I am using this code to create a data