Category "r"

Calculate row means on subset of columns

Given a sample data frame:

    C1<-c(3,2,4,4,5)
    C2<-c(3,7,3,4,5)
    C3<-c(5,4,3,6,3)
    DF<-data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)
    DF

I
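A minimal base-R sketch using the sample data above. It assumes the subset to average is C1 to C3, which leaves out the non-numeric ID column; the excerpt is cut off before the exact columns are named, so adjust `cols` as needed. `rowMeans()` is vectorised, so no loop over rows is required.

```r
# Sample data from the question
C1 <- c(3, 2, 4, 4, 5)
C2 <- c(3, 7, 3, 4, 5)
C3 <- c(5, 4, 3, 6, 3)
DF <- data.frame(ID = c("A", "B", "C", "D", "E"), C1 = C1, C2 = C2, C3 = C3)

# Assumption: average over C1, C2 and C3; any subset of numeric columns works
cols <- c("C1", "C2", "C3")

# rowMeans() averages across the selected columns for every row at once;
# add na.rm = TRUE to ignore missing values within a row
DF$Mean <- rowMeans(DF[, cols])
DF
```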

Automatically Create Tabbed skim() results with Proper Output format

I'm trying to dynamically create tabbed output of skim() results in an R Notebook, but the output format comes out all funky. I'm using the asis results o

How can I check if a block of multiple lines matches certain criteria, without loops?

I have a data set with 2 million lines, so loops are not an option. The problem is about as follows: Each line is a transaction by a person. A person can have m

Creating a dummy variable from ordinal data in R

I have an ordinal variable with the following categories very favorable (1) somewhat favorable (2) somewhat unfavorable (3) very unfavorable (4) don't know (8)
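The excerpt stops before saying which dummy coding is wanted, so here is a hedged base-R sketch of two common options, using hypothetical responses coded as in the question (1 to 4, plus 8 for "don't know"): one 0/1 column per category via `model.matrix()`, and a single favorable/not-favorable indicator with "don't know" treated as missing. The grouping of codes 1 and 2 as "favorable" is an assumption.

```r
# Hypothetical responses using the question's codes (8 = "don't know")
resp <- c(1, 3, 8, 2, 4, 1)

lvl_labels <- c("very favorable", "somewhat favorable",
                "somewhat unfavorable", "very unfavorable", "don't know")
f <- factor(resp, levels = c(1, 2, 3, 4, 8), labels = lvl_labels)

# Option 1: one 0/1 indicator per category; "- 1" removes the intercept
# so that no level is dropped as a reference category
dummies <- model.matrix(~ f - 1)
colnames(dummies) <- lvl_labels

# Option 2 (assumed grouping): 1 = favorable (codes 1-2), 0 = unfavorable
# (codes 3-4), NA = "don't know" (code 8)
favorable <- ifelse(resp == 8, NA, as.integer(resp %in% c(1, 2)))
favorable
```

If the dummy is meant as a regression predictor, leaving one category out as the reference (the default `model.matrix(~ f)`) avoids perfect collinearity with the intercept.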

R: Extract the first number from junk data despite irregular delimiters

I am working on a dataframe df that has thousands of rows of junk data in which the first number is to be extracted despite irregular delimiters: dummy_numbers =
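The question's own `dummy_numbers` are cut off, so this sketch uses hypothetical junk strings. It takes the first run of digits (with an optional decimal part) wherever it appears, which sidesteps the delimiters entirely; `extract_first_number()` is a hypothetical helper name. Strings with no digits return `NA` instead of being silently dropped.

```r
# Hypothetical junk strings; the question's actual values are truncated
junk <- c("12;34", "abc 5.6 / 7", "---89,1", "no digits here")

extract_first_number <- function(x) {
  # Locate the first run of digits, optionally followed by ".digits"
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  out <- rep(NA_real_, length(x))
  # regmatches() returns only the successful matches, so fill just those slots
  out[m != -1] <- as.numeric(regmatches(x, m))
  out
}

extract_first_number(junk)
```

Negative signs and thousands separators are not handled; the pattern would need extending (for example an optional leading `-`) if the data contains them.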

How to find average of variable (x value) aggregated by subset (certain days) of another variable (year) in R?

I would like to see if it is possible to use the aggregate function to find the mean for values from Sep. to Oct. of multiple years. I would like to compare the
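Yes, `aggregate()` can do this once the rows are restricted to September and October. A minimal sketch on hypothetical data, assuming a `Date` column named `date` and a value column named `x` (both names are assumptions): derive year and month from the date, keep months 9 and 10, then average `x` per year.

```r
# Hypothetical data: one Date column and one value column x
df <- data.frame(
  date = as.Date(c("2019-08-15", "2019-09-10", "2019-10-05",
                   "2020-09-20", "2020-10-30", "2020-11-02")),
  x    = c(100, 2, 4, 6, 8, 100)
)

# Derive year and month from the date
df$year  <- format(df$date, "%Y")
df$month <- as.integer(format(df$date, "%m"))

# Keep September (9) and October (10), then take the mean of x per year
sep_oct <- subset(df, month %in% c(9, 10))
res <- aggregate(x ~ year, data = sep_oct, FUN = mean)
res
```

The August and November rows (value 100) are excluded, so each year's mean covers only its September and October observations.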

Is there a way to retrieve the data from a BART package model in R?

I was wondering whether there is a way to retrieve the data from a model built with the BART package in R. It seems to be possible using other bart packages, such a

Extract Alpha Diversity Output into excel or .csv file from phyloseq package

Does anyone know how to extract alpha diversity outputs from the estimate_richness() function from the phyloseq package? I am having a hard time finding the cor

How should I preferably "run" Python scripts in RStudio? Through run (using reticulate::repl_python()?) or source?

I just started to use RStudio with Python (up to now everything works) and I wonder if there is a preferred way to run scripts such as my small Test.py containi

What is the meaning of these error messages in running pivot_wider() in RStudio?

I'm new to R. Can anyone help me? I imported a CSV extract of Stack Overflow data with s <- read_csv("https://www.ics.uci.edu/~duboisc/sta

find exact match with grep

I am attempting to take a fairly large dataframe of comments from a survey and use grep to identify comments that contain one of a list of keywords index <-
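One way to get exact (whole-word) matches is to wrap the keyword alternation in `\b` word boundaries, so that "price" no longer matches "priced" or "priceless". A sketch with hypothetical comments and keywords, since the question's own vectors are truncated:

```r
# Hypothetical survey comments and keywords
comments <- c("The price was fair", "Unfairly priced", "great service", "priceless")
keywords <- c("price", "service")

# \\b anchors each keyword at word edges, giving whole-word matches only
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
index <- grep(pattern, comments, ignore.case = TRUE, perl = TRUE)
index
```

If any keyword contains regex metacharacters such as `.` or `(`, escape them before building the pattern.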

Add leading zeros only if the number starts with 6 in R dataframe

I have a dataframe with numbers that I need to format. How do I add leading zeros only to numbers that start with 6? All the examples I've seen using str_pad() or spri
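A base-R sketch of conditional padding: build the padded and unpadded strings, then choose between them with `ifelse()` based on the first digit. The input values and the target width of 8 digits are assumptions, since the excerpt doesn't state them.

```r
# Hypothetical whole numbers; the padded width of 8 is an assumption
x <- c(612345, 123456, 65, 7)

x_int <- as.integer(x)
x_chr <- sprintf("%d", x_int)       # plain digits, never scientific notation
starts_6 <- startsWith(x_chr, "6")

# Pad only the values whose first digit is 6; leave the others unchanged
out <- ifelse(starts_6, sprintf("%08d", x_int), x_chr)
out
```

The result is a character vector: a numeric column cannot store leading zeros, so the formatted values must be kept as text.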

R iterations only saving the first value of a vector

Until now, every problem I've had had already been asked and answered here, but this time I'm really at a loss. I am running grep in R to look for a list of re

Twitter streaming error, 'invalid length argument'?

The code is as follows:

    Supplier_List <- data.frame(companies = c("company1","company2","company3"))
    Streamed_Tweets <- purrr::map_df(Supplier_List$compa

Why does training Xgboost model with pseudo-Huber loss return a constant test metric?

I am trying to fit an xgboost model using the native pseudo-Huber loss reg:pseudohubererror. However, it doesn't seem to work, since neither the training nor t

Summing across in a dataframe with condition coming from another column

This is not a very good title for the question. I want to sum across certain columns in a data frame for each group, excluding a different column for each group.

How to filter out a row if there are two consecutive instances of the same value?

I have a data frame with multiple similar sequences in which column Z has a string pattern containing "VALUE1" and "VALUE2" (only these two patterns matter) and
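The excerpt is cut off before the exact rule is given, so this sketch assumes the goal is to drop any row whose Z value repeats the value in the row directly above it. The data frame and its `id` column are hypothetical. The comparison is vectorised by lining up `Z` against itself shifted by one row.

```r
# Hypothetical data: only column Z matters for the filter
df <- data.frame(id = 1:6,
                 Z = c("VALUE1", "VALUE2", "VALUE2", "VALUE1", "VALUE1", "VALUE2"))

# TRUE where a row has the same Z as the previous row
repeat_prev <- c(FALSE, df$Z[-1] == df$Z[-nrow(df)])

# Keep the first row of each consecutive run and drop the repeat
df_filtered <- df[!repeat_prev, ]
df_filtered
```

If the data holds several separate sequences, the same comparison would need to run within each sequence (for example via `ave()` over a sequence ID) so that the first row of one sequence isn't compared with the last row of the previous one.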

Path diagram in R

I am trying to plot a path diagram of a Structural Equation Model (SEM) in R. I was able to plot it using semPlot::semPaths(). The output is similar to The SEM

New column with week on week spending by store

I have a dataset in which I need to track customers' spending week by week for each store.

    store <- c(1,2,3,4,5,6,1,2,3,4,5,6)
    week <- c(1,1,1,1,1,1,2,2,2,
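A base-R sketch of a week-on-week change per store. The `store` vector comes from the question. Because the `week` vector is cut off, `week` is assumed here to be weeks 1 and 2 for all six stores, and the `spend` values are hypothetical. Sorting by store and week first ensures that `diff()` inside `ave()` compares consecutive weeks of the same store.

```r
# store is from the question; week and spend are assumptions for illustration
store <- c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6)
week  <- rep(1:2, each = 6)   # assumed: six stores observed in weeks 1 and 2
spend <- c(10, 20, 30, 40, 50, 60, 15, 18, 30, 44, 45, 70)
df <- data.frame(store, week, spend)

# Order rows so each store's weeks are consecutive and ascending
df <- df[order(df$store, df$week), ]

# Difference from the previous week within each store; the first week is NA
df$wow_change <- ave(df$spend, df$store, FUN = function(s) c(NA, diff(s)))
df
```

Replacing `c(NA, diff(s))` with something like `c(NA, diff(s) / head(s, -1))` would give a relative change instead of an absolute one.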

How to use the start function in ts in R when date and time are included?

I am exporting data from a CSV file that has two columns. One holds the time and the other holds the power. The time column has the time in two different formats: mm-dd-yy