Category "statistics"

How to plot correlation matrix/heatmap with categorical and numerical variables

I have 4 variables, of which 2 are nominal (dtype=object) and 2 are numeric (dtypes=int and float). df.head(1) OUT: OS_type|Week_day|clicks|avg_app_s

How to get multiple combinations of multiple lists in python (Multiple n Choose K or nCr)

I have been looking on Google and Stack Overflow for a few hours and I am sure there is an answer for what this is mathematically or perhaps it is just what the
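The excerpt is cut off, so the exact requirement is unknown. One possible reading of the title is "choose k items from each of several lists, then combine one such choice per list". A minimal sketch under that assumption, using only `itertools` (the function name `multi_choose` and the per-list sizes `ks` are hypothetical, not from the question):

```python
from itertools import combinations, product


def multi_choose(lists, ks):
    """Yield every way to pick ks[i] items from lists[i], one pick per list.

    Assumed interpretation of "multiple n choose k": the result is the
    Cartesian product of the per-list combinations, flattened into one tuple.
    """
    per_list = [combinations(lst, k) for lst, k in zip(lists, ks)]
    for picks in product(*per_list):
        # Flatten ((1, 2), ('a',)) into (1, 2, 'a')
        yield tuple(item for pick in picks for item in pick)


results = list(multi_choose([[1, 2, 3], ["a", "b"]], [2, 1]))
```

The number of results is the product of the individual binomial coefficients, here C(3,2) * C(2,1).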

Statistical difference between linear regressions

I have a statistical question on which I am stuck: Imagine you have 5 corn fields. You know the number of corn plants in each field. You now want to c

Why a specific model is not appropriate, given data with 6 variables (they are chr variables)

I want to show why a specific model is not appropriate, given data with 6 variables (they are chr variables). The model is y= abc*(x1+x2) a and b from the data

How to test symmetry of distribution in Python?

Given data, I want to test the symmetry of their distribution. In R there is the function symmetry.test(..) https://www.rdocumentation.org/packages/lawstat/versions/3.4/topics

Issue with corr.test() results

I am running corr.test() to look at potential correlations between genes and bacteria in a dataframe using this code: spearman=cor.test(FullSet$counts.Bac, Full

P value and critical value in hypothesis testing

I need a little clarification on the p-value and critical value approaches in hypothesis testing regarding the example below. Null Hypothesis: population mean = 80 Altern
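The two approaches always lead to the same decision, which a short calculation can illustrate. This is only a sketch: the null mean of 80 comes from the excerpt, but the alternative hypothesis is cut off, so a two-sided z-test with known standard deviation is assumed, and the sample mean, sigma, and sample size below are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, p-value) for a two-sided one-sample z-test."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, critical, p_value


alpha = 0.05
# mu0 = 80 is from the question; the other inputs are made up.
z, critical, p_value = z_test_two_sided(sample_mean=83, mu0=80, sigma=10, n=36, alpha=alpha)

reject_by_critical_value = abs(z) > critical
reject_by_p_value = p_value < alpha
```

Comparing |z| with the critical value and comparing the p-value with alpha are the same test stated on two different scales, so `reject_by_critical_value` and `reject_by_p_value` agree for any input.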

How do I determine the likelihood of my data coming from a model distribution using Julia?

I am trying to do a statistical analysis in Julia on experimental data. I tried to create a model and use Turing to obtain distributions for the mean and standa

Can I include covariates outside of the minimally sufficient set in a causal framework that aren't in the causal pathway?

I am applying a causal method to a cohort study analysis on pollutant exposure and disease X. Based on our understanding of the disease, we believe that aging i

How to combine countpct and binomCI into the same summary statistic to be used in tableby function?

I'm using the tableby function from the arsenal package to create summary tables. For most of the statistics I need to generate, this package gives me exactly t

Difference in R and SPSS LMM output

I am working on a linear mixed model and am attempting to run one on the same data in R and SPSS. I'm using a treatment with two levels, looking at 10 different

How can I find the mode (a number) of a kde histogram in python

I want to determine the X value that has the highest peak in the histogram. The code to print the histogram: fig=sns.displot(data=df, x='degrees', hue="TYPE", k

Generate underlying distribution from bins in python

I found a PDF document describing the income distribution in the US in 1978. Per income range I have the percentage of the population that falls in that income

perl: Finding mean and variance of large numbers without overflow

I am using a subroutine (stats) to calculate statistics for a list of numbers. These numbers may be big enough to lose precision if stored as normal Perl number
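The question's own `stats` subroutine is not shown, so the following is not a fix for it. It is a sketch, in Python rather than Perl, of one common way to avoid accumulating huge sums of squares: Welford's one-pass update. The function name `running_mean_variance` is hypothetical. Passing `fractions.Fraction` values keeps the arithmetic exact for arbitrarily large integers; plain floats also work but are subject to the usual rounding.

```python
from fractions import Fraction


def running_mean_variance(values):
    """Return (mean, sample variance) using Welford's one-pass algorithm."""
    count = 0
    mean = 0
    m2 = 0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    if count < 2:
        raise ValueError("need at least two values for a sample variance")
    return mean, m2 / (count - 1)


# Exact arithmetic on very large integers by wrapping them in Fraction.
big = 10**30
exact_mean, exact_var = running_mean_variance(
    [Fraction(big + d) for d in (4, 7, 13, 16)]
)
```

Because each step only works with deviations from the running mean, the intermediate quantities stay small even when the inputs themselves are large.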

How to perform a Levene's test using scipy

I've been trying to use scipy.stats.levene with no success. I have a numpy matrix with shape (2128, 45100). Each row is a sample and belongs to one of 3 cluste

How to run cor.test() on two different dataframes

I would like to run cor.test() on two separate dataframes but I am unsure how to proceed. I have two example dataframes with identical columns (patients) but di

returning cov and std from sklearn gaussian process?

I can return the covariance or the standard deviation from a GP using sklearn, like: y, cov = gp.predict(Xpredict,return_cov=True) y, std = gp.predict(Xpredict,

Strange statistics in Google Play Developer Console

Today I noticed strange statistics in my Google Play Developer Console for one of my applications. It is about Final installs on active devices: 17 July - th

How can I apply fisher.test in R to a large matrix, and extract p-values to a new matrix?

I have a large matrix (12 rows, 53 columns) with counts of how many times genes in my clusters "A", "B", "C", etc. overlap with clusters created by someone else

Error with SALib library for sensitivity analysis in Python

I am trying to perform sensitivity analysis using Sobol's method. I always get an error which I cannot solve. The code and the result are below. The input vari