'How to deal with 0 occurrence of one category in binary logistic regression in r?
I have two datasets holding the survival data of plants in 3 habitats at pre (T1) & post (T2) winter. At the beginning (T0) I had 40 plants in each habitat. At T1 I had 27 survivals at the Beach, 40 survivals in the Grey Dune and 38 survival in the White Dune (survival = 1)
T1 Habitat Survival Beach GreyDune WhiteDune 0 13 0 2 1 27 40 38
At T2 I had the following
T2:
Habitat
Survival Beach GreyDune WhiteDune
0 25 2 4
1 15 38 36
The data is stored in a table 1. column "Habitat" with levels = Beach, WhiteDune, GreyDune, 2. column "Survival" with either 0 (= dead) or 1(=alive) per plant
Now, I want to investigate if the survival (survival = 1) can be predicted by the habitat (Beach, White Dune, Grey Dune), & if the winter season has an impact on the survival chance.
My approach was to use a general liner model, family = binary
E.g. for T1 (pre winter)
```
mT1<-glm(Survival~Habitat, data=survivalT1, family=binomial(link='logit'))
```
E.g. for T2 (post winter)
```
mT2<-glm(Survival~Habitat, data=survivalT2, family=binomial(link='logit')
```
and then a combined model of both
```
mT1.T2 <- glm(Survival ~ Habitat * Time, data = survivalT1.T2, family =
binomial(link='logit')
```
However, because I have a 100% survival in the Grey Dunes at T1 my model output does not make sense, as there is no significance between Beach and Grey Dune
```
lsmeans(mT1, pairwise~Habitat, adjust="Tukey")
```
$lsmeans
Habitat lsmean SE df asymp.LCL asymp.UCL
Beach 0.731 0.338 Inf 0.069 1.39
GreyDune 19.566 1700.359 Inf -3313.076 3352.21
WhiteDune 2.944 0.726 Inf 1.523 4.37
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$contrasts
contrast estimate SE df z.ratio p.value
Beach - GreyDune -18.84 1700.4 Inf -0.011 0.9999
Beach - WhiteDune -2.21 0.8 Inf -2.766 0.0156
GreyDune - WhiteDune 16.62 1700.4 Inf 0.010 0.9999
Results are given on the log odds ratio (not the response) scale. P value adjustment: tukey method for comparing a family of 3 estimates
However, if I only look at the interaction between Beach & Grey Dune at T1, I do get a logical result.
Analysis of Deviance Table
Model: binomial, link: logit
Response: Alive
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 79 102.298
Habitat 1 33.492 78 68.806 7.156e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> lsmeans(tm, pairwise ~ Habitat, adjust = "Tukey")
$lsmeans
Habitat lsmean SE df asymp.LCL asymp.UCL
Beach -0.511 0.327 Inf -1.15 0.129
GreyDune 2.944 0.725 Inf 1.52 4.366
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$contrasts
contrast estimate SE df z.ratio p.value
Beach - GreyDune -3.46 0.796 Inf -4.343 <.0001
Results are given on the log odds ratio (not the response) scale.
Now, I like to ask if there is an opportunity to work around the 100% survival (or success) at T1 so I can still run a logistic regression and compare all three habitats, as well as pre- & post winter survival.
Thank you in advance for your input.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
