'How to deal with 0 occurrence of one category in binary logistic regression in r?

I have two datasets holding the survival data of plants in 3 habitats at pre (T1) & post (T2) winter. At the beginning (T0) I had 40 plants in each habitat. At T1 I had 27 survivals at the Beach, 40 survivals in the Grey Dune and 38 survival in the White Dune (survival = 1)

T1 Habitat Survival Beach GreyDune WhiteDune 0 13 0 2 1 27 40 38

At T2 I had the following

T2:
Habitat Survival Beach GreyDune WhiteDune 0 25 2 4 1 15 38 36

The data is stored in a table 1. column "Habitat" with levels = Beach, WhiteDune, GreyDune, 2. column "Survival" with either 0 (= dead) or 1(=alive) per plant

Now, I want to investigate if the survival (survival = 1) can be predicted by the habitat (Beach, White Dune, Grey Dune), & if the winter season has an impact on the survival chance.

My approach was to use a general liner model, family = binary

E.g. for T1 (pre winter)

```
mT1<-glm(Survival~Habitat, data=survivalT1, family=binomial(link='logit'))
```

E.g. for T2 (post winter)

```
mT2<-glm(Survival~Habitat, data=survivalT2, family=binomial(link='logit')
```

and then a combined model of both

```
mT1.T2 <- glm(Survival ~ Habitat * Time, data = survivalT1.T2, family = 
    binomial(link='logit')
```

However, because I have a 100% survival in the Grey Dunes at T1 my model output does not make sense, as there is no significance between Beach and Grey Dune

```
lsmeans(mT1, pairwise~Habitat, adjust="Tukey")
```

$lsmeans
 Habitat   lsmean       SE  df asymp.LCL asymp.UCL
 Beach      0.731    0.338 Inf     0.069      1.39
 GreyDune  19.566 1700.359 Inf -3313.076   3352.21
 WhiteDune  2.944    0.726 Inf     1.523      4.37

Results are given on the logit (not the response) scale. 
Confidence level used: 0.95 

$contrasts
 contrast             estimate     SE  df z.ratio p.value
 Beach - GreyDune       -18.84 1700.4 Inf  -0.011  0.9999
 Beach - WhiteDune       -2.21    0.8 Inf  -2.766  0.0156
 GreyDune - WhiteDune    16.62 1700.4 Inf   0.010  0.9999

Results are given on the log odds ratio (not the response) scale. P value adjustment: tukey method for comparing a family of 3 estimates

However, if I only look at the interaction between Beach & Grey Dune at T1, I do get a logical result.

Analysis of Deviance Table

Model: binomial, link: logit

Response: Alive

Terms added sequentially (first to last)


        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                       79    102.298              
Habitat  1   33.492        78     68.806 7.156e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> lsmeans(tm, pairwise ~ Habitat, adjust = "Tukey")
$lsmeans
 Habitat  lsmean    SE  df asymp.LCL asymp.UCL
 Beach    -0.511 0.327 Inf     -1.15     0.129
 GreyDune  2.944 0.725 Inf      1.52     4.366

Results are given on the logit (not the response) scale. 
Confidence level used: 0.95 

$contrasts
 contrast         estimate    SE  df z.ratio p.value
 Beach - GreyDune    -3.46 0.796 Inf  -4.343  <.0001

Results are given on the log odds ratio (not the response) scale. 

Now, I like to ask if there is an opportunity to work around the 100% survival (or success) at T1 so I can still run a logistic regression and compare all three habitats, as well as pre- & post winter survival.

Thank you in advance for your input.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source