R: logistic regression, glm & predict: which class is predicted?
I used the glm function (family=binomial) to fit a logistic regression model to my data. The dependent variable is binary. I then call (details below):
predict(glm.fit, newdata=datapoint, type="response")
The function returns a probability. To which class does this probability belong? I.e. if the returned value is 0.95, which of the two classes is it supposed to belong to?
I cannot find documentation that explains how this is determined.
Note: glm.fit is the result of glm(), and datapoint is the data I want a prediction for.
Solution 1:[1]
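Factor levels are ordered alphabetically by default, and glm treats the first level as the reference ("failure") class. So if your classes are "No" and "Yes", the returned probability is the probability of "Yes", because "No" comes first alphabetically and is therefore the reference level.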
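A minimal sketch of how to check the level order, and how to change it if you want the other class modelled. The vector y and its values are made up for illustration and are not from the question.

```r
# Hypothetical response vector (illustrative values only).
y <- factor(c("yes", "no", "no", "yes"))

# Levels are sorted alphabetically by default; the first one is the
# reference ("failure") class, so the fitted probability refers to
# the second level, levels(y)[2].
levels(y)

# To have the model return the probability of the other class,
# move the current second level into the reference position.
y2 <- relevel(y, ref = "yes")
levels(y2)
```

With y2 as the response, predict(..., type="response") would give the probability of "no" instead of "yes".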
Solution 2:[2]
This is essentially answered in "glmnet: How do I know which factor level of my response is coded as 1 in logistic regression", although that only helps if you know that glmnet uses the same rules as glm. It is also useful to know that factor levels are ordered alphabetically by default, so in case 1 below (a two-level factor response), the second level in alphabetical order corresponds to success.
From ?binomial:
For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:
- As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
- As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).
- As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
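The factor case can be checked empirically. The following is a rough sketch on simulated data: the variable names, the simulated relationship, and the 0.5 cutoff are all assumptions made for illustration, not part of the question.

```r
set.seed(1)

# Simulated data (hypothetical): "yes" becomes more likely as x grows.
n <- 200
x <- rnorm(n)
y <- factor(ifelse(runif(n) < plogis(2 * x), "yes", "no"))
d <- data.frame(x = x, y = y)

fit <- glm(y ~ x, family = binomial, data = d)

# Predict at a very low and a very high x.
p <- predict(fit, newdata = data.frame(x = c(-3, 3)), type = "response")

# p is the probability of the second factor level, here "yes",
# so it is small at x = -3 and large at x = 3.
lev <- levels(d$y)

# Turn probabilities into class labels; the 0.5 cutoff is an assumed choice.
pred_class <- ifelse(p > 0.5, lev[2], lev[1])
pred_class
```

If the probability had referred to the first level instead, the prediction at x = 3 would come out close to 0 rather than close to 1.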
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user8854136 |
| Solution 2 | Ben Bolker |
