'GLMM: Needing overall advice on selecting model terms for glmm modelling in R

I would like to create a model to understand how habitat type affects the abundance of bats found, however I am struggling to understand which terms I should include. I wish to use lme4 to carry out a glmm model, I have chosen glmm as the distribution is poisson - you can't have half a bat, and also distribution is left skewed - lots of single bats.

My dataset is very big and is comprised of abundance counts recorded by an individual on a bat survey (bat survey number is not included as it's public data). My data set includes abundance, year, month, day, environmental variables (temp, humidity, etc.), recorded_habitat, surrounding_habitat, latitude and longitude, and is structured like the set shown below. P.S Occurrence is an anonymous recording made by an observer at a set location, at a location a number of bats will be recorded - it's not relevant as it's from a greater dataset.

occurrence abundance latitude longitude year month day (environmental variables
3456 45 53.56 3.45 2000 5 3 34.6
surrounding_hab recorded_hab
A B

Recorded habitat and surrounding habitat range in letters (A-I) corresponding to a habitat type. Also, the table is split as it wouldn't fit in the box.

These models shown below are the models I think are a good choice.

rhab1 <- glmer(individual_count ~ recorded_hab + (1|year) + latitude + longitude + sun_duration2, family = poisson, data = BLE) 
summary(rhab1)

rhab2 <- glmer(individual_count ~ surrounding_hab + (1|year) + latitude + longitude + sun_duration2, family = poisson, data = BLE)
summary(rhab2)

I'll now explain my questions in regards to the models I have chosen, with my current thinking/justification.

Firstly, I am confused about the mix of categorical and numeric variables, is it wise to include the environmental variables as they are numeric? My current thinking is scaling the environmental variables allowed the model to converge so including them is okay?

Secondly, I am confused about the mix of spatial and temporal variables, primarily if I should include temporal variables as the predictor is a temporal variable. I'd like to include year as a random variable as bat populations from one year directly affect bat populations the next year, and also latitude and longitude, does this seem wise?

I am also unsure if latitude and longitude should be random? The confusion arises because latitude and longitude do have some effect on the land use.

Additionally, is it wise to include recorded_habitat and surrounding_habitat in the same model? When I have tried this is produces a massive output with a huge correlation matrix, so I'm thinking I should run two models (year ~ recorded_hab) and (year ~ surrounding_hab) then discuss them separately - hence the two models.

Sorry this question is so broad! Any help or thinking is appreciated - including data restructuring or model term choice. I'm also new to stack overflow so please do advise on question lay out/rules etc if there are glaringly obvious mistakes.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source