Sanity check: mixed model in lme4 with "complex" interactions
This is a little fiddly, but I'll do my best to explain.
Background. I have repeated-measures data for 10,000 human couples across 14 measurements. I'm modelling the within-pair association for sex/age-standardised traits (e.g., BMI z-scores). I've fit the following model with random intercepts and slopes using lme4:
lmer(bmi ~ BMI + average_age + age_diff + (BMI|CoupleID), data = d5)
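A fit like this yields a variance for the couple-specific BMI slopes. Here is a minimal sketch of how that variance can be pulled out, run on simulated data rather than my real data. The data frame `sim`, the sample sizes and the effect sizes are all made up for illustration, and `average_age` and `age_diff` are left out for brevity.

```r
library(lme4)

set.seed(1)
n_couples <- 200   # hypothetical; far fewer than the real 10,000
n_obs     <- 14

# Simulated long-format data: one row per couple per measurement
sim <- data.frame(
  CoupleID = rep(seq_len(n_couples), each = n_obs),
  BMI      = rnorm(n_couples * n_obs)
)

# Couple-specific intercepts and slopes (assumed values, illustration only)
u0 <- rnorm(n_couples, sd = 0.5)
u1 <- rnorm(n_couples, sd = 0.3)
sim$bmi <- u0[sim$CoupleID] + (0.2 + u1[sim$CoupleID]) * sim$BMI +
  rnorm(nrow(sim), sd = 0.5)

fit <- lmer(bmi ~ BMI + (BMI | CoupleID), data = sim)

# Variance components as a data frame; the row with var1 == "BMI"
# and no var2 holds the random-slope standard deviation
vc <- as.data.frame(VarCorr(fit))
slope_sd <- vc$sdcor[vc$grp == "CoupleID" & vc$var1 == "BMI" & is.na(vc$var2)]

# Couple-specific slopes = fixed slope + each couple's random deviation
couple_slopes <- coef(fit)$CoupleID[, "BMI"]
```

Here `slope_sd` summarises how much the slopes vary between couples, and `couple_slopes` gives one estimated slope per couple.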
I find (as expected) that there's a lot of variance in slopes between couples. I hypothesised that absolute BMI difference might affect the within-pair association.
To investigate this, I created a new variable measuring the absolute BMI difference within each couple at baseline (i.e., their first measurement), calculated as follows: group_by(CoupleID) %>% mutate(baseline_diff = abs(first(bmi) - first(BMI)))
I then dropped the first observation for each couple from the dataset and re-fit the model, adding baseline_diff as an interaction effect:
lmer(bmi ~ BMI*baseline_diff + (BMI|CoupleID), data = d5)
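To make the data preparation concrete, here is a minimal sketch of the baseline-difference step on toy data. The data frame `toy` and its values are made up, and I'm assuming the `observations` column gives the measurement order within each couple.

```r
library(dplyr)

# Toy data (made-up values): two couples, three measurements each
toy <- tibble(
  CoupleID     = c(1, 1, 1, 2, 2, 2),
  observations = c(1, 2, 3, 1, 2, 3),
  bmi          = c(-0.5, -0.4, -0.6, 1.0, 0.8, 0.9),
  BMI          = c( 0.5,  0.1, -0.2, 0.2, 0.3, 0.1)
)

toy_baseline <- toy %>%
  arrange(CoupleID, observations) %>%   # first() depends on row order
  group_by(CoupleID) %>%
  mutate(baseline_diff = abs(first(bmi) - first(BMI))) %>%
  filter(row_number() > 1) %>%          # drop the baseline row itself
  ungroup()
```

Each couple keeps a single constant `baseline_diff`, and the row it was computed from no longer enters the model.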
I found that the interaction term significantly moderated the within-pair association (the more similar couples are at baseline, the stronger their association across subsequent measurements). You can see this on the plot below:
However, there are two problems: 1) the baseline difference is likely to have a stronger effect on early measurements than on later ones; 2) the baseline is an arbitrary point and, in reality, trait difference at all stages (e.g., diff_time1, diff_time2, diff_time3) will continuously affect the within-pair association.
Key issue. What I really want is to measure how the absolute BMI difference, updated at every measurement rather than fixed at baseline, affects the within-pair association across all measurements.
I was wondering whether I could do this by creating a lagged BMI difference variable, which, for each observation, measures the absolute difference at the previous measurement, as follows: group_by(CoupleID) %>% mutate(lag_diff = lag(abs(bmi - BMI))). My idea was then to fit this variable as an interaction effect, like this:
lmer(bmi ~ BMI*lag_diff + (BMI|CoupleID), data = d5)
In this model, the lagged difference significantly moderates the association (once again, larger differences = weaker association).
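For clarity, this is roughly how the lagged difference is built, sketched on toy data. The data frame `toy`, its values and the helper column `abs_diff` are made up for illustration, and I'm again assuming `observations` gives the measurement order within each couple.

```r
library(dplyr)

# Toy data (made-up values): two couples, three measurements each
toy <- tibble(
  CoupleID     = c(1, 1, 1, 2, 2, 2),
  observations = c(1, 2, 3, 1, 2, 3),
  bmi          = c(-0.5, -0.4, -0.6, 1.0, 0.8, 0.9),
  BMI          = c( 0.5,  0.1, -0.2, 0.2, 0.3, 0.1)
)

toy_lagged <- toy %>%
  arrange(CoupleID, observations) %>%   # lag() depends on row order
  group_by(CoupleID) %>%                # so the lag never crosses couples
  mutate(
    abs_diff = abs(bmi - BMI),          # difference at the current measurement
    lag_diff = lag(abs_diff)            # difference at the previous measurement
  ) %>%
  ungroup() %>%
  filter(!is.na(lag_diff))              # first row per couple has no previous value
```

Grouping before `lag()` matters: without it, the first row of each couple would pick up the last difference of the couple before it.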
So, 1) does this model make sense? 2) Is there a better way to achieve what I want to achieve?
I've provided an example of the data structure, including all the new variables I've created, below (sorry it's a bit brief):
| CoupleID | bmi | BMI | baseline_diff | lag_diff | observations |
|---|---|---|---|---|---|
| 1 | -0.65 | -0.08 | 0.47 | 0.47 | 1 |
| 1 | -0.49 | -1.04 | 0.47 | 0.56 | 2 |
| 1 | -0.62 | 0.47 | 0.47 | 0.54 | 3 |
| 1 | -0.45 | 0.42 | 0.47 | 1.09 | 4 |
| 1 | -0.48 | -0.49 | 0.47 | 0.87 | 5 |
Thank you in advance for any help!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

