How to handle the issue of dependent observations in a multinomial logistic regression model
I have recently conducted a study where I built a multinomial logistic regression model to investigate whether keystroke logging analytics (e.g., pause time in writing, general typing rate, revision behavior) predict argument elements (i.e., final claim, primary claim, data) in adult persuasive essay writing. In other words, I want to investigate whether adult writers exhibited different patterns of writing behaviors (manifested in their keystroke activities while writing on a keyboard) when they were producing different argument elements in their written argumentation.
Below is the structure of the data I used for the study:
'data.frame': 244 obs. of 11 variables:
$ ID : int 1 1 1 2 2 3 3 3 4 4 ...
$ Prompt : Factor w/ 2 levels "Appearance","Competition": 2 2 2 2 2 2 2 2 2 2 ...
$ element : Factor w/ 3 levels "Data","FinalClaim",..: 1 2 3 1 2 1 2 3 1 2 ...
$ product_process_ratio : num 0.885 0.864 0.992 0.797 0.827 ...
$ chars_process_per_min_incl_space : num 46.3 20.2 12 56 51.8 ...
$ mean_process_time_in_p_burst_pt200 : num 2.04 5.29 9.49 2.75 2.94 ...
$ mean_typed_chars_in_p_burst_pt200 : num 1.57 1.78 1.89 2.57 2.54 ...
$ mean_pause_time_in_seconds_pt200 : num 0.786 1.08 0.643 0.41 0.395 ...
$ proportion_of_pause_time_pt200 : num 0.385 0.2031 0.0656 0.1477 0.1327 ...
$ mean_pause_time_sec_within_words_pt200 : num 0.357 0.538 0.354 0.321 0.375 ...
$ mean_pause_time_sec_between_words_pt200: num 1.28 1.611 1.777 0.712 0.45 ...
Initially, I built a multinomial logistic regression model using the nnet R package, with the three-class categorical variable "element" as the dependent variable. I then included the eight keystroke logging measures (from product_process_ratio to mean_pause_time_sec_between_words_pt200) and the categorical variable "Prompt" as the independent variables. The model worked well and I got some interesting results from the analyses.
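For concreteness, here is a minimal sketch of the kind of model described above. It runs on simulated data, not my real data: the values are random, and the third factor level name "PrimaryClaim" is an assumption, since the str() output truncates it. Only the column names and the nnet::multinom call reflect what I actually did.

```r
library(nnet)  # ships with standard R installations as a recommended package

set.seed(1)
n <- 244

keystroke_vars <- c(
  "product_process_ratio",
  "chars_process_per_min_incl_space",
  "mean_process_time_in_p_burst_pt200",
  "mean_typed_chars_in_p_burst_pt200",
  "mean_pause_time_in_seconds_pt200",
  "proportion_of_pause_time_pt200",
  "mean_pause_time_sec_within_words_pt200",
  "mean_pause_time_sec_between_words_pt200"
)

# Simulated stand-in for the real data: random values, for illustration only.
sim <- as.data.frame(matrix(runif(n * length(keystroke_vars)), nrow = n,
                            dimnames = list(NULL, keystroke_vars)))
dat <- cbind(
  data.frame(
    Prompt  = factor(sample(c("Appearance", "Competition"), n, replace = TRUE)),
    # "PrimaryClaim" is an assumed label for the third level.
    element = factor(sample(c("Data", "FinalClaim", "PrimaryClaim"), n,
                            replace = TRUE))
  ),
  sim
)

# element ~ eight keystroke measures + Prompt.
# Note: this treats all rows as independent; participant ID is ignored.
model_formula <- reformulate(c(keystroke_vars, "Prompt"), response = "element")
fit <- multinom(model_formula, data = dat, trace = FALSE)

coef(fit)  # one row of coefficients per non-reference level of element
```

The first factor level ("Data") serves as the reference category, so each row of coefficients is a log-odds contrast against it.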
But then I realized that the observations were not independent of each other. In this case, each participant produced 1-3 argument elements of different categories although each of them just wrote one essay. I am wondering if I should account for the random effects of the individuals.
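To make the clustering concrete, the sketch below tabulates the number of rows per participant and checks that no participant contributes the same element category twice. It uses only the first ten ID and element values visible in the str() output above (element given as its integer codes), so it is a toy illustration, not a summary of the full data set.

```r
# First ten values shown in the str() output; the real vectors are longer.
id      <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
element <- c(1, 2, 3, 1, 2, 1, 2, 3, 1, 2)  # integer codes of the factor

# Number of rows (argument elements) contributed by each participant
obs_per_id <- table(id)
obs_per_id

# Distribution of cluster sizes across participants
table(obs_per_id)

# TRUE if some participant has the same element category more than once
any(duplicated(data.frame(id, element)))
```

Clusters this small (at most three rows per participant) are worth keeping in mind when deciding how rich a random-effects structure the data can support.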
To sum up, here are my two questions:
1. Given that the observations are dependent, because each individual produced at least one argument element from the three categories (Final Claim, Primary Claim, Data), should I build a mixed-effects multinomial logistic regression model to investigate whether keystroke analytics (e.g., proportion of pause time, mean P-burst length, product process ratio) predict different argument elements? Or is it OK for me to stick with a simpler multinomial logistic regression model? In other words, is the dependence among observations a concern in a standard multinomial logistic regression model?
2. If the dependence among observations should be accounted for, what model should I use to analyze the data? Is there a good R package available for this purpose? (P.S. I've tried the mlogit package to build a mixed multinomial logit model, but it seems that this package cannot handle data structured like mine.)
Thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow