Bayesian modeling of repeated binary measurements in PyMC3 (Python)

I am going to run a study in which multiple raters have to evaluate whether each of a number of papers is '1' or '0'. The reason I use multiple raters is that I suspect that each individual rater is likely to make mistakes, and I hope that by using multiple raters I can control for that.

My aim is to estimate the true proportion of '1' in the population of papers, and I want to do this using a Bayesian model in PyMC3. More general answers about model specification, without a concrete PyMC3 implementation, are of course also welcome.

This is how I've simulated some data:

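import numpy as np
import pandas as pd
from scipy.stats import binom

n = 250  # number of papers we sample
p = 0.3  # true rate

# true label of each paper: 1 with probability p, otherwise 0
true_sample = binom.rvs(1, p, size=n)

# add error
# Note: a true '0' is always rated 0 here (a binomial draw with 0 trials),
# so a rater can only miss a '1'; a '0' is never turned into a '1'.
def rating(array, correct_rate):
    # each true '1' is rated 1 with probability correct_rate
    return np.random.binomial(array, correct_rate)

r = 10  # number of raters
r_error = np.random.uniform(0.7, 0.99, r)  # how often does each rater rate a paper correctly

# get the data
rated_data = {}

for i in range(r):
    rated_data[f'rater_{i}'] = rating(true_sample, r_error[i])

# one row per paper, one column per rater
df = pd.DataFrame(rated_data, index=[f'abstract_{i}' for i in range(n)])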

This is the model I have tried:

import pymc3 as pm

with pm.Model() as binom_model2:

    p = pm.Beta('p', 0.5, 0.5)  # this is the proportion of '1' in the population

    for i in range(r):  # separate correct-rating rate for each rater

        er = pm.Beta(f'er{i}', 10, 3)
        # observed: total number of '1' ratings given by rater i across the n papers
        prob = pm.Binomial(f'prob{i}', p=p * er, n=n, observed=df.iloc[:, i].sum())

This seems to work fine, in that it gives good estimates of p and of each rater's rate r_error (but do tell me if you think there are problems with the model!). However, it doesn't use all the information that is available: the ratings in each row of the dataframe are ratings of the same paper, and the model only sees each rater's column total. I presume that a model that incorporates this would give even more accurate estimates of p and of the error rates. I'm not sure how to do this, and any help would be appreciated.
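
One way to think about using the row information, independent of any particular library: treat each paper's true label as a latent variable and sum it out of the likelihood. Under the same assumption as the simulation (a true '0' is always rated 0, and a true '1' is rated 1 by rater j with probability c_j), a row containing at least one '1' must come from a true '1', while an all-zero row is either a true '0' or a true '1' that every rater missed. Below is a minimal, dependency-free sketch of that marginal log-likelihood. The function name marginal_log_likelihood and the toy data are made up for illustration, and this is not PyMC3 code; wiring the same expression into a sampler is not shown here.

```python
import math

def marginal_log_likelihood(ratings, p, correct_rates):
    """Log-likelihood of a papers-by-raters 0/1 table, true labels summed out.

    Assumptions (matching the simulation, not a general rater model):
    a true '0' is always rated 0; a true '1' is rated 1 by rater j with
    probability correct_rates[j]. Requires 0 < p < 1 and 0 < rate < 1.
    """
    total = 0.0
    for row in ratings:
        # Branch A: the paper is truly '1' and raters err independently.
        log_if_one = math.log(p)
        for y, c in zip(row, correct_rates):
            log_if_one += math.log(c) if y == 1 else math.log(1.0 - c)

        if any(row):
            # At least one '1' rating rules out a true '0' under the assumption.
            total += log_if_one
        else:
            # Branch B: the paper is truly '0', which always yields all zeros.
            log_if_zero = math.log(1.0 - p)
            # Add the two branch probabilities on the log scale (log-sum-exp).
            hi = max(log_if_one, log_if_zero)
            total += hi + math.log(
                math.exp(log_if_one - hi) + math.exp(log_if_zero - hi)
            )
    return total

# Toy example: 4 papers, 3 raters (hypothetical numbers)
toy = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 0, 0]]
rates = [0.9, 0.8, 0.7]
ll_half = marginal_log_likelihood(toy, 0.5, rates)
ll_tiny = marginal_log_likelihood(toy, 0.01, rates)
```

With the dataframe above, the table could be passed as df.values.tolist(). Note also that in the column-total model, the likelihood for rater i depends on p and er_i only through the product p * er_i, so p and the individual rates are separated mainly by the priors; the row-level likelihood adds information because agreement between raters on the same paper is informative about their rates.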

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
