'Using R to collapse data into groups to get new weights, but still getting an infinite range

I have a dataset of neurosurgery patients for which I am creating survival curves. I am trying to adjust my curves to match the age-sex distribution of the 2000 US population, which is included in the R survival package. This 'uspop2' dataset is an array with and calendar year. First, I'm only going to look at ages 50 and over, so I'll create a table 'tab100' of observed age/sex counts within group for our own data, using the same upper age threshold. New weights are the values of 𝜋𝑝= pi.us/tab100.

Here is the first code I write (note that I am using R in rpy2 in google collab):

%%R

#Reweighting
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))

refpop <- uspop2[as.character(50:100),c("female", "male"), "2000"]
pi.us <- refpop/sum(refpop)
age100 <- factor(ifelse(mydata$Age..yrs. >100, 100, mydata$Age..yrs.), levels=50:100)
tab100 <- with(mydata, table(age100, mydata$Sex, mydata$group))/ nrow(mydata)
us.wt <- rep(pi.us, 3)/ tab100 #new weights by age,sex, group
range(us.wt)

This yields a range of 0.006709405 to Infinity! This infinite weight happens because the US population has all age-sex combos represented, but my neurosurgery patient dataset does not. To get rid of these infinite weights, I attempt to collapse the US population into separate age groups...

%%R

mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))

temp <- as.numeric(cut(50:100, c(49, 54, 59, 64, 69, 74, 79, 89, 110)+.5))
pi.us<- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum)/sum(refpop)

print(pi.us)

tab2 <- with(mydata, table(mydata$Age..yrs., mydata$Sex, mydata$group))/nrow(mydata)

print(tab2)

us.wt <- rep(pi.us, 3)/tab2

print(range(us.wt))

index <- with(mydata, cbind(mydata$Age..yrs., mydata$Sex,
  as.numeric(mydata$group)))

mydata$uswt <- us.wt[index]
sfit3a <-survfit(Surv(Patient.LOS..days., Events) ~ group, data=mydata, weight=uswt)

Printing pi.us and tab2 show me that I did successfully collapse the ages into 8 groups. Yet when I set us.wt <- rep(pi.us, 3)/tab2, us.wt is still the exact same as before! It doesn't change. You can see below that the range outputted has a different lower bound, but still goes all the way up to infinity. It's no surprise, that I get a subscript out of bounds error for the next line of code. What the heck is going on?

[1] 0.4655699       Inf
R[write to console]: Error in `[.default`(us.wt, index) : subscript out of bounds


Error in `[.default`(us.wt, index) : subscript out of bounds

BTW I am basing my code exactly off of page 7 of this R paper: https://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf

What am I doing wrong? :( Thanks for your help!



Solution 1:[1]

This is an answer to your question but not a solution to your problem. Look at the index and us.wt objects. It should be apparent that the naming of margins of the us.wt array doesn't match the values in the third column of index

str(us.wt)
 'table' num [1:48, 1:3, 1:3] Inf Inf Inf Inf Inf ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:48] "2" "3" "4" "5" ...
  ..$ : chr [1:3] "" "F" "M"
  ..$ : chr [1:3] "Plunge <= 2 mm" "Plunge 2 - 4 mm" "Plunge > 4 mm"
> str(index)
 chr [1:240, 1:3] "2" "7" "11" "75" "59" "3" "88" "13" "75" "80" "5" "3" "65" "66" "93" "45" ...
> head(index)
     [,1] [,2] [,3]
[1,] "2"  "M"  "1" 
[2,] "7"  "M"  "3" 
[3,] "11" "M"  "1" 
[4,] "75" "M"  "3" 
[5,] "59" "M"  "1" 
[6,] "3"  "M"  "3" 

I also think the array construction of us.wt got messed up. Since there is no description of the logic or goals in constructing it, I'm not attempting to read you mind and offer suggestions. Here's how to see why I think it's messed up:

> Hmisc::describe(us.wt)
us.wt 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75 
     432        0       32    0.691      Inf      NaN    4.264    7.864   16.032      Inf      Inf 
     .90      .95 
     Inf      Inf 

lowest :  0.5586839  1.1173678  3.1599027  3.4755957  4.2639763
highest: 20.3399270 21.7412450 27.0462314 28.2128223        Inf
Warning message:
In w * sort(x - mean(x)) :
  longer object length is not a multiple of shorter object length
# Notice that more than half of the values are Inf

> head(us.wt)
, ,  = Plunge <= 2 mm

   
                      F         M
  2       Inf  7.053206  7.053206
  3       Inf 10.870622 10.870622
  4       Inf       Inf       Inf
  5       Inf 15.922230 15.922230
  6       Inf       Inf       Inf
  7       Inf 13.581011 13.581011

, ,  = Plunge 2 - 4 mm

   
                      F         M
  2       Inf 14.106411 14.106411
  3       Inf       Inf       Inf
  4       Inf 17.682214 17.682214
  5       Inf       Inf       Inf
  6       Inf       Inf       Inf
  7       Inf       Inf       Inf

, ,  = Plunge > 4 mm

   
                      F         M
  2       Inf       Inf       Inf
  3       Inf 10.870622 10.870622
  4       Inf 17.682214 17.682214
  5       Inf       Inf       Inf
  6       Inf 15.348446 15.348446
  7       Inf 13.581011 13.581011

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1