Pairwise similarity with consecutive points

I have a large matrix of document similarities created with paragraph2vec_similarity from the doc2vec package. I converted it to a data frame and added a Title column at the beginning so I can later sort or group by it.

Current Dummy Output:

Title Header DocName_1900.txt_1 DocName_1900.txt_2 DocName_1900.txt_3 DocName_1901.txt_1 DocName_1901.txt_2
Doc1 DocName_1900.txt_1 1.000000 0.7369358 0.6418045 0.6268959 0.6823404
Doc1 DocName_1900.txt_2 0.7369358 1.000000 0.6544884 0.7418507 0.5174367
Doc1 DocName_1900.txt_3 0.6418045 0.6544884 1.000000 0.6180578 0.5274650
Doc2 DocName_1901.txt_1 0.6268959 0.7418507 0.6180578 1.000000 0.5755243
Doc2 DocName_1901.txt_2 0.6823404 0.5174367 0.5274650 0.5755243 1.000000

What I want is a data frame giving the similarity between each consecutive pair of parts within a document: the score for Doc1.1 vs Doc1.2, then Doc1.2 vs Doc1.3, and so on. I am only interested in similarity scores inside each individual document, i.e. the entries just above the diagonal (shown in bold above).

Expected Output

Title Similarity for 1-2 Similarity for 2-3 Similarity for 3-4
Doc1 0.7369358 0.6544884 NA
Doc2 0.5755243 NA NA
Doc3 0.6049844 0.5250659 0.5113757

I was able to produce a long-format data frame pairing every document part with every other part using x <- data.frame(col = colnames(m)[col(m)], row = rownames(m)[row(m)], similarity = c(m)). This is the closest I could get. Is there a better way? I am dealing with more than 500 titles of varying lengths. There is also the option of using diag(), but it runs through to the end of the whole matrix and I lose the document grouping.
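A base-R sketch of one possible approach: if the row/column names end in `_<part index>` as in the dummy output above, the consecutive-pair scores for one document are exactly the superdiagonal of that document's own submatrix, so diag() never has to run over the full matrix. Here `m` is a small stand-in for the real similarity matrix; the grouping names and padding logic are assumptions, not doc2vec API.

```r
# Stand-in for the real similarity matrix: symmetric, row/column names
# end in the part index after the final underscore.
parts <- c("A_1900.txt_1", "A_1900.txt_2", "A_1900.txt_3",
           "B_1901.txt_1", "B_1901.txt_2")
m <- matrix(runif(25), 5, 5, dimnames = list(parts, parts))
m[lower.tri(m)] <- t(m)[lower.tri(m)]
diag(m) <- 1

# Group rows by everything before the final "_<index>"
groups <- sub("_[0-9]+$", "", rownames(m))

# For each document, read the superdiagonal of its submatrix:
# entry (i, i + 1) is the similarity of part i with part i + 1.
sims <- lapply(split(rownames(m), groups), function(rows) {
  if (length(rows) < 2) return(numeric(0))
  sub_m <- m[rows, rows, drop = FALSE]
  sub_m[cbind(seq_len(nrow(sub_m) - 1), seq_len(nrow(sub_m) - 1) + 1)]
})

# Pad with NA so documents with different numbers of parts fit one frame
max_len <- max(lengths(sims))
out <- data.frame(
  Title = names(sims),
  t(sapply(sims, function(x) c(x, rep(NA_real_, max_len - length(x)))))
)
names(out)[-1] <- paste0("Similarity_", seq_len(max_len), "_", seq_len(max_len) + 1)
```

`out` then has one row per document and one `Similarity_i_j` column per consecutive pair, NA-padded for shorter documents; the real Title values could be joined back on afterwards.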



Solution 1:[1]

A tidyverse solution:

library(dplyr)
library(tidyr)
library(stringr)

df %>%
  group_by(Title) %>%
  summarize(name = embed(Header, 2), .groups = 'drop') %>%  # pair each Header with the next
  mutate(value = transform(df, row.names = Header)[name],   # look up each pair's score
         name = str_remove_all(paste(name[, 2], name[, 1], sep = '_'), '[^_]+[.]')) %>%
  pivot_wider()

# A tibble: 2 x 3
  Title `1_2`     `2_3`    
  <chr> <chr>     <chr>    
1 Doc1  0.7369358 0.6544884
2 Doc2  0.5755243 NA       

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 onyambu