R make.unique starting at 1

I have a data frame with columns that are in groups of 4 like so:

a b c d a b c d a b c d a b c d...

Then, I use the function rep to create tags for the columns:

rep(c("a", "b", "c", "d"), len=ncol)

Finally, I use the function make.unique to make the tags unique, which gives:

a b c d a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3...
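
For reference, the behaviour described above can be reproduced roughly like this. The column count of 16 and sep = "" are assumptions (the question shows neither); with the default sep = "." the tags would read a.1, b.1 and so on.

```r
# Reproduce the tags from the question for a hypothetical 16-column data frame.
n_cols <- 16  # assumed column count, for illustration only
tags <- rep(c("a", "b", "c", "d"), length.out = n_cols)

# sep = "" is an assumption: it matches the "a1", "b1" style shown above.
result <- make.unique(tags, sep = "")
print(result)
# The first occurrence of each tag is left bare; numbering starts at the
# second occurrence: "a" "b" "c" "d" "a1" "b1" "c1" "d1" "a2" ...
```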

However, I would like to get:

a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3 a4 b4 c4 d4...

Is there an easy way to accomplish this? The make.unique documentation does not mention any parameter that gives this behaviour.

Solution 1:[1]

n <- 4
ncol <- 16
paste(letters[seq(n)], rep(seq(ncol/n), each = n, len = ncol), sep = "")
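
Note that the one-liner above relies on ncol being a multiple of n; with, say, ncol <- 10 the counter gets recycled and the last tags repeat a1, b1. A minimal sketch of a variant that rounds the number of cycles up instead, wrapped in a hypothetical helper tag_columns (the name and its arguments are made up for illustration):

```r
# Hypothetical helper: cycle through `tags` and number each cycle from 1.
tag_columns <- function(n_cols, tags = c("a", "b", "c", "d")) {
  n <- length(tags)
  # ceiling() makes sure a trailing partial group still gets its own number.
  cycle <- rep(seq_len(ceiling(n_cols / n)), each = n, length.out = n_cols)
  paste0(rep(tags, length.out = n_cols), cycle)
}

tag_columns(10)
# [1] "a1" "b1" "c1" "d1" "a2" "b2" "c2" "d2" "a3" "b3"
```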

Solution 2:[2]

make.unique.2 = function(x, sep='.'){
    ave(x, x, FUN=function(a){if(length(a) > 1){paste(a, 1:length(a), sep=sep)} else {a}})
}

Testing against your example:

> u = rep(c("a", "b", "c", "d"), 4)
> make.unique.2(u)
  [1] "a.1" "b.1" "c.1" "d.1" "a.2" "b.2" "c.2" "d.2" "a.3" "b.3" "c.3" "d.3"
 [13] "a.4" "b.4" "c.4" "d.4"

If an element is not duplicated, it is left alone:

> u = c('a', 'a', 'b', 'c', 'c', 'c', 'd')
> make.unique.2(u)
[1] "a.1" "a.2" "b"   "c.1" "c.2" "c.3" "d"
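
If you would rather number every element, including those that occur only once (so that a single group of columns still comes out as a1 b1 c1 d1), a small sketch along the same lines could look like this. The name number_all is hypothetical, and sep = "" is chosen to match the format in the question:

```r
# Hypothetical variant: append a running count to every element,
# whether or not it is duplicated.
number_all <- function(x, sep = "") {
  # ave() applies seq_along() within each group of equal values,
  # giving the occurrence number of each element.
  idx <- ave(seq_along(x), x, FUN = seq_along)
  paste(x, idx, sep = sep)
}

number_all(rep(c("a", "b", "c", "d"), 2))
# [1] "a1" "b1" "c1" "d1" "a2" "b2" "c2" "d2"
number_all(c("a", "a", "b"))
# [1] "a1" "a2" "b1"
```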

Solution 3:[3]

Wouldn't call this pretty, but it does the job:

> ncol <- 10
> apply(expand.grid(c("a","b","c","d"),1:((ncol+3)/4)), 1,
+   function(x)paste(x,collapse=""))[1:ncol]
 [1] "a1" "b1" "c1" "d1" "a2" "b2" "c2" "d2" "a3" "b3"

where ncol is the number of tags to generate.
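
One caveat with the apply() version: apply() first converts the data frame to a character matrix, and as far as I can tell that conversion pads the numbers to a common width, so once the counter reaches 10 the single-digit tags may come out as "a 1". A sketch that pastes the columns directly avoids this; n_tags and the column names tag and i are just illustrative choices:

```r
n_tags <- 40  # hypothetical number of tags, large enough to reach two digits

# expand.grid() varies the first column fastest, so the rows come out as
# a 1, b 1, c 1, d 1, a 2, ...
grid <- expand.grid(tag = c("a", "b", "c", "d"),
                    i = seq_len(ceiling(n_tags / 4)),
                    stringsAsFactors = FALSE)

# Paste the columns directly instead of going through apply().
tags <- paste0(grid$tag, grid$i)[seq_len(n_tags)]
print(tags)
```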

Solution 4:[4]

Here is a further variant. The function make.unique.2 by @adn.bps can still produce duplicates when the input already contains something that looks like a generated name (here "a.1"):

> u = c("a", "a", "b", "c", "c", "d", "c", "a.1")
> make.unique.2(u)
[1] "a.1" "a.2" "b"   "c.1" "c.2" "d"   "c.3" "a.1"

To avoid that, I've done:

dotify <- function(x, avoid){
  l <- length(x)
  if(l == 1L){
    return(x)
  }
  numbers <- 1L:l
  out <- paste0(x, ".", numbers)
  ndots <- 1L
  while(any(out %in% avoid)){
    ndots <- ndots + 1L
    out <- paste0(x, paste0(rep(".", ndots), collapse = ""), numbers)
  }
  out
}

make.unique2 <- function(x){
  if(anyDuplicated(x)){
    splt <- split(x, x)
    u <- names(splt)
    for(i in 1L:length(splt)){
      splt_i <- splt[[i]]
      j <- match(splt_i[1L], u)
      avoid <- u[-j]
      splt_i_new <- dotify(splt_i, avoid)
      u <- c(avoid, splt_i_new)
      splt[[i]] <- splt_i_new
    }
    x <- unsplit(splt, x)
  }
  x
}

make.unique2(u)
# [1] "a..1" "a..2" "b"    "c.1"  "c.2"  "d"    "c.3"  "a.1" 

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mdsumner
Solution 2 adn bps
Solution 3 NPE
Solution 4 Stéphane Laurent