How to print list elements without repeating any value more than twice

Given a list of integers nums, return a list of all the elements, but no repeated value should appear more than twice.

Example

Input: nums = [1,1,2,3,3,4,4,4,5]

Output: [1,1,2,3,3,4,4,5]



Solution 1:[1]

A more flexible implementation using itertools:

from itertools import islice, groupby, chain

nums = [1,1,2,3,3,4,4,4,5]

output = (islice(g, 2) for _, g in groupby(nums))
output = list(chain.from_iterable(output))
print(output) # [1, 1, 2, 3, 3, 4, 4, 5]

You can change the 2 in islice(g, 2) to tune the maximum number of repeats you want.
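
For instance, a minimal sketch of making that cap configurable (the limit_repeats helper name and its max_repeats parameter are my own, not part of the original answer):

from itertools import islice, groupby, chain

def limit_repeats(nums, max_repeats=2):
    # groupby groups *consecutive* equal values, so this assumes duplicates
    # already sit next to each other, as in the example input.
    # islice takes at most max_repeats items from each group.
    return list(chain.from_iterable(islice(g, max_repeats) for _, g in groupby(nums)))

nums = [1, 1, 2, 3, 3, 4, 4, 4, 5]
print(limit_repeats(nums))                 # [1, 1, 2, 3, 3, 4, 4, 5]
print(limit_repeats(nums, max_repeats=1))  # [1, 2, 3, 4, 5]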

Solution 2:[2]

The easiest and probably most straightforward way to get unique elements is with a set:

list(set(nums)) -> [1, 2, 3, 4, 5]

The downside of this approach is that sets are unordered, so we cannot rely on how the resulting list will be ordered after the conversion.

If order is important in your case you can do this:

list(dict.fromkeys(nums))
[1, 2, 3, 4, 5]

Dictionaries preserve insertion order since Python 3.7, and their keys are unique. So with this small trick we get a list of the unique keys of a dictionary while still maintaining the original order!
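
Note that both the set and the dict.fromkeys tricks drop every duplicate, while the original question only asks to cap repeats at two. A minimal order-preserving sketch of that variant (the limit_to_two helper name is my own, not from the original answer), which also works when duplicates are not adjacent:

from collections import Counter

def limit_to_two(nums, cap=2):
    # Track how many times each value has been kept so far and
    # emit a value only while its running count is below the cap.
    seen = Counter()
    out = []
    for n in nums:
        if seen[n] < cap:
            out.append(n)
            seen[n] += 1
    return out

print(limit_to_two([1, 1, 2, 3, 3, 4, 4, 4, 5]))  # [1, 1, 2, 3, 3, 4, 4, 5]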

Solution 3:[3]

Using data.table & comparison with base and dplyr

data.table method

This is a data.table-based version of the answer, which is quicker than both the base R and dplyr versions.

library(data.table)

set.seed(65L)
df <- data.table(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0)
df[, v2 := cumsum(v1 > 0)]
head(df, 12)
    v1 v2
 1:  2  1
 2:  1  2
 3:  3  3
 4:  0  3
 5:  0  3
 6:  4  4
 7:  2  5
 8:  4  6
 9:  4  7
10:  0  7
11:  4  8
12:  2  9

Three-method comparison: equivalence

set.seed(65L)
df <- data.frame(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0)
df2 <- df
dt <- as.data.table(df)

# data.table
dt[, v2 := cumsum(v1 > 0)]

# base R
if (df$v1[1L] > 0) {df$v2[1L] <- 1}
for (i in 2:length(df$v1)) {
  df$v2[i] <- df$v2[i - 1] + if (df$v1[i] > 0) {1} else {0}
}

# dplyr
library(dplyr)
if (df2$v1[1L] > 0) {df2$v2[1L] <- 1}
df2 <- df2 %>% mutate(v2 = cumsum(v1 > 0))

all.equal(dt, df, check.attributes = FALSE)
[1] TRUE
all.equal(dt, df2, check.attributes = FALSE)
[1] TRUE
all.equal(df, df2, check.attributes = FALSE)
[1] TRUE

Three-method comparison: speed

library(microbenchmark)
microbenchmark(
  DT = dt[, v2 := cumsum(v1 > 0)],
  Base = {if (df$v1[1L] > 0) {df$v2[1L] <- 1}; for (i in 2:length(df$v1)) {df$v2[i] <- df$v2[i - 1] + if (df$v1[i] > 0) {1} else {0}}},
  DP = {if (df2$v1[1L] > 0) {df2$v2[1L] <- 1}; df2 <- df2 %>% mutate(v2 = cumsum(v1 > 0))},
  setup = 'set.seed(65L); df <- data.table(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0); df2 <- df; dt <- as.data.table(df)',
  control = list(order = 'block'), times = 1000L)

Unit: microseconds
 expr    min      lq      mean median      uq     max neval cld
   DT  204.1  210.20  216.6067  212.0  216.80   382.9  1000 a  
 Base 7956.1 8322.85 8936.3439 8457.6 8702.25 22219.4  1000   c
   DP  916.0  930.50  994.4782  939.8  977.60  6157.4  1000  b

So the dplyr method is about 9 times faster than the base loop, while the data.table method is about 4.5 times faster than dplyr and over 40 times faster than base!


Solution 4:[4]

For these cases I typically replace the 0s with NA values, and use tidyr::fill() to copy the last non-missing (i.e. non-zero) value forward.

Here is an example:

df <- data.frame(
  value1 = c(1, 0, 0, 0, 2, 0, 0, 3, 4, 0)
) 


library(dplyr)

df %>% 
  mutate(
    value2 = ifelse(value1 == 0, NA_real_, value1)
  ) %>% 
  tidyr::fill(value2, .direction = "down")

and the result:

   value1 value2
1       1      1
2       0      1
3       0      1
4       0      1
5       2      2
6       0      2
7       0      2
8       3      3
9       4      4
10      0      4

This works even when the non-zero values increase (or decrease) by something other than 1, which is not the case with a cumsum()-based counter.

Solution 5:[5]

There is no need to use a loop. One option using base R would be:

df <- data.frame(value1 = c(1,0,0,0,2,0,0,3,4,0))
df$value2 <- cumsum(ifelse(df$value1 > 0, 1, 0))

Which yields:

> df
   value1 value2
1       1      1
2       0      1
3       0      1
4       0      1
5       2      2
6       0      2
7       0      2
8       3      3
9       4      4
10      0      4

Solution 6:[6]

Using Base R

There may be more elegant ways to do this, but assuming that the column v2 is already in the data frame, you can do something like the following. This answer relies solely on base R, and it does not matter whether v1 is increasing or decreasing, only that it is non-zero. I'll create a data frame as an example.

set.seed(65L)
df <- data.frame(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0)
 
head(df, 12)
   v1 v2
1   2  0
2   1  0
3   3  0
4   0  0
5   0  0
6   4  0
7   2  0
8   4  0
9   4  0
10  0  0
11  4  0
12  2  0
 
# Handle the first row separately to get rid of i - 1 headaches
if (df$v1[1L] > 0) {df$v2[1L] <- 1}
 
# Now the loop. Safer to do seq_len(length(df$v1) - 1) + 1 but that's more confusing
 
for (i in 2:length(df$v1)) {
    df$v2[i] <- df$v2[i - 1] + if (df$v1[i] > 0) {1} else {0}
}
 
head(df, 12)
   v1 v2
1   2  1
2   1  2
3   3  3
4   0  3
5   0  3
6   4  4
7   2  5
8   4  6
9   4  7
10  0  7
11  4  8
12  2  9

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 j1-lee
Solution 2 Liron Berger
Solution 3
Solution 4 jpiversen
Solution 5
Solution 6