How to print the elements of a list without repeating any element more than twice
Given a list of integers nums, return a list of all the elements, but no repeating number should appear more than twice.
Example:
Input: nums = [1,1,2,3,3,4,4,4,5]
Output: [1,1,2,3,3,4,4,5]
Solution 1:[1]
A more flexible implementation using itertools:
from itertools import islice, groupby, chain
nums = [1,1,2,3,3,4,4,4,5]
# groupby() yields runs of consecutive equal values; islice(g, 2) keeps at most 2 per run
output = (islice(g, 2) for _, g in groupby(nums))
output = list(chain.from_iterable(output))
print(output) # [1, 1, 2, 3, 3, 4, 4, 5]
You can replace the 2 in islice(g, 2) to tune the maximum number of repeats you want.
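For instance, here is a minimal sketch of the same pipeline with the limit set to 1, which keeps only one element per run of equal values:
from itertools import islice, groupby, chain
nums = [1, 1, 2, 3, 3, 4, 4, 4, 5]
# same pipeline, but keep at most one element per run
deduped = list(chain.from_iterable(islice(g, 1) for _, g in groupby(nums)))
print(deduped)  # [1, 2, 3, 4, 5]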
Solution 2:[2]
The easiest and, I would guess, most straightforward way to get unique elements is with a set:
list(set(nums)) -> [1, 2, 3, 4, 5]
The downside of this approach is that sets are unordered, so we cannot really rely on how the list will be ordered after the conversion.
If order is important in your case, you can do this:
list(dict.fromkeys(nums))
[1, 2, 3, 4, 5]
Dicts preserve insertion order since Python 3.7, and their keys are unique. So with this small trick we get a list of the unique keys of a dictionary while still maintaining the original order!
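Note that both the set and dict.fromkeys approaches keep each value only once. If, as in the original question, you want to keep up to two occurrences while preserving order, a minimal sketch with a plain counting dict (the variable names are only illustrative) could look like this:
from collections import defaultdict
nums = [1, 1, 2, 3, 3, 4, 4, 4, 5]
counts = defaultdict(int)  # how many copies of each value have been kept so far
result = []
for n in nums:
    if counts[n] < 2:  # keep at most two occurrences of each value
        result.append(n)
        counts[n] += 1
print(result)  # [1, 1, 2, 3, 3, 4, 4, 5]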
Solution 3:[3]
Using data.table & comparison with base and dplyr
data.table method
This is a data.table-based version of the answer, which is quicker than both the base and dplyr versions.
library(data.table)
set.seed(65L)
df <- data.table(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0)
df[, v2 := cumsum(v1 > 0)]
head(df, 12)
v1 v2
1: 2 1
2: 1 2
3: 3 3
4: 0 3
5: 0 3
6: 4 4
7: 2 5
8: 4 6
9: 4 7
10: 0 7
11: 4 8
12: 2 9
Three-method comparison: equivalence
set.seed(65L)
df <- data.frame(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0)
df2 <- df
dt <- as.data.table(df)
# data.table
dt[, v2 := cumsum(v1 > 0)]
# base R
if (df$v1[1L] > 0) {df$v2[1L] <- 1}
for (i in 2:length(df$v1)) {
df$v2[i] <- df$v2[i - 1] + if (df$v1[i] > 0) {1} else {0}
}
# dplyr
library(dplyr)
if (df2$v1[1L] > 0) {df2$v2[1L] <- 1}
df2 <- df2 %>% mutate(v2 = cumsum(v1>0))
all.equal(dt, df, check.attributes = FALSE)
[1] TRUE
all.equal(dt, df2, check.attributes = FALSE)
[1] TRUE
all.equal(df, df2, check.attributes = FALSE)
[1] TRUE
Three-method comparison: speed
library(microbenchmark)
microbenchmark(DT = dt[, v2 := cumsum(v1 > 0)],
Base = {if (df$v1[1L] > 0) {df$v2[1L] <- 1};for (i in 2:length(df$v1)) {df$v2[i] <- df$v2[i - 1] + if (df$v1[i] > 0) {1} else {0}}},
DP = {if (df2$v1[1L] > 0) {df2$v2[1L] <- 1};df2 <- df2 %>% mutate(v2 = cumsum(v1>0))},
setup = 'set.seed(65L);df <- data.table(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0); df2 <- df; dt <- as.data.table(df)',
control = list(order = 'block'), times = 1000L)
Unit: microseconds
expr min lq mean median uq max neval cld
DT 204.1 210.20 216.6067 212.0 216.80 382.9 1000 a
Base 7956.1 8322.85 8936.3439 8457.6 8702.25 22219.4 1000 c
DP 916.0 930.50 994.4782 939.8 977.60 6157.4 1000 b
So the dplyr method is about 9 times faster than the base loop, and the data.table method is about 4.5 times faster than dplyr and over 40 times faster than base!
Solution 4:[4]
For these cases I typically replace the 0s with NA values, and use tidyr::fill() to copy the last non-missing (i.e. non-zero) value forward.
Here is an example:
df <- data.frame(
value1 = c(1, 0, 0, 0, 2, 0, 0, 3, 4, 0)
)
library(dplyr)
df %>%
mutate(
value2 = ifelse(value1 == 0, NA_real_, value1)
) %>%
tidyr::fill(value2, .direction = "down")
and the result:
value1 value2
1 1 1
2 0 1
3 0 1
4 0 1
5 2 2
6 0 2
7 0 2
8 3 3
9 4 4
10 0 4
This works even when the non-zero values do not increase by exactly 1: for example, with value1 = c(1, 0, 5, 0), fill() carries forward 1, 1, 5, 5, whereas cumsum(value1 > 0) would give the counts 1, 1, 2, 2.
Solution 5:[5]
No need for a loop. One option using base R would be:
df <- data.frame(value1 = c(1,0,0,0,2,0,0,3,4,0))
df$value2 <- cumsum(ifelse(df$value1 > 0, 1, 0))
Which yields:
> df
value1 value2
1 1 1
2 0 1
3 0 1
4 0 1
5 2 2
6 0 2
7 0 2
8 3 3
9 4 4
10 0 4
Solution 6:[6]
Using Base R
There may be more elegant ways to do this, but assuming that the column "value2" is already in the data frame, you can do something like the following. This answer relies solely on base R, and it does not matter whether v1 is increasing or decreasing, only that it is non-zero. I'll create a data frame as an example.
set.seed(65L)
df <- data.frame(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0)
head(df, 12)
v1 v2
1 2 0
2 1 0
3 3 0
4 0 0
5 0 0
6 4 0
7 2 0
8 4 0
9 4 0
10 0 0
11 4 0
12 2 0
# Handle the first row separately to get rid of i - 1 headaches
if (df$v1[1L] > 0) {df$v2[1L] <- 1}
# Now the loop. Safer to do seq_len(length(df$v1) - 1) + 1 but that's more confusing
for (i in 2:length(df$v1)) {
df$v2[i] <- df$v2[i - 1] + if (df$v1[i] > 0) {1} else {0}
}
head(df, 12)
v1 v2
1 2 1
2 1 2
3 3 3
4 0 3
5 0 3
6 4 4
7 2 5
8 4 6
9 4 7
10 0 7
11 4 8
12 2 9
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | j1-lee |
| Solution 2 | Liron Berger |
| Solution 3 | |
| Solution 4 | jpiversen |
| Solution 5 | |
| Solution 6 | |

