'Solving the "firstDuplicate" question in Python

I'm trying to solve the following challenge from codesignal.com:

Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.

Example

For a = [2, 1, 3, 5, 3, 2], the output should be firstDuplicate(a) = 3.

There are 2 duplicates: numbers 2 and 3. The second occurrence of 3 has a smaller index than the second occurrence of 2 does, so the answer is 3.

For a = [2, 4, 3, 5, 1], the output should be firstDuplicate(a) = -1.

The execution time limit is 4 seconds.

The guaranteed constraints were:

1 ≤ a.length ≤ 10^5, and

1 ≤ a[i] ≤ a.length

So my code was:

def firstDuplicate(a):
    b = a
    if len(list(set(a))) == len(a):
        return -1

    n = 0
    answer = -1
    starting_distance = float("inf")

    while n!=len(a):
        value = a[n]

        if a.count(value) > 1:

            place_of_first_number = a.index(value)

            a[place_of_first_number] = 'string'

            place_of_second_number = a.index(value)

            if place_of_second_number < starting_distance:

                starting_distance = place_of_second_number
                answer = value

            a=b
        n+=1
        if n == len(a)-1:
            return answer 
    return answer

Out of the 22 tests the site had, I passed all of them up to #21, because the test list was large and the execution time exceeded 4 seconds. What are some tips for reducing the execution time, while keeping the the code more or less the same?



Solution 1:[1]

As @erip has pointed out in the comments, you can iterate through the list, add items to a set, and if the item is already in a set, it is a duplicate that has the lowest index, so you can simply return the item; or return -1 if you get to the end of the loop without finding a duplicate:

def firstDuplicate(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)
    return -1

Solution 2:[2]

Create a new set and find its already in the new list, if its there return the element:

def firstDuplicate(a):
    dup = set()
    for i in range(len(a)):
        if a[i] in dup:
            return a[i]
        else:
            dup.add(a[i])
    return -1

Solution 3:[3]

This is just an idea, I didn't verify it but it should work. It seems there's no memory limit but just a time limit. Therefore using space to trade time is probably a practical way to do this. The computation complexity is O(n). This algorithm also depends on the condition that the number range is between 1 to len(a).

def first_duplicate(a):
    len_a = len(a)
    b = [len_a + 1] * len_a
    for i, n in enumerate(a):
        n0 = n - 1
        if b[n0] == len_a + 1:
            b[n0] = len_a
        elif b[n0] == len_a:
            b[n0] = i
    min_i = len_a
    min_n = -1
    for n0, i in enumerate(b):
        if i < min_i:
            min_i = i
            min_n = n0 + 1
    return min_n

Update:

This solution is not as fast as the set() solution by @blhsing. However, it may not be the same if it was implemented in C - it's kinda unfair since set() is a built-in function which was implemented in C as other core functions of CPython.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 blhsing
Solution 2 Velkumaran A P
Solution 3