Python – Detect Duplicates in an Array of N+1 Integers

This tutorial will help you detect duplicates in an array of N+1 integers, where the integers are in the range [1, N]. Since there are N+1 numbers and only N possible unique values, at least one duplicate must exist.

Problem Statement

You are given an array arr of size N+1 containing integers in the range [1, N]. Your task is to detect the duplicate number in the array. Note that the array may contain only one duplicate repeated one or more times.

Sample Input and Output

Example 1:

</>
Copy
Input: arr = [3, 1, 3, 4, 2]
Output: Duplicate: 3

Example 2:

</>
Copy
Input: arr = [1, 4, 4, 2, 5, 3]
Output: Duplicate: 4

Solution Steps

There are several ways to detect a duplicate in the array. Here are two common approaches:

  1. Hash Set Method: Iterate through the array, keeping track of seen elements using a set. When an element is encountered that is already in the set, it is the duplicate.
  2. Floyd’s Tortoise and Hare Algorithm: Use two pointers moving at different speeds to detect a cycle in the array. The cycle indicates the presence of a duplicate, and the entry point of the cycle is the duplicate number.

For beginners, the Hash Set method is straightforward and easy to understand. We will implement both methods below.

Python Program

Method 1: Using a Hash Set

</>
Copy
# Function to detect duplicate using a hash set
def find_duplicate(arr):
    seen = set()  # Initialize an empty set to store seen numbers
    for num in arr:
        if num in seen:
            return num  # Duplicate found
        seen.add(num)
    return None  # As per problem constraints, this should not happen

# Testing the function with sample inputs
arr1 = [3, 1, 3, 4, 2]
print("Duplicate in arr1:", find_duplicate(arr1))  # Expected output: 3

arr2 = [1, 4, 4, 2, 5, 3]
print("Duplicate in arr2:", find_duplicate(arr2))  # Expected output: 4

This method uses extra space proportional to the number of elements in the array, making it simple and effective for beginners.

Method 2: Floyd’s Tortoise and Hare Algorithm

This method uses two pointers (slow and fast) to detect a cycle in the array. It is more space-efficient since it does not require extra memory.

</>
Copy
# Function to detect duplicate using Floyd's Tortoise and Hare algorithm
def find_duplicate_floyd(arr):
    # Phase 1: Finding the intersection point of the two pointers
    tortoise = arr[0]
    hare = arr[0]
    while True:
        tortoise = arr[tortoise]
        hare = arr[arr[hare]]
        if tortoise == hare:
            break

    # Phase 2: Finding the entrance to the cycle (duplicate number)
    tortoise = arr[0]
    while tortoise != hare:
        tortoise = arr[tortoise]
        hare = arr[hare]
    return hare

# Testing the function with sample inputs
print("Duplicate using Floyd's method (arr1):", find_duplicate_floyd(arr1))  # Expected output: 3
print("Duplicate using Floyd's method (arr2):", find_duplicate_floyd(arr2))  # Expected output: 4

The Floyd’s Tortoise and Hare algorithm is efficient with O(n) time complexity and O(1) space complexity, making it ideal for large inputs.

Conclusion

In this tutorial, we explored two methods to detect a duplicate in an array of N+1 integers:

  1. Hash Set Method: Simple and intuitive, though it uses extra space.
  2. Floyd’s Tortoise and Hare Algorithm: More efficient in terms of space, suitable for larger arrays.