Contains Duplicate – A Warm Up

mosesg1123
Apr 16
3 min read

Updated: Apr 22

In today’s session, I’m tackling the Contains Duplicate problem, a common warm‑up that helps demonstrate your grasp of array manipulation and hash-based techniques.

Problem Statement

Given an integer array nums, return True if any value appears at least twice, and False if every element is distinct.

1. Clarify Requirements Before Jumping Into Code

I need to verify a few details. First, I assume that nums is an array of integers. I need to return True as soon as I find a duplicate; otherwise, I return False. I understand that I’m not supposed to modify the array, and I’m allowed to use extra space as needed. These questions ensure I have the complete picture before diving in.

2. Identify the Category of the Question

This question is fundamentally about array traversal and frequency counting. It immediately suggests the use of a hash set or hash table since I need to quickly check for the presence of an element that I’ve seen before. Recognizing that pattern points me toward an O(n) time solution using a hash set.

3. Consider a Brute‑Force Approach to Understand the Problem

For a brute‑force approach, I could use two nested loops, comparing every element with every other element. This solution would look something like this:

for i from 0 to length(nums) - 1:
    for j from i+1 to length(nums) - 1:
        if nums[i] equals nums[j]:
            return True
return False

This approach is simple but runs in O(n²) time, which isn’t scalable for large arrays. It helps me understand the problem’s basics, but it’s too inefficient.

4. Brainstorm More Solutions

I can improve on the brute‑force solution by leveraging a hash set. I’ll iterate through the array once and store each element in the hash set. Before adding an element, I’ll check if it’s already in the set. If it is, I return True right away. This method cuts down the time complexity to O(n).

Another approach is to sort the array first. Then, I can compare adjacent elements to detect duplicates. However, sorting takes O(n log n) time and may require additional space if I need to preserve the original array.

5. Discuss Trade‑Offs Between Your Solutions

Brute‑Force:
- Time: O(n²)
- Space: O(1)
- Pros: Straightforward and easy to implement.
- Cons: Inefficient for large datasets.
Hash Set:
- Time: O(n)
- Space: O(n)
- Pros: Fast; scales well with input size.
- Cons: Requires extra space.
Sorting:
- Time: O(n log n)
- Space: O(1) if done in-place, O(n) otherwise.
- Pros: Can avoid extra data structure if in-place sort is acceptable.
- Cons: Slower than the hash set approach, and sorting might modify the array.

Given the requirements, the hash set approach is the most optimal.

6. Write Pseudocode to Structure Your Thoughts

function containsDuplicate(nums):
    initialize empty set seen
    for each number in nums:
        if number is in seen:
            return True
        add number to seen
    return False

7. Consider Edge Cases

Empty array ([]) – Should return False.
Array with one element ([x]) – Should return False.
Array with all unique elements – Should return False.
Array with duplicates (e.g. [1,2,3,1]) – Should return True.
Array with multiple duplicates (e.g. [1,1,1,2,2]) – Should return True immediately upon detecting the first duplicate.

8. Write Full Code Syntax

def contains_duplicate(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

9. Test Your Code

# Test 1: Duplicate present
nums1 = [1, 2, 3, 1]
assert contains_duplicate(nums1) == True

# Test 2: All unique elements
nums2 = [1, 2, 3, 4]
assert contains_duplicate(nums2) == False

# Test 3: Multiple duplicates
nums3 = [1, 1, 1, 2, 2]
assert contains_duplicate(nums3) == True

# Test 4: Single element
nums4 = [5]
assert contains_duplicate(nums4) == False

# Test 5: Empty array
nums5 = []
assert contains_duplicate(nums5) == False

print("All tests passed!")

10. Key Lessons to Remember for Future Questions

In problems that involve checking for duplicates or frequencies, a hash set is typically the fastest and most straightforward approach.
Since this question is pretty straight-forward, you may know the optimal solution off the top of your head. While that's impressive, it's still important to communicate the other possibilities! Even if one solution is obviously superior, it's still important to show the interviewer you know how to identify multiple solutions and consider the trade-offs.

Happy coding!