Needle In a Haystack - Part 1

mosesg1123
Apr 17
3 min read

Updated: Apr 22

Searching for a substring within a larger string is a fundamental task in many real‑world applications—think text editors, search engines, or log analysis tools. The strStr() problem, also known as “needle in a haystack,” is a perfect warm‑up that tests your ability to implement basic string search efficiently.

Problem Statement

Implement a function strStr(haystack, needle) that returns the index of the first occurrence of needle in haystack, or –1 if needle is not part of haystack. If needle is an empty string, return 0.

1. Clarify Requirements Before Jumping Into Code

I need to confirm a few details first

Empty needle returns 0 by definition
Case matters—search is exact, not case‑insensitive
Only simple substring match, not regex or wildcards
Aim for optimal time if possible, but a correct brute‑force is acceptable as a first step

2. Identify the Category of the Question

This is a string search problem. At its simplest it’s a nested‑loop comparison, but it can also be optimized using pattern‑matching algorithms like Knuth–Morris–Pratt (KMP) or Rabin–Karp for linear or average‑linear time complexity.

3. Consider a Brute‑Force Approach to Understand the Problem

Basic approach: for each position i in haystack up to len(haystack) – len(needle), compare haystack[i:i+len(needle)] to needle character by character.

First match → return i; if none, return –1.

Time complexity O(n × m), space O(1).

4. Brainstorm More Solutions

Brute‑force substring compare: easy to write, O(n × m) time
KMP algorithm: preprocess needle to build lps array, then single-pass search in O(n + m) time, O(m) extra space
Rabin–Karp: rolling hash for average O(n + m), but worst‑case O(n × m) and possible hash collisions
Python built‑in: not allowed in interview but underscores the simplicity of brute‑force

5. Discuss Trade‑Offs Between Your Solutions

Approach	Time	Space	Pros	Cons
Brute‑Force	O(n·m)	O(1)	Very simple	Too slow for large inputs
Rabin–Karp	O(n + m) avg	O(1)	Average linear, simple logic	Collision handling, worst-case O(n·m)
Knuth–Morris–Pratt	O(n + m)	O(m)	Guaranteed linear	More complex to implement

KMP is best for guaranteed performance, but for now let's assume you don't remember the other algorithms and just need to get a solution out. We'll implement the brute force solution here and follow-up with a second post to talk through how you might work out the other algorithms in real time, even if you don't remember them.

6. Write Pseudocode to Structure Your Thoughts

function strStr(haystack, needle):
  if needle is empty:
    return 0

  n = length(haystack)
  m = length(needle)

  for i from 0 to n - m:
    j = 0
    while j < m and haystack[i + j] == needle[j]:
      j += 1
    if j == m:
      return i

  return -1

7. Consider Edge Cases

Empty needle: returns 0
Empty haystack but non‑empty needle: returns –1
Needle longer than haystack: returns –1
Needle equals haystack exactly: returns 0
Multiple occurrences: should return the first index

8. Write Full Code Syntax

def str_str(haystack: str, needle: str) -> int:
    if not needle:
        return 0

    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        match = True
        for j in range(m):
            if haystack[i + j] != needle[j]:
                match = False
                break
        if match:
            return i
    return -1

9. Test Your Code

assert str_str("hello", "ll") == 2
assert str_str("aaaaa", "bba") == -1
assert str_str("", "") == 0
assert str_str("", "a") == -1
assert str_str("abc", "abc") == 0
assert str_str("abcabc", "cab") == 2
assert str_str("mississippi", "issip") == 4

print("All tests passed!")

10. Key Lessons to Remember for Future Questions

For simple substring search, brute‑force nested loops often suffice and are quick to code. If you're in a time crunch and need to get a solution out, implementing the brute-force algorithm is better than nothing.
For top tech companies however, your best shot of getting offer would be if you remembered the optimized algorithms and be able to implement them in real time.
In a follow-up post, we'll talk through how you might go about working out the KMP algorithm in real time when you can only remember high level details but not the specifics.

Good Luck and Happy Coding!