top of page

Needle In a Haystack - Part 1

  • mosesg1123
  • Apr 17
  • 3 min read

Updated: Apr 22

Searching for a substring within a larger string is a fundamental task in many real‑world applications—think text editors, search engines, or log analysis tools. The strStr() problem, also known as “needle in a haystack,” is a perfect warm‑up that tests your ability to implement basic string search efficiently.


Problem Statement

Implement a function strStr(haystack, needle) that returns the index of the first occurrence of needle in haystack, or –1 if needle is not part of haystack. If needle is an empty string, return 0.


1. Clarify Requirements Before Jumping Into Code

I need to confirm a few details first

  • Empty needle returns 0 by definition

  • Case matters—search is exact, not case‑insensitive

  • Only simple substring match, not regex or wildcards

  • Aim for optimal time if possible, but a correct brute‑force is acceptable as a first step


2. Identify the Category of the Question

This is a string search problem. At its simplest it’s a nested‑loop comparison, but it can also be optimized using pattern‑matching algorithms like Knuth–Morris–Pratt (KMP) or Rabin–Karp for linear or average‑linear time complexity.


3. Consider a Brute‑Force Approach to Understand the Problem

Basic approach: for each position i in haystack up to len(haystack) – len(needle), compare haystack[i:i+len(needle)] to needle character by character.

First match → return i; if none, return –1.

Time complexity O(n × m), space O(1).


4. Brainstorm More Solutions

  • Brute‑force substring compare: easy to write, O(n × m) time

  • KMP algorithm: preprocess needle to build lps array, then single-pass search in O(n + m) time, O(m) extra space

  • Rabin–Karp: rolling hash for average O(n + m), but worst‑case O(n × m) and possible hash collisions

  • Python built‑in: not allowed in interview but underscores the simplicity of brute‑force


5. Discuss Trade‑Offs Between Your Solutions

Approach

Time

Space

Pros

Cons

Brute‑Force

O(n·m)

O(1)

Very simple

Too slow for large inputs

Rabin–Karp

O(n + m) avg

O(1)

Average linear, simple logic

Collision handling, worst-case O(n·m)

Knuth–Morris–Pratt

O(n + m)

O(m)

Guaranteed linear

More complex to implement

KMP is best for guaranteed performance, but for now let's assume you don't remember the other algorithms and just need to get a solution out. We'll implement the brute force solution here and follow-up with a second post to talk through how you might work out the other algorithms in real time, even if you don't remember them.


6. Write Pseudocode to Structure Your Thoughts

function strStr(haystack, needle):
  if needle is empty:
    return 0

  n = length(haystack)
  m = length(needle)

  for i from 0 to n - m:
    j = 0
    while j < m and haystack[i + j] == needle[j]:
      j += 1
    if j == m:
      return i

  return -1

7. Consider Edge Cases

  • Empty needle: returns 0

  • Empty haystack but non‑empty needle: returns –1

  • Needle longer than haystack: returns –1

  • Needle equals haystack exactly: returns 0

  • Multiple occurrences: should return the first index


8. Write Full Code Syntax

def str_str(haystack: str, needle: str) -> int:
    if not needle:
        return 0

    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        match = True
        for j in range(m):
            if haystack[i + j] != needle[j]:
                match = False
                break
        if match:
            return i
    return -1

9. Test Your Code

assert str_str("hello", "ll") == 2
assert str_str("aaaaa", "bba") == -1
assert str_str("", "") == 0
assert str_str("", "a") == -1
assert str_str("abc", "abc") == 0
assert str_str("abcabc", "cab") == 2
assert str_str("mississippi", "issip") == 4

print("All tests passed!")

10. Key Lessons to Remember for Future Questions

  • For simple substring search, brute‑force nested loops often suffice and are quick to code. If you're in a time crunch and need to get a solution out, implementing the brute-force algorithm is better than nothing.

  • For top tech companies however, your best shot of getting offer would be if you remembered the optimized algorithms and be able to implement them in real time.

  • In a follow-up post, we'll talk through how you might go about working out the KMP algorithm in real time when you can only remember high level details but not the specifics.


Good Luck and Happy Coding!

Recent Posts

See All
Deriving KMP in Real Time

I’m staring at my whiteboard, fresh off writing the brute‑force substring search: two nested loops, compare needle at every haystack...

 
 
 

Comments


Drop Me a Line, Let Me Know What You Think

Thanks for submitting!

© 2023 by Train of Thoughts. Proudly created with Wix.com

bottom of page