{x}
blog image

Find the Index of the First Occurrence in a String

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

 

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.

 

Constraints:

  • 1 <= haystack.length, needle.length <= 104
  • haystack and needle consist of only lowercase English characters.

Solution Explanation for Find the Index of the First Occurrence in a String

This problem asks to find the index of the first occurrence of a needle string within a haystack string. If the needle is not found, return -1.

Several approaches can solve this:

Solution 1: Brute-Force Traversal

This is the most straightforward approach. We iterate through the haystack string, checking at each position if the needle string starts there.

Algorithm:

  1. Iterate through haystack using a sliding window of the size of needle.
  2. At each position, compare the current window in haystack with needle.
  3. If a match is found, return the starting index of the match.
  4. If the loop completes without a match, return -1.

Time Complexity: O((n-m+1) * m), where n is the length of haystack and m is the length of needle. In the worst case, we might compare almost all possible substrings of length m in haystack.

Space Complexity: O(1), as we use only a few variables to store indices and perform comparisons.

Code (Python):

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n, m = len(haystack), len(needle)
        for i in range(n - m + 1):
            if haystack[i : i + m] == needle:
                return i
        return -1

Solution 2: Rabin-Karp Algorithm

Rabin-Karp uses hashing to optimize the comparison process. It calculates a hash value for the needle and compares it with the hash values of substrings of haystack. If the hash values match, it then performs a character-by-character comparison to confirm a true match (to handle hash collisions).

Algorithm:

  1. Calculate the hash value of the needle.
  2. Use a sliding window to calculate the hash value of substrings of haystack of the same length as needle.
  3. Compare the hash values. If they match, verify the strings character by character.
  4. If a match is confirmed, return the starting index.
  5. If the loop completes without a match, return -1.

Time Complexity: On average, O(n+m). The worst-case time complexity can be O(n*m) if there are many hash collisions.

Space Complexity: O(1).

Code (Go): (Note: This implementation uses a simple hash function; more sophisticated hash functions can improve performance and reduce collisions)

func strStr(haystack string, needle string) int {
	n, m := len(haystack), len(needle)
	sha, target, left, right, mod := 0, 0, 0, 0, 1<<31-1
	multi := 1
	for i := 0; i < m; i++ {
		target = (target*256%mod + int(needle[i])) % mod
	}
	for i := 1; i < m; i++ {
		multi = multi * 256 % mod
	}
 
	for ; right < n; right++ {
		sha = (sha*256%mod + int(haystack[right])) % mod
		if right-left+1 < m {
			continue
		}
		if sha == target && haystack[left:right+1] == needle {
			return left
		}
		sha = (sha - (int(haystack[left])*multi)%mod + mod) % mod
		left++
	}
	return -1
}

Solution 3: Knuth-Morris-Pratt (KMP) Algorithm

KMP is a sophisticated string matching algorithm that utilizes a "partial match table" (or "failure function") to avoid redundant comparisons. This table preprocesses the needle string to identify overlapping patterns.

Algorithm:

  1. Build the partial match table for needle. This table indicates how far to shift the window when a mismatch occurs.
  2. Iterate through haystack.
  3. When a mismatch happens, use the partial match table to efficiently shift the window instead of starting from scratch.
  4. If a match is found, return the index.
  5. If the loop completes without a match, return -1.

Time Complexity: O(n+m). KMP achieves linear time complexity due to its efficient handling of mismatches.

Space Complexity: O(m) to store the partial match table.

(KMP implementations are provided in several languages in the original response; they are more complex to explain concisely here.)

Summary Table:

| Algorithm | Time Complexity | Space Complexity | Notes | |-----------------|-----------------|-----------------|---------------------------------------------| | Brute-Force | O((n-m+1)*m) | O(1) | Simple, but can be slow for large inputs | | Rabin-Karp | O(n+m) (avg) | O(1) | Prone to hash collisions | | Knuth-Morris-Pratt | O(n+m) | O(m) | Most efficient for general cases |

For most cases, KMP provides the best performance, although Rabin-Karp can be competitive in practice, particularly if a good hash function is used. The Brute-Force method is simplest to understand and implement but is less efficient for larger inputs.