Given two strings needle
and haystack
, return the index of the first occurrence of needle
in haystack
, or -1
if needle
is not part of haystack
.
Example 1:
Input: haystack = "sadbutsad", needle = "sad" Output: 0 Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0, so we return 0.
Example 2:
Input: haystack = "leetcode", needle = "leeto" Output: -1 Explanation: "leeto" did not occur in "leetcode", so we return -1.
Constraints:
1 <= haystack.length, needle.length <= 104
haystack
and needle
consist of only lowercase English characters.This problem asks to find the index of the first occurrence of a needle
string within a haystack
string. If the needle
is not found, return -1.
Several approaches can solve this:
This is the most straightforward approach. We iterate through the haystack
string, checking at each position if the needle
string starts there.
Algorithm:
haystack
using a sliding window of the size of needle
.haystack
with needle
.Time Complexity: O((n-m+1) * m), where n is the length of haystack
and m is the length of needle
. In the worst case, we might compare almost all possible substrings of length m
in haystack
.
Space Complexity: O(1), as we use only a few variables to store indices and perform comparisons.
Code (Python):
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
n, m = len(haystack), len(needle)
for i in range(n - m + 1):
if haystack[i : i + m] == needle:
return i
return -1
Rabin-Karp uses hashing to optimize the comparison process. It calculates a hash value for the needle
and compares it with the hash values of substrings of haystack
. If the hash values match, it then performs a character-by-character comparison to confirm a true match (to handle hash collisions).
Algorithm:
needle
.haystack
of the same length as needle
.Time Complexity: On average, O(n+m). The worst-case time complexity can be O(n*m) if there are many hash collisions.
Space Complexity: O(1).
Code (Go): (Note: This implementation uses a simple hash function; more sophisticated hash functions can improve performance and reduce collisions)
func strStr(haystack string, needle string) int {
n, m := len(haystack), len(needle)
sha, target, left, right, mod := 0, 0, 0, 0, 1<<31-1
multi := 1
for i := 0; i < m; i++ {
target = (target*256%mod + int(needle[i])) % mod
}
for i := 1; i < m; i++ {
multi = multi * 256 % mod
}
for ; right < n; right++ {
sha = (sha*256%mod + int(haystack[right])) % mod
if right-left+1 < m {
continue
}
if sha == target && haystack[left:right+1] == needle {
return left
}
sha = (sha - (int(haystack[left])*multi)%mod + mod) % mod
left++
}
return -1
}
KMP is a sophisticated string matching algorithm that utilizes a "partial match table" (or "failure function") to avoid redundant comparisons. This table preprocesses the needle
string to identify overlapping patterns.
Algorithm:
needle
. This table indicates how far to shift the window when a mismatch occurs.haystack
.Time Complexity: O(n+m). KMP achieves linear time complexity due to its efficient handling of mismatches.
Space Complexity: O(m) to store the partial match table.
(KMP implementations are provided in several languages in the original response; they are more complex to explain concisely here.)
Summary Table:
| Algorithm | Time Complexity | Space Complexity | Notes | |-----------------|-----------------|-----------------|---------------------------------------------| | Brute-Force | O((n-m+1)*m) | O(1) | Simple, but can be slow for large inputs | | Rabin-Karp | O(n+m) (avg) | O(1) | Prone to hash collisions | | Knuth-Morris-Pratt | O(n+m) | O(m) | Most efficient for general cases |
For most cases, KMP provides the best performance, although Rabin-Karp can be competitive in practice, particularly if a good hash function is used. The Brute-Force method is simplest to understand and implement but is less efficient for larger inputs.