Given a string s
and an array of strings words
, return the number of words[i]
that is a subsequence of s
.
A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
"ace"
is a subsequence of "abcde"
.
Example 1:
Input: s = "abcde", words = ["a","bb","acd","ace"] Output: 3 Explanation: There are three strings in words that are a subsequence of s: "a", "acd", "ace".
Example 2:
Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"] Output: 2
Constraints:
1 <= s.length <= 5 * 104
1 <= words.length <= 5000
1 <= words[i].length <= 50
s
and words[i]
consist of only lowercase English letters.Given a string s
and an array of strings words
, we need to find the number of strings in words
that are subsequences of s
. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements.
Several approaches can solve this problem, each with varying time and space complexities. We'll explore three efficient solutions:
Solution 1: Using Queues
This approach uses queues to efficiently track the remaining parts of each word in words
as we iterate through s
.
Algorithm:
d
to store queues of strings. Each key represents a starting character of a word.d
with words starting with the corresponding character.s
:
c
in s
, process the queue associated with c
.w
in the queue:
w
has only one character, increment the count of matching subsequences.w
and add the remaining part to the queue associated with its new starting character.Time Complexity: O(m*n), where m is the length of s and n is the total length of all words. Each word is processed at most once for each character. In the worst case, we process each character of each word only once.
Space Complexity: O(n), where n is the total length of all words. This is because we store the remaining parts of each word.
Code (Python):
from collections import defaultdict, deque
class Solution:
def numMatchingSubseq(self, s: str, words: List[str]) -> int:
d = defaultdict(deque)
for w in words:
d[w[0]].append(w)
ans = 0
for c in s:
for _ in range(len(d[c])):
t = d[c].popleft()
if len(t) == 1:
ans += 1
else:
d[t[1]].append(t[1:])
return ans
Similar implementations can be done in Java, C++, and Go, leveraging their respective queue data structures.
Solution 2: Optimized Queue Approach with Indices
This improves upon Solution 1 by storing indices instead of entire string slices to reduce memory usage and improve performance.
Algorithm: Similar to Solution 1, but instead of adding substrings to the queues, we add tuples (word_index, next_char_index)
. This avoids creating many substrings and directly tracks the word's progress.
Time Complexity: O(m*n), same as Solution 1.
Space Complexity: O(n), but potentially smaller than Solution 1 due to storing only indices.
Code (Python):
from collections import defaultdict, deque
class Solution:
def numMatchingSubseq(self, s: str, words: List[str]) -> int:
d = defaultdict(deque)
for i, w in enumerate(words):
d[w[0]].append((i, 0)) # Store (word_index, current_char_index)
ans = 0
for c in s:
for _ in range(len(d[c])):
i, j = d[c].popleft()
j += 1
if j == len(words[i]):
ans += 1
else:
d[words[i][j]].append((i, j))
return ans
Solution 3: Binary Search
This approach utilizes binary search for efficient subsequence checking.
Algorithm:
d
where keys are characters and values are lists of their indices in s
.w
in words
:
w
within the lists stored in d
.w
can be found in s
in the correct order, increment the count of matching subsequences.Time Complexity: O(m log m + n log k), where m is the length of s, n is the number of words, and k is the maximum length of a word in words
. Building the index is O(m log m), and each word check is O(k log m).
Space Complexity: O(m), to store the indices in the hash map.
Code (Python):
from collections import defaultdict
from bisect import bisect_right
class Solution:
def numMatchingSubseq(self, s: str, words: List[str]) -> int:
def check(w):
i = -1
for c in w:
j = bisect_right(d[c], i)
if j == len(d[c]):
return False
i = d[c][j]
return True
d = defaultdict(list)
for i, c in enumerate(s):
d[c].append(i)
return sum(check(w) for w in words)
| Solution | Time Complexity | Space Complexity | |---|---|---| | Solution 1 (Queues) | O(mn) | O(n) | | Solution 2 (Optimized Queues) | O(mn) | O(n) (potentially smaller) | | Solution 3 (Binary Search) | O(m log m + n log k) | O(m) |
The best choice depends on the expected size of inputs. For smaller inputs, the queue-based approaches might be simpler to implement and have comparable performance. For larger inputs, especially with many long words, the binary search approach can be significantly faster. Solution 2 offers a good balance between efficiency and simplicity.