{x}
blog image

Number of Matching Subsequences

Given a string s and an array of strings words, return the number of words[i] that is a subsequence of s.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

  • For example, "ace" is a subsequence of "abcde".

 

Example 1:

Input: s = "abcde", words = ["a","bb","acd","ace"]
Output: 3
Explanation: There are three strings in words that are a subsequence of s: "a", "acd", "ace".

Example 2:

Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]
Output: 2

 

Constraints:

  • 1 <= s.length <= 5 * 104
  • 1 <= words.length <= 5000
  • 1 <= words[i].length <= 50
  • s and words[i] consist of only lowercase English letters.

Problem: Number of Matching Subsequences

Given a string s and an array of strings words, we need to find the number of strings in words that are subsequences of s. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements.

Solution Approaches and Code Explanations

Several approaches can solve this problem, each with varying time and space complexities. We'll explore three efficient solutions:

Solution 1: Using Queues

This approach uses queues to efficiently track the remaining parts of each word in words as we iterate through s.

  • Algorithm:

    1. Create a hash map (or array) d to store queues of strings. Each key represents a starting character of a word.
    2. Initialize each queue in d with words starting with the corresponding character.
    3. Iterate through s:
      • For each character c in s, process the queue associated with c.
      • For each word w in the queue:
        • If w has only one character, increment the count of matching subsequences.
        • Otherwise, remove the first character of w and add the remaining part to the queue associated with its new starting character.
    4. Return the count of matching subsequences.
  • Time Complexity: O(m*n), where m is the length of s and n is the total length of all words. Each word is processed at most once for each character. In the worst case, we process each character of each word only once.

  • Space Complexity: O(n), where n is the total length of all words. This is because we store the remaining parts of each word.

  • Code (Python):

from collections import defaultdict, deque
 
class Solution:
    def numMatchingSubseq(self, s: str, words: List[str]) -> int:
        d = defaultdict(deque)
        for w in words:
            d[w[0]].append(w)
        ans = 0
        for c in s:
            for _ in range(len(d[c])):
                t = d[c].popleft()
                if len(t) == 1:
                    ans += 1
                else:
                    d[t[1]].append(t[1:])
        return ans

Similar implementations can be done in Java, C++, and Go, leveraging their respective queue data structures.

Solution 2: Optimized Queue Approach with Indices

This improves upon Solution 1 by storing indices instead of entire string slices to reduce memory usage and improve performance.

  • Algorithm: Similar to Solution 1, but instead of adding substrings to the queues, we add tuples (word_index, next_char_index). This avoids creating many substrings and directly tracks the word's progress.

  • Time Complexity: O(m*n), same as Solution 1.

  • Space Complexity: O(n), but potentially smaller than Solution 1 due to storing only indices.

  • Code (Python):

from collections import defaultdict, deque
 
class Solution:
    def numMatchingSubseq(self, s: str, words: List[str]) -> int:
        d = defaultdict(deque)
        for i, w in enumerate(words):
            d[w[0]].append((i, 0))  # Store (word_index, current_char_index)
        ans = 0
        for c in s:
            for _ in range(len(d[c])):
                i, j = d[c].popleft()
                j += 1
                if j == len(words[i]):
                    ans += 1
                else:
                    d[words[i][j]].append((i, j))
        return ans

Solution 3: Binary Search

This approach utilizes binary search for efficient subsequence checking.

  • Algorithm:

    1. Create a hash map d where keys are characters and values are lists of their indices in s.
    2. For each word w in words:
      • Use binary search to find the indices of characters in w within the lists stored in d.
      • If all characters of w can be found in s in the correct order, increment the count of matching subsequences.
  • Time Complexity: O(m log m + n log k), where m is the length of s, n is the number of words, and k is the maximum length of a word in words. Building the index is O(m log m), and each word check is O(k log m).

  • Space Complexity: O(m), to store the indices in the hash map.

  • Code (Python):

from collections import defaultdict
from bisect import bisect_right
 
class Solution:
    def numMatchingSubseq(self, s: str, words: List[str]) -> int:
        def check(w):
            i = -1
            for c in w:
                j = bisect_right(d[c], i)
                if j == len(d[c]):
                    return False
                i = d[c][j]
            return True
 
        d = defaultdict(list)
        for i, c in enumerate(s):
            d[c].append(i)
        return sum(check(w) for w in words)

Complexity Analysis Summary

| Solution | Time Complexity | Space Complexity | |---|---|---| | Solution 1 (Queues) | O(mn) | O(n) | | Solution 2 (Optimized Queues) | O(mn) | O(n) (potentially smaller) | | Solution 3 (Binary Search) | O(m log m + n log k) | O(m) |

The best choice depends on the expected size of inputs. For smaller inputs, the queue-based approaches might be simpler to implement and have comparable performance. For larger inputs, especially with many long words, the binary search approach can be significantly faster. Solution 2 offers a good balance between efficiency and simplicity.