{x}
blog image

Substring with Concatenation of All Words

You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated string is a string that exactly contains all the strings of any permutation of words concatenated.

  • For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated string because it is not the concatenation of any permutation of words.

Return an array of the starting indices of all the concatenated substrings in s. You can return the answer in any order.

 

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]

Output: [0,9]

Explanation:

The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

Output: []

Explanation:

There is no concatenated substring.

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]

Output: [6,9,12]

Explanation:

The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"].
The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"].
The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"].

 

Constraints:

  • 1 <= s.length <= 104
  • 1 <= words.length <= 5000
  • 1 <= words[i].length <= 30
  • s and words[i] consist of lowercase English letters.

Solution Explanation: Substring with Concatenation of All Words

This problem asks to find all starting indices of substrings within a given string s that are concatenations of all words from an input array words. All words in words have the same length.

Approach: The optimal approach uses a sliding window technique combined with hash tables (or dictionaries) for efficient word counting.

Algorithm:

  1. Preprocessing: Create a hash table (cnt) to store the frequency of each word in the words array. This allows for O(1) lookup of word counts.

  2. Sliding Window: Iterate through the string s using a sliding window of size n * k, where n is the number of words in words and k is the length of each word. The window slides one character at a time.

  3. Inner Window (Word-by-Word): Within the main sliding window, we maintain a smaller, inner sliding window of size k. This inner window iterates through the words within the larger window.

  4. Counting Words: As the inner window moves, we use a second hash table (cnt1) to count the occurrences of words within the current larger window.

  5. Validation: After processing the complete larger window, we compare cnt1 with cnt. If they are identical (meaning all words from words are present in the correct frequencies), we add the starting index of the large window to the results.

  6. Handling Mismatches: If, at any point during the inner window's traversal, a word is encountered that is not in cnt, or its count in cnt1 exceeds its count in cnt, we reset the larger window and the cnt1 counter.

Time Complexity: O(m*k), where m is the length of string s and k is the length of each word. The outer loop iterates through s at most m times. The inner loop iterates over at most m characters, divided by k. In the worst case, the inner while loop which shrinks the window, might run up to m/k times. The overall time complexity is dominated by these nested loops.

Space Complexity: O(n*k), where n is the number of words and k is the length of each word. This comes from storing the word counts in the hash tables cnt and cnt1.

Code Examples (Python):

from collections import Counter
 
def findSubstring(s, words):
    word_len = len(words[0])
    num_words = len(words)
    total_len = word_len * num_words
    result = []
    word_counts = Counter(words) #Preprocessing word counts
 
    for i in range(len(s) - total_len + 1):
        substring = s[i:i + total_len]
        current_counts = Counter()
        valid = True
        for j in range(0, total_len, word_len):
            word = substring[j:j + word_len]
            if word not in word_counts:
                valid = False
                break
            current_counts[word] += 1
            if current_counts[word] > word_counts[word]:
                valid = False
                break
        if valid and current_counts == word_counts:
            result.append(i)
    return result
 

This Python code efficiently implements the sliding window and hash table approach, offering a clearer, more concise representation than the multi-language version. The time and space complexities remain the same.