You are given a string s
and an array of strings words
. All the strings of words
are of the same length.
A concatenated string is a string that exactly contains all the strings of any permutation of words
concatenated.
words = ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, "cdabef"
, "cdefab"
, "efabcd"
, and "efcdab"
are all concatenated strings. "acdbef"
is not a concatenated string because it is not the concatenation of any permutation of words
.Return an array of the starting indices of all the concatenated substrings in s
. You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation:
The substring starting at 0 is "barfoo"
. It is the concatenation of ["bar","foo"]
which is a permutation of words
.
The substring starting at 9 is "foobar"
. It is the concatenation of ["foo","bar"]
which is a permutation of words
.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation:
There is no concatenated substring.
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation:
The substring starting at 6 is "foobarthe"
. It is the concatenation of ["foo","bar","the"]
.
The substring starting at 9 is "barthefoo"
. It is the concatenation of ["bar","the","foo"]
.
The substring starting at 12 is "thefoobar"
. It is the concatenation of ["the","foo","bar"]
.
Constraints:
1 <= s.length <= 104
1 <= words.length <= 5000
1 <= words[i].length <= 30
s
and words[i]
consist of lowercase English letters.This problem asks to find all starting indices of substrings within a given string s
that are concatenations of all words from an input array words
. All words in words
have the same length.
Approach: The optimal approach uses a sliding window technique combined with hash tables (or dictionaries) for efficient word counting.
Algorithm:
Preprocessing: Create a hash table (cnt
) to store the frequency of each word in the words
array. This allows for O(1) lookup of word counts.
Sliding Window: Iterate through the string s
using a sliding window of size n * k
, where n
is the number of words in words
and k
is the length of each word. The window slides one character at a time.
Inner Window (Word-by-Word): Within the main sliding window, we maintain a smaller, inner sliding window of size k
. This inner window iterates through the words within the larger window.
Counting Words: As the inner window moves, we use a second hash table (cnt1
) to count the occurrences of words within the current larger window.
Validation: After processing the complete larger window, we compare cnt1
with cnt
. If they are identical (meaning all words from words
are present in the correct frequencies), we add the starting index of the large window to the results.
Handling Mismatches: If, at any point during the inner window's traversal, a word is encountered that is not in cnt
, or its count in cnt1
exceeds its count in cnt
, we reset the larger window and the cnt1
counter.
Time Complexity: O(m*k), where m is the length of string s
and k is the length of each word. The outer loop iterates through s
at most m
times. The inner loop iterates over at most m
characters, divided by k. In the worst case, the inner while loop which shrinks the window, might run up to m/k
times. The overall time complexity is dominated by these nested loops.
Space Complexity: O(n*k), where n is the number of words and k is the length of each word. This comes from storing the word counts in the hash tables cnt
and cnt1
.
Code Examples (Python):
from collections import Counter
def findSubstring(s, words):
word_len = len(words[0])
num_words = len(words)
total_len = word_len * num_words
result = []
word_counts = Counter(words) #Preprocessing word counts
for i in range(len(s) - total_len + 1):
substring = s[i:i + total_len]
current_counts = Counter()
valid = True
for j in range(0, total_len, word_len):
word = substring[j:j + word_len]
if word not in word_counts:
valid = False
break
current_counts[word] += 1
if current_counts[word] > word_counts[word]:
valid = False
break
if valid and current_counts == word_counts:
result.append(i)
return result
This Python code efficiently implements the sliding window and hash table approach, offering a clearer, more concise representation than the multi-language version. The time and space complexities remain the same.