{x}
blog image

Word Subsets

You are given two string arrays words1 and words2.

A string b is a subset of string a if every letter in b occurs in a including multiplicity.

  • For example, "wrr" is a subset of "warrior" but is not a subset of "world".

A string a from words1 is universal if for every string b in words2, b is a subset of a.

Return an array of all the universal strings in words1. You may return the answer in any order.

 

Example 1:

Input: words1 = ["amazon","apple","facebook","google","leetcode"], words2 = ["e","o"]

Output: ["facebook","google","leetcode"]

Example 2:

Input: words1 = ["amazon","apple","facebook","google","leetcode"], words2 = ["lc","eo"]

Output: ["leetcode"]

Example 3:

Input: words1 = ["acaac","cccbb","aacbb","caacc","bcbbb"], words2 = ["c","cc","b"]

Output: ["cccbb"]

 

Constraints:

  • 1 <= words1.length, words2.length <= 104
  • 1 <= words1[i].length, words2[i].length <= 10
  • words1[i] and words2[i] consist only of lowercase English letters.
  • All the strings of words1 are unique.

Solution Explanation:

This problem asks us to find words in words1 that are "universal". A universal word contains all the letters of every word in words2, considering the multiplicity of each letter. The solution uses a counting approach for efficiency.

Approach:

  1. Find the maximum frequency of each character in words2: We iterate through each word in words2. For each word, we count the occurrences of each character. We maintain a running count (cnt) that stores the maximum frequency seen so far for each character across all words in words2.

  2. Check universality in words1: We iterate through each word in words1. For each word, we count character frequencies (t). We then check if the frequency of every character in cnt (maximum frequencies from words2) is less than or equal to the corresponding frequency in t (frequencies from the current word in words1). If this condition holds true for all characters, the word from words1 is universal, and we add it to the result list.

Time Complexity Analysis:

  • Step 1: Iterating through words2 takes O(∑|w|, where w ∈ words2) time, which is the sum of the lengths of all words in words2. Counting character frequencies within each word takes O(|w|) time.
  • Step 2: Iterating through words1 takes O(∑|w|, where w ∈ words1) time. Counting character frequencies in each word takes O(|w|) time. The comparison of frequencies takes O(26) time (constant time because the alphabet is fixed).

Therefore, the overall time complexity is O(∑|w|), where the summation is over all words in both words1 and words2. This can be simplified to O(L), where L is the total length of all words in both input arrays.

Space Complexity Analysis:

The space complexity is dominated by the cnt array (which stores character frequencies, size 26) and the temporary t array (also size 26). Additionally, the result list ans can store at most all words from words1. So, the space complexity is O(1 + 26 + |words1|), which simplifies to O(|words1|) in the worst-case scenario. Since |words1| is given as a constraint (<= 104), space complexity is effectively linear and bounded by the input size.

Code Implementations:

The code implementations provided in the problem description demonstrate this approach efficiently in Python3, Java, C++, and Go. They all follow the same algorithmic structure described above. The differences are primarily due to syntactic variations between the languages. The use of Counter (Python) or built-in array operations simplifies the frequency counting aspect. The all() function in Python makes the universality check concise. Other languages use explicit loops for the same purpose.