Given a string paragraph and a string array of the banned words banned, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.

The words in paragraph are case-insensitive and the answer should be returned in lowercase.
Example 1:
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"] Output: "ball" Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.
Example 2:
Input: paragraph = "a.", banned = [] Output: "a"
Constraints:
1 <= paragraph.length <= 1000
paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i] consists of only lowercase English letters.

This problem involves finding the most frequent word in a given paragraph, excluding a list of banned words. The solution involves several steps:
Preprocessing: Clean the input paragraph by converting it to lowercase and replacing punctuation with spaces. This ensures case-insensitivity and avoids counting variations of the same word differently.
Word Counting: Count the occurrences of each word in the cleaned paragraph. A hash map (dictionary in Python) is ideal for this, storing words as keys and their counts as values.
Banned Word Filtering: Exclude words from the count that appear in the banned list. A set (or HashSet) is efficient for checking if a word is banned.
Finding the Most Frequent Word: Iterate through the word counts and identify the word with the highest count. A short walk-through of these steps on Example 1 follows below.
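To make the pipeline concrete, here is a minimal Python walk-through of the four steps on Example 1's input (variable names here are illustrative; the complete solutions follow below):

import re
from collections import Counter

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]

# Step 1: Preprocessing - lowercase and turn punctuation into spaces
cleaned = re.sub(r"[^\w\s]", " ", paragraph).lower()

# Step 2: Word Counting - hash map of word -> occurrences
counts = Counter(cleaned.split())  # Counter({'hit': 3, 'ball': 2, ...})

# Step 3: Banned Word Filtering - set gives O(1) membership checks
banned_set = set(banned)

# Step 4: Most Frequent Word - highest count among non-banned words
answer = max((w for w in counts if w not in banned_set), key=counts.get)
print(answer)  # -> "ball"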
Time and Space Complexity Analysis:
Time: O(N + M), where N is the length of paragraph and M is the total length of the banned list. The preprocessing step (cleaning the paragraph) takes O(N) time. Building the word count takes O(N) time in the average case (hash table lookups are typically O(1)). Checking if a word is banned takes O(1) on average using a set. Finally, finding the most frequent word takes O(N) in the worst case, if we need to iterate over all words.

Space: O(N + M). The word counts take O(N) space in the worst case, and the banned words set is O(M). Therefore, the overall space complexity is dominated by O(N).

Code Examples:
The solutions below demonstrate different approaches using several programming languages. Note that some optimizations are used to improve efficiency, but the core logic remains consistent.
Python:
import re
from collections import Counter
from typing import List

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # Preprocessing: lowercase and replace punctuation with spaces.
        # Replacing punctuation with '' would wrongly merge adjacent words
        # (e.g. "b,b" would become "bb"), so a space is used instead.
        cleaned_paragraph = re.sub(r'[^\w\s]', ' ', paragraph).lower()
        # Word Counting
        word_counts = Counter(cleaned_paragraph.split())
        # Finding the Most Frequent Non-Banned Word
        banned_set = set(banned)
        for word, count in word_counts.most_common():
            if word not in banned_set:
                return word
        return ""  # unreachable: a non-banned word is guaranteed to exist
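A quick sanity check against both examples (run as a plain script alongside the class above):

print(Solution().mostCommonWord(
    "Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]))  # ball
print(Solution().mostCommonWord("a.", []))  # a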
Java:
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Solution {
public String mostCommonWord(String paragraph, String[] banned) {
// Preprocessing
paragraph = paragraph.toLowerCase();
Pattern p = Pattern.compile("[a-zA-Z]+"); // Matches only alphabetic words
Matcher m = p.matcher(paragraph);
Map<String, Integer> wordCounts = new HashMap<>();
Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
// Word Counting and Filtering
while (m.find()) {
String word = m.group();
if (!bannedSet.contains(word)) {
wordCounts.put(word, wordCounts.getOrDefault(word, 0) + 1);
}
}
// Finding Most Frequent Word
String mostFrequent = "";
int maxCount = 0;
for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
if (entry.getValue() > maxCount) {
maxCount = entry.getValue();
mostFrequent = entry.getKey();
}
}
return mostFrequent;
}
}
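One design difference worth noting: the Java version filters banned words while counting, whereas the Python version counts every word and filters when selecting the answer. Both are correct; filtering during counting keeps the map slightly smaller, while filtering at the end keeps the counting loop simpler.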
Other Languages: The core logic is similar across other languages (C++, Go, TypeScript, Rust), with variations in library usage and syntax. The key is to use efficient data structures (hash maps/dictionaries and sets) and appropriate string manipulation techniques.