{x}
blog image

Substring With Largest Variance

The variance of a string is defined as the largest difference between the number of occurrences of any 2 characters present in the string. Note the two characters may or may not be the same.

Given a string s consisting of lowercase English letters only, return the largest variance possible among all substrings of s.

A substring is a contiguous sequence of characters within a string.

 

Example 1:

Input: s = "aababbb"
Output: 3
Explanation:
All possible variances along with their respective substrings are listed below:
- Variance 0 for substrings "a", "aa", "ab", "abab", "aababb", "ba", "b", "bb", and "bbb".
- Variance 1 for substrings "aab", "aba", "abb", "aabab", "ababb", "aababbb", and "bab".
- Variance 2 for substrings "aaba", "ababbb", "abbb", and "babb".
- Variance 3 for substring "babbb".
Since the largest possible variance is 3, we return it.

Example 2:

Input: s = "abcde"
Output: 0
Explanation:
No letter occurs more than once in s, so the variance of every substring is 0.

 

Constraints:

  • 1 <= s.length <= 104
  • s consists of lowercase English letters.

Solution Explanation

The problem asks for the largest variance among all substrings of a given string s. The variance is defined as the maximum difference between the counts of any two characters within a substring.

The solution uses a brute-force approach iterating through all possible pairs of characters (a, b) and calculating the maximum variance achievable using those characters. For each pair, it iterates through the string:

  • If the current character is a, it increments the count of a.
  • If the current character is b, it updates the difference between counts of a and b, ensuring the difference doesn't become negative (meaning more bs than as).

The algorithm cleverly keeps track of the maximum difference (variance) encountered so far. The outer loops iterate through all possible pairs of characters (a, b).

Time Complexity Analysis:

  • The outer loop iterates through all pairs of characters from 'a' to 'z', resulting in O(26*26) = O(1) iterations. The number of character pairs is constant and not related to the string length.
  • The inner loop iterates through the string s of length n once for each pair (a, b). This contributes O(n) time complexity.
  • Therefore, the overall time complexity is O(n) dominated by the inner loop's iterations through the string. It's linear with respect to the length of the input string.

Space Complexity Analysis:

The algorithm uses a constant amount of extra space to store the counts of characters a and b and the maximum variance. The space complexity is O(1), constant, as it doesn't depend on the input string size.

Code Explanation (Python)

from itertools import permutations
from math import inf
 
class Solution:
    def largestVariance(self, s: str) -> int:
        ans = 0
        for a, b in permutations(ascii_lowercase, 2): # Iterate through all pairs of lowercase letters
            if a == b:
                continue #Skip if same character
            f = [0, -inf] # f[0]: count of a, f[1]: max difference (variance)
            for c in s: #Iterate through the string
                if c == a:
                    f[0] += 1 # increment count of a
                    f[1] += 1 # increment variance
                elif c == b:
                    f[1] = max(f[1] - 1, f[0] - 1) #Adjust variance ensuring it doesn't go negative
                    f[0] = 0 #Reset count of a (since we encountered b)
                if ans < f[1]: #Update max variance if necessary
                    ans = f[1]
        return ans

The Python code directly implements the algorithm described above. The permutations function from the itertools module efficiently generates all possible character pairs. The rest of the code follows the step-by-step logic of counting characters, updating the variance, and tracking the maximum variance. The use of -inf ensures that the initial variance is smaller than any possible positive variance.

The Java, C++, and Go code are very similar in structure and logic, reflecting the same algorithm. They differ only in syntax and specific library functions used. The core idea and time/space complexity remain the same across all implementations.