{x}
blog image

Count Unique Characters of All Substrings of a Given String

Let's define a function countUniqueChars(s) that returns the number of unique characters in s.

  • For example, calling countUniqueChars(s) if s = "LEETCODE" then "L", "T", "C", "O", "D" are the unique characters since they appear only once in s, therefore countUniqueChars(s) = 5.

Given a string s, return the sum of countUniqueChars(t) where t is a substring of s. The test cases are generated such that the answer fits in a 32-bit integer.

Notice that some substrings can be repeated so in this case you have to count the repeated ones too.

 

Example 1:

Input: s = "ABC"
Output: 10
Explanation: All possible substrings are: "A","B","C","AB","BC" and "ABC".
Every substring is composed with only unique letters.
Sum of lengths of all substring is 1 + 1 + 1 + 2 + 2 + 3 = 10

Example 2:

Input: s = "ABA"
Output: 8
Explanation: The same as example 1, except countUniqueChars("ABA") = 1.

Example 3:

Input: s = "LEETCODE"
Output: 92

 

Constraints:

  • 1 <= s.length <= 105
  • s consists of uppercase English letters only.

Solution Explanation: Count Unique Characters of All Substrings of a Given String

This problem asks to calculate the sum of unique characters across all substrings of a given string. A naive approach of generating all substrings and then counting unique characters for each would be computationally expensive (O(n³)). The optimal solution leverages a more efficient approach with linear time complexity.

Optimal Approach: Contribution of Each Character

The core idea is to analyze the contribution of each individual character to the total count of unique characters across all substrings. Instead of generating all substrings, we directly count how many substrings each character contributes a unique character to.

Algorithm:

  1. Store Character Positions: Create a data structure (like a dictionary or list of lists in Python, or a HashMap in Java) to store the indices (positions) of each character in the input string s. This allows for quick access to all occurrences of a particular character.

  2. Calculate Contribution: For each character c and each of its occurrences at index i:

    • Find the index of the previous occurrence of c (let's call it prev) and the index of the next occurrence of c (let's call it next). If there is no previous or next occurrence, set prev to -1 and next to the length of the string.
    • The number of substrings where c is unique at position i is calculated as (i - prev) * (next - i). This is because the substring must start after prev and end before next to contain only one instance of c at position i.
  3. Sum Contributions: Sum up the contributions of all characters across all their occurrences. This final sum represents the total number of unique characters across all substrings.

Time and Space Complexity:

  • Time Complexity: O(n), where n is the length of the input string. We iterate through the string once to store character positions and then iterate through the positions of each character (in total still O(n) operations).
  • Space Complexity: O(n) in the worst case, as the data structure storing character positions could potentially store all indices for a string with only unique characters.

Code Examples (Python, Java, C++)

The following code examples demonstrate the solution in different programming languages. The core logic remains the same across all of them.

Python:

from collections import defaultdict
 
def uniqueLetterString(s: str) -> int:
    d = defaultdict(list)  # Store indices for each character
    for i, c in enumerate(s):
        d[c].append(i)
 
    ans = 0
    for char_indices in d.values():
        indices = [-1] + char_indices + [len(s)]  # Add sentinel values for edge cases
        for i in range(1, len(indices) - 1):
            ans += (indices[i] - indices[i - 1]) * (indices[i + 1] - indices[i])
 
    return ans

Java:

import java.util.*;
 
class Solution {
    public int uniqueLetterString(String s) {
        Map<Character, List<Integer>> charIndices = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            charIndices.computeIfAbsent(s.charAt(i), k -> new ArrayList<>()).add(i);
        }
 
        int ans = 0;
        for (List<Integer> indices : charIndices.values()) {
            List<Integer> extendedIndices = new ArrayList<>(List.of(-1));
            extendedIndices.addAll(indices);
            extendedIndices.add(s.length());
 
            for (int i = 1; i < extendedIndices.size() - 1; i++) {
                ans += (extendedIndices.get(i) - extendedIndices.get(i - 1)) * (extendedIndices.get(i + 1) - extendedIndices.get(i));
            }
        }
        return ans;
    }
}

C++:

#include <iostream>
#include <string>
#include <vector>
#include <map>
 
using namespace std;
 
int uniqueLetterString(string s) {
    map<char, vector<int>> charIndices;
    for (int i = 0; i < s.length(); i++) {
        charIndices[s[i]].push_back(i);
    }
 
    int ans = 0;
    for (auto const& [key, val] : charIndices) {
        vector<int> extendedIndices = {-1};
        extendedIndices.insert(extendedIndices.end(), val.begin(), val.end());
        extendedIndices.push_back(s.length());
 
        for (size_t i = 1; i < extendedIndices.size() - 1; i++) {
            ans += (extendedIndices[i] - extendedIndices[i - 1]) * (extendedIndices[i + 1] - extendedIndices[i]);
        }
    }
    return ans;
}

These examples implement the efficient algorithm described above, achieving a linear time complexity. Remember to handle edge cases (first and last occurrences of characters) appropriately as shown in the code.