{x}
blog image

Smallest Subsequence of Distinct Characters

Given a string s, return the lexicographically smallest subsequence of s that contains all the distinct characters of s exactly once.

 

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"

 

Constraints:

  • 1 <= s.length <= 1000
  • s consists of lowercase English letters.

 

Note: This question is the same as 316: https://leetcode.com/problems/remove-duplicate-letters/

Solution Explanation for Smallest Subsequence of Distinct Characters

This problem asks to find the lexicographically smallest subsequence of a given string that contains all distinct characters exactly once. The optimal approach uses a monotonic stack and a greedy strategy.

Core Idea:

The solution iterates through the string. For each character, it checks if it's already in the result (using a set or bitmask). If not, it compares it with the characters currently on the stack. If the character is smaller than the top of the stack, and there are more occurrences of the top of the stack later in the string, the top of the stack is popped. This process continues until either the stack is empty, or the current character is not smaller than the top of the stack, or there are no more occurrences of the top of the stack later. The current character is then pushed onto the stack. This ensures that the resulting subsequence is lexicographically smallest.

Algorithm:

  1. Record Last Occurrences: Create a map (or array) to store the last index of each character in the input string. This helps determine if a character needs to be kept on the stack or can be removed.

  2. Initialize Stack and Visited Set: Use a stack to build the subsequence. Use a set (or bitmask) to keep track of characters already included in the subsequence.

  3. Iterate and Process: Iterate through the input string.

    • If the character is already in the visited set, skip it.
    • While the stack is not empty, and the top of the stack is greater than the current character, and there are more occurrences of the top of the stack later in the string (check using the last map/array):
      • Pop the top element from the stack and remove it from the visited set.
    • Push the current character onto the stack and add it to the visited set.
  4. Construct Result: After iterating through the entire string, the stack contains the lexicographically smallest subsequence. Convert the stack into a string and return it.

Time Complexity Analysis:

The algorithm iterates through the string once. The while loop inside the main loop performs at most O(n) operations in the worst case, where n is the length of the string (e.g., if the string is sorted in descending order). Therefore, the overall time complexity is O(n).

Space Complexity Analysis:

The space complexity is determined by the stack and the last map/array (or visited set/bitmask). In the worst case, the stack can hold up to 26 characters (all unique lowercase English letters), and the last map/array uses O(26) space. The visited set uses O(26) space. Therefore, the space complexity is O(1) (constant). It does not scale with the input string length.

Code Examples (with detailed comments):

Python:

def smallestSubsequence(s: str) -> str:
    last_occurrence = {c: i for i, c in enumerate(s)} # Dictionary to store last indices
    stack = []
    visited = set()  # Set to track already included characters
 
    for i, c in enumerate(s):
        if c in visited:  # Skip if already in subsequence
            continue
        while stack and stack[-1] > c and last_occurrence[stack[-1]] > i: #Greedy step: remove larger chars that appear later
            visited.remove(stack.pop())
        stack.append(c)
        visited.add(c)
 
    return "".join(stack)
 

Java:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
 
class Solution {
    public String smallestSubsequence(String s) {
        int[] lastOccurrence = new int[26]; // Array to store last indices
        for (int i = 0; i < s.length(); i++) {
            lastOccurrence[s.charAt(i) - 'a'] = i;
        }
        Deque<Character> stack = new ArrayDeque<>();
        Set<Character> visited = new HashSet<>(); //Set to track characters
 
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (visited.contains(c)) {
                continue;
            }
            while (!stack.isEmpty() && stack.peek() > c && lastOccurrence[stack.peek() - 'a'] > i) {
                visited.remove(stack.pop());
            }
            stack.push(c);
            visited.add(c);
        }
 
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop()); //Pop in reverse order to get the correct order
        }
        return sb.reverse().toString();
    }
}

The other languages (C++, Go, TypeScript) follow a similar structure, leveraging their respective data structures (stacks and sets/bitmasks). The core logic of the greedy approach remains the same across all implementations.