{x}
blog image

Remove Duplicate Letters

Given a string s, remove duplicate letters so that every letter appears once and only once. You must make sure your result is the smallest in lexicographical order among all possible results.

 

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"

 

Constraints:

  • 1 <= s.length <= 104
  • s consists of lowercase English letters.

 

Note: This question is the same as 1081: https://leetcode.com/problems/smallest-subsequence-of-distinct-characters/

Solution Explanation for Remove Duplicate Letters

This problem requires finding the lexicographically smallest subsequence of a given string s that contains all unique characters. The optimal approach utilizes a stack-based greedy algorithm.

Algorithm:

  1. Last Occurrence: First, we create a dictionary (or array) last to store the index of the last occurrence of each character in the string. This is crucial for determining whether we can safely remove a character from consideration.

  2. Stack for Result: A stack stk will store the characters of our resulting lexicographically smallest subsequence.

  3. Iteration and Decision Making: We iterate through the input string s. For each character c:

    • Already in Subsequence: If c is already in the stack (vis set in Solution 1 or in_stack array in Solution 2), we skip it.
    • Not in Subsequence: If c is not in the stack, we compare it with the top of the stack (stk[-1]):
      • Top is Larger and Can be Removed: If the top of the stack is larger than c AND it appears again later in the string (i.e., last[stk[-1]] > i where i is the current index), we pop the top element from the stack (because we can find it later, and a smaller character is now available). We remove it from the vis set or in_stack array. We repeat this process as long as the condition holds.
      • Top is Smaller or Cannot be Removed: Once the top of the stack is smaller than c or the condition for removal is false, we push c onto the stack and mark it as being in the subsequence (vis or in_stack).
  4. Result: Finally, the stack stk contains the characters of the lexicographically smallest subsequence. We convert it to a string and return it.

Time and Space Complexity:

  • Time Complexity: O(N), where N is the length of the input string. We iterate through the string once, and the stack operations (push and pop) take amortized constant time.
  • Space Complexity: O(1) in the best case (if the alphabet size is limited). However, it's effectively O(N) in the worst case if we consider space used by the last dictionary, vis set, or in_stack array, as they can grow to store up to 26 (or the number of distinct characters in the alphabet) entries.

Code Examples (Python):

Solution 1 (using set):

class Solution:
    def removeDuplicateLetters(self, s: str) -> str:
        last = {c: i for i, c in enumerate(s)}
        stk = []
        vis = set()
        for i, c in enumerate(s):
            if c in vis:
                continue
            while stk and stk[-1] > c and last[stk[-1]] > i:
                vis.remove(stk.pop())
            stk.append(c)
            vis.add(c)
        return ''.join(stk)
 

Solution 2 (using array):

class Solution:
    def removeDuplicateLetters(self, s: str) -> str:
        count, in_stack = [0] * 128, [False] * 128
        stack = []
        for c in s:
            count[ord(c)] += 1
        for c in s:
            count[ord(c)] -= 1
            if in_stack[ord(c)]:
                continue
            while len(stack) and stack[-1] > c and count[ord(stack[-1])] > 0:
                peek = stack[-1]
                in_stack[ord(peek)] = False
                stack.pop()
            stack.append(c)
            in_stack[ord(c)] = True
        return ''.join(stack)

Both solutions achieve the same result; the choice depends on preference for using sets or arrays to track character inclusion in the stack. The code in other languages (Java, C++, Go) follows similar logic.