{x}
blog image

String Compression II

Run-length encoding is a string compression method that works by replacing consecutive identical characters (repeated 2 or more times) with the concatenation of the character and the number marking the count of the characters (length of the run). For example, to compress the string "aabccc" we replace "aa" by "a2" and replace "ccc" by "c3". Thus the compressed string becomes "a2bc3".

Notice that in this problem, we are not adding '1' after single characters.

Given a string s and an integer k. You need to delete at most k characters from s such that the run-length encoded version of s has minimum length.

Find the minimum length of the run-length encoded version of s after deleting at most k characters.

 

Example 1:

Input: s = "aaabcccd", k = 2
Output: 4
Explanation: Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length of the compressed string to 5, for instance delete 2 'a' then we will have s = "abcccd" which compressed is abc3d. Therefore, the optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" of length 4.

Example 2:

Input: s = "aabbaa", k = 2
Output: 2
Explanation: If we delete both 'b' characters, the resulting compressed string would be "a4" of length 2.

Example 3:

Input: s = "aaaaaaaaaaa", k = 0
Output: 3
Explanation: Since k is zero, we cannot delete anything. The compressed string is "a11" of length 3.

 

Constraints:

  • 1 <= s.length <= 100
  • 0 <= k <= s.length
  • s contains only lowercase English letters.

Solution Explanation for String Compression II

This problem asks to find the minimum length of the run-length encoded version of a string s after deleting at most k characters. We can solve this using dynamic programming.

Approach:

The core idea is to build a dynamic programming solution where dp[i][k] represents the minimum length of the compressed string for the substring s[i:] after deleting at most k characters. We iterate through the string, considering each character as a potential starting point for a run of identical characters. For each starting point, we try extending the run until we encounter a different character or reach the end of the string. We then recursively solve the problem for the remaining substring, accounting for the deletions made.

Algorithm:

  1. Base Cases:

    • If k is negative (more deletions than allowed), return a large value (e.g., K_MAX).
    • If we've reached the end of the string or have deleted all remaining characters, the compressed length is 0.
  2. Recursive Step:

    • For each starting index i, we iterate to find the maximum frequency maxFreq of consecutive identical characters from i to some j.
    • We calculate the length of representing this run (getLength(maxFreq) which accounts for 1-digit, 2-digit, and 3-digit frequencies).
    • We recursively call the function compression(s, j + 1, k - (j - i + 1 - maxFreq)) for the remaining substring s[j+1:]. We subtract (j - i + 1 - maxFreq) from k because we are deleting characters not included in the max frequency run.
    • The minimum length for this starting position is the sum of the run length and the recursively calculated length of the remaining substring.
  3. Memoization:

    • We use a 2D array dp to store the results of subproblems. This avoids redundant calculations, significantly improving efficiency.
  4. getLength(maxFreq) function: This helper function efficiently determines the length of the representation of a character's frequency in the run-length encoding.

Time and Space Complexity:

  • Time Complexity: O(n^2 * k), where n is the length of the string and k is the maximum number of deletions allowed. The nested loops iterate through all possible starting positions and run lengths, and the recursive calls add another factor of n.
  • Space Complexity: O(n * k) due to the memoization table dp.

Code (Java):

class Solution {
    public int getLengthOfOptimalCompression(String s, int k) {
        int n = s.length();
        int[][] dp = new int[n + 1][k + 1];
        for (int[] row : dp) Arrays.fill(row, Integer.MAX_VALUE);
        dp[n][0] = 0;
 
        for (int i = n - 1; i >= 0; --i) {
            for (int j = 0; j <= k; ++j) {
                int count[] = new int[26];
                int maxCount = 0;
                for (int l = i; l < n; ++l) {
                    count[s.charAt(l) - 'a']++;
                    maxCount = Math.max(maxCount, count[s.charAt(l) - 'a']);
                    int deleted = (l - i + 1) - maxCount;
                    if (deleted <= j && dp[l + 1][j - deleted] != Integer.MAX_VALUE) {
                        dp[i][j] = Math.min(dp[i][j], dp[l + 1][j - deleted] + getLength(maxCount));
                    }
                }
            }
        }
        return dp[0][k];
    }
 
    private int getLength(int count) {
        return count == 0 ? 0 : 1 + (count >= 10 ? (count >= 100 ? 2 : 1) : 0);
    }
}

This Java code uses a slightly optimized bottom-up DP approach, directly filling the DP table rather than using recursion. This can offer improved performance in practice. The fundamental algorithmic strategy remains the same. The getLength() function efficiently calculates the length needed to encode the frequency of a character.