Run-length encoding is a string compression method that works by replacing consecutive identical characters (repeated 2 or more times) with the concatenation of the character and the number marking the count of the characters (length of the run). For example, to compress the string "aabccc"
we replace "aa"
by "a2"
and replace "ccc"
by "c3"
. Thus the compressed string becomes "a2bc3"
.
Notice that in this problem, we are not adding '1'
after single characters.
Given a string s
and an integer k
. You need to delete at most k
characters from s
such that the run-length encoded version of s
has minimum length.
Find the minimum length of the run-length encoded version of s
after deleting at most k
characters.
Example 1:
Input: s = "aaabcccd", k = 2 Output: 4 Explanation: Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length of the compressed string to 5, for instance delete 2 'a' then we will have s = "abcccd" which compressed is abc3d. Therefore, the optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" of length 4.
Example 2:
Input: s = "aabbaa", k = 2 Output: 2 Explanation: If we delete both 'b' characters, the resulting compressed string would be "a4" of length 2.
Example 3:
Input: s = "aaaaaaaaaaa", k = 0 Output: 3 Explanation: Since k is zero, we cannot delete anything. The compressed string is "a11" of length 3.
Constraints:
1 <= s.length <= 100
0 <= k <= s.length
s
contains only lowercase English letters.This problem asks to find the minimum length of the run-length encoded version of a string s
after deleting at most k
characters. We can solve this using dynamic programming.
Approach:
The core idea is to build a dynamic programming solution where dp[i][k]
represents the minimum length of the compressed string for the substring s[i:]
after deleting at most k
characters. We iterate through the string, considering each character as a potential starting point for a run of identical characters. For each starting point, we try extending the run until we encounter a different character or reach the end of the string. We then recursively solve the problem for the remaining substring, accounting for the deletions made.
Algorithm:
Base Cases:
k
is negative (more deletions than allowed), return a large value (e.g., K_MAX
).Recursive Step:
i
, we iterate to find the maximum frequency maxFreq
of consecutive identical characters from i
to some j
.getLength(maxFreq)
which accounts for 1-digit, 2-digit, and 3-digit frequencies).compression(s, j + 1, k - (j - i + 1 - maxFreq))
for the remaining substring s[j+1:]
. We subtract (j - i + 1 - maxFreq)
from k
because we are deleting characters not included in the max frequency run.Memoization:
dp
to store the results of subproblems. This avoids redundant calculations, significantly improving efficiency.getLength(maxFreq)
function: This helper function efficiently determines the length of the representation of a character's frequency in the run-length encoding.
Time and Space Complexity:
dp
.Code (Java):
class Solution {
public int getLengthOfOptimalCompression(String s, int k) {
int n = s.length();
int[][] dp = new int[n + 1][k + 1];
for (int[] row : dp) Arrays.fill(row, Integer.MAX_VALUE);
dp[n][0] = 0;
for (int i = n - 1; i >= 0; --i) {
for (int j = 0; j <= k; ++j) {
int count[] = new int[26];
int maxCount = 0;
for (int l = i; l < n; ++l) {
count[s.charAt(l) - 'a']++;
maxCount = Math.max(maxCount, count[s.charAt(l) - 'a']);
int deleted = (l - i + 1) - maxCount;
if (deleted <= j && dp[l + 1][j - deleted] != Integer.MAX_VALUE) {
dp[i][j] = Math.min(dp[i][j], dp[l + 1][j - deleted] + getLength(maxCount));
}
}
}
}
return dp[0][k];
}
private int getLength(int count) {
return count == 0 ? 0 : 1 + (count >= 10 ? (count >= 100 ? 2 : 1) : 0);
}
}
This Java code uses a slightly optimized bottom-up DP approach, directly filling the DP table rather than using recursion. This can offer improved performance in practice. The fundamental algorithmic strategy remains the same. The getLength()
function efficiently calculates the length needed to encode the frequency of a character.