{x}
blog image

Encode String with Shortest Length

Solution Explanation

This problem asks to find the shortest encoded version of a given string. The encoding rule is k[encoded_string], where encoded_string is repeated k times. If encoding doesn't shorten the string, it shouldn't be encoded.

The optimal solution utilizes dynamic programming. We build a 2D array f where f[i][j] represents the shortest encoding of the substring s[i:j+1].

Approach:

  1. Base Case: For substrings shorter than 5 characters, the shortest encoding is the string itself. This is because the encoding overhead (k[ and ]) makes it longer.

  2. Recursive Relation: For longer substrings, we consider two possibilities:

    • Repeating Substring: Check if the substring contains repeating patterns. If s[i:j+1] can be expressed as k * s[i:k], where k is the length of the repeating pattern, then the encoding is k[s[i:k]].
    • Concatenation: Otherwise, the shortest encoding might be obtained by splitting the substring at some point k and recursively encoding the two parts: f[i][k] + f[k+1][j].
  3. Optimization: The algorithm compares the lengths of the encoding found through repeating substrings and concatenation, selecting the shorter one.

  4. Result: f[0][n-1] will contain the shortest encoding of the entire string.

Time Complexity: O(n^3), where n is the length of the input string. This is because of the three nested loops: the outer two loops iterate over all possible substrings, and the inner loop iterates over all possible split points for concatenation.

Space Complexity: O(n^2), dominated by the f array which stores the shortest encodings for all substrings.

Code Explanation (Python):

The Python code directly implements the approach described above. The g function handles the repeating substring check. The main part iterates through all substrings, calculating the shortest encoding for each substring using dynamic programming. f[i][j] stores the result, with the final answer in f[0][n-1].

class Solution:
    def encode(self, s: str) -> str:
        def g(i: int, j: int) -> str:
            t = s[i : j + 1]
            if len(t) < 5:
                return t
            k = (t + t).index(t, 1) # Efficiently finds repeating substring
            if k < len(t):
                cnt = len(t) // k
                return f"{cnt}[{f[i][i + k - 1]}]"
            return t
 
        n = len(s)
        f = [[None] * n for _ in range(n)] # DP table
        for i in range(n - 1, -1, -1):
            for j in range(i, n):
                f[i][j] = g(i, j)
                if j - i + 1 > 4:
                    for k in range(i, j):
                        t = f[i][k] + f[k + 1][j]
                        if len(f[i][j]) > len(t):
                            f[i][j] = t
        return f[0][-1]

The Java, C++, Go, and TypeScript code implementations follow the same logic, adapting the syntax and data structures to each respective language. They all maintain the same time and space complexity.