{x}
blog image

Check if an Original String Exists Given Two Encoded Strings

An original string, consisting of lowercase English letters, can be encoded by the following steps:

  • Arbitrarily split it into a sequence of some number of non-empty substrings.
  • Arbitrarily choose some elements (possibly none) of the sequence, and replace each with its length (as a numeric string).
  • Concatenate the sequence as the encoded string.

For example, one way to encode an original string "abcdefghijklmnop" might be:

  • Split it as a sequence: ["ab", "cdefghijklmn", "o", "p"].
  • Choose the second and third elements to be replaced by their lengths, respectively. The sequence becomes ["ab", "12", "1", "p"].
  • Concatenate the elements of the sequence to get the encoded string: "ab121p".

Given two encoded strings s1 and s2, consisting of lowercase English letters and digits 1-9 (inclusive), return true if there exists an original string that could be encoded as both s1 and s2. Otherwise, return false.

Note: The test cases are generated such that the number of consecutive digits in s1 and s2 does not exceed 3.

 

Example 1:

Input: s1 = "internationalization", s2 = "i18n"
Output: true
Explanation: It is possible that "internationalization" was the original string.
- "internationalization" 
  -> Split:       ["internationalization"]
  -> Do not replace any element
  -> Concatenate:  "internationalization", which is s1.
- "internationalization"
  -> Split:       ["i", "nternationalizatio", "n"]
  -> Replace:     ["i", "18",                 "n"]
  -> Concatenate:  "i18n", which is s2

Example 2:

Input: s1 = "l123e", s2 = "44"
Output: true
Explanation: It is possible that "leetcode" was the original string.
- "leetcode" 
  -> Split:      ["l", "e", "et", "cod", "e"]
  -> Replace:    ["l", "1", "2",  "3",   "e"]
  -> Concatenate: "l123e", which is s1.
- "leetcode" 
  -> Split:      ["leet", "code"]
  -> Replace:    ["4",    "4"]
  -> Concatenate: "44", which is s2.

Example 3:

Input: s1 = "a5b", s2 = "c5b"
Output: false
Explanation: It is impossible.
- The original string encoded as s1 must start with the letter 'a'.
- The original string encoded as s2 must start with the letter 'c'.

 

Constraints:

  • 1 <= s1.length, s2.length <= 40
  • s1 and s2 consist of digits 1-9 (inclusive), and lowercase English letters only.
  • The number of consecutive digits in s1 and s2 does not exceed 3.

Solution Explanation

This problem asks whether two encoded strings, s1 and s2, could have originated from the same original string. The encoding process involves splitting the string into substrings, replacing some with their lengths, and concatenating the result. The solution uses dynamic programming to efficiently determine if such an original string exists.

Approach

The core idea is to track the difference in the lengths of the decoded substrings of s1 and s2 at each step. We use a 3D DP array dp[i][j][k] where:

  • i represents the index in s1.
  • j represents the index in s2.
  • k represents the difference in the lengths processed so far (length(s1_decoded) - length(s2_decoded)).

The DP state transition involves considering several cases:

  1. Both s1[i] and s2[j] are digits: We process consecutive digits to get the numeric length, updating k accordingly.

  2. s1[i] is a digit, s2[j] is a letter: We process consecutive digits in s1, increasing k by the numeric length.

  3. s1[i] is a letter, s2[j] is a digit: We process consecutive digits in s2, decreasing k by the numeric length.

  4. Both s1[i] and s2[j] are letters: If they are equal, we move to the next characters (i+1, j+1), keeping k unchanged.

The base case is dp[0][0][0] = true, meaning an empty string can be encoded as both empty strings. The final result is true if dp[n][m][0] is true, where n and m are the lengths of s1 and s2, indicating that both strings could be decoded to have the same length.

Time and Space Complexity

  • Time Complexity: O(nmL), where n is the length of s1, m is the length of s2, and L is the maximum length of consecutive digits in either string (constrained to 3 in the problem). The nested loops iterate through all possible states of the DP array.

  • Space Complexity: O(n*m), dominated by the size of the DP array. The set within each cell could theoretically grow large but is bounded practically by the constraints.

Code (TypeScript)

function possiblyEquals(s1: string, s2: string): boolean {
    const n = s1.length, m = s2.length;
    const dp: boolean[][][] = Array.from({ length: n + 1 }, () =>
        Array.from({ length: m + 1 }, () => Array(81).fill(false)) // 81 covers sufficient range of length differences
    );
    dp[0][0][40] = true; // Initialize with a suitable middle value for the length difference
 
    for (let i = 0; i <= n; ++i) {
        for (let j = 0; j <= m; ++j) {
            for (let k = 0; k < 81; ++k) {
                if (!dp[i][j][k]) continue;
 
                // Process digits in s1
                let num1 = 0;
                for (let p = i; p < Math.min(i + 3, n); ++p) {
                    if (isDigit(s1[p])) {
                        num1 = num1 * 10 + parseInt(s1[p]);
                        if (k + num1 >=0 && k + num1 < 81) dp[p + 1][j][k + num1] = true;
                    } else break;
                }
 
                // Process digits in s2
                let num2 = 0;
                for (let q = j; q < Math.min(j + 3, m); ++q) {
                    if (isDigit(s2[q])) {
                        num2 = num2 * 10 + parseInt(s2[q]);
                         if (k - num2 >=0 && k - num2 < 81) dp[i][q + 1][k - num2] = true;
                    } else break;
                }
                
                // Process letters
                if (i < n && j < m && s1[i] === s2[j]) {
                    dp[i + 1][j + 1][k] = true;
                }
            }
        }
    }
    return dp[n][m][40]; // Check if a length difference of 0 is possible
}
 
function isDigit(char: string): boolean {
    return /^\d$/.test(char);
}

This TypeScript code implements the optimized dynamic programming solution, handling the length differences more efficiently using a bounded array index. Other languages (Python, Java, C++) could use a similar approach, adapting the syntax and data structures as needed. The fundamental algorithm remains the same.