{x}
blog image

Distinct Echo Substrings

Return the number of distinct non-empty substrings of text that can be written as the concatenation of some string with itself (i.e. it can be written as a + a where a is some string).

 

Example 1:

Input: text = "abcabcabc"
Output: 3
Explanation: The 3 substrings are "abcabc", "bcabca" and "cabcab".

Example 2:

Input: text = "leetcodeleetcode"
Output: 2
Explanation: The 2 substrings are "ee" and "leetcodeleetcode".

 

Constraints:

  • 1 <= text.length <= 2000
  • text has only lowercase English letters.

Solution Explanation:

This problem asks to find the number of distinct non-empty substrings of a given string that can be written as the concatenation of some string with itself (e.g., "abcabc" is "abc" + "abc"). A brute-force approach would have a very high time complexity. The optimal solution uses a rolling hash technique to efficiently identify such substrings.

Approach: Rolling Hash

The core idea is to use a rolling hash to efficiently compare substrings. A rolling hash function allows us to compute the hash of a substring in O(1) time after computing the hash of the previous substring. This avoids repeatedly computing hashes for overlapping substrings, significantly improving efficiency.

The algorithm works as follows:

  1. Preprocessing: Calculate a rolling hash for each substring of the input text. We use a base (e.g., 131) and a large prime modulus to minimize hash collisions. h[i] stores the hash of the substring from index 0 to i. p[i] stores the i-th power of the base modulo the prime.

  2. Iteration: Iterate through all possible substring pairs: Outer loop from i = 0 to n-1, inner loop from j = i+1 to n, incrementing by 2 (to ensure even-length substrings which are twice the length of the repeating substring). For each pair, find the midpoint k = (i + j) / 2. Then check if the hash of the first half (from i+1 to k+1) is equal to the hash of the second half (from k+2 to j+1).

  3. Hash Calculation: The rolling hash for a substring from index l to r is calculated as: (h[r] - h[l-1] * p[r - l + 1]) % mod. This formula effectively isolates the hash of the substring by subtracting the hash of the prefix before it, properly accounting for the base.

  4. Distinct Substrings: Keep track of distinct hashes using a HashSet (or similar data structure) to avoid counting the same substring multiple times.

  5. Return: The size of the HashSet represents the number of distinct echo substrings.

Time Complexity Analysis:

  • The preprocessing step takes O(n) time, where n is the length of the input string.
  • The nested loops iterate through all possible pairs of indices. The outer loop runs n times, and the inner loop, on average, runs n/2 times. This is O(n^2).
  • The hash calculation within the inner loop is O(1) due to rolling hash.
  • The insertion into the HashSet is O(1) on average.

Therefore, the overall time complexity of the solution is dominated by the nested loops and is O(n^2).

Space Complexity Analysis:

  • The space complexity is O(n) to store the rolling hash values (h and p arrays) and the HashSet to track distinct substrings.

Code Examples (Python, Java, C++, Go, Rust):

The code examples provided in the original prompt demonstrate the rolling hash approach using different programming languages. Each example follows the same algorithmic steps outlined above. The key differences lie in the syntax and specific data structures used in each language (e.g., HashSet in Java/C++/Rust, set in Python, map in Go). All have the same time and space complexity.