Return the number of distinct non-empty substrings of text
that can be written as the concatenation of some string with itself (i.e. it can be written as a + a
where a
is some string).
Example 1:
Input: text = "abcabcabc" Output: 3 Explanation: The 3 substrings are "abcabc", "bcabca" and "cabcab".
Example 2:
Input: text = "leetcodeleetcode" Output: 2 Explanation: The 2 substrings are "ee" and "leetcodeleetcode".
Constraints:
1 <= text.length <= 2000
text
has only lowercase English letters.This problem asks to find the number of distinct non-empty substrings of a given string that can be written as the concatenation of some string with itself (e.g., "abcabc" is "abc" + "abc"). A brute-force approach would have a very high time complexity. The optimal solution uses a rolling hash technique to efficiently identify such substrings.
The core idea is to use a rolling hash to efficiently compare substrings. A rolling hash function allows us to compute the hash of a substring in O(1) time after computing the hash of the previous substring. This avoids repeatedly computing hashes for overlapping substrings, significantly improving efficiency.
The algorithm works as follows:
Preprocessing: Calculate a rolling hash for each substring of the input text. We use a base (e.g., 131) and a large prime modulus to minimize hash collisions. h[i]
stores the hash of the substring from index 0 to i
. p[i]
stores the i-th power of the base modulo the prime.
Iteration: Iterate through all possible substring pairs: Outer loop from i = 0
to n-1
, inner loop from j = i+1
to n
, incrementing by 2 (to ensure even-length substrings which are twice the length of the repeating substring). For each pair, find the midpoint k = (i + j) / 2
. Then check if the hash of the first half (from i+1
to k+1
) is equal to the hash of the second half (from k+2
to j+1
).
Hash Calculation: The rolling hash for a substring from index l
to r
is calculated as: (h[r] - h[l-1] * p[r - l + 1]) % mod
. This formula effectively isolates the hash of the substring by subtracting the hash of the prefix before it, properly accounting for the base.
Distinct Substrings: Keep track of distinct hashes using a HashSet
(or similar data structure) to avoid counting the same substring multiple times.
Return: The size of the HashSet
represents the number of distinct echo substrings.
n
times, and the inner loop, on average, runs n/2
times. This is O(n^2).HashSet
is O(1) on average.Therefore, the overall time complexity of the solution is dominated by the nested loops and is O(n^2).
h
and p
arrays) and the HashSet
to track distinct substrings.The code examples provided in the original prompt demonstrate the rolling hash approach using different programming languages. Each example follows the same algorithmic steps outlined above. The key differences lie in the syntax and specific data structures used in each language (e.g., HashSet
in Java/C++/Rust, set
in Python, map
in Go). All have the same time and space complexity.