Given a string s
, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s
does not have a duplicated substring, the answer is ""
.
Example 1:
Input: s = "banana" Output: "ana"
Example 2:
Input: s = "abcd" Output: ""
Constraints:
2 <= s.length <= 3 * 104
s
consists of lowercase English letters.This problem asks to find the longest duplicated substring within a given string. The solutions presented use a combination of binary search and a rolling hash to efficiently solve this problem.
Core Idea:
The solutions employ binary search to find the length of the longest duplicated substring. The search space is the length of the substrings, ranging from 1 to the length of the input string. For each potential length, a rolling hash is used to check for duplicates efficiently.
Algorithm:
Binary Search: The algorithm starts by performing a binary search on the possible lengths of duplicated substrings. The search space is [1, n], where n is the length of the input string.
Rolling Hash (check function): For each length len
tested during the binary search, the check
function utilizes a rolling hash to efficiently identify substrings of length len
. A set (vis
in Python, HashSet
in Java, unordered_set
in C++, and map
in Go) is used to store the hash values of encountered substrings. If a hash value is already present in the set, it indicates a duplicate substring of the tested length. The actual substring is then extracted and returned. If no duplicates are found for a given length, an empty string is returned.
Update Answer: If a duplicated substring is found for a given length, the ans
variable is updated to store the longest duplicated substring found so far.
Binary Search Iteration: The binary search continues until the left and right pointers converge. The final value of ans
represents the longest duplicated substring.
Time Complexity Analysis:
Binary Search: The binary search takes O(log n) iterations, where n is the length of the input string.
Rolling Hash: The rolling hash function in check
has a time complexity of O(n) in the worst case (where n is the length of the input string) for each length tested. This is because it iterates through the string once to calculate and check hash values.
Overall: The overall time complexity is O(n log n) because the rolling hash function (O(n)) is called O(log n) times due to the binary search.
Space Complexity Analysis:
The space complexity is dominated by the hash set (vis
) used to store hash values. In the worst case, this set could store up to O(n) entries, where n is the length of the input string. Thus, the overall space complexity is O(n).
Code Explanation (Python Example):
The Python solution clearly illustrates the binary search and rolling hash approach:
class Solution:
def longestDupSubstring(self, s: str) -> str:
def check(l): #checks for duplicates of length l
vis = set()
for i in range(n - l + 1):
t = s[i : i + l] #substring of length l
if t in vis:
return t #found a duplicate
vis.add(t)
return '' #no duplicates found
n = len(s)
left, right = 0, n
ans = '' #stores longest duplicate found so far
while left < right:
mid = (left + right + 1) >> 1 #binary search mid
t = check(mid) #check for duplicates of length mid
ans = t or ans #update ans if a longer duplicate is found
if t: #if duplicate found, increase search range
left = mid
else: #if no duplicate, decrease search range
right = mid - 1
return ans
The other language implementations follow the same logic with minor syntax differences to accommodate the respective language's features. The C++, Java, and Go examples use explicit rolling hash implementations to handle potential hash collisions more effectively than the Python example's direct string comparison.