{x}
blog image

Longest Duplicate Substring

Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.

Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".

 

Example 1:

Input: s = "banana"
Output: "ana"

Example 2:

Input: s = "abcd"
Output: ""

 

Constraints:

  • 2 <= s.length <= 3 * 104
  • s consists of lowercase English letters.

Solution Explanation:

This problem asks to find the longest duplicated substring within a given string. The solutions presented use a combination of binary search and a rolling hash to efficiently solve this problem.

Core Idea:

The solutions employ binary search to find the length of the longest duplicated substring. The search space is the length of the substrings, ranging from 1 to the length of the input string. For each potential length, a rolling hash is used to check for duplicates efficiently.

Algorithm:

  1. Binary Search: The algorithm starts by performing a binary search on the possible lengths of duplicated substrings. The search space is [1, n], where n is the length of the input string.

  2. Rolling Hash (check function): For each length len tested during the binary search, the check function utilizes a rolling hash to efficiently identify substrings of length len. A set (vis in Python, HashSet in Java, unordered_set in C++, and map in Go) is used to store the hash values of encountered substrings. If a hash value is already present in the set, it indicates a duplicate substring of the tested length. The actual substring is then extracted and returned. If no duplicates are found for a given length, an empty string is returned.

  3. Update Answer: If a duplicated substring is found for a given length, the ans variable is updated to store the longest duplicated substring found so far.

  4. Binary Search Iteration: The binary search continues until the left and right pointers converge. The final value of ans represents the longest duplicated substring.

Time Complexity Analysis:

  • Binary Search: The binary search takes O(log n) iterations, where n is the length of the input string.

  • Rolling Hash: The rolling hash function in check has a time complexity of O(n) in the worst case (where n is the length of the input string) for each length tested. This is because it iterates through the string once to calculate and check hash values.

  • Overall: The overall time complexity is O(n log n) because the rolling hash function (O(n)) is called O(log n) times due to the binary search.

Space Complexity Analysis:

The space complexity is dominated by the hash set (vis) used to store hash values. In the worst case, this set could store up to O(n) entries, where n is the length of the input string. Thus, the overall space complexity is O(n).

Code Explanation (Python Example):

The Python solution clearly illustrates the binary search and rolling hash approach:

class Solution:
    def longestDupSubstring(self, s: str) -> str:
        def check(l): #checks for duplicates of length l
            vis = set()
            for i in range(n - l + 1):
                t = s[i : i + l]  #substring of length l
                if t in vis:
                    return t  #found a duplicate
                vis.add(t)
            return ''  #no duplicates found
 
        n = len(s)
        left, right = 0, n
        ans = '' #stores longest duplicate found so far
        while left < right:
            mid = (left + right + 1) >> 1 #binary search mid
            t = check(mid) #check for duplicates of length mid
            ans = t or ans #update ans if a longer duplicate is found
            if t: #if duplicate found, increase search range
                left = mid
            else: #if no duplicate, decrease search range
                right = mid - 1
        return ans

The other language implementations follow the same logic with minor syntax differences to accommodate the respective language's features. The C++, Java, and Go examples use explicit rolling hash implementations to handle potential hash collisions more effectively than the Python example's direct string comparison.