This problem asks to find the number of distinct substrings within a given string. We'll explore two approaches: brute force and string hashing.
This approach directly implements the definition of a substring. We iterate through all possible starting and ending positions of substrings, generating each substring and storing it in a set (to automatically handle uniqueness). The size of the set at the end represents the number of distinct substrings.
Code (Python):
class Solution:
def countDistinct(self, s: str) -> int:
n = len(s)
return len({s[i:j] for i in range(n) for j in range(i + 1, n + 1)})
Time Complexity: O(n^3) - Generating all substrings takes O(n^2), and checking for uniqueness in a set can take O(n) in the worst case for each substring (though on average it's O(1)).
Space Complexity: O(n^2) - In the worst case (all substrings are unique), the set will store all O(n^2) substrings.
Brute force is inefficient for larger strings. String hashing offers a significant optimization. We use a rolling hash to efficiently compute the hash value for each substring without recomputing it from scratch each time. This drastically reduces the time complexity.
Core Idea:
Hash Function: We treat each substring as a number in a chosen base (e.g., 131) and calculate its hash value. This maps substrings to integers. The probability of collision (two different substrings having the same hash) is very low with a good base choice.
Rolling Hash: Instead of recalculating the hash for each substring, we use a rolling hash technique. When moving from one substring to the next (e.g., from "abc" to "bcd"), we can update the hash value efficiently using the previous hash and the characters being added and removed. This avoids redundant computations.
Code (Python):
class Solution:
def countDistinct(self, s: str) -> int:
base = 131
n = len(s)
p = [0] * (n + 10) # powers of base
h = [0] * (n + 10) # hash values
p[0] = 1
for i, c in enumerate(s):
p[i + 1] = p[i] * base
h[i + 1] = h[i] * base + ord(c)
ss = set()
for i in range(1, n + 1):
for j in range(i, n + 1):
t = h[j] - h[i - 1] * p[j - i + 1] #Rolling hash calculation
ss.add(t)
return len(ss)
Time Complexity: O(n^2) - Although we iterate through all possible substrings, the hash calculation for each substring is O(1) thanks to the rolling hash.
Space Complexity: O(n) - We store the hash values and powers of the base, which are linear in the string length. The set ss
storing unique hashes could also reach O(n^2) in worst case (though unlikely with a good hash function), but it's generally considered O(n) in practice.
Note: While string hashing improves significantly over brute force, there are more sophisticated algorithms (like suffix arrays or tries) that can solve this problem in O(n) time. However, those algorithms are more complex to implement. The string hashing solution presents a good balance between efficiency and implementation complexity. The other languages' code examples follow a similar structure, adapting the data structures and syntax accordingly.