{x}
blog image

Find Distance in a Binary Tree

Solution Explanation

This problem asks to find the distance between two nodes in a binary tree. The distance is defined as the number of edges in the path connecting the two nodes. The solution leverages two helper functions: lca (lowest common ancestor) and dfs (depth-first search).

1. Finding the Lowest Common Ancestor (LCA):

The lca function finds the lowest common ancestor of nodes with values p and q. The LCA is the deepest node that is an ancestor of both p and q.

  • Base Case: If the current node is null or its value is either p or q, it's the LCA, so we return the node.
  • Recursive Step: Otherwise, recursively call lca on the left and right subtrees. If one subtree returns null, it means the LCA is in the other subtree. If both subtrees return a non-null node, then the LCA is the current node itself (because both p and q are in different subtrees).

2. Depth-First Search (DFS) to Calculate Distance:

The dfs function calculates the distance from the LCA to a given node (v).

  • Base Case: If the current node is null, we return -1, indicating that the node v is not in the subtree rooted at the current node. If the current node's value is v, we return 0 (distance is 0 if we've found the node).
  • Recursive Step: Otherwise, recursively call dfs on the left and right subtrees. If both return -1, then the node v is not found in the subtree. Otherwise, the distance is 1 plus the maximum of the distances found in the left and right subtrees. This accounts for the edge connecting the current node to the node v.

3. Combining the Results:

The findDistance function orchestrates the process:

  1. It calls lca(root, p, q) to find the lowest common ancestor g of nodes p and q.
  2. It then calculates the distance from g to p using dfs(g, p) and the distance from g to q using dfs(g, q).
  3. Finally, it returns the sum of these two distances, which represents the total distance between nodes p and q.

Time Complexity Analysis:

  • The lca function visits each node at most once, so its time complexity is O(N), where N is the number of nodes in the tree.
  • The dfs function also visits each node at most once in each call, so its time complexity is O(N) for each call. Since we call it twice (once for p and once for q), the overall time complexity of this part is still O(N).
  • Therefore, the overall time complexity of the findDistance function is O(N).

Space Complexity Analysis:

  • The space complexity is dominated by the recursive calls in lca and dfs. In the worst case (a skewed tree), the recursion depth can be O(N). Hence, the space complexity is O(N).

This approach efficiently solves the problem by first finding the LCA, which allows us to break down the problem into two smaller subproblems, significantly reducing the search space.