{x}
blog image

Find Median from Data Stream

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.

 

Example 1:

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

 

Constraints:

  • -105 <= num <= 105
  • There will be at least one element in the data structure before calling findMedian.
  • At most 5 * 104 calls will be made to addNum and findMedian.

 

Follow up:

  • If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
  • If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?

Solution: Min Heap and Max Heap

This problem can be efficiently solved using two heaps: a min-heap and a max-heap. The min-heap (minQ) will store the larger half of the numbers added, and the max-heap (maxQ) will store the smaller half. This setup allows us to quickly find the median.

Algorithm:

  1. Initialization: Create an empty min-heap (minQ) and an empty max-heap (maxQ).

  2. addNum(num):

    • Add the new number num to the max-heap (maxQ).
    • If the size of maxQ becomes greater than the size of minQ by more than 1, move the largest element from maxQ to minQ.
    • If the size of minQ becomes greater than the size of maxQ, move the smallest element from minQ to maxQ. This ensures that the heaps remain balanced, with their sizes differing by at most 1.
  3. findMedian():

    • If the sizes of minQ and maxQ are equal, the median is the average of the top elements of both heaps.
    • If the size of minQ is greater than the size of maxQ, the median is the top element of minQ.

Time Complexity:

  • addNum(num): O(log n), where n is the number of elements added so far. Heap operations (insertion and deletion) take logarithmic time.
  • findMedian(): O(1). Accessing the top element of a heap is a constant-time operation.

Space Complexity: O(n), as we store all the numbers added in the heaps.

Code Examples:

The code examples below demonstrate the implementation in various programming languages. Note that the specific heap implementation might vary depending on the language's standard library (e.g., PriorityQueue in Java, heapq in Python).

Python:

import heapq
 
class MedianFinder:
    def __init__(self):
        self.min_heap = []  # Min-heap to store larger half
        self.max_heap = []  # Max-heap to store smaller half
 
    def addNum(self, num: int) -> None:
        heapq.heappush(self.max_heap, -num)  # Use negative for max-heap
        if len(self.max_heap) > len(self.min_heap) + 1:
            heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        if len(self.min_heap) > len(self.max_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))
 
    def findMedian(self) -> float:
        if len(self.min_heap) == len(self.max_heap):
            return (-self.max_heap[0] + self.min_heap[0]) / 2
        else:
            return -self.max_heap[0]

Java:

import java.util.PriorityQueue;
import java.util.Collections;
 
class MedianFinder {
    private PriorityQueue<Integer> minHeap;
    private PriorityQueue<Integer> maxHeap;
 
    public MedianFinder() {
        minHeap = new PriorityQueue<>();
        maxHeap = new PriorityQueue<>(Collections.reverseOrder());
    }
 
    public void addNum(int num) {
        maxHeap.offer(num);
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.offer(maxHeap.poll());
        }
        if (minHeap.size() > maxHeap.size()) {
            maxHeap.offer(minHeap.poll());
        }
    }
 
    public double findMedian() {
        if (maxHeap.size() == minHeap.size()) {
            return (double)(maxHeap.peek() + minHeap.peek()) / 2;
        } else {
            return (double)maxHeap.peek();
        }
    }
}

(Other languages would follow a similar structure, using their respective heap implementations.)

This approach provides an efficient and elegant solution to the problem of finding the median from a data stream. The use of two heaps keeps the data balanced and allows for logarithmic time complexity for addNum operations.