{x}
blog image

Data Stream as Disjoint Intervals

Given a data stream input of non-negative integers a1, a2, ..., an, summarize the numbers seen so far as a list of disjoint intervals.

Implement the SummaryRanges class:

  • SummaryRanges() Initializes the object with an empty stream.
  • void addNum(int value) Adds the integer value to the stream.
  • int[][] getIntervals() Returns a summary of the integers in the stream currently as a list of disjoint intervals [starti, endi]. The answer should be sorted by starti.

 

Example 1:

Input
["SummaryRanges", "addNum", "getIntervals", "addNum", "getIntervals", "addNum", "getIntervals", "addNum", "getIntervals", "addNum", "getIntervals"]
[[], [1], [], [3], [], [7], [], [2], [], [6], []]
Output
[null, null, [[1, 1]], null, [[1, 1], [3, 3]], null, [[1, 1], [3, 3], [7, 7]], null, [[1, 3], [7, 7]], null, [[1, 3], [6, 7]]]

Explanation
SummaryRanges summaryRanges = new SummaryRanges();
summaryRanges.addNum(1);      // arr = [1]
summaryRanges.getIntervals(); // return [[1, 1]]
summaryRanges.addNum(3);      // arr = [1, 3]
summaryRanges.getIntervals(); // return [[1, 1], [3, 3]]
summaryRanges.addNum(7);      // arr = [1, 3, 7]
summaryRanges.getIntervals(); // return [[1, 1], [3, 3], [7, 7]]
summaryRanges.addNum(2);      // arr = [1, 2, 3, 7]
summaryRanges.getIntervals(); // return [[1, 3], [7, 7]]
summaryRanges.addNum(6);      // arr = [1, 2, 3, 6, 7]
summaryRanges.getIntervals(); // return [[1, 3], [6, 7]]

 

Constraints:

  • 0 <= value <= 104
  • At most 3 * 104 calls will be made to addNum and getIntervals.
  • At most 102 calls will be made to getIntervals.

 

Follow up: What if there are lots of merges and the number of disjoint intervals is small compared to the size of the data stream?

Solution Explanation

This problem requires designing a data structure that efficiently handles adding numbers to a stream and returning the summarized disjoint intervals. The optimal approach utilizes a TreeMap (or SortedDict in Python) to store the intervals. A TreeMap maintains sorted order, making it easy to find overlapping intervals.

Algorithm:

  1. Initialization: The SummaryRanges class is initialized with an empty TreeMap (or SortedDict). The key in the map is the starting number of an interval, and the value is a 2-element array/list representing the [start, end] of the interval.

  2. addNum(val): This method adds a number val to the stream. The core logic involves finding the appropriate position of val within the existing intervals in the TreeMap:

    • Find Adjacent Intervals: Using floorKey() and ceilingKey() (or equivalent methods in Python's SortedDict), we locate the intervals immediately before and after val in the sorted map.

    • Merge Overlapping Intervals: If val falls between existing intervals (l and r) and creates an overlap, merge them into a single, larger interval.

    • Extend Existing Intervals: If val is adjacent to an existing interval (either before or after), extend that interval to include val.

    • Add New Interval: If val doesn't overlap or extend any existing interval, add a new entry [val, val] to the TreeMap.

  3. getIntervals(): This method simply retrieves and returns the intervals stored in the TreeMap as a 2D array (or list of lists). Because the TreeMap is sorted by the start of each interval, the output is already sorted as required.

Time Complexity Analysis:

  • addNum(val): The floorKey() and ceilingKey() operations on a TreeMap have a time complexity of O(log n), where n is the number of intervals. All other operations within addNum (comparisons, merges, updates) are O(1). Therefore, the overall time complexity of addNum is O(log n).

  • getIntervals(): Iterating through the TreeMap and copying its values takes O(n) time, where n is the number of intervals. Therefore, the time complexity of getIntervals is O(n).

Space Complexity Analysis:

The space complexity is O(n) in the worst case, where n is the number of intervals. This is because we store all the intervals in the TreeMap. In the best-case scenario (where the numbers are added such that no merging happens), the space complexity would be O(n).

Code Examples:

The provided code examples demonstrate the implementation in Python, Java, and C++. They all follow the same algorithm using their respective sorted map implementations. The use of SortedDict in python adds some complexity because it is a third party library and not the in built library. If you prefer to use built-in libraries, you can use a list and sort it before returning the intervals but this would change the time complexity of the getIntervals function to O(nlogn).

The core idea is to leverage the efficient sorted map data structures to minimize the time spent searching for intervals, leading to a logarithmic time complexity for adding numbers. The space used scales linearly with the number of disjoint intervals.