{x}
blog image

Design Excel Sum Formula

Solution Explanation for Design Excel Sum Formula

This problem requires designing an Excel-like spreadsheet functionality, specifically implementing set, get, and sum operations. The sum operation is the most complex, as it involves handling both single cell references and ranges of cells. The challenge lies in efficiently updating the spreadsheet when a cell's value changes due to a set operation or when a sum formula overlaps with a set operation.

Approach

We can represent the spreadsheet as a 2D array (matrix). To handle sum formulas efficiently, we'll use a separate data structure to track dependencies. This will help us avoid recalculating the entire spreadsheet every time a cell's value changes.

  1. Data Structures:

    • A 2D array (mat) to store the cell values.
    • A dictionary (formulas) to store sum formulas. The keys will be tuples representing cell coordinates (row, col), and values will be lists of cell references (strings).
    • A dictionary (dependencies) to store dependencies between cells. This is a crucial part for efficiently handling updates.
  2. __init__ (Constructor):

    • Initializes the mat array with zeros.
    • Initializes formulas and dependencies as empty dictionaries.
  3. set(row, column, val):

    • Updates mat[row][column] with val.
    • If there's a formula in formulas for this cell, remove it (this cell is now a direct value).
    • Update all cells dependent on (row, column) by recalculating their values using their associated formulas.
  4. get(row, column):

    • Returns mat[row][column].
  5. sum(row, column, numbers):

    • Stores the list of cell references numbers in formulas[(row, column)].
    • Parses the cell references to identify individual cells and ranges.
    • Calculates the sum of the cells and updates mat[row][column].
    • Adds dependencies from cells involved in the sum formula to (row, column).
  6. Helper Functions:

    • parse_cell_ref(cell_ref): Parses a cell reference (e.g., "A1" or "A1:B2") into a list of cell coordinates.
    • calculate_sum(cell_coords): Calculates the sum of values of specified cells.
  7. Dependency Tracking: When a sum formula is added for a cell (e.g., sum(3, 'C', ['A1', 'A1:B2'])), it adds dependencies from A1, A1:B2 to (3, 'C'). This allows for efficient updates because, when A1 or a cell in the range A1:B2 is changed, only the cell (3, 'C') and its dependents need to be recalculated.

Code (Python)

class Excel:
    def __init__(self, height: int, width: str):
        self.height = height
        self.width = ord(width) - ord('A') + 1
        self.mat = [[0] * self.width for _ in range(height)]
        self.formulas = {}
        self.dependencies = {}
 
    def parse_cell_ref(self, cell_ref):
        if ':' not in cell_ref:
            col = ord(cell_ref[0]) - ord('A')
            row = int(cell_ref[1:]) - 1
            return [(row, col)]
        else:
            start, end = cell_ref.split(':')
            start_col = ord(start[0]) - ord('A')
            start_row = int(start[1:]) - 1
            end_col = ord(end[0]) - ord('A')
            end_row = int(end[1:]) - 1
            coords = []
            for r in range(start_row, end_row + 1):
                for c in range(start_col, end_col + 1):
                    coords.append((r, c))
            return coords
 
    def calculate_sum(self, cell_coords):
        total = 0
        for r, c in cell_coords:
            total += self.mat[r][c]
        return total
 
    def set(self, row: int, column: str, val: int) -> None:
        col = ord(column) - ord('A')
        row -= 1
        self.mat[row][col] = val
        if (row, col) in self.formulas:
            del self.formulas[(row, col)]
            
        #Update dependents:
        if (row, col) in self.dependencies:
            for dep_row, dep_col in self.dependencies[(row, col)]:
                self.calculate_and_set_sum(dep_row,dep_col)
 
 
    def get(self, row: int, column: str) -> int:
        col = ord(column) - ord('A')
        row -= 1
        return self.mat[row][col]
 
    def calculate_and_set_sum(self, row, col):
        if (row, col) in self.formulas:
            cell_coords = []
            for cell_ref in self.formulas[(row, col)]:
                cell_coords.extend(self.parse_cell_ref(cell_ref))
            self.mat[row][col] = self.calculate_sum(cell_coords)
            
 
    def sum(self, row: int, column: str, numbers: list[str]) -> int:
        col = ord(column) - ord('A')
        row -= 1
        self.formulas[(row, col)] = numbers
        
        cell_coords = []
        for cell_ref in numbers:
            cell_coords.extend(self.parse_cell_ref(cell_ref))
        
        self.mat[row][col] = self.calculate_sum(cell_coords)
        
        #Update dependencies:
        self.dependencies[(row, col)] = set()
        for r, c in cell_coords:
            if (r, c) not in self.dependencies:
                self.dependencies[(r, c)] = set()
            self.dependencies[(r,c)].add((row, col))
 
        return self.mat[row][col]
 
 

Time and Space Complexity

  • Time Complexity:
    • set: O(N), where N is the number of cells affected by the update (worst case O(H*W), H - height, W - width) due to formula recalculation.
    • get: O(1)
    • sum: O(M), where M is the total number of cells referenced in the sum formula. Dependency tracking helps limit the number of updates needed during subsequent changes.
  • Space Complexity: O(HW) to store the matrix and potentially O(HW) for the formulas and dependencies dictionaries in the worst case. In practice, it will depend on the number of formulas and dependencies.

This detailed explanation and the Python code provide a comprehensive solution to the Design Excel Sum Formula problem. The use of a dependency tracking system is key to achieving reasonable performance for large spreadsheets. Adaptations to other languages like Java, C++, or Go would follow a similar approach with corresponding data structure implementations.