{x}
blog image

Number of Atoms

Given a string formula representing a chemical formula, return the count of each atom.

The atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.

One or more digits representing that element's count may follow if the count is greater than 1. If the count is 1, no digits will follow.

  • For example, "H2O" and "H2O2" are possible, but "H1O2" is impossible.

Two formulas are concatenated together to produce another formula.

  • For example, "H2O2He3Mg4" is also a formula.

A formula placed in parentheses, and a count (optionally added) is also a formula.

  • For example, "(H2O2)" and "(H2O2)3" are formulas.

Return the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.

The test cases are generated so that all the values in the output fit in a 32-bit integer.

 

Example 1:

Input: formula = "H2O"
Output: "H2O"
Explanation: The count of elements are {'H': 2, 'O': 1}.

Example 2:

Input: formula = "Mg(OH)2"
Output: "H2MgO2"
Explanation: The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.

Example 3:

Input: formula = "K4(ON(SO3)2)2"
Output: "K4N2O14S4"
Explanation: The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.

 

Constraints:

  • 1 <= formula.length <= 1000
  • formula consists of English letters, digits, '(', and ')'.
  • formula is always valid.

Solution Explanation

This problem requires parsing a chemical formula string and returning the count of each atom in a specific sorted format. The solution uses a stack-based approach combined with a hash map (dictionary) to efficiently handle nested parentheses and atom counts.

Approach:

The core idea is to process the formula string from right to left. This allows us to easily handle nested parentheses and their associated multipliers. We maintain a stack to keep track of multipliers arising from parentheses. A hash map stores the atom counts encountered.

  1. Iteration: The code iterates through the formula string from right to left.

  2. Character Handling: Different characters are handled as follows:

    • Lowercase letters: These are part of an atom name. The code finds the beginning of the atom name and adds it to the hash map, along with its count (multiplied by the current multiplier).
    • Uppercase letters: These mark the beginning of a new atom. The code adds it to the hash map with its count.
    • Digits: These represent the count of an atom. The code reads all consecutive digits to obtain the full count.
    • ')': This indicates the end of a parenthesized subformula. The current multiplier is pushed onto the stack, and the current multiplier is updated by multiplying with the count found within the parentheses.
    • '(': This indicates the start of a parenthesized subformula. The multiplier from the top of the stack is popped and becomes the current multiplier.
  3. Hash Map Update: For each atom encountered, its count is updated in the hash map, considering the current multiplier.

  4. Result Formatting: Finally, the hash map is converted to a string in the specified format (atoms sorted alphabetically, counts greater than 1 explicitly shown).

Time Complexity Analysis:

The algorithm iterates through the formula string once. The operations within the loop (hash map lookups, string manipulations) are typically O(1) on average for hash maps and short strings. Sorting the atoms takes O(M log M) time, where M is the number of unique atoms. Therefore, the overall time complexity is dominated by the iteration and sorting, resulting in O(N + M log M), where N is the length of the formula string and M is the number of unique atoms. In the worst case, M could be O(N), but often M will be much smaller than N.

Space Complexity Analysis:

The space complexity is determined by the stack, the hash map, and the temporary strings used. The stack's size can be at most equal to the maximum nesting depth of parentheses, which is typically a small constant. The hash map can store up to M entries (unique atoms). The temporary strings are relatively short. Therefore, the overall space complexity is O(M), where M is the number of unique atoms.

Code Examples (Java):

import java.util.*;
 
class Solution {
    public String countOfAtoms(String formula) {
        Map<String, Integer> map = new HashMap<>();
        Stack<Integer> stack = new Stack<>(); // Use Stack for better readability
        stack.push(1); // Initialize with multiplier 1
        int multiplier = 1;
        int num = 0;
        String atom = "";
        for (int i = formula.length() - 1; i >= 0; i--) {
            char c = formula.charAt(i);
            if (Character.isDigit(c)) {
                num = num + (c - '0') * (int) Math.pow(10, atom.length());
            } else if (Character.isLowerCase(c)) {
                atom = c + atom;
            } else if (Character.isUpperCase(c)) {
                atom = c + atom;
                map.put(atom, map.getOrDefault(atom, 0) + (num == 0 ? 1 : num) * multiplier);
                num = 0;
                atom = "";
            } else if (c == ')') {
                stack.push(multiplier);
                multiplier *= (num == 0 ? 1 : num);
                num = 0;
            } else if (c == '(') {
                multiplier = stack.pop();
            }
        }
 
        List<String> sortedAtoms = new ArrayList<>(map.keySet());
        Collections.sort(sortedAtoms);
        StringBuilder result = new StringBuilder();
        for (String a : sortedAtoms) {
            result.append(a);
            int count = map.get(a);
            if (count > 1) {
                result.append(count);
            }
        }
        return result.toString();
    }
}

This improved Java code uses a Stack for better clarity and handles edge cases more robustly. The Python, C++, Go, and TypeScript solutions follow a similar approach with appropriate adaptations for each language's syntax and features. Note that the regex based Javascript/Typescript solutions while concise, may not be as efficient as the stack based solutions for very large inputs.