Valid Phone Numbers

Given a text file file.txt that contains a list of phone numbers (one per line), write a one-liner bash script to print all valid phone numbers.

You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x means a digit)

You may also assume each line in the text file must not contain leading or trailing white spaces.

Example:

Assume that file.txt has the following content:

987-123-4567
123 456 7890
(123) 456-7890

Your script should output the following valid phone numbers:

987-123-4567
(123) 456-7890

Solution Explanation for Valid Phone Numbers

This problem requires filtering lines from a text file (file.txt) to only include those representing valid phone numbers. Valid numbers adhere to either of two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx, where x represents a digit. The solution uses awk, a powerful text processing tool in Unix-like systems.

Approach

The solution is a single awk command whose pattern is a regular expression; all of the filtering is done by that match.

The regular expression /^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$/ is the key component. Let's break it down:

  • ^: Matches the beginning of the line. Together with the closing $, this forces the entire line to match the pattern, preventing partial matches.
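  • ([0-9]{3}-|\([0-9]{3}\) ): This group matches the first three digits in one of two forms: three digits followed by a hyphen (-), or three digits enclosed in parentheses and followed by a space. The outer, unescaped parentheses only group the two alternatives.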
    • [0-9]{3}: Matches exactly three digits (0-9).
    • -: Matches a literal hyphen.
    • |: Acts as an "or" operator.
    • \( and \): Match literal opening and closing parentheses (escaped because unescaped parentheses group subexpressions in regular expressions).
    • (space): Matches a literal space, which appears only after the closing parenthesis in the parenthesized form.
  • [0-9]{3}-: Matches three digits followed by a hyphen.
  • [0-9]{4}: Matches exactly four digits.
  • $: Matches the end of the line.

The entire expression matches lines that start with either xxx- or (xxx) , followed by xxx-xxxx.
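To see what the anchors and the alternation actually reject, the pattern can be run against a few near misses. This is a minimal sketch: grep -E is used here only because it accepts the same extended regular expression syntax, and every test line other than those in the problem's example is made up for illustration.

```shell
# The pattern from the breakdown above, stored once so it can be reused.
pattern='^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$'

# Hypothetical test lines: two valid numbers followed by near misses
# (no separators, missing space, extra leading digit, extra trailing
# digit, unclosed parenthesis).
matches=$(printf '%s\n' \
  '987-123-4567' \
  '(123) 456-7890' \
  '123 456 7890' \
  '(123)456-7890' \
  '0987-123-4567' \
  '987-123-45678' \
  '(123-456-7890' | grep -E "$pattern")

# Only the first two lines survive the filter.
printf '%s\n' "$matches"
```

The extra leading and trailing digits are rejected only because of ^ and $; without the anchors, both of those lines would contain a matching substring.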

awk '/pattern/' file.txt prints only the lines of file.txt that match the given pattern. In this case the pattern is the regular expression above, filtering for valid phone numbers.
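The reason a bare pattern prints anything is that an awk rule with no action defaults to printing the matching line. The sketch below shows the two equivalent spellings; it uses a deliberately simpler pattern (lines that start with a digit) so the rule structure stands out, and it recreates the sample file.txt from the example.

```shell
# Sample input from the problem statement.
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' > file.txt

# Pattern only: awk supplies the default action, which prints the line.
implicit=$(awk '/^[0-9]/' file.txt)

# The same rule with the match against $0 and the print action written out.
explicit=$(awk '$0 ~ /^[0-9]/ { print $0 }' file.txt)

# Both forms select the same lines.
[ "$implicit" = "$explicit" ] && echo "same output"
```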

Code (Shell - awk)

awk '/^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$/' file.txt
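One portability caveat: the {3} and {4} interval expressions are part of POSIX extended regular expressions, but some awk builds (gawk before version 4.0 unless started with --re-interval, and, as far as I know, some older mawk releases) do not enable them. If the one-liner prints nothing on your system, a fallback sketch is to spell each repetition out:

```shell
# Sample input from the problem statement.
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' > file.txt

# Same pattern as above, with every {n} repetition written out digit by digit.
valid=$(awk '/^([0-9][0-9][0-9]-|\([0-9][0-9][0-9]\) )[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$/' file.txt)

printf '%s\n' "$valid"
```

This version is longer but relies only on bracket expressions, grouping, and alternation.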

Time and Space Complexity

  • Time Complexity: O(n), where n is the total number of characters in the file. awk reads each line once and tests it against the pattern. The cost of a single match depends on the regex engine, but this pattern is anchored, has no nested repetition, and can match at most 14 characters, so matching a line costs no more than time proportional to its length.
  • Space Complexity: O(1) beyond the buffer for the current line. awk processes one line at a time and never holds the entire file in memory, so memory use is bounded by the longest line plus the compiled regular expression and a few variables.

Alternative Approaches (Conceptual)

While awk provides a concise solution, other approaches are possible, though some are more verbose:

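  • Using grep: grep -E accepts the same extended regular expression, so a single grep command can do the same filtering. Running two separate grep commands, one for each phone number format, and concatenating their output is also possible, but it is less concise and groups the output by format instead of preserving the file's line order.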
  • Custom scripting (e.g., Python, Perl): You could write a script in a language like Python or Perl to read the file line by line, using regular expression matching for validation. This would likely be more verbose than the awk solution. However, more complex validation logic would be easier to implement in a higher-level language.
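The grep alternative can be sketched as follows. Both variants are single grep calls, so the output keeps the file's line order. The second splits the two formats into separate -e expressions and uses -x (match the whole line) in place of the ^ and $ anchors. The variable names are illustrative only.

```shell
# Sample input from the problem statement.
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' > file.txt

# Variant 1: the same extended regular expression as the awk solution.
single=$(grep -E '^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$' file.txt)

# Variant 2: one -e expression per format; -x requires a whole-line match.
split=$(grep -E -x \
  -e '[0-9]{3}-[0-9]{3}-[0-9]{4}' \
  -e '\([0-9]{3}\) [0-9]{3}-[0-9]{4}' file.txt)

printf '%s\n' "$single"
```

Variant 2 trades a slightly longer command for two patterns that are each easier to read than the combined alternation.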

The awk one-liner is a concise and efficient fit for this problem: regular expression matching is built in, and a pattern with no action prints every matching line, so no extra code is needed.