Given a text file file.txt that contains a list of phone numbers (one per line), write a one-liner bash script to print all valid phone numbers.
You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x means a digit)
You may also assume that no line in the text file contains leading or trailing whitespace.
Example:
Assume that file.txt has the following content:
987-123-4567
123 456 7890
(123) 456-7890
Your script should output the following valid phone numbers:
987-123-4567
(123) 456-7890
This problem requires filtering the lines of a text file (file.txt) so that only valid phone numbers remain. A valid number follows one of two formats, (xxx) xxx-xxxx or xxx-xxx-xxxx, where x represents a digit. The solution uses awk, a powerful text-processing tool available on Unix-like systems.
The core of the solution is awk's regular expression matching. A single awk command performs the whole filtering.
The regular expression /^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$/ is the key component. Let's break it down:
^ : Matches the beginning of the line. Together with $, it forces the entire line to match the pattern, which prevents partial matches.
([0-9]{3}-|\([0-9]{3}\) ) : Matches the first three digits. They are either followed by a hyphen (-) or enclosed in parentheses and followed by a space.
  [0-9]{3} : Matches exactly three digits (0-9).
  - : Matches a literal hyphen.
  | : Acts as an "or" operator between the two alternatives.
  \( and \) : Match literal parentheses. They are escaped because parentheses have special meaning in regular expressions.
  (a single space) : Matches the space after the closing parenthesis (only in the parenthesized alternative).
[0-9]{3}- : Matches three digits followed by a hyphen.
[0-9]{4} : Matches exactly four digits.
$ : Matches the end of the line.
The entire expression matches lines that start with either xxx- or (xxx) followed by a space, and then continue with xxx-xxxx and nothing else.
awk '/pattern/' file.txt prints only the lines of file.txt that match the given pattern. Here the pattern is the regular expression above, so only valid phone numbers are printed.
awk '/^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$/' file.txt
In terms of running time, awk reads each line exactly once and tests it against the pattern, so in practice the total work grows roughly linearly with the size of the file. The cost of each individual match depends on the regex engine, but it is generally small for a short, anchored expression like this one.
In terms of memory, awk's usage stays roughly constant regardless of file size because it processes one line at a time. Apart from the current line, it holds only the compiled regular expression and a few internal variables. It never loads the entire file into memory.
While awk provides a concise solution, other approaches are possible, though none is clearly better for this task:
grep with multiple expressions: You could match each phone number format with its own pattern. Running two separate grep commands and concatenating their output would be less concise than the awk solution and would lose the original line order. Passing both patterns to a single grep with -e keeps that order.
A script in a higher-level language: This would also work, but it would be longer than the awk solution. However, more complex validation logic would be easier to implement in a higher-level language.
The awk one-liner is a concise and efficient solution for this problem, thanks to awk's built-in regular expression support and line-by-line text processing.
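For comparison, grep can perform the same filtering. The sketch below shows two variants. It assumes a grep that supports -E (extended regular expressions) and repeated -e options, both of which POSIX specifies. The first variant reuses the awk regex unchanged. The second passes one pattern per format to a single grep, so matching lines still come out in their original file order.

```shell
#!/bin/sh
# Sample input from the example in a temporary file.
tmp=$(mktemp)
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' > "$tmp"

# Variant 1: one extended regex covering both formats (same as the awk pattern).
grep -E '^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$' "$tmp"

# Variant 2: one pattern per format, both given to a single grep via -e.
# A line is printed if it matches either pattern, in file order.
grep -E -e '^[0-9]{3}-[0-9]{3}-[0-9]{4}$' \
        -e '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$' "$tmp"

rm -f "$tmp"
```

Each variant prints 987-123-4567 followed by (123) 456-7890.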