
Duplicate Emails

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

 

Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output: 
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.

Solution Explanation for Duplicate Emails

The problem asks us to report every email address that appears more than once in the Person table. Two approaches are presented: one using GROUP BY with HAVING, and one using a self-join.

Approach 1: Using GROUP BY and HAVING

This approach leverages SQL's aggregate functions.

1. GROUP BY email: This clause groups the rows in the Person table based on the values in the email column. All rows with the same email address will be grouped together.

2. COUNT(*): This counts the number of rows within each group, i.e., the number of times each email address appears. COUNT(1) is equivalent to COUNT(*); modern query optimizers treat the two identically, so the choice between them is purely stylistic.

3. HAVING COUNT(*) > 1: This clause filters the groups. Only email addresses with a count greater than 1 (meaning they appear more than once) are kept; the intermediate counts are sketched just after this list.

4. SELECT email: This selects the email column from the groups that satisfy the HAVING condition. This gives the list of duplicate emails.
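
To make the HAVING filter concrete, here is the same grouping without it, run against the example data above (the cnt alias is illustrative):

SELECT email, COUNT(*) AS cnt
FROM Person
GROUP BY email;

-- a@b.com | 2   <- kept by HAVING COUNT(*) > 1
-- c@d.com | 1   <- filtered out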

Code:

MySQL:

SELECT email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;

Python (using pandas):

import pandas as pd

def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    # keep=False marks every occurrence of a duplicated email, not just the extras.
    dup = person[person.duplicated(subset=['email'], keep=False)]
    # Deduplicate and rename the column to match the expected output header.
    return dup[['email']].drop_duplicates().rename(columns={'email': 'Email'})


The pandas solution first marks all rows with duplicate emails using duplicated(subset=['email'], keep=False); keep=False flags every occurrence of a duplicate rather than all but the first. It then selects the email column, removes the remaining duplicates with drop_duplicates(), and renames the column to Email so the returned DataFrame matches the expected output.

Time Complexity: If the database implements GROUP BY by sorting, this approach runs in O(N log N), where N is the number of rows in the Person table. Many systems instead use hash aggregation, whose expected cost is closer to O(N), so real-world performance is often better than the sort-based worst case.
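
To see which strategy your server actually picks, you can prefix the query with EXPLAIN (standard MySQL; the plan shown depends on your version, data, and indexes):

EXPLAIN
SELECT email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;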

Approach 2: Using a Self-Join

This approach uses a self-join to compare each row with every other row.

1. SELECT DISTINCT p1.email: This selects the email address from the first instance of the person table, ensuring only unique email addresses are returned.

2. FROM person AS p1, person AS p2: This creates a self-join, aliasing the Person table as p1 and p2. This creates a Cartesian product, effectively comparing every row in p1 with every row in p2.

3. WHERE p1.id != p2.id AND p1.email = p2.email: This is the crucial filtering condition. It ensures that:
   * p1.id != p2.id: we are not comparing a row to itself.
   * p1.email = p2.email: we only consider rows that share the same email address.

Code (MySQL):

SELECT DISTINCT p1.email
FROM Person AS p1, Person AS p2
WHERE p1.id != p2.id AND p1.email = p2.email;
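
The comma-separated FROM clause is the older implicit-join syntax. The same query can be written with an explicit JOIN ... ON, which many style guides prefer; this is a purely stylistic equivalent, not a different algorithm:

SELECT DISTINCT p1.email
FROM Person AS p1
JOIN Person AS p2
  ON p1.email = p2.email
 AND p1.id != p2.id;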

Time Complexity: The self-join approach has a time complexity of O(N^2) in the worst case, where N is the number of rows in the Person table. This is because it compares every row with every other row. This approach is less efficient than the GROUP BY approach for large datasets.
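
In practice, an index on the email column lets the database look up matching rows instead of scanning every pair, and it also speeds up the grouping in Approach 1. A minimal sketch (the index name idx_person_email is illustrative):

CREATE INDEX idx_person_email ON Person (email);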

Summary:

| Approach  | Time Complexity | Space Complexity | Notes                                    |
|-----------|-----------------|------------------|------------------------------------------|
| GROUP BY  | O(N log N)      | O(N)             | More efficient for large datasets        |
| Self-Join | O(N^2)          | O(N)             | Less efficient, but conceptually simpler |

The GROUP BY and HAVING approach is generally preferred for its better time complexity, especially when dealing with large datasets. However, the self-join approach can be easier to understand for those less familiar with aggregate functions.