The problem requires finding candidates who meet two criteria: at least two years of experience and a total interview score greater than 15. The solution uses SQL to efficiently achieve this.
Join Tables: The Candidates
and Rounds
tables are joined using the interview_id
as the common key. This combines the candidate's information (including years of experience) with their interview scores. USING (interview_id)
is a shorthand for ON Candidates.interview_id = Rounds.interview_id
.
Filter by Experience: A WHERE
clause filters the joined data, keeping only rows where years_of_exp
is greater than or equal to 2.
Group and Sum Scores: The GROUP BY 1
(or GROUP BY candidate_id
) clause groups the rows by candidate ID. The SUM(score)
aggregate function calculates the total score for each candidate within their group. 1
is a shorthand for the first column in the SELECT
statement, which is candidate_id
.
Filter by Total Score: A HAVING
clause filters the grouped results, keeping only candidates whose total score (SUM(score)
) is strictly greater than 15.
Select Candidate IDs: Finally, the SELECT candidate_id
statement returns the IDs of the candidates who satisfy both criteria.
The time complexity of this SQL query is dominated by the GROUP BY
operation. In general, GROUP BY
operations have a time complexity of O(N log N) or O(N) depending on the specific database system's optimization techniques, where N is the number of rows in the joined table after the initial filtering. The other operations (joining, filtering) generally have linear time complexity, O(N). Therefore, the overall time complexity is approximately O(N log N) or O(N), depending on the database implementation.
The space complexity depends on the intermediate results generated during the query execution. The space used is proportional to the size of the joined table, the size of the grouped results, and the size of the final output. In the worst-case scenario where all candidates meet the initial criteria, the space complexity could be O(N), where N is the number of rows in the joined table. However, given the filtering steps, the actual space used will generally be less than O(N). Therefore, the space complexity is approximately O(N) in the worst case.
The SQL query provided above works for MySQL. The Pandas implementation offers a similar functionality in Python.
SELECT candidate_id
FROM
Candidates
JOIN Rounds USING (interview_id)
WHERE years_of_exp >= 2
GROUP BY 1
HAVING SUM(score) > 15;
import pandas as pd
def accepted_candidates(candidates: pd.DataFrame, rounds: pd.DataFrame) -> pd.DataFrame:
merged_df = pd.merge(candidates, rounds, on="interview_id")
filtered_df = merged_df[merged_df["years_of_exp"] >= 2]
grouped_df = filtered_df.groupby("candidate_id")["score"].sum().reset_index()
result_df = grouped_df[grouped_df["score"] > 15][["candidate_id"]]
return result_df
The Pandas solution mirrors the SQL approach: it merges the dataframes, filters by experience, groups by candidate ID to sum scores, and then filters by the total score. The time and space complexity analysis for the Pandas solution is similar to the SQL query, with the time complexity being dominated by the groupby
operation (which is approximately O(N log N) in Pandas as well).