This problem requires processing two tables, Keywords
and Posts
, to determine the topic(s) associated with each post based on the keywords present in the post content. The solution involves a database query, specifically using MySQL.
The core idea is to join the Posts
and Keywords
tables based on whether keywords from Keywords
exist (case-insensitively) within the content of the posts from Posts
. Then, we group the results by post_id
to aggregate the topics associated with each post. Finally, we handle the case where no topics are found for a post.
SELECT
post_id,
IFNULL(GROUP_CONCAT(DISTINCT topic_id), 'Ambiguous!') AS topic
FROM
Posts
LEFT JOIN Keywords ON INSTR(CONCAT(' ', content, ' '), CONCAT(' ', word, ' ')) > 0
GROUP BY post_id;
LEFT JOIN Keywords ON INSTR(CONCAT(' ', content, ' '), CONCAT(' ', word, ' ')) > 0
: This is the crucial part. A LEFT JOIN
ensures that all posts are included in the result, even if they don't have matching keywords. The ON
clause uses the INSTR
function to check for the presence of each keyword within the post content. Crucially, we add spaces (' '
) before and after both content
and word
. This handles word boundaries effectively, preventing partial matches. For example, it prevents "war" from matching "warning". INSTR
returns the starting position of the keyword if found; otherwise, it returns 0. The condition > 0
filters for successful matches.
GROUP BY post_id
: This groups the results by post_id
, so we get one row per post.
GROUP_CONCAT(DISTINCT topic_id)
: This aggregates the topic_id
values for each post. The DISTINCT
keyword ensures that we don't have duplicate topic IDs if a post contains multiple keywords belonging to the same topic.
IFNULL(..., 'Ambiguous!')
: This handles the case where no matching keywords are found for a post. IFNULL
checks if GROUP_CONCAT
returned NULL
(meaning no topics were found) and replaces it with 'Ambiguous!' as specified in the problem statement.
The time complexity is dominated by the LEFT JOIN
operation and the INSTR
function calls within it. In the worst case, the INSTR
function might have to compare each keyword against every word in the content of every post. The complexity is therefore roughly O(P * K * C), where:
However, in practice, database optimizers can significantly improve performance. The actual time depends on the size of the datasets and the specific database implementation. The GROUP BY
and GROUP_CONCAT
operations are generally efficient in modern database systems.
While the problem is inherently database-centric, the core logic can be adapted to other programming languages. The basic steps would be:
Keywords
and Posts
data into suitable data structures (e.g., dictionaries or lists of dictionaries in Python).lower()
and in
in Python).topic_id
s.topic_id
s are found, assign "Ambiguous!". Otherwise, sort and format the topic_id
s as required.This approach would have similar time complexity characteristics as the SQL query, depending on the efficiency of the string matching operations used. The exact implementation details would vary based on the chosen language and data structures.