Given an array of integers citations
where citations[i]
is the number of citations a researcher received for their ith
paper, return the researcher's h-index.
According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h
such that the given researcher has published at least h
papers that have each been cited at least h
times.
Example 1:
Input: citations = [3,0,6,1,5] Output: 3 Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
Example 2:
Input: citations = [1,3,1] Output: 1
Constraints:
n == citations.length
1 <= n <= 5000
0 <= citations[i] <= 1000
This problem asks to find the h-index of a researcher given an array of citations for their papers. The h-index is the maximum value h
such that the researcher has at least h
papers with at least h
citations each.
We'll explore three solutions with varying time and space complexities.
This approach sorts the citations in descending order. Then, it iterates through the sorted array. For each index h
, it checks if the citation at that index (citations[h-1]
) is greater than or equal to h
. If it is, then h
is the h-index because at least h
papers have at least h
citations.
Time Complexity: O(n log n) due to sorting. Space Complexity: O(log n) for sorting (in-place sorting algorithms like merge sort or quicksort have logarithmic space complexity).
Code (Python):
class Solution:
def hIndex(self, citations: List[int]) -> int:
citations.sort(reverse=True)
for h in range(len(citations), 0, -1):
if citations[h - 1] >= h:
return h
return 0
Code (Java):
import java.util.Arrays;
class Solution {
public int hIndex(int[] citations) {
Arrays.sort(citations);
int n = citations.length;
for (int h = n; h > 0; --h) {
if (citations[n - h] >= h) {
return h;
}
}
return 0;
}
}
This method uses a counting array to count the frequency of citations. It then iterates through the counting array from the end, accumulating the count of papers with at least h
citations. If the accumulated count is greater than or equal to h
, then h
is the h-index.
Time Complexity: O(n) - Single pass through citations and a single pass through the counting array. Space Complexity: O(n) to store the counting array.
Code (Python):
class Solution:
def hIndex(self, citations: List[int]) -> int:
n = len(citations)
cnt = [0] * (n + 1)
for x in citations:
cnt[min(x, n)] += 1
s = 0
for h in range(n, -1, -1):
s += cnt[h]
if s >= h:
return h
Code (Java):
class Solution {
public int hIndex(int[] citations) {
int n = citations.length;
int[] cnt = new int[n + 1];
for (int x : citations) {
cnt[Math.min(x, n)]++;
}
int s = 0;
for (int h = n; h >= 0; h--) {
s += cnt[h];
if (s >= h) return h;
}
return 0;
}
}
This approach uses binary search to efficiently find the h-index. It leverages the fact that if there's an h-index of h
, then all values less than h
are also valid h-indices (though not the maximum h-index).
Time Complexity: O(n log n) because of the repeated counting in the binary search loop. Each counting step takes O(n) time. Space Complexity: O(1) - Constant extra space.
Code (Python):
class Solution:
def hIndex(self, citations: List[int]) -> int:
l, r = 0, len(citations)
while l < r:
mid = (l + r + 1) // 2
if sum(1 for x in citations if x >= mid) >= mid:
l = mid
else:
r = mid - 1
return l
Code (Java):
class Solution {
public int hIndex(int[] citations) {
int l = 0, r = citations.length;
while (l < r) {
int mid = (l + r + 1) / 2;
int count = 0;
for (int citation : citations) {
if (citation >= mid) {
count++;
}
}
if (count >= mid) {
l = mid;
} else {
r = mid - 1;
}
}
return l;
}
}
In summary, the counting sort approach (Solution 2) offers the best time complexity (O(n)) if space is not a major constraint. If space is a concern or the input size is very large, the sorting approach (Solution 1) might be preferable due to its better space efficiency. Binary Search (Solution 3) provides a compromise but doesn't offer significant performance advantages over the other methods in practice.