Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher's h-index.
According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.
Example 1:
Input: citations = [3,0,6,1,5] Output: 3 Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
Example 2:
Input: citations = [1,3,1] Output: 1
Constraints:
n == citations.length
1 <= n <= 5000
0 <= citations[i] <= 1000

Problem Overview: You receive an array where citations[i] represents how many times the i-th paper was cited. The H-Index is the maximum value h such that the researcher has at least h papers with h or more citations. The task is to compute that value efficiently from an unsorted list.
Approach 1: Sorting the Citations (O(n log n) time, O(1) extra space)
The most direct strategy is to sort the citation counts in descending order using a standard sorting algorithm. After sorting, scan the array for the largest index i where citations[i] ≥ i + 1; the h-index is then i + 1. The insight is that once the papers are ordered by citation count, the position in the array tells you how many papers have at least that many citations. During the scan, keep updating the candidate h-index until the condition breaks. This approach is easy to implement and reliable in interviews when optimal linear-time tricks are not required.
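The scan described above can be sketched in Python as follows (the function name `h_index_sort` is illustrative, not from the original problem):

```python
def h_index_sort(citations):
    # Sort descending so position i+1 = number of papers with >= citations[i] citations.
    citations = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(citations):
        if c >= i + 1:
            # At least i+1 papers have >= i+1 citations each.
            h = i + 1
        else:
            break  # Condition can never hold again once it fails.
    return h
```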
Approach 2: Counting Sort / Bucket Counting (O(n) time, O(n) space)
The H-index value can never exceed the number of papers n. A paper cited more than n times behaves the same as one cited exactly n times for the purpose of the metric. Use this observation to build a frequency array (bucket counts) of size n + 1. For each citation value, increment count[min(citation, n)]. Then iterate from n down to 0, maintaining a running total of papers that have at least that many citations. The first index where the cumulative count becomes ≥ the index is the H-index. This technique leverages ideas from counting sort and avoids full sorting while scanning the data only a few times.
The bucket method works well because we only care about counts relative to n, not the exact ordering of every element. Each paper contributes to a bucket, and a backward pass determines the largest feasible h. The algorithm performs linear passes through the input and the bucket array, giving true O(n) time complexity.
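The bucket method above can be sketched in Python as follows (the function name `h_index_buckets` is illustrative):

```python
def h_index_buckets(citations):
    n = len(citations)
    # count[i] = number of papers with exactly i citations,
    # clamping anything above n into bucket n (h can never exceed n).
    count = [0] * (n + 1)
    for c in citations:
        count[min(c, n)] += 1
    total = 0
    # Backward pass: total = number of papers with >= h citations.
    for h in range(n, -1, -1):
        total += count[h]
        if total >= h:
            return h
    return 0
```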
Both solutions operate on a simple array and rely on either ordering the values or counting their distribution. Sorting favors readability and quick implementation, while counting achieves the better asymptotic bound.
Recommended for interviews: Start with the sorting approach since it clearly demonstrates understanding of the H-index definition and takes only a few lines of code. Once that works, mention the counting bucket optimization that reduces the complexity to O(n). Interviewers typically expect candidates to recognize the bound that the H-index cannot exceed the number of papers and use that to build the linear-time solution.
This approach uses sorting to calculate the h-index. The idea is to sort the array of citations in descending order. Then, find the maximum number h such that there are h papers with at least h citations. This can be efficiently determined by iterating over the sorted array.
The code sorts the array in descending order, then scans for the first index i where citations[i] <= i; that index is the h-index, since the i papers before it each have more than i - 1, i.e. at least i, citations. If every element satisfies citations[i] > i, it returns the size of the array.
Time Complexity: O(n log n) due to sorting, Space Complexity: O(1) since the sorting is in place.
Given the constraints where citation counts do not exceed 1000 and the number of papers is at most 5000, a counting sort or bucket sort can be used. This approach involves creating a frequency array to count citations. Then traverse the frequency array to compute the h-index efficiently.
The implementation uses a frequency array to count papers for each citation value, then accumulates from the back (high values) to find the largest h where the cumulative count matches or exceeds h.
Time Complexity: O(n + m) where n is citationsSize and m is the maximum citation value, Space Complexity: O(m).
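A minimal sketch of this frequency-array variant, shown in Python for consistency with the other examples (the name `h_index_freq` is illustrative):

```python
def h_index_freq(citations):
    # m = maximum citation value; freq[c] = number of papers cited exactly c times.
    m = max(citations) if citations else 0
    freq = [0] * (m + 1)
    for c in citations:
        freq[c] += 1
    s = 0
    # Accumulate from the back: s = number of papers cited at least h times.
    for h in range(m, -1, -1):
        s += freq[h]
        if s >= h:
            return h
    return 0
```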
We can sort the array citations in descending order, then enumerate the value h from large to small. If some h satisfies citations[h-1] ≥ h, there are at least h papers that have been cited at least h times, so we return that h directly. If no such h exists, none of the papers have been cited, and we return 0.
Time complexity O(n log n), space complexity O(log n) for the sort. Here n is the length of the array citations.
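This large-to-small enumeration can be sketched as (the function name `h_index` is illustrative):

```python
def h_index(citations):
    # Descending order: citations[h-1] is the h-th largest citation count.
    citations.sort(reverse=True)
    # Try the largest candidate h first and return on the first success.
    for h in range(len(citations), 0, -1):
        if citations[h - 1] >= h:
            return h
    return 0  # No paper has been cited.
```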
We can use an array cnt of length n+1, where cnt[i] represents the number of papers with a citation count of i. We traverse the array citations, treating any paper with more than n citations as if it had exactly n (the h-index can never exceed n), and add 1 to the corresponding element of cnt for each paper. In this way, we have counted the number of papers for each citation count.
Then we enumerate the value h from large to small, adding cnt[h] to a variable s, so that s represents the number of papers with a citation count greater than or equal to h. As soon as s ≥ h, at least h papers have been cited at least h times, so we return h directly.
Time complexity O(n), space complexity O(n). Here n is the length of the array citations.
We notice that if some value h satisfies "at least h papers are cited at least h times", then any h' < h satisfies it as well. The property is monotone, so we can use binary search to find the largest h such that at least h papers are cited at least h times.
We define the left boundary of the binary search as l = 0 and the right boundary as r = n. Each time we take mid = ⌊(l + r + 1) / 2⌋ (the floor of the midpoint, rounded up so the search terminates). Then we count the number of elements in citations that are greater than or equal to mid, and denote it s. If s ≥ mid, at least mid papers are cited at least mid times, so we move the left boundary to l = mid. Otherwise, we move the right boundary to r = mid - 1. When the left boundary l equals the right boundary r, we have found the largest h value, which is l (equivalently r).
Time complexity O(n log n), where n is the length of the array citations. Space complexity O(1).
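The binary search described above can be sketched as (the function name `h_index_binary` is illustrative):

```python
def h_index_binary(citations):
    l, r = 0, len(citations)
    while l < r:
        # Round up so that l = mid always makes progress.
        mid = (l + r + 1) // 2
        # s = number of papers cited at least mid times.
        s = sum(1 for c in citations if c >= mid)
        if s >= mid:
            l = mid       # mid is feasible; search higher.
        else:
            r = mid - 1   # mid is infeasible; search lower.
    return l
```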
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Sorting the citations | O(n log n) | O(1) or O(log n) | Simple implementation and clear logic when optimal linear time is not required |
| Counting / Bucket (size n + 1) | O(n) | O(n) | True linear time, using the bound that the h-index cannot exceed n |
| Counting over citation values | O(n + m), m = maximum citation value | O(m) | When the maximum citation value m is small relative to sorting cost |
| Binary Search | O(n log n) | O(1) | Exploits the monotone predicate without sorting or an auxiliary array |