Watch 10 video solutions for H-Index, a medium level problem involving Array, Sorting, Counting Sort. This walkthrough by Techdose has 53,608 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher's h-index.
According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.
Example 1:
Input: citations = [3,0,6,1,5] Output: 3 Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
Example 2:
Input: citations = [1,3,1] Output: 1
Constraints:
n == citations.length1 <= n <= 50000 <= citations[i] <= 1000Problem Overview: You receive an array where citations[i] represents how many times the i-th paper was cited. The H-Index is the maximum value h such that the researcher has at least h papers with h or more citations. The task is to compute that value efficiently from an unsorted list.
Approach 1: Sorting the Citations (O(n log n) time, O(1) extra space)
The most direct strategy is to sort the citation counts in descending order using a standard sorting algorithm. After sorting, iterate through the array and check the largest index i where citations[i] ≥ i + 1. The insight is that once the papers are ordered by citation count, the position in the array corresponds to how many papers have at least that many citations. During the scan, keep updating the potential H-index until the condition breaks. This approach is easy to implement and reliable in interviews when optimal linear-time tricks are not required.
Approach 2: Counting Sort / Bucket Counting (O(n) time, O(n) space)
The H-index value can never exceed the number of papers n. A paper cited more than n times behaves the same as one cited exactly n times for the purpose of the metric. Use this observation to build a frequency array (bucket counts) of size n + 1. For each citation value, increment count[min(citation, n)]. Then iterate from n down to 0, maintaining a running total of papers that have at least that many citations. The first index where the cumulative count becomes ≥ the index is the H-index. This technique leverages ideas from counting sort and avoids full sorting while scanning the data only a few times.
The bucket method works well because we only care about counts relative to n, not the exact ordering of every element. Each paper contributes to a bucket, and a backward pass determines the largest feasible h. The algorithm performs linear passes through the input and the bucket array, giving true O(n) time complexity.
Both solutions operate on a simple array and rely on either ordering the values or counting their distribution. Sorting favors readability and quick implementation, while counting sort is more algorithmically optimal.
Recommended for interviews: Start with the sorting approach since it clearly demonstrates understanding of the H-index definition and takes only a few lines of code. Once that works, mention the counting bucket optimization that reduces the complexity to O(n). Interviewers typically expect candidates to recognize the bound that the H-index cannot exceed the number of papers and use that to build the linear-time solution.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Sorting the citations | O(n log n) | O(1) or O(log n) | Simple implementation and clear logic when optimal linear time is not required |
| Counting Sort / Bucket Counting | O(n) | O(n) | Best performance when citation counts are large but number of papers is limited |