Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher's h-index.
According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.
Example 1:
Input: citations = [3,0,6,1,5] Output: 3 Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
Example 2:
Input: citations = [1,3,1] Output: 1
Constraints:
n == citations.length
1 <= n <= 5000
0 <= citations[i] <= 1000

Problem Overview: You receive an array where citations[i] represents how many times the i-th paper was cited. The H-Index is the maximum value h such that the researcher has at least h papers with h or more citations. The task is to compute that value efficiently from an unsorted list.
Approach 1: Sorting the Citations (O(n log n) time, O(1) extra space)
The most direct strategy is to sort the citation counts in descending order using a standard sorting algorithm. After sorting, scan the array for the largest index i where citations[i] ≥ i + 1; the h-index is then i + 1. The insight is that once the papers are ordered by citation count, the position in the array tells you how many papers have at least that many citations. During the scan, keep updating the candidate h-index until the condition breaks. This approach is easy to implement and reliable in interviews when optimal linear-time tricks are not required.
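The scan described above can be sketched in Python as follows (the function name `h_index_sort` is illustrative, not from the original problem):

```python
def h_index_sort(citations):
    # Sort descending so position i+1 = number of papers with >= citations[i] citations.
    citations = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(citations):
        if c >= i + 1:
            # At least i+1 papers have >= i+1 citations each.
            h = i + 1
        else:
            break  # Condition can never hold again once it fails.
    return h
```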
Approach 2: Counting Sort / Bucket Counting (O(n) time, O(n) space)
The H-index value can never exceed the number of papers n. A paper cited more than n times behaves the same as one cited exactly n times for the purpose of the metric. Use this observation to build a frequency array (bucket counts) of size n + 1. For each citation value, increment count[min(citation, n)]. Then iterate from n down to 0, maintaining a running total of papers that have at least that many citations. The first index where the cumulative count becomes ≥ the index is the H-index. This technique leverages ideas from counting sort and avoids full sorting while scanning the data only a few times.
The bucket method works well because we only care about counts relative to n, not the exact ordering of every element. Each paper contributes to a bucket, and a backward pass determines the largest feasible h. The algorithm performs linear passes through the input and the bucket array, giving true O(n) time complexity.
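The bucket method above can be sketched in Python as follows (the function name `h_index_buckets` is illustrative):

```python
def h_index_buckets(citations):
    n = len(citations)
    # count[i] = number of papers with exactly i citations,
    # clamping anything above n into bucket n (h can never exceed n).
    count = [0] * (n + 1)
    for c in citations:
        count[min(c, n)] += 1
    total = 0
    # Backward pass: total = number of papers with >= h citations.
    for h in range(n, -1, -1):
        total += count[h]
        if total >= h:
            return h
    return 0
```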
Both solutions operate on a simple array and rely on either ordering the values or counting their distribution. Sorting favors readability and quick implementation, while counting achieves the better asymptotic bound.
Recommended for interviews: Start with the sorting approach since it clearly demonstrates understanding of the H-index definition and takes only a few lines of code. Once that works, mention the counting bucket optimization that reduces the complexity to O(n). Interviewers typically expect candidates to recognize the bound that the H-index cannot exceed the number of papers and use that to build the linear-time solution.
This approach uses sorting to calculate the h-index. The idea is to sort the array of citations in descending order. Then, find the maximum number h such that there are h papers with at least h citations. This can be efficiently determined by iterating over the sorted array.
The code sorts the array in descending order, then scans for the first index i where citations[i] <= i; that index is the h-index, since the i papers before it each have more than i - 1, i.e. at least i, citations. If every element satisfies citations[i] > i, it returns the size of the array.
Time Complexity: O(n log n) due to sorting, Space Complexity: O(1) since the sorting is in place.
Given the constraints where citation counts do not exceed 1000 and the number of papers is at most 5000, a counting sort or bucket sort can be used. This approach involves creating a frequency array to count citations. Then traverse the frequency array to compute the h-index efficiently.
The implementation uses a frequency array to count papers for each citation value, then accumulates from the back (high values) to find the largest h where the cumulative count matches or exceeds h.
Time Complexity: O(n + m) where n is citationsSize and m is the maximum citation value, Space Complexity: O(m).
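A minimal sketch of this frequency-array variant, shown in Python for consistency with the other examples (the name `h_index_freq` is illustrative):

```python
def h_index_freq(citations):
    # m = maximum citation value; freq[c] = number of papers cited exactly c times.
    m = max(citations) if citations else 0
    freq = [0] * (m + 1)
    for c in citations:
        freq[c] += 1
    s = 0
    # Accumulate from the back: s = number of papers cited at least h times.
    for h in range(m, -1, -1):
        s += freq[h]
        if s >= h:
            return h
    return 0
```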
We can sort the array citations in descending order, then enumerate the value h from large to small. If some h satisfies citations[h-1] ≥ h, there are at least h papers that have been cited at least h times, so we return that h directly. If no such h exists, none of the papers have been cited, and we return 0.
Time complexity O(n log n), space complexity O(log n) for the sort. Here n is the length of the array citations.
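This large-to-small enumeration can be sketched as (the function name `h_index` is illustrative):

```python
def h_index(citations):
    # Descending order: citations[h-1] is the h-th largest citation count.
    citations.sort(reverse=True)
    # Try the largest candidate h first and return on the first success.
    for h in range(len(citations), 0, -1):
        if citations[h - 1] >= h:
            return h
    return 0  # No paper has been cited.
```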
We can use an array cnt of length n+1, where cnt[i] represents the number of papers with a citation count of i. We traverse the array citations, treating any paper with more than n citations as if it had exactly n (the h-index can never exceed n), and add 1 to the corresponding element of cnt for each paper. In this way, we have counted the number of papers for each citation count.
Then we enumerate the value h from large to small, adding cnt[h] to a variable s, so that s represents the number of papers with a citation count greater than or equal to h. As soon as s ≥ h, at least h papers have been cited at least h times, so we return h directly.
Time complexity O(n), space complexity O(n). Here n is the length of the array citations.
We notice that if some value h satisfies "at least h papers are cited at least h times", then any h' < h satisfies it as well. The property is monotone, so we can use binary search to find the largest h such that at least h papers are cited at least h times.
We define the left boundary of the binary search as l = 0 and the right boundary as r = n. Each time we take mid = ⌊(l + r + 1) / 2⌋ (the floor of the midpoint, rounded up so the search terminates). Then we count the number of elements in citations that are greater than or equal to mid, and denote it s. If s ≥ mid, at least mid papers are cited at least mid times, so we move the left boundary to l = mid. Otherwise, we move the right boundary to r = mid - 1. When the left boundary l equals the right boundary r, we have found the largest h value, which is l (equivalently r).
Time complexity O(n log n), where n is the length of the array citations. Space complexity O(1).
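The binary search described above can be sketched as (the function name `h_index_binary` is illustrative):

```python
def h_index_binary(citations):
    l, r = 0, len(citations)
    while l < r:
        # Round up so that l = mid always makes progress.
        mid = (l + r + 1) // 2
        # s = number of papers cited at least mid times.
        s = sum(1 for c in citations if c >= mid)
        if s >= mid:
            l = mid       # mid is feasible; search higher.
        else:
            r = mid - 1   # mid is infeasible; search lower.
    return l
```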
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Sorting the citations | O(n log n) | O(1) or O(log n) | Simple implementation and clear logic when optimal linear time is not required |
| Counting / Bucket (size n + 1) | O(n) | O(n) | True linear time, using the bound that the h-index cannot exceed n |
| Counting over citation values | O(n + m), m = maximum citation value | O(m) | When the maximum citation value m is small relative to sorting cost |
| Binary Search | O(n log n) | O(1) | Exploits the monotone predicate without sorting or an auxiliary array |