DataFrame weather
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| city | object |
| month | object |
| temperature | int |
+-------------+--------+
Write a solution to pivot the data so that each row represents temperatures for a specific month, and each city is a separate column.
The result format is in the following example.
Example 1:
Input:
+--------------+----------+-------------+
| city | month | temperature |
+--------------+----------+-------------+
| Jacksonville | January | 13 |
| Jacksonville | February | 23 |
| Jacksonville | March | 38 |
| Jacksonville | April | 5 |
| Jacksonville | May | 34 |
| ElPaso | January | 20 |
| ElPaso | February | 6 |
| ElPaso | March | 26 |
| ElPaso | April | 2 |
| ElPaso | May | 43 |
+--------------+----------+-------------+
Output:
+----------+--------+--------------+
| month | ElPaso | Jacksonville |
+----------+--------+--------------+
| April | 2 | 5 |
| February | 6 | 23 |
| January | 20 | 13 |
| March | 26 | 38 |
| May | 43 | 34 |
+----------+--------+--------------+
Explanation:
The table is pivoted, each column represents a city, and each row represents a specific month.
The key idea in #2889 Reshape Data: Pivot is to transform a dataset from a long format into a wide format by turning unique values from one column into separate columns. This restructuring is commonly known as a pivot operation. Instead of iterating manually through rows, modern data libraries provide built-in utilities that handle this transformation efficiently.
The typical strategy is to identify three components: the index (rows to group by), the columns that should become new headers, and the values that populate the resulting table. Using a function such as pivot(), the dataset is reorganized so each unique column value becomes a separate column while preserving the row index.
This approach avoids manual reshaping logic and relies on optimized internal operations. Since each record is processed once to determine its final position in the reshaped table, the overall complexity is generally linear relative to the number of rows. The space complexity depends on the number of unique index–column combinations created during the pivot.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| DataFrame Pivot Operation | O(n) | O(n) |
NeetCode
Use these hints if you're stuck. Try solving on your own first.
Consider using a built-in function in pandas library to transform the data
This approach involves using the Pandas library in Python to pivot the data. Pandas offers a built-in pivot function that can easily transform your dataframe, setting the index, columns, and values appropriately.
Time Complexity: O(n) - where n is the number of records in the dataframe, as we essentially have to process each record once.
Space Complexity: O(n) - the space used by the output structure is linear relative to the input.
1import pandas as pd
2
3def pivot_weather(weather):
4 pivoted = weather.pivot(index='month', columns='city', values='temperature')
5 return pivoted.reset_index()
6
7weather = pd.DataFrame({
8 'city': ['Jacksonville', 'Jacksonville', 'Jacksonville', 'Jacksonville', 'Jacksonville',
9 'ElPaso', 'ElPaso', 'ElPaso', 'ElPaso', 'ElPaso'],
10 'month': ['January', 'February', 'March', 'April', 'May',
11 'January', 'February', 'March', 'April', 'May'],
12 'temperature': [13, 23, 38, 5, 34, 20, 6, 26, 2, 43]
13})
14
15print(pivot_weather(weather))This Python solution uses the Pandas library to pivot the data. The pivot function is called with the index set to 'month', columns set to 'city', and values set to 'temperature'. This means that each row is grouped by the 'month' field, and temperature data is organized into columns corresponding to each 'city'. Finally, reset_index() is applied to convert the row indices back into a column format.
This approach involves manually restructuring the input data using arrays and dictionaries to simulate the pivot operation. This example is demonstrated in the C programming language.
Time Complexity: O(m * n) - where m is the number of unique months and n is the number of unique cities, primarily driven by the nested looping.
Space Complexity: O(m*n) - the result table's space use is directly proportional to the input size, considering unique month and city combinations.
1#include <stdio.h>
2#include <string.h>
3
4
Watch expert explanations and walkthroughs
Jot down your thoughts, approach, and key learnings
Direct pivot problems are less common in algorithmic interviews, but data manipulation questions frequently appear in data engineering and analytics roles. Understanding pivot operations is valuable for working with structured datasets and data transformation tasks.
The optimal approach is to use a built-in pivot operation such as DataFrame.pivot() in pandas. It efficiently converts unique values from one column into separate columns while aligning them with the specified index and values.
A tabular data structure like a pandas DataFrame is best suited for this problem. It provides high-level functions such as pivot and pivot_table that simplify reshaping operations without manual row processing.
The pivot function reshapes data assuming each index–column pair has a single value. In contrast, pivot_table supports aggregation, allowing multiple values for the same pair by applying functions like sum, mean, or count.
This C code manually aggregates the input data into a new structure that acts as a pivot table. It uses arrays to store unique cities and a static array to hold the results. The cities are dynamically identified, and their index is used to populate the 'temperature' attribute of each month's data with appropriate values. The transformation is effectively a 'pivot' operation achieved by nested loops and simple data structures in C.