DataFrame report
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| product     | object |
| quarter_1   | int    |
| quarter_2   | int    |
| quarter_3   | int    |
| quarter_4   | int    |
+-------------+--------+
Write a solution to reshape the data so that each row represents sales data for a product in a specific quarter.
The result format is in the following example.
Example 1:
Input:
+-------------+-----------+-----------+-----------+-----------+
| product     | quarter_1 | quarter_2 | quarter_3 | quarter_4 |
+-------------+-----------+-----------+-----------+-----------+
| Umbrella    | 417       | 224       | 379       | 611       |
| SleepingBag | 800       | 936       | 93        | 875       |
+-------------+-----------+-----------+-----------+-----------+
Output:
+-------------+-----------+-------+
| product     | quarter   | sales |
+-------------+-----------+-------+
| Umbrella    | quarter_1 | 417   |
| SleepingBag | quarter_1 | 800   |
| Umbrella    | quarter_2 | 224   |
| SleepingBag | quarter_2 | 936   |
| Umbrella    | quarter_3 | 379   |
| SleepingBag | quarter_3 | 93    |
| Umbrella    | quarter_4 | 611   |
| SleepingBag | quarter_4 | 875   |
+-------------+-----------+-------+
Explanation: The DataFrame is reshaped from wide to long format. Each row represents the sales of a product in a quarter.
The goal of #2890 Reshape Data: Melt is to convert a dataset from a wide format into a long format. In wide datasets, multiple columns represent values of the same variable (for example, quarterly or yearly measurements). The task is to restructure these columns into two new columns representing the variable name and its value.
The most direct approach is to use the pandas.melt() function. First, identify the column that should remain unchanged (the identifier column) using id_vars. Then specify the columns that should be unpivoted using value_vars. The melt operation stacks these columns into rows, producing a normalized structure with a variable column and a value column.
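The call described above can be sketched as follows. This is a minimal example that assumes the sample data from Example 1; the variable names `df`, `quarter_cols`, and `long_df` are illustrative, and the output column names `quarter` and `sales` are set through `var_name` and `value_name`.

```python
import pandas as pd

# Sample wide-format data taken from Example 1
df = pd.DataFrame({
    "product": ["Umbrella", "SleepingBag"],
    "quarter_1": [417, 800],
    "quarter_2": [224, 936],
    "quarter_3": [379, 93],
    "quarter_4": [611, 875],
})

# Columns to unpivot, listed explicitly (illustrative name)
quarter_cols = ["quarter_1", "quarter_2", "quarter_3", "quarter_4"]

long_df = pd.melt(
    df,
    id_vars=["product"],      # identifier column, repeated on every output row
    value_vars=quarter_cols,  # columns stacked into rows
    var_name="quarter",       # name of the column holding the old column names
    value_name="sales",       # name of the column holding the cell values
)
print(long_df)
```

Listing `value_vars` explicitly is optional here: when it is omitted, `pd.melt` unpivots every column not named in `id_vars`. Spelling it out can guard against unrelated columns being melted by accident.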
This transformation is efficient because it processes each cell once while reorganizing the structure. The resulting DataFrame is easier to analyze, aggregate, or visualize.
Time Complexity: O(n × m), where n is the number of rows and m is the number of melted columns. Space Complexity: O(n × m) due to the reshaped output.
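To make the n × m output size concrete, here is a small check on a hypothetical frame with n = 3 rows and m = 4 melted columns. The data and the names `wide` and `melted` are made up for illustration.

```python
import pandas as pd

# Hypothetical wide frame: n = 3 rows, m = 4 quarter columns
wide = pd.DataFrame({
    "product": ["A", "B", "C"],
    "quarter_1": [1, 2, 3],
    "quarter_2": [4, 5, 6],
    "quarter_3": [7, 8, 9],
    "quarter_4": [10, 11, 12],
})

n = len(wide)          # number of input rows
m = wide.shape[1] - 1  # every column except the identifier is melted

melted = wide.melt(id_vars=["product"], var_name="quarter", value_name="sales")

# One output row per (input row, melted column) pair
print(len(melted), n * m)  # prints "12 12"
```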
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Pandas melt transformation | O(n × m) | O(n × m) |
Use these hints if you're stuck. Try solving on your own first.
Consider using a built-in function in the pandas library to transform the data.
The pandas library offers a built-in method called melt which can directly transform data frames from wide to long format. This approach is particularly useful and efficient for reshaping data.
Time Complexity: O(n), where n is the number of elements in the DataFrame because the function iterates through each element.
Space Complexity: O(n) as a new DataFrame is created to store the reshaped data.
import pandas as pd

# Sample DataFrame
data = {
    'product': ['Umbrella', 'SleepingBag'],
    'quarter_1': [417, 800],
    'quarter_2': [224, 936],
    'quarter_3': [379, 93],
    'quarter_4': [611, 875]
}

df = pd.DataFrame(data)

# Reshape using melt; every column not listed in id_vars is unpivoted
reshaped_df = pd.melt(df, id_vars=['product'], var_name='quarter', value_name='sales')

print(reshaped_df)

This solution uses the pandas.melt function to reshape the data from wide to long format. The id_vars parameter identifies the columns that should remain unchanged, while var_name and value_name set the names of the new variable column and its corresponding value column.
Here, we manually loop through the data, restructuring it by iterating over each record. This method provides a more granular approach compared to using built-in functions.
Time Complexity: O(p * q), where p is the number of products and q is the number of quarters in the data.
Space Complexity: O(n), where n is the total number of elements in the reshaped DataFrame.
import pandas as pd

# Sample DataFrame
data = {
While this exact problem may not always appear, data transformation and pandas operations are commonly tested in data engineering, data science, and analytics interviews at top tech companies.
The melt operation converts wide data into long format, which is often easier to analyze, group, or visualize. Many analytical tools and plotting libraries prefer long-format data structures.
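As a small illustration of why long format is convenient for grouping, the sketch below starts from the long-format output of Example 1 and aggregates it with `groupby`. The names `long_df`, `total_by_product`, and `best_quarter` are illustrative.

```python
import pandas as pd

# Long-format data, matching the expected output of Example 1
long_df = pd.DataFrame({
    "product": ["Umbrella", "SleepingBag"] * 4,
    "quarter": ["quarter_1", "quarter_1", "quarter_2", "quarter_2",
                "quarter_3", "quarter_3", "quarter_4", "quarter_4"],
    "sales": [417, 800, 224, 936, 379, 93, 611, 875],
})

# Total sales per product: a single groupby over the value column
total_by_product = long_df.groupby("product")["sales"].sum()

# Best quarter per product: pick the row with the highest sales in each group
best_rows = long_df.groupby("product")["sales"].idxmax()
best_quarter = long_df.loc[best_rows, ["product", "quarter", "sales"]]

print(total_by_product)
print(best_quarter)
```

In wide format the same questions would need row-wise operations across the four quarter columns; in long format they reduce to ordinary group-wise aggregations.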
The optimal approach is to use the pandas melt function to convert columns into rows. By specifying identifier columns and value columns, you can efficiently transform wide-format data into a normalized long format suitable for analysis.
The problem is designed to be solved using the pandas DataFrame structure in Python. Pandas provides the melt method, which is specifically built for reshaping tabular data efficiently.
This solution constructs a new DataFrame by manually iterating over each entry in the original data. It collects product names, each quarter, and the associated sales figure. Each item is appended to a list that forms the structure of the new DataFrame.
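The manual approach described above might look like the following sketch. It is one possible reading of that description rather than the original listing: the names `quarter_cols`, `rows`, and `manual_df` are illustrative, and the loop visits quarters in the outer loop so that the row order matches the expected output of Example 1.

```python
import pandas as pd

# Sample wide-format data taken from Example 1
df = pd.DataFrame({
    "product": ["Umbrella", "SleepingBag"],
    "quarter_1": [417, 800],
    "quarter_2": [224, 936],
    "quarter_3": [379, 93],
    "quarter_4": [611, 875],
})

# Every column except the identifier is treated as a quarter column
quarter_cols = [col for col in df.columns if col != "product"]

rows = []
for quarter in quarter_cols:
    # Pair each product with its sales figure for this quarter
    for product, sales in zip(df["product"], df[quarter]):
        rows.append({"product": product, "quarter": quarter, "sales": sales})

# Build the long-format DataFrame from the collected records
manual_df = pd.DataFrame(rows, columns=["product", "quarter", "sales"])
print(manual_df)
```

The nested loops visit each of the p × q cells once, which matches the O(p * q) time complexity stated for this approach.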