DataFrame df1
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| student_id  | int    |
| name        | object |
| age         | int    |
+-------------+--------+

DataFrame df2
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| student_id  | int    |
| name        | object |
| age         | int    |
+-------------+--------+
Write a solution to concatenate these two DataFrames vertically into one DataFrame.
The result format is in the following example.
Example 1:
Input:
df1
+------------+---------+-----+
| student_id | name    | age |
+------------+---------+-----+
| 1          | Mason   | 8   |
| 2          | Ava     | 6   |
| 3          | Taylor  | 15  |
| 4          | Georgia | 17  |
+------------+---------+-----+
df2
+------------+------+-----+
| student_id | name | age |
+------------+------+-----+
| 5          | Leo  | 7   |
| 6          | Alex | 7   |
+------------+------+-----+
Output:
+------------+---------+-----+
| student_id | name    | age |
+------------+---------+-----+
| 1          | Mason   | 8   |
| 2          | Ava     | 6   |
| 3          | Taylor  | 15  |
| 4          | Georgia | 17  |
| 5          | Leo     | 7   |
| 6          | Alex    | 7   |
+------------+---------+-----+
Explanation: The two DataFrames are stacked vertically, and their rows are combined.
In #2888 Reshape Data: Concatenate, the task is to combine two datasets into a single structure by stacking their rows. The most direct approach is to use the pandas.concat() function, which combines multiple DataFrames along a chosen axis.
Since the goal is to append rows from one DataFrame below another, you concatenate them along axis=0. This keeps the column structure intact while expanding the number of rows. If the indices should be continuous after concatenation, the ignore_index=True parameter can be used to reset them automatically.
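The effect of ignore_index is easiest to see on the row labels themselves. The following is a minimal sketch that rebuilds the Example 1 data by hand; the variable names kept and reset are chosen here for illustration and are not part of the problem.

```python
import pandas as pd

# Sample data taken from Example 1 of the problem statement.
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "name": ["Mason", "Ava", "Taylor", "Georgia"],
    "age": [8, 6, 15, 17],
})
df2 = pd.DataFrame({
    "student_id": [5, 6],
    "name": ["Leo", "Alex"],
    "age": [7, 7],
})

# Default behaviour: each frame keeps its own row labels, so labels repeat.
kept = pd.concat([df1, df2], axis=0)
print(kept.index.tolist())   # [0, 1, 2, 3, 0, 1]

# ignore_index=True discards the old labels and renumbers the rows from 0.
reset = pd.concat([df1, df2], axis=0, ignore_index=True)
print(reset.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Both results contain the same six rows and three columns; only the index differs.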
The operation processes each row once while constructing the resulting DataFrame, making it efficient even for moderately large datasets. Understanding how concat works is essential for many real-world data transformation tasks such as merging logs, combining datasets, or preparing data for analysis.
Time Complexity: O(n + m) where n and m are the number of rows in the two DataFrames.
Space Complexity: O(n + m) for storing the combined result.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Pandas Concatenation using pd.concat() | O(n + m) | O(n + m) |
Use these hints if you're stuck. Try solving on your own first.
Consider using a built-in function in pandas library with the appropriate axis argument.
The most straightforward way to concatenate two DataFrames vertically in Python is by using the pandas.concat function. This function allows you to combine multiple DataFrames along either the rows or columns, specified by the axis parameter. Setting axis=0 will stack the DataFrames on top of each other.
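To make the role of the axis parameter concrete, here is a small sketch that compares the two directions. The frames top and bottom hold made-up values and exist only for this illustration.

```python
import pandas as pd

# Two tiny illustrative frames (made-up values) with identical columns.
top = pd.DataFrame({"student_id": [1, 2], "age": [8, 6]})
bottom = pd.DataFrame({"student_id": [3], "age": [15]})

# axis=0 stacks rows: row counts add up, columns are unchanged.
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)
print(stacked.shape)  # (3, 2)

# axis=1 places the frames side by side, aligning on the row index:
# column counts add up, and rows missing from one frame are filled with NaN.
side_by_side = pd.concat([top, bottom], axis=1)
print(side_by_side.shape)  # (2, 4)
```

For this problem only the axis=0 form is wanted, since both inputs share the same columns.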
Time Complexity: O(n + m), where n and m are the number of rows in df1 and df2, respectively.
Space Complexity: O(n + m) for storing the new concatenated DataFrame.
```python
import pandas as pd

def concatenate_dataframes(df1, df2):
    return pd.concat([df1, df2], axis=0, ignore_index=True)
```

This Python solution uses the pandas library to concatenate df1 and df2. The pd.concat function combines the two DataFrames by stacking them on top of each other, as specified by axis=0. Setting ignore_index=True assigns a new sequential index to the concatenated DataFrame.
This approach involves manually appending rows from the second DataFrame to the first. It is usually slower than the built-in function, but it makes explicit what a vertical concatenation produces: every row of df1 followed by every row of df2.
Time Complexity: O(n + m), similar to the concatenation function, but with additional overhead due to manual row iteration.
Space Complexity: O(n + m) for storing the combined rows in a new DataFrame.
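```python
import pandas as pd

def concatenate_dataframes_manually(df1, df2):
    # Gather every row of both frames with iterrows(), then rebuild one frame.
    rows = [row for _, row in df1.iterrows()] + [row for _, row in df2.iterrows()]
    return pd.DataFrame(rows, columns=df1.columns).reset_index(drop=True)
```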
Use pandas concat when you want to stack datasets vertically or horizontally without matching keys. Merge is more suitable when you need to join tables based on common columns or relationships.
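The difference between the two can be sketched with a small example. The frames students, more_students, and ages below are hypothetical and serve only to contrast stacking with key-based joining.

```python
import pandas as pd

# Hypothetical data for illustration only.
students = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"]})
more_students = pd.DataFrame({"student_id": [3], "name": ["Taylor"]})
ages = pd.DataFrame({"student_id": [2, 1], "age": [6, 8]})

# concat: same columns, no key matching, rows are simply stacked.
all_students = pd.concat([students, more_students], ignore_index=True)
print(all_students["student_id"].tolist())  # [1, 2, 3]

# merge: rows are paired up by the shared student_id column,
# regardless of the order in which they appear in each frame.
with_ages = students.merge(ages, on="student_id", how="inner")
print(with_ages["age"].tolist())  # [8, 6]
```

Here concat never inspects the values in student_id, while merge uses them to decide which rows belong together.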
Direct pandas problems are less common in traditional algorithm interviews but appear in data engineering and data science roles. Understanding DataFrame operations like concat is valuable for practical data manipulation tasks.
The problem works with pandas DataFrames, which are tabular data structures used for data analysis in Python. DataFrames allow efficient row and column operations such as concatenation and reshaping.
The optimal approach is to use the pandas function pd.concat() to combine multiple DataFrames. By concatenating along axis=0, rows from both datasets are stacked while preserving the column structure.
The manual Python solution builds a list of rows from both df1 and df2 using the iterrows() method. It joins the two lists into one complete list of rows and constructs a new DataFrame from it.