Create a New Column (easy)
DataFrame employees
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| name        | object |
| salary      | int    |
+-------------+--------+
A company plans to provide its employees with a bonus.
Write a solution to create a new column named bonus that contains the doubled values of the salary column.
The result format is in the following example.
Example 1:
Input:
DataFrame employees
+---------+--------+
| name    | salary |
+---------+--------+
| Piper   | 4548   |
| Grace   | 28150  |
| Georgia | 1103   |
| Willow  | 6593   |
| Finn    | 74576  |
| Thomas  | 24433  |
+---------+--------+
Output:
+---------+--------+--------+
| name    | salary | bonus  |
+---------+--------+--------+
| Piper   | 4548   | 9096   |
| Grace   | 28150  | 56300  |
| Georgia | 1103   | 2206   |
| Willow  | 6593   | 13186  |
| Finn    | 74576  | 149152 |
| Thomas  | 24433  | 48866  |
+---------+--------+--------+
Explanation: A new column bonus is created by doubling the value in the column salary.
Problem Overview: You receive a Pandas DataFrame representing employee data. The task is simple: create a new column called bonus where each value is double the employee's salary, then return the updated DataFrame with the new column added.
Approach 1: Looping Through DataFrame Rows (O(n) time, O(n) space)
A straightforward approach iterates through each row of the DataFrame, computes the bonus for that row, and stores the result in a new column. In Python, this is often done using iterrows() or a similar row-wise loop. For every row, you read the salary value, multiply it by 2, and assign the result to the corresponding index in the bonus column. This approach works for beginners because it mirrors how you would process a list: read one element, compute, store, repeat.
The downside is performance. Row-wise iteration in Pandas builds a separate Series object for every row and runs the arithmetic in the Python interpreter, so it is much slower than a column-level operation on large datasets even though the asymptotic cost is the same. The time complexity is O(n) because every row is processed once, and the extra column requires O(n) additional space. While correct, this approach is generally avoided in production data workflows. If you're exploring row manipulation concepts in Python or learning DataFrame basics, it still helps build intuition.
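The row-wise loop described above might look like the following minimal sketch. The function name `create_bonus_column_loop` and the three-row sample frame are illustrative assumptions, not part of the problem statement.

```python
import pandas as pd


def create_bonus_column_loop(employees: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper name, chosen for this illustration only.
    bonuses = []
    # iterrows() yields (index, row) pairs; each row is a Series.
    for _, row in employees.iterrows():
        bonuses.append(row["salary"] * 2)
    # Attach the collected values as the new column, in row order.
    employees["bonus"] = bonuses
    return employees


# Sample data taken from the first three rows of Example 1.
employees_loop = pd.DataFrame(
    {"name": ["Piper", "Grace", "Georgia"], "salary": [4548, 28150, 1103]}
)
result_loop = create_bonus_column_loop(employees_loop)
print(result_loop)
```

Collecting the values in a list and assigning once keeps the loop simple; it is still far slower than a column-level operation on large inputs.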
Approach 2: Pandas Vectorized Operations (O(n) time, O(n) space)
The optimal approach uses Pandas vectorized operations. Instead of iterating row by row, you operate on the entire column at once. Pandas internally applies the computation using optimized C-backed routines, which makes it significantly faster and more concise.
You directly assign a new column using an expression such as df['bonus'] = df['salary'] * 2. The multiplication runs across the entire column in a single vectorized operation. Conceptually, Pandas broadcasts the scalar 2 across every element of the salary Series, producing a new Series of the same length that becomes the bonus column.
This method still touches each row once, so the time complexity remains O(n), but the implementation is cleaner and leverages Pandas' optimized execution engine. Space complexity is O(n) for storing the new column. Vectorized operations are a core concept in Pandas and general DataFrame manipulation.
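As a minimal sketch of the vectorized approach, the snippet below applies the assignment to the data from Example 1. The function name `create_bonus_column` is an assumption made for illustration; the problem statement does not fix a signature.

```python
import pandas as pd


def create_bonus_column(employees: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical function name; the core of the approach is this one line.
    # Multiplying the Series by 2 doubles every salary in one operation.
    employees["bonus"] = employees["salary"] * 2
    return employees


# Input data from Example 1.
employees = pd.DataFrame(
    {
        "name": ["Piper", "Grace", "Georgia", "Willow", "Finn", "Thomas"],
        "salary": [4548, 28150, 1103, 6593, 74576, 24433],
    }
)
result = create_bonus_column(employees)
print(result)
```

Note that this sketch modifies the input DataFrame in place and returns it; if the caller's frame must stay unchanged, `employees.assign(bonus=employees["salary"] * 2)` returns a new DataFrame instead.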
Recommended for interviews: Interviewers expect the vectorized Pandas solution. The row-iteration approach demonstrates basic understanding but signals inefficient use of the library. Writing the vectorized expression immediately shows familiarity with real-world data processing patterns and Pandas best practices.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Looping Through DataFrame Rows | O(n) | O(n) | When learning DataFrame iteration or debugging row-level logic |
| Pandas Vectorized Operations | O(n) | O(n) | Preferred for production Pandas workflows and interview solutions |