DataFrame employees
+-------------+--------+
| Column Name | Type. |
+-------------+--------+
| name | object |
| salary | int. |
+-------------+--------+
A company plans to provide its employees with a bonus.
Write a solution to create a new column name bonus that contains the doubled values of the salary column.
The result format is in the following example.
Example 1:
Input: DataFrame employees +---------+--------+ | name | salary | +---------+--------+ | Piper | 4548 | | Grace | 28150 | | Georgia | 1103 | | Willow | 6593 | | Finn | 74576 | | Thomas | 24433 | +---------+--------+ Output: +---------+--------+--------+ | name | salary | bonus | +---------+--------+--------+ | Piper | 4548 | 9096 | | Grace | 28150 | 56300 | | Georgia | 1103 | 2206 | | Willow | 6593 | 13186 | | Finn | 74576 | 149152 | | Thomas | 24433 | 48866 | +---------+--------+--------+ Explanation: A new column bonus is created by doubling the value in the column salary.
In #2881 Create a New Column, the goal is to generate an additional column in a dataset using values derived from an existing column. Since the problem is based on tabular data manipulation, the most efficient approach is to use vectorized operations provided by libraries such as pandas. Vectorization allows you to apply an operation to an entire column at once instead of iterating through rows manually.
The key idea is to reference the existing column and apply a mathematical transformation to produce the new column. This new column can then be inserted directly into the DataFrame using column assignment or helper methods designed for column creation. Because the operation runs across the column in a single pass, it is highly efficient and concise.
This method processes each row exactly once, leading to linear time complexity. The additional space required corresponds to storing the new column values.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Vectorized DataFrame Column Operation | O(n) | O(n) |
NeetCode
Use these hints if you're stuck. Try solving on your own first.
Consider using the `[]` brackets with the new column name at the left side of the assignment. The calculation of the value is done element-wise.
We can use Pandas for this task, leveraging its ability to perform operations over complete columns. Specifically, we can utilize the vectorized operation in Pandas to create a new column bonus by simply doubling the values of the existing salary column.
Time Complexity: O(n) - Where n is the number of rows in the DataFrame, as it processes each row once.
Space Complexity: O(n) - Requires additional space for the new bonus column.
1import pandas as pd
2
3# Sample DataFrame
4employees = pd.DataFrame({
5 'name': ['Piper', 'Grace', 'Georgia', 'Willow', 'Finn', 'Thomas'],
6 'salary': [4548, 28150, 1103, 6593, 74576, 24433]
7})
8
9# Creating the 'bonus' column with doubled salary values
10employees['bonus'] = employees['salary'] * 2
11
12print(employees)This Python solution uses Pandas to create a new column by applying a mathematical operation on an existing column. The multiplication operator (*) is used between the salary column and the scalar value 2, which doubles each of the salary values directly and assigns the result to a new column bonus.
In the absence of vectorized operations, we can achieve the task by iterating over each row of the DataFrame manually. However, this method is generally less efficient than vectorized operations.
Time Complexity: O(n) - It explicitly visits each row to calculate the bonus.
Space Complexity: O(n) - Holds the list of bonuses in memory before adding it as a new column.
1import pandas as pd
2
3# Sample DataFrame
4employees = pd.DataFrame({
Watch expert explanations and walkthroughs
Jot down your thoughts, approach, and key learnings
The problem uses a pandas DataFrame, which is a tabular data structure similar to a spreadsheet or SQL table. Each column is represented as a Series, allowing efficient vectorized operations across all rows.
The optimal approach is to use vectorized column operations provided by pandas. Instead of looping through rows, apply the transformation directly to the entire column and assign the result to a new column in the DataFrame.
While the exact problem may not always appear, similar DataFrame manipulation tasks are common in data engineering and data science interviews. Understanding how to create and transform columns efficiently is an important practical skill.
Vectorization allows operations to run on entire columns at once using optimized internal implementations. This avoids slow Python loops and results in cleaner code and better performance for large datasets.
This implementation does not use vectorized operations and loops over each row of the DataFrame using the DataFrame's iterrows() method. It calculates the double salary for each employee and stores it in a list, which is then added as a new column to the DataFrame.