Delete Duplicate Folders in System - Video Solutions

HardArrayHash TableStringTrieHash Function

Delete Duplicate Folders in System | Super Detailed Explanation | Leetcode 1948 | codestorywithMIK

1:18:048,769 views

10 video solutions available

Delete Duplicate Folders in System - Video Solution

Watch 10 video solutions for Delete Duplicate Folders in System, a hard level problem involving Array, Hash Table, String. This walkthrough by codestorywithMIK has 8,769 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.

Problem Statement

Due to a bug, there are many duplicate folders in a file system. You are given a 2D array paths, where paths[i] is an array representing an absolute path to the i^th folder in the file system.

For example, ["one", "two", "three"] represents the path "/one/two/three".

Two folders (not necessarily on the same level) are identical if they contain the same non-empty set of identical subfolders and underlying subfolder structure. The folders do not need to be at the root level to be identical. If two or more folders are identical, then mark the folders as well as all their subfolders.

For example, folders "/a" and "/b" in the file structure below are identical. They (as well as their subfolders) should all be marked:
- /a
- /a/x
- /a/x/y
- /a/z
- /b
- /b/x
- /b/x/y
- /b/z
However, if the file structure also included the path "/b/w", then the folders "/a" and "/b" would not be identical. Note that "/a/x" and "/b/x" would still be considered identical even with the added folder.

Once all the identical folders and their subfolders have been marked, the file system will delete all of them. The file system only runs the deletion once, so any folders that become identical after the initial deletion are not deleted.

Return the 2D array ans containing the paths of the remaining folders after deleting all the marked folders. The paths may be returned in any order.

Example 1:

Input: paths = [["a"],["c"],["d"],["a","b"],["c","b"],["d","a"]]
Output: [["d"],["d","a"]]
Explanation: The file structure is as shown.
Folders "/a" and "/c" (and their subfolders) are marked for deletion because they both contain an empty
folder named "b".

Example 2:

Input: paths = [["a"],["c"],["a","b"],["c","b"],["a","b","x"],["a","b","x","y"],["w"],["w","y"]]
Output: [["c"],["c","b"],["a"],["a","b"]]
Explanation: The file structure is as shown. 
Folders "/a/b/x" and "/w" (and their subfolders) are marked for deletion because they both contain an empty folder named "y".
Note that folders "/a" and "/c" are identical after the deletion, but they are not deleted because they were not marked beforehand.

Example 3:

Input: paths = [["a","b"],["c","d"],["c"],["a"]]
Output: [["c"],["c","d"],["a"],["a","b"]]
Explanation: All folders are unique in the file system.
Note that the returned array can be in a different order as the order does not matter.

Constraints:

1 <= paths.length <= 2 * 10⁴
1 <= paths[i].length <= 500
1 <= paths[i][j].length <= 10
1 <= sum(paths[i][j].length) <= 2 * 10⁵
path[i][j] consists of lowercase English letters.
No two paths lead to the same folder.
For any folder not at the root level, its parent folder will also be in the input.

Read full problem with examples

Approach Overview

Problem Overview: You receive a list of folder paths that represent a virtual file system. If two folders contain identical subfolder structures, both folders (and their subtrees) must be removed. The task is to detect structurally identical folder trees and return the remaining folder paths.

Approach 1: Tree Serialization with Hashing (O(n * k) time, O(n * k) space)

Build a directory tree using a trie-like structure where each node represents a folder. Each path from the input list is inserted into the tree. After constructing the tree, perform a post-order traversal to serialize every subtree. The serialization string encodes the folder name and the ordered representation of its children. Store each serialization in a hash table to count occurrences. If the same serialized structure appears more than once, mark those nodes as duplicates. A second traversal collects only the folders that are not part of duplicate subtrees.

The key insight: two folders are duplicates if their entire subtree structure is identical. Serialization converts tree structure comparison into simple hash comparisons. Post-order traversal ensures child structures are processed before their parent folders.

Approach 2: Hashing Subfolder Structures (O(n * k) time, O(n * k) space)

This variation also constructs the folder tree but focuses on computing structural hashes instead of long serialization strings. Each node generates a hash based on its folder name and the hashes of its children. Child hashes are sorted and combined to create a canonical representation of the subtree. The resulting hash is stored in a frequency map. If multiple nodes produce the same hash, those folders represent duplicate subtrees.

Using hashes instead of long serialized strings reduces memory overhead and speeds up comparisons. The algorithm still relies on a bottom-up traversal so that every folder's structure is uniquely determined by its children.

Both approaches depend on representing folder hierarchies as a tree and comparing subtree structures using hashing. The use of string serialization or structural hashes converts a complex tree comparison problem into efficient map lookups.

Recommended for interviews: Tree serialization with hashing is the approach most interviewers expect. It clearly demonstrates how to model hierarchical data using a trie and how to detect duplicate structures with hashing. Explaining the serialization idea first shows strong problem decomposition, while the hash-based optimization shows awareness of performance tradeoffs.

Complexity Analysis

Approach	Time	Space	When to Use
Tree Serialization with Hashing	O(n * k)	O(n * k)	Standard interview solution. Easy to reason about subtree equality using serialized strings.
Hashing Subfolder Structures	O(n * k)	O(n * k)	When large serialized strings become expensive and structural hashing improves performance.
Naive Subtree Comparison	O(n^2 * k)	O(n * k)	Conceptual baseline for understanding duplicate subtree detection but not practical for large inputs.

Continue Learning

Practice this problem

Solve it on FleetCode

Read Solution & Explanation

Detailed text walkthrough

Delete Duplicate Folders in System - Video Solutions

Delete Duplicate Folders in System | Super Detailed Explanation | Leetcode 1948 | codestorywithMIK

Delete Duplicate Folders in System | Leetcode 1948

LeetCode 1948. Delete Duplicate Folders in System

1948. Delete Duplicate Folders in System | Leetcode Daily - Python

Daily Leetcode #1937: Jul 20, 2025 - 1948. Delete Duplicate Folders in System

1948. Delete Duplicate Folders in System (Leetcode Hard)

Delete Duplicate Folders in System | LeetCode 1948 - Python

Leetcode 1948 🎯 | Kill 🔪 fear of Trie | Delete Duplicate Folders | Trie + DFS | Today’s DCC

Delete Duplicate Folders in System| LeetCode 1948|Leetcode potd| today daily leetcode| daily potd

1948. Delete Duplicate Folders in System - Leetcode Solution ⭐

Delete Duplicate Folders in System - Video Solution

Problem Statement

Approach Overview

Complexity Analysis

Continue Learning

Delete Duplicate Folders in System - Video Solutions

Delete Duplicate Folders in System | Super Detailed Explanation | Leetcode 1948 | codestorywithMIK

Delete Duplicate Folders in System | Leetcode 1948

LeetCode 1948. Delete Duplicate Folders in System

1948. Delete Duplicate Folders in System | Leetcode Daily - Python

Daily Leetcode #1937: Jul 20, 2025 - 1948. Delete Duplicate Folders in System

1948. Delete Duplicate Folders in System (Leetcode Hard)

Delete Duplicate Folders in System | LeetCode 1948 - Python

Leetcode 1948 🎯 | Kill 🔪 fear of Trie | Delete Duplicate Folders | Trie + DFS | Today’s DCC

Delete Duplicate Folders in System| LeetCode 1948|Leetcode potd| today daily leetcode| daily potd

1948. Delete Duplicate Folders in System - Leetcode Solution ⭐

Delete Duplicate Folders in System - Video Solution

Problem Statement

Approach Overview

Complexity Analysis

Continue Learning