Find Duplicate Subtrees - Solution & Explanation

Q: Is Find Duplicate Subtrees easy or hard?

Find Duplicate Subtrees is generally considered a medium-level problem. The challenge lies in recognizing that subtree structures must be uniquely represented and compared efficiently, typically using DFS traversal combined with hashing.

Q: Find Duplicate Subtrees Python/Java solution

Python and Java implementations typically perform a postorder DFS and store subtree representations in a hash map. The optimized version assigns unique IDs to subtree structures, ensuring O(n) time complexity while keeping memory usage linear.

Q: How to solve Find Duplicate Subtrees in O(n)?

Traverse the tree using postorder DFS and assign a unique ID to each distinct subtree structure. Store tuples (value, leftID, rightID) in a hash map that maps structures to IDs. Maintain a frequency map for each ID and record the subtree root when its count reaches two.

Q: What is the best approach for Find Duplicate Subtrees?

The most efficient approach uses DFS with unique subtree identifiers stored in a hash map. Each subtree is represented by a tuple of (node value, left subtree ID, right subtree ID), which is mapped to a unique integer. This avoids expensive string comparisons and runs in O(n) time with O(n) space.

Q: Is Find Duplicate Subtrees asked at Google/Amazon/Meta?

Find Duplicate Subtrees is a common binary tree hashing problem that has appeared in interviews at companies like Google, Amazon, and Meta. It tests understanding of DFS traversal, subtree representation, and efficient hashing techniques for structural comparison.

Q: What data structure is used in Find Duplicate Subtrees?

The solution relies on hash maps to store serialized subtree representations or subtree identifiers and their frequencies. Depth-first search recursion processes the binary tree, while the hash table allows constant-time lookup to detect duplicate subtree structures.

Q: What is the time complexity of Find Duplicate Subtrees?

The optimized solution runs in O(n) time because each node in the binary tree is processed once during DFS. Hash map lookups and ID assignments are constant time on average. A simpler serialization approach may degrade to O(n^2) time due to repeated string concatenation.

Difficulty: Medium | Topics: Hash Table, Tree, Depth-First Search, Binary Tree | 10 min read | Asked at: Amazon, Microsoft, Meta +5

Problem Statement

Given the root of a binary tree, return all duplicate subtrees.

For each kind of duplicate subtrees, you only need to return the root node of any one of them.

Two trees are duplicate if they have the same structure with the same node values.

Example 1:

Input: root = [1,2,3,4,null,2,4,null,null,4]
Output: [[2,4],[4]]

Example 2:

Input: root = [2,1,1]
Output: [[1]]

Example 3:

Input: root = [2,2,2,3,null,3,null]
Output: [[2,3],[3]]

Constraints:

The number of nodes in the tree is in the range [1, 5000]
-200 <= Node.val <= 200
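The example inputs above are level-order lists in which null marks a missing child. As a minimal sketch for experimenting locally, the helper below turns such a list into linked nodes. The TreeNode class and the build_tree name are assumptions made for illustration; they are not part of the problem statement.

```python
from collections import deque
from typing import List, Optional


class TreeNode:
    """Hypothetical node type mirroring the usual binary tree definition."""

    def __init__(self, val: int = 0, left: "Optional[TreeNode]" = None,
                 right: "Optional[TreeNode]" = None) -> None:
        self.val = val
        self.left = left
        self.right = right


def build_tree(values: List[Optional[int]]) -> Optional[TreeNode]:
    """Build a tree from a level-order list where None marks a missing child."""
    if not values or values[0] is None:
        return None
    root = TreeNode(values[0])
    queue = deque([root])  # nodes still waiting for their children
    i = 1
    while queue and i < len(values):
        node = queue.popleft()
        # Next list entry is this node's left child (if present).
        if i < len(values) and values[i] is not None:
            node.left = TreeNode(values[i])
            queue.append(node.left)
        i += 1
        # The entry after that is its right child (if present).
        if i < len(values) and values[i] is not None:
            node.right = TreeNode(values[i])
            queue.append(node.right)
        i += 1
    return root


# Example 1 from the problem statement.
example_root = build_tree([1, 2, 3, 4, None, 2, 4, None, None, 4])
```

Null entries are never queued, so they consume no further positions in the list, which is why the trailing 4 in Example 1 becomes the left child of the second node with value 2.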

Approach Overview

Problem Overview: You are given the root of a binary tree and need to return the roots of all subtrees that appear more than once. Two subtrees are considered duplicates if they have the same structure and the same node values. The challenge is efficiently detecting identical subtree structures across the entire tree.

Approach 1: Tree Serialization Using DFS (O(n²) time, O(n) space)

This approach converts every subtree into a unique string representation using postorder traversal. During a DFS traversal, you serialize the left subtree, the right subtree, and the current node value into a string like "left,right,val". Missing children must be written as an explicit placeholder (for example "#"); otherwise subtrees with different shapes can produce the same string. A hash table tracks how many times each serialized subtree appears. When a serialization is seen for the second time, you record that subtree root as a duplicate.

The key insight is that identical subtree structures generate identical serialized strings. Using depth-first search, every node contributes to exactly one serialization result, allowing you to systematically compare subtrees without pairwise comparisons. The downside is that repeated string concatenation can grow large for deep trees, leading to a worst‑case time complexity of O(n²).
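The claim that identical structures yield identical strings only helps if different structures also yield different strings, which depends on how missing children are written. The sketch below compares two small trees with and without a placeholder for null children. The TreeNode class, the serialize helper, and the "#" marker are illustrative assumptions rather than anything fixed by the problem.

```python
from typing import Optional


class TreeNode:
    """Hypothetical node type used only for this illustration."""

    def __init__(self, val: int = 0, left: "Optional[TreeNode]" = None,
                 right: "Optional[TreeNode]" = None) -> None:
        self.val = val
        self.left = left
        self.right = right


def serialize(node: Optional[TreeNode], null_marker: str) -> str:
    """Postorder "left,right,val" string; empty parts are dropped entirely."""
    if node is None:
        return null_marker
    parts = [
        serialize(node.left, null_marker),
        serialize(node.right, null_marker),
        str(node.val),
    ]
    return ",".join(p for p in parts if p)


left_only = TreeNode(1, left=TreeNode(2))    # 2 is the left child of 1
right_only = TreeNode(1, right=TreeNode(2))  # 2 is the right child of 1

# Without a marker both shapes collapse to the same string.
without_marker = (serialize(left_only, ""), serialize(right_only, ""))

# With "#" standing in for a missing child the strings differ.
with_marker = (serialize(left_only, "#"), serialize(right_only, "#"))
```

Here both entries of without_marker are "2,1", while with_marker holds "#,#,2,#,1" and "#,#,#,2,1", so only the marked form distinguishes a left child from a right child.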

Approach 2: Tree Serialization with Unique Identifiers (O(n) time, O(n) space)

A more efficient strategy replaces full string serialization with compact integer identifiers. During DFS, each subtree is represented by a tuple containing the current node value and the IDs of its left and right subtrees. This tuple is stored in a hash map that assigns a unique integer ID the first time the structure appears.

Each subtree is therefore represented by a small numeric signature instead of a long string. Another map counts how many times each subtree ID appears, and once the count reaches two, the subtree root is added to the result list. Because each node is processed once and each tuple lookup takes constant time on average (the tuple always has exactly three small entries), the algorithm runs in O(n) time and uses O(n) space.

This approach works particularly well for large binary tree inputs because subtree comparisons become integer comparisons instead of string comparisons.

Recommended for interviews: The unique identifier approach is the one interviewers typically expect. It demonstrates strong understanding of DFS traversal, hashing, and subtree memoization while achieving optimal O(n) complexity. Explaining the basic serialization idea first shows clear reasoning, but optimizing it with subtree IDs highlights deeper algorithmic thinking.

Approach 1: Tree Serialization Using DFS

Utilize a Depth-First Search (DFS) strategy and serialize each subtree. Treat the serialization as a key in a hashmap to track occurrences of identical subtrees. Duplicate entries are detected when a serialized key is encountered more than once.

This Python solution uses DFS to serialize each subtree into a string. These strings are stored in a dictionary. If a string appears more than once, it indicates a duplicate subtree. The function returns one node from each set of duplicates.

Code
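The listing for this approach is not present in the text, so the following is a minimal Python sketch of the serialization idea described above, not the page's original code. The TreeNode class, the function name find_duplicate_subtrees_serialized, and the "#" null marker are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class TreeNode:
    """Hypothetical node type mirroring the usual binary tree definition."""

    def __init__(self, val: int = 0, left: "Optional[TreeNode]" = None,
                 right: "Optional[TreeNode]" = None) -> None:
        self.val = val
        self.left = left
        self.right = right


def find_duplicate_subtrees_serialized(root: Optional[TreeNode]) -> List[TreeNode]:
    """Return one root per duplicated subtree, using string serialization."""
    counts: DefaultDict[str, int] = defaultdict(int)  # serialization -> occurrences
    duplicates: List[TreeNode] = []

    def serialize(node: Optional[TreeNode]) -> str:
        if node is None:
            return "#"  # explicit marker so different shapes never collide
        # Postorder: children are serialized before the current node's value.
        key = f"{serialize(node.left)},{serialize(node.right)},{node.val}"
        counts[key] += 1
        if counts[key] == 2:  # record each duplicated structure exactly once
            duplicates.append(node)
        return key

    serialize(root)
    return duplicates


# Example 2 from the problem statement: root = [2,1,1].
example = TreeNode(2, TreeNode(1), TreeNode(1))
found = find_duplicate_subtrees_serialized(example)
```

For this input, found contains a single node with value 1. The recursion depth equals the tree height, so with up to 5000 nodes a heavily skewed tree may exceed Python's default recursion limit; raising it with sys.setrecursionlimit or rewriting the traversal iteratively are possible workarounds.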

Complexity

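Time Complexity: O(N²) in the worst case, where N is the number of nodes. Each node is visited once, but building its serialized string costs time proportional to the size of its subtree, which adds up to quadratic work on a skewed tree.
Space Complexity: O(N) hashmap entries plus the recursion stack. The stored strings themselves can total O(N²) characters on a skewed tree.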

Approach 2: Tree Serialization with Unique Identifiers

Rather than using direct serialization strings, assign each unique subtree encountered a unique ID using a hash map. If a subtree's ID is seen again, it's a duplicate.

This Python approach maps each subtree structure to a unique identifier, incrementally assigning IDs using a dictionary. Duplicates are recognized when the same ID appears multiple times.

Code
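The listing for this approach is also missing from the text. The sketch below is one way to implement the identifier scheme described above, assuming a hypothetical TreeNode class and function name, and using 0 as the ID reserved for a missing child.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple


class TreeNode:
    """Hypothetical node type mirroring the usual binary tree definition."""

    def __init__(self, val: int = 0, left: "Optional[TreeNode]" = None,
                 right: "Optional[TreeNode]" = None) -> None:
        self.val = val
        self.left = left
        self.right = right


def find_duplicate_subtrees_ids(root: Optional[TreeNode]) -> List[TreeNode]:
    """Return one root per duplicated subtree, using integer subtree IDs."""
    ids: Dict[Tuple[int, int, int], int] = {}         # (val, left_id, right_id) -> ID
    counts: DefaultDict[int, int] = defaultdict(int)  # ID -> occurrences
    duplicates: List[TreeNode] = []

    def assign_id(node: Optional[TreeNode]) -> int:
        if node is None:
            return 0  # assumption: 0 is reserved for "no child"
        signature = (node.val, assign_id(node.left), assign_id(node.right))
        if signature not in ids:
            ids[signature] = len(ids) + 1  # next unused positive ID
        subtree_id = ids[signature]
        counts[subtree_id] += 1
        if counts[subtree_id] == 2:  # record each duplicated structure once
            duplicates.append(node)
        return subtree_id

    assign_id(root)
    return duplicates


# Example 1 from the problem statement: root = [1,2,3,4,null,2,4,null,null,4].
example = TreeNode(
    1,
    TreeNode(2, TreeNode(4)),
    TreeNode(3, TreeNode(2, TreeNode(4)), TreeNode(4)),
)
found = find_duplicate_subtrees_ids(example)
```

For this input, found holds two nodes, one with value 4 and one with value 2, matching the expected [[2,4],[4]] up to ordering. Every key in ids is a fixed-size tuple of three integers, so hashing it costs the same regardless of how large the subtree is, which is where the linear bound comes from.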

Complexity

Time Complexity: O(N), as each node is visited once and each fixed-size tuple lookup takes constant time on average.
Space Complexity: O(N), necessary for dictionaries and recursive processing space.

Complexity Comparison

Approach	Complexity
Tree Serialization Using DFS	Time Complexity: O(N²) in the worst case, where N is the number of nodes; each node is visited once, but building its serialized string costs time proportional to its subtree size. Space Complexity: O(N) hashmap entries plus recursion stack space.
Tree Serialization with Unique Identifiers	Time Complexity: O(N), as we traverse each node. Space Complexity: O(N), necessary for dictionaries and recursive processing space.

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Tree Serialization Using DFS	O(n^2)	O(n)	Simple implementation when tree size is moderate and string serialization cost is acceptable
Serialization with Unique Subtree Identifiers	O(n)	O(n)	Optimal solution for interviews and large trees where repeated string comparisons are expensive
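To make the O(n^2) entry in the table concrete, the sketch below counts how many characters the string approach produces on a left-leaning chain, the worst case for serialization. The TreeNode class and both helper names are assumptions for illustration, and the "left,right,val" format with "#" for missing children is one possible encoding.

```python
from typing import Optional


class TreeNode:
    """Hypothetical node type used only for this measurement."""

    def __init__(self, val: int = 0, left: "Optional[TreeNode]" = None,
                 right: "Optional[TreeNode]" = None) -> None:
        self.val = val
        self.left = left
        self.right = right


def make_left_chain(n: int) -> Optional[TreeNode]:
    """Build a chain of n nodes (all with value 0) linked through left children."""
    root = None
    for _ in range(n):
        root = TreeNode(0, left=root)
    return root


def total_serialized_chars(root: Optional[TreeNode]) -> int:
    """Sum the lengths of the serialization strings built for every node."""
    total = 0

    def serialize(node: Optional[TreeNode]) -> str:
        nonlocal total
        if node is None:
            return "#"
        key = f"{serialize(node.left)},{serialize(node.right)},{node.val}"
        total += len(key)  # each node builds (and would store) its own string
        return key

    serialize(root)
    return total


chars_100 = total_serialized_chars(make_left_chain(100))
chars_200 = total_serialized_chars(make_left_chain(200))
```

In this encoding the node at height h produces a string of 4h + 1 characters, so a chain of n nodes builds 2n^2 + 3n characters in total: 20,300 for n = 100 and 80,600 for n = 200. Doubling the input roughly quadruples the work, whereas the identifier approach builds one three-element tuple per node.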

Video Solution

Find Duplicate Subtrees - Leetcode 652 - Python • NeetCodeIO • 27,013 views

Related Problems

Construct String from Binary Tree

Delete Duplicate Folders in System

Problem Info

Difficulty: Medium

Acceptance: 60.6%

Approaches: 2

Reading time: 10 min

Asked at: Amazon, Microsoft, Meta, Salesforce, Google
