Repeated DNA Sequences - Solution & Explanation

Q: Is Repeated DNA Sequences easy or hard?

Repeated DNA Sequences is considered a medium difficulty problem. The basic sliding window and hashing solution is straightforward, but recognizing the bit manipulation optimization with 2-bit encoding requires deeper understanding of hashing and memory-efficient representations.

Q: Repeated DNA Sequences Python/Java solution

Python and Java implementations usually rely on a sliding window and a hash map or set to store 10-character substrings. Each iteration extracts the substring, checks if it has appeared before, and records duplicates. The logic runs in linear time and is easy to implement in high-level languages.

Q: How to solve Repeated DNA Sequences in O(n)?

Scan the string using a fixed-size sliding window of length 10. Store each sequence in a hash set or hash map and track duplicates. A more optimized version encodes each DNA character with 2 bits and maintains a rolling 20-bit hash, allowing constant-time updates while scanning the string once.

Q: What is the best approach for Repeated DNA Sequences?

The most practical approach uses a sliding window of length 10 with a hash map to track seen substrings. Each window substring is inserted into the hash table and duplicates are recorded when the count reaches two. This method runs in O(n) time with O(n) space and is simple to explain in interviews.

Q: Is Repeated DNA Sequences asked at Google/Amazon/Meta?

Repeated DNA Sequences is a common medium-level string and hashing problem seen in interviews at large tech companies. Variants involving substring hashing, rolling hashes, and duplicate detection appear in interviews at companies like Amazon and Google.

Q: What data structure is used in Repeated DNA Sequences?

Hash tables or hash sets are the primary data structures used to track previously seen sequences. In optimized solutions, bit manipulation combined with a rolling hash encodes sequences into integers while still using a hash set for duplicate detection.

Q: What is the time complexity of Repeated DNA Sequences?

The optimal solutions run in O(n) time where n is the length of the DNA string. Each position is processed once using a sliding window, and hash lookups are constant time on average. Space complexity is O(n) due to storing seen sequences or encoded hashes.

MediumHash Table String Bit Manipulation Sliding Window12 min readAsked at: Amazon, Microsoft, Meta +4

Practice this problem

Problem Statement

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example 1:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]

Constraints:

1 <= s.length <= 10⁵
s[i] is either 'A', 'C', 'G', or 'T'.

Approach Overview

Problem Overview: Given a DNA string containing characters A, C, G, and T, find every 10-letter-long sequence that appears more than once. The challenge is efficiently detecting repeated substrings of fixed length without comparing every possible pair.

Approach 1: Sliding Window with Hash Map (O(n) time, O(n) space)

Use a sliding window of length 10 and scan the string from left to right. For each index, extract the substring s[i:i+10] and store it in a hash table. The hash map tracks how many times each sequence appears. When a sequence’s count becomes 2, add it to the result list. Each substring lookup and update is an average O(1) hash operation, so the overall scan stays linear.

This approach is straightforward and easy to implement in most languages. The key insight is that the window size is fixed, so each position generates exactly one candidate substring. Hashing avoids expensive pairwise comparisons between sequences. The downside is memory usage: storing many 10-character strings increases space consumption, especially when the DNA string is large.

Approach 2: Bit Manipulation with Rolling Encoding (O(n) time, O(n) space)

DNA characters come from a set of four symbols, which means each can be encoded using 2 bits. Map A=00, C=01, G=10, and T=11. A 10-character sequence therefore fits into a 20-bit integer. Instead of storing substrings, maintain a rolling integer hash. Shift the current value left by 2 bits, add the new character’s encoding, and mask the result to keep only the latest 20 bits.

This technique acts like a compact bit manipulation rolling hash. Store each encoded sequence in a hash set to track whether it has appeared before. When the same 20-bit pattern appears again, record the corresponding substring as a repeated sequence. The rolling update makes each step O(1) while reducing memory compared to storing full strings.

Recommended for interviews: Start by explaining the sliding window with a hash map. It clearly demonstrates understanding of substring scanning and hash-based duplicate detection. After that, propose the bit manipulation optimization. Interviewers often look for this encoding trick because it reduces memory and demonstrates knowledge of rolling hashes and compact representations.

Approach 1: Sliding Window with Hash Map

This approach utilizes a sliding window technique combined with a hash map (or dictionary) to identify 10-letter-long sequences that appear more than once. You slide the window across the string and keep track of all sequences in a hash map. If a sequence is found more than once, it's added to the result list.

This Python solution uses a set to track seen sequences and another set to track repeated sequences. It iterates over the string and extracts each 10-letter sequence using slicing. If a sequence has been seen before, it's added to the 'repeated' set.

Code

Python JavaScript

Python

JavaScript

Complexity

Time Complexity: O(n), where n is the length of the string, due to the single pass over the string.
Space Complexity: O(n), as we store sequences in sets.

Try this approach in the editor →

Approach 2: Bit Manipulation

This approach utilizes bit manipulation to represent sequences as numbers, which can help optimize space usage when compared to storing strings directly. Each nucleotide is encoded as 2 bits ('A' = 00, 'C' = 01, 'G' = 10, 'T' = 11), allowing the entire 10-letter sequence to fit within 20 bits.

This C solution uses bit manipulation to encode 10-letter sequences as 20-bit integers. It uses arrays to track sequences seen and repeated. This allows for efficient membership checking and space usage.

Code

C C++

C++

Complexity

Time Complexity: O(n), as each character in the string is processed once.
Space Complexity: O(4^10), due to fixed encoding size for sequences.

Try this approach in the editor →

Approach 3: Hash Table

We define a hash table cnt to store the occurrence count of all substrings of length 10.

We iterate through all substrings of length 10 in the string s. For the current substring t, we update its count in the hash table. If the count of t is 2, we add it to the answer.

After the iteration, we return the answer array.

The time complexity is O(n times 10), and the space complexity is O(n times 10). Here, n is the length of the string s.

Code

Python Java C++Go TypeScript Rust JavaScript C#

Python

Java

C++

TypeScript

Rust

JavaScript

Try this approach in the editor →

Approach 4: Rabin-Karp String Matching Algorithm

This method essentially combines sliding window and hash. Similar to 0028. Find the Index of the First Occurrence in a String, this problem can use a hash function to reduce the time complexity of counting subsequences to O(1).

The time complexity is O(n), and the space complexity is O(n). Here, n is the length of the string s.

Code

Try this approach in the editor →

Complexity Comparison

Approach	Complexity
Sliding Window with Hash Map	Time Complexity: O(n), where n is the length of the string, due to the single pass over the string. Space Complexity: O(n), as we store sequences in sets.
Bit Manipulation	Time Complexity: O(n), as each character in the string is processed once. Space Complexity: O(4^10), due to fixed encoding size for sequences.
Hash Table	—
Rabin-Karp String Matching Algorithm	—

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Sliding Window with Hash Map	O(n)	O(n)	Best for readability and quick implementation during interviews
Bit Manipulation Rolling Hash	O(n)	O(n)	When optimizing memory and demonstrating advanced hashing techniques

Video Solution

Repeated DNA Sequences - Leetcode 187 - Python • NeetCode • 43,860 views views

Watch 9 more video solutions →

Frequently Asked Questions

Is Repeated DNA Sequences easy or hard?

Repeated DNA Sequences is considered a medium difficulty problem. The basic sliding window and hashing solution is straightforward, but recognizing the bit manipulation optimization with 2-bit encoding requires deeper understanding of hashing and memory-efficient representations.

Repeated DNA Sequences Python/Java solution

Python and Java implementations usually rely on a sliding window and a hash map or set to store 10-character substrings. Each iteration extracts the substring, checks if it has appeared before, and records duplicates. The logic runs in linear time and is easy to implement in high-level languages.

How to solve Repeated DNA Sequences in O(n)?

Scan the string using a fixed-size sliding window of length 10. Store each sequence in a hash set or hash map and track duplicates. A more optimized version encodes each DNA character with 2 bits and maintains a rolling 20-bit hash, allowing constant-time updates while scanning the string once.

What is the best approach for Repeated DNA Sequences?

The most practical approach uses a sliding window of length 10 with a hash map to track seen substrings. Each window substring is inserted into the hash table and duplicates are recorded when the count reaches two. This method runs in O(n) time with O(n) space and is simple to explain in interviews.

Is Repeated DNA Sequences asked at Google/Amazon/Meta?

Repeated DNA Sequences is a common medium-level string and hashing problem seen in interviews at large tech companies. Variants involving substring hashing, rolling hashes, and duplicate detection appear in interviews at companies like Amazon and Google.

What data structure is used in Repeated DNA Sequences?

Hash tables or hash sets are the primary data structures used to track previously seen sequences. In optimized solutions, bit manipulation combined with a rolling hash encodes sequences into integers while still using a hash set for duplicate detection.

What is the time complexity of Repeated DNA Sequences?

The optimal solutions run in O(n) time where n is the length of the DNA string. Each position is processed once using a sliding window, and hash lookups are constant time on average. Space complexity is O(n) due to storing seen sequences or encoded hashes.

Ready to solve this problem?

Practice Repeated DNA Sequences with our built-in code editor and test cases.

Practice on FleetCode

Two Sum

Longest Substring Without Repeating Characters

Problem Info

DifficultyMedium

Acceptance53.0%

Approaches4

Reading time12 min

Asked at

Amazon Microsoft Meta Google LinkedIn

Practice this problem

Open in Editor

Repeated DNA Sequences - Solution & Explanation

Problem Statement

Approach Overview

Approach 1: Sliding Window with Hash Map

Code

Complexity

Approach 2: Bit Manipulation

Code

Complexity

Approach 3: Hash Table

Code

Approach 4: Rabin-Karp String Matching Algorithm

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

Repeated DNA Sequences - Solution & Explanation

Problem Statement

Approach Overview

Approach 1: Sliding Window with Hash Map

Code

Complexity

Approach 2: Bit Manipulation

Code

Complexity

Approach 3: Hash Table

Code

Approach 4: Rabin-Karp String Matching Algorithm

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Approach 1: Sliding Window with Hash Map

Code

Complexity

Approach 2: Bit Manipulation

Code

Complexity

Approach 3: Hash Table

Code

Approach 4: Rabin-Karp String Matching Algorithm

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Approach 1: Sliding Window with Hash Map

Code

Complexity

Approach 2: Bit Manipulation

Code

Complexity

Approach 3: Hash Table

Code

Approach 4: Rabin-Karp String Matching Algorithm

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents