Watch 10 video solutions for Repeated DNA Sequences, a medium level problem involving Hash Table, String, Bit Manipulation. This walkthrough by NeetCode has 37,535 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.
"ACGAATTCCG" is a DNA sequence.When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
Input: s = "AAAAAAAAAAAAA" Output: ["AAAAAAAAAA"]
Constraints:
1 <= s.length <= 105s[i] is either 'A', 'C', 'G', or 'T'.Problem Overview: Given a DNA string containing characters A, C, G, and T, find every 10-letter-long sequence that appears more than once. The challenge is efficiently detecting repeated substrings of fixed length without comparing every possible pair.
Approach 1: Sliding Window with Hash Map (O(n) time, O(n) space)
Use a sliding window of length 10 and scan the string from left to right. For each index, extract the substring s[i:i+10] and store it in a hash table. The hash map tracks how many times each sequence appears. When a sequence’s count becomes 2, add it to the result list. Each substring lookup and update is an average O(1) hash operation, so the overall scan stays linear.
This approach is straightforward and easy to implement in most languages. The key insight is that the window size is fixed, so each position generates exactly one candidate substring. Hashing avoids expensive pairwise comparisons between sequences. The downside is memory usage: storing many 10-character strings increases space consumption, especially when the DNA string is large.
Approach 2: Bit Manipulation with Rolling Encoding (O(n) time, O(n) space)
DNA characters come from a set of four symbols, which means each can be encoded using 2 bits. Map A=00, C=01, G=10, and T=11. A 10-character sequence therefore fits into a 20-bit integer. Instead of storing substrings, maintain a rolling integer hash. Shift the current value left by 2 bits, add the new character’s encoding, and mask the result to keep only the latest 20 bits.
This technique acts like a compact bit manipulation rolling hash. Store each encoded sequence in a hash set to track whether it has appeared before. When the same 20-bit pattern appears again, record the corresponding substring as a repeated sequence. The rolling update makes each step O(1) while reducing memory compared to storing full strings.
Recommended for interviews: Start by explaining the sliding window with a hash map. It clearly demonstrates understanding of substring scanning and hash-based duplicate detection. After that, propose the bit manipulation optimization. Interviewers often look for this encoding trick because it reduces memory and demonstrates knowledge of rolling hashes and compact representations.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Sliding Window with Hash Map | O(n) | O(n) | Best for readability and quick implementation during interviews |
| Bit Manipulation Rolling Hash | O(n) | O(n) | When optimizing memory and demonstrating advanced hashing techniques |