
Sponsored
Sponsored
This approach utilizes a sliding window technique combined with a hash map (or dictionary) to identify 10-letter-long sequences that appear more than once. You slide the window across the string and keep track of all sequences in a hash map. If a sequence is found more than once, it's added to the result list.
Time Complexity: O(n), where n is the length of the string, due to the single pass over the string.
Space Complexity: O(n), as we store sequences in sets.
1function findRepeatedDnaSequences(s) {
2 const seen = new Set();
3 const repeated = new Set();
4 for (let i = 0; i <= s.length - 10; i++) {
5 const seq = s.substring(i, i + 10);
6 if (seen.has(seq)) {
7 repeated.add(seq);
8 } else {
9 seen.add(seq);
10 }
11 }
12 return Array.from(repeated);
13}This JavaScript solution is similar to the Python approach, using JavaScript's Set to store seen and repeated sequences. It iterates through the string, adds new sequences to the 'seen' set, and tracks sequences that repeat.
This approach utilizes bit manipulation to represent sequences as numbers, which can help optimize space usage when compared to storing strings directly. Each nucleotide is encoded as 2 bits ('A' = 00, 'C' = 01, 'G' = 10, 'T' = 11), allowing the entire 10-letter sequence to fit within 20 bits.
Time Complexity: O(n), as each character in the string is processed once.
Space Complexity: O(4^10), due to fixed encoding size for sequences.
1#include <vector>
2#include <string>
3#include <unordered_set>
4
5std::vector<std::string> findRepeatedDnaSequences(std::string s) {
std::unordered_set<int> seen, repeated;
std::vector<std::string> result;
int map[128] = {0};
map['A'] = 0;
map['C'] = 1;
map['G'] = 2;
map['T'] = 3;
int num = 0;
for (int i = 0; i < s.size(); ++i) {
num = ((num << 2) & 0xFFFFF) | map[s[i]];
if (i < 9) continue;
if (seen.count(num) && !repeated.count(num)) {
result.push_back(s.substr(i - 9, 10));
repeated.insert(num);
}
seen.insert(num);
}
return result;
}This C++ solution uses unordered sets to track sequences encoded as numbers. Each 10-letter sequence is converted into an integer using bit shifting and masking to ensure only 20 bits are used, allowing for efficient comparison and storage.