A gene string can be represented by an 8-character long string, with choices from 'A', 'C', 'G', and 'T'.
Suppose we need to investigate a mutation from a gene string startGene to a gene string endGene where one mutation is defined as one single character changed in the gene string.
"AACCGGTT" --> "AACCGGTA" is one mutation.There is also a gene bank bank that records all the valid gene mutations. A gene must be in bank to make it a valid gene string.
Given the two gene strings startGene and endGene and the gene bank bank, return the minimum number of mutations needed to mutate from startGene to endGene. If there is no such a mutation, return -1.
Note that the starting point is assumed to be valid, so it might not be included in the bank.
Example 1:
Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"] Output: 1
Example 2:
Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"] Output: 2
Constraints:
0 <= bank.length <= 10startGene.length == endGene.length == bank[i].length == 8startGene, endGene, and bank[i] consist of only the characters ['A', 'C', 'G', 'T'].Problem Overview: You start with a gene string and need to transform it into a target gene. Each mutation changes exactly one character, and every intermediate gene must exist in the given bank. The task is to compute the minimum number of mutations required, or return -1 if the transformation is impossible.
Approach 1: Breadth-First Search (BFS) (Time: O(N × L × 4), Space: O(N))
Treat each gene as a node in an implicit graph. Two genes are connected if they differ by exactly one character. Start a BFS from the startGene and explore all valid mutations level by level. For each position in the gene string, try replacing it with the four possible characters (A, C, G, T). If the new gene exists in the bank and has not been visited, push it into the queue. Because Breadth-First Search explores by distance, the first time you reach endGene gives the minimum mutation count.
Using a hash table or set for the bank enables O(1) membership checks while generating mutations. The gene length L is fixed (typically 8), so each state generates at most L × 4 candidates. BFS works well here because the problem asks for the shortest transformation path in an unweighted graph.
Approach 2: Bidirectional BFS (Time: O(N × L × 4), Space: O(N))
Bidirectional BFS improves practical performance by searching from both the start and end genes simultaneously. Maintain two frontier sets: one expanding from startGene and another from endGene. At each step, expand the smaller frontier by generating all one-character mutations and checking whether any mutation appears in the opposite frontier.
This approach reduces the search depth roughly by half, which significantly cuts the number of explored states when the bank is large. A hash set stores the valid genes, and visited genes are removed to avoid revisiting. The mutation generation logic is the same as in the standard BFS, still operating on the string representation of genes.
Recommended for interviews: The standard BFS solution is what most interviewers expect first because it directly models the problem as a shortest-path search in an unweighted graph. Implementing BFS with a queue and hash set demonstrates solid graph traversal fundamentals. After presenting it, mentioning Bidirectional BFS shows deeper optimization awareness and stronger problem-solving maturity.
In this approach, we utilize the Breadth-First Search algorithm to explore the mutation process. The problem can be transformed into finding the shortest path in an unweighted graph (where nodes are gene strings and edges represent valid transformations as per the gene bank).
Steps involved:
startGene.endGene is reached, return the mutation count.endGene, return -1.This C program uses a queue to perform a BFS search. It starts from the startGene and attempts to form every possible valid mutation step by step using the genes in the bank. A visited array prevents cycles.
Time Complexity: O(N * M * 4) where N is the number of steps in bank, M is length of startGene (8), and 4 is from trying each nucleotide.
Space Complexity: O(N) for the queue.
This approach aims to optimize the BFS by conducting simultaneous searches from both the start and end genes. This reduces the search space effectively.
Steps:
startGene and endGene, respectively.This C++ solution implements bidirectional BFS to find the shortest mutation path by expanding the smaller frontier set of genes in each step, improving efficiency.
Time Complexity: O(N * M * 4), with quicker terminations due to bidirectional checks.
Space Complexity: O(N) for two sets tracking forward and backward layers.
We define a queue q to store the current gene sequence and the number of changes, and a set vis to store the visited gene sequences. Initially, we add the starting gene sequence start to the queue q and the set vis.
Then, we continuously take out a gene sequence from the queue q. If this gene sequence equals the target gene sequence, we return the current number of changes. Otherwise, we iterate through the gene bank bank, calculate the difference value between the current gene sequence and the gene sequence in the gene bank. If the difference value is 1 and the gene sequence in the gene bank has not been visited, we add it to the queue q and the set vis.
If the queue q is empty, it means that the gene change cannot be completed, so we return -1.
The time complexity is O(C times n times m), and the space complexity is O(n times m). Where n and m are the lengths of the gene sequence and the gene bank respectively, and C is the size of the character set of the gene sequence. In this problem, C = 4.
| Approach | Complexity |
|---|---|
| Breadth-First Search (BFS) Approach | Time Complexity: O(N * M * 4) where N is the number of steps in bank, M is length of startGene (8), and 4 is from trying each nucleotide. Space Complexity: O(N) for the queue. |
| Bidirectional BFS Approach | Time Complexity: O(N * M * 4), with quicker terminations due to bidirectional checks. Space Complexity: O(N) for two sets tracking forward and backward layers. |
| BFS | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Breadth-First Search (BFS) | O(N × L × 4) | O(N) | General solution. Best when modeling the problem as shortest path in an unweighted graph. |
| Bidirectional BFS | O(N × L × 4) | O(N) | Prefer when the gene bank is large and you want to reduce search depth by expanding from both start and end. |
Minimum Genetic Mutation -(Asked by Twitter) : Explanation ➕ Live Coding • codestorywithMIK • 70,886 views views
Watch 9 more video solutions →Practice Minimum Genetic Mutation with our built-in code editor and test cases.
Practice on FleetCode