A gene string can be represented by an 8-character long string, with choices from 'A'
, 'C'
, 'G'
, and 'T'
.
Suppose we need to investigate a mutation from a gene string startGene
to a gene string endGene
where one mutation is defined as one single character changed in the gene string.
"AACCGGTT" --> "AACCGGTA"
is one mutation.There is also a gene bank bank
that records all the valid gene mutations. A gene must be in bank
to make it a valid gene string.
Given the two gene strings startGene
and endGene
and the gene bank bank
, return the minimum number of mutations needed to mutate from startGene
to endGene
. If there is no such a mutation, return -1
.
Note that the starting point is assumed to be valid, so it might not be included in the bank.
Example 1:
Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"] Output: 1
Example 2:
Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"] Output: 2
Constraints:
0 <= bank.length <= 10
startGene.length == endGene.length == bank[i].length == 8
startGene
, endGene
, and bank[i]
consist of only the characters ['A', 'C', 'G', 'T']
.This problem asks for the minimum number of mutations required to transform a starting gene string into an ending gene string, given a bank of valid gene strings. Each mutation involves changing a single character in the gene string.
The most efficient approach is using Breadth-First Search (BFS). BFS guarantees finding the shortest path (minimum mutations) in an unweighted graph. We can represent the problem as a graph where:
startGene
and endGene
) is a node.The algorithm proceeds as follows:
Initialization:
q
to store gene strings to be processed. Initially, it contains the startGene
along with its mutation distance (0).visited
to track visited gene strings, preventing cycles and redundant exploration. Initially, it contains the startGene
.Iteration:
currentGene
and its mutation distance distance
.currentGene
equals endGene
, return distance
.bank
of valid gene strings:
neighborGene
:
currentGene
and neighborGene
.neighborGene
hasn't been visited:
neighborGene
with a distance of distance + 1
.neighborGene
as visited.No Path:
endGene
, it means no valid mutation path exists. Return -1.Time Complexity: O(N * L), where N is the number of genes in the bank and L is the length of the genes (8 in this case). In the worst case, we might explore all genes in the bank, and for each gene, we compare it with other genes (at most N times). The character comparison takes constant time (O(L)).
Space Complexity: O(N), dominated by the visited
set which stores at most all the genes from the bank. The queue can also hold at most all the genes in the worst case.
The code implementations below use BFS as described above. They differ slightly in syntax but follow the same core logic.
Python:
from collections import deque
def minMutation(startGene, endGene, bank):
q = deque([(startGene, 0)])
visited = {startGene}
while q:
gene, dist = q.popleft()
if gene == endGene:
return dist
for nextGene in bank:
diff = sum(a != b for a, b in zip(gene, nextGene))
if diff == 1 and nextGene not in visited:
q.append((nextGene, dist + 1))
visited.add(nextGene)
return -1
Java:
import java.util.*;
class Solution {
public int minMutation(String startGene, String endGene, String[] bank) {
Queue<Pair<String, Integer>> q = new LinkedList<>();
q.offer(new Pair<>(startGene, 0));
Set<String> visited = new HashSet<>();
visited.add(startGene);
while (!q.isEmpty()) {
Pair<String, Integer> current = q.poll();
String gene = current.getKey();
int dist = current.getValue();
if (gene.equals(endGene)) return dist;
for (String nextGene : bank) {
if (!visited.contains(nextGene) && diff(gene, nextGene) == 1) {
q.offer(new Pair<>(nextGene, dist + 1));
visited.add(nextGene);
}
}
}
return -1;
}
private int diff(String s1, String s2) {
int count = 0;
for (int i = 0; i < s1.length(); i++) {
if (s1.charAt(i) != s2.charAt(i)) count++;
}
return count;
}
private static class Pair<K, V> {
private K key;
private V value;
public Pair(K key, V value){
this.key = key;
this.value = value;
}
public K getKey(){
return key;
}
public V getValue(){
return value;
}
}
}
//Other languages (C++, Go, TypeScript, Rust) would follow a similar structure, using their respective queue and set data structures. The core algorithm remains the same.