include a guide RNA containing a 20-basepair (bp) long spacer sequence that
matches the DNA sequence at the target site. One of the possible risks of genetic
engineering methods is "off-target" editing, which occurs when a guide RNA matches
a part of the genome other than the intended target site.
(i) How many possible guide RNA sequences are there?
(ii) Estimate the probability that a single site in the human genome matches a random
20-bp spacer. State all your assumptions.
(iii) After infection, HIV converts its RNA genome into DNA and inserts itself into the
human genome. Imagine you have designed a 20-bp spacer to target and deactivate
part of the HIV DNA sequence. Based on the previous answer, estimate the
probability that this sequence will have at least one off-target match somewhere in
the human X chromosome, which is about 300 000 000 bp long (counting both strands). Note:
when P is very small (so that nP is much less than 1), (1-P)^n is approximately equal to (1-nP).
(iv) What would be the probability of an off-target site appearing somewhere in the
entire human genome (6 billion bp, counting both strands)?
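
One way to check the arithmetic in parts (i), (ii) and (iv) is sketched below. It
is a sketch under stated assumptions, not a model answer: the four bases are taken
to be equally likely and independent at every position, only perfect 20-of-20
matches count, and each site is treated as an independent trial. The function
names are hypothetical and chosen only for this illustration.

```python
import math

SPACER_LENGTH = 20  # bp, from the problem statement
NUM_BASES = 4       # A, C, G, T; ASSUMED equally likely and independent


def num_spacers(length=SPACER_LENGTH):
    """Number of distinct spacer sequences of the given length."""
    return NUM_BASES ** length


def match_probability(length=SPACER_LENGTH):
    """Probability that one site matches one given spacer exactly."""
    return 1.0 / num_spacers(length)


def prob_at_least_one_match(n_sites, p):
    """Exact 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p."""
    return -math.expm1(n_sites * math.log1p(-p))


def prob_linear_approx(n_sites, p):
    """Approximation n * p, from (1 - p)^n ~ 1 - n*p when n*p is much less than 1."""
    return n_sites * p


if __name__ == "__main__":
    p = match_probability()
    genome_sites = 6_000_000_000  # whole genome, both strands, from part (iv)
    print("possible spacers:", num_spacers())
    print("single-site match probability:", p)
    print("whole genome, exact:", prob_at_least_one_match(genome_sites, p))
    print("whole genome, n*P:", prob_linear_approx(genome_sites, p))
```

The same two functions can be called with the chromosome length from part (iii)
in place of genome_sites. Comparing the exact and linear results shows how
little the approximation in the note costs while nP stays small.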