Question

To target a specific region of genomic DNA with CRISPR, researchers must include
a guide RNA containing a 20-base-pair (bp) spacer sequence that matches the DNA
sequence at the target site. One of the possible risks of genetic engineering
methods is "off-target" editing, which occurs when a guide RNA matches a part of
the genome other than the intended target site.

(i) How many possible 20-bp spacer sequences are there?

(ii) Estimate the probability that a single site in the human genome matches a
random 20-bp spacer. State all your assumptions.

(iii) After infection, HIV converts its RNA genome into DNA and inserts itself
into the human genome. Imagine you have designed a 20-bp spacer to target and
deactivate part of the HIV DNA sequence. Based on the previous answer, estimate
the probability that this sequence will have at least one off-target match
somewhere in the human X chromosome, which is about 300 000 000 bp long
(counting both strands). Note: when P is very small (close to 0), (1-P)^n is
approximately equal to (1-nP).

(iv) What would be the probability of an off-target site appearing somewhere in
the entire human genome (6 billion bp, counting both strands)?
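
The arithmetic behind parts (i) to (iv) can be sketched in a few lines of Python.
This is a minimal sketch, not a model answer: it assumes that the four bases
A, C, G and T are equally likely and independent at every position, that every
position in the searched sequence counts as one candidate site, and that only
perfect 20-bp matches count. The function names are made up for illustration.

```python
import math

BASES = 4        # A, C, G, T; ASSUMPTION: equally likely and independent
SPACER_LEN = 20  # spacer length given in the question


def spacer_count(length=SPACER_LEN):
    """Number of distinct spacer sequences of the given length (part i)."""
    return BASES ** length


def single_site_match_probability(length=SPACER_LEN):
    """Probability that one site matches a random spacer exactly (part ii)."""
    return 1 / spacer_count(length)


def off_target_probability(n_sites, length=SPACER_LEN):
    """Probability of at least one match among n_sites sites (parts iii, iv).

    Returns (exact, approx), where exact = 1 - (1 - P)^n and approx = n * P,
    the small-P approximation quoted in the question.
    """
    p = single_site_match_probability(length)
    # log1p/expm1 keep precision when p is tiny; ASSUMPTION: sites independent.
    exact = -math.expm1(n_sites * math.log1p(-p))
    approx = n_sites * p
    return exact, approx


if __name__ == "__main__":
    print(spacer_count())                   # part (i)
    print(single_site_match_probability())  # part (ii)
    print(off_target_probability(6e9))      # part (iv): whole genome
```

For part (iii), pass the X chromosome length as n_sites. Under these
assumptions the exact and approximate values agree closely whenever n * P is
much smaller than 1, which is the condition under which the hint in the
question applies.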