Cas9 Protein

Cas9 (also known as Csn1) is an RNA-guided DNA endonuclease associated with the Type II CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)–Cas system. It generates double-strand breaks in invasive DNA during the adaptive bacterial immune response. Beyond its original function in bacterial immunity, Cas9 has been harnessed as a powerful tool for genome editing and gene regulation in many eukaryotic organisms. Because it can cleave nearly any sequence that is complementary to the guide RNA and lies next to a protospacer adjacent motif (PAM), Cas9 has become a prominent genome-editing tool alongside zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs).

Cas9 proteins are abundant across the bacterial kingdom, but vary widely in both sequence and size. All known Cas9 enzymes contain an HNH domain that cleaves the DNA strand complementary to the guide RNA sequence (target strand), and a RuvC nuclease domain required for cleaving the noncomplementary strand (non-target strand), yielding double-strand DNA breaks (DSBs). In addition, Cas9 enzymes contain a highly conserved arginine-rich (Arg-rich) region previously suggested to mediate nucleic acid binding.

Fig 1. Schematic of the polypeptide sequence and domain organization for the type II-A Cas9 protein from S. pyogenes (Spy Cas9)

Orthologous Cas9 Proteins

Cas9 orthologs are highly diverse in amino acid composition and protein size. They share only a few identical amino acids, yet all retrieved sequences have the same domain architecture: a central, conserved HNH endonuclease domain and a split RuvC/RNase H domain, both of which are required for dsDNA cleavage. Cas9 proteins range in length from 984 (Campylobacter jejuni) to 1,629 (Francisella novicida) amino acids, with typical sizes of ~1,100 or ~1,400 amino acids.

On the basis of CRISPR-Cas locus architecture and protein sequence phylogeny, Cas9 orthologs cluster into three subtypes: II-A, II-B, and II-C. Type II-C contains a minimal cas operon (cas9, cas1, cas2), while II-A and II-B are characterized by an additional CRISPR-associated gene, encoding either Csn2 (II-A) or Cas4 (II-B). Cas9 proteins in the II-A and II-C subtypes typically contain ~1,400 and ~1,100 amino acids, respectively. As the signature protein of type II CRISPR-Cas systems, Cas9 shows no detectable similarity to any protein in Type I or Type III systems. It appears that Cas9 is sufficient both to generate crRNA and to cleave the target DNA.

Fig 2. Phylogenetic tree of the representative Cas9 orthologs from each subtype

So far, several representative members of each Cas9 subtype have been implemented for genome editing in eukaryotes. Among them, the Cas9 ortholog from S. pyogenes (SpyCas9, subtype II-A; 1,368 amino acids) is the most studied and most commonly used. Orthologs from Neisseria meningitidis (NmeCas9, subtype II-C) and Staphylococcus aureus (SaCas9, subtype II-A) are considerably smaller. Because they are encoded by more compact genes, they are potentially advantageous for delivery to somatic tissues with size-limited vectors such as adeno-associated virus and lentivirus.
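The size argument can be made concrete with a little arithmetic. The sketch below converts the protein lengths quoted in this article into coding-sequence lengths and compares them with a vector capacity. The capacity value (about 4.7 kb for adeno-associated virus) is an assumption that does not come from this article, and the function names are purely illustrative. Promoters, guide RNA cassettes, and other regulatory elements also consume space, so the figures are upper bounds on what remains.

```python
# Assumption (not stated in the article): approximate AAV packaging limit in base pairs.
AAV_CAPACITY_BP = 4700

# Protein lengths in amino acids, as quoted in the article.
CAS9_LENGTHS_AA = {
    "Campylobacter jejuni Cas9": 984,
    "S. pyogenes Cas9": 1368,
    "Francisella novicida Cas9": 1629,
}


def coding_length_bp(n_amino_acids):
    """Length of the open reading frame: one codon per residue plus a stop codon."""
    return 3 * (n_amino_acids + 1)


def remaining_capacity_bp(n_amino_acids, capacity_bp=AAV_CAPACITY_BP):
    """Base pairs left for promoter, guide RNA cassette, etc. (negative means it does not fit)."""
    return capacity_bp - coding_length_bp(n_amino_acids)


for name, length_aa in CAS9_LENGTHS_AA.items():
    print(name, coding_length_bp(length_aa), remaining_capacity_bp(length_aa))
```

Under this assumed capacity, the S. pyogenes gene leaves only a few hundred base pairs for everything else, which illustrates why the more compact orthologs are attractive for such vectors.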

In contrast, the Cas9 ortholog from Francisella novicida (FnCas9, subtype II-B) consists of 1,629 amino acids and is significantly larger than orthologs from the II-A and II-C subtypes. Besides differing in length and sequence, Cas9 orthologs recognize distinct dual-RNA and PAM sequences. For example, SpyCas9 requires a 5'-NGG-3' PAM immediately 3' of the target sequence for cleavage, whereas SaCas9 recognizes a 5'-NNGRRT-3' PAM; FnCas9, like SpyCas9, recognizes a 5'-NGG-3' PAM downstream of the target DNA.
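The PAM requirement can be illustrated with a minimal sketch that scans a DNA sequence for candidate target sites. The PAM motifs are the ones quoted above. The 20-nt protospacer length, the function names, and the restriction to the forward strand are simplifying assumptions made for illustration, not details taken from this article.

```python
import re

# IUPAC codes needed for the PAM motifs quoted in the text (R = purine, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}

# PAM motifs as given in the article, written 5'->3'.
PAMS = {"SpyCas9": "NGG", "SaCas9": "NNGRRT", "FnCas9": "NGG"}


def pam_regex(pam):
    """Translate an IUPAC PAM motif into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in pam)


def find_targets(seq, pam, protospacer_len=20):
    """Return (start, protospacer, pam) for every PAM on the forward strand
    that has a full-length protospacer immediately 5' of it.

    protospacer_len=20 is an assumed, commonly used value.
    """
    seq = seq.upper()
    # A lookahead lets overlapping PAM occurrences all be reported.
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    hits = []
    for match in pattern.finditer(seq):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start >= 0:  # skip PAMs too close to the 5' end
            hits.append((start, seq[start:pam_start], match.group(1)))
    return hits


example = "ACGT" * 5 + "TGG"
print(find_targets(example, PAMS["SpyCas9"]))
```

A fuller implementation would also scan the reverse complement, since Cas9 can target either strand.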
