The machinery required for the CRISPR (Clustered regularly interspaced short palindromic repeats)-mediated immune response is encoded by one contiguous sequence in the prokaryotic genome, known as the CRISPR locus. Peculiar short repeats were first discovered in the E. coli genome by Ishino and coworkers in 1987. Subsequently, similar repeats were noted in a number of bacteria and archaea. Mojica, Jansen and their colleagues coined the CRISPR acronym and characterized the CRISPR loci. CRISPR loci are hypervariable loci, widely distributed in prokaryotes, that provide acquired immunity against foreign genetic elements. They expand through the iterative uptake of invasive DNA sequences into the CRISPR array during a process called adaptation. CRISPR loci have been found in about 40% of bacterial species and in most archaeal species with sequenced genomes.
Determining the structure of the CRISPR locus was a key step toward understanding how it works. A CRISPR locus is defined as an array of short direct repeats interspersed with spacer sequences. This defining repeat-spacer-repeat pattern was first discovered by sequencing a chromosomal fragment of E. coli. Two decades later, CRISPR arrays had been identified in numerous species. A combination of computational and molecular biology approaches has shown that all CRISPR loci share a common design composed of four universally present elements: repeats, spacers, a leader and cas genes. Although a single CRISPR locus per genome is typical, several loci within one genome are not uncommon. In such cases, the loci often share similar repeat sequences, though there are notable exceptions.
Repeats: The first striking feature is a series of short sequences, termed repeats, with a conserved motif at the 3′ end (GAAAN) that is thought to be involved in protein binding. The repeats of one CRISPR locus are almost always identical in size and sequence, whereas repeats of different loci vary in sequence, length and the secondary structure of their transcripts. Across species, repeat length ranges from 21 to 47 bp, with an average of 32 bp. Comprehensive studies of prokaryotic CRISPR loci that classified repeats by sequence found that most bacterial repeats are palindromic, whereas most archaeal repeats are not. Related species can have similar repeat sequences, but overall, both spacers and repeats are highly diverse across bacteria and archaea.
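If "palindromic" is read as dyad symmetry, meaning an inverted repeat that lets the transcribed repeat fold into a hairpin, a repeat can be screened with a simple stem search. The Python below is a minimal sketch under that assumption, not the classification method used in the studies mentioned above. The names `revcomp` and `find_stem` and the thresholds `min_stem` and `min_loop` are hypothetical choices; only exact Watson-Crick pairs are counted, whereas real analyses would also consider RNA folding.

```python
# Toy screen for dyad symmetry (an inverted repeat that could form a hairpin)
# in a CRISPR repeat. Thresholds are illustrative, not published values.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence; non-ACGT characters are not complemented."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_stem(repeat: str, min_stem: int = 5, min_loop: int = 3):
    """Return (left_start, right_start, stem_length) for the longest stem found,
    or None if no stem of at least `min_stem` bp exists.

    A stem means repeat[i:i+k] is the reverse complement of repeat[j:j+k],
    with at least `min_loop` nt of loop between the two arms.
    """
    s = repeat.upper()
    n = len(s)
    for k in range(n // 2, min_stem - 1, -1):  # try the longest stems first
        for i in range(n - 2 * k - min_loop + 1):
            arm_rc = revcomp(s[i:i + k])
            for j in range(i + k + min_loop, n - k + 1):
                if s[j:j + k] == arm_rc:
                    return (i, j, k)
    return None


if __name__ == "__main__":
    hairpin = "AAGCGCCTTTTGGCGCAA"  # synthetic: GCGCC, loop TTTT, then GGCGC
    print(find_stem(hairpin))       # → (2, 11, 5)
    print(find_stem("A" * 30))      # → None
```

Searching from the longest stem downward means the first hit is the strongest candidate hairpin, at the cost of a cubic-time scan, which is acceptable for sequences only a few dozen bases long.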
Fig 1. The structure of a CRISPR locus.
Spacers: The second feature is a set of non-identical spacer sequences of similar length that separate the repeats. Within a given locus, the spacers are uniform in length but highly variable in sequence. Across species, spacer length ranges from 20 to 72 bp. Analyses of bacterial, archaeal and viral genome sequences showed that the variable spacers are derived from viruses and confer resistance to the corresponding viruses. In bacteria, the spacers are taken from viruses (bacteriophages) that previously attacked the cell. They serve as a genetic memory that enables the cell to recognize these viruses and fight off future attacks: the CRISPR array is transcribed and processed into CRISPR RNAs (crRNAs), which guide the defense when the same virus attacks again.
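Given a known repeat sequence, the repeat-spacer-repeat layout means the spacers can be read off as the segments lying between consecutive repeat copies. This Python sketch assumes exact, error-free repeat copies (real arrays may contain imperfect repeats that dedicated CRISPR-finding tools tolerate). The functions `split_array` and `spacer_lengths_uniform`, the `tolerance` value and all sequences in the demo are hypothetical.

```python
def split_array(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`.

    Text before the first repeat (e.g. the leader) and after the last repeat
    is discarded, because only inner segments are flanked by repeats.
    """
    parts = array.upper().split(repeat.upper())
    return [p for p in parts[1:-1] if p]  # drop empty pieces from adjacent repeats


def spacer_lengths_uniform(spacers: list[str], tolerance: int = 2) -> bool:
    """True if there is at least one spacer and all lengths lie within `tolerance` nt."""
    lengths = [len(s) for s in spacers]
    return bool(lengths) and max(lengths) - min(lengths) <= tolerance


if __name__ == "__main__":
    leader = "ATATTTAAATTAAT"            # hypothetical A/T-rich leader fragment
    repeat = "GTTTCAGAGCTATGCTGGAAAC"    # hypothetical 22-nt repeat ending in GAAAC
    spacer_a = "ACGTACGTACGTACGTACGTAC"  # hypothetical spacers
    spacer_b = "TTGACCAGTTGACCAGTTGACC"
    array = leader + repeat + spacer_a + repeat + spacer_b + repeat
    spacers = split_array(array, repeat)
    print(spacers)                          # → ['ACGTACGTACGTACGTACGTAC', 'TTGACCAGTTGACCAGTTGACC']
    print(spacer_lengths_uniform(spacers))  # → True
```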
Leader: The third element is an adenine/thymine (A/T)-rich sequence of approximately 100-500 bp, termed the leader, that flanks the CRISPR array. The leader lacks coding potential, is conserved among the CRISPR loci of a genome, and is always found on one side of the array in a fixed orientation. Like the repeats, leaders can be up to 80% identical within a genome but differ considerably between species. Initial observations indicated that new spacers are inserted adjacent to the leader. More detailed analyses showed that the leader contains promoter elements and binding sites for putative regulatory proteins, which together control transcription of the array and spacer acquisition.
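Because new spacers are added at the leader end, spacer order within an array implies an acquisition history, with the newest spacer closest to the leader. The toy Python model below illustrates only that bookkeeping, under simplifying assumptions: each acquisition is modelled as inserting one repeat-spacer unit directly after the leader. The class `CrisprArray`, its methods and the demo sequences are hypothetical and do not model any specific Cas integration mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Toy locus: leader, then (repeat + spacer) units, then a terminal repeat.

    spacers[0] is the leader-proximal spacer. Because new spacers are modelled
    as entering next to the leader, index 0 is always the most recent one.
    """
    leader: str
    repeat: str
    spacers: list[str] = field(default_factory=list)

    def acquire(self, new_spacer: str) -> None:
        """Insert a spacer at the leader end; the array gains one repeat-spacer unit."""
        self.spacers.insert(0, new_spacer.upper())

    def sequence(self) -> str:
        """Assemble leader + (repeat + spacer) for each spacer + terminal repeat."""
        return self.leader + "".join(self.repeat + s for s in self.spacers) + self.repeat

    def repeat_count(self) -> int:
        """Number of repeat copies: one per spacer plus the terminal repeat."""
        return len(self.spacers) + 1


if __name__ == "__main__":
    locus = CrisprArray(leader="ATATTTAAAT", repeat="GTTTCAGAGCTATGCTGGAAAC")  # hypothetical
    locus.acquire("ACGTACGTACGTACGTACGTAC")  # older hypothetical spacer
    locus.acquire("TTGACCAGTTGACCAGTTGACC")  # newer spacer lands next to the leader
    print(locus.spacers[0])       # → TTGACCAGTTGACCAGTTGACC
    print(locus.repeat_count())   # → 3
```

Storing spacers leader-proximal first keeps the chronology explicit: reading the list from index 0 walks from the most recent acquisition back to the oldest.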
Cas genes: A variable cassette of so-called CRISPR-associated (cas) genes, which appear in varying orientation and order, forms the final building block of the CRISPR locus. The cas genes lie adjacent to the CRISPR array and encode the proteins that mediate the adaptive immune response. These proteins form a large group with functions ranging from nuclease and helicase activity to distinctive RNA-binding properties. The cas genes are exceptionally variable, which adds to the complexity of the system.
The CRISPR array is transcribed as a single long RNA precursor (pre-crRNA), which is processed to generate crRNAs. Transcription is generally unidirectional and initiates at the leader end of the locus. Cas proteins are necessary for processing the long CRISPR transcript, but the regulation of cas genes likely differs between bacterial species and strains. In the CRISPR-Cas systems described so far, resistance generally proceeds in three stages. In the first stage (adaptation), Cas proteins recognize and cleave exogenous DNA and integrate fragments of it into the CRISPR array as new spacers, next to the leader; the second stage (expression) involves the transcription and processing of crRNAs; and the third stage (interference) uses the crRNAs to guide Cas proteins to the matching foreign nucleic acid, which is then destroyed.
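The processing step can be sketched as cleaving the precursor at a fixed offset inside every repeat copy, so that each product carries one spacer flanked by repeat fragments, listed in transcription order starting from the leader end. In this Python sketch the cleavage offset `cut` is a hypothetical parameter (the actual cleavage site and the enzyme responsible differ between CRISPR-Cas types), the functions `repeat_positions` and `process_precursor` are illustrative names, and DNA letters stand in for RNA (T instead of U).

```python
def repeat_positions(seq: str, repeat: str) -> list[int]:
    """Start indices of non-overlapping exact copies of `repeat` in `seq`."""
    positions, pos = [], seq.find(repeat)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(repeat, pos + len(repeat))
    return positions


def process_precursor(pre: str, repeat: str, cut: int) -> list[str]:
    """Cleave `pre` `cut` nt into every repeat copy and return the pieces
    between consecutive cleavage sites. Each piece holds one spacer flanked
    by repeat fragments; order follows transcription from the leader end."""
    if not 0 <= cut <= len(repeat):
        raise ValueError("cut must fall within the repeat")
    sites = [p + cut for p in repeat_positions(pre, repeat)]
    return [pre[a:b] for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    leader = "ATATTTAAAT"               # hypothetical leader fragment
    repeat = "GTTTCAGAGCTATGCTGGAAAC"   # hypothetical 22-nt repeat
    spacers = ["ACGTACGTACGTACGTACGTAC", "TTGACCAGTTGACCAGTTGACC"]
    pre = leader + "".join(repeat + s for s in spacers) + repeat
    for crrna in process_precursor(pre, repeat, cut=14):  # cut=14 is arbitrary
        print(crrna)
```

With n repeat copies the sketch yields n - 1 products, one per spacer; the leader and the tail of the terminal repeat fall outside every product.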
Aside from their defensive role, CRISPR loci appear to have additional functions. For example, the regular nature of the repeats provides an opportunity for homology-driven genome rearrangements. Accordingly, many large inversions and translocations identified in two related Thermotoga species occur between CRISPR loci, which act as rearrangement hotspots rivaling tRNA genes. Perhaps even more interesting is the involvement of the CRISPR system in regulating endogenous cellular processes. Numerous studies indicate that the CRISPR-Cas pathway can be adapted to additional functions, but the precise mechanisms by which it controls cellular behavior, and whether and how this control is integrated with its defensive role, remain to be determined.