The Diversity of The Classification of Noncoding RNAs

Amanda CG*

Department of Internal Medicine, Federal University of Parana, Curitiba-PR, Brazil

*Corresponding Author:
Amanda CG
Department of Internal Medicine
Federal University of Parana
Curitiba-PR, 80.060-240, Brazil
Tel: +55-41-99833-0124
E-mail: amanda.ufpr@gmail.com

Received Date: January 17, 2019 Accepted Date: February 18, 2019 Published Date: February 25, 2019

Citation: Amanda CG (2019) The Diversity of The Classification of Non-coding RNAs. J Genom Gene Study Vol.2 No.1:1


Abstract

Regulatory RNAs, known as short RNAs (sRNAs) or non-coding RNAs (ncRNAs), modulate physiological responses through different mechanisms, such as RNA-RNA or RNA-protein interactions, and some of these interactions can be stabilized by the Hfq chaperone. These molecules are transcribed in trans or in cis relative to the targeted RNA. Their genes are located between the protein-coding regions, in the intergenic regions of the genome, and show signs of promoter and terminator sequences that are generally Rho-independent. The size of the ncRNA genes ranges from ~50 to ~500 nucleotides, and several transcripts are processed by RNases into smaller end-products. Riboswitches constitute another class of ncRNAs; they are located in the 5’UTR region of an mRNA and regulate transcription through their molecular interactions with ligands. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) regions, which are based on repeated palindromic sequences, have been recently described in prokaryotes. Each repeat is followed by a short segment of "spacer DNA" acquired from previous exposure to a bacteriophage or exogenous plasmid. CRISPR can be defined as an immune system of resistance to exogenous molecules.

Keywords

ncRNA; cis-encoded ncRNA; Trans-encoded ncRNA; Riboswitch; CRISPR

Introduction

Non-coding RNAs can be classified as short non-messenger RNAs (snmRNAs), small non-coding RNAs (ncRNAs), untranslated RNA molecules, or non-protein-encoding RNAs (npcRNAs) [1]. Short RNAs (small RNAs, sRNAs, non-coding RNAs or ncRNAs) are molecules that modulate physiological responses through different mechanisms involving RNA-RNA or RNA-protein interactions. Some interactions can be stabilized by the chaperone Hfq [2-4]. This often occurs in the class of trans-encoded ncRNAs, where the formation of the ncRNA:Hfq:mRNA complex may act positively or negatively on post-transcriptional regulation [5].

The most widely studied ncRNAs are the cis-encoded RNAs and the trans-encoded RNAs. The former are transcribed in cis, in natural antisense orientation relative to the targeted mRNA, and the latter are transcribed from genomic regions that are far from the targeted mRNA [6]. Hence, only the cis-encoded RNAs have perfect base pairing with their targets. Riboswitches constitute another class of ncRNAs; they are located in the 5’UTR region of an mRNA and regulate transcription through their interaction with a ligand molecule [7-9]. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) regions, which are based on repeated palindromic sequences, have been recently described [6,10]. Each repeat is followed by a short segment of "spacer DNA" acquired from previous exposure to a bacteriophage or exogenous plasmid. CRISPR can be defined as an immune system of resistance to exogenous molecules [11].

Computational tools are widely used for the prediction of ncRNAs, such as a) QRNA [12], which conducts a comparative analysis of genomes, and b) ISI [13], which analyzes conserved intergenic sequences between genomes for the identification of ncRNAs. The RNAz 2.0 program [14] is employed to analyse the thermodynamic stability of conserved RNA structures and the possible existence of a promoter and terminator in the predicted ncRNAs. The prediction tools sRNAPredict2 [15] and SIPHT [16] evaluate information on promoters and Rho-independent terminators obtained from databases. In addition, the large-scale experimental strategy involving cDNA sequencing (RNA-seq) has been widely used in the prediction of ncRNAs and is a very efficient method owing to its ability to confirm the expression of predicted ncRNAs [17-20]. The aim of this review is to provide an understanding of the role played by this type of RNA in the regulation of bacterial metabolism.

Non-coding RNAs

Scientific and technological advances have led to the discovery of the functions of what can be regarded as the main RNAs: a) the messenger RNA (mRNA), which transfers molecular information during protein synthesis; b) the ribosomal RNA (rRNA), a component of the structure that carries out protein synthesis; and c) the transfer RNA (tRNA), which is capable of transporting amino acids and interacting with proteins [21]. Several structural and functional studies of RNA have been carried out and have allowed new classes of RNA to be described, with various functions in the Archaea, Bacteria and Eukarya domains [22].

While genomics is concerned with analyzing genomes as coding sequences for mRNAs, rRNAs and tRNAs, RNomics further investigates the RNA-encoding genes that are untranslated but are functional and involved in different cellular processes [23,24]. Non-coding RNAs are involved in several cellular processes, such as the following: chromosome replication in cell division (diF), transcription regulation (6S RNA), RNA processing (mRNA), mRNA stability and translation (antisense sequence - spot42), protein stability (tmRNA) and transport (4.5S or ffs) [25]. There are also ncRNAs involved in oxidative stress (oxyS), stationary phase transcription (dsrA, rprA), the control of plasmid copy number (RNAI, RNAIII), carbon storage (csrBC) and carbon transport (gcvB) [24].

The class of ncRNAs with regulatory function is involved in several regulatory mechanisms, such as the modulation of the expression of outer membrane surface proteins (Omp) [26]. Some ncRNAs may bind to proteins as a means of modulating their activities, as is the case of the 6S RNA that forms a complex with RNA polymerase (RNAP) [2,24,27]. These molecules are transcribed in trans or in cis relative to the target RNA [4,28]. Their genes are located between the protein-coding regions, that is, in the intergenic regions of the genome, and display signs of promoter and terminator sequences that are generally Rho-independent [29-32].

The size of the ncRNA genes ranges from ~50 to ~500 nucleotides, and several transcripts are processed by RNases into smaller end-products [20,33-35]. The antisense pairing between the regulatory ncRNA and the target messenger RNA is the most common mechanism of action [36,37]. This interaction occurs in small regions where there is imperfect sequence complementarity, and can be stabilized by the Hfq chaperone protein [26,38,39].

Jager and collaborators [40] demonstrated that the interaction of ncRNA162 with its targeted RNA in Archaea can occur through cis-encoded or trans-encoded action, within two distinct domains. As in bacteria, ncRNAs in Archaea are involved in many biological processes, such as metabolic regulation, adaptation to environmental conditions, stress response, regulation of morphology, and cellular behavior. Babski et al. [41] identified ncRNAs that pair with the 5'UTR region of targeted mRNAs in Methanosarcina mazei Gö1, and ncRNAs that pair with the 3'UTR region of the targeted mRNAs in Sulfolobus solfataricus. In the case of Pyrobaculum sp. and Haloferax volcanii, there is already evidence that ncRNAs can pair with targeted mRNAs at both the 5'UTR end and the 3'UTR end. In addition, it has been noted that the ncRNA TRF downregulates translation at the ribosome binding site (RBS).

The mode of regulatory action taken by the ncRNAs depends on their location relative to their targeted mRNAs. They can be classified as trans-encoded when encoded far from their mRNA targets, and as cis-encoded when located at the target locus, for example overlapping the 5'UTR region of the target. The riboswitches, which are mainly found in the 5’UTR region, undergo conformational changes in their secondary structure owing to the binding of a ligand, and thereby regulate gene expression [27,34]. The mechanism of action of the ncRNAs varies in accordance with their function (Figures 1 and 2). In the post-transcriptional stage, the cis-encoded ncRNAs can cause mRNA degradation (Figure 1a), translation inhibition (Figure 1b), cleavage of the targeted mRNA through 5'UTR overlapping (Figure 1c) and transcription termination (Figure 1d) [42]. The interaction of the trans-encoded ncRNA with the targeted mRNA may result in translation inhibition, when pairing with the 5'UTR blocks the ribosome binding site (Figure 2a), or in degradation of the mRNA by RNase, triggered by the pairing (Figure 2b). The trans-encoded ncRNA may also prevent the formation of an inhibitory structure in its targeted mRNA that would otherwise block the ribosome binding site, thereby allowing translation (Figure 2c) [42]. Many cis-encoded RNAs with antisense orientation can also form binary complexes in trans with a targeted mRNA, which demonstrates that this is not a strict or definitive classification [43].


Figure 1: Mechanisms of cis-encoded action, in which the ncRNA is located in the genome close to the gene of its targeted mRNA. The cis-encoded ncRNAs are highlighted in yellow and their targets in blue. (1a) Degradation of mRNA. (1b) Inhibition of translation. (1c) Cleavage of mRNA. (1d) Transcription termination.


Figure 2: Mechanisms of trans-encoded action, in which the ncRNA is located in the genome far from the gene of its targeted mRNA. The trans-encoded ncRNAs are highlighted in red and their target mRNAs in blue. There is limited base pair complementarity. (2a) Inhibition of translation. (2b) Degradation of mRNA. (2c) Translation allowed.

The Cis-encoded ncRNAs

There are cis-encoded ncRNAs in different bacterial species, both Gram-positive and Gram-negative [39,43]. In E. coli, Wagner and Simons [44] described the antisense regulatory role played by RNA in the control of mRNA gene expression, phage maturation, transposition and plasmid replication. In addition, cis-encoded ncRNAs may be involved in regulating the initiation of replication, plasmid conjugation, transposition, mRNA degradation and some cell metabolism pathways. These functions continue to be the subject of investigation by different researchers [4,28,43,45].

The cis-encoded ncRNAs are encoded at the same locus as their targeted mRNA, but on the opposite (antisense) strand, and thus form sense-antisense duplexes that are fully complementary during the interaction. The mechanism of post-transcriptional gene regulation involves a high degree of sequence complementarity, and this was considered to be an indication that interaction with the Hfq protein would not be required [23,45]. However, some researchers have reported that Hfq does take part in cis-encoded ncRNA pairing [46-48]. In general, these ncRNAs act by pairing with the ribosome-binding site of the mRNA, which in turn inhibits translation [38,45].
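The full complementarity of a cis-encoded antisense transcript can be illustrated with a minimal Python sketch. The function names and the 16-nucleotide toy sequence are assumptions made for illustration only and do not correspond to any real gene or published tool.

```python
# Minimal sketch: a cis-encoded antisense RNA is the reverse complement of the
# sense transcript it overlaps, so every position forms a Watson-Crick pair.
# Helper names and the toy sequence are hypothetical.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(sense_rna: str) -> str:
    """Return the RNA read from the opposite strand (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(sense_rna.upper()))


def paired_fraction(mrna: str, ncrna: str) -> float:
    """Fraction of Watson-Crick pairs when ncrna lies antiparallel to mrna."""
    if len(mrna) != len(ncrna):
        raise ValueError("sequences must have equal length")
    pairs = sum(
        COMPLEMENT[m] == n for m, n in zip(mrna.upper(), reversed(ncrna.upper()))
    )
    return pairs / len(mrna)


mrna_region = "AGGAGGUAUCAUGGCU"        # toy region around a ribosome-binding site
cis_ncrna = antisense_rna(mrna_region)  # transcribed from the opposite strand
print(cis_ncrna, paired_fraction(mrna_region, cis_ncrna))
```

A single substitution in the antisense transcript lowers the paired fraction below 1.0, which is the situation typical of trans-encoded ncRNAs discussed later.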

An example of a typical cis-encoded ncRNA is the 5'ureB of Helicobacter pylori, located at the 5' end, antisense to the ureB gene that makes up the ureAB operon [49,50]. This ncRNA is 292 nucleotides long and negatively regulates ureAB operon expression by blocking translation in the 5' portion of ureB (Figure 3). The ureAB genes of H. pylori are located in the cluster of the two operons ureAB and ureIEFGH, and encode the UreA and UreB subunits of the urease apoenzyme. This enzyme is essential for the survival of H. pylori at low pH, since its reaction releases NH3 and HCO3- into the environment, thus allowing homeostasis for bacterial growth [49]. The diversity of the cis-encoded ncRNAs and their regulatory roles vary in accordance with the different organisms. For example, Salmonella enterica serovar Typhimurium possesses the cis-encoded ncRNA lesR-1, whose function is to control replication in eukaryotic cells [51], and Salmonella serovar Typhi possesses the cis-encoded ncRNAs AmgR, which is related to its virulence in rats [52], and AsdA, which regulates intracellular replication [53].


Figure 3: Model of the probable interaction of the Hfq protein with the ncRNA/mRNA pair.

The same diversity can be found with regard to regulatory strategies. In Escherichia coli, the base pairing between the cis-encoded ncRNA GadY and its targeted gadXW mRNA causes the cleavage of the duplex between gadX and gadW, and leads to increased levels of the gadX transcript. The GadX product acts as a transcription factor for gadA and gadB, which are required for the synthesis of glutamate decarboxylase, part of a system that counters acid stress in E. coli [6,42].

The SymR-SymE system in E. coli consists of two genes: the cis-encoded ncRNA symR and the symE gene, which encodes a toxic protein [4,6,38]. An increase in the cellular concentration of the SymE protein reduces the synthetic activity of the ribosomes. The cis-encoded ncRNA symR negatively regulates the expression of the symE gene through the complementarity of the ncRNA/mRNA bases, resulting in inhibition of symE mRNA translation and the resumption of synthetic ribosomal activity [6,38,42,50].

In Brucella abortus 2308, Peng et al. [54] identified the cis-encoded ncRNA BsrH, which positively regulates the expression of the hemH gene, thus providing evidence of the importance of the regulation that the ncRNA BsrH exerts on its targeted mRNA.

The cis-encoded antisense ncRNAs are also often found in the replication mechanism of plasmids. For example, in the replication control of the ColE1 plasmid, which uses ncRNA instead of proteins to initiate replication at the origin, two partially complementary RNAs are transcribed from opposite strands. The larger RNA, with 250-500 nucleotides (RNAII), is transcribed from the sense strand and forms a stable hybrid with the DNA template. This hybrid is then processed by RNase H to produce a primer for the DNA polymerase. The smaller RNA, which has 68-108 nucleotides (RNAI), is transcribed from the antisense strand and is complementary to the 5' region of RNAII. It functions as a negative regulator of primer formation by forming the RNAI/RNAII duplex, which prevents the formation of the RNAII/DNA template hybrid. The concentration of RNAI is proportional to the number of plasmids per cell, and thus constitutes a negative feedback loop that regulates plasmid replication in response to metabolic changes [37,55].

The Trans-encoded ncRNAs

As previously mentioned, the mode of action of trans-encoded ncRNAs differs from cis-encoded ncRNAs since it involves a limited sharing of base pairs with their targeted mRNAs [42,50,56]. This type of ncRNA is encoded in trans and may have targeted mRNAs at different locations in the genome.

The transcribed ncRNA generally requires the chaperone protein Hfq to stabilize the ncRNA-mRNA interaction, owing to the imperfect base pairing, and Hfq can thus prevent its eventual degradation by RNase [39,42,56,57]. The most widely studied chaperone is the Hfq protein, which in E. coli interacts with 40% of the ncRNAs [6]. Schoroeder et al. [58] found that almost 50% of all bacterial species possess trans-encoded ncRNAs that require the chaperone protein Hfq, one exception being Listeria monocytogenes, where most trans-encoded ncRNAs are independent of Hfq.

The interaction of Hfq with trans-encoded ncRNAs is involved in post-transcriptional regulation in several species of bacteria, and may have either a negative or positive effect on their mRNAs [59,60]. Crystallography studies have shown that Hfq is a hexameric protein that is homologous to Sm proteins and has two motifs (Sm1 and Sm2) [39,54]. Link et al. [61] characterized two RNA binding sites of Hfq in E. coli: a proximal site that binds to the ncRNA and the targeted mRNA, and a distal site that binds to the poly(U) tail (Figure 3) [62].

On the basis of Fluorescence Resonance Energy Transfer (FRET) studies, Peng et al. [54] reported structural models in which Hfq interacts with PAP I, PNPase and RNase E. Soper et al. [63] described three regulatory ncRNAs in E. coli, DsrA, RprA and ArcZ, that positively regulate the translation of the RpoS sigma factor when pairing occurs between the ncRNA and the rpoS mRNA. They detected the formation of an inhibitory hairpin in the 5'UTR region and demonstrated that binding to Hfq is important to ensure the stability of the RNA:RNA complex. In negative regulation, base pairing occurs between the ncRNA and the Ribosome Binding Sequence (RBS), which blocks ribosome binding or leads to degradation by RNases.

In E. coli, regulation of OmpC protein expression involves the MicF ncRNA and a stretch of 22 nucleotides that is antisense to the 5'UTR of the ompC mRNA [64]. This interaction also involves Hfq and results in translation inhibition. In Salmonella typhimurium, it is the MicC ncRNA, combined with the Hfq protein, that silences the ompD mRNA through a duplex of 12 RNA/RNA base pairs in the protein-coding region. At this downstream position MicC does not inhibit translation initiation, but accelerates RNase E-dependent degradation [65,66].

In Salmonella enterica serovar Typhimurium, the trans-encoded ncRNA IsrM controls the pathogenicity factor SPI-1 [67], whereas RybB-1 and RybB-2 are linked to the regulation of the oxidative stress response [68]. In Salmonella enterica serovar Typhi, the trans-encoded ncRNAs RfrA and RfrB play a key role in the regulation of iron homeostasis [69]. In Chlamydia trachomatis, the trans-encoded ncRNA IhtA acts as an inhibitor of the histone Hc1 protein, whereas in Neisseria meningitidis, the trans-encoded ncRNA NrrF regulates iron homeostasis [70-72]. Other examples of trans-encoded ncRNAs are AbcR-1 and AbcR-2 in Brucella abortus, which are related to virulence in rats and are important for survival in macrophages [73], and Mcr7 in Mycobacterium tuberculosis, which is involved in the regulation of the TAT secretion system [74].

Riboswitches

Riboswitches are structured elements of non-coding RNA that are considered to be cis-encoded; they are mainly located in the 5'UTR region of a targeted mRNA and less frequently in the 3'UTR [7,42,50]. However, Loh et al. [75] described a case of a riboswitch that controls in trans the expression of the virulence-regulating PrfA protein in Listeria monocytogenes. Riboswitches have the ability to control gene expression at the level of transcription and translation because the RNA can adopt different conformations in response to environmental signals, such as high temperatures and the binding of small molecules, such as metabolites or metal ions [9,42,76]. A wide range of riboswitches have recently been detected in prokaryotes. In Bacillus subtilis, 2% of all genes are regulated by riboswitches that bind to intracellular metabolites, such as Flavin Mononucleotide (FMN), thiamine pyrophosphate, S-adenosyl-methionine (SAM), lysine and guanine (Figure 4) [43,50,77,78].


Figure 4: Riboswitch structural arrangement and the regulatory function of ligand binding. The aptamer region (pink) and the expression platform (yellow) on the 5’UTR of the respective mRNA (blue) form the riboswitch. Ligand binding has a regulatory function in the processes of transcription (a and b) and translation (c and d).

The structure of the riboswitch consists of two parts: the aptamer region, which serves as the binding site for a ligand, and the expression platform, which adopts a suitable conformation for the signal transduction (Figure 4) [42,79]. The binding of the ligand causes conformational changes in the native riboswitch structure, which can regulate the transcription and translation processes [7]. The sequence of the expression platform folds differently depending on whether or not the aptamer domain is bound to the ligand, and may signal transcription termination or control the helical structure at the ribosome-binding site [7,80].

Thus, riboswitches are used to regulate the termination of mRNA transcription (attenuation) and the initiation of translation [81]. During transcription, when the ligand binds to the mRNA in the aptamer region, conformational changes occur that result in the formation of an alternative hairpin (Figure 4a) [42]. This hairpin acts as a transcription terminator that inhibits gene expression. In other cases, the binding of the ligand molecule leads to an alternative hairpin that causes anti-termination (Figure 4b).

Moreover, the binding of a ligand can cause structural changes that lead to RBS sequestration and prevent translation (Figure 4c). In contrast, the binding of a ligand may expose the RBS to the ribosome and induce translation (Figure 4d).

Riboswitches may play an important role in biological cell systems because of the large number of gene families involved. Corbino et al. [82] found methA motifs in Agrobacterium tumefaciens that resemble the S-adenosylmethionine (SAM) riboswitch. The SAM-II riboswitch class has great structural diversity and low conservation, and is able to alter the conformational structure of the mRNA [7,83-85].

The structure of the riboswitch may contain several motifs that control gene expression by detecting particular metabolite concentrations, which makes these structures promising targets for antibiotics. Suresch et al. [86] have shown that the S-adenosylmethionine riboswitch-III, which is found in anaerobic bacteria, is involved in the regulation of the methionine and SAM biosynthetic pathway. In either the presence or absence of the ligand, S-adenosylmethionine riboswitch-III plays a dual role: it facilitates the conformational change between the partially and fully folded states, and it forms a stable duplex structure, which strengthens the interactions between the Shine-Dalgarno (SD) and anti-Shine-Dalgarno (aSD) nucleotides.

Perez et al. [87] employed an in vivo gene expression system to validate the cobalamin riboswitch of the cyanobacterium Synechococcus sp. strain PCC 7002 and concluded that methionine biosynthesis is probably the only use of cobalamin in this strain of cyanobacteria.

The ncRNAs That Induce Protein Activity

The first ncRNAs characterized in the enterobacterium E. coli include 4.5S, 6S, tmRNA (transfer-messenger RNA) and Spot42 [42,88]. In E. coli, Spot42 RNA was discovered almost 40 years ago as a small and unstable RNA molecule encoded by the spf gene, which is also present in Vibrionaceae, a family of γ-proteobacteria. Aliivibrio salmonicida encodes a Spot42 ncRNA with 84% identity to that of E. coli. Deletion of the spf gene results in a 25% decrease in the activity of DNA polymerase I (Pol I) [89]. This ncRNA optimizes carbon source uptake and metabolism by binding to the targeted mRNAs related to this metabolism, such as galK of the galactose operon. The ncRNA Spot42 is an example of an ncRNA that acts in a traditional way through the ncRNA/mRNA interaction, but there are cases, such as CsrB and the 6S RNA, in which the ncRNA regulates the activity of a targeted protein and usually has specific recognition sequences [39].

The E. coli CsrB ncRNA has 22 GGA sequences that are regarded as binding sites for the CsrA protein [88]. By binding to the protein, it is able to regulate mRNA stability and translation. The CsrA protein regulates carbon storage [90] through its binding to the mRNAs and is the essential component of the Csr regulatory system. This system is responsible for a) repressing a wide range of stationary-phase genes, and b) negatively regulating gluconeogenesis, glycogen biosynthesis and catabolism, and biofilm formation. In addition, it activates the glycolytic pathway, acetate metabolism and flagellum biosynthesis. CsrA acts post-transcriptionally by repressing the gene expression of enzymes that are essential for the metabolism of carbohydrates, such as ADP-glucose pyrophosphorylase (glgC), glycogen synthase (glgA), glycogen branching enzyme (glgB), and glycogen phosphorylase (glgP).

Thus, there is a decrease in the intracellular levels of glycogen biosynthetic enzymes and a reduction of glycogen synthesis [91]. CsrA destabilizes its targeted mRNAs by binding to the region between nucleotides -18 and +3, which includes the RBS, thus preventing the translation of the mRNA and leading to its degradation by RNase. The CsrB ncRNA competes directly with the mRNA targets of CsrA, since the dimeric protein CsrA recognizes the GGA motif in the structure of both the CsrB ncRNA and the targeted mRNA. The CsrC and CsrB ncRNAs in E. coli modulate the activity of CsrA and the binding of the targeted mRNA to the protein [39,42]. Thus, the intracellular levels of free CsrA are also regulated by the CsrB and CsrC ncRNAs, which act as antagonists that capture CsrA. In Erwinia spp., the RsmA protein, which is homologous to CsrA, regulates several genes involved in plant disease. In Pseudomonas aeruginosa, Csr (Rsm) controls several systems, such as Rhl, and several virulence factors [92].
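As a rough illustration of how candidate CsrA-binding sites could be located, the following Python sketch lists every GGA occurrence in an RNA sequence. The function name and the toy sequence are assumptions; the sequence is not CsrB, and a real analysis would also consider the structural context of each motif (for example, whether it lies in a loop).

```python
import re


def find_motif_positions(rna: str, motif: str = "GGA") -> list[int]:
    """Return 0-based start positions of every motif occurrence.

    A zero-width lookahead is used so that overlapping occurrences are
    also reported.
    """
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(rna.upper())]


toy_rna = "ACGGAUUGGAGGACC"  # hypothetical sequence, not the real CsrB
sites = find_motif_positions(toy_rna)
print(len(sites), sites)
```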

Hindley identified 6S RNA in 1967 in Escherichia coli [25]. Since then, 6S RNA has been predicted through computational tools and experimentally analyzed in several bacterial species [93]. Wehner et al. [94] analyzed 1,611 bacterial genomes and determined a set of 1,750 6S RNA genes, 1,367 of which were new. In the Rfam database, the 6S RNA is described by two entries: the first, RF00013, with 153 sequences, and the second, RF01685, with 89 sequences. It has been demonstrated in E. coli that 6S RNA (the ssrS gene) can act in the transcription process by binding to the sigma 70-dependent RNA polymerase (6S RNA:RNAP), so that it can modulate its function. This ncRNA sequesters almost all of the RNAP holoenzyme in late stationary phase and thus assists in the transcriptional adaptation to the stationary phase of growth [87,93,94]. With regard to the position of 6S RNA in the genome, there is a syntenic pattern for the Bacteria domain, and this pattern is most closely followed in Gamma-Proteobacteria. Among the common syntenic genes are the following: ygfA (5-formyltetrahydrofolate cyclo-ligase), zapA (ZapA protein), AAA (+) (ATPase superfamily) and peptidase_M24 (PF00557 family of metallopeptidases) [95,96]. Gutiérrez et al. [92] identified the ZapA protein in Bacillus subtilis together with the FtsZ-tubulin protein.

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)

In 2007, Barrangou and colleagues [97] demonstrated that resistance to phage infection in bacterial cells changes when spacer sequences similar to the sequences of invading phages are removed or added. This finding defined an adaptive immune defense system that uses short RNA and is called Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR). The CRISPR locus contains hundreds of spacer sequences (~26-72 nucleotides) in the genome, flanked by repeats of 21 to 47 nucleotides. They are usually combined with Cas proteins to form the CRISPR system, which confers RNA-mediated resistance to nucleic acids from, for example, bacteriophages, plasmids or mobile genetic elements [1,43,98,99]. The CRISPR locus can be found in ~40% of bacterial genomes and 90% of archaeal genomes [42,100,101]. Most of the prokaryotes containing CRISPR have multiple groups of 2 to 20 loci, each arranged in tandem and containing up to 100 identical repeats of between 25 and 50 base pairs [101]. Cas proteins provide the enzymatic machinery required to acquire new spacers and create recognition marks for invading elements [102].
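The repeat-spacer organization described above can be sketched in Python by splitting an array on its repeat. This is a simplified illustration that assumes perfectly identical repeats and a known repeat sequence; the function name and all sequences are invented, and dedicated CRISPR finders tolerate degenerate repeats.

```python
def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive copies of the repeat.

    Assumes identical repeats; the text before the first repeat (leader) and
    after the last repeat (trailer) is discarded.
    """
    pieces = array_seq.split(repeat)
    return [piece for piece in pieces[1:-1] if piece]


# Toy array built from hypothetical parts (22-nt repeat, 26-nt and 28-nt spacers).
repeat = "GATCGATCCCCGGGGATCGATC"
spacer_1 = "ACGTACGTACGTACGTACGTACGTAC"
spacer_2 = "TTGACCTTGACCTTGACCTTGACCTTGA"
toy_array = "AAATTT" + repeat + spacer_1 + repeat + spacer_2 + repeat + "CCCAAA"

spacers = extract_spacers(toy_array, repeat)
print([len(s) for s in spacers])
```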

The arrangement and processing of CRISPR RNAs involves a number of phases (Figure 5) [102]:


Figure 5: Model of the CRISPR/Cas system. (a) Adaptation: phage DNA is integrated into the CRISPR array, forming a new spacer. (b) Expression: processing by the Cas proteins. (c) Interference: invasion of the targeted DNA and its degradation.

i) Adaptation: invasive DNA is integrated into the CRISPR locus, resulting in a new spacer in the array (Figure 5a);

ii) Expression: transcription of the array containing the phage-derived sequence (new spacer) and its processing by the Cas proteins to form crRNAs (each crRNA is formed of a single spacer sequence flanked by short repeats), together with the expression of the adjacently encoded Cas proteins (Figure 5b);

iii) Interference: the crRNA/Cas protein complex is formed and later invades the targeted DNA, which leads to its degradation (Figure 5c).

CRISPR may be expressed in species such as Legionella pneumophila, where the sRNA is a Cas2-dependent crRNA involved in the stress response [103-105]. In Listeria monocytogenes, the RliB-CRISPR sRNA also makes use of the CRISPR/Cas system in antiviral resistance [106,107]. Francisella novicida possesses the ncRNAs Cas9-dependent crRNA, tracrRNA and scaRNA, with a CRISPR/Cas role in the regulation of endogenous virulence factors [105]. According to Marraffini [108], the strains of Streptococcus pyogenes have the CRISPR/Cas9 system, whose function is to mediate the anti-phagocytic resistance mechanisms. In Campylobacter jejuni PT14, the CRISPR/Cas14 system protects against the invasion or presence of bacteriophages [109].

Amitai and Sorek [110] studied the mechanisms involved in CRISPR-Cas adaptation: once the exogenous DNA has been identified by the CRISPR-Cas system, it is integrated into the genome of the host cell, thus creating an immunological memory, which is a natural ability based on the information contained in the DNA. This kind of system could be manipulated to create new means of information storage in living organisms, something that has been extensively investigated in recent years.

Prediction of ncRNAs and Their Targets

Bioinformatics is devoted to designing computer programs for the handling of biological data, including the identification of gene sequences, the prediction of three-dimensional protein structure, the identification of enzyme inhibitors, the clustering of proteins, the establishment of phylogenetic trees and the analysis of gene expression experiments [111]. It provides the tools required for Genomics, Transcriptomics, Proteomics and Metabolomics [112].

With regard to the prediction of ncRNAs, several computational approaches have been adopted for the identification of genes in the intergenic regions of prokaryotic genomes [113,114]. Many of these are based on the search for transcriptional signals, such as conserved promoter sequences, Rho-independent terminators and transcription factor binding sites. One example is sRNAPredict [115], an algorithm that predicts putative ncRNAs using data from TransTermHP [116,117] or TRANSFAC [118]. sRNAPredict3 and SIPHT are more recent computational versions for the prediction of ncRNAs in bacteria [16]. sRNAscanner and sRNAfinder [119] were designed to overcome the limited predictive capacity of the available transcription signals across whole genomic sequences, and proved to be efficient.

nocoRNAc (non-coding RNA characterization) is a computational tool that was developed to study ncRNA-mRNA interactions in conjunction with the prediction of ncRNAs in bacterial genomes. This program uses transcription termination signals that are predicted through the TransTermHP tool, while the promoters are identified by the Stress Induced Duplex Destabilization (SIDD) model, which determines the regions that can be destabilized [120,121]. Thus, it is possible to detect the regions of the genome flanked by a promoter sequence and a Rho-independent terminator sequence, which are the candidates selected to encode ncRNAs. In an alternative approach, the Cufflinks tool locates regions of the genome that have considerable levels of transcription and lack ORFs [122].
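The idea of selecting regions flanked by a promoter and a Rho-independent terminator can be sketched as follows. This is not the nocoRNAc implementation: the function, the coordinates and the forward-strand-only simplification are assumptions for illustration, and the 50-500 nucleotide window is taken from the size range quoted earlier in this article.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    start: int  # predicted promoter position
    end: int    # predicted terminator position


def candidate_ncrna_regions(promoters, terminators, intergenic,
                            min_len=50, max_len=500):
    """Pair each promoter with the nearest downstream terminator in the same
    intergenic region and keep pairs whose span fits the ncRNA size range.

    Forward strand only; all positions are hypothetical genome coordinates.
    """
    candidates = []
    for igr_start, igr_end in intergenic:
        proms = sorted(p for p in promoters if igr_start <= p <= igr_end)
        terms = sorted(t for t in terminators if igr_start <= t <= igr_end)
        for p in proms:
            downstream = [t for t in terms if t > p]
            if downstream and min_len <= downstream[0] - p <= max_len:
                candidates.append(Candidate(p, downstream[0]))
    return candidates


regions = candidate_ncrna_regions(
    promoters=[1050, 1700, 5010],
    terminators=[1300, 1790, 5030],
    intergenic=[(1000, 1800), (5000, 5200)],
)
print(regions)
```

In this toy input the third promoter-terminator pair spans only 20 nucleotides and is therefore discarded.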

Comparative genomic analyses have also been conducted to predict new ncRNAs: conserved sequences are first identified in the intergenic regions (IGRs) and grouped by clustering, and the resulting multiple alignments are then assessed and classified as ncRNAs. Programs such as QRNA [12], ERPIN [123], ISI [13], INFERNAL 1.1.1 [124], MSARI [125] and RNAz [126] evaluate such alignments by predicting conserved, thermodynamically stable RNA structures, and have been applied to ncRNAs in bacteria [14,127]. In the case of secondary structures, RNAfold carries out a statistical analysis of RNA folding by perturbing the thermodynamic equilibrium parameters in order to assess its predictive capacity [119,128,129].
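To give a feel for what secondary structure prediction computes, the sketch below uses the classic Nussinov base-pair maximization recurrence. This is a deliberately simplified stand-in: RNAfold and the programs listed above rely on thermodynamic energy models, not on pair counting, and the function name `max_base_pairs` and the minimum loop length of 3 are assumptions chosen for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (Nussinov recurrence).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    # A small hairpin: three G-C pairs closing an AAA loop.
    print(max_base_pairs("GGGAAACCC"))
```

A count of pairs is only a rough proxy for stability; energy-based tools additionally weigh stacking, loop sizes and sequence context.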

With regard to target prediction, it is very important to develop models that integrate bioinformatic prediction with experimental validation to confirm mRNA targets. The classification of ncRNAs provides information on their base complementarity (whether perfect or imperfect) with targeted mRNAs and on their eventual binding to proteins, which alters protein activity [2,130-133]. TargetRNA2 is currently one of the most widely used mRNA target prediction tools [134]. It employs various features, including a) the conservation of the ncRNA in other bacteria, b) the secondary structure of both the ncRNA and each candidate target, and c) the hybridization energy of the interaction. It can also integrate RNA-seq data when available.

Another computational approach used to predict targeted mRNAs is the IntaRNA tool [135], which is also regarded as efficient and rapid in predicting ncRNA-mRNA interactions. It uses the free energy of hybridization and is integrated with the CopraRNA tool, which bases its predictions on a comparison of the query sequence with the sequences available in the program [135,136]. Other approaches include direct detection by means of microarrays and northern blotting [3,30,111,137,138].
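The role of base complementarity in target prediction can be sketched with a toy scan for the longest contiguous antiparallel duplex between an sRNA and an mRNA. This is not the algorithm used by TargetRNA2 or IntaRNA, which score hybridization energy and site accessibility; the function name `longest_duplex`, the restriction to Watson-Crick pairs and the example sequences are assumptions made for illustration.

```python
from typing import Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_duplex(srna: str, mrna: str) -> Tuple[int, int, int]:
    """Longest contiguous antiparallel Watson-Crick duplex.

    Returns (length, srna_start, mrna_start) with 0-based starts,
    or (0, 0, 0) if no base of the sRNA can pair with the mRNA.
    """
    rev = srna[::-1]  # read the sRNA 3'->5' so it aligns antiparallel to the mRNA
    best = (0, 0, 0)
    prev = [0] * (len(mrna) + 1)
    for i in range(1, len(rev) + 1):
        cur = [0] * (len(mrna) + 1)
        for j in range(1, len(mrna) + 1):
            if (rev[i - 1], mrna[j - 1]) in WATSON_CRICK:
                cur[j] = prev[j - 1] + 1  # extend the duplex ending at (i-1, j-1)
                if cur[j] > best[0]:
                    # Convert the reversed index back to sRNA coordinates.
                    best = (cur[j], len(srna) - i, j - cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Toy sequences: the sRNA is the reverse complement of "GGCUCA".
    print(longest_duplex("UGAGCC", "AAAAGGCUCAAAA"))
```

A perfect contiguous match corresponds to the "perfect complementarity" case; imperfect pairing with bulges and wobble pairs requires an energy-based model of the kind the dedicated tools implement.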

Pánek et al. [139] predicted ncRNAs in Streptomyces on the basis of sequence conservation in intergenic regions, the location of transcription terminators, and the genomic arrangement of syntenic genes flanking the ncRNAs. They detected the expression of 20 ncRNAs by microarray and RT-PCR, and adopted a computational approach to determine their secondary structure; as a result, they identified 6S ncRNA. Voss et al. [140] used a cyanobacterial model to predict ncRNAs from transcriptome and proteome data and identified the Yfr2a-Yfr2c ncRNAs, a conserved structure that can be found among cyanobacteria. To predict 5' operon leaders, Rho-independent 3' transcription terminators and riboswitches, these authors used the TransTermHP, ClustalW, RNAz and RNAfold computational tools, with validation by northern blot [141].

Modi et al. [137] carried out a functional characterization of ncRNAs in E. coli through network inference, based on a compendium of gene expression profiles, with functional prediction and analysis of the regulatory interactions of ncRNAs. These authors experimentally validated the functions attributed to three ncRNAs: IsrA and GlmZ, involved in the DNA damage response, and GcvB, involved in the regulation of amino acid availability.

Khoo and collaborators [142] integrated several computational methods for the prediction and analysis of ncRNAs and identified 29 in Burkholderia pseudomallei, among which 8 were believed to be new. Ignatov and colleagues [143] used material and transcriptomic analysis to reveal that Mycobacterium avium and M. tuberculosis contain different sets of ncRNAs in the intergenic regions and suggested that this characteristic may be the basis of the observed physiological differences between the two species. Schroeder et al. [20] observed that in the genus Rickettsia, ncRNAs are the main post-transcriptional regulators involved in virulence, survival, plasmid expression, and primary and secondary metabolism. This means that the genus can presumably encode trans-encoded ncRNAs involved in the pathogen-host interaction.

Conclusion and Future Perspectives

Knowledge of the post-transcriptional regulation that an ncRNA exerts on a given targeted mRNA depends on the position of the ncRNA:mRNA pairing and on the sRNA class under study. There are different classes of sRNAs, and it is necessary to know how these regulate several mechanisms by forming a regulatory network, which plays a key role in the regulatory circuit of the genome under study. There have been considerable advances in computational technology aimed at providing knowledge of the role and functionality of non-coding RNAs in the various domains of life. The contribution of noncoding RNAs in bacteria is a relatively new area of research; recent findings show that their expression and function are frequently studied in the interaction between plant and bacterium, mainly in diazotrophs, which are of socioeconomic importance. Experimental approaches are still needed to provide a detailed biochemical understanding of the macromolecular modulations induced by post-transcriptional modifications. This will open new avenues of research toward better disease prognosis and therapeutic interventions.

Funding

This study was financed in part by the Coordination for the Improvement of Higher Education Personnel, Brazil (CAPES), Finance Code 001.

References
