Department of Biochemistry, Faculty of Medicine, Sbratha University, Sbratha, Libya
Received: February 18, 2022, Manuscript No. IPGMBR-22-11455; Editor assigned: February 21, 2022, PreQC No. IPGMBR-22-11455 (PQ); Reviewed: March 07, 2022, QC No. IPGMBR-22-11455; Revised: March 11, 2022, Manuscript No. IPGMBR-22-11455 (R); Published: March 18, 2022, DOI: 10.36648/ipgmbr.22.6.68
Citation: Alrouwab OS, Algblawi EB, Kareem MB, Aboujildah SA, Allafi MA et al. (2022) Zenobia: CODIS 13 STR Loci Allele Detection Tool. Genet Mol Biol Vol:6 No:1
Short Tandem Repeats (STRs) are among the most mutable regions of the human genome. They comprise tandemly repeated DNA sequences with repeat units two to six base pairs long. Owing to their high mutation rate, they vary considerably among populations, yet they are inherited from generation to generation. These loci are widely used in medicine, biology, and criminal investigation. They play a pivotal role in the development of a variety of genetic disorders and have been intensively investigated in forensics, population genetics, and genetic genealogy. Although many tools for handling STR loci are available, the overwhelming majority rely on Command-Line Interface (CLI) input and frequently require supporting tools written in various scripting languages. Installing and launching programs from the command line is time-consuming or impractical for many students and scholars. The main aim of this project is to develop a cross-platform Graphical User Interface (GUI) package for Combined DNA Index System (CODIS) STR analysis. Zenobia is a Java-based application intended as a step towards making command-line-only programs accessible to more students and researchers. Overall, Zenobia's results satisfied the evaluation metrics for efficiency and run time. However, more genetic markers should be added to broaden the usefulness of the application.
Short tandem repeats; Java; Combined DNA index system; Command line; Forensics
Genetic fingerprinting is a revolutionary technology that has drastically influenced forensic medicine and permanently changed the handling of forensic evidence. DNA fingerprinting (also called DNA profiling or forensic genetics) uses comparative analysis of DNA to resolve legal questions, including paternity testing, the identification of individuals in criminal proceedings where biological evidence is recovered from crime scenes, and the identification of victims of major disasters from their remains [2-3]. In the mid-1980s, a research team at the University of Leicester, UK, led by Sir Alec Jeffreys, the founder of DNA fingerprinting, opened the era of DNA as forensic evidence. Microsatellite, or Short Tandem Repeat (STR), markers have been the most extensively used approach for generating DNA profiles. They are ubiquitous throughout the genome and occur on average every 6-10 kb [6-7]. Owing to their density, polymorphism, and suitability for PCR amplification, STRs are regarded as reliable markers for genomic mapping and genetic linkage analysis [8-9]. DNA profiling based on PCR amplification of STRs is more sensitive than traditional methods. In addition, their small allele size (typically <300 bp) makes the STR system more likely to succeed with older or poorly preserved samples containing only degraded DNA [10-12]. It has been over two decades since the FBI Laboratory selected thirteen STR genetic markers for what is now known as the Combined DNA Index System (CODIS) [13-15]. The CODIS loci used in the US are TPOX, VWA, D3S1358, CSF1PO, FGA, TH01, D13S317, D16S539, D18S51, D5S818, D7S820, D8S1179, and D21S11.
These loci have become the common currency of data exchange for human identity testing in both forensic casework and paternity testing, owing to their availability in commercial STR kits [17-18]. Handling profile sequence data is a struggle for many students and researchers. Although a wide range of programs capable of analyzing STR loci is available, most of them rely on Command-Line Interface (CLI) commands or are not specifically aimed at the DNA markers used in forensic investigations. Moreover, they often depend on a set of complementary tools implemented in various scripting languages [20-23]. Legacy applications for finding tandem repeats within a sequence include the following. Mreps, presented by Roman Kolpakov and Gregory Kucherov (2003), is a sophisticated program for detecting tandemly repeated structures in DNA sequences. It can detect all types of tandem repeats in a single run over an entire genomic sequence, and it has a resolution parameter that enables it to detect 'fuzzy' repeats. Marco Pellegrini and Alessio Vecchio (2010) developed TRStalker, an algorithm for discovering Tandem Repeats (TRs) that are hard to identify owing to their fuzziness, which results from high rates of base substitutions, insertions, and deletions. In 2010, Rafal Pokrzywa and Andrzej Polanski introduced the Burrows–Wheeler Tandem Repeat Searcher (BWTRS), a web-based utility that scans DNA sequences for tandem repeats; BWTRS adopts the block-sorting compression algorithm (the Burrows–Wheeler transform). In this paper, we present a novel tool capable of detecting CODIS loci and determining their allele numbers from sequences stored in plain-text FASTA format.
Zenobia (Figure 1) is a Java-based Graphical User Interface (GUI) tool for detecting CODIS 13 alleles, released under the GNU General Public License. The source code is freely available on GitHub.
Zenobia's core dataset was imported in September 2021 from STRBase, a public database provided by the National Institute of Standards and Technology (NIST). Only data for the CODIS 13 STR markers were selected, namely CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11 (Table 1).
|Locus||Repeat motif||Repeat category||Chromosome location||Allele range|
Table 1: Common STR loci.
Case scenario and input files
A paternity dispute case from the Arab Republic of Egypt in 2012, documented by Mr. Sherif H. El-Alfy and based on allele matches at the CODIS 13 STR loci between a child, the mother, and the alleged father (a trio case), was used to simulate and construct dummy profile files (Table 2).
|STR locus||Child||Mother||Alleged father|
|D5S818||13, 13||12, 13||10, 13|
|D13S317||8, 10||10, 13||8, 8|
|D16S539||12, 12||12, 13||11, 12|
|CSF1PO||11, 12||11, 12||12, 12|
Table 2: Typing results of 13 autosomal STR loci analysis.
Because the program supports only the plain-text FASTA file format, we evaluated its performance using generated dummy files that contain random sequences combined with real allelic variant sequences. These sequences were imported from the Entrez database provided by the National Center for Biotechnology Information (NCBI) for locus-specific information (Table 3).
|Locus||Allele one||Allele two||Allele one||Allele two||Allele one||Allele two|
|Allele no.||Accession||Allele no.||Accession||Allele no.||Accession||Allele no.||Accession||Allele no.||Accession||Allele no.||Accession|
Table 3: Allele number and accession used to evaluate the performance.
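The dummy profiles described above are plain-text FASTA files, so reading them amounts to splitting the text into header lines (starting with `>`) and sequence lines. The following is a minimal sketch of such a reader, not Zenobia's actual source: the class name `FastaReader`, the method `parse`, and the demo record are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not taken from Zenobia): parses FASTA text into records. */
final class FastaReader {

    /** Returns a map of header (without '>') to upper-cased sequence, in input order. */
    static Map<String, String> parse(String fastaText) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder sequence = new StringBuilder();
        for (String rawLine : fastaText.split("\\R")) {   // \R matches any line break
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;                                  // ignore blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, sequence.toString());
                }
                header = line.substring(1).trim();         // start a new record
                sequence.setLength(0);
            } else if (header != null) {
                sequence.append(line.toUpperCase());       // join wrapped sequence lines
            }
        }
        if (header != null) {
            records.put(header, sequence.toString());      // flush the last record
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up record; a real file could be loaded with java.nio.file.Files.readString.
        String demo = ">demo_profile\nacgtac\nGTTGCA\n";
        System.out.println(parse(demo));
    }
}
```

Sequence lines that appear before the first header are skipped in this sketch; a stricter reader might reject such input instead.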
The benchmarks were carried out on personal computers equipped with an Intel Core i5-3470 processor (3.20 GHz), 16 GB of RAM, and 64-bit Ubuntu Linux 20.04.3. Zenobia was written in the Java programming language using Oracle Java SE Development Kit 11 and Apache NetBeans IDE 12.1.
Detection of the allelic type for each STR gene
Zenobia uses a brute-force algorithm that matches stored allele patterns against the input sequence to detect locus names and allele numbers (Figure 2).
A total of 78 alleles were included in the experiment, 61.5% of which belonged to the simple STR subgroup, while 30.8% and 7.7% belonged to the compound and complex STR subgroups, respectively. Across the genotypes listed below, the allele numbers in each of the three profiles (child, mother, and alleged father) ranged from 8 to 30. The observed genotype for the child profile was D3S1358 (15,17), D5S818 (13,13), D7S820 (8,10), D8S1179 (11,12), D13S317 (8,10), D16S539 (12,12), D18S51 (16,16), D21S11 (30,30), FGA (23,24), TH01 (9,9), TPOX (8,8), VWA (18,20), CSF1PO (11,12). The mother profile showed D3S1358 (15,16), D5S818 (12,13), D7S820 (10,10), D8S1179 (12,13), D13S317 (10,13), D16S539 (12,13), D18S51 (16,17), D21S11 (29,30), FGA (20,24), TH01 (8,9), TPOX (8,8), VWA (14,18), CSF1PO (11,12). Finally, the alleged father profile showed D3S1358 (17,18), D5S818 (10,13), D7S820 (8,10), D8S1179 (11,13), D13S317 (8,8), D16S539 (11,12), D18S51 (15,16), D21S11 (29,30), FGA (21,23), TH01 (8,9), TPOX (8,8), VWA (17,20), CSF1PO (12,12).
The purpose of this study was to develop a multi-platform, user-friendly, open-source allele detector for the CODIS 13 STRs. Because of the importance of STR loci, many methods for locating short tandem repeats in DNA sequences have been developed. Some tools are out of date, and a handful are no longer accessible. There are, however, several programs available that run either on the command line or as standalone web services. In this section, several tools are surveyed for their ability to detect STR loci. TAREAN is a command-line computational pipeline for automatically detecting satellite repeats in unassembled Next-Generation Sequencing (NGS) reads. It is built with custom Python and R packages and was used to discover new satellite repeats, which were then confirmed on metaphase chromosomes using FISH with probes designed from the reconstructed monomer sequences. STRetch, a command-line tool written as Python scripts for analyzing STRs in Whole-Genome Sequencing (WGS) data, was developed by Dashnow et al. (2018). STRetch is designed for STRs linked to genetic disorders and appears to have a low False Discovery Rate (FDR) for deleterious STR expansions related to Mendelian disorders. TandemTools, a Python-based tool developed by Mikheenko et al. (2020), targets Extra-Long Tandem Repeats (ETRs). In contrast to these programs, Zenobia adopts an entirely different approach. None of the tandem-repeat detection algorithms was implemented, because the program's objective is to determine the allele number associated with each locus, not merely the presence or absence of repeats. This gives Zenobia an advantage over current programs that can only detect tandem repeats.
Genotyping criteria in Zenobia
Zenobia was implemented to identify alleles at the pre-defined CODIS 13 STR loci. To this end, 13 distinct classes were constructed, one for each locus, and each class maintains the dataset of its alleles as described by the National Institute of Standards and Technology (Figure 3).
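As a rough illustration of a per-locus allele dataset, the sketch below stores allele sequences keyed by allele number. It is an assumption-laden simplification, not Zenobia's code: it uses a single hypothetical class, `SimpleStrLocus`, instead of 13 separate classes. It covers only simple loci, where the allele number is taken to equal the repeat count, and it omits flanking sequences. The AATG motif and the 8 to 9 range in the demo are illustrative values for TH01.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical data holder; Zenobia's real locus classes may be organised differently. */
final class SimpleStrLocus {
    private final String name;
    // Allele number -> stored allele sequence, ordered by allele number.
    private final Map<Integer, String> alleles = new TreeMap<>();

    /** Builds alleles for a simple locus: allele n is the motif repeated n times. */
    SimpleStrLocus(String name, String motif, int minRepeats, int maxRepeats) {
        this.name = name;
        for (int n = minRepeats; n <= maxRepeats; n++) {
            alleles.put(n, motif.repeat(n));   // String.repeat requires Java 11+
        }
    }

    String name() {
        return name;
    }

    /** Returns the stored sequence for an allele number, or null if it is not stored. */
    String alleleSequence(int alleleNumber) {
        return alleles.get(alleleNumber);
    }

    int alleleCount() {
        return alleles.size();
    }

    public static void main(String[] args) {
        // Illustrative values only; real datasets also carry flanking regions.
        SimpleStrLocus th01 = new SimpleStrLocus("TH01", "AATG", 8, 9);
        System.out.println(th01.name() + " stores " + th01.alleleCount() + " alleles");
    }
}
```

Compound and complex loci would need explicitly stored sequences rather than a generated repeat, which is one reason a curated dataset per locus is useful.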
The brute-force algorithm was used to find exact matches between the input sequence and the alleles stored in the database, confirm their presence, and identify the precise number of the corresponding allele. It is one of the most straightforward choices for the string pattern-matching problem. The method simply compares the pattern with the target at consecutive positions from left to right. If the comparison fails at the current window, the window shifts one letter to the right, and this continues until the end of the target sequence is reached. Despite the algorithm's poor theoretical performance (O(nm) comparisons in the worst case for a target of length n and a pattern of length m), our measurements show that it is among the fastest techniques when the pattern is a short sequence.
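The window-shifting procedure described above can be sketched as follows. This is an illustrative implementation of generic brute-force matching, not Zenobia's source: the names `BruteForceMatcher`, `indexOf`, and `detectAllele` are hypothetical, and the demo sequences are made up. The lookup assumes that no stored allele pattern is a substring of another (for example, because the patterns include flanking bases).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical. */
final class BruteForceMatcher {

    /** Returns the first index at which pattern occurs in target, or -1 if absent. */
    static int indexOf(String target, String pattern) {
        int n = target.length();
        int m = pattern.length();
        for (int start = 0; start + m <= n; start++) {   // slide the window left to right
            int k = 0;
            while (k < m && target.charAt(start + k) == pattern.charAt(k)) {
                k++;                                     // characters agree so far
            }
            if (k == m) {
                return start;                            // whole pattern matched
            }
            // Mismatch: the loop shifts the window one letter to the right.
        }
        return -1;
    }

    /** Returns the label of the first stored allele found in target, or null if none. */
    static String detectAllele(Map<String, String> storedAlleles, String target) {
        for (Map.Entry<String, String> entry : storedAlleles.entrySet()) {
            if (indexOf(target, entry.getValue()) >= 0) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> toy = new LinkedHashMap<>();
        toy.put("demo-allele-A", "GGAATGAATGCC");        // made-up patterns with flanks
        toy.put("demo-allele-B", "GGAATGAATGAATGCC");
        System.out.println(detectAllele(toy, "TTTGGAATGAATGAATGCCTTT"));
    }
}
```

No preprocessing of the pattern is needed, which keeps the approach simple; the trade-off is that every window may be compared character by character.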
Limitation of Zenobia
Zenobia supports only one file format, FASTA. Furthermore, the stored datasets do not contain the complementary sequences of the alleles.
We designed a bioinformatics application using Java version 11. It reads FASTA files, identifies the CODIS 13 loci, and determines the allele number from a nucleotide sequence. Zenobia performed well in terms of precision and run time. When reading the 78 allelic profiles, no faults were encountered. However, additional STR loci still need to be added.