Bioinformatics Tools and Applications for Rainbow Trout

Al-Tobasei, Rafet
Middle Tennessee State University
Rainbow trout is one of the most widely used aquaculture species for food worldwide. Due to its commercial importance, various genomic resources are available for the trout, including a draft reference genome, a microRNA repertoire, quantitative trait loci, and single nucleotide polymorphisms (SNPs) associated with different production traits. However, many of these genomic resources still need improvement in terms of quality and quantity. The only available genome draft is not completely annotated and lacks non-coding RNAs and some protein-coding genes. Similarly, the majority of previous work aimed at identifying trait-associated genetic markers was not robust because of the limited genomic resources available at the time.
In this study, we used genomic and transcriptomic approaches to identify genetic elements missing from the reference genome, including long non-coding RNAs and protein-coding genes. In addition, we utilized these genomic resources to identify genes and genetic variations, especially SNPs, associated with growth and muscle quality traits in rainbow trout. To facilitate gene discovery and improve the draft genome reference, we used deep transcriptome sequencing from 13 vital tissues. De novo assembly of ~1.167 billion paired-end reads from those 13 tissues identified a total of 474,524 protein-coding transcripts, of which 11,843 were not previously reported in the genome reference. To discover the long non-coding RNA repertoire, we used the same ~1.167 billion RNA sequencing reads in addition to RNA sequence data from 3 other published sources. Transcriptome assembly followed by various filtration steps identified 54,503 long non-coding RNA transcripts, which provided the first long non-coding RNA draft reference in rainbow trout. These long non-coding RNAs exhibited less sequence conservation, a structure biased toward a single exon, and an overall lower expression level compared to protein-coding genes. The newly identified long non-coding RNAs showed differential expression in response to Flavobacterium psychrophilum infection, and their expression level strongly correlated with body bacterial load in selectively bred resistant, control, and susceptible genetic lines of rainbow trout. These findings suggest that the lncRNAs have important roles in antibacterial immune response and disease resistance in rainbow trout. In addition, multiple bioinformatics algorithms were tested and successfully utilized to identify SNPs in protein-coding genes and long non-coding RNAs that are associated with 5 important production traits: whole body weight (WBW), muscle yield, muscle crude fat content, muscle shear force (tenderness), and fillet whiteness.
A total of 7,930 SNPs identified in protein-coding genes and non-coding RNAs showed allelic imbalances (>2.0 as an amplification and <0.5 as loss of heterozygosity) between fish families showing contrasting phenotypes for the above-mentioned traits, suggesting their importance in the phenotypes. Validation of a small subset of the SNPs with allelic imbalances showed a ~93% success rate of the pipelines in calling SNPs, suggesting that the algorithms are reliable.
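The allelic-imbalance thresholds quoted above can be illustrated with a minimal sketch. The abstract does not state how the imbalance ratio was computed, so the formula here is an assumption: the alternate-allele fraction in the high-phenotype group divided by that in the low-phenotype group, with a small pseudocount to avoid division by zero. Only the two cutoffs (2.0 and 0.5) come from the text; the names `AlleleCounts`, `imbalance_ratio`, and `classify` are hypothetical and do not describe the study's actual pipeline.

```python
from dataclasses import dataclass

# Cutoffs taken from the abstract.
AMPLIFICATION_THRESHOLD = 2.0   # ratio > 2.0 is treated as amplification
LOH_THRESHOLD = 0.5             # ratio < 0.5 is treated as loss of heterozygosity


@dataclass
class AlleleCounts:
    """Pooled read counts for one SNP in one phenotype group (hypothetical type)."""
    ref: int
    alt: int

    def alt_fraction(self, pseudocount: float = 0.5) -> float:
        # The pseudocount is an assumption; it keeps the fraction strictly
        # between 0 and 1 so the ratio below is always defined.
        return (self.alt + pseudocount) / (self.ref + self.alt + 2 * pseudocount)


def imbalance_ratio(high: AlleleCounts, low: AlleleCounts) -> float:
    """Assumed definition: alt-allele fraction in the high group over the low group."""
    return high.alt_fraction() / low.alt_fraction()


def classify(ratio: float) -> str:
    """Apply the abstract's thresholds to a precomputed ratio."""
    if ratio > AMPLIFICATION_THRESHOLD:
        return "amplification"
    if ratio < LOH_THRESHOLD:
        return "loss_of_heterozygosity"
    return "balanced"


# Made-up counts for illustration only, not data from the study.
high_group = AlleleCounts(ref=10, alt=90)
low_group = AlleleCounts(ref=70, alt=30)
label = classify(imbalance_ratio(high_group, low_group))
```

With these made-up counts the ratio is about 2.97, so the SNP would be labeled as an amplification; swapping the two groups gives a ratio of about 0.34, which falls under the loss-of-heterozygosity cutoff.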
This study provides new genomic resources to complement the genome annotation and facilitate functional genomics research in addition to genome-wide studies and selection in rainbow trout.
Computational Science