Ph.D., The University of Texas at Arlington
Office: Science Building 287K
Research Interests: Next-Generation Sequencing Pipeline Algorithm Development; Statistical Model Development; Comparative “Omics”; Genetics of Disease; Evolutionary Genomics; Development of Food Antimicrobials.
Recent research projects are focused on development and application of bioinformatics methods/tools to analyze NGS data:
1) Identifying genetic disease-causing variants using whole-genome sequencing and Exome-Seq data
In recent years, large amounts of genomic data have accumulated. Next-generation sequencing (NGS) technologies have revolutionized the field, and the amount of data generated by the various sequencing platforms has increased dramatically. Dr. Bai is interested in developing efficient algorithms and software tools for the analysis and annotation of genetic risk variants, including structural variants in different regions of the genome. He aims to apply informatics methods to explain how selected regulatory and coding variants affect gene expression and disease phenotype.
Regulatory elements and coding-region sequences are thought to be the major genomic content affecting gene expression. Mutations or indels located in a regulatory region can alter transcription factor binding and, consequently, change the transcription pattern of a downstream gene. Because coding-region sequences determine the amino acid sequence, coding-region variants can affect protein function. Variants located in regulatory and coding regions are therefore likely to play more important roles in expression than other non-coding variants do. Dr. Bai is using a combined approach (traditional bioinformatics and next-generation sequencing) to develop an integrated variant annotation and analysis model for assessing the functionality of candidate genes and variants and their potential roles in genetic diseases.
2) Identifying non-canonical short spliced regions using RNA-Seq data
In collaboration with researchers from the Sanford-Burnham Medical Research Institute and the University of Michigan, Dr. Bai has developed a computational method (Bai et al., 2014) called Read-Split-Walk (RSW) for detecting a special type of non-canonical splicing event (a short deletion in an intron region) using RNA-Seq data, and applied it to ER stress-induced Ire1α heterozygous and knockout mouse embryonic fibroblast (MEF) cell lines. The RSW pipeline has also been applied to RNA-Seq data from the SKBR3 human breast cancer cell line, where it identified a large number of non-canonical spliced regions in chromosome 17 that were also reported by an independent proteomics study (Menon et al., 2013). Dr. Bai is working to integrate biological information into the current pipeline to improve the power of detection and make the tool applicable to other eukaryotic species.
Reprints for some publications are available as PDF files. By accessing a PDF file, the user agrees to abide by all copyright laws and educational fair-use regulations.
J.P. Simmer, A.S. Richardson, S. Wang, B.M. Reid, Y. Bai, Y. Hu, Y. Zhang, N. Mackman, J. C-C. Hu. 2014. “Ameloblast Transcriptome Changes from Secretory to Maturation Stages”. Connective Tissue Research 55(Suppl 1):29-32. doi:10.3109/03008207.2014.923862
Bai, Y., J. Hassler, A. Ziyar, P. Li, Z. Wright, R. Menon, G.S. Omenn, J.D. Cavalcoli, R.J. Kaufman, M.A. Sartor. 2014. “Novel bioinformatics methods for identification of genome-wide non-canonical spliced regions using RNA-Seq data”. PLoS ONE 9(7): e100864. doi:10.1371/journal.pone.0100864.
Bai, Y.* and J. Cavalcoli. 2013. “SNPAAMapper: An efficient genome-wide SNP variant analysis pipeline for next-generation sequencing data”. Bioinformation 9(17):870-872.
X. Chen, H. Wang, K. Bajaj, P. Zhang, Z. Meng, D. Ma, Y. Bai, E. Adams, A. Baines, G. Yu, M.A. Sartor, B. Zhang, Z. Yi, J. Lin, S. Young, R. Schekman, and D. Ginsburg. 2013. “SEC24A Deficiency Lowers Plasma Cholesterol through Reduced PCSK9 Secretion”. eLife. 2:e00444.
Bai, Y.*, M. Sartor, and J. Cavalcoli. 2012. “Current Status and Future Perspectives for Sequencing Livestock Genomes” (Review). Journal of Animal Science and Biotechnology. 3:8.
J.K. Bedoyan, V.M. Schaibley, W. Peng, Y. Bai, K. Mondal, A.C. Shetty, M. Durham, J.A. Micucci, A. Dhiraaj, J.M. Skidmore, J.B. Kaplan, C. Skinner, C.E. Schwartz, A. Antonellis, M.E. Zwick, J.D. Cavalcoli, J.Z. Li, D.M. Martin. 2012. “Disruption of RAB40AL function leads to Martin-Probst syndrome, a rare X-linked multisystem neurodevelopmental human disorder”. Journal of Medical Genetics 49:332-340.
Bai, Y., C. Casola, and E. Betrán. 2009. “Quality of regulatory elements in Drosophila retrogenes”. Genomics 93(1):83-89.
Bai, Y., C. Casola, and E. Betrán. 2008. “Evolutionary origin of regulatory regions in Drosophila”. BMC Genomics 9(1):241.
Bai, Y., C. Casola, C. Feschotte, and E. Betrán. 2007. “Comparative genomics reveals a constant rate of origination and convergent acquisition of functional retrogenes in Drosophila”. Genome Biology 8(1):R11.
Bai, Y., K. R. Coleman, C. W. Coleman and A. L. Waldroup. 2007. “Effect of Cetylpyridinium Chloride (Cecure® CPC Antimicrobial) on the Refrigerated Shelf Life of Fresh Boneless, Skinless Broiler Thigh Meat”. International Journal of Poultry Science, 6(2):91-94.
Betrán, E., Y. Bai, and M. Motiwale. 2006. “Fast protein evolution and germline expression of a Drosophila parental gene and its young retroposed paralog”. Molecular Biology and Evolution, 23(11):2191-2202.
Bai, Y. and B. P. Weems. 2006. “Another Algorithms for Computing Longest Common Increasing Subsequence for Two Random Input Sequences”. Proceedings of the 2006 International Conference on Foundations of Computer Science, pp. 81-87.
Bai, Y. and E. Betrán. 2005. “Ovary-biased, Testis-biased, and Soma-biased Paralogs in the Drosophila melanogaster Genome”. The 2nd Biotechnology and Bioinformatics Symposium (BIOT), pp. 75-78.
Bai, Y. and B. P. Weems. 2005. “The Longest Common Increasing Subsequence Problem”. Proc. of the 8th Joint Conference on Information Sciences, pp. 362-366.
Bai, Y. and B. P. Weems. 2005. “Finding Longest Common Increasing Subsequence for Two Different Scenarios of Non-random Input Sequences”. Proc. of the 2005 International Conference on Foundations of Computer Science, pp. 64-70.