EDUCATION

Ph.D. in Computer Science, August 2008

Washington University in St. Louis, MO

Thesis: Designing Filtration Strategies for Fast Sequence Annotation

Advisor: Dr. Jeremy Buhler

Advisory committee: Dr. Jeremy Buhler, Dr. Gary Stormo, Dr. Michael Brent, Dr. Sally Goldman, Dr. Tao Ju, and Dr. Nan Lin.

M.S. and B.S. in Computer Science,

Xi’an JiaoTong University, Xi’an, China

Awards & Peer Recognitions

Faculty Early Career Development (CAREER) Award, National Science Foundation, 2010.

Publications

JOURNAL PAPERS

Jiao Chen, Yingchao Zhao, and Yanni Sun, "De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding", Bioinformatics, 2018. IF: 7.307.

Daewoo Pak, Nan Du, Yunsoon Kim, Yanni Sun, and Zachary F. Burton, "Rooted tRNAomes and evolution of the genetic code", Transcription, 9 (3): 137-151, 2017

Jiao Chen, DongXiao Zhu, and Yanni Sun, "Cap-seq reveals complicated miRNA transcriptional mechanisms in C. elegans and mouse", Quantitative Biology, 5 (4): 352-367, 2017.

Prapaporn Techa-Angkoon, Yanni Sun, and Jikai Lei, "A sensitive short read homology search tool for paired-end read sequencing data", BMC Bioinformatics, 18 (Suppl 12): 414. 2017. IF: 3.450

Nan Du and Yanni Sun, "Improve homology search sensitivity of PacBio data by correcting frameshifts", Bioinformatics, 2016, 32 (17): i529-i537

Jikai Lei and Yanni Sun, "Assemble CRISPRs from metagenomic sequencing data", Bioinformatics, 2016, 32 (17): i520-i528

Laura Kirby, Yanni Sun, David Judah, Scooter Nowak, Donna Koslowsky, "Analysis of the Trypanosoma brucei EATRO 164 Bloodstream Guide RNA Transcriptome", PLOS Neglected Tropical Diseases, 2016

Qiong Wang, Jordan A Fish, Mariah Gilman, Yanni Sun, Titus Brown, James M Tiedje, James R Cole, "Xander: Employing a novel method for efficient gene-targeted metagenomic assembly", Microbiome, 2015

Rujira Achawanantakun, Jiao Chen, Yanni Sun, and Yuan Zhang, "LncRNA-ID: Long non-coding RNA IDentification using balanced random forests", Bioinformatics, 2015

Mingyu Shao, Yanni Sun, and Shuigeng Zhou, "Identifying TF-MiRNA Regulatory Relationships Using Multiple Features", PLOS ONE, 2015

Cheng Yuan, Jikai Lei, James R. Cole, and Yanni Sun. "Reconstructing 16S rRNA genes in metagenomic data", Bioinformatics (special issue of ISMB 2015)

Yuan Zhang, Yanni Sun, and James Cole, "A Scalable and Accurate Targeted gene Assembly tool (SAT-Assembler) for next-generation sequencing data.", PLOS Computational Biology, 2014

Jikai Lei and Yanni Sun, "miR-PREFeR: an accurate, fast, and easy-to-use plant miRNA prediction tool using small RNA-Seq data", Bioinformatics, 2014

Campbell M, Law M, Holt C, Stein J, Moghe G, Hufnagel D, Lei J, Achawanantakun R, Jiao D, Lawrence C, Ware D, Shiu SH, Childs K, Sun Y, Jiang N, Yandell M., “MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations.” Plant Physiol. 2013 Dec 6.

Roy M, Kim N, Kim K, Chung WH, Achawanantakun R, Sun Y, Wayne R. “Analysis of the canine brain transcriptome with an emphasis on the hypothalamus and cerebral cortex.” Mamm Genome, 24(11-12):484-99. 2013, doi: 10.1007/s00335-013-9480-0

Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. “Ribosomal Database Project: data and tools for high throughput rRNA analysis.” Nucleic Acids Res. Jan 1;42(1):D633-42, 2014, doi: 10.1093/nar/gkt1244

Cheng Yuan and Yanni Sun, "RNA-CODE: a noncoding RNA Classification tOol for short reaDs in NGS data lacking rEference genomes", PLOS ONE, 8(10):e77596, 2013

Jordan A. Fish, Benli Chai, Qiong Wang, Yanni Sun, C. Titus Brown, James M. Tiedje, and James R. Cole, "FunGene: the functional gene pipeline and repository", Front. Microbiol., 01 October 2013, doi: 10.3389/fmicb.2013.00291

Donna J. Koslowsky, Yanni Sun, Jordan Hindenach, Terence Theisen, Jasmine Lucas, "The Insect-phase gRNA Transcriptome in Trypanosoma brucei", Nucleic Acids Research, accepted, 2013

Qiong Wang, John F. Quensen III, Jordan A. Fish, Tae Kwon Lee, Yanni Sun, James M. Tiedje, James R. Cole, "Ecological patterns of nifH genes in four terrestrial climatic zones explored with targeted metagenomics using FrameBot, a new informatics tool", mBio, accepted, 2013

Yuan Zhang, Yanni Sun, and James R. Cole, “A Sensitive and Accurate protein domain cLassification Tool (SALT) for short reads.” Bioinformatics, June 2013, 9 pages, doi:10.1093/bioinformatics/btt357.

Rujira Achawanantakun and Yanni Sun. "Shape and secondary structure prediction for ncRNAs including pseudoknots based on linear SVM." BMC Bioinformatics, 2013.

Cheng Yuan and Yanni Sun. "Efficient known ncRNA search including pseudoknots." BMC Bioinformatics, 2013.

A. Vieler, G. Wu, et al. "Genome, functional gene annotation, and nuclear transformation of the heterokont oleaginous alga Nannochloropsis oceanica CCMP1779", PLOS Genetics, 8(11), 25 pages, 2012.

Jikai Lei, Prapaporn Techa-Angkoon, Yanni Sun, "Chain-RNA: a comparative ncRNA search tool based on the two-dimensional chain algorithm", IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2012. Impact factor: 2.25

Yanni Sun, Osama Aljawad, Jikai Lei, and Alex Liu, "Genome-scale NCRNA homology search using a Hamming distance-based filtration strategy", BMC Bioinformatics, 13(Suppl 3):S12, 13 pages, 2012

Yanni Sun, Jeremy Buhler, Cheng Yuan, "Designing Filters for Fast Known NcRNA Identification." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011. Impact factor: 2.25

Rujira Achawanantakun, Yanni Sun, Seyedeh Shohreh Takyar, "Using a novel secondary structure representation for consensus ncRNA secondary structure derivation." Journal of Bioinformatics and Computational biology, 9(2): 317-337, 2011. Impact factor: 1.0

Yuan Zhang and Yanni Sun, "HMM-FRAME: accurate protein domain classification for metagenomic sequences in the presence of frameshift errors." BMC Bioinformatics, 12:198-213, 2011. Impact Factor: 3.43

Yanni Sun and Jeremy Buhler, "Designing patterns and profiles for profile HMM search." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(2):232-43, 2009 Apr-Jun.

Yanni Sun and Jeremy Buhler. "Designing patterns for profile HMM search", Bioinformatics 23:e36-43, 2007, special issue of ECCB 06. Impact Factor: 4.926. Citations: 9

Yanni Sun and Jeremy Buhler, "Choosing the best heuristic for seeded alignment of DNA sequences." BMC Bioinformatics 7:133, 2006. Impact Factor: 3.43. Citations: 17

Jeremy Buhler, Uri Keich, and Yanni Sun, "Designing seeds for similarity search in genomic DNA." Journal of Computer and System Sciences 70:342-363, 2005. Citations: 42

Yanni Sun and Jeremy Buhler, "Designing multiple simultaneous seeds for DNA similarity search." Journal of Computational Biology 12:847-861, 2005. Impact Factor: 1.694. Citations: 41

Mei Zheng and Yanni Sun, "Fast rendering of multiple Iso-potential surfaces for 3-D physical fields." Journal of Xi'an JiaoTong University, China, October 1999.

CONFERENCE PAPERS

Nan Du and Yanni Sun, "Improve homology search sensitivity of PacBio data by correcting frameshifts", Proceedings of ECCB 2016, the Hague, Netherlands, September 4, 2016

Jikai Lei and Yanni Sun, "Assemble CRISPRs from metagenomic sequencing data", Proceedings of ECCB 2016, the Hague, Netherlands, September 4, 2016

Prapaporn Techa-Angkoon, Yanni Sun and Jikai Lei, "Improve Short Read Homology Search using Paired-End Read Information", Proceedings of ISBRA, Minsk, Belarus, June 2016

Cheng Yuan, Jikai Lei, James R. Cole, and Yanni Sun. "Reconstructing 16S rRNA genes in metagenomic data", Proceedings of ISMB 2015, Dublin, Ireland, July 10th, 2015

Prapaporn Techa-Angkoon and Yanni Sun. "glu-RNA: aliGn highLy strUctured ncRNAs using only sequence similarity." Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM BCB 2013), Washington D.C., USA, 2013

Jikai Lei, Prapaporn Techa-Angkoon, and Yanni Sun. "ChainKnot: a comparative H-type pseudoknot prediction tool using multiple ab initio folding tools." Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM BCB 2013), Washington D.C., USA, 2013

Rujira Achawanantakun and Yanni Sun. "Shape and secondary structure prediction for ncRNAs including pseudoknots based on linear SVM." Proceedings of the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013), Vancouver, Canada, 2013.

Cheng Yuan and Yanni Sun. "Efficient known ncRNA search including pseudoknots." Proceedings of the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013), Vancouver, Canada, 2013.

Yuan Zhang and Yanni Sun. "PseudoDomain: identification of processed pseudogenes based on protein domain classification.", Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM BCB 2012), Orlando, FL, USA, 2012, regular paper, acceptance ratio: 19%.

Jikai Lei, Prapaporn Techa-Angkoon, and Yanni Sun. "NCRNA homology search based on an extended two-dimensional chain algorithm." Proceedings of the Tenth Asia Pacific Bioinformatics Conference (APBC 2012), Melbourne, Australia, 2012.

Yuan Zhang and Yanni Sun. "MetaDomain: a profile HMM-based protein domain classification tool for short sequences." Proceedings of the Pacific Symposium on Biocomputing (PSB) 2012, Big Island, HI, USA, 2012.

Osama Aljawad, Yanni Sun, Alex Liu, and Jikai Lei. "NcRNA Homology Search Using Hamming Distance Seeds." Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM BCB 2011), Chicago, USA, 2011, regular paper, acceptance ratio: 19%.

Rujira Achawanantakun, Seyedeh Shohreh Takyar, Yanni Sun. "Grammar String: A Novel ncRNA Secondary Structure Representation." Proceedings of the Ninth Annual International Conference on Computational Systems Bioinformatics (CSB 10), CA, USA, 2010, acceptance ratio: 22%.

Stuart King, Yanni Sun, James Cole, and Sakti Pramanik. "BLAST Tree: Fast Filtering for Genomic Sequence Classification." 10th IEEE International Conference on Bioinformatics and Bioengineering (BIBE-2010), Philadelphia, PA, USA, 2010.

Yanni Sun and Jeremy Buhler, "Designing Secondary Structure Profiles for Fast ncRNA Identification." Proceedings of the Seventh Annual International Conference on Computational Systems Bioinformatics (CSB 08), CA, USA, 2008, acceptance ratio: 22%.

Yanni Sun and Jeremy Buhler. "Designing patterns for profile HMM search." Proceedings of the 5th European Conference on Computational Biology (ECCB06), Eilat, Israel, acceptance ratio: 18%.

Yanni Sun and Jeremy Buhler, "Designing multiple simultaneous seeds for DNA similarity search." Proceedings of the Eighth Annual International Conference on Computational Molecular Biology (RECOMB04), 76-84, San Diego, CA USA, 2004, acceptance ratio: 18%. Citations: 51

Jeremy Buhler, Uri Keich, and Yanni Sun, "Designing seeds for similarity search in genomic DNA." Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB03), 67-75, Berlin, Germany, April 2003, acceptance ratio: 20%. Citations: 99

Mei Zheng and Yanni Sun, "Volume rendering method for Iso-surface distribution in 3D field domain." Proceedings of International Conference on Advanced Manufacturing Technology, Xi'an, China, 1999.

TALKS

“Functional and composition analysis of microbial communities using next-generation sequencing data”, The Michigan Branch of the American Society of Microbiology Meeting, Grand Rapids, MI, March 24, 2018

“Analyze microbial communities to aid health and ecology studies using next-generation sequencing data”, Department of Computer Science and Engineering, TAMU, Texas, US, September, 2017

“Functional and composition analysis of microbial community using NGS data”, Department of Computing, Hang Seng Management College, Hong Kong, Feb. 13, 2017

“Gene-centric functional analysis of NGS data”, Joint CSE-BME Seminar, Chinese University of Hong Kong, Hong Kong, Dec. 20, 2016

“Gene-centric functional analysis of NGS data”, Department of Computer Science, Xi’an Jiao Tong University, Xi’an, China, Oct. 12, 2016

“Composition and functional analysis of metagenomics data”, Department of Computer Science, Nanjing University of Aeronautics and Astronautics, Nanjing, China, June 20, 2016

“Finding CRISPRs in metagenomic data”, the 15th European Conference on Computational Biology (ECCB’16), the Hague, the Netherlands, Sept 4th, 2016

“Improving metagenomic gene identification by combining profile homology search and de novo assembly”, The European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK, July 17th, 2015

“Assembling 16S rRNA genes in metagenomic data”, the 23rd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2015), Dublin, Ireland, July 10th, 2015

"Gene-centric interpretation and analysis of NGS data", Dept. of Bioengineering, UIC, March 2013

"NcRNA secondary structure representation and derivation using grammar strings", Dept. of Mathematics, MathBio Seminars, MSU, Nov. 2010

"Metagenomic Data Annotation: Phylogenetic Classification and NcRNA Identification", Michigan State University, Metagenomic Journal Club, Nov. 2009

"NcRNA Identification and Protein Homology Search with Sequencing Errors", Dept. of Plant and Soil Science, Plant Computational Biology Seminar, Dec. 2009

"Designing filtering strategies for faster ncRNA annotation", Dept. of Computer Science and Engineering, Sichuan University, China, Aug. 2009

"NcRNA search and protein domain classification", MSU Bioinformatics Symposium, Dec. 2008

"Designing Filters for Fast Protein and RNA Annotation", Dept. of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, Mar. 2008

"Designing Secondary Structure Profiles for Fast ncRNA Identification." Proceedings of the Seventh Annual International Conference on Computational Systems Bioinformatics (CSB 08), Stanford University, CA, Aug. 2008

"Designing Secondary Structure Profiles for Fast ncRNA Identification", Dept. of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Dec. 2007

"Designing patterns for profile HMM search." the 5th European Conference on Computational Biology (ECCB06), Eilat, Israel, Jan. 2007

"Designing filters to speed up alignment programs." Colloquium, Computer Science Dept., City University of Hong Kong, Hong Kong, Dec. 2006

"Designing multiple simultaneous seeds for DNA similarity search." the Eighth Annual International Conference on Computational Molecular Biology (RECOMB04), San Diego, CA USA, 2004