Course Descriptions
BIOI601: Probability and Statistics, 3 credits. An introduction to the fundamental concepts of probability theory and statistics. The course covers basic probabilistic concepts such as probability spaces, random variables and vectors, expectation, covariance, correlation, and probability distribution functions. Important classes of discrete and continuous random variables, their interrelations, and their relevance to applications are discussed. Conditional probabilities, the Bayes formula, and properties of jointly distributed random variables are covered. Limit theorems, which investigate the behavior of sums of random variables, are discussed. The main concepts of random processes are then introduced. The latter part of the course addresses the basic problems of mathematical statistics: point and interval estimation and hypothesis testing. Core.
BIOI602: Principles of Data Science, 3 credits. An introduction to the data science pipeline, i.e., the end-to-end process of going from unstructured, messy data to knowledge and actionable insights. Provides a broad overview of what data science means, surveys the systems and tools commonly used for data science, and illustrates the principles of data science through several case studies. Core.
BIOI603: Principles of Machine Learning, 3 credits. A broad introduction to machine learning and statistical pattern recognition. Topics include the following. Supervised learning: Bayes decision theory; discriminant functions; maximum likelihood estimation; nearest neighbor rule; linear discriminant analysis; support vector machines; neural networks; deep learning networks. Unsupervised learning: clustering; dimensionality reduction; principal component analysis; auto-encoders. The course will also discuss recent applications of machine learning, such as computer vision, data mining, autonomous navigation, and speech recognition. Core.
BIOI604: Principles of Molecular Biology, Genetics and Genomics, 3 credits. Provides a review of basic concepts in molecular biology, genetics, and genomics. Topics include the following: prokaryotic and eukaryotic genome structure and organization (including 3D architecture); Mendelian genetics, recombination, linkage and linkage disequilibrium, genome-wide association studies; review of genome projects, comparative genomics, genome variation, single nucleotide polymorphisms and genotyping; gene expression and the transcriptome, transcriptional regulation, gene regulatory networks; translation and translational regulation; proteomics approaches; integrative genomics. Core.
BIOI605: Data Sources and Data Management in Bioinformatics, 3 credits. An introduction to the different types of data generated for bioinformatics analyses and data management principles required for scientific rigor and reproducibility. Data sources include, but are not limited to, sequencing data, ’omics data (e.g., proteomics, metabolomics, lipidomics), imaging data, and clinical data. Data organization will cover topics such as management and curation of metadata, downloading data from and submitting data to public repositories, and using databases versus spreadsheets and tables. Prerequisite: BIOI 604. Core.
BIOI606: Sequence Alignment, 3 credits. In-depth coverage of biological sequence alignment including the following: definitions, algorithms, and statistics for local, global, pairwise, and multiple alignments; scoring schemes; BLAST, BLAST variants, and similar programs; motif finding; and related topics. Prerequisites: BIOI 601 and BIOI 604. Core.
BIOI607: Data Structures and Algorithms for Bioinformatics, 3 credits. An introduction to the fundamental data structures and algorithms underlying many parts of bioinformatics. Standard data structures for efficient indexing and sequence search will be covered, including the suffix array and the FM-index, as will alignment-free methods for sequence comparison. This course will also introduce the fundamental algorithms in computational phylogenomics and biological network analysis. Finally, bioinformatics-oriented applications of classic unsupervised learning algorithms (e.g., clustering and dimensionality reduction) and database techniques (e.g., sorting, selection, joining) will be examined. The focus will be on both a formal understanding of computational efficiency and the practical applications of these concepts. Prerequisite: BIOI 604. Core.
BIOI610: Genome Annotation, 3 credits. An introduction to approaches for the structural and functional annotation of genome content. Topics covered include the following: ab initio gene/coding sequence discovery; signals and signal sensors (including regulatory sequences); non-protein coding genes and other structural features of genome sequences; similarity searches (orthologs, paralogs, xenologs); clustering of genes by sequence similarity; clusters of orthologous genes; phylogenetic classification of genes; gene ontologies, gene set enrichment analyses; NGS functional assays; integrated genomics circuits; and annotation databases. Prerequisite: BIOI 604. Core.
BIOI611: Analysis of Gene Expression Data, 3 credits. This course focuses on the analysis of transcriptomics data, and specifically on the analysis of gene- and transcript-level expression. Material covered includes transcript and gene expression estimation from RNA-seq data (short- and long-read), basic experimental design and statistical methods for differential expression analysis, discovery of novel transcripts via reference-guided and de novo assembly, and the analysis of single-cell gene expression data (e.g., single-cell expression quantification, dimensionality reduction, clustering, pseudotime analysis). Prerequisite: BIOI 604. Core.
BIOI621: Genome Assembly and Annotation, 3 credits. An introduction to the algorithms and tools used to reconstruct genome sequences from shotgun sequencing data and to annotate the resulting sequence. The first part of the course will cover the theoretical underpinnings of core assembly paradigms and discuss the practical use of these paradigms in the context of current sequencing technologies. Also discussed will be approaches for scaffolding the reconstructed sequences along chromosomes using mate-pair and other types of information such as mapping data. An important focus of the course will be on approaches for validating the output of sequence assemblers and on the impact assembly errors can have on downstream analyses such as genome annotation and comparative analyses. The second part of the course will discuss approaches for interpreting sequence annotations in the context of a reconstructed genome, focusing on genome browsers and other visualization and analytical tools and on approaches for analyzing and interpreting gene synteny information. A particular focus will be on the impact of repetitive sequences on the quality of genome assemblies and on the ability to effectively analyze gene synteny and to conduct comparative genomic analyses. Prerequisite: BIOI 604. Elective.
BIOI622: Metagenomics Data Analysis, 3 credits. An introduction to metagenomics, the study of sequence data derived from environmental samples without the cultivation of individual organisms. The course will provide an overview of the entire process of obtaining and analyzing metagenomic data, including sample collection, DNA isolation strategies, sequencing strategies, and initial data processing. Additionally, the course will discuss taxonomic analysis (the determination of the identity of organisms within a metagenomic sample) and the analysis of whole metagenome shotgun sequencing data with metagenomic assembly and functional annotation. Diversity metrics used to summarize the ecological structure of microbial communities in terms of richness or distance, as well as the visualization of these metrics, will be discussed. Finally, methods to identify features that differ between microbial communities will be reviewed. Prerequisite: BIOI 604. Elective.
BIOI699: Capstone Research, 3 credits. The course provides an opportunity for a more in-depth research experience focusing on an original research project. By the end of the course, the student should be able to design and conduct a bioinformatics or computational biology project; place the research in the context of biological problems; and develop a written report and other deliverables, if applicable. Elective.