Penn Summer | Genomics and Bioinformatics

Session:

Session A: July 13 – July 23, 2020

Time:

9:30 a.m. - 12 p.m.

Category:

Science

Instructor:

Vinayak Mathur

Description:

In this class we will examine the intersection of biology, computer science, and statistics, with special emphasis on the biological aspects. Advances in next-generation sequencing technologies have produced large amounts of sequencing data and a growing need to train researchers who can analyze it. A variety of bioinformatics tools are available for students to learn, enabling them to participate in authentic research. This course introduces these tools through hands-on training. The course will focus on using analytical methods to understand the features, functions, and evolution of genomes. The overall aims of the course are for students to 1) learn the theory underlying bioinformatics tools for genomic analysis and 2) develop an understanding of how the analysis of sequence data informs the study of biology. Students will have an opportunity to work with biological datasets, perform authentic research, and contribute to an ongoing research project.

Students are required to bring a laptop to class.

Rough syllabus

Day	Topic
1	Introduction to BLAST and Sequencing Technologies
2	Databases and Genome Annotation
3	Comparative Genomics
4	Phylogenetics
5	Community Science Project and Data Visualization
6	Introduction to Galaxy and walk-through
7	FASTQ Quality Analysis
8	Gene Ontology Classification
9	Metagenome Analysis

The course will be divided into two projects:

Days 1-5: Students work through a pipeline to identify horizontal gene transfer (HGT) in bacteria and bacteriophages. After learning the necessary bioinformatics tools, student pairs will be assigned phage proteins, which they will search against the database for instances of HGT. The resulting data will be deposited in the Community Science Project database, developed by the Genome Solver team.

Days 6-9: Students will be introduced to the Galaxy pipeline and will work with large biological datasets to perform quality analysis, Gene Ontology classification, and metagenome analysis.
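To give a sense of what FASTQ quality analysis involves, here is a minimal Python sketch. It is not part of the course materials, and the course itself uses Galaxy rather than hand-written code. It assumes the common Phred+33 quality encoding and simple four-line FASTQ records; the function names and the two example reads are invented for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Assumes Phred+33 encoding by default: score = ASCII code - 33.
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line, offset=33):
    """Return the average Phred score of one read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


def parse_fastq(text):
    """Yield (read_id, sequence, quality) tuples from four-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the header


# Two made-up reads: one uniformly high quality, one with low-quality bases.
EXAMPLE_FASTQ = """@read1
ACGT
+
IIII
@read2
ACGT
+
!!5I
"""

if __name__ == "__main__":
    for read_id, seq, qual in parse_fastq(EXAMPLE_FASTQ):
        print(read_id, seq, round(mean_quality(qual), 1))
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a score of 20 means roughly a 1 in 100 chance that the base is wrong. Quality-analysis tools summarize these scores across millions of reads.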

Learning outcomes

By the end of the course students will:

Be able to diagram and explain the various types of genome sequencing technologies, including their strengths and weaknesses.
Gain facility with important general databases, focusing on those housed at the National Center for Biotechnology Information (NCBI). Students will also gain facility with the prokaryotic database at the Joint Genome Institute (JGI).
Be able to use web tools such as BLAST, MUSCLE, and MEGA6 to examine DNA and protein sequences and to explain in general terms how they work.
Be able to annotate genes in terms of both structure and function.
Be able to compare gene/protein sequences and draw inferences about evolutionary history.
Be able to interpret metagenomic data.
Gain facility in reading and interpreting primary literature.
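
As a small illustration of the sequence-comparison outcome above, the following Python sketch computes percent identity between two sequences that have already been aligned. It is a simplified teaching example, not how BLAST or MUSCLE work internally. The function name, the choice to skip gap columns, and the example sequences are assumptions made for this sketch.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions at which two sequences match.

    Both inputs must be the same length (already aligned); '-' marks a gap.
    Columns containing a gap in either sequence are skipped, which is one
    of several conventions in use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Made-up aligned fragments; three of four positions match.
    print(percent_identity("ACGT", "ACGA"))
```

Higher identity between two genes or proteins generally suggests a more recent common ancestor. Real phylogenetic inference uses explicit substitution models rather than raw identity alone.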