
Genomics and Bioinformatics

Session A: July 13 – July 23, 2020
9:30 a.m. - 12 p.m.
Vinayak Mathur

In this class we will examine the intersection of biology, computer science, and statistics, with special emphasis on the biological aspects. Advances in Next Generation Sequencing technologies have produced large amounts of sequencing data and a growing need to train researchers who can analyze it. A variety of bioinformatics tools are available for students to learn, enabling them to participate in authentic research. This course introduces these tools through hands-on training and focuses on using analytical methods to understand the features, functions, and evolution of genomes. The overall aims of the course are for students to 1) learn the theory underlying bioinformatics tools for genomic analysis and 2) develop an understanding of how the analysis of sequence data informs the study of biology. Students will have the opportunity to work with biological datasets, perform authentic research, and contribute to an ongoing research project.

Students are required to bring a laptop to class.

Rough syllabus 

Day   Topic  
1 Introduction to BLAST and Sequencing Technologies
Databases and Genome Annotation
Comparative Genomics
Community Science Project and Data Visualization
Introduction to Galaxy and walk-through
FASTQ Quality Analysis
Gene Ontology Classification 
Metagenome analysis 

The course will be divided into two projects:

Days 1-5: Students work through a pipeline to identify Horizontal Gene Transfer (HGT) in bacteria and bacteriophages. After learning the necessary bioinformatics tools, student pairs will be assigned phage proteins, which they will search against the database to look for instances of HGT. The data will be deposited in the Community Science Project database, developed by the Genome Solver team.

Days 6-9: Students will be introduced to the Galaxy pipeline and will work with large biological datasets to perform quality analysis, Gene Ontology classification, and metagenome analysis.
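
To give a sense of what FASTQ quality analysis involves, here is a minimal Python sketch. It is not part of the course materials or of Galaxy; the function names, the two example reads, and the quality cutoff of 20 are all assumptions chosen for illustration. It assumes the common Phred+33 encoding, in which each quality character's ASCII code minus 33 gives the base's quality score, and it assumes simple four-line FASTQ records.

```python
from statistics import mean

# Two made-up reads for illustration only (not real sequencing data).
EXAMPLE_FASTQ = """\
@read1
ACGT
+
IIII
@read2
ACGT
+
!!II
"""

def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the header

def phred_scores(quality_string, offset=33):
    """Convert quality characters to Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality_string]

def summarize(text, min_mean_quality=20):
    """Return (read_id, mean_quality, passes) for each read.

    min_mean_quality is a hypothetical cutoff chosen for this sketch.
    """
    results = []
    for read_id, _seq, qual in parse_fastq(text):
        avg = mean(phred_scores(qual))
        results.append((read_id, avg, avg >= min_mean_quality))
    return results

if __name__ == "__main__":
    for read_id, avg, passes in summarize(EXAMPLE_FASTQ):
        print(read_id, avg, "pass" if passes else "fail")
```

Dedicated quality-control tools report far more than a per-read mean (for example, quality by position along the read), so this sketch only shows the basic idea of turning quality characters into numbers.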

Learning outcomes 

By the end of the course students will:

  1. Be able to diagram or explain the various types of genome sequencing technologies and describe their strengths and weaknesses.
  2. Gain facility with important general databases, focusing on those housed at the National Center for Biotechnology Information (NCBI), as well as with the prokaryotic database at the Joint Genome Institute (JGI).
  3. Be able to use web tools such as BLAST, MUSCLE, and MEGA6 to examine DNA and protein sequences and to explain in general terms how they work.
  4. Be able to annotate genes in terms of both structure and function.
  5. Be able to compare gene/protein sequences and draw inferences about evolutionary history.
  6. Be able to interpret metagenomic data.
  7. Gain facility in reading and interpreting primary literature.