BMI 731/IBGP 731 Biomedical Data Management

WINTER QUARTER 2004

(http://columbus.bmi.ohio-state.edu/~umit/bmi731/bmi731.html)

 

 

Umit Catalyurek, PhD, Department of Biomedical Informatics, e-mail: catalyurek.1@osu.edu

Joel H. Saltz, M.D., Ph.D., Department of Biomedical Informatics

Ilya Ioshikhes, PhD, Department of Biomedical Informatics

Daniel Janies, PhD, Department of Biomedical Informatics

Catalin Barbacioru, PhD, Department of Biomedical Informatics

 

 

Tuesday & Thursday 9:00am – 10:30am

Location: 3167 Graves Hall

 

Tentative Reading Material: http://columbus.bmi.ohio-state.edu/~umit/bmi731/reading.html

 

 

Date               Lecturer            Topic
January 6, 2004    Daniel Janies       Biology
January 8, 2004    Catalin Barbacioru  Haplotype - 1
January 13, 2004   Catalin Barbacioru  Haplotype - 2
January 15, 2004   Daniel Janies       Sequence Databases (Readings: Philips’00 and Wheeler’04)
January 20, 2004   Ilya Ioshikhes      Sequence Analysis (Readings: [1] [2] [3] [4])
January 22, 2004   Daniel Janies       Tree-based Alignment
January 27, 2004   Daniel Janies       Phylogenetics
January 29, 2004   Ilya Ioshikhes      Gene Regulation (Readings: [1] [2] [3] [4])
February 3, 2004   Catalin Barbacioru  Microarray - 1
February 5, 2004   Catalin Barbacioru  Microarray - 2
February 10, 2004  Ilya Ioshikhes      Mass Spectrometry
February 12, 2004  Joel Saltz          Translational Research
February 17, 2004  Ilya Ioshikhes      RLGS
February 19, 2004  Joel Saltz          Radiology Imaging
February 24, 2004  Joel Saltz          DCE-MRI
February 26, 2004  Joel Saltz          Software systems: Data Warehouse
March 2, 2004      Umit Catalyurek     Multiple Sequence Alignment in Multi-Client Environment
March 4, 2004      Umit Catalyurek     Virtual Microscopy
March 9, 2004      Umit Catalyurek     Protein Structures and Related Database Searches
March 11, 2004     Umit Catalyurek     Sequence Compression [1] & Indexing and Filtering for Similarity Search [2]
March 16, 2004     Exam Week
March 18, 2004     Exam Week           Project presentations

 

Student Evaluation

 

Class participation    20%

Homework               30%

Term project           50%

 

 

Homework #1 - Due Tuesday, January 13th

The table (HW1Table.xls) contains genotypic information about 4 biallelic SNPs (loci) in 927 samples. Missing data is denoted by NA; make sure you treat it appropriately. The questions you have to answer are:

 

a) Are these SNPs in Hardy-Weinberg (HW) equilibrium?

b) Test for linkage disequilibrium of the AA haplotype in each of the 6 possible pairs of SNPs.
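
Both questions can be approached with chi-square tests on one degree of freedom. The sketch below is one possible way to set them up, not the required method. It assumes genotypes are coded as the strings AA, Aa and aa (the actual coding in HW1Table.xls may differ) and drops NA entries before counting. For question b it assumes haplotype counts are already available; with unphased genotype data they would first have to be estimated (for example with an EM algorithm), which is not shown. All function names are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))


def hwe_test(genotypes):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    genotypes: iterable of 'AA', 'Aa', 'aa' or 'NA' (assumed coding).
    Returns (chi2, p_value).
    """
    # Missing calls ('NA') are excluded; only the three genotype codes are used.
    counts = Counter(g for g in genotypes if g != "NA")
    n_AA, n_Aa, n_aa = counts["AA"], counts["Aa"], counts["aa"]
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no usable genotypes")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("SNP is monomorphic; the test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_p_value_1df(chi2)


def ld_test(n_AA, n_Aa, n_aA, n_aa):
    """Linkage disequilibrium between two SNPs from haplotype counts.

    In each name the first letter is the allele at SNP 1 and the second
    letter the allele at SNP 2. Returns (D, r2, chi2, p_value), where D is
    the disequilibrium coefficient of the AA haplotype.
    """
    total = n_AA + n_Aa + n_aA + n_aa  # number of chromosomes
    p_AA = n_AA / total
    p1 = (n_AA + n_Aa) / total  # frequency of A at SNP 1
    p2 = (n_AA + n_aA) / total  # frequency of A at SNP 2
    d = p_AA - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic")
    r2 = d * d / denom
    chi2 = total * r2
    return d, r2, chi2, chi2_p_value_1df(chi2)
```

A small p-value from hwe_test suggests departure from HW equilibrium; a small p-value from ld_test suggests that the two loci are not independent.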

 

 

Homework #2 - Due Thursday, January 29th

1.      Find protein homologs to the protein query sequence (alpha-fetoprotein precursor from Homo sapiens) using the BLASTP program (http://www.ncbi.nlm.nih.gov/).

2.      Find DNA homologs to the same query sequence using TBLASTN.

3.      Perform a global pairwise sequence alignment of the protein query sequence from step 1 with the top homolog found in step 1, using GAP (a SeqWeb program, http://gene.med.ohio-state.edu/gcg-bin/seqweb.cgi).

4.      Find the segments of best similarity between the top DNA homologs found in step 2, using BestFit, the optimal local sequence alignment program of SeqWeb.

5.      Build a multiple sequence alignment of the top protein homologs found in step 1, using progressive pairwise alignments (the PileUp program of SeqWeb).

6.      Find a consensus sequence based on the results obtained in step 5, using the PRETTY program of SeqWeb.

7.      Using programs for pairwise and multiple sequence alignment and database searches, perform a comparative analysis of the similarity of conserved segments in the promoters and coding regions of the genes most similar to the gene that encodes the entire protein containing the following fragment:

>query

MAKNTAIGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVALNPQNTVFDAKRLIGRKFGDAVVQSDMKHWPFQVVNDGDKPKVQVNYKGESRSFFPEEISSMVLTKMKEIAEAYLGHPVT
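
To make step 3 more concrete, here is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). It is a teaching illustration, not the GAP implementation: it assumes a simple match/mismatch score and a linear gap penalty, whereas real protein alignments normally use a substitution matrix and often affine gap penalties. The function name and the scoring values are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); gaps are written as '-'.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

A local alignment in the style of step 4 follows the same recurrence but clamps every cell at zero and starts the traceback from the highest-scoring cell (the Smith-Waterman approach).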

About GCG/SeqWeb:

Input sequences: because everyone uses the same account, all uploaded sequences are stored together, which can be confusing. One way around this is, for each alignment operation, to put all input sequences to be aligned (in FASTA format, including the header line) into one txt file, with at least one empty line between sequences, and then use Add From Local File to load them. The newly added sequences should then be highlighted in the box. Do not change or click anything in the box; click Run directly to run the alignment. Also pay attention to the sequence type (DNA or peptide) and choose the right service in the menu.
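
The input file described above can be typed by hand, but a short script makes it harder to forget the header lines or the empty line between sequences. The following is only a sketch: the function names, the 60-character line width, and the example file name are assumptions, not SeqWeb requirements.

```python
def format_multi_fasta(records, width=60):
    """Return FASTA text for (header, sequence) pairs.

    Each record starts with a '>' header line, the sequence is wrapped at
    `width` characters, and records are separated by one empty line.
    """
    blocks = []
    for header, sequence in records:
        lines = [">" + header]
        lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def save_multi_fasta(records, path):
    """Write the records to a plain-text file for Add From Local File."""
    with open(path, "w") as handle:
        handle.write(format_multi_fasta(records))
```

For example, save_multi_fasta([("query", "MAKNTAIGIDLG"), ("homolog1", "MAKAAAIGIDLG")], "alignment_input.txt") would produce one file holding both sequences; the second sequence here is made up purely for illustration.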

What to submit (email to ioschikhes-1@medctr.osu.edu):

For each of the seven exercises:

1, 2: The top four sequences in the search result (preferably from different species), in FASTA format, preferably in a single txt file.

3-6: The alignment result, preferably in HTML format.

7: The alignment results, preferably in HTML format, and an analysis report.

 

 

Homework #3 - Due Tuesday, March 9th

 

Pick 7 sequences of 50S ribosomal protein L1 from different organisms using the NCBI website (http://www.ncbi.nlm.nih.gov). Perform multiple sequence alignment of the sequences using the following programs: 3w (http://3w.molgen.mpg.de/3w.html), DCA (http://bibiserv.techfak.uni-bielefeld.de/dca/), ClustalW (http://www.ebi.ac.uk/clustalw/), and Dialign (http://bibiserv.techfak.uni-bielefeld.de/dialign/). Analyze the results obtained by each program and explain the differences.
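
One possible starting point for comparing the four outputs is to reduce each alignment to a few summary numbers. The sketch below is an illustration only, under these assumptions: each alignment has been converted to a list of equal-length strings, gaps are written as '-', and a gap is treated as its own symbol when measuring conservation. The function names and the chosen statistics are hypothetical, not part of the assignment.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common symbol in each column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    fractions = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        fractions.append(most_common_count / len(column))
    return fractions


def summarize_alignment(aligned):
    """Return simple statistics for one multiple sequence alignment."""
    conservation = column_conservation(aligned)
    length = len(aligned[0])
    gaps = sum(s.count("-") for s in aligned)
    # Columns where every sequence has the same residue (all-gap columns excluded).
    identical = sum(
        1 for column in zip(*aligned)
        if len(set(column)) == 1 and column[0] != "-"
    )
    return {
        "length": length,
        "identical_columns": identical,
        "gap_fraction": gaps / (len(aligned) * length),
        "mean_conservation": sum(conservation) / length,
    }
```

Running summarize_alignment on the output of each program gives a quick side-by-side view (alignment length, gap usage, conserved columns) that can support the written explanation of the differences.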

Email the results from each program (preferably in HTML format) and the analysis report (in Word format) to ioschikhes-1@medctr.osu.edu by March 9th.

 

 

 

Sample project topics:

  1. Image Analysis: Implement a software system that automatically retrieves images from a PACS system (BMI will run a software PACS system on its own machines) and executes an image analysis code, such as one that segments nuclei.
  2. DICOM/PACS: Design, architect and implement a picture archival system that runs on a distributed, heterogeneous cluster of machines.
  3. Classification of tumors and identification of differentially expressed genes using gene expression data.
  4. Search for new elements in promoters of human genes using sequence analysis.