BMI 731/IBGP 731 Biomedical Data Management
Winter Quarter 2004
(http://columbus.bmi.ohio-state.edu/~umit/bmi731/bmi731.html)
Umit Catalyurek, PhD, Department of Biomedical Informatics, e-mail: catalyurek.1@osu.edu
Joel H. Saltz, M.D., Ph.D., Department of Biomedical Informatics
Ilya Ioshikhes, PhD, Department of Biomedical Informatics
Daniel Janies, PhD, Department of Biomedical Informatics
Catalin Barbacioru, PhD, Department of Biomedical Informatics
Tuesday & Thursday
Location: 3167 Graves Hall
Tentative schedule:

| Date | Lecturer | Topic |
|---|---|---|
| | Daniel Janies | |
| | Catalin Barbacioru | |
| | Catalin Barbacioru | |
| | Daniel Janies | Sequence Databases |
| | Ilya Ioshikhes | |
| | Daniel Janies | Tree-based Alignment |
| | Daniel Janies | Phylogenetics |
| | Ilya Ioshikhes | |
| | Catalin Barbacioru | |
| | Catalin Barbacioru | |
| | Ilya Ioshikhes | |
| | Joel Saltz | Translational Research |
| | Ilya Ioshikhes | RLGS |
| | Joel Saltz | Radiology Imaging |
| | Joel Saltz | DCE-MRI |
| | Joel Saltz | Software systems: Data Warehouse |
| | Umit Catalyurek | |
| | Umit Catalyurek | |
| | Umit Catalyurek | |
| | Umit Catalyurek | Sequence Compression [1] & Indexing and Filtering for Similarity Search [2] |
| | Exam Week | |
| | Exam Week | Project presentations |
Grading:
Class participation: 20%
Homework: 30%
Term project: 50%
The table (HW1Table.xls) contains genotypic information about 4 (biallelic) SNPs (loci) in 927 samples. Missing data are denoted by NA; make sure you treat them appropriately. The questions you have to answer are:
a) Are these SNPs in Hardy-Weinberg (HW) equilibrium?
b) Test for linkage disequilibrium for AA haplotypes for all 6 possible pairs of SNPs.
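Questions (a) and (b) can be checked with a short script. The sketch below is one possible approach, not the required method, and rests on stated assumptions: each SNP column has already been recoded as the number of copies (0, 1 or 2) of a chosen allele "A", with NA mapped to None, and "AA haplotype" is read as the haplotype carrying allele A at both SNPs. Missing calls are dropped per SNP for the Hardy-Weinberg test and per pair of SNPs for the LD test. Haplotype frequencies are estimated from the unphased genotypes with the standard two-locus EM algorithm, and the LD p-value uses the common large-sample approximation chi-square = 2n r^2 with 1 degree of freedom. The function names are our own.

```python
import math
from collections import Counter
from itertools import combinations

# Assumed input: each SNP is a list of genotypes coded as the number of copies
# (0, 1, 2) of a chosen allele "A"; spreadsheet NA values are mapped to None.


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def hwe_test(genos):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    obs = [0, 0, 0]                      # observed counts of aa, Aa, AA
    for g in genos:
        if g is not None:                # drop missing (NA) calls
            obs[g] += 1
    n = sum(obs)
    p = (obs[1] + 2 * obs[2]) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected) if e > 0)
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 df
    return chi2, chi2_1df_pvalue(chi2)


def estimate_ld(g1s, g2s, iters=200):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    Returns D and r^2 for the A-A haplotype and an approximate LD p-value.
    """
    # Keep only samples typed at both SNPs (pairwise deletion of NA).
    pairs = [(a, b) for a, b in zip(g1s, g2s) if a is not None and b is not None]
    n = len(pairs)
    fixed = Counter()   # haplotypes whose phase is unambiguous
    n_dh = 0            # double heterozygotes: phase unknown
    for a, b in pairs:
        if a == 1 and b == 1:
            n_dh += 1
        else:
            alleles1 = [1] * a + [0] * (2 - a)
            alleles2 = [1] * b + [0] * (2 - b)
            for hap in zip(alleles1, alleles2):
                fixed[hap] += 1
    # (allele at SNP1, allele at SNP2), where 1 = allele A
    haps = [(1, 1), (1, 0), (0, 1), (0, 0)]
    f = {h: 0.25 for h in haps}
    for _ in range(iters):
        # E-step: expected share of double heterozygotes in the (1,1)/(0,0) phase
        cis = f[(1, 1)] * f[(0, 0)]
        trans = f[(1, 0)] * f[(0, 1)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = Counter(fixed)
        counts[(1, 1)] += n_dh * w
        counts[(0, 0)] += n_dh * w
        counts[(1, 0)] += n_dh * (1 - w)
        counts[(0, 1)] += n_dh * (1 - w)
        # M-step: frequencies from expected counts over 2n haplotypes
        f = {h: counts[h] / (2 * n) for h in haps}
    p1 = f[(1, 1)] + f[(1, 0)]   # frequency of A at SNP1
    p2 = f[(1, 1)] + f[(0, 1)]   # frequency of A at SNP2
    D = f[(1, 1)] - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, r2, chi2_1df_pvalue(2 * n * r2)


def all_pairs_ld(snps):
    """Run estimate_ld on every pair of SNPs (6 pairs for 4 SNPs)."""
    return {(s1, s2): estimate_ld(snps[s1], snps[s2])
            for s1, s2 in combinations(snps, 2)}
```

Only double heterozygotes have ambiguous phase, which is why the EM loop redistributes just that group; every other genotype contributes fixed haplotype counts. With only 927 samples and rare alleles, the chi-square approximations may be poor, so an exact test is worth considering for sparse tables.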
1. Find protein homologs of the protein query sequence (alpha-fetoprotein precursor from Homo sapiens) using the BLASTP program (http://www.ncbi.nlm.nih.gov/).
2. Find DNA homologs of the same query sequence using TBLASTN.
3. Perform a global pairwise sequence alignment of the protein query sequence from exercise 1 with the top homolog found in exercise 1, using GAP (a SeqWeb program at http://gene.med.ohio-state.edu/gcg-bin/seqweb.cgi).
4. Find the segments of best similarity between the top DNA homologs found in exercise 2, using BestFit, the SeqWeb program for optimal local sequence alignment.
5. Make a multiple sequence alignment of the top protein homologs found in exercise 1 using progressive pairwise alignments (the SeqWeb program PileUp).
6. Find a consensus sequence from the results obtained in exercise 5, using the SeqWeb program PRETTY.
7. Using programs for pairwise and multiple sequence alignment and for database searches, perform a comparative analysis of the similarity of conserved segments in the promoters and coding regions of the genes most similar to the gene encoding the full-length protein that contains the following fragment:
>query
MAKNTAIGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVALNPQNTVFDAKRLIGRKFGDAVVQSDMKHWPFQVVNDGDKPKVQVNYKGESRSFFPEEISSMVLTKMKEIAEAYLGHPVT
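Exercises 3 and 4 contrast global alignment (GAP) with local alignment (BestFit). The GCG documentation describes GAP as using the Needleman-Wunsch algorithm and BestFit as using the Smith-Waterman local homology algorithm. The sketch below shows both dynamic programs side by side to help interpret the SeqWeb output. It uses a toy scoring scheme (fixed match/mismatch scores and a linear gap penalty, chosen only for illustration) rather than the substitution matrices and gap penalties the SeqWeb programs actually apply, so its scores will not match SeqWeb's.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment (Needleman-Wunsch) with a linear gap penalty."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap                # leading gaps in b
    for j in range(1, m + 1):
        S[0][j] = j * gap                # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back from the bottom-right cell: the whole of both sequences is aligned.
    i, j, ra, rb = n, m, [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s:
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return S[n][m], "".join(reversed(ra)), "".join(reversed(rb))


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment (Smith-Waterman): best-scoring pair of segments."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at 0, so a local alignment can start anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until a zero cell ends the segment.
    i, j, ra, rb = bi, bj, [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return best, "".join(reversed(ra)), "".join(reversed(rb))
```

For example, `smith_waterman("XXACGTXX", "YYACGTYY")` returns only the shared segment `ACGT`, whereas a global alignment of the same pair is forced to align (and penalize) the unrelated flanks as well. This is the practical difference between the GAP and BestFit outputs.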
Input sequences: because everyone uses the same account, the sequences added by different students accumulate in the same list, which is confusing. To avoid this, for each alignment operation put all input sequences to be aligned into one .txt file in FASTA format (including the header line), with at least one empty line between sequences, and then use Add From Local File to load them. The newly added sequences should then be highlighted in the box. Do not change or click anything in the box; click Run directly to run the alignment. Also check the sequence type (DNA or peptide) and choose the matching service from the menu.
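The note above asks for all sequences of one alignment run to go into a single FASTA text file, with an empty line between records. A small helper such as the hypothetical `write_fasta_batch` below (our own function, not part of SeqWeb) can build that file from (header, sequence) pairs. Wrapping sequence lines at 60 residues is a common FASTA convention that we assume here; it is not a documented SeqWeb requirement.

```python
def write_fasta_batch(records, path, width=60):
    """Write (header, sequence) pairs to one FASTA text file.

    Records are separated by an empty line, as the SeqWeb note suggests,
    and sequence lines are wrapped at `width` characters (an assumed convention).
    """
    with open(path, "w") as fh:
        for k, (header, seq) in enumerate(records):
            if k > 0:
                fh.write("\n")                      # empty line between records
            fh.write(">" + header.lstrip(">") + "\n")  # exactly one leading '>'
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")
```

One file per alignment run keeps each run's inputs together, so only those sequences are highlighted after Add From Local File.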
For each of the seven exercises, submit:
1, 2: The top four sequences from the search results (preferably from different species), in FASTA format, preferably in a single .txt file.
3-6: The alignment results, preferably in HTML format.
7: The alignment results, preferably in HTML format, and an analysis report.
Using the NCBI website (http://www.ncbi.nlm.nih.gov), pick seven sequences of 50S ribosomal protein L1 from different organisms. Perform a multiple sequence alignment of these sequences with each of the following programs: 3w (http://3w.molgen.mpg.de/3w.html), DCA (http://bibiserv.techfak.uni-bielefeld.de/dca/), ClustalW (http://www.ebi.ac.uk/clustalw/), and Dialign (http://bibiserv.techfak.uni-bielefeld.de/dialign/). Analyze the results obtained by each program and explain the differences.
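One way to make the comparison between programs more concrete is to score every program's output on the same terms. The sketch below assumes each result has been exported as aligned sequences of equal length, with gaps written as '-', and computes a simple sum-of-pairs (SP) score and the fraction of fully conserved, gap-free columns. The match, mismatch and gap values are illustrative assumptions, not the scoring used by 3w, DCA, ClustalW or Dialign, so the numbers only support relative comparisons between alignments of the same seven sequences. The function names are our own.

```python
from itertools import combinations


def sp_score(msa, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    Every pair of rows is scored column by column; gap-gap pairs score 0.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("aligned sequences must all have the same length")
    total = 0
    for col in range(length):
        for s, t in combinations(msa, 2):
            x, y = s[col], t[col]
            if x == "-" and y == "-":
                continue
            elif x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


def conserved_fraction(msa):
    """Fraction of columns in which all rows carry the same residue (no gaps)."""
    columns = list(zip(*msa))
    conserved = sum(1 for c in columns if "-" not in c and len(set(c)) == 1)
    return conserved / len(columns)
```

Differences in these two numbers, together with where the gaps fall, give a starting point for explaining why, for example, a progressive method such as ClustalW and a segment-based method such as Dialign align the same sequences differently.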
Email the results from each program (preferably in HTML format) and the analysis report (in Word format) to ioschikhes-1@medctr.osu.edu by March 9.
Sample project topics: