Doruk Bozdag

Projects


Parallel short sequence mapping

With the advent of next-generation high-throughput sequencing instruments, large volumes of short sequence data are being generated at an unprecedented rate. Processing and analyzing these massive datasets requires overcoming several challenges, including mapping the generated short sequences to a reference genome. Our research in this area focuses on parallelizing the short sequence mapping process. We also develop analytical models of the run-time costs of various parallelization schemes in order to understand the trade-offs among them.
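To make the idea of a parallelization scheme concrete, here is a minimal sketch of one simple "read-parallel" scheme. The reads are split into chunks, and each worker maps its chunk against a shared, read-only k-mer index of the reference. This is not the project's method. Exact matching with a first-k-mer seed is an assumption made for brevity (real mappers tolerate mismatches), and all function names are hypothetical. Threads keep the example self-contained. In CPython, however, the GIL limits CPU parallelism for threads, so a process pool or MPI across nodes would be needed for real speedups.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Use the read's first k-mer as a seed, then verify an exact full-length match."""
    return [pos for pos in index.get(read[:k], [])
            if reference[pos:pos + len(read)] == read]


def map_reads_parallel(reads: list[str], reference: str, k: int, workers: int) -> list[list[int]]:
    """Hypothetical read-parallel scheme: partition reads, share the index read-only."""
    index = build_kmer_index(reference, k)
    chunk = max(1, (len(reads) + workers - 1) // workers)  # reads per worker
    chunks = [reads[i:i + chunk] for i in range(0, len(reads), chunk)]

    def map_chunk(part: list[str]) -> list[list[int]]:
        return [map_read(r, reference, index, k) for r in part]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(map_chunk, chunks)  # preserves chunk order
    return [hits for part in results for hits in part]


print(map_reads_parallel(["ACGT", "GTTA", "TTTT"], "ACGTACGTTAGC", k=3, workers=2))
# [[0, 4], [6], []]
```

An alternative scheme would partition the reference and its index across workers instead of the reads. That changes memory use per worker and the amount of result merging, which is the kind of trade-off an analytical cost model can weigh.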
Related Publications

Biclustering algorithms for mining gene expression data

Biclustering is a popular method for detecting hidden co-regulation patterns among genes. We develop novel biclustering algorithms that identify subsets of genes with high correlation while reducing the false negatives caused by spurious or unrelated samples in microarray datasets. A major application of our biclustering methods is revealing genes and proteins that are important in the complex process of breast tumor formation.
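The effect of an unrelated sample on a gene cluster can be illustrated with a small sketch. It assumes, hypothetically, that a bicluster (a subset of genes over a subset of samples) is scored by the average pairwise Pearson correlation of its gene profiles. This is not the project's algorithm. The function names and the toy expression matrix are illustrative only.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bicluster_score(matrix: list[list[float]], genes: list[int], samples: list[int]) -> float:
    """Hypothetical quality measure: mean pairwise correlation of the chosen genes,
    computed only over the chosen samples."""
    if len(genes) < 2:
        raise ValueError("need at least two genes")
    rows = [[matrix[g][s] for s in samples] for g in genes]
    pairs = list(combinations(rows, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy data: three genes co-regulated over samples 0-3; sample 4 is unrelated.
expression = [
    [1.0, 2.0, 3.0, 4.0, 9.0],
    [2.0, 4.0, 6.0, 8.0, 0.0],
    [3.0, 4.0, 5.0, 6.0, 9.0],
]
print(bicluster_score(expression, [0, 1, 2], [0, 1, 2, 3]))     # 1.0
print(bicluster_score(expression, [0, 1, 2], [0, 1, 2, 3, 4]))  # much lower
```

Including the unrelated sample pulls the score well below 1, so a method that scores over all samples could reject this gene set, which is a false negative. Restricting the sample subset recovers it.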
Related Publications

Scalable parallel graph coloring algorithms

One of our recent projects is the development of distributed-memory graph coloring algorithms. In many parallel scientific computing applications, coloring is used to identify independent tasks that can be performed concurrently. In such cases, the graph is typically distributed across processors, and hence the coloring needs to be computed in parallel. We started with distance-1 and distance-2 coloring algorithms, and we are currently investigating how to scale these algorithms to machines with tens of thousands of processors.
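The difference between distance-1 and distance-2 coloring can be shown with a sequential greedy (first-fit) sketch. In a distance-1 coloring, adjacent vertices get different colors. In a distance-2 coloring, vertices within two hops of each other also get different colors. This is only the sequential baseline and not the project's distributed algorithm. One common parallel approach colors vertices concurrently and then detects and recolors conflicts along edges that cross processor boundaries, but that step is omitted here. Function names are hypothetical.

```python
def greedy_coloring(adj: dict[int, list[int]], distance: int = 1) -> dict[int, int]:
    """Sequential first-fit coloring in vertex-id order.
    With distance=2, colors of neighbors' neighbors are also forbidden."""
    colors: dict[int, int] = {}
    for v in sorted(adj):
        forbidden = set()
        for u in adj[v]:
            if u in colors:
                forbidden.add(colors[u])
            if distance == 2:
                for w in adj[u]:
                    if w != v and w in colors:
                        forbidden.add(colors[w])
        c = 0
        while c in forbidden:  # smallest color not used nearby
            c += 1
        colors[v] = c
    return colors


def is_valid(adj: dict[int, list[int]], colors: dict[int, int], distance: int = 1) -> bool:
    """Check that no two vertices within `distance` hops share a color."""
    for v in adj:
        for u in adj[v]:
            if colors[u] == colors[v]:
                return False
            if distance == 2:
                for w in adj[u]:
                    if w != v and colors[w] == colors[v]:
                        return False
    return True


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
print(greedy_coloring(path))     # {0: 0, 1: 1, 2: 0, 3: 1}
print(greedy_coloring(path, 2))  # {0: 0, 1: 1, 2: 2, 3: 0}
```

On the same path graph, distance-1 coloring needs two colors, while distance-2 coloring needs three, because vertices 0 and 2 (and 1 and 3) share a common neighbor.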
Related Publications

Load balancing through parallel hypergraph partitioning

Today, many large scientific computations are carried out on large distributed-memory parallel computers. One of the first steps is to distribute the computation and data across the processors in a way that achieves high parallel efficiency. This problem is known as (static) load balancing, or partitioning. In parallel adaptive applications, the computational structure changes over time, leading to load imbalance even when the initial distribution was balanced. To restore balance and keep communication volume low in subsequent iterations, the changed computational structure must be dynamically load-balanced (repartitioned). In collaboration with the Zoltan team at Sandia National Laboratories, we develop parallel hypergraph-partitioning-based methods and tools for these load-balancing problems.
Related Publications

DAG scheduling for distributed memory systems

Directed Acyclic Graph (DAG) scheduling is an important problem in distributed-memory computing. Three main metrics determine the performance of a scheduling algorithm: (1) runtime complexity, (2) length of the generated schedules, and (3) number of processors used by the generated schedules. It is extremely difficult to minimize all three metrics simultaneously, so different algorithms offer different trade-offs. We develop new DAG scheduling algorithms that carefully account for these trade-offs to best address the needs of the applications being considered.
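A simple list scheduler makes metrics (2) and (3) concrete. Ready tasks are prioritized by bottom level (the longest cost path from the task to an exit node), and each task is placed on the processor that becomes free earliest. This is a generic textbook heuristic, not one of the project's algorithms. It also assumes zero communication cost between processors, which is a strong simplification for distributed memory. The function names and the example DAG are hypothetical.

```python
def bottom_levels(cost: dict[str, int], succ: dict[str, list[str]]) -> dict[str, int]:
    """Longest path cost from each task to an exit task, including the task itself."""
    memo: dict[str, int] = {}

    def bl(v: str) -> int:
        if v not in memo:
            memo[v] = cost[v] + max((bl(s) for s in succ[v]), default=0)
        return memo[v]

    for v in cost:
        bl(v)
    return memo


def list_schedule(cost: dict[str, int], succ: dict[str, list[str]], num_procs: int) -> tuple[int, int]:
    """Return (schedule length, number of processors actually used).
    Assumes zero inter-processor communication cost."""
    preds_left = {v: 0 for v in cost}
    for v in succ:
        for s in succ[v]:
            preds_left[s] += 1
    bl = bottom_levels(cost, succ)
    earliest = {v: 0 for v in cost}  # earliest start allowed by predecessors
    proc_free = [0] * num_procs      # time at which each processor becomes idle
    used: set[int] = set()
    ready = [v for v in cost if preds_left[v] == 0]
    makespan = 0
    while ready:
        ready.sort(key=lambda v: (-bl[v], v))  # highest bottom level first
        v = ready.pop(0)
        p = min(range(num_procs), key=lambda i: proc_free[i])
        finish = max(proc_free[p], earliest[v]) + cost[v]
        proc_free[p] = finish
        used.add(p)
        makespan = max(makespan, finish)
        for s in succ[v]:
            earliest[s] = max(earliest[s], finish)
            preds_left[s] -= 1
            if preds_left[s] == 0:
                ready.append(s)
    return makespan, len(used)


cost = {"a": 2, "b": 3, "c": 3, "d": 1}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
for p in (1, 2, 3):
    print(p, list_schedule(cost, succ, p))
# 1 (9, 1)
# 2 (6, 2)
# 3 (6, 3)
```

With three available processors, this heuristic uses all three without shortening the schedule below what two processors achieve. That is the kind of schedule-length versus processor-count trade-off described above.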
Related Publications

Task and data-transfer scheduling for distributed memory systems

In many fields of science, engineering, and medicine, data is being collected and generated at an increasing rate, thanks mainly to high-resolution measurements made possible by advanced sensor technologies and to large-scale simulations enabled by inexpensive, high-performance computing on commodity PC clusters. Hence, scientific research is increasingly data-driven. In most cases, data is stored in shared data resources, and parts of it usually need to be transferred to compute resources for analysis. We develop locality-aware scheduling algorithms for data-intensive tasks, as well as efficient mechanisms for coordinated data transfers over wide-area networks.
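The core idea of locality-aware scheduling can be illustrated with a greedy sketch. Each task goes to the compute node that already holds the most of its input data, measured as the fewest bytes still to be transferred. Ties are broken by current load, and a per-node task cap keeps the assignment from piling onto a single node. Files transferred for one task remain cached on that node and can be reused by later tasks. This is a hypothetical illustration, not the project's algorithm. The function name, the cache model, and the capacity parameter are all assumptions.

```python
def locality_aware_assign(tasks: dict[str, set[str]], file_size: dict[str, int],
                          cached: dict[str, set[str]], max_tasks_per_node: int
                          ) -> tuple[dict[str, str], int]:
    """Greedily assign each task to the node needing the fewest transferred bytes.
    Returns (task -> node assignment, total bytes transferred)."""
    cache = {n: set(files) for n, files in cached.items()}  # do not mutate the input
    load = {n: 0 for n in cache}
    assignment: dict[str, str] = {}
    transferred = 0
    for t in sorted(tasks):
        candidates = [n for n in cache if load[n] < max_tasks_per_node]
        if not candidates:
            raise ValueError("not enough node capacity for all tasks")

        def missing_bytes(n: str) -> int:
            # bytes of task t's inputs that node n does not hold yet
            return sum(file_size[f] for f in tasks[t] - cache[n])

        best = min(candidates, key=lambda n: (missing_bytes(n), load[n], n))
        transferred += missing_bytes(best)
        cache[best] |= tasks[t]  # transferred files stay cached on that node
        load[best] += 1
        assignment[t] = best
    return assignment, transferred


file_size = {"A": 100, "B": 50, "C": 10}
cached = {"n1": {"A"}, "n2": {"B"}}
tasks = {"t1": {"A", "C"}, "t2": {"B"}, "t3": {"A", "B"}}
print(locality_aware_assign(tasks, file_size, cached, max_tasks_per_node=2))
# ({'t1': 'n1', 't2': 'n2', 't3': 'n1'}, 60)
```

A real scheduler would also model network bandwidth and the timing of transfers. Coordinating those transfers is a separate problem that this sketch does not address.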
Related Publications

Mobility-based information delivery in wireless sensor networks

Among the numerous challenges faced in designing wireless sensor networks (WSNs) and their protocols, maintaining connectivity and maximizing network lifetime stand out as critical considerations. Mobile devices can be used to address the connectivity and lifetime problems in WSNs. We develop schedule-based controlled mobility schemes that address these two problems for best-effort and differentiated information delivery services.
Related Publications