
Zoom link: https://helsinki.zoom.us/j/448035135

 

14.15 Erik Garrison: Graphical pangenomics in the time of COVID19

Abstract: We consider a graph-based approach to representing pangenomes, the variation graph, and demonstrate its application to a variety of basic bioinformatic problems. We describe an approach to build variation graphs from any set of alignments, and demonstrate it as a tool to build compressed representations of viral genomes obtained during the ongoing pandemic.

CV: http://hypervolu.me/~erik/erik_garrison.html

 

15.15 Paul Medvedev: Representation of k-mer sets using spectrum-preserving string sets

Abstract: Given the popularity and elegance of k-mer based tools, finding a space-efficient way to represent a set of k-mers is important for improving the scalability of bioinformatics analyses. One popular approach is to convert the set of k-mers into the more compact set of unitigs. We generalize this approach and formulate it as the problem of finding a smallest spectrum-preserving string set (SPSS) representation. We show that this problem is equivalent to finding a smallest path cover in a compacted de Bruijn graph. Using this reduction, we prove a lower bound on the size of the optimal SPSS and propose a greedy method called UST that results in a smaller representation than unitigs and is nearly optimal with respect to our lower bound. We demonstrate the usefulness of the SPSS formulation with two applications of UST. The first one is a compression algorithm, UST-Compress, which we show can store a set of k-mers using an order of magnitude less disk space than other lossless compression tools. The second one is an exact static k-mer membership index, UST-FM, which we show improves index size by 10-44% compared to other state-of-the-art low-memory indices. Our tool is publicly available at: https://github.com/medvedevgroup/UST/.
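
For readers unfamiliar with the idea of a spectrum-preserving string set, the following toy Python sketch may help. It is not the UST tool and does not reproduce its algorithm: it ignores reverse complements, works directly on the k-mers rather than on a compacted de Bruijn graph, and the function names `spectrum` and `greedy_spss` are hypothetical, chosen only for this illustration. It shows that a set of k-mers can be replaced by fewer, longer strings containing exactly the same k-mers, each once.

```python
def spectrum(strings, k):
    """Return the set of all k-mers occurring in the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def greedy_spss(kmers):
    """Greedily glue k-mers along (k-1)-character overlaps.

    Every k-mer is consumed exactly once, so the returned strings have
    the same spectrum as the input. Simplifying assumptions: DNA
    alphabet, all k-mers of equal length, strands ignored.
    """
    ordered = sorted(set(kmers))  # sorted only to make the result deterministic
    if not ordered:
        return []
    k = len(ordered[0])
    unused = set(ordered)
    result = []
    for start in ordered:
        if start not in unused:
            continue
        unused.remove(start)
        path = start
        # Extend to the right while an unused k-mer continues the path.
        extended = True
        while extended:
            extended = False
            for c in "ACGT":
                nxt = path[len(path) - k + 1:] + c
                if nxt in unused:
                    unused.remove(nxt)
                    path += c
                    extended = True
                    break
        # Extend to the left in the same way.
        extended = True
        while extended:
            extended = False
            for c in "ACGT":
                prv = c + path[:k - 1]
                if prv in unused:
                    unused.remove(prv)
                    path = c + path
                    extended = True
                    break
        result.append(path)
    return result


# Toy input: the 3-mers of two short, overlapping sequences.
kmers = spectrum(["ACGTAC", "GTACGG"], 3)
strings = greedy_spss(kmers)

# Same k-mer content, fewer total characters than listing each k-mer.
print(strings, sum(len(s) for s in strings), "vs", 3 * len(kmers))
```

The size measure used here, the total number of characters, is one natural reading of "smallest" in the abstract; the greedy order above carries no optimality guarantee.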

Bio: Paul Medvedev is an Associate Professor in the Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology and the Director of the Center for Computational Biology and Bioinformatics at the Pennsylvania State University. His research focus is on developing computer science techniques for analysis of biological data and on answering fundamental biological questions using such methods. Prior to joining Penn State in 2012, he was a postdoc at the University of California, San Diego and a visiting scholar at the Oregon Health & Sciences University and the University of Bielefeld. He received his Ph.D. from the University of Toronto in 2010, his M.Sc. from the University of Southern Denmark in 2004, and his B.S. from the University of California, Los Angeles in 2002.