Academic Seminar Information: Afternoon of July 10, 2012

Title: Mini-workshop: next generation sequencing data analysis

Speakers

Dr Wei Chen, Group leader, Senior Scientist, Berlin Institute for Medical Systems Biology; Max-Delbrueck-Center for Molecular Medicine

Dr Haiyan Huang, Associate Professor, Department of Statistics, UC Berkeley

Time: 2012-07-10 13:30-16:15

Venue: Room 1-312, FIT Building

Organizers: Department of Automation, Tsinghua University / Tsinghua National Laboratory for Information Science and Technology

Overview:

Dr Wei Chen, 13:30-14:45, Dissection of genetic disorder by using next generation sequencing - a case study

Abstract: The recent introduction of massively parallel sequencing technology has revolutionized research in medical genomics. These so-called next generation sequencing platforms, such as the Roche/454, Illumina/Solexa and ABI/SOLiD systems, can sequence DNA orders of magnitude faster and at much lower cost than the conventional Sanger method. Building on this sequencing capacity, a variety of functional genomic assays have been developed for this new generation of sequencers, in addition to genomic DNA sequencing. In this talk, I will use one case to demonstrate how we applied the technology to dissect genetic disorders, from mutation detection to functional characterization of genes.

Dr Haiyan Huang, 15:00-16:15, Sparse linear modeling of RNA-seq data for isoform discovery and abundance estimation

Abstract: Since the inception of next-generation mRNA sequencing (RNA-Seq) technology, various attempts have been made to utilize RNA-Seq data in assembling full-length mRNA isoforms de novo and estimating the abundance of isoforms. However, for genes with more than a few exons, the problem tends to be challenging and often involves identifiability issues in statistical modeling. We have developed a statistical method called “sparse linear modeling of RNA-Seq data for isoform discovery and abundance estimation” (SLIDE) that takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. SLIDE is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with deterministic isoform assembly algorithms (e.g., Cufflinks), SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power to detect novel isoforms. Another advantage of SLIDE is its flexibility in incorporating other transcriptomic data such as RACE, CAGE, and EST into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. Besides isoform discovery, SLIDE sequentially uses the same linear model to estimate the abundance of discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation.
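
The idea of resolving an unidentifiable linear model with a sparsity penalty can be illustrated with a small sketch. This is not SLIDE's actual procedure: the abstract does not specify the "modified Lasso", so the code below assumes a plain non-negative L1 penalty solved by coordinate descent, and it assumes a simplified 0/1 design matrix (exon membership) instead of read sampling probabilities. The function name `nonneg_lasso`, the toy matrix `F`, and the observations `b` are all hypothetical.

```python
def nonneg_lasso(F, b, lam, sweeps=500):
    """Minimize 0.5*||b - F p||^2 + lam*sum(p) subject to p >= 0.

    Plain coordinate descent; an illustrative stand-in, not SLIDE itself.
    F is a list of rows (bins x candidate isoforms), b the observed values.
    """
    n_rows, n_cols = len(F), len(F[0])
    p = [0.0] * n_cols
    residual = list(b)  # equals b - F p, since p starts at zero
    col_sq = [sum(F[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0:
                continue  # an all-zero column carries no information
            # Correlate column j with the residual that excludes its own term.
            rho = sum(F[i][j] * (residual[i] + F[i][j] * p[j])
                      for i in range(n_rows))
            # Soft-threshold and clip at zero (abundances cannot be negative).
            new_pj = max(0.0, (rho - lam) / col_sq[j])
            delta = new_pj - p[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= F[i][j] * delta
                p[j] = new_pj
    return p


# Toy gene with 3 exons (rows) and 4 candidate isoforms (columns):
# exons {1,2,3}, {1,3}, {1,2}, {2,3}. Assumed, not taken from real data.
F = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]
# Noise-free signal generated by 2 units of isoform 0 and 1 unit of isoform 1.
b = [3.0, 2.0, 3.0]

p_hat = nonneg_lasso(F, b, lam=0.01)
print([round(x, 4) for x in p_hat])
```

With three exon bins and four candidate isoforms, many non-negative abundance vectors reproduce `b` exactly, which is the identifiability problem in miniature. In this toy case the L1 penalty selects the solution that uses only the first two candidates and sets the other two to zero, at the cost of slightly shrinking the estimated abundances.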