A novel statistical method for handling zeros in microbiome data

Project Leader

Problem Statement

Modern sequencing technologies, such as 16S rRNA sequencing, provide a valuable approach to large-scale profiling of microbial communities. However, owing to the limitations of these technologies, the resulting data are compositional, over-dispersed, and zero-inflated. A large body of work addresses these challenges; this project focuses on how to handle zeros. Handling zeros well is critical because almost all downstream analyses, such as network analysis, rely on the quality of the imputed data.

To our knowledge, none of the existing zero-imputation methods use phylogenetic distances. The proposed project aims to fill this gap in the literature. We will first classify each zero as either a biological zero (the taxon is truly absent from the sample) or a sampling zero (the taxon is present but went undetected, for example because of limited sequencing depth). We will then impute only the sampling zeros, borrowing information from taxa that are phylogenetically close.
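
To make the idea concrete, here is a minimal sketch of what distance-weighted borrowing could look like. It is not the project's method, which is yet to be developed. It assumes the zeros have already been labelled (the `sampling_zero` mask), that a matrix of pairwise phylogenetic distances between taxa is available, and that weights decay exponentially with distance. The function name, the `bandwidth` parameter, and the weighting scheme are all hypothetical choices made for illustration.

```python
import math


def impute_sampling_zeros(counts, phylo_dist, sampling_zero, bandwidth=1.0):
    """Illustrative only: fill flagged zeros with a phylogeny-weighted mean.

    counts        : list of samples, each a list of taxon counts
    phylo_dist    : square matrix of pairwise phylogenetic distances (taxa x taxa)
    sampling_zero : boolean matrix, True where a zero is judged a sampling zero
    bandwidth     : hypothetical tuning parameter; smaller values favor closer taxa
    """
    imputed = [[float(v) for v in row] for row in counts]
    for i, row in enumerate(counts):
        for j, value in enumerate(row):
            if value != 0 or not sampling_zero[i][j]:
                continue  # observed counts and biological zeros stay unchanged
            weighted_sum = 0.0
            weight_total = 0.0
            for k, neighbor in enumerate(row):
                if k == j or neighbor == 0:
                    continue  # borrow only from other taxa observed in this sample
                weight = math.exp(-phylo_dist[j][k] / bandwidth)
                weighted_sum += weight * neighbor
                weight_total += weight
            if weight_total > 0:
                imputed[i][j] = weighted_sum / weight_total
    return imputed


# Toy data: taxa 0 and 1 are close relatives, taxon 2 is distant.
counts = [[10, 0, 4],
          [0, 0, 6]]
phylo_dist = [[0.0, 0.1, 2.0],
              [0.1, 0.0, 2.0],
              [2.0, 2.0, 0.0]]
# Assumed labelling: the zero at [1][0] is biological, the others are sampling zeros.
sampling_zero = [[False, True, False],
                 [False, True, False]]

result = impute_sampling_zeros(counts, phylo_dist, sampling_zero)
```

In the first toy sample, the imputed value for taxon 1 lands much nearer the count of its close relative (taxon 0) than that of the distant taxon 2, while the zero labelled as biological is left at zero. A real method would also need to respect the compositional and over-dispersed nature of the data, which this sketch ignores.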

Details

  • Expected team size: 2
  • Student Experience Level: Advanced: students who have taken multiple upper-level mathematics courses
Prerequisites

Where course numbers are given, students should look for the closest equivalent course offered at their home institution.

Skills
  • R programming
Juxin Liu
Professor of Mathematics, University of Saskatchewan
Huokai Wu
Graduate Mentor
Mingyang Chen
Undergraduate Student
Peixuan Chen
Undergraduate Student
Ang Li
Undergraduate Student