Department of Gynecology and Obstetrics

Current Studies:

Linear Decomposition Model (LDM)

LDM: We developed the Linear Decomposition Model (LDM) as a workhorse for microbiome association studies. It is a powerful, permutation-based analysis approach that we have extended to matched studies, presence-absence data, and survival outcomes, with log-linear (compositional) models to follow soon. The LDM package runs in R and is freely available (see software links). Our studies show that LDM is one of the few analysis methods that control the false discovery rate (FDR). (Yijuan Hu)

LOCOM, a logistic compositional regression method

LOCOM: We developed LOCOM, a logistic compositional regression method, to give unbiased inference even when the data themselves are biased. LOCOM works because, given the current understanding of microbiome biases, odds ratios can be estimated without bias even when relative abundances cannot. Our studies show that LOCOM performs well in terms of FDR and size (type I error rate) even when data are subject to large biases. We plan to update LOCOM by switching to conditional logistic regression. (Yingtian Hu, Yijuan Hu, Mengyu He)
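The idea that odds ratios survive measurement bias can be illustrated with a small numerical sketch. It assumes a multiplicative bias model, in which each taxon's true relative abundance is scaled by its own efficiency factor (the same in both groups) and the result is renormalized. That model, the function names, and all numbers below are illustrative assumptions and are not taken from the LOCOM package.

```python
import math

def observe(true_props, efficiency):
    """Apply taxon-specific multiplicative bias, then renormalize to sum to 1."""
    distorted = [p * e for p, e in zip(true_props, efficiency)]
    total = sum(distorted)
    return [d / total for d in distorted]

def log_odds_ratio(case, control, taxon, reference):
    """Log odds ratio of `taxon` versus `reference`, comparing case to control."""
    return (math.log(case[taxon] / case[reference])
            - math.log(control[taxon] / control[reference]))

# Hypothetical true relative abundances for three taxa in two groups.
true_control = [0.50, 0.30, 0.20]
true_case = [0.30, 0.50, 0.20]
# Hypothetical measurement efficiencies (assumed identical in both groups).
efficiency = [1.0, 4.0, 0.25]

obs_control = observe(true_control, efficiency)
obs_case = observe(true_case, efficiency)

# The case-control shift in relative abundance is distorted by the bias ...
abundance_shift_true = true_case[0] - true_control[0]
abundance_shift_obs = obs_case[0] - obs_control[0]

# ... but the log odds ratio against a reference taxon is unchanged, because
# the efficiency factors and the normalizing constants cancel.
lor_true = log_odds_ratio(true_case, true_control, taxon=0, reference=2)
lor_obs = log_odds_ratio(obs_case, obs_control, taxon=0, reference=2)

print(f"shift in relative abundance: true {abundance_shift_true:.3f}, "
      f"observed {abundance_shift_obs:.3f}")
print(f"log odds ratio: true {lor_true:.3f}, observed {lor_obs:.3f}")
```

Under this assumed model the cancellation is exact for any choice of efficiency factors, which is why a method built on odds ratios can remain valid when relative abundances themselves are distorted.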

MIDAS

It is important to be able to generate realistic microbiome data to test new methods and compare them with existing methods. Surprisingly few packages exist for this purpose, and those that do seem to perform poorly or require huge computational effort. To fill this need, we developed MIDAS, a microbiome data simulator, as an R package. MIDAS uses simple methods (tetrachoric correlations, Gaussian copulas) to generate data that resemble a template data set. Systematic changes (e.g., shifts in relative abundances for cases or controls) can be easily incorporated. The MIDAS package in R is available from GitHub. (Mengyu He, Ni Zhao)
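To give a feel for the Gaussian copula idea, here is a minimal Python sketch for two taxa. It is not the MIDAS algorithm: the correlation `rho` is simply assumed rather than estimated from the template, the tetrachoric-correlation step for presence-absence is omitted, and the function names and template counts are hypothetical.

```python
import random
from statistics import NormalDist

def empirical_quantile(sorted_values, u):
    """Return the value at quantile u (0 <= u <= 1) of a sorted template column."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]

def simulate_pair(template_a, template_b, rho, n, seed=0):
    """Draw n samples for two taxa whose margins mimic the templates and whose
    dependence comes from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    sorted_a, sorted_b = sorted(template_a), sorted(template_b)
    samples = []
    for _ in range(n):
        # Correlated standard normals: z2 = rho*z1 + sqrt(1 - rho^2)*noise.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Map to uniforms with the normal CDF, then to the template margins.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        samples.append((empirical_quantile(sorted_a, u1),
                        empirical_quantile(sorted_b, u2)))
    return samples

# Hypothetical template counts for two taxa across ten samples (0 = absent).
template_a = [0, 0, 0, 2, 5, 8, 13, 21, 40, 90]
template_b = [0, 0, 1, 1, 3, 4, 9, 15, 30, 60]

simulated = simulate_pair(template_a, template_b, rho=0.6, n=1000)
print(simulated[:5])
```

Because each simulated value is drawn from the template's empirical distribution, features such as the proportion of zeros carry over to the simulated data, while the copula supplies the dependence between taxa.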

Beta Diversity

Many methods that distinguish between the microbial composition of two groups (e.g., case and control participants) look for shifts in the mean relative abundances of taxa, but differences in the variability about the mean relative abundance may also be important. We have discovered that the standard approach to this problem can give misleading results.  Our new approach works better and allows for a wider range of alternative hypotheses than the standard approach.  (Jiuyao Lu, Ni Zhao)

Hidden Batch Effects

Measurements of taxon relative abundances can exhibit systematic differences across ‘batches’ (sets of samples processed under slightly different conditions). Sometimes batches are identified in the data, but sometimes a batch effect may occur without being noticed.  We are developing methods to find these ‘hidden’ batch effects using quantile regression and a novel approach to matrix decomposition.  (Jiuyao Lu, Ni Zhao, Wodan Ling)

Joint Analysis of 16S and Shotgun Microbiome Data

Most shotgun metagenomics studies also conduct a 16S rRNA survey. We are developing methods that allow for simultaneous inference on taxon relative abundance data using these two data types within the LOCOM framework. Our approach could also be used to give a combined analysis of two or more 16S data sets, as the LOCOM framework allows each data set to have different biases. (Ye Yue, Yijuan Hu)

Maternal Mortality

In collaboration with the Maternal and Infant Health Branch in CDC’s Division of Reproductive Health, we are using hospital discharge and other data sources to study how Severe Maternal Morbidity (SMM) correlates with maternal mortality, with the goal of better surveillance for poor maternal outcomes. (Charlan Kroelinger)

Funding Support

NIH grant (Yijuan Hu PI)

NIH grant (Ni Zhao PI)

ECHO grant (Anne Dunlop PI)