Welcome to Hongzhe Li's Statistical Genetics and Genomics Laboratory. Our lab is part of the Department of Biostatistics and Epidemiology at the University of Pennsylvania Perelman School of Medicine. We conduct both methodological and collaborative research in statistical genetics, genomics, and metagenomics, with the goal of understanding the genetic and genomic bases of complex biological systems, including the initiation and development of complex human diseases.
Working with Penn collaborators, we are currently developing methods for the analysis of high-throughput genomic data. Our application areas include genome-wide association studies of neuroblastoma, eQTL analysis of human heart failure data, and metagenomic data analysis of the human gut microbiome. In statistical genomics, our recent research has focused on developing statistical and computational methods for the analysis of genetic pathways and networks, and novel methods for the analysis of eQTL data. These collaborations have led to publications in Science, Nature, Nature Genetics, Developmental Cell, and PNAS, among others, and have motivated many of our methodological research projects.
The focus of our methodological research is to formulate problems in genetics and genomics as interesting statistical problems and to develop novel statistical models and computational methods to solve them. We are particularly interested in developing high-dimensional statistical methods for the analysis of genomic data.
The human microbiome plays an important role in human disease and health. Identifying factors that affect microbiome composition can provide insights into disease mechanisms and suggest ways to modulate the microbiome for therapeutic purposes. Distance-based statistical tests have been applied to test the association of microbiome composition with environmental or biological covariates. The unweighted and weighted UniFrac distances are the most widely used distance measures. However, these two measures assign too much weight either to rare lineages or to the most abundant lineages, which can lead to a loss of power when the important change in composition occurs in moderately abundant lineages.
We develop generalized UniFrac distances that extend the weighted and unweighted UniFrac distances for detecting a much wider range of biologically relevant changes.
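To make the idea concrete, the following is a minimal sketch of one common formulation of a generalized UniFrac distance between two communities A and B. It is illustrative only and is not the lab's released software. It assumes that, for each branch of the phylogenetic tree, we already have the branch length and the proportion of each community's reads descending from that branch. The function name, the argument names, and the tuning parameter `alpha` are assumptions made for this example. With `alpha = 1` the weights follow the abundances, as in the (normalized) weighted UniFrac distance; smaller values of `alpha` shift weight toward less abundant lineages.

```python
from typing import Sequence


def generalized_unifrac(
    branch_lengths: Sequence[float],
    prop_a: Sequence[float],
    prop_b: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Sketch of a generalized UniFrac distance (hypothetical helper).

    branch_lengths[i] : length of branch i in the phylogenetic tree
    prop_a[i], prop_b[i] : proportion of reads in community A / B that
        descend from branch i
    alpha : tuning parameter, assumed to lie in [0, 1], that controls
        how strongly abundant lineages are weighted
    """
    if not (len(branch_lengths) == len(prop_a) == len(prop_b)):
        raise ValueError("all inputs must have the same length")

    numerator = 0.0
    denominator = 0.0
    for b, pa, pb in zip(branch_lengths, prop_a, prop_b):
        total = pa + pb
        if total == 0:
            # Branch absent from both communities: contributes nothing.
            continue
        weight = b * total ** alpha
        # Relative difference on this branch, scaled by its weight.
        numerator += weight * abs(pa - pb) / total
        denominator += weight

    return numerator / denominator if denominator > 0 else 0.0
```

Under these assumptions, two identical communities have distance 0, and two communities that share no branches have distance 1 for any `alpha`. Evaluating the distance at several values of `alpha` shows how the weighting changes which lineages drive the result.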