eagle-i University of Pennsylvania

Wang Laboratory


Our lab focuses on Alzheimer’s disease and other neurodegenerative disorders, aging, and psychiatric disorders including autism and bipolar disorder. Ongoing projects in our lab can be divided into the following three main directions:

• Genetics and genomics of Alzheimer’s disease and other neurodegenerative disorders.
• Informatics and algorithm development for genome-scale experiments.
• Biomarker development for aging and neurodegenerative disorders.





  • DASHR ( Database )

    "The DASHR database provides the most comprehensive information to date on human small non-coding RNA (sncRNA) genes, precursor and mature sncRNA annotations, sequence, expression levels and RNA processing information across 42 normal tissues and cell types in human. The content of the database derives from integrating annotation data with curation, annotation, and computational analysis of 187 small-RNA deep sequencing datasets with over 2.5 billion reads from over 30 independent studies. DASHR contains information on over 48,000 precursor and mature sncRNA annotations in the human genome, of which 82% are expressed in one or more of the curated tissues and cell types."

  • NIAGADS Genomics Database ( Database )

    "The NIAGADS GenomicsDB annotation resource provides a simple, but powerful, workspace to explore, analyze, and discover genes, SNPs, and genomics locations of interest or with special relevance to Alzheimer’s Disease."


  • Bayesian differential allocation algorithm ( Algorithmic software component )

    The Bayesian differential allocation algorithm is a non-parametric approach "to more efficiently assign resamples (such as bootstrap samples or permutations) within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem, and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running time overhead."
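
    The idea of adapting resample assignment can be illustrated with a minimal sketch. This is not the published algorithm: it assumes a uniform Beta prior on each test's resampling p-value and a hypothetical rule that sends the next batch of resamples to the test whose significance at level alpha is least settled. The function names and the example counts are invented for illustration.

```python
import math


def ambiguity(exceed, total, alpha):
    """Distance between the posterior mean p-value and alpha, in posterior SDs.

    Assumes a Beta(1 + exceed, 1 + total - exceed) posterior (uniform prior).
    Small values mean the test's significance is still undecided.
    """
    a = 1 + exceed
    b = 1 + total - exceed
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return abs(mean - alpha) / sd


def next_test(counts, alpha=0.05):
    """Return the index of the test that should receive the next batch."""
    scores = [ambiguity(k, n, alpha) for k, n in counts]
    return scores.index(min(scores))


# Hypothetical state: (exceedances, resamples so far) for three tests.
counts = [(0, 100), (5, 100), (50, 100)]
print(next_test(counts))
```

    In this example the second test, whose estimated p-value sits closest to 0.05, would be chosen, while the clearly significant and clearly non-significant tests receive no further resamples.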

  • BLAST ( Algorithmic software component )

    "The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families."

  • CoRAL ( Software )

    "CoRAL is a machine learning package that can predict the precursor class of small RNAs present in a high-throughput RNA-sequencing dataset. In addition to classification, it also produces information about the features that are most important for discriminating different populations of small non-coding RNAs."

  • DRAW ( Algorithmic software component )

    "DRAW stands for DNA Resequencing Analysis Workflow. DRAW automates the entire process of mapping sequence reads, various quality control steps and calling variants. We developed DRAW following Best Practice Variant Detection with the Genomic Analysis Toolkit. DRAW accepts both single-end and pair-end reads in FASTQ format from a variety of DNA-seq experiments including: Whole Genome Sequencing, Whole Exome Sequencing, and target capture sequencing."

    DRAW can be run locally or as an Amazon Machine Image. Before running DRAW locally, the following third-party programs must be installed and working properly:

    BWA (Burrows-Wheeler Aligner) is available from http://bio-bwa.sourceforge.net/

    Picard and Samtools are both available from http://picard.sourceforge.net/

    GATK (Genome Analysis ToolKit) is available from http://www.broadinstitute.org/gatk/
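
    A quick way to confirm that prerequisites are installed is to look for them on the PATH. The sketch below is not part of DRAW. The executable names are assumptions, and because Picard and GATK are commonly distributed as Java archives, they may need a different check (for example, testing that a .jar file exists).

```python
import shutil

# Hypothetical executable names; adjust them to match the local installation.
REQUIRED_TOOLS = ["bwa", "samtools", "java"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```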

  • DRAW+SneakPeek ( Algorithmic software suite )

    "DRAW and SneakPeek are two computer programs that we use for analyzing whole-genome and whole-exome DNA-seq experiments. The documentation is still being developed, although the software is pretty robust for production. If you are interested in using this software and have questions or if you find any bugs, please send email to Otto Valladares."

  • HAMR ( Algorithmic software component )

    "HAMR (High-throughput Annotation of Modified Ribonucleotides) is a web application that allows you to detect and classify modified nucleotides in RNA-seq data."

    "HAMR scans RNA-sequencing data for sites showing potential signatures of nucleotide modification. Simply point it to your RNA-seq data it will scan the entire transcriptome. You can also limit the analysis to particular genomic regions of interest, either by entering one or providing a BED file containing the intervals. The output is a table containing the list of sites with nucleotide patterns that deviate from expectation at a statistically significant rate."

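    Restricting an analysis to intervals from a BED file can be sketched as follows. This is an illustration only, not HAMR's code: the function names are hypothetical, and only the first three BED columns (chromosome, 0-based start, exclusive end) are read.

```python
def parse_bed(lines):
    """Parse the first three BED columns into (chrom, start, end) tuples."""
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def in_intervals(chrom, pos, intervals):
    """True if the 0-based position falls inside any interval."""
    return any(c == chrom and s <= pos < e for c, s, e in intervals)


regions = parse_bed(["track name=example", "chr1\t100\t200\tregionA"])
print(in_intervals("chr1", 150, regions))
```
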
  • HIPPIE: A High-Throughput Identification Pipeline for Promoter Interacting Enhancer elements ( Algorithmic software suite )

    "HIPPIE is the workflow for analyzing Hi-C paired-end reads in FASTQ format and predict enhancer–target gene interactions. HIPPIE streamlines the entire processing phase including reads mapping, quality control and enhancer–target gene prediction as well as characterizing the interactions."

    HIPPIE can be run locally, on a high-performance computing cluster with Oracle Grid Engine (the environment it is designed for), or as an Amazon Machine Image.

  • Naive Bayes age estimation model ( Algorithmic software component )

    "… gene expression-based chronological age estimation by treating each gene as a binary classifier for whether an individual is older than a particular age threshold, and combining all such age-informative genes using a Naïve Bayes (NB) model."

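    How per-gene binary classifiers might be combined under a Naive Bayes model can be sketched as follows. This is a minimal illustration for a single age threshold, not the lab's implementation: the function name, the per-gene probabilities, and the prior are all hypothetical inputs that would have to be estimated from training data.

```python
import math


def posterior_older(votes, p_vote_older, p_vote_younger, prior_older=0.5):
    """Naive Bayes posterior that an individual is older than the threshold.

    votes[i] is 1 if gene i "votes" older, else 0.
    p_vote_older[i]   = P(gene i votes older | individual is older)
    p_vote_younger[i] = P(gene i votes older | individual is younger)
    """
    log_odds = math.log(prior_older / (1.0 - prior_older))
    for v, p_o, p_y in zip(votes, p_vote_older, p_vote_younger):
        if v:
            log_odds += math.log(p_o / p_y)
        else:
            log_odds += math.log((1.0 - p_o) / (1.0 - p_y))
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two hypothetical genes that both vote "older".
print(posterior_older([1, 1], [0.8, 0.8], [0.2, 0.2]))
```

    Repeating this calculation across a range of thresholds would give a probability profile from which an age estimate could be read off, although the exact procedure used by the model is not described here.
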
  • RNAfold/RNAplot ( Algorithmic software component )

    "The RNAfold web server will predict secondary structures of single stranded RNA or DNA sequences. Current limits are 7,500 nt for partition function calculations and 10,000 nt for minimum free energy only predictions."

  • SAMTools ( Algorithmic software component )

    "SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format."

  • SAVoR ( Algorithmic software suite )

    "SAVoR is an easy-to-use web application that allows the user to visualize RNA-seq data and other genomic annotations on RNA secondary structures. SAVoR is designed to help researchers visualize sequencing data in the context of RNA secondary structures. You will find SAVoR useful if:

    • you want to see the distribution of smRNA-seq reads along a microRNA precursor
    • you want to see how a set of SNPs might impact RNA structure
    • you have data from control vs. treatment sequencing experiments and would like to see regions of enrichment along an RNA structure

    SAVoR has the following features:

    • Directly retrieve read alignments from web-accessible BAM files
    • Compute four per-nucleotide scores from read alignments:
       - Read abundance
       - Endpoint frequency
       - Log-ratio of abundance from two RNA-seq experiments
       - Frequency of sequence variants (read mismatches)
    • Accept custom genomic annotations in the UCSC BED file format
    • Predict RNA secondary structure using multiple methods, or use existing Rfam consensus structures"
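
    Three of the per-nucleotide scores listed above can be sketched from simple read intervals. This is an illustration under stated assumptions, not SAVoR's code: reads are given as hypothetical (start, end) pairs along one RNA, 0-based with an exclusive end, and the log-ratio uses an assumed pseudocount to avoid division by zero.

```python
import math


def abundance_and_endpoints(reads, length):
    """Count covering reads and read endpoints at each position.

    reads: (start, end) pairs, 0-based, end exclusive, within [0, length].
    """
    abundance = [0] * length
    endpoints = [0] * length
    for start, end in reads:
        for pos in range(start, end):
            abundance[pos] += 1
        endpoints[start] += 1      # first aligned position of the read
        endpoints[end - 1] += 1    # last aligned position of the read
    return abundance, endpoints


def log_ratio(a, b, pseudocount=1.0):
    """Per-position log2 ratio of two abundance tracks."""
    return [math.log2((x + pseudocount) / (y + pseudocount))
            for x, y in zip(a, b)]


abundance, endpoints = abundance_and_endpoints([(0, 3), (1, 4)], 5)
print(abundance, endpoints)
```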

  • SneakPeek ( Algorithmic software component )

    "SneakPeek is a web-based diagnostic tool for reviewing quality metrics generated by our DNA Resequencing Analysis Workflow (DRAW). We have developed a seamless interface which allows the user to access their assigned projects, generate charts, compare metrics, export data, all on the fly. SneakPeek also gives the user many viewing options when creating a data grid, from being able to select any number of flowcells in any order, to transposing the entire grid itself."

Last updated: 2016-02-01T10:18:48.086-05:00

Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016