eagle-i University of Pennsylvania

Bioinformatics Facility (Wistar)

Directors: Kossenkov, Andrew, PhD; Showe, Louise C., PhD



The Bioinformatics Shared Resource continuously develops new and efficient approaches to data analysis in response to emerging research needs. Facility functions include statistical analyses and computational modeling for all types of high-throughput data, advanced bioinformatics tools for integrative cancer biology, and data management. Routine analyses cover large-scale datasets (omics data) generated by high-throughput technologies in the following areas:
Gene expression (RNA-seq, smRNA-seq, microarrays)
Gene regulation (ChIP-seq, ATAC-seq, epigenetic profiling, promoter methylation arrays)
Genome and transcriptome sequencing (alternate splicing, RNA editing, gene fusion, SNP and INDEL mutation detection, CNV)
Biomarkers (discovering markers in mRNA, miRNA, and protein expression data)
Proteomic analyses (mass spectrometry-based spectra, LCMS, DIGE, RPPA, etc.)
Pathway and network analysis
Integration of multi-platform data
Other customized data analysis projects





  • Consulting support in customized bioinformatics services ( Support service )


    - High-throughput data analysis:
    -- Next generation sequencing data analysis
    -- All microarray platforms
    -- Low-density PCR arrays (miRNA, pathways, custom)
    -- Proteomics data
    -- Enrichment Analysis (Ingenuity, DAVID)
    - Comprehensive analysis of complex projects that require multi-platform data integration
    - Consultation and support of experimental design and customized bioinformatics services
    - Statistical consultation and predictive model building
    - Computational support for data management, high performance computing, and custom programming
    - Centralized computation resources, including data management and collaboration tools, sequence databases, homology algorithms, and other sequence manipulation tools
    - Web-based application/database development and management
    - Training: Genome Browser, Ingenuity, DAVID
    - Grant and publication support: results, methods, figures
    - Advanced biological models and illustration

  • Custom programming ( Support service )

    "This support is provided for researchers who wish to use the software systems developed and deployed by the shared resource or to develop their own software or tools. The facility provides users with basic training to set up and use the existing software systems and to develop new tools and web applications. Consulting support is also provided to investigators who want to develop databases and workflows in their labs. Our facility staff analyze the data handling requirements of the investigator’s lab and help them choose the best software solution for their studies.

    Additionally, the bioinformatics facility employs the caBIG™ compatible products, the caBIG™ Life Science Distribution, and the caBIG™ Data Sharing and Security Framework, being developed by the NCI Center for Bioinformatics."

  • Data management ( Data maintenance service )

    "Large volumes of high-dimensional data, such as microarray and sequencing data, tissue-related data, image data, and pharmacodynamics data, are generated by Cancer Center shared facilities as well as other research programs. The bioinformatics facility uses a combination of locally installed and public databases, and provides consulting support to design and maintain databases for various datasets, securely share data within or across Cancer Centers, and store and back up data generated by users."

  • High throughput data analysis ( Data analysis service )

    "With acquisition of a massively parallel sequencer (Illumina [SOLEXA] Genome Analyzer) and microarray platform (Illumina BeadStation) along with other high-throughput experimental technologies, the Wistar Institute's Cancer Center is now well positioned to pursue new avenues of cancer research, particularly in understanding and modeling the genomic changes in cancer development and progression. These technologies generate huge genome-wide data sets that require equally complex and sophisticated databases and analysis tools. The bioinformatics shared facility collaborates with the Center for Systems and Computational Biology to develop integrative analytical frameworks for the analysis of the data sets generated by Wistar investigators. The bioinformatics shared facility will provide consulting and integrative data-mining support for:

    • Analyzing SOLEXA data including ChIP-seq, RNA-seq (digital gene expression), small RNA-seq, RNAs associated with RNA binding proteins, SNP genotyping, genome re-sequencing, and de-novo sequencing.

    • Analyzing microarray data including gene expression, ChIP-chip, methylation profiling, copy number variation (CNV), SNP genotyping, miRNA profiling, protein/peptide array data.

    • Analyzing proteomics data (e.g. mass spectrometry-based spectra, LCMS, DIGE).

    • Analyzing molecular screening data by working with the molecular screening facility."

  • High-performance computing ( Access service )

    "The bioinformatics shared facility consists of an ever-evolving group of clusters collaboratively administered by the Center for Systems and Computational Biology. The clusters are utilized as a collective resource for serial and parallel applications that would be too computationally demanding for smaller research groups to implement. While an individual researcher could purchase a small cluster on a grant and hire a system administrator to set it up, it is much more efficient to add computing power to existing infrastructure. Bioinformatics clusters are regularly used for large-scale problems such as:

    • Large Scale Sequence Alignment
    • Predictive model development
    • 3D Molecular modeling
    • Mass Spec models
    • Phylogenetic inference"

  • Software Access ( Access service )

    Access to locally supported commercial and open source bioinformatics software

  • Statistics consultation and predictive model building ( Data analysis service )

    "Typical tasks and applications (from raw data to functional analysis) such as:

    • Advice on experimental design and sample size estimation
    • Point and Confidence Interval estimation
    • Comparative data analysis such as t-test, ANOVA, SAM, non-parametric tests
    • Association studies/Contingency table analysis (e.g. chi-square test)
    • High dimensional data analysis such as repeated measurement, dimension reduction (e.g. SVD, PCA, MDS), permutation test
    • Survival analysis such as Kaplan-Meier or Cox Proportional Hazards models
    • Time series data analysis
    • Statistical modeling/Predictive modeling/Machine learning - Data mining in multivariate settings (supervised and unsupervised learning from data, Regression, Classification, Clustering, Generalized Linear Model)"
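
    To make one of the listed tasks concrete, the following is a minimal sketch of a two-sample permutation test on the difference in group means. It is an illustration only, not the facility's own method or code; the function name `permutation_test`, its parameters, and the example values are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Hypothetical helper for illustration; returns an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up expression values for two conditions (assumed data).
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treated = [6.0, 6.2, 5.9, 6.1, 6.3]
p_value = permutation_test(control, treated, n_permutations=2000)
print(f"estimated p-value: {p_value:.4f}")
```

    A permutation test makes no distributional assumption about the measurements, which is one reason it is often paired with high-dimensional data where normality is hard to check.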


  • BeadStudio ( Software )

    "Illumina’s BeadStudio delivers high-quality software for cutting-edge data analysis and advanced visualization tools for the following applications: Genotyping, Gene Expression, and Loss of Heterozygosity (LOH)."

  • BLAST ( Software )

  • Ingenuity IPA ( Software )

    "IPA is software that helps researchers model, analyze, and understand the complex biological and chemical systems at the core of life science research.

    IPA helps you understand biology at multiple levels by integrating data from a variety of experimental platforms and providing insight into the molecular and chemical interactions, cellular phenotypes, and disease processes of your system. Even if you don’t have experimental data, you can use IPA to intelligently search the Ingenuity®Knowledge Base for information on genes, proteins, chemicals, drugs, and molecular relationships to build biological models or get up to speed in a relevant area of research. IPA provides the right biological context to facilitate informed decision-making, advance research project design, and generate new testable hypotheses."

  • NBmiRTar microRNA Target Prediction ( Software )

    "Web-based software that uses a machine learning algorithm to find new microRNA genes in mammalian genomes as well as in newly sequenced species."

  • Primer Express ( Software )

    "The Primer Express® Software v3.0.1 allows you to design your own primers and probes using TaqMan® and SYBR® Green I dye chemistries for gene quantitation and allelic discrimination (SNP) real-time PCR applications. Developed specifically for use with our 7300, 7500, 7500 Fast, and 7900HT Fast Real-Time PCR Systems, Primer Express® Software provides customized application-specific documents for absolute/relative quantitation and allelic discrimination."

  • TRANSFAC ( Software )

    "TRANSFAC® is a unique knowledge-base containing published data on eukaryotic transcription factors, their experimentally-proven binding sites, and regulated genes. Based on its broad compilation of binding sites, positional weight matrices are derived which can be used with the included Match™ tool to search DNA sequences for predicted transcription factor binding sites. Promoter analysis of high-throughput data based on TRANSFAC® positional weight matrices is provided in the companion ExPlain™ Analysis System, while the companion Genome Trax™ provides a platform for mapping next generation sequencing variations to transcription factor binding sites characterized in TRANSFAC®.

    Transcription factors are recognized as important components of signaling cascades controlling all types of normal cellular processes as well as responses to external stimuli, conditions of disease, drug treatment, and more. While functional studies of transcription factors can provide indirect clues to the genes regulated by a single transcription factor under a specific set of experimental conditions, it’s only through transcription factor binding site analysis that we can (1) understand the mechanism of regulation, including coordinate regulation by multiple transcription factors acting together, and (2) effectively identify and characterize mutations that disrupt the regulatory mechanism. As there are comparatively few experimentally characterized binding sites relative to the total number of expected binding sites, the ability to reliably predict as yet uncharacterized binding sites is a critical and unparalleled tool in the quest to understand normal as well as disease processes."
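
    The general idea of scanning DNA with a positional weight matrix can be sketched as follows. This is a generic log-odds scan, not the Match™ algorithm and not TRANSFAC® data; the count matrix, pseudocount, uniform background, threshold, and function names are all assumptions made for illustration.

```python
import math

# Hypothetical base counts for a 4-bp motif, one dict per motif position.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert base counts to log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = log_odds_matrix(COUNTS)
HITS = scan("TTACGTGGACGTAA", PWM, threshold=5.0)
for start, window, score in HITS:
    print(start, window, round(score, 2))
```

    Only the forward strand is scanned here; a fuller treatment would also score the reverse complement and choose the threshold from a background score distribution.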


Last updated: 2020-03-16T09:53:52.765-04:00

Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016