eagle-i University of Pennsylvania

Bioinformatics Facility (Wistar)

Director: Kossenkov, Andrew, Ph.D.


The mission of the Bioinformatics Facility is to support the continuing research and education mission of The Wistar Institute and to grow and evolve in response to emerging research needs.

The Bioinformatics Facility is located in the Center for Systems and Computational Biology, which provides a state-of-the-art server room, office space, and educational and conference room space. The Facility provides Cancer Center investigators with database management, software application support, and expertise in statistical analyses and computational modeling of biomedical research data. It has recently grown to include statistical specialists and programmers as well as computational biologists. The Facility is supported and advised by members of the Center for Systems and Computational Biology.

Functions of the Facility reflect the research requirements of the three Cancer Center programs and are broadly divided into three areas: (i) data management; (ii) statistical analyses and computational modeling; and (iii) advanced bioinformatics tools for integrative cancer biology. Typical data analyses involve large-scale datasets (omics data) generated by high-throughput technologies in the following complex areas:

• Genome sequencing (alternative splicing, RNA editing, mutation detection)
• Gene regulation (ChIP-chip, ChIP-seq, epigenetic profiling, promoter methylation arrays)
• Biomarkers (e.g. mRNA and miRNA microarray expression data)
• Proteomic analyses (mass spectrometry-based spectra, LCMS, DIGE, etc.)
• Polymorphism genotyping (e.g. single nucleotide polymorphisms [SNP] and copy number variations [CGH, LOH]).

The Facility has placed a high priority on integrating cancer research information representing a variety of data types, including clinical data, microarray data, massively-parallel sequence data, protein data, RT-PCR and functional assays. Data security is a primary focus of the Bioinformatics Facility in designing and implementing software systems.





  • Consulting support in customized bioinformatics services ( Support service )

    "The bioinformatics shared facility works closely with the Wistar Cancer Center investigators to assist them with the use of computational bioinformatics tools and methods for processing and interpreting genomic, molecular, and proteomic data. Bioinformatics facility staff also help investigators integrate data processing results into their reports and proposals. The facility uses publicly available tools, databases, and in-house developed software for the analyses and offers consultation and training in areas of bioinformatics such as:

    • Sequence analysis: assistance with annotation of protein sequences and with prediction of genes and gene regulatory regions, such as promoters, transcription factor binding sites, and motifs.
    • Phylogenetic analysis.
    • Gene Ontology and Pathway analysis.
    • 3D molecular modeling, particularly homology modeling; analysis of protein structure properties such as electrostatic potential and surface area; protein-ligand docking; small molecule screening; protein-protein interaction; and molecular dynamics simulation."

  • Custom programming ( Support service )

    "This support is provided for researchers who wish to use the software systems developed and deployed by the shared resource or to develop their own software or tools. The facility provides users with basic training to set up and use the existing software systems and to develop new tools and web applications. Consulting support is also provided to investigators who want to develop databases and workflows in their labs. Our facility staff analyze the data handling requirements of the investigator’s lab and help them choose the best software solution for their studies.

    Additionally, the bioinformatics facility employs the caBIG™ compatible products, the caBIG™ Life Science Distribution, and the caBIG™ Data Sharing and Security Framework, being developed by the NCI Center for Bioinformatics."

  • Data management ( Data maintenance service )

    "Large volumes of high-dimensional data, such as microarray and sequencing data, tissue-related data, image data, and pharmacodynamics data, are generated by Cancer Center shared facilities as well as other research programs. The bioinformatics facility uses a combination of locally installed and public databases, and provides consulting support to design and maintain databases for various datasets, securely share data within or across Cancer Centers, and store and back up data generated by users."

  • High throughput data analysis ( Data analysis service )

    "With the acquisition of a massively parallel sequencer (Illumina [SOLEXA] Genome Analyzer) and a microarray platform (Illumina BeadStation), along with other high-throughput experimental technologies, the Wistar Institute's Cancer Center is now well positioned to pursue new avenues of cancer research, particularly in understanding and modeling the genomic changes in cancer development and progression. These technologies generate huge genome-wide data sets that require equally complex and sophisticated databases and analysis tools. The bioinformatics shared facility collaborates with the Center for Systems and Computational Biology to develop integrative analytical frameworks for the analysis of the data sets generated by Wistar investigators. The bioinformatics shared facility will provide consulting and integrative data-mining support for:

    • Analyzing SOLEXA data including ChIP-seq, RNA-seq (digital gene expression), small RNA-seq, RNAs associated with RNA binding proteins, SNP genotyping, genome re-sequencing, and de-novo sequencing.

    • Analyzing microarray data including gene expression, ChIP-chip, methylation profiling, copy number variation (CNV), SNP genotyping, miRNA profiling, protein/peptide array data.

    • Analyzing proteomics data (e.g. mass spectrometry-based spectra, LCMS, DIGE).

    • Analyzing molecular screening data by working with the molecular screening facility."

  • High-performance computing ( Access service )

    "The bioinformatics shared facility consists of an ever-evolving group of clusters collaboratively administered by the Center for Systems and Computational Biology. The clusters are used as a collective resource for serial and parallel applications that would be too computationally demanding for smaller research groups to implement. Although a single researcher could purchase a small cluster through a grant and hire a system administrator to set it up, it is much more efficient to add computing power to existing infrastructure. Bioinformatics clusters are regularly used for large-scale problems such as:

    • Large Scale Sequence Alignment
    • Predictive model development
    • 3D Molecular modeling
    • Mass Spec models
    • Phylogenetic inference"

  • Software Access ( Access service )

    Access to locally supported commercial and open source bioinformatics software

  • Statistics consultation and predictive model building ( Data analysis service )

    "Typical tasks and applications (from raw data to functional analysis) include:

    • Advice on experimental design and sample size estimation
    • Point and Confidence Interval estimation
    • Comparative data analysis such as t-test, ANOVA, SAM, Non-parametric test
    • Association studies/Contingency table analysis (e.g. chi-square test)
    • High dimensional data analysis such as repeated measurement, dimension reduction (e.g. SVD, PCA, MDS), permutation test
    • Survival analysis such as Kaplan-Meier or Cox Proportional Hazards models
    • Time series data analysis
    • Statistical modeling/Predictive modeling/Machine learning - Data mining in multivariate settings (supervised and unsupervised learning from data, Regression, Classification, Clustering, Generalized Linear Model)"
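
As an illustration of one item in the list above, the following is a minimal sketch of a Kaplan-Meier survival estimate in plain Python. It is not the Facility's own code or workflow; the function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only to show how observed events and censored observations enter the estimate.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    # Number of observed events at each time point.
    deaths = Counter(t for t, e in zip(times, events) if e)
    # Every subject leaves the risk set at their own time (event or censoring).
    exits = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: eight subjects, three of them censored.
toy_times = [2, 3, 3, 5, 8, 8, 9, 12]
toy_events = [1, 1, 0, 1, 1, 0, 0, 1]

for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects contribute to the number at risk up to their censoring time but never trigger a drop in the curve. In practice a dedicated statistics package would normally be used, since it also provides confidence intervals and group comparisons.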


  • BeadStudio ( Software )

    "Illumina’s BeadStudio delivers high-quality software for cutting-edge data analysis and advanced visualization tools for the following applications: Genotyping, Gene Expression, and Loss of Heterozygosity (LOH)."

  • BLAST ( Software )

  • Ingenuity IPA ( Software )

    "IPA is software that helps researchers model, analyze, and understand the complex biological and chemical systems at the core of life science research.

    IPA helps you understand biology at multiple levels by integrating data from a variety of experimental platforms and providing insight into the molecular and chemical interactions, cellular phenotypes, and disease processes of your system. Even if you don’t have experimental data, you can use IPA to intelligently search the Ingenuity® Knowledge Base for information on genes, proteins, chemicals, drugs, and molecular relationships to build biological models or get up to speed in a relevant area of research. IPA provides the right biological context to facilitate informed decision-making, advance research project design, and generate new testable hypotheses."

  • NBmiRTar microRNA Target Prediction ( Software )

    "Web-based software that uses a machine learning algorithm to find new microRNA genes in mammalian genomes as well as in newly sequenced species."

  • Primer Express ( Software )

    "The Primer Express® Software v3.0.1 allows you to design your own primers and probes using TaqMan® and SYBR® Green I dye chemistries for gene quantitation and allelic discrimination (SNP) real-time PCR applications. Developed specifically for use with our 7300, 7500, 7500 Fast, and 7900HT Fast Real-Time PCR Systems, Primer Express® Software provides customized application-specific documents for absolute/relative quantitation and allelic discrimination."

  • TRANSFAC ( Software )

    "TRANSFAC® is a unique knowledge-base containing published data on eukaryotic transcription factors, their experimentally-proven binding sites, and regulated genes. Based on its broad compilation of binding sites, positional weight matrices are derived which can be used with the included Match™ tool to search DNA sequences for predicted transcription factor binding sites. Promoter analysis of high-throughput data based on TRANSFAC® positional weight matrices is provided in the companion ExPlain™ Analysis System, while the companion Genome Trax™ provides a platform for mapping next generation sequencing variations to transcription factor binding sites characterized in TRANSFAC®.

    Transcription factors are recognized as important components of signaling cascades controlling all types of normal cellular processes as well as responses to external stimuli, conditions of disease, drug treatment, and more. While functional studies of transcription factors can provide indirect clues to the genes regulated by a single transcription factor under a specific set of experimental conditions, it’s only through transcription factor binding site analysis that we can (1) understand the mechanism of regulation, including coordinate regulation by multiple transcription factors acting together, and (2) effectively identify and characterize mutations that disrupt the regulatory mechanism. As there are comparatively few experimentally characterized binding sites relative to the total number of expected binding sites, the ability to reliably predict as yet uncharacterized binding sites is a critical and unparalleled tool in the quest to understand normal as well as disease processes."
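
To make the idea of searching DNA with a positional weight matrix concrete, here is a minimal generic sketch in Python. It does not reproduce the Match™ algorithm or any TRANSFAC® matrix: the base counts, the uniform background of 0.25, the pseudocount, the score threshold, and the function names `log_odds_matrix` and `scan` are all assumptions made for illustration.

```python
import math


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(
                (column.get(base, 0) + pseudocount) / total / background
            )
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous or lowercase symbols
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


# Hypothetical counts from ten aligned binding sites of a TATA-like motif.
toy_counts = [
    {"T": 9, "A": 1},
    {"A": 9, "T": 1},
    {"T": 8, "A": 2},
    {"A": 10},
]
pwm = log_odds_matrix(toy_counts)
print(scan("GGCTATAAGGC", pwm, threshold=4.0))
```

Each window is scored by summing the log-odds of its bases, so windows resembling the aligned sites score high and unrelated windows score below zero. The threshold trades sensitivity against false positives, and production tools typically calibrate it per matrix rather than using a single fixed value.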

Last updated: 2016-06-27T13:25:01.169-04:00

Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016