eagle-i The University of Pennsylvania

Protein Cassette Discovery

eagle-i ID


Resource Type

  1. Algorithmic software component


  1. Related grant number
    Human Microbiome Roadmap Demonstration Project UH2DK083981
  2. Resource Description
    After clustering ORFs using UCLUST and comparing protein families to the Conserved Domain Database using rpsblast, protein families are grouped into cassettes. Cassettes are defined as multiple protein families that can be found together on contigs. More details are given in the related publication: "Each protein family was classified according to the list of contigs that encoded it. Next, all of the protein families were compared, seeing how many of those occurred on common contigs. A given pair of protein coding families was grouped into a cassette when the smaller of the two families was found on a shared contig at least 80% of the time. This process was performed iteratively, recalculating the overlap scores after each pair of protein families was merged together. In subsequent iterations, protein families could also merge in the same way with cassettes that formed earlier. If a pair of proteins formed a cassette found on multiple contigs, we expect shared ORFs to be in the same relative orientations. To calculate the consistency of orientation across contigs, we used a simple co-orientation score, calculated in the following way. Any two genes have four possible relative orientations. For every pair of protein clusters in a module, we calculate the proportion of contigs that contain the orientation found most commonly."
  3. Used by
    Bushman Laboratory
  4. Operating System
  5. Data Input
  6. Data Output
    Protein cassettes
  7. Software purpose
    Sequence analysis objective
  8. Related Publication or Documentation
    Conservation of Gene Cassettes Among Diverse Viruses of the Human Gut
  9. Website(s)
  10. Related Technique
    Metagenomics analysis
  11. Developed by
    Bushman, Frederic D., PhD
  12. Algorithm used
    de Bruijn graph-based method
  13. Coded in
    R language
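
The cassette grouping and co-orientation score in the Resource Description can be illustrated with a minimal sketch. This is not the Bushman Laboratory code: the record says the resource is coded in R, while this sketch uses Python, and every function name and the toy data are hypothetical. The description leaves two details open, so the sketch assumes that the highest-scoring pair is merged first and that a merged cassette is represented by the union of its members' contigs. It also assumes a pair's orientation on a contig can be summarized as the two strand labels.

```python
from collections import Counter
from itertools import combinations


def overlap_score(contigs_a, contigs_b):
    """Fraction of the smaller contig set that also occurs in the other set."""
    smaller = min(len(contigs_a), len(contigs_b))
    if smaller == 0:
        return 0.0
    return len(contigs_a & contigs_b) / smaller


def build_cassettes(family_contigs, threshold=0.8):
    """Iteratively merge protein families whose overlap score meets the threshold.

    family_contigs maps a family name to the set of contigs that encode it.
    Returns a dict mapping each group (frozenset of family names) to its contigs.
    """
    groups = {frozenset([name]): set(c) for name, c in family_contigs.items()}
    while len(groups) > 1:
        # Recalculate all pairwise scores after every merge.
        scored = [
            (overlap_score(groups[a], groups[b]), a, b)
            for a, b in combinations(groups, 2)
        ]
        score, a, b = max(scored, key=lambda item: item[0])
        if score < threshold:
            break
        # Assumption: a cassette is represented by the union of its contigs.
        groups[a | b] = groups.pop(a) | groups.pop(b)
    return groups


def co_orientation_score(strands_by_contig, family_a, family_b):
    """Proportion of shared contigs showing the most common orientation.

    strands_by_contig maps a contig to a dict of family name -> "+" or "-".
    Returns None when the two families never share a contig.
    """
    orientations = [
        (strands[family_a], strands[family_b])
        for strands in strands_by_contig.values()
        if family_a in strands and family_b in strands
    ]
    if not orientations:
        return None
    most_common_count = Counter(orientations).most_common(1)[0][1]
    return most_common_count / len(orientations)


# Toy data (hypothetical): famE scores only 0.75 against famB alone,
# but joins the famA/famB cassette in a later iteration.
families = {
    "famA": {"c1", "c2", "c3", "c4", "c5"},
    "famB": {"c1", "c2", "c3", "c4"},
    "famC": {"c4", "c5", "c6", "c7", "c8", "c9"},
    "famD": {"c10"},
    "famE": {"c1", "c2", "c3", "c5"},
}
groups = build_cassettes(families)
cassettes = [members for members in groups if len(members) > 1]

strands = {
    "c1": {"famA": "+", "famB": "+"},
    "c2": {"famA": "+", "famB": "+"},
    "c3": {"famA": "-", "famB": "-"},
    "c4": {"famA": "+", "famB": "+"},
    "c5": {"famA": "+"},
}
orientation = co_orientation_score(strands, "famA", "famB")
```

With this toy input, `cassettes` holds a single cassette containing famA, famB, and famE, and `orientation` is 0.75 because three of the four contigs shared by famA and famB show the same strand pair. A different merge order or a different choice of cassette contig set (for example, the intersection instead of the union) could produce different groupings.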
Provenance Metadata About This Resource Record
  1. workflow state
  2. contributor
    ggrant (Gregory Grant)
  3. created
  4. creator
  5. modified
Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016