eagle-i The University of Pennsylvania


eagle-i ID


Resource Type

  1. Algorithmic software suite


  1. Resource Description
    BROCC is a flexible software pipeline for classifying single-celled eukaryotes in microbiome samples that interfaces easily with the popular QIIME pipeline. BROCC classifies amplicons using BLAST searches against large, relatively uncurated databases. It uses blastn, but output from other BLAST programs, such as blastx, can be substituted.
    BROCC first filters the input BLAST hits for sufficient coverage of, and identity to, the query sequence. If a query sequence has too many hits below the preset coverage threshold (70% by default), or if BLAST returned no hits, the query is not classified and a message is written to the output file. BROCC then determines the identity and taxonomic hierarchy of each high-quality hit using a locally installed SQL database and NCBI's e-fetch tool.
    Next, BROCC votes on the quality-filtered BLAST hits, starting at the species level. At each taxonomic level, the taxon with the most votes must surpass a user-specified threshold for that level to be accepted as a valid classification. If no taxon reaches a sufficient majority, BROCC makes no classification at that level and moves up to the next higher taxonomic level for another round of voting. BROCC's filters are independently configurable at the genus and species levels, and a separate filter can be set for all remaining taxonomic levels.
    BROCC's configuration file also contains a user-modifiable list of high-level and partial assignments. These assignments are ignored at lower taxonomic levels, where they are uninformative and can distort voting, but are counted at higher levels. For example, a sequence with only a kingdom-level assignment is excluded from voting at every level below kingdom, and its vote is counted once voting reaches the kingdom level. If the proportion of high-level and partial assignments exceeds a given threshold (0.70 by default), the query sequence is left unassigned and marked accordingly.
    BROCC produces three output files. The first contains classifications using a standardized taxonomy (domain, kingdom, phylum, class, order, family, genus, species). The second contains the complete NCBI taxonomy, which includes subtaxa, supertaxa, and unranked intermediate taxonomic levels. The third is a log of the voting record for each query sequence: how many votes were cast, how many votes the winning taxon received, and how many generic classifications were ignored. This log also identifies queries that were left unclassified. Both taxonomy files can be used in the QIIME pipeline because they use the same format as the output of QIIME's assign_taxonomy.py script. BROCC is implemented in Python 2.7. It queries the NCBI taxonomy and requires local installations of an SQL database and BLAST.
  2. Additional Name
    BLAST Read and OTU Consensus Classifier
  3. Used by
    Bushman Laboratory
  4. Version
  5. Operating System
  6. Data Input
    Amplicon-based sequence sets
  7. Data Input
    BLAST results, output format 7
  8. Data Output
    QIIME-formatted taxonomy map
  9. Data Output
    Log file with full classification and voting details
  10. Data Output
    NCBI taxonomy file
  11. Software purpose
    Sequence-based phylogenetic analysis objective
  12. Related Publication or Documentation
    A tool kit for quantifying eukaryotic rRNA gene sequences from human microbiome samples
  13. Website(s)
  14. Related Technique
    Metagenomics analysis
  15. Developed by
    Bushman, Frederic D., PhD
  16. Developed by
    Bittinger, Kyle, PhD
  17. Developed by
  18. Software license
    GNU General Public License
  19. Coded in
    Python
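
The hit-filtering step in the Resource Description can be illustrated with a minimal Python sketch. It is not BROCC's code and rests on several assumptions:

- BLAST tabular output (format 7) with the default twelve columns.
- Query lengths supplied separately, for example read from the input FASTA file.
- Coverage computed as the aligned query span divided by query length.

The function name filter_hits and its parameters are hypothetical. min_identity has no default because this record does not state BROCC's identity cutoff. The sketch only drops individual low-quality hits. It does not reproduce BROCC's rule for rejecting a query with too many low-coverage hits.

```python
from typing import Dict, Iterable, List

# Column order of the standard 12-field tabular output (-outfmt 7 with no custom fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(lines: Iterable[str], query_lengths: Dict[str, int],
                min_identity: float,
                min_coverage: float = 0.7) -> Dict[str, List[str]]:
    """Map each query ID to the subject IDs of hits passing both filters.

    A query announced by a '# Query:' comment but left with no passing
    hits maps to an empty list, i.e. it would stay unclassified.
    """
    kept: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# Query:"):
            # Register every query, including those BLAST found no hits for.
            query_id = line[len("# Query:"):].split()[0]
            kept.setdefault(query_id, [])
            continue
        if not line or line.startswith("#"):
            continue  # other comment lines: program, database, fields, hit count
        hit = dict(zip(FIELDS, line.split("\t")))
        qid = hit["qseqid"]
        # Coverage = fraction of the query spanned by the alignment (assumption).
        coverage = (int(hit["qend"]) - int(hit["qstart"]) + 1) / query_lengths[qid]
        if float(hit["pident"]) >= min_identity and coverage >= min_coverage:
            kept.setdefault(qid, []).append(hit["sseqid"])
    return kept
```

An open file handle can be passed directly as lines, e.g. filter_hits(open("hits.txt"), lengths, min_identity=97.0). The file name and identity value here are placeholders.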
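
The rank-by-rank voting in the Resource Description can also be sketched in Python. This is an illustrative reimplementation under stated assumptions, not BROCC's code:

- Each hit's lineage is a tuple over the eight standardized ranks, with None for missing ranks.
- Names on a hypothetical generic_names list are treated as missing.
- A hit without a species-level name counts as a high-level or partial assignment.
- Each threshold is compared with the winner's share of the votes cast at that rank.

The values in the hypothetical THRESHOLDS and OTHER_THRESHOLD are placeholders, not BROCC's defaults.

```python
from collections import Counter
from typing import AbstractSet, List, Optional, Sequence

RANKS = ["domain", "kingdom", "phylum", "class", "order",
         "family", "genus", "species"]

# Hypothetical per-rank thresholds (share of votes cast at that rank).
# Species and genus are set independently; OTHER_THRESHOLD covers the other ranks.
THRESHOLDS = {"species": 0.6, "genus": 0.6}
OTHER_THRESHOLD = 0.5


def vote(lineages: Sequence[Sequence[Optional[str]]],
         generic_names: AbstractSet[str] = frozenset(),
         max_generic: float = 0.7) -> Optional[List[Optional[str]]]:
    """Return the consensus lineage truncated at the accepted rank, or None."""
    if not lineages:
        return None  # no hits survived filtering: unclassified
    # Treat user-listed generic names (e.g. "environmental samples") as missing.
    normalized = [tuple(None if name in generic_names else name for name in lin)
                  for lin in lineages]
    # A hit without a species-level name counts as a high-level/partial assignment.
    n_generic = sum(1 for lin in normalized if lin[-1] is None)
    if n_generic / len(normalized) > max_generic:
        return None  # too many generic hits: unassigned
    # Vote from species (last index) up to domain (index 0).
    for level in range(len(RANKS) - 1, -1, -1):
        # Hits with no name at this rank abstain instead of distorting the vote.
        votes = Counter(lin[level] for lin in normalized if lin[level] is not None)
        total = sum(votes.values())
        if total == 0:
            continue
        winner, count = votes.most_common(1)[0]
        threshold = THRESHOLDS.get(RANKS[level], OTHER_THRESHOLD)
        if count / total > threshold:  # the winner must surpass the threshold
            representative = next(lin for lin in normalized if lin[level] == winner)
            return list(representative[: level + 1])
    return None  # no rank reached a sufficient majority
```

Hits without a name at a given rank abstain. A hit assigned only to a kingdom therefore casts no vote until the loop reaches the kingdom level, matching the example in the description.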
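
Consensus lineages can then be written out as a taxonomy map. This sketch makes two assumptions:

- A two-column, tab-separated layout: the query ID, then the lineage joined with "; ".
- The label Unassigned for unclassified queries.

Depending on the assignment method, QIIME's assign_taxonomy.py output can include additional columns, so check the exact layout against the QIIME documentation before relying on it. format_taxonomy_map is a hypothetical name.

```python
from typing import Dict, List, Optional


def format_taxonomy_map(assignments: Dict[str, Optional[List[Optional[str]]]]) -> List[str]:
    """Return one tab-separated line per query: ID, then a '; '-joined lineage."""
    lines = []
    for query_id in sorted(assignments):
        lineage = assignments[query_id]
        if lineage:
            # Drop unnamed intermediate ranks (None) so the string has no empty slots.
            taxonomy = "; ".join(name for name in lineage if name is not None)
        else:
            taxonomy = "Unassigned"  # assumed label for unclassified queries
        lines.append(f"{query_id}\t{taxonomy}")
    return lines
```

The returned lines can be written to a file with "\n".join(lines) plus a trailing newline.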
Copyright © 2016 by the President and Fellows of Harvard College
The eagle-i Consortium is supported by NIH Grant #5U24RR029825-02 / Copyright 2016