TANGO: Taxonomic Assignment in Metagenomics
Copyright © 2010 J. C. Clemente, J. Jansson, G. Valiente
Perl Script and Data Sets
Usage Notes
# -------------------------------------------------------------------------
# TANGO: Taxonomic Assignment in Metagenomics (tango.pl version 1.2.0)
# Copyright (c) 2010 Jose C. Clemente, Jesper Jansson, and Gabriel Valiente
# -------------------------------------------------------------------------
# usage:
# perl tango.pl [tree.tre] [reads.txt] [q = 0.5] > [output.txt]
#
# input:
# [tree.tre] is a taxonomy tree in Newick format, corresponding to a
# classification of biological species in seven taxonomic ranks:
# kingdom, phylum, class, order, family, genus, and species
# [reads.txt] is a text file containing the parsed output of a mapping
# program, in the format: [read_id] [species_id_1] ... [species_id_n]
# [q = 0.5] is a parameter that allows balancing the taxonomic assignment
# between precision (q = 0) and recall (q = 1), with q = 0.5 (default)
# corresponding to the F-measure (harmonic mean of precision and
# recall)
#
# output:
#   for each line in the input [reads.txt] file:
#     read_id
#     for each node_id in [tree.tre] with optimal precision and recall:
#       node_id to which read_id was assigned
#       taxonomic_rank at which read_id was assigned
# -------------------------------------------------------------------------
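The role of q can be illustrated on a toy example. The sketch below is not taken from tango.pl: it assumes a penalty of the form q*FN/TP + (1-q)*FP/TP for each candidate node, where TP, FP, and FN count the matched and unmatched species below that node, and it reports every node with the smallest penalty. The taxonomy, the node names, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (child -> parent); not taken from lineages.tre.
PARENT: Dict[str, Optional[str]] = {
    "sp_a": "genus_x", "sp_b": "genus_x", "sp_c": "genus_x", "sp_x": "genus_x",
    "sp_d": "genus_y", "sp_e": "genus_y", "sp_f": "genus_y",
    "genus_x": "family_f", "genus_y": "family_f",
    "family_f": None,
}


def leaves_below(parent: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """Map every node to the set of leaves (species) in its subtree."""
    internal = {p for p in parent.values() if p is not None}
    below: Dict[str, Set[str]] = {node: set() for node in parent}
    for node in parent:
        if node in internal:
            continue
        # Walk from each leaf up to the root, registering the leaf on the way.
        current: Optional[str] = node
        while current is not None:
            below[current].add(node)
            current = parent[current]
    return below


def assign(matches: Set[str], q: float = 0.5) -> List[str]:
    """Return the nodes minimising q*FN/TP + (1-q)*FP/TP (assumed penalty)."""
    scores: Dict[str, float] = {}
    for node, leaves in leaves_below(PARENT).items():
        tp = len(leaves & matches)   # matched species below this node
        if tp == 0:
            continue                 # node covers no matched species
        fp = len(leaves - matches)   # unmatched species below this node
        fn = len(matches - leaves)   # matched species outside this node
        scores[node] = q * fn / tp + (1 - q) * fp / tp
    if not scores:
        return []
    best = min(scores.values())
    return sorted(n for n, s in scores.items() if abs(s - best) < 1e-12)


read_matches = {"sp_a", "sp_b", "sp_c", "sp_d"}
for q in (0.0, 0.5, 1.0):
    print(q, assign(read_matches, q))
```

On this toy input, q = 0 returns the four matched species, q = 0.5 returns genus_x, and q = 1 returns family_f, the lowest common ancestor of the matches.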
Sample Usage
- $ perl tango.pl lineages.tre reads.match 0 (maximize precision, assign at the species level)
EKQJ6TS02I7JEY S000389918 (species) S001168699 (species)
EKQJ6TS02FQP2N S000021184 (species) S000001991 (species) S000003122 (species) S000324392 (species) S000381747 (species) S000458520 (species) S000009313 (species)
EKQJ6TS02I3P0I S000004313 (species) S000013935 (species) S000133399 (species) S000139289 (species)
EKQJ6TS02GCJ9K S000004313 (species) S000013935 (species) S000133399 (species) S000139289 (species)
EKQJ6TS02GSX4C S000004313 (species) S000013935 (species) S000133399 (species) S000139289 (species)
- $ perl tango.pl lineages.tre reads.match (maximize the harmonic mean of precision and recall, F-measure)
EKQJ6TS02I7JEY S000389918 (species) S001168699 (species)
EKQJ6TS02FQP2N Klebsiella (genus)
EKQJ6TS02I3P0I S000004313 (species) S000013935 (species) GENUS (genus) S000133399 (species) S000139289 (species)
EKQJ6TS02GCJ9K S000004313 (species) S000013935 (species) GENUS (genus) S000133399 (species) S000139289 (species)
EKQJ6TS02GSX4C S000004313 (species) S000013935 (species) GENUS (genus) S000133399 (species) S000139289 (species)
- $ perl tango.pl lineages.tre reads.match 1 (maximize recall, assign at the lowest common ancestor of the species)
EKQJ6TS02I7JEY Lactobacillus (genus)
EKQJ6TS02FQP2N Enterobacteriaceae (family)
EKQJ6TS02I3P0I Enterobacteriaceae (family)
EKQJ6TS02GCJ9K Enterobacteriaceae (family)
EKQJ6TS02GSX4C Enterobacteriaceae (family)
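Output lines like those above can be post-processed with a few lines of code. The following is a minimal sketch, assuming that fields are separated by whitespace, that each assignment has the form "node_id (rank)", and that node identifiers contain no spaces. The function name parse_output_line is hypothetical and is not part of tango.pl.

```python
import re
from collections import Counter
from typing import List, Tuple

# One assignment looks like "S000389918 (species)" or "Klebsiella (genus)".
ASSIGNMENT = re.compile(r"(\S+)\s+\((\w+)\)")


def parse_output_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one non-empty output line into read_id and (node_id, rank) pairs."""
    parts = line.split(None, 1)
    read_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return read_id, ASSIGNMENT.findall(rest)


sample = [
    "EKQJ6TS02I7JEY S000389918 (species) S001168699 (species)",
    "EKQJ6TS02FQP2N Klebsiella (genus)",
]

# Count how many assignments were made at each taxonomic rank.
ranks: Counter = Counter()
for line in sample:
    read_id, pairs = parse_output_line(line)
    ranks.update(rank for _, rank in pairs)
    print(read_id, pairs)
print(dict(ranks))
```

For the two sample lines, the final count is two assignments at the species rank and one at the genus rank.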
Gabriel Valiente
valiente@ (lsi.upc.edu)