FASTA Files

The generated consensus sequences are written in FASTA format, which is accepted by most bioinformatics analysis software. Celera Assembler generates FASTA files (like *.scf.fasta) with the consensus sequence in uppercase letters. It also writes quality values in two encodings: its own encoding of QV values (in files like *.scf.qv) and the NCBI encoding of quality values (in files like *.scf.qual). In both quality files, each encoded value gives the quality of the corresponding base in the associated FASTA file.
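The correspondence between a FASTA file and its *.qual file can be checked with a short script. The following is a minimal sketch, not part of Celera Assembler: it assumes the *.qual file uses the common NCBI layout of whitespace-separated decimal integers under the same '>' headers as the FASTA file, and the function names and sample records are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_qual(lines):
    """Yield (header, list of int) pairs from .qual lines.

    Assumes whitespace-separated decimal quality values (an assumption
    about the NCBI encoding, not taken from this page).
    """
    header, values = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, values
            header, values = line[1:], []
        elif header is not None and line:
            values.extend(int(token) for token in line.split())
    if header is not None:
        yield header, values


# Made-up sample data standing in for $prefix.scf.fasta and $prefix.scf.qual.
fasta_text = ">scf1\nACGT\nAC\n>scf2\nGG\n"
qual_text = ">scf1\n60 60 58 40\n33 20\n>scf2\n10 12\n"

seqs = dict(read_fasta(fasta_text.splitlines()))
quals = dict(read_qual(qual_text.splitlines()))

# Every base should have exactly one quality value.
mismatched = [name for name, seq in seqs.items()
              if len(seq) != len(quals.get(name, []))]
```

With real output, the same functions can be fed open file handles instead of the sample strings; an empty `mismatched` list means every sequence has one quality value per base.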

Note that the *.qv files will break most FASTA parsers, as the encoded QV values contain the FASTA record separator character '>'.
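To see why a stray '>' is a problem, consider the sketch below. The record contents are invented, and the layout (a header line followed by a line of encoded QV characters) is an assumption made only for this illustration. A parser that splits on every '>' finds too many records, while one that only treats '>' at the start of a line as a header finds the right number.

```python
# Made-up record: two entries whose encoded quality lines contain '>'.
record = ">scf1\n<<>=<;\n>scf2\n==<<\n"

# Naive approach: treat every '>' as a record separator.
naive_records = [chunk for chunk in record.split(">") if chunk.strip()]

# Line-anchored approach: only a '>' in column one starts a record.
anchored_headers = [line[1:] for line in record.splitlines()
                    if line.startswith(">")]

print(len(naive_records), len(anchored_headers))
```

Even the line-anchored approach is not safe in general: if a line of encoded quality values happens to begin with '>', it will be mistaken for a header.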

Celera Assembler generates FASTA files for these entities:

singletons
    unassembled reads
    $prefix.singleton.fasta
unitigs
    uniquely assemblable contigs used to build contigs and scaffolds
    $prefix.utg.fasta
    $prefix.utg.qv
    $prefix.utg.qual
degenerates
    unitigs not placed in any contig or scaffold
    $prefix.deg.fasta
    $prefix.deg.qv
    $prefix.deg.qual
contigs
    ungapped multiple sequence alignments
    $prefix.ctg.fasta
    $prefix.ctg.qv
    $prefix.ctg.qual
scaffolds
    ordered and oriented contigs that constitute the assembly
    $prefix.scf.fasta
    $prefix.scf.qv
    $prefix.scf.qual
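
The file names listed above follow a regular $prefix.<entity>.<extension> pattern, so the expected output set for an assembly can be generated from its prefix. This is a small sketch of that pattern only; the helper name `expected_files` and the prefix "asm" are hypothetical, and the script does not check which directory the assembler writes to.

```python
# Entity tag -> file extensions, as listed on this page.
OUTPUTS = {
    "singleton": ["fasta"],
    "utg": ["fasta", "qv", "qual"],
    "deg": ["fasta", "qv", "qual"],
    "ctg": ["fasta", "qv", "qual"],
    "scf": ["fasta", "qv", "qual"],
}


def expected_files(prefix):
    """Return the output file names expected for the given prefix."""
    return [f"{prefix}.{entity}.{ext}"
            for entity, extensions in OUTPUTS.items()
            for ext in extensions]


for name in expected_files("asm"):
    print(name)
```

Combined with `os.path.exists`, such a list gives a quick check that a run produced all of its sequence and quality files.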