
The generated consensus sequences are written in FASTA format, which most bioinformatics analysis software accepts. Celera Assembler generates FASTA files (like *.scf.fasta) with the consensus sequence in upper-case letters. It also writes quality values (QVs) in two encodings: its own (in files like *.scf.qv) and the NCBI encoding (in files like *.scf.qual). In both quality files, each byte encodes the quality value of the corresponding base in the associated FASTA file.

Note that the *.qv files will break most FASTA parsers, as the encoded QV values contain the FASTA record separator character '>'.
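
The problem can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two parser functions are hypothetical and not part of Celera Assembler, and the QV string is made up rather than taken from a real *.qv file. It compares a parser that splits on every '>' with one that accepts '>' as a separator only at the start of a line.

```python
def naive_fasta_records(text):
    """Split on every '>' character, as some simple FASTA parsers do."""
    records = []
    for chunk in text.split(">"):
        if not chunk.strip():
            continue
        header, _, body = chunk.partition("\n")
        records.append((header.strip(), body.replace("\n", "")))
    return records


def line_start_records(text):
    """Treat '>' as a record separator only at the start of a line."""
    records = []
    header, body = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(body)))
            header, body = line[1:].strip(), []
        elif header is not None:
            body.append(line)
    if header is not None:
        records.append((header, "".join(body)))
    return records


# Made-up QV-style record (not real assembler output): the encoded
# string contains '>' in the middle of the line.
qv_text = ">scf1\n5:>C>9\n"

# Ordinary FASTA record for comparison.
fasta_text = ">scf1\nACGTAC\n"

print(naive_fasta_records(fasta_text))   # one record, as expected
print(naive_fasta_records(qv_text))      # record is split at each '>'
print(line_start_records(qv_text))       # record is kept whole
```

The line-start parser is only a partial remedy: it would still misread a *.qv file if a line of encoded values happened to begin with '>'. Reading exactly as many QV characters as there are bases in the matching FASTA record would be more robust.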

Celera Assembler generates FASTA files for these entities:

unassembled reads
uniquely assemblable contigs used to build contigs and scaffolds
unitigs not placed in any contig or scaffold
ungapped multiple sequence alignments
ordered and oriented contigs that constitute the assembly.