FastaToCA


The fastaToCA utility reads sequences and quality values from fasta files and writes Celera Assembler FRG format output. If you are converting data from the NCBI Trace Archive, use tracearchiveToCA instead.

usage: fastaToCA [options] -l libraryname -s seq.fasta -q qlt.fasta > new.frg
-v vector-clear-file     A file of 'readUID vecBeg vecEnd', one per line, that is the vector clear range.
-noobt                   Set the 'doNotOverlapTrim' library feature.
-454                     Set library features appropriate for 454 reads (see also sffToCA).
-idregex pattern         Use this perl regex to extract the read name from the seq defline.
-l libraryname           Name of the library; freeformat text.
-mean m                  Insert has mean size of m.
-stddev s                Insert has std dev of s.
-s seq                   Fasta file of sequences.
-q qual                  Fasta file of quality values.
-m matepairing           A file of pairs of read UIDs for mated reads, one pair per line, whitespace separated.
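
As a rough illustration of how the options above combine, the following Python sketch only assembles an argument list; it does not run anything. It assumes the executable is reachable as fastaToCA, and the file names and library name are hypothetical placeholders. The function build_fastatoca_command is an illustrative helper, not part of wgs-assembler.

```python
import shlex


def build_fastatoca_command(library, seq_fasta, qual_fasta,
                            mean=None, stddev=None,
                            mates=None, vector_clear=None):
    """Return the argument list for a fastaToCA run (nothing is executed).

    Hypothetical helper: only flags listed in the usage text are emitted.
    """
    argv = ["fastaToCA", "-l", library]
    # This sketch passes the insert size only when both values are given.
    if mean is not None and stddev is not None:
        argv += ["-mean", str(mean), "-stddev", str(stddev)]
    if mates is not None:
        argv += ["-m", mates]
    if vector_clear is not None:
        argv += ["-v", vector_clear]
    argv += ["-s", seq_fasta, "-q", qual_fasta]
    return argv


# Hypothetical file names, for illustration only.
argv = build_fastatoca_command("lib3kb", "reads.fasta", "reads.qual",
                               mean=3000, stddev=300, mates="reads.mates")
# The FRG output goes to stdout, so the shell redirect is shown separately.
print(shlex.join(argv) + " > reads.frg")
```

Because the FRG output is written to standard output, a caller that actually runs the command would need to redirect or capture stdout itself.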

The optional 'vector-clear-file' supplies known vector clear ranges. Each line holds one triple, "readName clear-begin clear-end". The clear ranges are expected to be space-based, i.e., coordinates start at 0, and they denote sequence that is free of any vector contamination. Any number of fragments can be listed, in any order.

An example file, where the first read has a clear range that covers the first 310 bases, is:

571965411  0 310
571965412 10 300
571965413  0 541
571965414  3 423
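
To sanity-check a vector clear file before conversion, a minimal Python sketch like the one below could be used. It is not part of fastaToCA; the function name parse_vector_clear and its validation rules (three fields per line, 0 <= begin <= end) are assumptions made for illustration. With space-based coordinates, a range "begin end" covers end - begin bases, which matches the 310 bases of the first read above.

```python
def parse_vector_clear(text):
    """Map read UID -> (begin, end) from 'readUID begin end' lines.

    Hypothetical checker; the validation rules are assumptions.
    """
    ranges = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # any amount of whitespace separates fields
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        uid, begin, end = fields[0], int(fields[1]), int(fields[2])
        if not 0 <= begin <= end:
            raise ValueError(f"line {lineno}: bad clear range {begin} {end}")
        ranges[uid] = (begin, end)
    return ranges


example = """\
571965411  0 310
571965412 10 300
571965413  0 541
571965414  3 423
"""

ranges = parse_vector_clear(example)
for uid, (begin, end) in ranges.items():
    # Space-based coordinates: the clear range spans end - begin bases.
    print(uid, end - begin)
```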

The 'matepairing' file links pairs of reads into mate pairs. The format is two read IDs per line, separated by any amount of whitespace. For example:

571965411 571965412
571965413 571965414
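
A mate pairing file in this format can be checked with a few lines of Python. The sketch below is illustrative only: parse_mate_pairs is a hypothetical helper, and the rule that a read may appear in at most one pair is an assumption of this sketch, not something stated above.

```python
def parse_mate_pairs(text):
    """Return a list of (read_a, read_b) tuples, one per non-blank line."""
    pairs = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # any amount of whitespace separates the IDs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 read IDs, got {len(fields)}")
        for uid in fields:
            # Assumption: each read belongs to at most one mate pair.
            if uid in seen:
                raise ValueError(f"line {lineno}: read {uid} listed more than once")
            seen.add(uid)
        pairs.append((fields[0], fields[1]))
    return pairs


example = """\
571965411 571965412
571965413 571965414
"""

pairs = parse_mate_pairs(example)
print(pairs)
```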

Sequences and quality values must be in the same order. Each quality record should be a defline followed by whitespace-separated integer quality values (leading whitespace is optional):

>571965412
  9  9 10 10 13 10  6  4  4  4  4  4  6  4  4  6  4  0  4  8
  6  6  8  9 13 17 19 16 16 11 19 11  7  7 19 24 29 32 39 39
 29 31 29 34 34 34 34 34 39 39 35 35 35 35 35 35 39 35 35 35
 35 45 45 45 45 45 45 45 40 40 40 40 40 40 40 40 40 40 45 45
 45 35 35 35 35 35 35 45 45 39 39 39 35 35 40 40 40 40 39 39
 35 39 51 40 40 40 40 40 40 40 40 39 39 39 39 39 39 40 40 40
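
The ordering requirement can be verified before running the conversion. The following Python sketch is not how fastaToCA itself reads its input. It assumes that the read name is the first word of each defline (the tool can instead extract it with -idregex) and that there is one quality value per base. The names fasta_records and check_seq_qual, and the short example records, are hypothetical.

```python
def fasta_records(text):
    """Yield (name, body_lines) for each '>' record, in file order."""
    name, body = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # Assumption: the read name is the first word of the defline.
            words = line[1:].split()
            name, body = (words[0] if words else ""), []
        elif name is not None and line.strip():
            body.append(line)
    if name is not None:
        yield name, body


def check_seq_qual(seq_text, qual_text):
    """Check names, order, and lengths; return the number of reads."""
    seqs = [(n, "".join(l.strip() for l in b)) for n, b in fasta_records(seq_text)]
    quals = [(n, [int(q) for l in b for q in l.split()])
             for n, b in fasta_records(qual_text)]
    if len(seqs) != len(quals):
        raise ValueError(f"{len(seqs)} sequences but {len(quals)} quality records")
    for (seq_name, seq), (qual_name, qual) in zip(seqs, quals):
        if seq_name != qual_name:
            raise ValueError(f"order mismatch: {seq_name} vs {qual_name}")
        # Assumption: one quality value per base.
        if len(seq) != len(qual):
            raise ValueError(f"{seq_name}: {len(seq)} bases, {len(qual)} quality values")
    return len(seqs)


# Short made-up records; real reads are much longer.
seq_example = ">571965412\nACGTAC\n"
qual_example = ">571965412\n  9  9 10 10 13 10\n"

print(check_seq_qual(seq_example, qual_example))
```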