SpecFiles

From wgs-assembler

The spec file is an optional input to the runCA executive that launches the Celera Assembler pipeline. A spec file provides a convenient way to generate assemblies while faithfully documenting their parameters. The use of spec files is STRONGLY recommended.

See also the RunCA Examples.

Format

A specFile contains a list of runCA option=value pairs, one per line. Unlike on the command line, white space is allowed around the '=' symbol, and a value that contains spaces needs no quotes.

The runCA command:

perl $ASMBIN/runCA \
  -p bigfoot \
  -d bigfoot1 \
  useGrid=1 \
  scriptOnGrid=1 \
  ovlMemory="2GB --hashload 0.8 --hashstrings 110000" \
  ovlHashBlockSize=600000 \
  ovlRefBlockSize=7630000 \
  frgCorrBatchSize=1000000 \
  frgCorrThreads=4 \
  fragments1.frg \
  fragments2.frg.gz \
  fragments3.frg.bz2 \
  fragments4.frg

can be equivalently executed using the following runCA command and specFile:

perl $ASMBIN/runCA -p bigfoot -d bigfoot1 -s bigfoot1.spec

where 'bigfoot1.spec' contains:

#  Spec file for the bigfoot1 assembly.

useGrid          = 1
scriptOnGrid     = 1

ovlMemory        = 2GB --hashload 0.8 --hashstrings 110000
ovlHashBlockSize = 600000
ovlRefBlockSize  = 7630000
frgCorrBatchSize = 1000000
frgCorrThreads   = 4

/local/assembly/bigfoot-fragments/fragments1.frg
/local/assembly/bigfoot-fragments/fragments2.frg.gz
/local/assembly/bigfoot-fragments/fragments3.frg.bz2
/local/assembly/bigfoot-fragments/fragments4.frg

Unlike options given on the command line, specFiles let us:

  1. Add comments (lines starting with the '#' character) describing what the option does, and why the value was picked.
  2. Use white space to enhance readability.
  3. Record options and input files for later use.
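
As a small illustration of points 1 and 2, the following sketch writes a commented, aligned spec file from the shell. The file name example.spec and the wording of the comments are placeholders chosen for this example; they are not documentation of what the options do.

```shell
# Write a small, commented spec file (example.spec is a hypothetical name).
cat > example.spec <<'EOF'
#  Spec file for the bigfoot1 assembly.
#  Comment text here is placeholder wording only.

#  Grid settings: record here why these values were picked.
useGrid          = 1
scriptOnGrid     = 1     #  a comment may also follow a value

/local/assembly/bigfoot-fragments/fragments1.frg
EOF
```

The resulting file could then be passed to runCA with the -s option, as in the command shown earlier.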

Format Details

Options: Any line with an '=' symbol is assumed to be a runCA option, of the form option = value.

Filenames: Lines without an '=' symbol are assumed to be input filenames. Filenames should be absolute paths (/home/work/FRAGS/godzilla.frg). Relative paths (../FRAGS/godzilla.frg) may or may not work.

Comments: Comments start with the '#' symbol. A comment may follow other content on a line, or occupy a line by itself.

White space: Blank lines are ok. Whitespace is trimmed from both ends of the option value, but not from within the value itself (this should have no impact).
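
The rules above can be sketched as a small shell function that classifies each line of a spec file. This is only a reading of the rules as stated on this page, not runCA's own parser; the function names trim and classify_spec are made up for the example, and splitting at the first '=' is an assumption.

```shell
# Strip leading and trailing white space from $1 (helper for this sketch).
trim() {
  s=$1
  s=${s#"${s%%[![:space:]]*}"}   # remove leading white space
  s=${s%"${s##*[![:space:]]}"}   # remove trailing white space
  printf '%s' "$s"
}

# Read spec-file text on stdin; print one tab-separated record per line:
#   option<TAB>name<TAB>value   for lines containing '='
#   file<TAB>path               for all other non-blank lines
classify_spec() {
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%%#*}                 # drop a comment starting anywhere
    line=$(trim "$line")
    [ -z "$line" ] && continue       # blank or comment-only line
    case $line in
      *=*)
        name=$(trim "${line%%=*}")   # text before the first '=' (assumed split)
        value=$(trim "${line#*=}")   # text after it; inner spaces are kept
        printf 'option\t%s\t%s\n' "$name" "$value" ;;
      *)
        printf 'file\t%s\n' "$line" ;;
    esac
  done
}
```

For example, `classify_spec < bigfoot1.spec` would list each option and input file in the spec file shown earlier, which can be a quick way to check that a line is being read as intended.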

Locations

runCA will look for specFiles in three places, with later specFiles having precedence:

  1. The directory where runCA is installed, in $ASMBIN/spec/runCA.default.specFile.
  2. The user's home directory, ~/.runCA.
  3. The command line, -s project.specFile.
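
To see which value would win when the same option appears in more than one of these files, the precedence rule can be approximated with a short shell function. This is a sketch of the "later specFiles have precedence" rule only, not of how runCA itself loads the files; the name effective_options is hypothetical, and missing files are simply skipped.

```shell
# Print the option values that result from reading the given spec files
# in order, with later files overriding earlier ones.
effective_options() {
  for f in "$@"; do
    # Skip files that do not exist; add a newline in case one is missing.
    [ -f "$f" ] && { cat "$f"; echo; }
  done | awk '
    { sub(/#.*/, "") }                     # drop comments
    index($0, "=") == 0 { next }           # skip blank and filename lines
    {
      name  = substr($0, 1, index($0, "=") - 1)
      value = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim both ends
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (!(name in seen)) order[++n] = name
      seen[name] = 1
      val[name] = value                    # a later file overwrites the value
    }
    END { for (i = 1; i <= n; i++) printf "%s = %s\n", order[i], val[order[i]] }
  '
}
```

A call following the three locations listed above might look like `effective_options "$ASMBIN/spec/runCA.default.specFile" "$HOME/.runCA" project.specFile`.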