Version 8.1 Release Notes

These are release notes for Celera Assembler version 8.1, which was released on December 16th, 2013.

This distribution package provides a stable, tested, documented version of the software. The distribution is usable on most Unix-like platforms, and some platforms have pre-compiled binary distributions ready for installation.

The source code package includes full source code (revision r4490), Makefiles, and scripts. A subset of the kmer package (http://kmer.sourceforge.net/, version r1969), used by some modules of Celera Assembler, is included.

This package was prepared by scientists at the J. Craig Venter Institute (http://www.jcvi.org/) with funding provided by the National Institutes of Health (http://www.nih.gov/).

Full documentation can be found online at http://wgs-assembler.sourceforge.net/.

Citation

Please cite Celera Assembler in publications that refer to its algorithm or its output. The standard citation is the original paper [Myers et al. (2000) A Whole-Genome Assembly of Drosophila. Science 287 2196-2204]. More recent papers describe modifications for human genome assembly [Istrail et al. 2004; Levy et al. 2007], metagenomics assembly [Venter et al. 2004; Rusch et al. 2007], haplotype separation [Levy et al. 2007; Denisov et al. 2008], a Sanger+pyrosequencing hybrid pipeline [Goldberg et al. 2006] and native assembly of 454 data [Miller et al. 2008]. There are links to these papers, and more, in the on-line documentation (http://wgs-assembler.sourceforge.net/).

Compilation and Installation

Users can download Celera Assembler as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To build from source, execute these commands on any Unix-like platform:

bzip2 -dc wgs-8.1.tar.bz2 | tar -xf -
cd wgs-8.1
cd kmer && make install && cd ..
cd samtools && make && cd ..
cd src && make && cd ..
cd ..

To use a binary distribution, choose a platform, download that package, then unpack it with a command such as:

bzip2 -dc wgs-8.1-*.tar.bz2 | tar -xf -

In both cases, you can run the assembler with:

wgs-8.1/*/bin/runCA
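
The `*` in that path stands for the platform-specific build directory. As a minimal sketch for confirming that the build or unpack step actually produced an executable, the helper below expands the same glob and prints the first match. The function name `find_runca` is hypothetical and not part of the distribution; it assumes a POSIX shell and that the `wgs-8.1` tree sits directly under the directory passed as the first argument.

```shell
# Hypothetical helper, not shipped with Celera Assembler.
# Prints the first executable runCA found under <dir>/wgs-8.1/*/bin/.
find_runca() {
    root="${1:-.}"
    for candidate in "$root"/wgs-8.1/*/bin/runCA; do
        # If the glob matches nothing, the pattern stays literal and -x fails.
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "runCA not found under $root/wgs-8.1" >&2
    return 1
}
```

Calling `find_runca .` from the directory that contains `wgs-8.1` prints the path to use, or returns a non-zero status if nothing was built or unpacked.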

Changes in CA 8.1

Celera Assembler 8.1 reduces the memory requirements of the unitig consensus (utgcns) module from 16 GB to 1 GB.

Backward Compatibility

Celera Assembler 8.1 is compatible with Celera Assembler 8.0. Its intermediate files are generally incompatible with earlier versions of CA. Users should not run 8.1 software against earlier pipeline files, or earlier software on 8.1 pipeline files. Users should launch CA 8.1 assemblies from scratch.

New Features

  1. Mostly bug fixes.

Improvements

(list not yet compiled)

Bug Fixes

(list not yet compiled)

Known Problems

  • Scaffolder (CGW) is slow
  1. Some data sets exhibit very long run times in the scaffolding module.
  • Invalid Results
  1. In sffToCA, paired-end read lengths are calculated incorrectly in rare cases when '-trim hard' is used. Until the issue is resolved, users should always use '-trim chop' on 454 paired-end libraries.
  • Scaling
  1. Creating an overlap store for billions of fragments is a significant bottleneck. This process can take multiple days, sometimes longer than the (parallel) computation of the overlaps.
  • Algorithmic limitations
  1. There is no explicit support for high coverage. Coverage such as 80X induces combinations of sequencing errors that confound Celera Assembler. At best this leads to higher reported rates of allelic variation. At worst this leads to a fractured assembly. Sampling a subset of the reads from a high-coverage data set can yield a better assembly.
  2. There is little support for assembly of data sets with a small ratio of mate pairs to unpaired fragments. Mate pairs are used to detect misassemblies and to form contigs and scaffolds. Too few mate pairs can result in shattered assemblies and many bases in degenerate contigs.
  3. There is no support for bar-coded 454 data. Users with bar-coded data may use some other utility to remove the bar code sequence and partition reads by bar code into separate SFF files.
  4. There is no support for data from ABI SOLiD.
  5. There is no support for cDNA, exon-enriched DNA, or DNA amplified with bias of any sort.
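
The advice above about sampling from high-coverage reads can be sketched as a small shell function. This is an illustration only, not a tool shipped with Celera Assembler: the name `subsample_fastq` and its arguments are hypothetical, and it assumes unpaired reads in a FASTQ file with exactly four lines per record, sampled before the reads are converted for the assembler. Each record is kept with probability target coverage divided by current coverage. Mated reads would need both mates kept or dropped together, which this sketch does not attempt.

```shell
# Hypothetical helper, not part of Celera Assembler.
# Usage: subsample_fastq reads.fastq CURRENT_COV TARGET_COV SEED
# Assumes unpaired reads, four lines per FASTQ record, CURRENT_COV > 0.
subsample_fastq() {
    awk -v cur="$2" -v tgt="$3" -v seed="$4" '
        BEGIN {
            srand(seed)                  # fixed seed makes the sample repeatable
            frac = tgt / cur             # fraction of records to keep
            if (frac > 1) frac = 1
        }
        NR % 4 == 1 { keep = (rand() < frac) }   # decide once per record
        keep { print }                           # emit all four lines or none
    ' "$1"
}
```

For example, `subsample_fastq reads.fastq 80 40 1 > reads.40x.fastq` keeps roughly half of the records from an 80X data set. The coverage values are estimates the user supplies; the function does not compute them.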

Legal

Copyright 1999-2004 by Applera Corporation. Copyright 2005-2013 by the J. Craig Venter Institute. The Celera Assembler software, also known as the wgs-assembler and CABOG, is open-source and available free of charge subject to the GNU General Public License, version 2.