Version 6.1 Changes


This is a nearly complete list of the changes made in Celera Assembler 6.1 since the last release (Celera Assembler 5.4). See also Version 6.1 Release Notes.

BROKEN COMMITS

The software changed a lot between the releases of CA 5.4 and CA 6.1. Some users checked out source code between releases and built their own binaries. For those brave users, we provide a list of source code changes that are known to have introduced bugs. This could help users decide whether to re-assemble their data with the 6.1 release package.

There was a large scaffolding bug introduced on 2010/02/23 and fixed on 2010/03/28.

We assembled a small genome using a copy of the Celera Assembler after nearly every change was made. The results track the numbers of contigs and scaffolds, the amount of sequence in each, and the status of mate pairs.

WGS-ASSEMBLER CHANGES

This list is derived from the CVS commit logs. This list is technical. For a more readable list, see Version 6.1 Release Notes.

  1. 2009/07/24 -- (meryl/overmerry) - Detect invalid clear range specifications in meryl and overmerry.
  2. 2009/07/24 -- (meryl) - Reduce memory usage on assemblies with large numbers of reads.
  3. 2009/07/26 -- (utgcns) - fix a problem where contained fragments would fail to generate a unitig consensus sequence.
  4. 2009/07/27 -- (cgw) - allow the minimum number of mate edges to create or merge scaffolds to be set at run time.
  5. 2009/07/27 -- (deduplicate) - Fix an error where deleted reads were used as evidence for future deletes. This resulted in deleting all instances of a duplicate read.
  6. 2009/07/27 -- (runCA) - Inconsistent deduplicate summary file name allowed the deduplicator to run more than once.
  7. 2009/07/29 -- (toggle) - Detect and toggle surrogate unitigs on the ends of scaffolds.
  8. 2009/07/30 -- (cgw) - Rebuild scaffold edges after each merge iteration. Instead of decrementing the minimum weight by one each iteration with no merge, take larger steps when the weight is large. This resolves an observed slow down in large and deep assemblies.
  9. 2009/07/31 -- (overmerry) - change the way the size of the input queue is determined. This should (slightly) improve performance, and reduce memory consumption.
  10. 2009/07/31 -- (utgcns) - Fix an error in computing the expected size of an alignment, which occasionally resulted in an inferior fragment alignment being used.
  11. 2009/08/03 -- (sffToCA) - reads longer than 2048bp were causing a crash.
  12. 2009/08/05 -- (gatekeeper) - Add option to dump all bases, including those outside the clear range. Non-clear bases are in lower case. Deleted reads are all lower case.
  13. 2009/08/06 -- (build) - Automagically 'rename' files from *.c to *.C. In reality, a symlink is created.
  14. 2009/08/07 -- (BOG) - Clean up command line processing, fix a few compile warnings. Add support for reading overlaps from a second overlap store.
  15. 2009/08/08 -- (utgcns) - Fixes to allow fragments with (slightly) negative ahangs. This was typically observed at the start of a unitig, when multiple fragments all claimed to start at or near the origin. Due to imprecise placements, we either needed to rearrange the fragments (to maintain non-negative ahangs) OR allow the slightly negative ahangs.
  16. 2009/08/09 -- (buildPosMap) - replace buildFragContigPosMap.pl with buildPosMap.
  17. 2009/08/09 -- (general) - remove unsupported stripCGBReads.pl, which would remove fragments from unitigs (needs AMOS perl libraries, writes fragments using a hard coded format string).
  18. 2009/08/11 -- (sffToCA / DP aligner) - Adjust MATCH, GAP, MISMATCH scores to more aggressively penalize errors. If ahang,bhang == alen,blen assume that we're fishing for an alignment, and do not restrict the alignment to any particular ahang,bhang -- only require that it be an overlap alignment.
  19. 2009/08/11 -- (utgcns) - fix several problems when using a fragment in the unitig multialignment as a template for aligning another fragment
  20. 2009/08/12 -- (chimera) - base our minimum length for using an overlap on the assembler global minimum overlap length. Previously, we used 20bp as the minimum length regardless of the assembler minimum length. Now we use ASM_MIN_LEN/2.
  21. 2009/08/14 -- (build) - remove the -p option from 'cp'; some filesystems do not support this option.
  22. 2009/08/14 -- (closure) - Added PLC message to assembly to represent finishing (or other) constrained reads. Also fixed some warnings about uint32 being compared to int
  23. 2009/08/14 -- (cgw) - When splitting probably chimeric input unitigs, greatly simplify the algorithm for placing fragments into the new unitigs.
  24. 2009/08/16 -- (consensus) - Add an option to allow verbose logging when creating a multialignment. This was previously only possible by a compile-time option.
  25. 2009/08/16 -- (cgw) - When splitting probably chimeric input unitigs, estimate the insert sizes of libraries using the input unitigs. Previously, the estimate would be updated only if the 6-clonesize stage was run.
  26. 2009/08/16 -- (cgw) - When splitting probably chimeric input unitigs, be less aggressive about splitting near the ends of unitigs. These splits usually resulted in either total shattering (the end became a bunch of singletons) or a degenerate contig (the end had no mate pairs to pull it back in).
  27. 2009/08/25 -- (gatekeeper) - Add '--revertclear' to copy an existing clear range into the 'latest' clear range stored in fragment metadata. This clear range becomes the current clear, and all later clear ranges are removed. This functionality was fixed on 2009/09/25.
  28. 2009/08/26 -- (eCR) - Partition eCR based on the number of fragments in scaffolds instead of the number of scaffolds. This results in all eCR jobs being more equally load balanced, where before the last eCR job would usually take significantly longer than the first eCR jobs.
  29. 2009/08/26 -- (convert-fasta-to-v2.pl) - Require a mean and standard deviation be provided if mate pairs are specified.
  30. 2009/08/28 -- (tracedb-to-frg.pl) - correctly detect the end of an XML file.
  31. 2009/08/28 -- (overlap error correction) - Increase the internal representation of a fragment IID from 28 bits (256 million) to 32 bits (4 billion). Assemblies with more than 256 million fragments would fail to update error rates in overlaps. This change breaks binary file compatibility in error correction.
  32. 2009/08/28 -- (runCA) - Look for compressed AND uncompressed overlap files.
  33. 2009/08/28 -- (runCA) - Improve restarting runCA; do not compute overlap seeds (overmerry) if the merStore exists.
  34. 2009/08/28 -- (closure) - Cleanup support for closure reads (fix comparisons of int and uint, remove debug output) and set default options for how these reads are processed. Optionally, add #defined component to try undoing Jiggle_Positions while searching for contig overlaps.
  35. 2009/08/29 -- (mer overlapper) - use a multithreaded sort implementation when extending seed overlaps.
  36. 2009/09/02 -- (general) - more carefully check consistency when reading data from stores.
  37. 2009/09/02 -- (runCA) - Correctly refuse to run overlap jobs (overlap.sh) that are out of bounds.
  38. 2009/09/02 -- (general) - flush/sync when writing data to stores. This (attempts) to resolve a rare issue with specific filesystems. This change was reverted on 2009/09/09; it was FAR too expensive to sync/flush as often as it was.
  39. 2009/09/04 -- (cgw) - Return FALSE when incorrect input values are given to the ChiSquared test. Previous behaviour would crash with an assert failure in either the gcf() or gser() function.
  40. 2009/09/07 -- (general) - remove obsolete program AS_CNS_asmReBaseCall
  41. 2009/09/09 -- (runCA) - Fix an obscure bug where 7-0-CGW would fail to find the last checkpoint written by 6-clonesize. This bug was exhibited only if runCA was restarted after 6-clonesize finished, but before 7-0-CGW started.
  42. 2009/09/10 -- (closure) - Fix bugs in finishing read placement. Also, do two rounds of fragment placement: the bounding reads for a finishing read may not be placed until the end of the first round, so a second round avoids this order dependency.
  43. 2009/09/12 -- (ecr) - The new eCR partitioning scheme (2009/08/26) was omitting the last scaffold.
  44. 2009/09/24 -- (cgw) - ??BUG?? When computing least-squares gap estimates, a sign error was merging contigs whose actual overlap was more than 10 bases different than the expected overlap, and NOT merging contigs when the actual overlap was close to the expected overlap.
  45. 2009/09/24 -- (cgw) - ??HOW RECENT?? A recent regression, the checkpoint number was not being incremented after each eCR job, and we lost all but the final job.
  46. 2009/09/25 -- (cgw) - (CODE) refactor the routine that merges contigs (MergeMultiAligns) to be far simpler.
  47. 2009/09/25 -- (runCA / consensus) - Add option 'cnsPhasing' to control if VAR records are phased. Phasing is OFF by default.
  48. 2009/09/25 -- (gatekeeper) - Add support for very short reads from FastQ files.
  49. 2009/09/29 -- (general) - (CODE) Rework IntMultiVar to store data in independent arrays, instead of ASCII character strings.
  50. 2009/09/30 -- (runCA) - If unitigger finished successfully, do not query the gkpStore to see if the 'forceBOGunitigger' flag is set. On large assemblies, this lookup is expensive (though it shouldn't be).
  51. 2009/09/30 -- (chimera) - remove uninformative logging.
  52. 2009/10/01 -- (cgw (BUG?)) - Fix a potential bug where a surrogate unitig is placed as a rock/stone.
  53. 2009/10/01 -- (cgw) - Added an advanced option to runCA (kickOutNonOvlContigs) to optionally remove a contig from a scaffold when it has no pairwise overlaps to surrounding contigs. If its neighbors do have a pairwise overlap, the middle contig is removed instead of being inserted with a gap of -20bp.
  54. 2009/10/05 -- (general) - fix crash when printing a multialignment when there is no consensus sequence generated.
  55. 2009/10/05 -- (cgw) - Fix a minor memory leak in an alignment routine.
  56. 2009/10/05 -- (general) (MAJOR) - replace the cgw-centric SeqStore (which stores unitig and contig multialignments) with an assembler-wide 'tigStore'.
  57. 2009/10/08 -- (runCA) - Change how the default value for 'unitigger' is treated. By default, the unitigger used is determined by the existence of 454 reads. If 454 reads are present, the BOG unitigger is used. If no 454 reads are present, the utg unitigger is used. In either case (present / not present) explicitly setting 'unitigger' will use that unitigger.
  58. 2009/10/09 -- (runCA) - Correctly notice when unitig consensus jobs fail.
  59. 2009/10/09 -- (general) - When storing unitig/contig multialignments to disk (in the tigStore) first copy the data to a byte array, then write the entire array in one block. This cuts down the number of fwrites from 19 to 2. This is another attempt at solving the rare filesystem glitch noted on 2009/09/02.
  60. 2009/10/10 -- (runCA) - Remove the big WARNING about using bog because some frags requested it (from 2009/10/08).
  61. 2009/10/10 -- (runCA) - Enable deduplication only if some fragments request it.
  62. 2009/10/10 -- (ctgcns/utgcns) - Add options to run just a single unitig/contig.
  63. 2009/10/10 -- (ctgcns) - Fragments 'ejected' during contig consensus were not being ejected when the MultiAlignT was being rebuilt (in GetMANodePositions()).
  64. 2009/10/12 -- (deduplication) - (THIS COMMIT INTRODUCED A BUG) Require the "fragment" deduplication rule to work only on unmated fragments. It was working on all fragments, regardless of mate-pair status.
  65. 2009/10/12 -- (OBT) - Report counts for the types of trimming we do in the mergeLog.
  66. 2009/10/12 -- (general) - Improve performance during rocks/stones by removing unnecessary disk access.
  67. 2009/10/14 -- (resolveSurrogates) - Fix to prevent a fragment from being placed twice in a single unitig.
  68. 2009/10/14 -- (cgw) - Restrict unitig-unitig overlaps to less than AS_CGW_DP_ERATE error.
  69. 2009/10/20 -- (sffToCA) - Remove the -log and -stats options, and instead always write those files.
  70. 2009/10/20 -- (sffToCA) - Correctly remove the tmpStore.
  71. 2009/10/20 -- (runCA) - Add option 'saveOverlaps'.
  72. 2009/10/20 -- (sffToCA) - Make statistics consistent. Treat reads with an invalid clear range as being "too short".
  73. 2009/10/25 -- (cgw) - Remove all tigStore flushes during rocks/stones. See also 2009/09/02.
  74. 2009/10/26 -- (general) - Allow increasing the maximum read length at compile time. Not widely tested. This change breaks compatibility with both the gatekeeper and overlap stores, and the version number on both stores reflects this incompatibility.
  75. 2009/10/27 -- (gatekeeper) - Allow loading of unmated reads from FastQ files.
  76. 2009/10/27 -- (runCA) - default cnsPhasing to off.
  77. 2009/10/27 -- (cgw) - Added an advanced option to runCA (doUnjiggleWhenMerging) to unjiggle contigs. When cgw fills gaps, it jiggles the surrounding contigs apart to enlarge the gap to fit the new sequence. However, the contigs are not always unjiggled back, so an overlap between them may not be detected. This option looks for an overlap given both the jiggled and unjiggled positions of the contigs. This was previously a compile-time option (see 2009/08/28); it is now sped up by using OverlapContigs instead of a full MultiAlign.
  78. 2009/10/27 -- (runCA) - remove obsolete option cgwOutputIntermediate. Terminator can directly dump from a tigStore and a cgw checkpoint.
  79. 2009/10/30 -- (mer overlapper) - (IMPACT?) Long reads with thin overlaps were overflowing a fixed-width integer.
  80. 2009/11/02 -- (gatekeeper) - fix crash when a fragment of exactly 2048 bases was encountered.
  81. 2009/11/02 -- (mer overlapper) - The overlapper code does not handle negative offsets within reads. Negative offsets should never happen unless we have an integer overflow (if our reads are longer than the maximum hang we can store). Add an assert to detect if this happens.
  82. 2009/11/04 -- (overlap store) - add option to dump a multialignment-style picture of overlaps to a single fragment.
  83. 2009/11/04 -- (tigstore) - allow dumping of all unitigs/contigs, not just a single one
  84. 2009/11/04 -- (utgcns) - Fix rare problem where a fragment is placed using a thin overlap to a previously placed fragment.
  85. 2009/11/09 -- (gatekeeper store) - Allow reading of read-only clear ranges; previous behavior would return the invalid clear range (1,0) in this case.
  86. 2009/11/09 -- (runCA) - Add the 'stopAfter' option. Make 'stopAfter' and 'stopBefore' case insensitive.
  87. 2009/11/10 -- (OBT initialTrim) - write the summary to stdout, not stderr.
  88. 2009/11/10 -- (runCA) - Read options from $bin/runCA.default.spec, then ~/.runCA, then from the supplied spec file, then from the command line.
  89. 2009/11/17 -- (sffToCA) - If the insert size is specified, but no linker sequence is specified, search for both FLX and Titanium linkers.
  90. 2009/11/19 -- (OBT) (BUG?) - Fix a few issues with spur/chimera trimming and detection. We now fix many more spurs, and detect a few more chimera. Also allow a more aggressive chimera detection pattern, off by default.
  91. 2009/11/19 -- (tigStore) - Remember if the last file operation was a write, which will let us skip a seek() on the next write. This is another attempt at solving the rare filesystem glitch noted on 2009/09/02.
  92. 2009/11/20 -- (UID) - Refactor the UID client code, remove the original unsupported and broken Celera UID server.
  93. 2009/11/20 -- (BOG) (BUG?) - Add a simple bubble popper, which merges short unitigs into larger unitigs as long as the short unitig has only best overlaps to the larger unitig.
  94. 2009/11/23 -- (UID) - Add a simple UID server, and client code to access it. No support exists to use it during assembly, however.
  95. 2009/11/24 -- (remove_fragment) - correct usage. This program will take a list of UIDs and a frag file. It will separate the fragments into two files based on existence of the fragment UID in the input list.
  96. 2009/11/24 -- (gatekeeper) - fix problems with whitespace at the end of UIDs (e.g., "acc:1514616 " instead of "acc:1514616").
  97. 2009/11/26 -- (general) - clean up the logging of the command line.
  98. 2009/11/26 -- (BOG) - log command line options at the start of execution.
  99. 2009/11/30 -- (BOG) - if, after positioning a fragment in the initial unitig, it violates (the dovetail) constraint, reset the positioning to the constraint.
  100. 2009/12/01 -- (cgw) - correct the partitioning of cgw output. CGW was not computing the number of fragments to place in a consensus partition correctly, resulting in too many partitions on large assemblies. This was introduced with the new tigStore, on 2009/10/05.
  101. 2009/12/01 -- (utgcns/ctgcns) - Multiple instances of -V to utgcns and ctgcns turn on much diagnostic output.
  102. 2009/12/01 -- (gatekeeper/runCA) - improve access time for querying if a given 'feature' is set for some library. These features control if deduplication is enabled, and which unitigger is needed.
  103. 2009/12/02 -- (gatekeeper) - add an option to NOT load the UID data when dumping a store. The UID data on large assemblies is expensive to load, and is sometimes not necessary (for example, when dumping the clear range of a fragment).
  104. 2009/12/02 -- (sffToCA) - clear ranges were not being transferred from the original fragment to the mate pair reads.
  105. 2009/12/03 -- (tigStore) - remove old versions before creating a new version. Solves problems when restarting cgw after it creates the partitioned output.
  106. 2009/12/03 -- (tigStore) - if we fail to open a partition, close any open partitions and try again. The closed partitions will be reopened on demand.
  107. 2009/12/03 -- (tigStore) - improve reliability of loading unitigs/contigs from the 'layout' format.
  108. 2009/12/03 -- (tigStore) - optimize performance when loading a single partition; do not load metadata for other partitions.
  109. 2009/12/08 -- (convert-v1-to-v2.pl) - read metadata from the JCVI JTRACE Format directly.
  110. 2009/12/10 -- (tigStore) - version 2. Add magic and version numbers to the utg/ctg data files. Improve performance of partitioned stores on large data sets. utgcns and ctgcns now by default will not recompute a consensus sequence. utgcns and ctgcns now have a test mode, for testing edits on tigs. Fix numerous problems with editing tigs, like tigStore -R removing all previous tigs, and not being able to add a new unitig.
  111. 2009/12/10 -- (tigStore) - fix segfault on Linux.
  112. 2009/12/10 -- (caqc) - Paired ends with both fragments in surrogates are now correctly labeled as bothSurrogate. Placed surrogate fragments have their status set to good (if appropriate), not oneSurrogate. oneDegen takes precedence over oneSurrogate, so that a paired end with one fragment in a surrogate and one in a degenerate is marked as oneDegen, not oneSurrogate.
  113. 2009/12/10 -- (tigStore) - Editing a unitig, and inserting into version 1, was not being seen by consensus, due to the store loading all of v002 over v001, even if the v002 data was out of date. Documented on the wiki.
  114. 2009/12/11 -- (tigStore) - 'layout' format is now space-delimited, not tab-delimited.
  115. 2009/12/14 -- (cgw) - handle an assert failure gracefully in Project_across_Agap_one_interval().
  116. 2009/12/18 -- (build) - Potential support for Intel 32- and 64-bit OS-X.
  117. 2009/12/18 -- (mer overlapper) - Add infrastructure for picking a per-fragment mer threshold when finding seeds.
  118. 2009/12/18 -- (cgw) - Changes for Bonobo i7: Disable (again) tigStore cache flushes. (ALREADY NOTED?)
  119. 2009/12/18 -- (cgw) - Changes for Bonobo i7: Accept nearly every alignment. (ALREADY NOTED?)
  120. 2009/12/18 -- (cns) - Changes for Bonobo i7: Don't phase alleles. (ALREADY NOTED?)
  121. 2009/12/18 -- (AS_global.c) - Set version to 5.5 (WHY IS THIS IN THE MAINLINE? - it isn't)
  122. 2009/12/19 -- (BOG) (FINISH) Initial implementation of unitig merging. Command line option controls it, disabled by default.
  123. 2010/01/04 -- (gatekeeper) - Report Illumina errors (SF bug #2886767)
  124. 2010/01/04 -- (gatekeeper) - Don't allow mated reads to be added to an unmated library. Also, check that the link type agrees with what the library is expecting.
  125. 2010/01/06 -- (cgw) - Fix subtle error in FoundTransitiveEdgePath(), where it was failing to return FALSE.
  126. 2010/01/07 -- (cgw) - (BUG) Refactor BuildUniqueCIScaffolds().
  127. 2010/01/13 -- (cgw) - (BUG) Refactor transitive edge removal, both the general case, and the two-hop case. This, and the last two refactoring commits, show no change on drosophila willistoni. This commit includes new rules for pruning the depth-first path search during the general case.
  128. 2010/01/13 -- (runCA) - Use a better regex when 'ls'ing for checkpoints. Before: *ckp*, now $asm.ckp.[0-9]*.
  129. 2010/01/14 -- (cgw) - (BUG) Refactor isQualityScaffoldMergingEdge(). Fix a bug in same function, see comments for failsToGetHappier2. The result of the bug fix is that a very few scaffolds are now not merged. Instrument_CGW.h and InterleavedMerging.h changed in that a typed pointer replaced a generic pointer.
  130. 2010/01/14 -- (BOG) - "Bubble popping" / "short unitig merging" was non-deterministic due to an uninitialized variable.
  131. 2010/01/15 -- (cgw) (BUG) - In InsertScaffoldContentsIntoScaffold(), scaffold lengths are adjusted or corrected before and after the insert, based on the actual contig positioning. This fixes a crash where the scaffold was shorter than the contig positioning claimed. The root cause of this was not discovered. Both InsertScaffoldContentsIntoScaffold() and InsertCIInScaffold() were slightly cleaned up.
  132. 2010/01/15 -- (cgw) - When the inputs to the ChiSquared test are invalid, don't crash. Print lots of debugging information, and return that the test failed. (see also 2009/09/04)
  133. 2010/01/17 -- (tracedb-to-frg.pl) - Use the LIBRARY_ID if there is no SEQ_LIB_ID defined.
  134. 2010/01/17 -- (cgw) - Optimize tigStore flushing. In CGW, disable many of the lower-level flushes (those within loops), and add a few more higher-level flushes (in main()).
  135. 2010/01/17 -- (cgw) - Simplify (the code of) PairwiseChiSquare(). Stop asserting on invalid inputs, return "false" instead. See also 2010/01/15.
  136. 2010/01/17 -- (cgw) - Don't be so aggressive about asserting variance is positive.
  137. 2010/01/19 -- (cgw) - Update the message for "Variance Fixup Alert" to make more sense. Add a similar message for when the size is also screwed up, instead of simply asserting.
  138. 2010/01/20 -- (cgw) - In GapFillREZ.c, disable two asserts about negative variance. This has been our 'standard' fix for this problem for a while now anyway.
  139. 2010/01/20 -- (cgw) - Check for a very simple error condition before merging scaffolds, and don't merge if the condition is true.
  140. 2010/01/22 -- (tigStore) - Fix a few quirks in partitioning; do not write unitigs if we are partitioned on contigs; do not load the current unitig version if we are partitioned on contigs and opened for append.
  141. 2010/01/22 -- (tigStore) - Integer overflow when attempting to read a single block of data (say a huge VAR) larger than 4GB.
  142. 2010/01/22 -- (convertOverlap) - The conversion routine from an ASCII overlap to a binary overlap was failing to read an ASCII overlap that contained the mer count (the 8th data item). Not used by runCA.
  143. 2010/01/22 -- (OBT) - Skip overlap based trimming if it is done, or if the ovlStore already exists.
  144. 2010/01/23 -- (AS_MSG) - Switch to the AS_MSG version claimed in the VER message we are writing.
  145. 2010/01/23 -- (general) - Add fastqToCA.
  146. 2010/01/25 -- (runCA) - Don't overwrite eCR launch scripts, unless the partitioning changes. This lets one fix eCR failures by editing the run script and relaunching runCA.
  147. 2010/01/25 -- (cgw) - Tune the depth first search in transitive edge reduction for very deep unitigs encountered in GOS III.
  148. 2010/01/25 -- (AS_MSG) - Make it easier to change the maximum buffer size used when reading messages (sync'd with CVS TIP) and increase the buffer size to 8GB, from 8MB, specifically for GOS III.
  149. 2010/01/26 -- (BOG) - Fix a buggy overlap length calculation (for when A is contained in B). In practice, this code path was never exercised; we never computed the length of a contained overlap.
  150. 2010/01/26 -- (BOG) - Be more paranoid about computing the length of an overlap.
  151. 2010/01/26 -- (BOG) - Add option -E to accept an overlap with a small number of errors, regardless of the actual percent error (default is 0 errors).
  152. 2010/01/26 -- (AS_MSG) - revert the maximum message length to 128MB, from 8MB set on 2010/01/25.
  153. 2010/01/28 -- (runCA) - Deprecate runCA option "doOverlapTrimming", replaced with "doOBT" or "doOverlapBasedTrimming". All three are accepted, the last is preferred.
  154. 2010/01/29 -- (overlapStore) - Require a clear range name be supplied when dumping an overlap picture.
  155. 2010/01/29 -- (BOG) - Reimplement the computation of the length of an overlap. Test methods to compute the length that aren't biased by indel, but none work well and are disabled.
  156. 2010/01/29 -- (gatekeeper) - Not returning EOF correctly, resulting in the last FastQ fragment being inserted into the gkpStore twice.
  157. 2010/02/02 -- (runCA) - Add sgeName option.
  158. 2010/02/02 -- (tracedb-to-frg.pl) - Fix clear ranges. The 'clq' clear range is no longer used, and we must compute the 'clr' range from VECTOR and QUALITY in the xml.
  159. 2010/02/03 -- (BOG) - Fix problem in short unitig merging where a fragment couldn't be placed into the new unitig until other fragments are placed. This was crashing with: Assertion failed: ((blen5 > 0) || (blen3 > 0)), function addAndPlaceFrag, file AS_BOG_Unitig.cc, line 510.
  160. 2010/02/03 -- (gatekeeper) - When dumping frg format, the orientation of the library was incorrect, ALWAYS unmated.
  161. 2010/02/03 -- (gatekeeper) - The -randommated option was only dumping n/2 mate pairs, not n as advertised.
  162. 2010/02/04 -- (BOG/unitigger) - Restore the coverage stat histogram.
  163. 2010/02/04 -- (BOG/unitigger) - Add labels to the cga.0 histograms.
  164. 2010/02/05 -- (gatekeeper) - (RECENT REGRESSION) When inputting version 1 frg files, the mate type was being read as unknown, fixed to default to innie
  165. 2010/02/05 -- (tigStore) - do not output coverage stat when outputting contigs (coverage stat is for unitig only)
  166. 2010/02/12 -- (AS_MSG) - remove 'gui:' (isGuide) from ULK/CLK/SLK messages.
  167. 2010/02/12 -- (posmap) - write ULK, CLK, and SLK to posmap.
  168. 2010/02/13 -- (runCA) - After updating erates in the ovlStore, create an empty file 'corrected' in the store. If this file exists, we know we've already completed error correction.
  169. 2010/02/16 -- (general) - Solve a rare problem where data stored in a hash table is externalized incorrectly. Details: When adding a new entry to the table, if it fails to add because the key is already present, remove the new entry from the heap. Previous versions would leave it on the heap, and it would then get saved to disk, potentially causing a swap in values when the table is loaded back. For example, take two values with the same key, A and B. A is in the table. We try to add B, but since the key is used, it fails. When we write the table to disk, we write B first, then A. Loading the table back into memory loads B first, then A fails to load because its key is already used.
  170. 2010/02/16 -- (cgw) - In the chunk overlapper, remove CI- and Contig-specific overlappers as CGW only uses one (and did so via a very ugly hack). Add many asserts against failures to add, delete, etc. Replace the hash function with a safer version.
  171. 2010/02/17 -- (general) - Replace OrientType and ChunkOrientationType -- both enums which intermixed values freely -- with a single class PairOrient. This is a large commit, with many moderate size changes to change from a switch(char) to a chain of if-else's. It has been tested against drosophila willistoni.
  172. 2010/02/19 -- (sffToCA) - Check for incomplete/truncated SFF files.
  173. 2010/02/22 -- (gatekeeper) - Simple utility to sample mated fragments from fastq.
  174. 2010/02/22 -- (runCA/MBT) - Add Mer Based Trimming.
  175. 2010/02/22 -- (tracedb-to-frg) - Don't bzip compress the output fragments. This really should be an option.
  176. 2010/02/23 -- (cgw) - Disable the LENGTH FIXUP ALERT assert.
  177. 2010/02/23 -- (cgw) - Disable the assert on loading a hash table with duplicate nodes. This was supposed to be fixed on 2010/02/16.
  178. 2010/02/23 -- (gatekeeper/fastqToCA) - Add support for translating QVs from fastq (three flavors) to CA.
  179. 2010/02/23 -- (cgw) (BUG) (BAD until 2010/03/28) - Refactor (slightly) ComputeOverlaps() to prevent it from changing the hash key of an overlap while the key is used in the hash table. Also, remove the (unused) partitioning of the compute ("inner" and "outer"). The other big block of change uses a local variable (again, an overlap) instead of an unknown overlap passed in (to prevent that unknown overlap from being one used in the hash table).
  180. 2010/02/26 -- (runCA) - redirect terminator logging to a file instead of the screen.
  181. 2010/02/26 -- (overlapStore) - Add some sanity checking on the input overlap. If either of the IDs is invalid, fail.
  182. 2010/02/26 -- (cgw) - Change an assert (on negative variance) into an action of not placing a rock in a scaffold.
  183. 2010/02/26 -- (cgw/terminator) - Fix (rare?) bug in terminator where it would want to merge contigs immediately after loading a checkpoint. In this particular case, terminator was loading a checkpoint, which would call a "scaffold cleanup". The cleanup detected that two overlapping contigs could be merged, and would do so. This failed, as the scaffold data is read only.
  184. 2010/02/26 -- (gatekeeper) - Assert that clear ranges are valid when set.
  185. 2010/02/27 -- (tigStore) - Fix incorrect parsing of 'layout' files on Linux.
  186. 2010/03/01 -- (posmap) - Fix printing of 'chimeric' in ULK and CLK. Fix so that large VARs do not crash when converting from contigs to scaffolds. VERY large VARs will fail an assert, however.
  187. 2010/03/03 -- (utgcns) - Defer aligning fragments that are difficult to align, until after most of the consensus sequence is constructed.
  188. 2010/03/04 -- (BOG) - Remove an assert from addContainedFrag() that would fail if the frag cannot be added; clients must do the assert themselves. In popBubbles() correctly (hopefully) detect when a unitig cannot be merged completely.
  189. 2010/03/05 -- (gatekeeper) - Allow the clear range to be set to clrBgn==clrEnd, used by sffToCA in a rare case. See also 2010/02/26.
  190. 2010/03/05 -- (sffToCA) - Add the options used and files processed to the stats file.
  191. 2010/03/05 -- (sffToCA) - Fix an assert when using -clear discard-n; reads with N were properly being discarded, but before their length was counted which caused the assert to fail. (`st.readsInSFF == st.lenTooShort + st.lenOK + st.lenTrimmedByN + st.lenTooLong')
  192. 2010/03/05 -- (sffToCA) - Fix a problem with "-clear pair-of-n" (and probably "-clear n") where any read that started with Ns would cause a crash because the clear range is now (0,0).
  193. 2010/03/11 -- (build) - Add support for BUILDPROFILE on FreeBSD.
  194. 2010/03/16 -- (BOG) - Optimize the space usage of DoveTailNode. Refactor the insert size estimation compute (slight changes). This change also affects mate-based splitting and unhappy containment ejection.
  195. 2010/03/16 -- (gatekeeper) - Add an orientation option (innie or outtie) to Illumina reads.
  196. 2010/03/16 -- (deduplicate) - Change error rate on allowable fragment overlaps from .025% to 2.5%. This was broken since XXXX.
  197. 2010/03/17 -- (deduplicate) - Fix a major problem with deduplicating non-mated fragments; it did not work at all before.
  198. 2010/03/17 -- (utgcns) - Don't try to align to deferred fragments that aren't in the multialign yet. See 2010/03/03.
  199. 2010/03/24 -- (closure) - Adjust positions of a closure contig by centering it in the gap rather than setting it to span the entire gap.
  200. 2010/03/24 -- (cgw) - Fix bugs in the doUnjiggleWhenMerging option. See 2009/10/27.
  201. 2010/03/18 -- (utgcns) - Don't align fragments to the partial consensus sequence before it is updated.
  202. 2010/03/18 -- (runCA) - Add alias doECR for doExtendClearRanges.
  203. 2010/03/19 -- (posmap) - Adjust number of supporting links by -1 to account for an overlap edge (but only for ULK).
  204. 2010/03/22 -- (merTrim) - Optimize and add threading.
  205. 2010/03/22 -- (gatekeeper) - Better support for dumping fragments as FastQ. See WIKILINK_TO_FASTQ_PAGE.
  206. 2010/03/25 -- (gatekeeper) - Fix a memory corruption in programs that read the asm file without a gatekeeper store (terminator, buildPosMap).
  207. 2010/03/25 -- (merTrim) - Add -V (super verbose mode) and -t (set number of compute threads). Use two gatekeeper store instances: one read-only, and one to update the clear ranges.
  208. 2010/03/28 -- (cgw) - Fix a problem introduced on 2010/02/23, where a potential overlap is not marked as 'false' when the overlap really doesn't exist. This led to cgw running very slowly, and lots of negative gaps in scaffolds caused by contigs that could not be merged.
  209. 2010/03/29 -- (posmap) - Adjust number of supporting links by -1 to account for an overlap edge (now for both ULK and CLK). SF bug #2976474.
  210. 2010/03/29 -- (gatekeeper) - For fragments longer than the maximum supported, correctly nul-terminate the seq/qlt strings, and correctly set the clear ranges.
  211. 2010/03/29 -- (gatekeeper) - Print useful error messages about invalid clear ranges in setClearRange, then fail.
  212. 2010/03/29 -- (runCA) - Report the SGE qsub command to stderr, just like all other commands run.
  213. 2010/03/29 -- (BOG) - Add a diagnostic that reports the number of good/bad mates at various spots throughout the algorithm.
  214. 2010/03/29 -- (BOG) - Initialize 'samples' in DistanceCompute on construction (a recent regression).
  215. 2010/03/29 -- (runCA) - Write mer trimming logging to 'merTrimLog', not stderr.
  216. 2010/03/29 -- (tigStore) - By default, dump ungapped consensus for "-d consensus". Use "-d consensusgapped" to get multialign gaps (the previous behavior).
  217. 2010/03/30 -- (overlap) - Replace a compile-time sized structure with a run-time sized structure (for storing hash table data).
  218. 2010/03/31 -- (runCA) - Change the way spec files are loaded. Now, multiple -s options are allowed, any spec file supplied is converted to an absolute path, and 'default' spec files in $bin/spec are handled better.
  219. 2010/04/02 -- (gatekeeper) - Report error messages to stderr when a fastq file cannot be opened.
  220. 2010/04/02 -- (tigStore) - Write 'properties' and 'consensus' to stdout, not stderr.
  221. 2010/04/02 -- (fastqSample) - Check that the file being sampled is actually FastQ. Emit warnings if the user-supplied number of reads in the file (-t) is not correct.
  222. 2010/04/02 -- (BOG) - Add more logging during unitig splits.
  223. 2010/04/02 -- (overlap) - Add command line option --maxreadlen to set the sizes used in the hash table bit-packed data. A continuation of 2010/03/30.
  224. DateUnknown -- The ASM format changed. The UTG message lost its 'src' field and gained an 'mhp' field. See ASM spec for details.
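
The sffToCA assert quoted in the 2010/03/05 entry is a bookkeeping invariant: every read seen in the SFF file must land in exactly one length bin. The following is a minimal sketch in Python of that idea, not the sffToCA source. The field names are taken from the assert; the length thresholds, the tally() function, and the choice to count discarded reads under lenTrimmedByN are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    # Field names mirror the assert quoted in the changelog entry.
    readsInSFF: int = 0
    lenTooShort: int = 0
    lenOK: int = 0
    lenTrimmedByN: int = 0
    lenTooLong: int = 0

    def balanced(self) -> bool:
        # The invariant: each read is counted in exactly one bin.
        binned = (self.lenTooShort + self.lenOK
                  + self.lenTrimmedByN + self.lenTooLong)
        return self.readsInSFF == binned


MIN_LEN = 4    # hypothetical threshold, for illustration only
MAX_LEN = 12   # hypothetical threshold, for illustration only


def tally(reads, count_before_discard=True):
    """Bin reads by length, discarding any read that contains an N.

    With count_before_discard=False the discarded read is dropped
    before it is counted anywhere, which is the kind of ordering
    problem the changelog entry describes.
    """
    st = Stats()
    kept = []
    for seq in reads:
        st.readsInSFF += 1
        if "N" in seq.upper():
            if count_before_discard:
                # Assumption: discarded reads are tallied in this bin.
                st.lenTrimmedByN += 1
            continue
        if len(seq) < MIN_LEN:
            st.lenTooShort += 1
        elif len(seq) > MAX_LEN:
            st.lenTooLong += 1
        else:
            st.lenOK += 1
            kept.append(seq)
    return st, kept


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACNGT", "AC", "ACGTACGTACGTACGT"]
    fixed, _ = tally(reads, count_before_discard=True)
    buggy, _ = tally(reads, count_before_discard=False)
    print("fixed ordering balanced:", fixed.balanced())
    print("buggy ordering balanced:", buggy.balanced())
```

Counting the read before the early exit keeps the totals consistent regardless of why the read was rejected.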

KMER CHANGES

  1. r1802 2009-06-13 20:00:53 -0400 -- Bugs in random sequence generation.
  2. r1803 2009-07-07 19:24:56 -0400 -- Check for version 4.2 of gnuplot, exit if not found.
  3. r1804 2009-07-07 19:25:19 -0400 -- Change in leaff command line options (and breakage there) was causing atac to find no matches.
  4. r1805 2009-07-21 00:54:15 -0400 -- Remove obsolete devel targets.
  5. r1806 2009-07-21 00:55:14 -0400 -- Transfer ownership of kmer and merstream so they get deleted.
  6. r1807 2009-07-21 00:59:28 -0400 -- Add null methods for future lazy loading of index data.
  7. r1808 2009-07-24 07:53:49 -0400 -- On wiki.
  8. r1809 2009-07-24 07:59:51 -0400 -- Small memory leak.
  9. r1810 2009-07-24 08:01:04 -0400 -- When given a memory limit, use the smaller of the limit and the number of mers in the file. Fixes quirk where small sequences run in large memory were sloooow.
  10. r1811 2009-07-24 08:02:00 -0400 -- Be more careful about using stdin.
  11. r1812 2009-07-24 08:03:24 -0400 -- Report memory usage.
  12. r1813 2009-07-24 08:03:43 -0400 -- Remove stdin fasta support from fastaFile. Make a new class, fastaStdin, to support that explicitly.
  13. r1814 2009-08-04 10:08:33 -0400 -- Add operation divide.
  14. r1815 2009-08-07 19:48:39 -0400 -- Add MAXEXIST. Change usage from large string to individual prints.
  15. r1816 2009-08-07 21:29:09 -0400 -- Add test directory, test-merstream-speed.
  16. r1817 2009-08-07 23:42:41 -0400 -- Finish adding MAXEXIST. Fix signedness problem with personality selection.
  17. r1818 2010-01-15 14:27:31 -0500 -- Fix stale pointer; meryl was crashing in the batch build mode.
  18. r1819 2010-02-12 19:36:48 -0500 -- (build) - Add python2.6.
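
The r1810 entry describes capping the working size at the smaller of the memory limit and the number of mers in the file. A rough sketch of that rule in Python follows. The log entry states only the "use the smaller of the two" rule; the function names, the bytes_per_mer parameter, and the batch computation are hypothetical and are not taken from the meryl source.

```python
def mers_per_batch(memory_limit_bytes, mers_in_file, bytes_per_mer=8):
    """Return how many mers to process per batch.

    bytes_per_mer is a hypothetical sizing constant; the real cost
    per mer depends on the mer size and the data structures used.
    """
    if memory_limit_bytes <= 0 or bytes_per_mer <= 0:
        raise ValueError("memory limit and bytes per mer must be positive")
    if mers_in_file < 0:
        raise ValueError("number of mers cannot be negative")
    mers_that_fit = memory_limit_bytes // bytes_per_mer
    # Never size the batch larger than the input itself.
    return min(mers_that_fit, mers_in_file)


def number_of_batches(memory_limit_bytes, mers_in_file, bytes_per_mer=8):
    """Return the number of batches needed to cover the whole file."""
    per_batch = mers_per_batch(memory_limit_bytes, mers_in_file, bytes_per_mer)
    if per_batch == 0:
        return 0
    # Ceiling division without floating point.
    return -(-mers_in_file // per_batch)


if __name__ == "__main__":
    # Small input, large limit: the batch is sized to the input.
    print(mers_per_batch(1 << 30, 1000))
    # Large input, small limit: the limit decides the batch size.
    print(mers_per_batch(800, 1000), number_of_batches(800, 1000))
```

Sizing to the input avoids allocating and initializing a large table for a small sequence, which is consistent with the slowdown the entry mentions.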