QC Metrics

From wgs-assembler

The N50 size of a set of entities (e.g., contigs or scaffolds) is the size of the largest entity E such that at least half of the total size of the entities is contained in entities of size E or larger. For example, given a collection of contigs with sizes 7, 4, 3, 2, 2, 1, and 1 kb (20 kb in total), the N50 length is 4 kb, because the contigs of 4 kb or larger (7 + 4 = 11 kb) cover at least 10 kb.
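The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the assembler's own code; the function name `nx` and its `fraction` parameter are hypothetical, chosen so that the same routine also covers N25 and N75.

```python
def nx(sizes, fraction=0.5):
    """Walk the sizes from largest to smallest and return the size at which
    the running total first reaches `fraction` of the overall total."""
    ordered = sorted(sizes, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for size in ordered:
        running += size
        if running >= target:
            return size
    raise ValueError("sizes must be non-empty")


contigs_kb = [7, 4, 3, 2, 2, 1, 1]   # the example from the text, 20 kb in total
print(nx(contigs_kb))                # 4, since 7 + 4 = 11 kb >= 10 kb
```

Calling `nx(contigs_kb, 0.25)` or `nx(contigs_kb, 0.75)` gives the corresponding N25 and N75 values for the same collection.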

[Scaffolds]
TotalScaffolds = 26 the total number of scaffolds in the assembly.
TotalContigsInScaffolds = 38 the total number of contigs that made it into scaffolds.
MeanContigsPerScaffold = 1.46 the average number of contigs in a scaffold.
MinContigsPerScaffold = 1 the minimum number of contigs in a scaffold.
MaxContigsPerScaffold = 4 the maximum number of contigs in a scaffold.
TotalBasesInScaffolds = 2365510 the sum of all contig sizes for the contigs in scaffolds.
MeanBasesInScaffolds = 90981 the average scaffold size. The size of a scaffold is the sum of all contigs contained in that scaffold.
MinBasesInScaffolds = 485 the minimum size of a scaffold.
MaxBasesInScaffolds = 745692 the maximum size of a scaffold.
N25ScaffoldBases = 745692 the length of the largest scaffold for which the following is true: the sum of its length and the lengths of all larger scaffolds is at least 25% of the total assembly length.
N50ScaffoldBases = 460070 the length of the largest scaffold for which the following is true: the sum of its length and the lengths of all larger scaffolds is at least 50% of the total assembly length.
N75ScaffoldBases = 190089 the length of the largest scaffold for which the following is true: the sum of its length and the lengths of all larger scaffolds is at least 75% of the total assembly length.
ScaffoldAt1000000 = 460070 summing the lengths of all scaffolds in decreasing order, this is the length of the scaffold at which the running total reaches 1,000,000 bp.
ScaffoldAt2000000 = 49941 summing the lengths of all scaffolds in decreasing order, this is the length of the scaffold at which the running total reaches 2,000,000 bp.
TotalSpanOfScaffolds = 2366881 the sum of all contig sizes and gaps in all scaffolds.
MeanSpanOfScaffolds = 91034 the average span of a scaffold.
MinScaffoldSpan = 485 the minimum span of a scaffold.
MaxScaffoldSpan = 745672 the maximum span of a scaffold.
IntraScaffoldGaps = 12 the number of sequencing gaps in all scaffolds.
2KbScaffolds = 20 the count of scaffolds whose span >= 2kbp.
2KbScaffoldSpan = 2361688 the cumulative span of scaffolds whose span >= 2kbp.
MeanSequenceGapLength = 114 the average length of a sequencing gap.
[Top5Scaffolds=contigs,size,span,avgContig,avgGap,EUID]
0 = 2,745692,745672,372846,-20,9604
1 = 4,460070,460310,115018,80,9602
2 = 2,221845,221825,110922,-20,9599
3 = 1,190326,190326,190326,0,9601
4 = 3,190089,190089,63363,0,9595
total = 12,1808022,1808222,150668,29
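The avgContig and avgGap columns of the Top5Scaffolds table can be reproduced from the contigs, size, and span columns. The sketch below assumes that avgContig is size divided by the number of contigs and that avgGap is (span - size) divided by the number of gaps; these formulas are inferred from the sample numbers, not taken from the assembler source, and the function name `derived_columns` is hypothetical.

```python
# (contigs, size, span) for the five scaffolds listed above
top5 = [
    (2, 745692, 745672),
    (4, 460070, 460310),
    (2, 221845, 221825),
    (1, 190326, 190326),
    (3, 190089, 190089),
]


def derived_columns(contigs, size, span, scaffolds=1):
    """Return (avgContig, avgGap) under the assumed formulas.

    The number of gaps is taken to be contigs minus scaffolds, so a
    single-contig scaffold has no gaps and an average gap of 0.
    """
    avg_contig = round(size / contigs)
    gaps = contigs - scaffolds
    avg_gap = round((span - size) / gaps) if gaps else 0
    return avg_contig, avg_gap


for row in top5:
    print(derived_columns(*row))

# The "total" row pools all five scaffolds: 12 contigs, hence 7 gaps.
print(derived_columns(12, 1808022, 1808222, scaffolds=5))
```

Python's `round` rounds halves to the nearest even integer, which happens to agree with the sample values (for example 221845 / 2 = 110922.5 is reported as 110922).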
[Contigs]
TotalContigsInScaffolds = 38 the total number of contigs that made it into scaffolds.
TotalBasesInScaffolds = 2365510 the sum of all contig sizes for the contigs in scaffolds.
TotalVarRecords = 4452 the total number of var records in the contigs. Each var record indicates a possible SNP or high quality difference between the underlying reads.
MeanContigLength = 62250 the average contig length.
MinContigLength = 485 the minimum contig length.
MaxContigLength = 744546 the maximum contig length.
N25ContigBases = 744546 the length of the largest contig for which the following is true: the sum of its length and the lengths of all larger contigs is at least 25% of the total contig length.
N50ContigBases = 190326 the length of the largest contig for which the following is true: the sum of its length and the lengths of all larger contigs is at least 50% of the total contig length.
N75ContigBases = 63054 the length of the largest contig for which the following is true: the sum of its length and the lengths of all larger contigs is at least 75% of the total contig length.
ContigAt1000000 = 274067 summing the lengths of all contigs in decreasing order, this is the length of the contig at which the running total reaches 1,000,000 bp.
ContigAt2000000 = 38509 summing the lengths of all contigs in decreasing order, this is the length of the contig at which the running total reaches 2,000,000 bp.
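The ScaffoldAt and ContigAt metrics use a fixed base-pair threshold instead of a fraction of the total. A minimal sketch follows; the helper name `length_at` is hypothetical, and whether the assembler treats the threshold as inclusive (`>=`) is an assumption. The example reuses the sizes of the five largest scaffolds from the Top5Scaffolds table, which is enough to reproduce ScaffoldAt1000000.

```python
def length_at(sizes, threshold):
    """Return the length of the entity at which the running total, taken in
    decreasing order of length, first reaches `threshold` bases.
    Returns None if the sizes never add up to the threshold."""
    running = 0
    for size in sorted(sizes, reverse=True):
        running += size
        if running >= threshold:
            return size
    return None


top5_scaffold_sizes = [745692, 460070, 221845, 190326, 190089]
print(length_at(top5_scaffold_sizes, 1_000_000))   # 460070
```

With only the top five scaffolds (1,808,022 bases) the 2,000,000 bp threshold is never reached, so reproducing ScaffoldAt2000000 would need the full list of scaffold sizes.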
[BigContigs_greater_10000]
TotalBigContigs = 28 the number of contigs bigger than 10kb.
BigContigLength = 2354038 the sum of the sizes of all contigs bigger than 10kb.
MeanBigContigLength = 84073 the average contig length in contigs over 10kb.
MinBigContig = 12338 the minimum contig size in contigs over 10kb.
MaxBigContig = 744546 the maximum contig size in contigs over 10kb. Should be the same as MaxContigLength.
BigContigsPercentBases = 99.52 the percentage of TotalBasesInScaffolds contained in contigs over 10kb.
[SmallContigs]
TotalSmallContigs = 10 the number of contigs smaller than 10kb.
SmallContigLength = 11472 the sum of the sizes of all contigs smaller than 10kb.
MeanSmallContigLength = 1147 the average length of contigs under 10kb.
MinSmallContig = 485 the minimum contig size in contigs under 10kb. Should be the same as MinContigLength.
MaxSmallContig = 2522 the maximum contig size in contigs under 10kb.
SmallContigsPercentBases = 0.48 the percentage of TotalBasesInScaffolds contained in contigs under 10kb.
[DegenContigs]
TotalDegenContigs = 3226 the number of degenerate contigs (contigs that do not appear in scaffolds).
DegenContigLength = 1461756 the sum of the sizes of all degenerate contigs.
MeanDegenContigLength = 453 the average length of degenerate contigs.
MinDegenContig = 68 the minimum size of a degenerate contig.
MaxDegenContig = 7612 the maximum size of a degenerate contig.
DegenPercentBases = 61.79 the ratio (as percentage points) between DegenContigLength and TotalBasesInScaffolds. Note that degenerate contigs are not counted as part of TotalBasesInScaffolds.
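The three percentage metrics in the contig sections all use TotalBasesInScaffolds as the denominator, which is why DegenPercentBases is a ratio rather than a share of the whole. The sketch below recomputes them from the sample values; the helper name `percent_of` is hypothetical.

```python
# Values copied from the sample report above
total_bases_in_scaffolds = 2365510
big_contig_length = 2354038
small_contig_length = 11472
degen_contig_length = 1461756


def percent_of(part, whole):
    """Express part/whole in percentage points, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


print(percent_of(big_contig_length, total_bases_in_scaffolds))     # 99.52
print(percent_of(small_contig_length, total_bases_in_scaffolds))   # 0.48
print(percent_of(degen_contig_length, total_bases_in_scaffolds))   # 61.79

# Big and small contigs together account for every base in scaffolds.
print(big_contig_length + small_contig_length == total_bases_in_scaffolds)  # True
```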
[Top5Contigs=reads,bases,EUID]
0 = 182445,744546,9578
1 = 63719,274067,9574
2 = 44864,190326,9572
3 = 33161,131399,9570
4 = 26026,118015,9573
total = 350215,1458353
[UniqueUnitigs]
TotalUUnitigs = 2100 total number of unitigs with A-stats higher than 5 (unique unitigs)
MinUUnitigLength = 64 the minimum unique unitig length
MaxUUnitigLength = 14743 the maximum unique unitig length
MeanUUnitigLength = 1400 the average unique unitig length
SDUUnitigLength = 2608 the standard deviation of unique unitig lengths
[Surrogates]
TotalSurrogates = 988 total number of surrogates in the assembly. A surrogate is a contig containing repetitive or ambiguous reads.
SurrogateInstances = 1182 number of instances in contigs where surrogate reads are placed. One surrogate may be placed in multiple locations.
SurrogateLength = 658224 sum of all surrogate contig lengths.
SurrogateInstanceLength = 841946 sum of all surrogate instance lengths.
UnPlacedSurrReadLen = 22302386 sum of all unplaced surrogate read lengths.
PlacedSurrReadLen = 408114 sum of all placed surrogate read lengths.
MinSurrogateLength = 112 size of smallest surrogate.
MaxSurrogateLength = 10121 size of largest surrogate.
MeanSurrogateLength = 666 mean size of a surrogate.
SDSurrogateLength = 1034 standard deviation of surrogate sizes assuming a normal distribution.
[Mates]
ReadsWithNoMate = 572634(88.78%) number of reads (out of TotalReads) that did not have a mate
ReadsWithGoodMate = 52160(8.09%) number of reads (out of TotalReads) that had a good mate
ReadsWithBadShortMate = 0(0.00%) number of reads (out of TotalReads) that had a mate too close together
ReadsWithBadLongMate = 214(0.03%) number of reads (out of TotalReads) that had a mate too far apart
ReadsWithSameOrientMate = 416(0.06%) number of reads (out of TotalReads) that had a mate oriented the same direction
ReadsWithOuttieMate = 220(0.03%) number of reads (out of TotalReads) that had a mate oriented away from each other
ReadsWithBothChaffMate = 32(0.00%) number of reads where both reads in a mate pair are chaff (singletons)
ReadsWithChaffMate = 906(0.14%) number of reads where the mate is chaff (singleton)
ReadsWithBothDegenMate = 2690(0.42%) number of reads where both reads in a mate are degenerates
ReadsWithDegenMate = 2122(0.33%) number of reads where its mate is a degenerate
ReadsWithBothSurrMate = 0(0.00%) number of reads where both reads are surrogates
ReadsWithSurrogateMate = 9436(1.46%) number of reads where the mate is a surrogate
ReadsWithDiffScafMate = 4206(0.65%) number of reads where the mate resides in a different scaffold
ReadsWithUnassignedMate = 0(0.00%) number of reads where the mate is unassigned
TotalScaffoldLinks = 0 number of links between scaffolds. These represent linking information currently conflicting with the existing scaffolds. The lower this number the better.
MeanScaffoldLinkWeight = 0.00 average weight (# of mate pairs) of links between scaffolds.
[Reads]
TotalReadsInput = NA the total number of reads supplied to the assembler. Paired end SFF files are NOT accurately accounted.
TotalUsableReads = 645036 the total number of reads included in the assembly.
AvgClearRange = 327 the average read clear range (i.e., the usable portion of each read, clear of vector and bad quality bases).
ContigReads = 545225(84.53%) the number of reads that belong to contigs.
BigContigReads = 543465(84.25%) number of reads that belong to contigs over 10kb in size.
SmallContigReads = 1760(0.27%) number of reads that belong to contigs under 10kb in size.
DegenContigReads = 25606(3.97%) number of reads in degenerate contigs.
SurrogateReads = 70715(10.96%) number of reads in surrogates - potentially repetitive or ambiguously placed contigs.
PlacedSurrogateReads = 3526(0.55%) number of placed reads in surrogates.
SingletonReads = 7016(1.09%) number of reads that are neither in contigs, nor surrogates, nor degenerate contigs.
ChaffReads = 7007(1.09%) number of reads that are neither in contigs, nor surrogates, nor degenerate contigs.
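The percentages in the [Reads] section are taken relative to TotalUsableReads. The sketch below recomputes them from the sample counts and checks two relationships between the counts. The second relationship, that the read categories add up to TotalUsableReads once PlacedSurrogateReads is subtracted, is an observation about this particular sample and not a documented rule.

```python
total_usable_reads = 645036
reads = {
    "ContigReads": 545225,
    "BigContigReads": 543465,
    "SmallContigReads": 1760,
    "DegenContigReads": 25606,
    "SurrogateReads": 70715,
    "PlacedSurrogateReads": 3526,
    "SingletonReads": 7016,
}

# Percentage of usable reads in each category, as printed in the report.
percent = {name: round(100.0 * n / total_usable_reads, 2) for name, n in reads.items()}
for name, n in reads.items():
    print(f"{name} = {n}({percent[name]:.2f}%)")

# Contig reads split into big and small contigs.
split_ok = reads["BigContigReads"] + reads["SmallContigReads"] == reads["ContigReads"]

# Observed in this sample: placed surrogate reads appear to be counted both
# as contig reads and as surrogate reads, so they are subtracted once.
accounted = (reads["ContigReads"] + reads["DegenContigReads"]
             + reads["SurrogateReads"] + reads["SingletonReads"]
             - reads["PlacedSurrogateReads"])
print(split_ok, accounted == total_usable_reads)
```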
[Coverage]
ContigsOnly = 75.28 coverage (redundancy) of all contigs in scaffolds: length of all the reads in contigs divided by the size of all scaffolds.
Contigs_Surrogates = 84.71 coverage of all contigs and surrogates: length of all the reads in contigs and surrogates divided by the size of all scaffolds.
Contigs_Degens_Surrogates = 54.57 coverage of all contigs, degenerates, and surrogates: length of all the reads in contigs, surrogates, and degenerates divided by the size of all scaffolds and degenerates.
AllReads = 89.14 coverage you paid for: length of all the reads divided by the size of the scaffolds.
[TotalBaseCounts]
BasesCount = NA Total count of all bases for all reads (includes vector and bad quality regions)
ClearRangeLengthFRG = NA Total clear range for all input reads (from frg file)
ClearRangeLengthASM = 210856220 Total clear range for all used reads (per asm file). This excludes reads trimmed by OBT
SurrogateBaseLength = 22710500 Total length of surrogate reads. (Same as UnPlacedSurrReadLen + PlacedSurrReadLen)
ContigBaseLength = 178069925 Total length of contig reads.
DegenBaseLength = 8471349 Total length of degenerate reads
SingletonBaseLength = 2012560 Total length of singleton reads
Contig_SurrBaseLength = 200372311 Total length of contig reads and unplaced surrogate reads. (Same as UnPlacedSurrReadLen + ContigBaseLength)
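The [Coverage] values can be reproduced from the base counts above. In this sample, "the size of all scaffolds" matches TotalBasesInScaffolds (not TotalSpanOfScaffolds), "the size of all scaffolds and degenerates" matches TotalBasesInScaffolds + DegenContigLength, and "all the reads" matches ClearRangeLengthASM. These pairings are inferred from the numbers, so treat the sketch as a consistency check, not as the assembler's definition.

```python
# Values copied from the [Scaffolds], [DegenContigs], [Surrogates]
# and [TotalBaseCounts] sections of the sample report.
total_bases_in_scaffolds = 2365510
degen_contig_length = 1461756
clear_range_length_asm = 210856220
contig_base_length = 178069925
degen_base_length = 8471349
unplaced_surr_read_len = 22302386
placed_surr_read_len = 408114

# The two "Same as ..." identities stated in the descriptions.
surrogate_base_length = unplaced_surr_read_len + placed_surr_read_len   # 22710500
contig_surr_base_length = unplaced_surr_read_len + contig_base_length   # 200372311

coverage = {
    "ContigsOnly": contig_base_length / total_bases_in_scaffolds,
    "Contigs_Surrogates": contig_surr_base_length / total_bases_in_scaffolds,
    "Contigs_Degens_Surrogates": (contig_surr_base_length + degen_base_length)
    / (total_bases_in_scaffolds + degen_contig_length),
    "AllReads": clear_range_length_asm / total_bases_in_scaffolds,
}

for name, value in coverage.items():
    print(f"{name} = {value:.2f}")
```

Run against the sample values, this prints 75.28, 84.71, 54.57, and 89.14, matching the report.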
[gcContent]
Content = 42.34 The percentage of GC content in all the scaffold contigs.
[Unitig Consensus]
NumColumnsInUnitigs = 21322792
NumGapsInUnitigs = 1256051
NumRunsOfGapsInUnitigReads = 95810082
[Contig Consensus]
NumColumnsInUnitigs = 4213072
NumGapsInUnitigs = 385852
NumRunsOfGapsInUnitigReads = 31174926
NumColumnsInContigs = 4184982
NumGapsInContigs = 357749
NumRunsOfGapsInContigReads = 27813096
NumAAMismatches = 6059
NumVARRecords = 4452
NumVARStringsWithFlankingGaps = 3538
[Read Depth Histogram]
d    < 3Kbp    < 10Kbp    < 1Mbp    < inf
0         0          0         0        0
1       477          0     55458        0
2       168          0     34337        0
(and so on)
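Because the report uses bracketed section names followed by `key = value` lines, it can be loaded into nested dictionaries with a small parser. The sketch below is one possible approach, assuming the layout shown on this page; `parse_qc` is a hypothetical helper, not a tool shipped with the assembler. Values are kept as strings because some carry a percentage suffix or a comma-separated list, and lines without `=` (such as the histogram rows) are skipped.

```python
import re

SECTION = re.compile(r"^\[(.+)\]$")          # e.g. [Scaffolds]
ENTRY = re.compile(r"^(\S+)\s*=\s*(\S+)")    # key = value, trailing text ignored


def parse_qc(text):
    """Return {section: {key: value}} for a QC report in the layout above."""
    report = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            report[section] = {}
            continue
        entry = ENTRY.match(line)
        if entry and section is not None:
            report[section][entry.group(1)] = entry.group(2)
    return report


sample = """\
[Scaffolds]
TotalScaffolds = 26 the total number of scaffolds in the assembly.
N50ScaffoldBases = 460070
[Mates]
ReadsWithNoMate = 572634(88.78%)
[Read Depth Histogram]
d < 3Kbp < 10Kbp < 1Mbp < inf
0 0 0 0 0
"""

report = parse_qc(sample)
print(report["Scaffolds"]["TotalScaffolds"])   # 26
print(report["Mates"]["ReadsWithNoMate"])      # 572634(88.78%)
```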