A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
This report has been generated by the Arcadia-Science/metagenomics analysis pipeline using the Nanopore workflow.
Report generated on 2023-05-19, 13:05 UTC based on data in: /tmp/nxf.tq8He2WwdV
General Statistics
Showing 18/18 rows and 7/9 columns.
| Sample Name | N50 (Kbp) | Assembly Length (Mbp) | Error rate | M Non-Primary | M Reads Mapped | % Mapped | M Total seqs |
|---|---|---|---|---|---|---|---|
| EL12weeks | | | 2.81% | 0.2 | 1.0 | 97.5% | 1.0 |
| EL12weeks_polished | 220.2Kbp | 67.3Mbp | | | | | |
| EL2weeks | | | 4.04% | 0.4 | 1.4 | 99.3% | 1.4 |
| EL2weeks_polished | 1026.8Kbp | 73.3Mbp | | | | | |
| EL4weeks | | | 2.46% | 0.4 | 1.2 | 99.4% | 1.2 |
| EL4weeks_polished | 832.1Kbp | 62.5Mbp | | | | | |
| OM2weeks | | | 2.69% | 0.3 | 1.8 | 99.5% | 1.8 |
| OM2weeks_polished | 266.9Kbp | 76.7Mbp | | | | | |
| OM4weeks | | | 2.74% | 0.3 | 1.2 | 98.4% | 1.2 |
| OM4weeks_polished | 210.7Kbp | 34.0Mbp | | | | | |
| OM8weeks | | | 2.61% | 0.2 | 1.2 | 98.5% | 1.2 |
| OM8weeks_polished | 272.7Kbp | 81.5Mbp | | | | | |
| WH1month | | | 4.51% | 0.4 | 1.6 | 98.5% | 1.6 |
| WH1month_polished | 91.6Kbp | 99.3Mbp | | | | | |
| WH2months | | | 3.73% | 0.5 | 1.7 | 98.4% | 1.7 |
| WH2months_polished | 73.2Kbp | 88.9Mbp | | | | | |
| WH4months | | | 3.11% | 0.2 | 1.0 | 98.6% | 1.0 |
| WH4months_polished | 195.8Kbp | 84.6Mbp | | | | | |
NanoStat
NanoStat reports various statistics from a long read sequencing dataset in fastq, bam or sequencing summary format. DOI: 10.1093/bioinformatics/bty149.
Fastq stats
NanoStat statistics from FastQ files.
| Sample Name | Median length | Read N50 | Median Qual | # Reads (K) | Total Bases (Mb) |
|---|---|---|---|---|---|
| EL12weeks_nanoplot_stats | 2028 bp | 6990 bp | 20.4 | 1010.1 | 3941.9 |
| EL2weeks_nanoplot_stats | 2030 bp | 3705 bp | 15.7 | 1365.8 | 4051.3 |
| EL4weeks_nanoplot_stats | 2020 bp | 6758 bp | 19.9 | 1183.7 | 4640.5 |
| OM2weeks_nanoplot_stats | 2957 bp | 8371 bp | 18.4 | 1812.1 | 9181.5 |
| OM4weeks_nanoplot_stats | 2002 bp | 4967 bp | 19.0 | 1203.6 | 4020.6 |
| OM8weeks_nanoplot_stats | 2594 bp | 8563 bp | 20.0 | 1247.8 | 5959.5 |
| WH1month_nanoplot_stats | 3107 bp | 6150 bp | 17.1 | 1605.9 | 7218.9 |
| WH2months_nanoplot_stats | 1958 bp | 3623 bp | 17.7 | 1741.3 | 5000.4 |
| WH4months_nanoplot_stats | 1996 bp | 5587 bp | 20.1 | 995.7 | 3499.4 |
Reads by quality
Read counts categorised by read quality (Phred score).
Sequencing machines assign each generated read a quality score using the Phred scale. The Phred score represents the likelihood that a given read contains errors, so high-quality reads have a high score.
Data may come from NanoPlot reports generated with sequencing summary files or alignment stats. If a sample has data from both, the sequencing summary is preferred.
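The Phred scale mentioned above relates a quality score Q to an estimated error probability P through Q = -10 * log10(P). The following is a minimal sketch of that conversion in Python; the function names are illustrative only and are not part of NanoStat, NanoPlot or MultiQC.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score into an estimated error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) into a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Q20 corresponds to roughly a 1% chance of error.
    print(phred_to_error_prob(20))
    # A 0.1% chance of error corresponds to roughly Q30.
    print(error_prob_to_phred(0.001))
```

Under this relationship, a difference of 10 in score corresponds to a tenfold change in the estimated error probability.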
QUAST
QUAST is a quality assessment tool for genome assemblies, written by the Center for Algorithmic Biotechnology. DOI: 10.1093/bioinformatics/btt086.
Assembly Statistics
| Sample Name | N50 (Kbp) | L50 (K) | Largest contig (Kbp) | Length (Mbp) |
|---|---|---|---|---|
| EL12weeks_polished | 220.2Kbp | 0.0K | 4096.3Kbp | 67.3Mbp |
| EL2weeks_polished | 1026.8Kbp | 0.0K | 3926.7Kbp | 73.3Mbp |
| EL4weeks_polished | 832.1Kbp | 0.0K | 3928.8Kbp | 62.5Mbp |
| OM2weeks_polished | 266.9Kbp | 0.1K | 3928.1Kbp | 76.7Mbp |
| OM4weeks_polished | 210.7Kbp | 0.0K | 3931.7Kbp | 34.0Mbp |
| OM8weeks_polished | 272.7Kbp | 0.0K | 4456.2Kbp | 81.5Mbp |
| WH1month_polished | 91.6Kbp | 0.1K | 3823.9Kbp | 99.3Mbp |
| WH2months_polished | 73.2Kbp | 0.2K | 1956.0Kbp | 88.9Mbp |
| WH4months_polished | 195.8Kbp | 0.1K | 3932.6Kbp | 84.6Mbp |
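N50 and L50 in the table above summarise assembly contiguity. With contigs sorted from longest to shortest, N50 is the length of the contig at which the cumulative length first reaches half of the total assembly length, and L50 is the number of contigs needed to get there. The sketch below illustrates that standard definition in Python; it is not QUAST's own code, and the function name `n50_l50` is hypothetical.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count

    # Unreachable for positive lengths; kept as a defensive guard.
    raise ValueError("contig lengths must be positive")


if __name__ == "__main__":
    # Toy contig lengths, not taken from this report.
    print(n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because the L50 column is displayed in thousands with one decimal place, an L50 below 50 contigs would presumably appear as 0.0K.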
Number of Contigs
This plot shows the number of contigs found for each assembly, broken down by length.
Samtools
Samtools is a suite of programs for interacting with high-throughput sequencing data. DOI: 10.1093/bioinformatics/btp352.
Percent Mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads.
For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
Alignment metrics
This module parses the output from samtools stats. All numbers in millions.
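As an illustration of how metrics such as the mapped percentage can be derived from `samtools stats` output, the sketch below reads the tab-separated summary lines that begin with `SN` and computes mapped reads as a share of raw total sequences. This is a simplified sketch and not the MultiQC module itself: the helper names are hypothetical, the example numbers are made up rather than taken from this run, and the choice of raw total sequences as the denominator is an assumption.

```python
from typing import Dict

# Made-up excerpt in the shape of the "SN" summary section of `samtools stats`.
EXAMPLE_STATS = (
    "# This file was produced by samtools stats\n"
    "SN\traw total sequences:\t1000\n"
    "SN\treads mapped:\t975\n"
    "SN\treads unmapped:\t25\n"
    "SN\tnon-primary alignments:\t200\n"
    "SN\terror rate:\t2.81e-02\t# mismatches / bases mapped (cigar)\n"
)


def parse_summary_numbers(text: str) -> Dict[str, float]:
    """Collect the SN (summary numbers) lines into a name -> value mapping."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue  # skip comments and other sections
        fields = line.split("\t")
        name = fields[1].rstrip(":")
        metrics[name] = float(fields[2])
    return metrics


def percent_mapped(metrics: Dict[str, float]) -> float:
    """Mapped reads as a percentage of raw total sequences (assumed denominator)."""
    total = metrics["raw total sequences"]
    if total == 0:
        return 0.0
    return 100.0 * metrics["reads mapped"] / total


if __name__ == "__main__":
    stats = parse_summary_numbers(EXAMPLE_STATS)
    print(f"{percent_mapped(stats):.1f}% mapped")
    print(f"{stats['reads mapped'] / 1e6:.4f} M reads mapped")
```

Dividing the raw counts by one million, as in the last line, mirrors the report's convention of showing these values in millions.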
Arcadia-Science/metagenomics Software Versions
Software versions are collected at run time from the software output.
| Process Name | Software | Version |
|---|---|---|
| CHECK_SAMPLESHEET | python | 3.9.5 |
| CUSTOM_DUMPSOFTWAREVERSIONS | python | 3.10.6 |
| | yaml | 6.0 |
| FLYE | flye | 2.9-b1768 |
| MEDAKA | medaka | 1.4.4 |
| METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS | metabat2 | 2.15 |
| MINIMAP2_ALIGN | minimap2 | 2.24-r1122 |
| MINIMAP2_INDEX | minimap2 | 2.24-r1122 |
| NANOPLOT | nanoplot | 1.41.0 |
| PORECHOP_ABI | porechop_abi | 0.5.0 |
| PRODIGAL | pigz | 2.6 |
| | prodigal | 2.6.3 |
| SAMTOOLS_STATS | samtools | 1.16.1 |
| SOURMASH_COMPARE | sourmash | 4.6.1 |
| SOURMASH_GATHER | sourmash | 4.6.1 |
| SOURMASH_SKETCH | sourmash | 4.6.1 |
| SOURMASH_TAXANNOTATE | sourmash | 4.6.1 |
| Workflow | Arcadia-Science/metagenomics | 1.0dev |
| | Nextflow | 23.04.1 |
Arcadia-Science/metagenomics Workflow Summary
This information is collected when the pipeline is started.