A sequence read set is designed to hold large sets of reads generated by next-generation sequencing (NGS). Base sequences and their associated quality scores are stored for both single-end and paired-end reads, which can originate from various high-throughput sequencing platforms such as Illumina, Ion Torrent, PacBio and Oxford Nanopore.
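As a rough illustration of the data a read set holds, the Python sketch below parses reads from the FASTQ text format. In that format each record carries an identifier, the base sequence and one quality character per base. This is not BioNumerics code. It assumes the common four-line record layout and Phred+33 quality encoding (the convention of current Illumina output), and it does not handle wrapped multi-line records.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: identifier, base sequence and per-base Phred quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-lines-per-record FASTQ stream.

    Assumption: Phred+33 encoding by default (offset=33); wrapped
    multi-line FASTQ records are not supported in this sketch.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes a Phred score as its ASCII code minus the offset.
        scores = [ord(char) - offset for char in quality]
        # Keep only the first whitespace-delimited token of the header as the read name.
        yield FastqRecord(header[1:].split()[0], sequence, scores)


example = "@read1 extra description\nACGT\n+\nII#5\n"
records = list(parse_fastq(io.StringIO(example)))
print(records[0].name, records[0].sequence, records[0].qualities)
# prints: read1 ACGT [40, 40, 2, 20]
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a score of 40 means roughly a 1 in 10,000 chance that the base is wrong.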
Sequence read sets
Import links to data stored at the SRA and ENA repositories
Import links to FASTQ files stored on a local hard drive or file server
Calculate a de novo assembly on the external calculation engine
Importing links to data stored at SRA, ENA, Amazon S3 and BaseSpace
This tutorial describes the steps to import links to data stored in the following online repositories: SRA, ENA, Amazon S3 and BaseSpace.
Importing FASTQ files and FASTQ file links
There are essentially two ways to import FASTQ files into your BioNumerics database: the default import method stores the sequence reads in the BioNumerics database itself, while the second method only imports links to the locations of the FASTQ files. This tutorial describes both options.
Hotfix for CE Store Uploader issue
Recently, when I post jobs on the Cloud Calculation Engine for which the fastq.gz files are stored on a local file server, the job statuses often remain "WaitingQueued" for more than 24 hours, after which the jobs fail. I haven't experienced this before. What could be the reason?
Hotfix for importing sequence read sets from NCBI
When importing a sequence read set as a link via its NCBI SRA accession number, I get a “file not found” (HTTP 404) error.
Performing a de novo assembly on the external calculation engine
This tutorial illustrates how to import FASTQ file links into a BioNumerics database and then how to perform a de novo assembly on the external calculation engine.
FASTQ files
This data set contains 10 gzipped FASTQ files (5 pairs of paired-end read files) from Staphylococcus aureus, and an Excel file containing metadata on the sequence read sets. The data were generated by Illumina MiSeq whole-genome sequencing and downloaded from NCBI.
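Before importing, it can help to check that the gzipped files form complete read pairs. The Python sketch below groups file names into mate pairs. It is not part of BioNumerics. The naming pattern it relies on is an assumption: `_1`/`_2` suffixes, as commonly produced when splitting SRA downloads, or Illumina-style `_R1`/`_R2` with an optional `_001` suffix. The example file names are hypothetical and do not come from this data set.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed naming convention (hypothetical): "<sample>_1.fastq.gz" / "<sample>_2.fastq.gz"
# or "<sample>_R1_001.fastq.gz" / "<sample>_R2_001.fastq.gz"; ".fq.gz" is also accepted.
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R?(?P<mate>[12])(?:_001)?\.(?:fastq|fq)\.gz$")


def pair_fastq_files(filenames: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Group file names into (mate 1, mate 2) tuples keyed by sample name."""
    mates: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"cannot determine mate number for {name!r}")
        entry = mates.setdefault(match["sample"], {})
        if match["mate"] in entry:
            raise ValueError(f"duplicate mate {match['mate']} for sample {match['sample']!r}")
        entry[match["mate"]] = name

    pairs: Dict[str, Tuple[str, str]] = {}
    for sample, files in sorted(mates.items()):
        # Every sample must have exactly one mate-1 and one mate-2 file.
        if set(files) != {"1", "2"}:
            raise ValueError(f"incomplete pair for sample {sample!r}: {sorted(files.values())}")
        pairs[sample] = (files["1"], files["2"])
    return pairs


# Hypothetical file names, deliberately listed out of order.
files = [
    "isolate1_2.fastq.gz",
    "isolate1_1.fastq.gz",
    "isolate2_R1_001.fastq.gz",
    "isolate2_R2_001.fastq.gz",
]
pairs = pair_fastq_files(files)
for sample, (mate1, mate2) in pairs.items():
    print(sample, mate1, mate2)
```

For a data set like the one described above, ten correctly named files should yield five pairs; an unmatched or duplicated file raises an error instead of being silently dropped.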