Several useful nanopore raw signal datasets have been uploaded publicly by others, either to SRA or ENA or as Amazon AWS public datasets. The basecalled data associated with those samples are sometimes out of date. For benchmarking purposes, we sometimes rebasecall such data, which takes considerable human effort as well as computational cost. Under this project, I will upload the rebasecalled data, so that anyone who needs up-to-date basecalls can save time.
As ONT's own basecaller, Guppy, crashes on some old FAST5 datasets, we had to curate those datasets by converting them to BLOW5 [https://www.nature.com/articles/s41587-021-01147-4]. Once converted, buttery-eel [https://github.com/Psy-Fer/buttery-eel], the BLOW5 wrapper for Guppy, could be used for efficient basecalling. Data in BLOW5 format are consistent across datasets, smaller than FAST5 and faster to access than FAST5. Under this project, I will also upload those curated raw datasets in BLOW5 format.
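The FAST5-to-BLOW5 conversion step could be scripted roughly as follows. This is only a minimal sketch: it assumes the conversion is done with slow5tools (its `f2s` and `merge` subcommands), which the description above does not state, and the function name, directory layout and process count are hypothetical. It only builds the command lines and does not run them; the buttery-eel basecalling step is left out because its options depend on the local Guppy installation.

```python
import shlex
from pathlib import Path


def blow5_conversion_commands(fast5_dir, work_dir, merged_name="reads.blow5", processes=8):
    """Build the slow5tools command lines for FAST5 -> one merged BLOW5 file.

    Assumption: slow5tools is the converter. The directory layout and the
    default names are hypothetical and chosen only for illustration.
    """
    work_dir = Path(work_dir)
    parts_dir = work_dir / "blow5_parts"  # one BLOW5 file per input FAST5 file
    merged = work_dir / merged_name       # single merged BLOW5 file

    # Step 1: convert every FAST5 file in fast5_dir into BLOW5 files in parts_dir.
    convert = ["slow5tools", "f2s", str(fast5_dir),
               "-d", str(parts_dir), "-p", str(processes)]
    # Step 2: merge the per-file BLOW5 outputs into a single BLOW5 file.
    merge = ["slow5tools", "merge", str(parts_dir),
             "-o", str(merged), "-t", str(processes)]
    return [convert, merge]


if __name__ == "__main__":
    # Print the commands so they can be inspected before being run by hand.
    for cmd in blow5_conversion_commands("fast5", "out"):
        print(shlex.join(cmd))
```

Returning argument lists rather than a shell string keeps paths with spaces safe if the commands are later passed to `subprocess.run`.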
Note that the original data uploaded under this project were generated by others. I will link to the original sources in the description of each run, so please make sure to acknowledge those who originally generated these data.