GEO Accession viewer

Sample GSM7703591

Status

Public on Aug 14, 2023

Title

CH03 ONT-junctions

Sample type

SRA

Source name

PBMC

Organism

Homo sapiens

Characteristics

tissue: PBMC
cell type: CD34+
disease: Clonal Hematopoiesis

Extracted molecule

polyA RNA

Extraction protocol

Cryopreserved mononuclear cells isolated from bone marrow biopsies or peripheral blood of myelodysplastic syndrome patients with SF3B1 mutations were retrieved from Memorial Sloan Kettering and the University of Manchester. Additionally, cryopreserved G-CSF mobilized stem cell grafts (without additional mobilizing agents such as plerixafor or cyclophosphamide) from CH patients with SF3B1 mutations were retrieved from the Dana Farber Cancer Institute and the Weizmann Institute of Science. To confirm the absence of additional genetic mutations, CH samples were sequenced using a previously described panel (ref. 135) that includes myeloma driver mutations as well as CH-specific mutations. All samples underwent ultra-low pass whole genome sequencing, which ruled out tumor contamination. Cryopreserved mononuclear cells and grafts were thawed and stained using standard procedures. Cells were first incubated with Human FcX blocking solution (Biolegend, #422302) and then incubated with the surface antibody CD34-PE-Vio770 (clone AC136, lot #5180718070, dilution 1:50, Miltenyi Biotec) and DAPI (Sigma-Aldrich) for 10 minutes at 4°C. Cells were then sorted for DAPI-negative, CD34+ cells using BD Influx at the Weill Cornell Medicine flow cytometry core.
The standard 10x Genomics Chromium 3’ (v.3.1 chemistry) and CITE-seq protocols (refs. 35, 36) were carried out according to the manufacturer’s recommendations for the generation of scRNA-seq and ADT libraries (Fig. 1A). At the cDNA amplification step in the 10x Genomics protocol, 1 μL of 1 μM spike-in primer (5’ – GATCCTCGTCCTCATTGAACCGC – 3’) was added to increase the yield of SF3B1 cDNA and 1 μL of 0.2 μM ADT PCR additive primer (5’ – CCTTGGCACCCGAGAATTCC – 3’) was added to amplify ADT. After cDNA amplification and a double-sided cleanup with SPRI beads to separate cDNA and ADT fractions, the ADT fraction was amplified for 10 cycles with SI-PCR oligo (10x Genomics) and TruSeq Small RNA RPI-x (Illumina) primers to index the samples. SPRI was used to clean up the final ADT products.
In samples both with and without CITE-seq, cDNA was allocated for gene expression library creation (standard 10x protocol; 25% of cDNA), targeted genotyping (10% of cDNA), and ONT on-bead library preparation (10 ng of cDNA). Any remaining cDNA was stored.
For locus-specific amplification (GoT), two serial PCRs were performed with nested reverse primers chosen according to the SF3B1 mutation of interest. For mutations upstream of K700E, 5’ – GATCCTCGTGGTCATTGAACCGC – 3’ and 5’ – CACCCGAGAATTCCAGGCTACTATGATCTCTACCATGAGACCTG – 3’ were used as reverse primers; for K700E mutations, 5’ – GTGCAAAAGCAAGAAGTCCT – 3’ and 5’ – CACCCGAGAATTCCATGAACATGGTCTTGTGGATGAGC – 3’ were used. Together with the generic forward SI-PCR oligo, these reverse primers amplify the site of interest from the cDNA template (10 PCR cycles each). The second locus-specific reverse primers contain a partial Illumina TruSeq Small RNA read 2 handle and a locus-specific region to allow SF3B1-specific priming. The SI-PCR oligo (10x Genomics) anneals to the partial Illumina TruSeq read 1 sequence, preserving the cell barcode (CB) and unique molecular identifier (UMI).
After these rounds of amplification and SPRI purification to remove unincorporated primers, a third PCR was performed with a generic forward PCR primer (P5_generic, 5’ – AATGATACGGCGACCACCGAGATCTACAC – 3’) to retain the CB and UMI, together with an RPI-x primer (Illumina) to complete the P7 end of the library and add a sample index (6 PCR cycles).
Gene expression, ADT, and SF3B1 amplicon libraries were pooled to receive 25,000, 5,000, and 5,000 reads per cell, respectively, during Illumina sequencing. The cycle settings were as follows: 28 cycles for read 1, 90 cycles for read 2, 10 cycles for i7, and 10 cycles for i5 sample index.
To examine splicing patterns broadly across the whole transcriptome, full length cDNA was sequenced using the Oxford Nanopore Technologies sequencing platform on PromethION flow cells. To enrich for transcripts that contain CBs and UMIs and to reduce PCR artifacts, on-bead PCR was performed with a biotinylated primer selecting for an adapter upstream of the CB. In brief, 10 ng of full length cDNA was amplified with LongAmp master mix (NEB) and TSO (5’ – NNNAAGCAGTGGTATCAACGCAGAG – 3’) and biotinylated read 1 (5’ – /5Biosg/AAAAACTACACGACGCTCTTCCGATCT – 3’) primers for 5 cycles. After cleanup with SPRI, 1X and 5X SSPE-washed M270 streptavidin beads (ThermoFisher) were added to the PCR product. After a 15-minute incubation and washes with 1X SSPE and 10 mM Tris-HCl (pH 8), the bead-bound PCR products and beads were resuspended in PCR master mix for further amplification. Samples were amplified with LongAmp master mix and TSO and read 1 (5’ – NNNCTACACGACGCTCTTCCGATCT – 3’) primers for 5 cycles. After cleanup with SPRI, 1000 ng of each full length cDNA library was sequenced on one PromethION flow cell.
Single-cell RNA-seq (10X and long-read ONT) with targeted amplification of a mutation of interest. Two runs of the experiment, A and B, were completed for sample MDS02.

Library strategy

RNA-Seq

Library source

transcriptomic single cell

Library selection

cDNA

Instrument model

PromethION

Description

ONT

Data processing

10X Illumina data were processed using Cell Ranger (v.3.1.0) with default parameters, and reads were aligned to the human reference sequence. CITE-seq samples were processed in a manner similar to that described in the ADT manual from 10x Genomics.
Genotyping of single cells was carried out with the IronThrone (v.2.1) pipeline. In brief, individual amplicon reads were assessed for the appropriate structure (i.e. presence of the primer sequence and the expected sequence between the primer and the given mutation site), and all reads were checked for a cell barcode matching the list generated from the paired 10X GEX dataset. A Levenshtein distance of 0.1 was allowed for all sequence matching and collapsing steps, and only UMIs with a minimum of 2 supporting reads were retained for final genotyping. Following UMI collapse, genotype assignment of individual UMIs was conducted as described previously, by majority rule of supporting reads for wildtype or mutant status (a UMI was called definitively only when the majority genotype exceeded a PCR read ratio of 0.7). Rare UMIs that did not pass this threshold were removed as ambiguous. Additionally, to remove reads that result from PCR recombination, UMIs in the amplicon library that matched UMIs of non-SF3B1 genes in the gene expression library were discarded. Each single cell was assigned as either mutant or wildtype as follows: cells with at least 1 mutant UMI were assigned as mutant, and cells with 0 mutant UMIs and at least 1 wildtype UMI were assigned as wildtype.
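The UMI-level and cell-level rules described above can be illustrated with a minimal Python sketch. This is not the IronThrone implementation: the function names and the input layout (one `(cell_barcode, umi, call)` tuple per read) are hypothetical, the 0.7 ratio is assumed to be applied as a strict "greater than", and the Levenshtein matching and PCR-recombination filtering steps are left out.

```python
from collections import Counter, defaultdict

MIN_READS_PER_UMI = 2     # from the text: at least 2 supporting reads per UMI
MIN_MAJORITY_RATIO = 0.7  # from the text: PCR read ratio threshold


def call_umi(read_calls):
    """Return 'MUT' or 'WT' for one UMI, or None if the UMI is dropped."""
    if len(read_calls) < MIN_READS_PER_UMI:
        return None  # too few supporting reads
    genotype, n_majority = Counter(read_calls).most_common(1)[0]
    # Assumption: the threshold is a strict inequality.
    if n_majority / len(read_calls) > MIN_MAJORITY_RATIO:
        return genotype
    return None  # ambiguous UMI


def assign_cells(reads):
    """reads: iterable of (cell_barcode, umi, call) with call in {'WT', 'MUT'}."""
    calls_per_umi = defaultdict(list)
    for cell_barcode, umi, call in reads:
        calls_per_umi[(cell_barcode, umi)].append(call)

    umi_counts = defaultdict(Counter)
    for (cell_barcode, _umi), calls in calls_per_umi.items():
        genotype = call_umi(calls)
        if genotype is not None:
            umi_counts[cell_barcode][genotype] += 1

    # At least 1 mutant UMI -> mutant; otherwise (only wildtype UMIs) -> wildtype.
    return {
        cell_barcode: "MUT" if counts["MUT"] >= 1 else "WT"
        for cell_barcode, counts in umi_counts.items()
    }
```

Cells whose UMIs are all dropped do not appear in the returned dictionary, which corresponds to leaving them unassigned.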
scRNA-seq ONT long-read sequencing data processing, alignment, junction calling and annotation
Guppy (v3.0.6 to v4.0.11) was used for basecalling output from ONT sequencing to create FAST5 files that were then converted to FASTQ files. After generating FASTQ files from the long-read ONT sequencing, we filtered for reads containing a polyA tail within 100 base pairs of either the 5’ or 3’ end using the `NanoporeReadScanner-0.5.jar` within the SiCeLoRe-1.0 workflow. Filtered reads were aligned to the primary human genome assembly GRCh38.p12 using minimap2 (v.2.17). Minimap2 was used with the `-ax splice` flag to prioritize annotated splice junctions. Additionally, we used the `--junc-bed` option to increase alignment scores for splice junctions found in the reference junction BED file. For our reference junctions, we used splice junctions from single-cell SMART-seq2 data from human CD34+ cells obtained from a CH sample with no SF3B1 mutation. We also used `--secondary=no` to suppress multi-mappings.
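As an illustration of the alignment step, the sketch below assembles a minimap2 command line from the three options named in the text (`-ax splice`, `--junc-bed`, `--secondary=no`). The file names, the helper function, and the thread count are placeholders, not values from this record; minimap2 writes SAM to standard output, so a real run would redirect or pipe the result.

```python
import shlex


def minimap2_command(reference_fasta, reads_fastq, junction_bed, threads=4):
    """Build an argument list for a spliced long-read alignment with minimap2."""
    return [
        "minimap2",
        "-ax", "splice",             # spliced alignment preset, SAM output
        "--junc-bed", junction_bed,  # favor junctions listed in this BED file
        "--secondary=no",            # do not report secondary alignments
        "-t", str(threads),          # hypothetical thread count
        reference_fasta,
        reads_fastq,
    ]


# Hypothetical file names, for illustration only.
cmd = minimap2_command(
    "GRCh38.p12.fa", "polyA_filtered_reads.fastq", "reference_junctions.bed"
)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, stdout=...)` when minimap2 is installed.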
To identify the cell barcodes and UMIs present in the long-read sequencing, we used the `IlluminaParser-1.0.jar` in SiCeLoRe to parse the cell barcodes and UMIs present in the complementary short-read sequencing library. We continued to use SiCeLoRe to tag the aligned BAM files with cell barcodes and UMIs identified in the short-read library, and to generate consensus sequences for each unique cell barcode and UMI combination. Consensus sequences were used to create a gene by cell count matrix.
Intron-junction calling was then performed on the consensus sequence BAM files, using an approach adapted from the method used in the LeafCutter pipeline for short-read RNA-seq data. In brief, the intron-junction calling pipeline uses the pysam.fetch() function to iterate through each transcript in the BAM file, noting its cell barcode (CB) tag as well as the coordinates of each intron junction for that transcript. While iterating through the BAM file, counts for the usage of each unique intron junction and the corresponding CB are recorded. This ultimately generates an Intron-Junction x Cell Barcode count matrix for the given BAM file. Each intron junction is then classified, using annotations available in the GENCODE GRCh38.p12 v31 basic annotation reference file, as canonical 3’, canonical 5’, alternative 3’, or alternative 5’. This outputs a metadata file with annotations for each junction in the Intron-Junction x Cell Barcode count matrix.
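The junction-counting idea can be sketched in plain Python without reading a BAM file. This is an illustrative sketch, not the authors' pipeline: it assumes each alignment has already been reduced to a `(chrom, ref_start, cigartuples, cell_barcode)` tuple, with CIGAR operations encoded by the integer codes of the SAM specification (the same encoding pysam uses for `cigartuples`). The function names and input layout are hypothetical.

```python
from collections import Counter

# CIGAR operation codes from the SAM specification.
REF_SKIP = 3                          # N: skipped reference region (intron)
CONSUMES_REFERENCE = {0, 2, 3, 7, 8}  # M, D, N, =, X


def introns_from_cigar(ref_start, cigartuples):
    """Yield 0-based, half-open (start, end) intron intervals for one alignment."""
    pos = ref_start
    for op, length in cigartuples:
        if op == REF_SKIP:
            yield (pos, pos + length)
        if op in CONSUMES_REFERENCE:
            pos += length  # insertions and clips do not advance the reference


def count_junctions(alignments):
    """Count junction usage per cell barcode.

    alignments: iterable of (chrom, ref_start, cigartuples, cell_barcode).
    Returns a Counter keyed by ((chrom, start, end), cell_barcode).
    """
    counts = Counter()
    for chrom, ref_start, cigartuples, cell_barcode in alignments:
        for start, end in introns_from_cigar(ref_start, cigartuples):
            counts[((chrom, start, end), cell_barcode)] += 1
    return counts
```

With pysam, the same tuples could presumably be built from each fetched read's `reference_name`, `reference_start`, `cigartuples`, and `get_tag("CB")`. The resulting counter is a sparse form of the Intron-Junction x Cell Barcode matrix.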
Assembly: hg38
Supplementary files format and content: Processed 10X data is a matrix table of gene counts per cell for each sample.
Supplementary files format and content: Output matrices processed from amplicon data have five columns: number of wildtype calls (WT.calls), number of mutant calls (MUT.calls), genotype (Genotype), cell assignments (Cell.Assignment). Row names are the corresponding barcodes.
Supplementary files format and content: Junction files contain a junction (rows) per cell barcode (columns) matrix of read counts. Exon count files have a similar structure.
Supplementary files format and content: ADT Cell Ranger output (barcodes, matrix, features)

Submission date

Aug 10, 2023

Last update date

Aug 14, 2023

Contact name

Dan A. Landau

E-mail(s)

dlandau@nygenome.org

Organization name

New York Genome Center

Department

Molecular Pharmacology Program

Street address

101 6th Ave

City

New York

State/province

ZIP/Postal code

10013

Country

USA

Platform ID

GPL26167

Series (1)

GSE204845

Single-cell multi-omics defines the cell-type specific impact of splicing aberrations in human hematopoietic clonal outgrowths

Supplementary file	Size	Download	File type/resource
GSM7703591_CH03.jx.counts.tsv.gz	23.9 Mb	(ftp)(http)	TSV
Processed data provided as supplementary file
Raw data not provided for this record