Sample GSM6725061
Status Public on Dec 11, 2022
Title sc_rep_mEB_series_oBC_CS1_repB_lane2_reseq
Sample type SRA
 
Source name WD44 (mouse embryonic stem cells)
Organism Mus musculus
Characteristics tissue: N/A
cell line: WD44 (mouse embryonic stem cells)
cell type: Mouse embryoid bodies
genotype: monoclonal CRISPRi (dCAS9-KRAB) cell line, high MOI (polyclonal) single-cell reporter cassette (89% devCRE library + 10% promoter series + 1% EEF1A1-mCherry) delivered by piggyBac
time: Day 21
Growth protocol Following integration of reporters via piggyBac (cells growing for >14 days for plasmid dilution), exponentially growing mESCs are lifted from the plate (aspirate medium, wash with PBS, add 2.5 mL [for a 10 cm plate] 0.05% trypsin, incubate 2 minutes at 37C, deactivate trypsin and triturate to a single-cell suspension with 10 mL pre-warmed SL medium). Cells are then counted and spun down (5 min at 300 g). Supernatant is aspirated and cells are resuspended to 2 M/mL in CA medium (medium for EB induction: DMEM, 10% FBS, 1x MEM non-essential amino acids, 1x Glutamax, 10^-5 beta-mercaptoethanol). Cells are counted again, and density adjusted to 1 M/mL with CA medium. 3 mL (3 M cells) are added to 12 mL of CA medium in 10 cm plates (non-gelatinized, non-adherent). On the next day, plates are gently agitated to promote cell aggregation.
Following induction, embryoid bodies (mEBs) are passaged every two days (no daily medium change). mEBs are collected using a serological pipette and transferred to a 50 mL conical tube (typically three plates are pooled). Leftover mEBs on plates are recovered by a CA medium wash and pooled in the conical tube. mEBs are left to settle (initially up to 15-20 min, faster as the mEBs grow in size). Once mEBs have settled, medium is aspirated from the top, carefully avoiding disturbing the loose pellet. Fresh, pre-warmed CA medium is then added to 15 mL/plate and mEBs are redistributed to plates.
Extracted molecule total RNA
Extraction protocol mEBs were processed at the three-week end point as follows (for each replicate): 2 plates of mEBs were pooled into a 50 mL conical and left to settle. Medium was aspirated and mEBs were washed twice with PBS, resuspended in 3 mL PBS in the second wash, and split into two 1.5 mL aliquots in 2 mL tubes. PBS was aspirated from the tubes, and 500 uL of 0.25% trypsin was added per tube. Tubes were then mixed on a thermomixer at 37C and 650 rpm for 4 minutes. Cells were then gently dissociated by pipetting up and down 10 times, and placed back on the thermomixer for 2 min. 1 mL of SL medium was then added per sample and pipetted to obtain a single-cell suspension; the two samples were combined in a 15 mL conical and passed through a 100 um strainer. The strained single-cell suspension was counted, and cells were spun down (300 g, 5 min), resuspended to 4 M/mL, and taken to FACS. >600k cells were then FACS sorted (in <50 min) in pre-warmed SL medium to ensure the single-cell nature of the suspension (no gating on fluorescent proteins) prior to generating the emulsions for single-cell RNA-seq. Sorted cells were then spun down at 400 g at 4C for 5 min, the medium gently aspirated, and the cells resuspended to an expected 2.5 M cells/mL (based on FACS sort event counts) in ice-cold PBS + 0.04% BSA. Cells were counted and the volume adjusted to 1200 cells/uL with ice-cold PBS+BSA.
Single-cell suspensions in PBS+BSA were taken as the starting point for the 10x Genomics protocol (v3.1 with feature barcoding). Emulsion and reverse transcription were performed per the manufacturer’s instructions. Given prior empirical experience with mEB processing, each 10x lane was slightly overloaded (by an additional 20%) to approach the expected recovery of 10k cells/lane. Each replicate was profiled with 2 lanes of 10x, for a total of 6 lanes.
For single-cell reporters, three libraries are generated: the standard 3’ gene expression library from 10x (GEx), and two custom derived libraries, one for each reporter RNA (oBC and mBC), obtained by nested PCR from the amplified cDNA. Briefly, single-cell library preparation followed the manufacturer's protocol (v3.1 manual CG000205 Rev D, 10x Genomics), with some modifications. For cDNA amplification, primers specific to the mBC (oSR38) and oBC (o246) reporter transcripts were spiked into the reaction (similar to TAP-seq) at a final concentration of 0.5 uM to boost capture. Following cDNA amplification, both the bead- and supernatant-derived material (steps 2.3Ax and 2.3Bxiv respectively) were saved for downstream processing.
Gene expression libraries for all replicates were prepared following the manufacturer’s protocol from 25% of the bead fraction amplified cDNA.
oBC enriched libraries were prepared as follows. 25% of the supernatant fraction from the amplified cDNA was taken as input for a semi-nested inner PCR performed with Kapa Robust (50 uL 2x master mix, supernatant cDNA, 5 μL 10 μM NextP5_index1 primer, 5 μL 10 μM indexed primers [o501-o506, one per sample], 0.5 uL SYBR Green, and water to 100 μL; run parameters: 3 min at 95C, then cycles of 20 s at 95C, 20 s at 60C, 20 s at 72C), tracked by qPCR and stopped before the inflection point. Libraries were purified with 1.5x AMPure. To avoid loop-the-loop products in the oBC libraries, the lowest band in the circularized ladder amplicons was size selected on PAGE (6% TBE, 180V, 30 min) for each library and used for sequencing.
Only the poly-dT captured libraries were generated for the mBC for the mouse embryoid body experiment. 25% of the bead fraction of the purified amplified cDNA was used as template for PCR with Kapa Robust (50 uL 2x master mix, supernatant cDNA, 5 μL 10 μM o324 primer, 5 μL 10 μM o529, 0.5 uL SYBR Green, and water to 100 μL; run parameters: 3 min at 95C, then cycles of 20 s at 95C, 20 s at 65C, 50 s at 72C), tracked by qPCR and purified with 1x AMPure. A final PCR (same conditions as above) was performed to index amplicons with primers o076 and P7-indexed primers (o530-o533), and the resulting amplicons were purified with 1x AMPure.
Read structures:
RepA/B GEx: read 1, cell-barcode+UMI (28 cycles); index 1, library index (10 cycles); read 2, transcriptome (54 cycles)
RepA/B oBC: read 1, cell-barcode+UMI (28 cycles, no custom primers); index 1, library index (10 cycles, primer o432); read 2, oBC (54 cycles, primer o433)
RepA/B oBC (reseq): read 1, cell-barcode+UMI (28 cycles, primer SR40); index 1, library index (10 cycles, primer o432); read 2, oBC (20 cycles, primer o433)
RepA/B mBC: read 1, cell-barcode+UMI (28 cycles, no custom primers); index 1, library index (10 cycles, primer o534); read 2, mBC (54 cycles, primer o334)
Rep2B GEx: read 1, cell-barcode+UMI (28 cycles); index 1, library index (8 cycles); read 2, transcriptome (56 cycles)
Rep2B oBC: read 1, cell-barcode+UMI (28 cycles, no custom primers); index 1, library index (8 cycles, primer o432); read 2, oBC (56 cycles, primer o433)
Rep2B mBC: read 1, cell-barcode+UMI (28 cycles, no custom primers); index 1, library index (8 cycles, primer o534); read 2, mBC (56 cycles, primer o334)
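As an illustration of the read 1 layout listed above, the following minimal Python sketch splits a read 1 sequence into cell barcode and UMI. The 16 nt + 12 nt split is the standard 10x Genomics v3.1 layout and is not stated in this record; the function name `split_read1` is hypothetical.

```python
# Sketch only: split a 28-cycle read 1 into cell barcode and UMI.
# The 16 + 12 nt split is the usual 10x v3.1 layout (an assumption here,
# since the record only gives the 28-cycle total).
CELL_BARCODE_LEN = 16
UMI_LEN = 12


def split_read1(read1):
    """Trim read 1 to 28 nt and return (cell_barcode, umi)."""
    expected = CELL_BARCODE_LEN + UMI_LEN
    if len(read1) < expected:
        raise ValueError(f"read 1 shorter than {expected} nt")
    trimmed = read1[:expected]  # extra cycles beyond 28 are dropped
    return trimmed[:CELL_BARCODE_LEN], trimmed[CELL_BARCODE_LEN:]
```

Reads longer than 28 nt are trimmed rather than rejected, mirroring the read 1 trimming mentioned in the data processing notes.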
scRNA-seq (with custom reporter libraries)
 
Library strategy RNA-Seq
Library source transcriptomic single cell
Library selection cDNA
Instrument model NextSeq 2000
 
Description 10X Genomics (custom oBC library)
oBC_counts_sc_rep_mEB_series.txt
assigned_oBC_CRE_mBC_joined_counts_sc_rep_mEB_series.txt
Data processing GEx libraries: Fastq files were generated using the mkfastq command from cellranger (v6.0.1), and the gene expression count matrices were then generated with the cellranger count command, with transcriptome reference mm10-3.0.0. Raw count matrices were then imported as a Seurat object (filtering out genes expressed in fewer than 3 cells, and cell barcodes with fewer than 50 genes measured). Cell barcodes in the high total UMI mode with low mitochondrial RNA proportion were retained as likely bona fide cells (fraction of mitochondrial UMI >1% and <15%, total gene expression UMI >400 for samples from replicates A, B, and 2B lane1, and >1000 for 2B lane2, which was fortuitously sequenced more deeply). The filtered count matrices were then used to evaluate doublet scores using scrublet (scrub_doublets command, 30 principal components, mean_center=True, normalize_variance=True), and cell barcodes with doublet score > 0.3 (separating the two modes of the simulated doublet distribution from scrublet) were filtered out. Datasets from all replicates were then combined in a single Seurat object, dimensionally reduced and clustered (NormalizeData, normalization.method= “LogNormalize”, scale.factor=10000; FindVariableFeatures with selection.method = “vst”, nfeatures=1000; ScaleData with all genes as features; RunPCA with identified variable features and 100 principal components; FindNeighbors, dims=1:50; FindClusters, resolution=0.2; RunUMAP, dims=1:50, n.neighbors=50) without batch correction given the good correspondence between replicates (Fig. S8B). The cluster identities were taken as categories for cell-type expression testing (see integration section below).
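The per-barcode thresholds quoted above can be summarized in a short Python sketch. This is not the submitters' code (the record describes a Seurat and scrublet workflow); the function name `passes_gex_qc`, the argument layout, and the lane labels used as dictionary keys are hypothetical, while the numeric thresholds are those stated in the text.

```python
# Illustrative sketch of the per-barcode GEx QC thresholds quoted in the text.
# Names and input layout are hypothetical; only the thresholds come from the record.

MIN_UMI_DEFAULT = 400               # replicates A, B, and 2B lane 1
MIN_UMI_BY_LANE = {"2B.2": 1000}    # 2B lane 2 was sequenced more deeply


def passes_gex_qc(lane, total_umi, mito_umi, doublet_score):
    """Return True if a cell barcode passes the stated UMI, mito, and doublet cuts."""
    if total_umi <= 0:
        return False
    mito_fraction = mito_umi / total_umi
    min_umi = MIN_UMI_BY_LANE.get(lane, MIN_UMI_DEFAULT)
    return (
        total_umi > min_umi
        and 0.01 < mito_fraction < 0.15   # mitochondrial UMI fraction >1% and <15%
        and doublet_score <= 0.3          # scores above 0.3 are filtered out
    )
```

For example, `passes_gex_qc("A.1", 500, 25, 0.1)` passes all three cuts, whereas the same barcode in lane `"2B.2"` fails the higher UMI threshold.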
The following additional quality filtering steps were applied to retain high-confidence singlet cells. Clusters comprising less than 1% of cells were considered likely doublets/artifacts, and the corresponding cells were removed. Cells belonging to each identified cluster were separately sub-clustered with the same procedure as above (except resolution 0.5 in FindClusters). Any sub-cluster with a median doublet score above 0.15 was deemed composed of likely doublets, and the corresponding cells were removed. Cells with anomalously high gene expression UMI counts were removed (with 10x lane-specific thresholds: >10k for A.1, >9k A.2, >12k B.1, >9k B.2, >15k 2B.2, no such cells in 2B.1). Finally, cells with an estimated MOI > 200 (roughly corresponding to the top 0.1% of the distribution, MOI estimated through oBC UMI > 10, see below) were filtered out. In the end, n=43799 cells passed all these quality filters (12859 replicate A, 15422 replicate B, 15518 replicate 2B).
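The cluster-size and sub-cluster doublet filters described above could be sketched as follows. This is a minimal illustration under assumed inputs (plain dictionaries mapping cell barcodes to cluster labels and doublet scores); the function names are hypothetical and the actual analysis was performed on Seurat objects.

```python
from collections import Counter, defaultdict
from statistics import median


def drop_small_clusters(cluster_of, min_fraction=0.01):
    """Keep cells whose cluster contains at least min_fraction of all cells."""
    sizes = Counter(cluster_of.values())
    total = len(cluster_of)
    return {cell: k for cell, k in cluster_of.items()
            if sizes[k] / total >= min_fraction}


def drop_doublet_subclusters(subcluster_of, doublet_score, max_median=0.15):
    """Return the cells whose sub-cluster has a median doublet score <= max_median."""
    scores = defaultdict(list)
    for cell, sub in subcluster_of.items():
        scores[sub].append(doublet_score[cell])
    # Sub-clusters with a median score above the cutoff are treated as doublets.
    flagged = {sub for sub, vals in scores.items() if median(vals) > max_median}
    return {cell for cell, sub in subcluster_of.items() if sub not in flagged}
```

Both functions drop whole groups of cells rather than individual barcodes, which matches the cluster-level reasoning in the text.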
mBC libraries: Data were converted to fastq using bcl2fastq, and fastqs were minimally processed (e.g., trimming read 1 to 28 cycles) to be compatible with cellranger (version 6.0.1, 10x Genomics), which was run to perform error correction on cell barcodes. The resulting position-sorted bam files were then parsed for the mBC reads as follows using a custom python script. Reads aligning to the reference genome or lacking either a corrected cell barcode or a corrected UMI (tags CB and UB in the bam file) were discarded. Only reads with the exact expected 7 nt sequence (TCGACAA) downstream of the mBC (positions 16 to 22) were retained. Lists of all UMIs corresponding to each cell barcode and mBC pair were stored, discarding chimeric UMIs (taken to be UMIs for which the proportion of reads associated with a given mBC vs all other mBCs in the specified cell barcode falls below 0.2). mBCs consisting entirely of Gs (empty reads) were discarded. Finally, the UMI count was error corrected as follows. For each given mBC and cell barcode, the Hamming distance between all pairs of UMIs was calculated, a graph was created by connecting UMIs at a Hamming distance ≤ 1, and the number of connected components in the resulting graph was taken as the error-corrected UMI count for that cell barcode-mBC pair. These error-corrected UMI counts were used for the per-cell quantification of reporter mRNA expression.
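The constant-sequence check and the graph-based UMI correction described above can be sketched in Python as follows. This is not the custom script referred to in the record; the function names are hypothetical, positions are assumed to be 1-based and inclusive, the barcode is assumed to occupy all bases before the constant sequence, and UMIs are assumed to be of equal length.

```python
from itertools import combinations


def extract_barcode(read2, const_start, const_end, expected):
    """Return the bases preceding the constant sequence, or None if it is absent.

    const_start and const_end are 1-based, inclusive positions (an assumption).
    """
    if read2[const_start - 1:const_end] != expected:
        return None
    return read2[:const_start - 1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_collapsed_umis(umis):
    """Count connected components among UMIs linked at Hamming distance <= 1."""
    unique = sorted(set(umis))
    parent = {u: u for u in unique}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in combinations(unique, 2):
        if hamming(a, b) <= 1:
            parent[find(a)] = find(b)

    return len({find(u) for u in unique})
```

For the mBC libraries the call would be `extract_barcode(read2, 16, 22, "TCGACAA")`. Note that connected components are transitive: `AAAA`, `AAAT`, and `AATT` collapse into one UMI even though the first and last differ at two positions.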
oBC libraries: processed in an entirely analogous way to the mBC libraries, with the following modifications: two sequencing runs were combined in a single fastq prior to processing, read 2 was trimmed to 23 cycles, and only reads with GCTTTAA (the constant region after the oBC) at positions 17 to 23 were retained. The number of UMIs per oBC per cell barcode was also taken as the error-corrected (Hamming distance 1) count and as our measure of oBC expression in single cells (see below for a normalization strategy to correct for gene expression UMIs). Given that the cell barcodes derived from the capture sequence vs. the poly-dT reverse transcription primer differ on the same bead (bases 8 and 9 are reverse complemented), and that the capture sequence barcodes were not error corrected by cellranger in our application, we converted the CS1 cell barcode to its poly-dT counterpart to enable matching across the different libraries.
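The CS1 to poly-dT cell barcode conversion described above can be illustrated with a few lines of Python. This is a sketch under the assumption that "bases 8 and 9" are 1-based positions within the cell barcode; the function name `cs1_to_polydt` is hypothetical.

```python
# Sketch: convert a capture-sequence (CS1) cell barcode to its poly-dT
# counterpart by reverse complementing bases 8 and 9 (assumed 1-based).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cs1_to_polydt(barcode):
    """Return the barcode with bases 8 and 9 replaced by their reverse complement."""
    if len(barcode) < 9:
        raise ValueError("barcode must be at least 9 nt long")
    segment = barcode[7:9]                           # bases 8 and 9
    flipped = segment.translate(_COMPLEMENT)[::-1]   # complement, then reverse
    return barcode[:7] + flipped + barcode[9:]
```

Because reverse complementation is its own inverse, applying the function twice returns the original barcode, so the same helper converts in either direction.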
Only cell barcodes passing the QC filters from the GEx analysis were retained in the final count tables. Only mBC and oBC present in the subassembly library were retained.
Assembly: mm10
Supplementary files format and content: The four processed files are:
Supplementary files format and content: Seurat processed object containing transcriptome quantification and cell annotation metadata (combined for all replicates).
Supplementary files format and content: Raw oBC count table (per cellbarcode, restricted to oBC from the list determined in the subassembly and UMI counts > 1). Column 1: cell barcode; column 2: replicate ID; column 3: oBC; column 4: read counts
Supplementary files format and content: Raw mBC count table (per cell barcode, restricted to mBC from the list determined in the subassembly). Column 1: cell barcode; column 2: replicate ID; column 3: mBC; column 4: read counts; column 5: UMI counts; column 6: capture modality.
Supplementary files format and content: Final assigned joined cell-oBC-CRE_mBC table. Restricting to oBC with >10 UMI counts, and to uniquely matchable oBC-CRE-mBC triplets. column 1: cell barcode; column 2: replicate ID; column 3: oBC; column 4: mBC; column 5: CRE class (promoters [exogenous series] or devCRE); column 6: CRE identity; column 7: read counts oBC; column 8: UMI counts oBC; column 9: read counts mBC; column 10: UMI counts mBC
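As a reading aid for the joined table described above, here is a minimal Python sketch that parses its ten columns. The record does not state the delimiter or whether a header row is present, so a tab-delimited, headerless layout is assumed; the column names and the function name `read_joined_table` are hypothetical labels derived from the column descriptions.

```python
import csv

# Hypothetical column labels following the ten columns described in the record.
JOINED_COLUMNS = [
    "cell_barcode", "replicate", "oBC", "mBC", "CRE_class", "CRE_id",
    "oBC_reads", "oBC_umis", "mBC_reads", "mBC_umis",
]
_INT_COLUMNS = ("oBC_reads", "oBC_umis", "mBC_reads", "mBC_umis")


def read_joined_table(lines, delimiter="\t"):
    """Parse headerless, delimited rows into dicts with integer count columns."""
    rows = []
    for fields in csv.reader(lines, delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(JOINED_COLUMNS):
            raise ValueError(f"expected {len(JOINED_COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(JOINED_COLUMNS, fields))
        for name in _INT_COLUMNS:
            row[name] = int(row[name])
        rows.append(row)
    return rows
```

The function accepts any iterable of text lines, for example an open file handle on `assigned_oBC_CRE_mBC_joined_counts_sc_rep_mEB_series.txt`.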
 
Submission date Nov 10, 2022
Last update date Dec 11, 2022
Contact name Jean-Benoit Lalanne
E-mail(s) lalannej@uw.edu
Organization name University of Washington
Department Genome Sciences
Lab Jay Shendure
Street address 3720 15th Ave NE
City Seattle
State/province WA
ZIP/Postal code 98195
Country USA
 
Platform ID GPL30172
Series (2)
GSE217686 Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters [scQer_devCRE_mEBs]
GSE217690 Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters
Relations
BioSample SAMN31678180
SRA SRX18229573

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
