GEO Accession viewer

Sample GSM4292447

Status

Public on Jun 01, 2020

Title

rLP5_TSS-Scramble_DNA2_2

Sample type

SRA

Source name

Escherichia coli

Organism

Escherichia coli str. K-12 substr. MG1655

Characteristics

strain/background: MG1655

Growth protocol

Luria-Bertani (LB) Rich Media

Extracted molecule

genomic DNA

Extraction protocol

Qiagen Puregene Yeast/Bact. Kit 2

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

Illumina NextSeq 500

Description

Sequencing DNA barcode counts. Genomic DNA extracted from an E. coli population with single barcoded promoter variants integrated into the nth-ydgr intergenic region. This is the 2nd technical replicate of the 2nd biological replicate.
processed data file: endo_scramble_expression_formatted_std.txt

Data processing

Counts for each unique barcode in the census files were computed in Unix as follows: raw sequences were extracted from each fastq file, and the first 20 bp of each read (corresponding to the barcode) were extracted. This sequence was reverse complemented, and the entire file was sorted before counting the occurrences of each barcode. Counts were normalized as a proportion of total reads per sample, and all samples were aggregated in R.
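The counting step above can be sketched in Python rather than the Unix tools the submitters used. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a standard four-line-per-record fastq, and reads shorter than the barcode length are skipped, which the record does not specify.

```python
from collections import Counter
from typing import Dict, Iterable

# Translation table for complementing DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_barcodes(fastq_lines: Iterable[str], barcode_len: int = 20) -> Counter:
    """Count reverse-complemented barcodes taken from the start of each read.

    Assumes four lines per fastq record, with the sequence on the second line.
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # only the sequence line of each record
            continue
        seq = line.strip().upper()
        if len(seq) < barcode_len:  # assumption: skip reads that are too short
            continue
        counts[reverse_complement(seq[:barcode_len])] += 1
    return counts


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each barcode count as a proportion of all counted reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: n / total for barcode, n in counts.items()}
```

With a real file, `count_barcodes(open(path))` would stream the records line by line; `path` here stands for any of the sample's fastq files.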
Barcode mapping was completed as follows: demultiplexed reads were paired using Paired-End reAd mergeR (PEAR v0.9.1, default settings). Custom Python code was used to identify reads corresponding to perfectly synthesized promoters and their respective barcodes. Briefly, this code searched the first 150 bp of each read for perfect matches to library variants. For reads with perfect matches, the last 20 bp of each read (the barcode) was extracted, and a list was compiled mapping each barcode to its most frequently associated library variant. A single barcode appears many times in the sequencing data, and we took steps to ensure that each barcode consistently mapped to the same variant. We required that all variants mapped to a single barcode be within an edit distance (Levenshtein distance) of 5 of one another (five single-bp changes between the two sequences). We determined this number by bootstrapping a distribution of the edit distance between any two random sequences in our variant library and setting the threshold to the first percentile (1%) of this bootstrapped distribution. Additionally, each barcode had to appear at least three times to be considered for downstream analysis. This step is intended to eliminate barcodes that contain sequencing errors.
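The mapping rules above can be illustrated with a small Python sketch. It is not the submitters' custom code, which is not part of this record. The function names and parameters are hypothetical, the variant is assumed to occupy exactly the first `variant_len` bases of the merged read, and "within an edit distance of 5" is read as "at most 5".

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def map_barcodes(reads: Iterable[str], library: Iterable[str],
                 variant_len: int = 150, barcode_len: int = 20,
                 max_dist: int = 5, min_reads: int = 3) -> Dict[str, str]:
    """Map each barcode to its most frequently associated library variant."""
    library = set(library)
    seen = defaultdict(Counter)  # barcode -> Counter of perfectly matched variants
    for read in reads:
        if len(read) < variant_len + barcode_len:
            continue
        variant = read[:variant_len]
        if variant in library:  # perfect matches only
            seen[read[-barcode_len:]][variant] += 1

    mapping = {}
    for barcode, variants in seen.items():
        # Require a minimum number of supporting reads per barcode.
        if sum(variants.values()) < min_reads:
            continue
        # Require all variants seen with this barcode to be mutually close.
        if any(levenshtein(u, v) > max_dist for u, v in combinations(variants, 2)):
            continue
        mapping[barcode] = variants.most_common(1)[0][0]
    return mapping
```

The pairwise check is quadratic in the number of distinct variants per barcode, which stays cheap as long as each barcode is seen with only a few variants.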
Promoter variant expression was calculated in R by assigning barcodes to their mapped promoter, and for each promoter, dividing the sum of all RNA counts for all of its barcodes by the sum of all DNA counts for all of its barcodes.
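The expression calculation described above was done in R; the following is a minimal Python sketch of the same ratio, given only for illustration. The function name and the dictionary inputs are hypothetical, and promoters whose barcodes have no DNA counts are omitted here to avoid division by zero, a choice the record does not describe.

```python
from collections import defaultdict
from typing import Dict


def promoter_expression(barcode_to_promoter: Dict[str, str],
                        rna_counts: Dict[str, float],
                        dna_counts: Dict[str, float]) -> Dict[str, float]:
    """Expression per promoter: summed RNA counts / summed DNA counts."""
    rna_sum = defaultdict(float)
    dna_sum = defaultdict(float)
    for barcode, promoter in barcode_to_promoter.items():
        # Barcodes missing from a count table contribute zero.
        rna_sum[promoter] += rna_counts.get(barcode, 0.0)
        dna_sum[promoter] += dna_counts.get(barcode, 0.0)
    # Assumption: skip promoters with zero total DNA counts.
    return {p: rna_sum[p] / dna_sum[p] for p in dna_sum if dna_sum[p] > 0}
```

Summing across barcodes before dividing weights each barcode by its abundance, which differs from averaging per-barcode ratios.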
Genome_build: U00096.2
Supplementary_files_format_and_content: *txt: Tab-delimited text files

Submission date

Jan 31, 2020

Last update date

Jun 01, 2020

Contact name

Guillaume Urtecho

E-mail(s)

gurtecho@g.ucla.edu

Organization name

University of California, Los Angeles

Department

Molecular Biology