|
Status |
Public on Jun 01, 2020 |
Title |
rLP5_frag_DNA2_2 |
Sample type |
SRA |
|
|
Source name |
Escherichia coli
|
Organism |
Escherichia coli str. K-12 substr. MG1655 |
Characteristics |
strain/background: MG1655
|
Growth protocol |
Luria-Bertani (LB) Rich Media
|
Extracted molecule |
genomic DNA |
Extraction protocol |
Qiagen Puregene Yeast/Bact. Kit 2
|
|
|
Library strategy |
OTHER |
Library source |
genomic |
Library selection |
other |
Instrument model |
Illumina NextSeq 500 |
|
|
Description |
DNA barcode sequencing counts. Genomic DNA was extracted from an E. coli population carrying single barcoded promoter variants integrated into the nth-ydgR intergenic region. This is the 2nd technical replicate of the 2nd biological replicate.
Processed data file: U00096.2_frag-rLP5_LB_expression.txt
|
Data processing |
Counts for each unique barcode in the census files were computed in Unix as follows: Raw sequences were extracted from each fastq file and the first 20 bp of each read (corresponding to the barcode) was extracted. This sequence was reverse complemented, and the entire file was sorted before counting the occurrences of each barcode. Counts were normalized as a proportion of total reads per sample, and all samples were aggregated together in R.
Barcode mapping was completed as follows: Demultiplexed reads were paired using Paired-End reAd mergeR (PEAR v0.9.1, default settings). Custom Python code was used to identify reads corresponding to perfectly synthesized promoters and their respective barcodes. Briefly, this code searched the first 150 bp of each read for perfect matches to library variants. For reads with perfect matches, the last 20 bp of each read (the barcode) was extracted, and a list was compiled mapping each barcode to its most frequently associated library variant. A single barcode appears many times in the sequencing data, so we took steps to ensure that each barcode consistently mapped to the same variant. We required that all variants mapped to a single barcode be within an edit distance (Levenshtein distance) of 5 from one another (five single bp changes between the two sequences). We determined this number by bootstrapping a distribution of the edit distance between any two random sequences in our variant library and setting the threshold to the first percentile (1%) of this bootstrapped distribution. Additionally, each barcode had to appear at least three times to be considered for downstream analysis. This step is intended to eliminate barcodes that contain sequencing errors.
Promoter variant expression was calculated in R by assigning barcodes to their mapped promoter and, for each promoter, dividing the sum of all RNA counts for all of its barcodes by the sum of all DNA counts for all of its barcodes.
Genome_build: U00096.2
Supplementary_files_format_and_content: *txt: Tab-delimited text files
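The processing steps described above can be illustrated with a minimal Python sketch. It is not the submitters' code (the record states that counting was done in Unix, mapping in custom Python, and expression in R, none of which is reproduced here). All function names and parameters are hypothetical, variants are represented directly by their sequences, and "within an edit distance of 5" is assumed to mean a distance of at most 5.

```python
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_barcodes(reads, barcode_len=20):
    """Count barcodes: first `barcode_len` bases of each read, reverse complemented."""
    return Counter(
        reverse_complement(read[:barcode_len])
        for read in reads
        if len(read) >= barcode_len
    )


def normalize(counts):
    """Express each barcode count as a proportion of the sample's total reads."""
    total = sum(counts.values())
    return {bc: n / total for bc, n in counts.items()} if total else {}


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def map_barcodes(pairs, max_dist=5, min_reads=3):
    """Map each barcode to its most frequent variant.

    `pairs` is an iterable of (barcode, variant_sequence) taken from reads
    with a perfect variant match. A barcode is kept only if it was seen at
    least `min_reads` times and all of its variants lie within `max_dist`
    edits of one another (assumption: inclusive threshold).
    """
    seen = defaultdict(Counter)
    for barcode, variant in pairs:
        seen[barcode][variant] += 1
    mapping = {}
    for barcode, variants in seen.items():
        if sum(variants.values()) < min_reads:
            continue
        seqs = list(variants)
        consistent = all(
            levenshtein(x, y) <= max_dist
            for i, x in enumerate(seqs)
            for y in seqs[i + 1:]
        )
        if consistent:
            mapping[barcode] = variants.most_common(1)[0][0]
    return mapping


def promoter_expression(rna, dna, mapping):
    """Per promoter: sum of RNA counts over its barcodes / sum of DNA counts."""
    rna_sum, dna_sum = defaultdict(float), defaultdict(float)
    for barcode, promoter in mapping.items():
        rna_sum[promoter] += rna.get(barcode, 0.0)
        dna_sum[promoter] += dna.get(barcode, 0.0)
    return {p: rna_sum[p] / dna_sum[p] for p in dna_sum if dna_sum[p] > 0}
```

In this sketch, promoters whose barcodes have no DNA counts are omitted from the result to avoid division by zero; how the original analysis handled that case is not stated in the record.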
|
|
|
Submission date |
Jan 31, 2020 |
Last update date |
Jun 01, 2020 |
Contact name |
Guillaume Urtecho |
E-mail(s) |
gurtecho@g.ucla.edu
|
Organization name |
University of California, Los Angeles
|
Department |
Molecular Biology
|
Lab |
Kosuri Lab
|
Street address |
607 Charles E. Young Drive
|
City |
Los Angeles |
State/province |
CA |
ZIP/Postal code |
90095 |
Country |
USA |
|
|
Platform ID |
GPL21117 |
Series (1) |
GSE144621 |
Genome-wide Functional Characterization of Escherichia coli Promoters and Sequence Elements Encoding Their Regulation |
|
Relations |
BioSample |
SAMN13957756 |
SRA |
SRX7656731 |