GEO Accession viewer

Sample GSM1227210

Status

Public on Dec 16, 2013

Title

SU-DHL4_ChIPSeq

Sample type

SRA

Source name

B cells

Organism

Homo sapiens

Characteristics

disease state: diffuse large B cell lymphoma (DLBCL)
dlbcl subtype: GCB
cell line: SU-DHL4
chip antibody: STAT3 (Santa Cruz, sc-482X)

Treatment protocol

Cells were suspended in their growing media at 1e6 per mL and crosslinked with formaldehyde at a final concentration of 1% for 10 minutes at room temperature, followed by quenching with glycine in PBS at a final concentration of 125 mM for 5 minutes.

Growth protocol

Cell lines OCI-Ly7, OCI-Ly19, SU-DHL2, SU-DHL4, SU-DHL6, SU-DHL10, and U-2932 were cultured in suspension in RPMI 1640, supplemented with 15% heat-inactivated FBS. Cell lines OCI-Ly3 and OCI-Ly10 were cultured in suspension in IMDM, supplemented with 15% heat-inactivated FBS and 55 uM beta-mercaptoethanol.

Extracted molecule

genomic DNA

Extraction protocol

Crosslinked cell pellets were dounce homogenized to collect nuclei. Nuclear lysate was sonicated, then immunoprecipitated. Antibody-protein-DNA complexes were collected using Protein A agarose beads.
Libraries were prepared according to Illumina's instructions using non-Illumina enzymes and kits. Briefly, DNA was end-repaired using a combination of T4 DNA polymerase, E. coli DNA Pol I large fragment (Klenow polymerase) and T4 polynucleotide kinase (End-It Repair Kit, Epicenter-Illumina). The blunt ends were treated with Klenow fragment (3' to 5' exo minus) and dATP to yield a protruding 3' 'A' base for ligation of Illumina's adapters. DNA was then gel purified and size selected for 150-300 bp fragments to exclude unligated adapters, PCR amplified with Illumina primers using Phusion high-fidelity DNA polymerase for 15 cycles, and size selected again from an agarose gel for 150-300 bp fragments. The purified DNA was captured on an Illumina flow cell for cluster generation. Libraries were sequenced on the Genome Analyzer IIx following the manufacturer's protocols.

Library strategy

ChIP-Seq

Library source

genomic

Library selection

ChIP

Instrument model

Illumina Genome Analyzer IIx

Description

Sample 7

Data processing

Base calls performed using Illumina ELAND
ChIP-Seq reads were aligned to hg19 using Bowtie v0.12.5, parameters "-p 8 -S -q -v 2 -m 1 --best --strata encodeHg19Male" (or encodeHg19Female).
The SPP peak caller v1.10.1 was used with a relaxed peak calling threshold (FDR = 0.9) to obtain a large number of peaks (maximum of 300,000) that span true signal as well as noise (false identifications).
Used the Irreproducible Discovery Rate (IDR) framework in order to identify high confidence and reliable regions of enrichments (peaks) in the ChIP-seq datasets. Specifically, we followed the ENCODE uniform processing pipeline as outlined at http://anshul.kundaje.net/projects/idr. The number of peaks with IDR scores better than 0.02 (2%) was used as the cross-replicate peak rank threshold.
Overlapping or abutting high confidence STAT3 peaks were merged into broader “binding regions” to facilitate comparison between cell lines (output file: ChIPSeq_STAT3BindingRegions.txt). The ChIP-Seq data for each line was then rescored to determine how many fragments mapped to each binding region. Binding regions that occurred in only one cell line were eliminated from further analysis.
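The merging step described above can be sketched as follows. This is a minimal illustration, not the submitters' custom script: the function name, the input layout (a dict of cell line name to peak tuples), and the use of half-open coordinates are assumptions made for the example.

```python
from collections import defaultdict


def merge_binding_regions(peaks_by_line):
    """Merge overlapping or abutting peaks across cell lines.

    peaks_by_line: dict mapping a cell line name to a list of
    (chrom, start, end) tuples in half-open coordinates (assumed).
    Returns (chrom, start, end, num_cell_lines) tuples, keeping only
    regions supported by more than one cell line.
    """
    by_chrom = defaultdict(list)
    for line, peaks in peaks_by_line.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, line))

    regions = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_lines = set()
        for start, end, line in sorted(by_chrom[chrom]):
            # start < cur_end means overlap; start == cur_end means abutting
            if cur_start is not None and start <= cur_end:
                cur_end = max(cur_end, end)
                cur_lines.add(line)
            else:
                if cur_start is not None:
                    regions.append((chrom, cur_start, cur_end, len(cur_lines)))
                cur_start, cur_end, cur_lines = start, end, {line}
        if cur_start is not None:
            regions.append((chrom, cur_start, cur_end, len(cur_lines)))

    # Discard regions that occur in only one cell line
    return [r for r in regions if r[3] > 1]
```

Sorting by start coordinate within each chromosome lets a single pass grow the current region until a gap appears, so the cell-line count is accumulated at the same time as the merge.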
Replicates were normalized using DESeq: The effective library size of each sample was estimated based on the pooled count data. First, a reference sample was defined in which the reference count of each binding region is its geometric mean over all samples. Second, for each sample, a vector was calculated as the ratios of the read counts over the reference counts for all the binding regions. Third, the median of these ratios across all the binding regions was defined as the “size factor” of each sample. Lastly, each sample was normalized by dividing the real counts by its size factor.
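The four normalization steps above (geometric-mean reference, per-region ratios, median ratio as size factor, division by size factor) can be written out in a few lines. This is a plain-Python sketch of the procedure as described, not a call into the DESeq package; the function names are hypothetical, and skipping regions that contain a zero count is an assumption made so that the geometric mean stays positive.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of rows, one per binding region; each row holds
    the raw count of that region in every sample."""
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            # Assumption: regions with a zero count are skipped, because
            # their geometric mean (the reference count) would be zero.
            continue
        # Reference count = geometric mean of the region over all samples
        ref = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / ref)
    # Size factor of a sample = median of its ratios across regions
    return [median(r) for r in ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]
```

For example, if one sample has exactly twice the counts of another in every region, its size factor comes out twice as large and the normalized counts of the two samples coincide.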
Applied the negative binomial model in DESeq to assess the significance levels of differential STAT3 binding. The dispersion parameter in the model is estimated from the data by examining the relationship between the mean and variance of read counts across all the BRs. To contrast two conditions, we used the parameterized negative binomial model for each binding region and obtained the p-values. To correct for multiple comparisons, we adjusted the p-values with the Benjamini-Hochberg procedure, which controls the false discovery rate. We compared 24 GCB replicates versus 11 ABC replicates. Cell line OCI-Ly19 was excluded from the analysis: RNA-Seq data showed that its gene expression clustered in between the subtypes, probably due to its EBV+ status.
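The Benjamini-Hochberg adjustment mentioned above can be illustrated with a short standalone function. This is a generic sketch of the procedure, not the DESeq code path; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in rank (each is capped by the one above it)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by m / rank, and the running minimum keeps a higher-ranked test from receiving a smaller adjusted value than a lower-ranked one.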
STAT3 binding regions (BRs) were associated with genes that they might regulate via the Genomic Regions Enrichment of Annotations Tool v2.0.2 using its default settings (http://great.stanford.edu). To determine whether a BR and a given gene are linked, GREAT first determines a putative regulatory domain for every gene. This consists of a basal regulatory domain (BRD) from 5 kb upstream to 1 kb downstream of the gene’s transcription start site, plus an extended domain calculated by elongating the BRD both upstream and downstream for 1000 kb or until reaching another gene’s BRD, whichever occurs first. Once this set of extended regulatory domains is established, GREAT associates the list of ChIP-Seq STAT3 binding regions with all of the genes whose regulatory domains they overlap.
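The association rule described above can be approximated with the sketch below. It is a simplified illustration for genes on a single chromosome, not GREAT's own code: the function names and the input layout are hypothetical, and domains are assumed to be measured from each gene's transcription start site (TSS) in a strand-aware way.

```python
def regulatory_domains(genes, upstream=5_000, downstream=1_000,
                       max_ext=1_000_000):
    """genes: list of (name, tss, strand) tuples on one chromosome.
    Returns {name: (start, end)} for the extended regulatory domains."""
    basal = {}
    for name, tss, strand in genes:
        if strand == "+":
            basal[name] = (max(0, tss - upstream), tss + downstream)
        else:
            # On the minus strand, "upstream" lies at higher coordinates
            basal[name] = (max(0, tss - downstream), tss + upstream)

    domains = {}
    for name, tss, _ in genes:
        b_start, b_end = basal[name]
        # Basal-domain boundaries of genes on either side of this TSS
        left_limits = [basal[o][1] for o, o_tss, _ in genes if o_tss < tss]
        right_limits = [basal[o][0] for o, o_tss, _ in genes if o_tss > tss]
        start = max([b_start - max_ext, 0] + left_limits)
        end = min([b_end + max_ext] + right_limits)
        # Never shrink below the basal domain when basal domains overlap
        domains[name] = (min(start, b_start), max(end, b_end))
    return domains


def associate(regions, domains):
    """Map each (start, end) region to every gene whose domain it overlaps."""
    return {
        region: sorted(
            gene for gene, (d_start, d_end) in domains.items()
            if region[0] < d_end and region[1] > d_start
        )
        for region in regions
    }
```

Because each gene's extension runs up to the neighbouring gene's basal domain, the extended domains of adjacent genes overlap across the whole intergenic interval, so a region lying between two genes is reported for both.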
Genome_build: hg19
Supplementary_files_format_and_content: *ChIPSeq_STAT3_IDRpeaks.bed: BED files were generated using the IDR pipeline (http://anshul.kundaje.net/projects/idr). Files are in the UCSC-supported ENCODE narrowPeak format (http://genome.ucsc.edu/FAQ/FAQformat.html#format12). NOTE: BED column 4 reports ranks, as the IDR analysis pipeline uses rank as an important measurement of peak significance.
Supplementary_files_format_and_content: ChIPSeq_STAT3BindingRegions.txt: File was generated by a custom script. "IDRpeaks.bed" files for all 9 cell lines were concatenated. Any overlapping or abutting regions were merged together to form one larger "binding region" (BR). Regions that occurred in only one cell line were discarded. Headers are "chr/start/end/numCellLines", where "numCellLines" is "number of cell lines in which this peak is found". Linked as supplementary file on Series record.
Supplementary_files_format_and_content: ChIPSeq_ReadCountsByReplicate.txt: Abundance measurements. The total fragment count for 10,337 STAT3 binding regions in each of the 35 replicates, after normalization. Linked as supplementary file on Series record.
Supplementary_files_format_and_content: ChIPSeq_StatisticalResults.txt: Mean fragment count for 10,337 STAT3 binding regions in each subtype (ABC, GCB) after normalization; calculated fold change; p-value and FDR for differential STAT3 binding between the two subtypes. Linked as supplementary file on Series record.
Supplementary_files_format_and_content: ChIPSeq_AssociatedGenes.txt: Location-based association between 10,337 STAT3 binding regions and annotated RefSeq genes (analyzed via GREAT). In many cases, STAT3 BRs fall in regions where the regulatory domains of two genes overlap; consequently, GREAT associates these BRs with both genes. For purposes of downstream comparison, we treated these multiple associations as separate table entries, to allow the greatest sensitivity in detecting associations with gene expression data. Linked as supplementary file on Series record.
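
A small reader for the IDR peak files can make the narrowPeak layout concrete. The sketch below assumes the standard ten tab-separated narrowPeak columns; the class and function names are hypothetical, and the sample line in the usage note is toy data, not taken from the deposited file.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str          # column 4; reported as a rank in these files
    score: int
    strand: str
    signal_value: float
    p_value: float
    q_value: float
    summit_offset: int  # peak summit, relative to start


def parse_narrowpeak(lines):
    """Parse an iterable of narrowPeak text lines into NarrowPeak records."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/annotation lines
        f = line.rstrip("\n").split("\t")
        peaks.append(NarrowPeak(
            f[0], int(f[1]), int(f[2]), f[3], int(float(f[4])), f[5],
            float(f[6]), float(f[7]), float(f[8]), int(f[9]),
        ))
    return peaks
```

Usage on a toy line: `parse_narrowpeak(["chr1\t1000\t1300\t1\t1000\t.\t25.3\t-1\t2.1\t150\n"])` yields one record whose `name` field holds the rank. A gzipped file can be passed in as `gzip.open(path, "rt")`, where `path` is whatever local filename the download was saved under.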

Submission date

Sep 10, 2013

Last update date

May 15, 2019

Contact name

Jennifer Marion Hardee

E-mail(s)

jenn.hardee@gmail.com

Organization name

Stanford University

Street address

300 Pasteur Dr., M-344

City

Stanford

State/province

California

ZIP/Postal code

94305

Country

USA

Platform ID

GPL10999

Series (2)

GSE50723	Whole genome mapping of STAT3 binding sites in the two major subtypes of diffuse large B cell lymphoma
GSE50724	Correlation between STAT3 binding presence and gene expression levels in subtypes of diffuse large B cell lymphoma

Relations

BioSample

SAMN02351679

SRA

SRX347427

Supplementary file	Size	Download	File type/resource
GSM1227210_SU-DHL4_ChIPSeq_STAT3_IDRpeaks.bed.gz	145.9 Kb	(ftp)(http)	BED
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record