Series GSE123917
Status Public on Mar 28, 2020
Title Construction of breast cancer cell line BT20-specific interactome
Sample organism Homo sapiens
Experiment type Expression profiling by array
Expression profiling by high throughput sequencing
Third-party reanalysis
Summary Experiment type: Expression profiling by microarray and RNA-seq data, Third-party re-analysis

The purpose of our study is to construct a breast cancer cell line BT20-specific interactome by aggregating publicly available microarray and RNA-seq samples, and to identify novel transcriptional regulators that control breast cancer progression.

We collected 101 BT20 cell line microarray or RNA-seq samples from 12 prior studies and from our own study, GSE120919. We normalized the Affymetrix microarray datasets and the Illumina HiSeq and NovaSeq datasets separately. For the Affymetrix microarray datasets, we used the SCAN.UPC R package (ver. 2.24.0, PMID: 22959562, https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). For the Illumina HiSeq and NovaSeq datasets, we used the Rsubread R package (ver. 1.30.5, PMID: 23558742, http://bioconductor.org/packages/Rsubread/) and computed log2 TPM values. We then adjusted for batch effects in order to combine datasets that came from different laboratories. The matrix deposited in GEO contains normalized log2 signal intensities for 12,360 genes, mapped to the human Entrez Gene IDs shared across all 13 datasets (101 samples in total).
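The gene-overlap step described above can be illustrated with a minimal Python sketch. It assumes each dataset has already been reduced to a mapping from Entrez Gene ID to expression values; the function names and the toy values are hypothetical and are not taken from the submitters' pipeline.

```python
from functools import reduce


def common_gene_ids(datasets):
    """Return the sorted gene IDs present in every dataset.

    `datasets` maps a dataset name to a {gene_id: [values...]} dict.
    """
    id_sets = [set(genes) for genes in datasets.values()]
    if not id_sets:
        return []
    return sorted(reduce(set.intersection, id_sets))


def restrict_to_common(datasets):
    """Keep only the shared genes in each dataset."""
    shared = common_gene_ids(datasets)
    return {
        name: {gene_id: genes[gene_id] for gene_id in shared}
        for name, genes in datasets.items()
    }


# Toy input: hypothetical values, gene IDs used purely as labels.
toy = {
    "array_study": {"6591": [7.1, 7.3], "2120": [5.0, 5.2], "672": [6.6, 6.1]},
    "rnaseq_study": {"6591": [3.2], "2120": [2.8], "7157": [4.4]},
}
shared = common_gene_ids(toy)
restricted = restrict_to_common(toy)
```

Only genes measured in every dataset survive, which is why the combined matrix can be smaller than any single platform's gene list.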

The BT20-specific interactome was assembled with the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe, PMID: 16723010). We then used Master Regulator Analysis with Fisher's Exact Test (MRA-FET, PMID: 20531406) to infer transcription factors that control a gene signature dysregulated by a fusion gene in the BT20 cell line (please refer to GSE120919). We identified SNAI2 as a candidate master regulator that modulates the gene signature in fusion gene-expressing BT20 cells. Our findings provide biological insight into the role of the fusion gene in breast cancer.
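The idea behind a Fisher's exact test on a regulon can be sketched as follows. This is a simplified illustration, not the MRA-FET implementation: it assumes a one-sided test of whether a transcription factor's target set (regulon) overlaps the gene signature more than expected by chance, and the function name and toy gene labels are hypothetical.

```python
from math import comb


def fisher_enrichment_p(universe, regulon, signature):
    """One-sided Fisher's exact test p-value for regulon/signature overlap.

    Computes P(overlap >= observed) under the hypergeometric null, where
    the signature is a random draw from the gene universe.
    """
    universe = set(universe)
    regulon = set(regulon) & universe
    signature = set(signature) & universe
    n_universe, n_regulon, n_signature = len(universe), len(regulon), len(signature)
    observed = len(regulon & signature)

    total = comb(n_universe, n_signature)
    tail = sum(
        comb(n_regulon, k) * comb(n_universe - n_regulon, n_signature - k)
        for k in range(observed, min(n_regulon, n_signature) + 1)
    )
    return tail / total


# Toy example with hypothetical gene labels.
genes = [f"g{i}" for i in range(10)]
p_full_overlap = fisher_enrichment_p(genes, genes[:4], genes[:3])
p_no_overlap = fisher_enrichment_p(genes, genes[:4], genes[7:])
```

A small p-value for a transcription factor suggests its inferred targets are over-represented in the signature, which is what makes it a master regulator candidate.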
 
Overall design A total of 13 datasets containing 50 microarray samples, 39 RNA-seq samples, and 12 BeadChip samples were downloaded from NCBI GEO (http://ncbi.nlm.nih.gov/geo). For Affymetrix platform datasets, we used the SCAN.UPC R package, and for Illumina RNA-seq platform datasets, we used the Rsubread R package to normalize the data. For other platform datasets, we downloaded the series_matrix.txt.gz files and used their normalized data. SCAN.UPC normalizes each sample individually, which makes it well suited to adjusting intra- and inter-study batch effects. For RNA-seq data normalization, we used Rsubread (version 1.30.5) to align sequence reads to the reference genome, and the edgeR (version 3.22.3) and limma (version 3.36.2) R packages to normalize gene expression levels to log2 transcripts per million (TPM) (PMID: 22872506). We aligned sequence reads to the GRCh38 human reference genome (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/GRCh38.primary_assembly.genome.fa.gz) and mapped the aligned reads to NCBI gene IDs using NCBI gene annotation data (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_genomic.gff.gz). After normalization, we retained only protein-coding genes, non-coding RNAs, and pseudogenes that have poly(A) tails. We removed genes whose expression level was zero across all samples.
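As a rough illustration of the log2 TPM and zero-filtering steps, the Python sketch below applies the textbook TPM formula to one sample and then drops genes that are zero in every sample. It is not the edgeR/limma code used for this record; the pseudocount of 1 and all names and toy values are assumptions made for the example.

```python
import math


def counts_to_log2_tpm(counts, lengths_bp, pseudocount=1.0):
    """Convert one sample's read counts to log2(TPM + pseudocount).

    counts:     {gene_id: read count}
    lengths_bp: {gene_id: gene length in base pairs}
    The pseudocount is an assumption to keep log2 defined at zero.
    """
    # Reads per kilobase removes the gene-length bias.
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("sample has no mapped reads")
    # Rescale so that the per-sample values sum to one million.
    return {g: math.log2(r / scale * 1e6 + pseudocount) for g, r in rates.items()}


def drop_all_zero_genes(matrix):
    """Remove genes whose value is zero in every sample.

    matrix: {gene_id: [value per sample]}
    """
    return {g: values for g, values in matrix.items() if any(v != 0 for v in values)}


# Toy input with hypothetical counts and lengths.
log_tpm = counts_to_log2_tpm({"a": 10, "b": 30}, {"a": 1000, "b": 3000})
filtered = drop_all_zero_genes({"a": [0, 0], "b": [0, 1]})
```

In the toy sample, gene "b" has three times the reads of gene "a" but is also three times as long, so both end up with the same TPM.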

Significant gene signature: Limma analysis (PMID: 25605792) was used to identify genes differentially expressed between the control samples and the fusion gene-expressing samples in the GSE120919 data. We identified the gene signature specifically enriched in the fusion gene-expressing samples and used it as input to MRA-FET to infer transcription factors that control the signature.

Gene Set Enrichment Analysis (GSEA): We performed GSEA using the Hallmark pathway gene sets to identify pathways enriched in BT20 cells expressing BCL2L14-ETV6 variants. Several pathways involved in cancer progression are upregulated in fusion-expressing cells, such as epithelial-mesenchymal transition (EMT), angiogenesis, and KRAS signaling (please refer to GSE120919).
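For readers unfamiliar with GSEA, the core enrichment score can be sketched as a weighted running sum over a ranked gene list. The Python below is a simplified illustration only: it omits the permutation-based significance and normalization that the GSEA software performs, and the function name, the default weight of 1, and the toy scores are assumptions.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return the signed maximum deviation of the GSEA-style running sum.

    ranked_genes: genes ordered from most up- to most down-regulated
    scores:       {gene: ranking metric}
    gene_set:     set of genes in the pathway being tested
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    hit_total = sum(
        abs(scores[g]) ** weight for g, is_hit in zip(ranked_genes, hits) if is_hit
    )
    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            running += abs(scores[gene]) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking with hypothetical scores.
ranking = ["a", "b", "c", "d"]
metric = {"a": 4.0, "b": 3.0, "c": -1.0, "d": -2.0}
es_top = enrichment_score(ranking, metric, {"a", "b"})
es_bottom = enrichment_score(ranking, metric, {"c", "d"})
```

A positive score means the gene set is concentrated at the top of the ranking (upregulated), and a negative score means it is concentrated at the bottom.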

BT20-specific interactome and Master Regulator Analysis (MRA): From the 101 samples that had been normalized and batch-adjusted, ARACNe inferred regulatory interactions between the 12,360 genes and 1,032 human transcription factors (TFs) by calculating mutual information (MI). From the resulting BT20-specific interactome, MRA-FET inferred master regulator (MR) candidates that control the gene signature, which is specific to fusion gene-expressing BT20 cells and was identified by GSEA.
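The mutual information calculation between a transcription factor and a candidate target gene can be sketched with a simple binned estimator. This is an illustrative stand-in, not ARACNe's own estimator, and it leaves out ARACNe's later pruning of indirect interactions with the data processing inequality. The equal-frequency binning, the bin count, and the function names are assumptions.

```python
import math
from collections import Counter


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=3):
    """Estimate MI (in nats) between two expression profiles across samples."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same samples")
    n = len(x)
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed bin pairs.
    return sum(
        (count / n) * math.log(count * n / (px[a] * py[b]))
        for (a, b), count in pxy.items()
    )


# Toy profiles with hypothetical values.
mi_dependent = mutual_information([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
mi_independent = mutual_information([1, 2, 3, 4], [1, 2, 1, 2], n_bins=2)
```

Unlike correlation, MI also captures non-linear dependence, which is one reason it is used to score TF-target pairs.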

Information on the 101 samples from the 13 datasets is summarized in the metadata spreadsheet. The metadata include sample names with GSE accession numbers, the array or RNA-seq platform, how the expression data were obtained, and the original experiment type. The matrix data txt file contains normalized log2 expression values for the 101 samples.
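A matrix file such as GSE123917_re-analyzed_log2normalized_data.txt.gz could be loaded with a short Python reader like the one below. The layout is an assumption, since the record does not describe it: the sketch expects a gzip-compressed, tab-delimited file with a header row of sample names and gene IDs in the first column. The function name and the toy file are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_expression_matrix(path):
    """Read a gzipped, tab-delimited matrix into (samples, {gene_id: values}).

    Assumes the first row holds sample names and the first column gene IDs.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        matrix = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


# Demonstration on a tiny stand-in file with hypothetical values.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_matrix.txt.gz")
    with gzip.open(toy_path, "wt") as out:
        out.write("gene_id\tS1\tS2\n6591\t5.5\t6.25\n2120\t3.0\t0.0\n")
    samples, matrix = read_expression_matrix(toy_path)
```

If the real file uses a different delimiter or carries extra annotation columns, the parsing of `header` and `row` would need to be adjusted accordingly.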
 
Contributor(s) Lee S, Hu Y, Wang X
Citation(s) 32321829
NIH grant(s)
Grant ID R01 CA183976
Grant title CHARACTERIZATION OF RECURRENT ADJACENT GENE TRANSLOCATIONS IN BREAST CANCER
Affiliation UNIVERSITY OF PITTSBURGH AT PITTSBURGH
Name Xiaosong Wang
Submission date Dec 17, 2018
Last update date Jun 27, 2020
Contact name Xiaosong Wang
E-mail(s) xiaosongw@pitt.edu
Organization name University of Pittsburgh
Department Pathology
Lab Cagenome
Street address 5150 Centre Ave
City Pittsburgh
State/province Pennsylvania
ZIP/Postal code 15232
Country USA
 
Relations
Reanalysis of GSM2406860
Reanalysis of GSM2406861
Reanalysis of GSM2406862
Reanalysis of GSM2406863
Reanalysis of GSM2406864
Reanalysis of GSM2406865
Reanalysis of GSM2406866
Reanalysis of GSM2406867
Reanalysis of GSM1296240
Reanalysis of GSM1296241
Reanalysis of GSM1296242
Reanalysis of GSM1296243
Reanalysis of GSM1296244
Reanalysis of GSM1296245
Reanalysis of GSM1296246
Reanalysis of GSM1296247
Reanalysis of GSM1821282
Reanalysis of GSM1821283
Reanalysis of GSM1821284
Reanalysis of GSM1821285
Reanalysis of GSM1821286
Reanalysis of GSM1821287
Reanalysis of GSM1821288
Reanalysis of GSM1821289
Reanalysis of GSM1821290
Reanalysis of GSM1821291
Reanalysis of GSM1821292
Reanalysis of GSM1821293
Reanalysis of GSM1924139
Reanalysis of GSM1924140
Reanalysis of GSM1924141
Reanalysis of GSM1924142
Reanalysis of GSM1924143
Reanalysis of GSM1924144
Reanalysis of GSM1203251
Reanalysis of GSM1203252
Reanalysis of GSM1203253
Reanalysis of GSM1380175
Reanalysis of GSM1380176
Reanalysis of GSM1380177
Reanalysis of GSM1380178
Reanalysis of GSM553880
Reanalysis of GSM553881
Reanalysis of GSM2501509
Reanalysis of GSM2501510
Reanalysis of GSM564952
Reanalysis of GSM564953
Reanalysis of GSM564958
Reanalysis of GSM564959
Reanalysis of GSM564960
Reanalysis of GSM564961
Reanalysis of GSM375599
Reanalysis of GSM375609
Reanalysis of GSM1229992
Reanalysis of GSM1229993
Reanalysis of GSM1229994
Reanalysis of GSM1229995
Reanalysis of GSM1229996
Reanalysis of GSM1229997
Reanalysis of GSM1229998
Reanalysis of GSM1229999
Reanalysis of GSM1230000
Reanalysis of GSM3420769
Reanalysis of GSM3420770
Reanalysis of GSM3420771
Reanalysis of GSM3420772
Reanalysis of GSM3420773
Reanalysis of GSM3420774
Reanalysis of GSM3420775
Reanalysis of GSM3420776
Reanalysis of GSM3420777
Reanalysis of GSM3420778
Reanalysis of GSM3420779
Reanalysis of GSM3420780
Reanalysis of GSM3420781
Reanalysis of GSM3420782
Reanalysis of GSM3420783
Reanalysis of GSM2123960
Reanalysis of GSM2123961
Reanalysis of GSM2123962
Reanalysis of GSM2123963
Reanalysis of GSM2123964
Reanalysis of GSM2123965
Reanalysis of GSM2123966
Reanalysis of GSM2123967
Reanalysis of GSM2123968
Reanalysis of GSM2123969
Reanalysis of GSM2123970
Reanalysis of GSM2123971
Reanalysis of GSM2123972
Reanalysis of GSM2123973
Reanalysis of GSM2123974
Reanalysis of GSM2123975
Reanalysis of GSM2123976
Reanalysis of GSM2123977
Reanalysis of GSM2123978
Reanalysis of GSM2123979
Reanalysis of GSM2123980
Reanalysis of GSM2123981
Reanalysis of GSM2123982
Reanalysis of GSM2123983
BioProject PRJNA510324

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE123917_re-analyzed_log2normalized_data.txt.gz 3.1 Mb (ftp)(http) TXT
GSE123917_re-analyzed_samples.txt.gz 1.9 Kb (ftp)(http) TXT
Processed data are available on Series record
