Status |
Public on Mar 28, 2020 |
Title |
Construction of breast cancer cell line BT20-specific interactome |
Sample organism |
Homo sapiens |
Experiment type |
Expression profiling by array; Expression profiling by high throughput sequencing; Third-party reanalysis
|
Summary |
Experiment type: Expression profiling by microarray and RNA-seq data, Third-party re-analysis
The purpose of our study is to construct a breast cancer cell line BT20-specific interactome by aggregating publicly available microarray and RNA-seq samples, and to identify novel transcriptional regulators that control breast cancer progression.
We collected 101 BT20 cell line microarray or RNA-seq samples from 12 prior studies and our own study, GSE120919. We normalized the datasets from the Affymetrix microarray platform and from the Illumina HiSeq and NovaSeq platforms. For the Affymetrix microarray datasets, we used the SCAN.UPC R package (ver. 2.24.0, PMID: 22959562, https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). For the Illumina HiSeq and NovaSeq datasets, we used the Rsubread R package (ver. 1.30.5, PMID: 23558742, http://bioconductor.org/packages/Rsubread/) and expressed the results as log2 TPM values. We then adjusted for batch effects in order to combine datasets that came from different laboratories. The data matrix deposited in GEO contains normalized log2 signal intensities for 12,360 genes, mapped to human Entrez Gene IDs, that are shared across all 13 datasets (101 samples in total).
The BT20-specific interactome was assembled with the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe, PMID: 16723010). We then used Master Regulator Analysis-Fisher's Exact Test (MRA-FET, PMID: 20531406) to infer transcription factors that control a gene signature dysregulated by a fusion gene in the BT20 cell line (please refer to GSE120919). We identified SNAI2 as a candidate master regulator that modulates the gene signature in fusion gene-expressing BT20 cells. Our findings will provide biological insights into the role of the fusion gene in breast cancer.
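As an illustration of the two generic steps named in this paragraph (log2 TPM values, and restriction to genes shared by every dataset), here is a minimal Python sketch. It is not the Rsubread/edgeR/limma code used for this series. The function names, the dictionary layout, and the pseudocount of 1 are assumptions made for the example.

```python
import math


def log2_tpm(counts, lengths_bp, pseudocount=1.0):
    """Convert one sample's raw read counts to log2(TPM + pseudocount).

    counts: dict gene_id -> read count
    lengths_bp: dict gene_id -> gene length in base pairs
    The pseudocount is an assumption of this sketch, not a documented setting.
    """
    # Reads per kilobase: divide each count by the gene length in kb.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Per-million scaling factor so that TPM values sum to 1e6.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        raise ValueError("sample has no mapped reads")
    return {g: math.log2(rpk[g] / scale + pseudocount) for g in rpk}


def common_genes(datasets):
    """Return the sorted gene IDs that appear in every dataset.

    datasets: non-empty list of dicts keyed by gene ID.
    """
    return sorted(set.intersection(*(set(d) for d in datasets)))
```

For example, `common_genes([ds1, ds2, ds3])` gives the gene IDs that can be kept when several per-study matrices are merged into one.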
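The enrichment idea behind a Fisher's exact test on regulon/signature overlap can be sketched as follows. This is a simplified, hypothetical illustration in Python, not the MRA-FET implementation cited above. It assumes each regulator has a set of target genes (its regulon), that regulon and signature are both subsets of a gene universe of known size, and that a one-sided test is wanted. All names are placeholders.

```python
from math import comb


def fisher_enrichment_p(universe_size, regulon, signature):
    """One-sided Fisher's exact test p-value for regulon/signature overlap.

    Returns P(overlap >= observed) under the hypergeometric null.
    Assumes regulon and signature are subsets of the same gene universe.
    """
    regulon, signature = set(regulon), set(signature)
    observed = len(regulon & signature)
    n_reg, n_sig = len(regulon), len(signature)
    total = comb(universe_size, n_sig)
    # Sum the hypergeometric probabilities of the observed or a larger overlap.
    tail = sum(
        comb(n_reg, i) * comb(universe_size - n_reg, n_sig - i)
        for i in range(observed, min(n_reg, n_sig) + 1)
    )
    return tail / total


def rank_regulators(universe_size, regulons, signature):
    """Rank regulators by enrichment p-value, smallest first.

    regulons: dict regulator name -> iterable of target gene IDs
    """
    scored = [
        (tf, fisher_enrichment_p(universe_size, targets, signature))
        for tf, targets in regulons.items()
    ]
    return sorted(scored, key=lambda item: item[1])
```

In practice the resulting p-values would also need multiple-testing correction across regulators, which this sketch leaves out.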
|
|
|
Overall design |
A total of 13 datasets containing 50 microarray samples, 39 RNA-seq samples, and 12 BeadChip samples were downloaded from NCBI GEO (http://ncbi.nlm.nih.gov/geo). For the Affymetrix platform datasets, we used the SCAN.UPC R package, and for the Illumina RNA-seq platform datasets, we used the Rsubread R package to normalize the data. For datasets from other platforms, we downloaded the series_matrix.txt.gz files and used their normalized data. SCAN.UPC is well suited to adjusting for intra- and inter-study batch effects across individual samples. For RNA-seq normalization, we used Rsubread (version 1.30.5) to align sequence reads to the reference genome, and used the edgeR (version 3.22.3) and limma (version 3.36.2) R packages to normalize gene expression levels to log2 transcripts per million (TPM) (PMID: 22872506). We aligned sequence reads to the GRCh38 human reference genome (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/GRCh38.primary_assembly.genome.fa.gz) and mapped the aligned reads to NCBI gene IDs using NCBI gene annotation data (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_genomic.gff.gz). After normalization, we included only protein-coding genes, non-coding RNAs, and pseudogenes that have poly(A) tails. We removed genes whose expression level was zero across all samples.
Significant gene signature: Limma analysis (PMID: 25605792) was used to identify genes differentially expressed between the control samples and the fusion gene-expressing samples in the GSE120919 data. We identified the gene signature specifically enriched in the fusion gene-expressing samples and applied it to MRA-FET analysis to infer transcription factors that control the signature.
Gene Set Enrichment Analysis (GSEA): we performed GSEA using the Hallmark Pathway dataset to identify the pathways enriched in BT20 cells expressing BCL2L14-ETV6 variants. Several pathways involved in cancer progression are upregulated in fusion-expressing cells, such as epithelial-mesenchymal transition (EMT), angiogenesis, and KRAS signaling (please refer to GSE120919).
BT20-specific interactome and Master Regulator Analysis (MRA): From the 101 samples that had undergone normalization and batch adjustment, ARACNe inferred regulatory interactions between the 12,360 genes and 1,032 human transcription factors (TFs) by calculating mutual information (MI). From the resulting BT20-specific interactome, MRA-FET inferred master regulator (MR) candidates that control the gene signature, which is specific to the fusion gene-expressing BT20 cells and was identified by GSEA.
Information on the 101 samples from the 13 datasets is summarized in the meta-data spreadsheet. The meta-data include sample names with GSE accession numbers, the array or RNA-seq platform, how the expression data were obtained, and the original experiment type. The matrix data txt file contains normalized log2 expression values for the 101 samples.
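The two gene-level filters described at the end of this paragraph (keeping selected gene classes and dropping genes with zero expression in every sample) could look roughly like this in Python. The data layout and the biotype labels are hypothetical. The record does not state how gene classes were encoded.

```python
def filter_genes(expr, biotype, keep_biotypes):
    """Keep genes with an allowed biotype and at least one nonzero value.

    expr: dict gene_id -> list of per-sample expression values
    biotype: dict gene_id -> biotype label (labels here are hypothetical)
    keep_biotypes: set of labels to retain
    """
    kept = {}
    for gene, values in expr.items():
        # Skip genes whose class is not in the allowed set (or is unknown).
        if biotype.get(gene) not in keep_biotypes:
            continue
        # Skip genes that are zero in every sample.
        if all(v == 0 for v in values):
            continue
        kept[gene] = values
    return kept
```

Genes missing from the `biotype` mapping are dropped here, which is a design choice of the sketch rather than something the record specifies.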
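To make the mutual information step concrete, the sketch below estimates MI between the expression profiles of one TF and one candidate target using simple equal-frequency binning. This is a swapped-in simplification for illustration only: ARACNe uses its own MI estimator and additional pruning of indirect interactions, neither of which is reproduced here. The bin count and function names are assumptions.

```python
import math
from collections import Counter


def _equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins rank-based bins of near-equal size.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=3):
    """Estimate mutual information (in bits) between two expression profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    bx = _equal_frequency_bins(x, n_bins)
    by = _equal_frequency_bins(y, n_bins)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum p(a, b) * log2(p(a, b) / (p(a) * p(b))) over observed bin pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )
```

A TF-target pair would then be kept as a candidate edge only if its MI exceeds some significance threshold, the choice of which is outside the scope of this sketch.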
|
|
|
Contributor(s) |
Lee S, Hu Y, Wang X |
Citation(s) |
32321829 |
NIH grant(s) |
Grant ID | Grant title | Affiliation | Name |
R01 CA183976 | CHARACTERIZATION OF RECURRENT ADJACENT GENE TRANSLOCATIONS IN BREAST CANCER | UNIVERSITY OF PITTSBURGH AT PITTSBURGH | Xiaosong Wang |
|
|
Submission date |
Dec 17, 2018 |
Last update date |
Jun 27, 2020 |
Contact name |
Xiaosong Wang |
E-mail(s) |
xiaosongw@pitt.edu
|
Organization name |
University of Pittsburgh
|
Department |
Pathology
|
Lab |
Cagenome
|
Street address |
5150 Centre Ave
|
City |
Pittsburgh |
State/province |
Pennsylvania |
ZIP/Postal code |
15232 |
Country |
USA |
|
|
Relations |
Reanalysis of |
GSM2406860 |
Reanalysis of |
GSM2406861 |
Reanalysis of |
GSM2406862 |
Reanalysis of |
GSM2406863 |
Reanalysis of |
GSM2406864 |
Reanalysis of |
GSM2406865 |
Reanalysis of |
GSM2406866 |
Reanalysis of |
GSM2406867 |
Reanalysis of |
GSM1296240 |
Reanalysis of |
GSM1296241 |
Reanalysis of |
GSM1296242 |
Reanalysis of |
GSM1296243 |
Reanalysis of |
GSM1296244 |
Reanalysis of |
GSM1296245 |
Reanalysis of |
GSM1296246 |
Reanalysis of |
GSM1296247 |
Reanalysis of |
GSM1821282 |
Reanalysis of |
GSM1821283 |
Reanalysis of |
GSM1821284 |
Reanalysis of |
GSM1821285 |
Reanalysis of |
GSM1821286 |
Reanalysis of |
GSM1821287 |
Reanalysis of |
GSM1821288 |
Reanalysis of |
GSM1821289 |
Reanalysis of |
GSM1821290 |
Reanalysis of |
GSM1821291 |
Reanalysis of |
GSM1821292 |
Reanalysis of |
GSM1821293 |
Reanalysis of |
GSM1924139 |
Reanalysis of |
GSM1924140 |
Reanalysis of |
GSM1924141 |
Reanalysis of |
GSM1924142 |
Reanalysis of |
GSM1924143 |
Reanalysis of |
GSM1924144 |
Reanalysis of |
GSM1203251 |
Reanalysis of |
GSM1203252 |
Reanalysis of |
GSM1203253 |
Reanalysis of |
GSM1380175 |
Reanalysis of |
GSM1380176 |
Reanalysis of |
GSM1380177 |
Reanalysis of |
GSM1380178 |
Reanalysis of |
GSM553880 |
Reanalysis of |
GSM553881 |
Reanalysis of |
GSM2501509 |
Reanalysis of |
GSM2501510 |
Reanalysis of |
GSM564952 |
Reanalysis of |
GSM564953 |
Reanalysis of |
GSM564958 |
Reanalysis of |
GSM564959 |
Reanalysis of |
GSM564960 |
Reanalysis of |
GSM564961 |
Reanalysis of |
GSM375599 |
Reanalysis of |
GSM375609 |
Reanalysis of |
GSM1229992 |
Reanalysis of |
GSM1229993 |
Reanalysis of |
GSM1229994 |
Reanalysis of |
GSM1229995 |
Reanalysis of |
GSM1229996 |
Reanalysis of |
GSM1229997 |
Reanalysis of |
GSM1229998 |
Reanalysis of |
GSM1229999 |
Reanalysis of |
GSM1230000 |
Reanalysis of |
GSM3420769 |
Reanalysis of |
GSM3420770 |
Reanalysis of |
GSM3420771 |
Reanalysis of |
GSM3420772 |
Reanalysis of |
GSM3420773 |
Reanalysis of |
GSM3420774 |
Reanalysis of |
GSM3420775 |
Reanalysis of |
GSM3420776 |
Reanalysis of |
GSM3420777 |
Reanalysis of |
GSM3420778 |
Reanalysis of |
GSM3420779 |
Reanalysis of |
GSM3420780 |
Reanalysis of |
GSM3420781 |
Reanalysis of |
GSM3420782 |
Reanalysis of |
GSM3420783 |
Reanalysis of |
GSM2123960 |
Reanalysis of |
GSM2123961 |
Reanalysis of |
GSM2123962 |
Reanalysis of |
GSM2123963 |
Reanalysis of |
GSM2123964 |
Reanalysis of |
GSM2123965 |
Reanalysis of |
GSM2123966 |
Reanalysis of |
GSM2123967 |
Reanalysis of |
GSM2123968 |
Reanalysis of |
GSM2123969 |
Reanalysis of |
GSM2123970 |
Reanalysis of |
GSM2123971 |
Reanalysis of |
GSM2123972 |
Reanalysis of |
GSM2123973 |
Reanalysis of |
GSM2123974 |
Reanalysis of |
GSM2123975 |
Reanalysis of |
GSM2123976 |
Reanalysis of |
GSM2123977 |
Reanalysis of |
GSM2123978 |
Reanalysis of |
GSM2123979 |
Reanalysis of |
GSM2123980 |
Reanalysis of |
GSM2123981 |
Reanalysis of |
GSM2123982 |
Reanalysis of |
GSM2123983 |
BioProject |
PRJNA510324 |
Supplementary file | Size | Download | File type/resource |
GSE123917_re-analyzed_log2normalized_data.txt.gz | 3.1 Mb | (ftp)(http) | TXT |
GSE123917_re-analyzed_samples.txt.gz | 1.9 Kb | (ftp)(http) | TXT |
Processed data are available on Series record |