Sample GSM290793
Status Public on May 24, 2008
Title blood_single primary malignacy_GECM-061_36
Sample type RNA
 
Source name blood, one primary malignancy
Organism Homo sapiens
Characteristics Sex: Female
Age: 37 years
Diagnosis: breast cancer
Extracted molecule total RNA
Extraction protocol RNA was obtained from the peripheral blood of patients and healthy individuals using the PAXgene Blood RNA system (Qiagen) according to the manufacturer’s instructions. To avoid epithelial cell contamination of the samples, the first 5 ml of blood drawn after each vein puncture were discarded. RNA concentration and purity were determined on the NanoDrop ND-1000 spectral photometer (peqlab), whereas RNA integrity was determined by capillary electrophoresis of total RNA samples using the 2100 Bioanalyzer (Agilent Technologies).
Label Digoxigenin
Label protocol 750 ng of total RNA was introduced into an RT-IVT reaction: the RNA was reverse-transcribed into ds-cDNA and then converted into labelled cRNA by in-vitro transcription (Nano-Amp RT-IVT Labeling Kit, Applied Biosystems) using digoxigenin-conjugated UTP nucleotides (Roche Diagnostics).
 
Hybridization protocol 10 µg of digoxigenin-labeled cRNA was fragmented and hybridized for 16 h to Human Genome Survey Microarrays V2.0 (Applied Biosystems) at 55°C. After several washing steps of increasing stringency, an anti-digoxigenin antibody conjugated to alkaline phosphatase (Roche Diagnostics) was added.
Scan protocol Chemiluminescence and fluorescence signals were detected on an AB1700 microarray reader. Reagents from the Chemiluminescence Detection Kit (Applied Biosystems) were used in this procedure. Human Genome Survey Microarrays V2.0 feature numerous control probes and 32878 target gene-specific 60-mer probes detecting more than 27000 different genes. The software platform R (Version 2.4.0) and the Bioconductor packages ABarray (25) and Limma (26), as well as the software Spotfire Decision Site for Functional Genomics 9.0, were used for biostatistical data analysis and visualization.
Description blood_single primary malignacy_GECM-061_ 36
Data processing Normalization compensates for varying global signal intensities of the microarrays and adjusts them to a uniform level, thus making the individual microarrays comparable for downstream analysis. Quantile normalization was applied to the data set. The quantile normalized, log2 transformed data were averaged within the subgroups A, B, C, D, E and N. They were also averaged within the category “double cancer” (group AB) and within the category “single cancer” (group CDE). The extent and direction of differential expression between the groups are presented as a fold change value on the log scale (see “statistical significance” below). The similarity of global gene expression profiles of the different patient blood samples was assessed by correlation analysis. For this purpose, MA plots were generated from the quantile normalized data. To analyze the comparisons of interest, design matrices were created which assign the samples to the subgroups A through N and to the categories AB, CDE and N. A linear model was then fitted for every gene, and contrast matrices were generated to extract the statistical parameters for the different comparisons. All possible comparisons among the participating groups in this study were determined. The contrasts were extracted after fitting the linear model and were tested for statistical significance using an ANOVA-like test (moderated F-test, empirical Bayes method). To correct for multiple testing, the FDR-based method was applied. The p-value adjusted for multiple testing is usually several orders of magnitude higher than the unadjusted p-value. To allow for the identification of the “significant” comparison(s), a decision matrix was created. This matrix includes all comparisons within the contrast matrix and is filled with the values 0, 1 and -1. 
The value 1 indicates that the respective comparison is statistically significant in the direction of up-regulation, and the value -1 indicates significant down-regulation. The value 0 denotes a non-significant test result. The quantile normalized, log2 transformed signal values were used as the basis for hierarchical clustering, and the clustering method “complete linkage” was applied within the software Spotfire Decision Site for Functional Genomics 9.0. Differentially regulated processes and pathways were identified for all of the genes obtained, and the enrichment of biological classes and pathways relative to the whole genome was assessed, using Panther (http://www.pantherdb.org). As a matter of principle, the percentage distributions shown in the pie charts (results not shown) do not indicate whether a certain biological process, molecular function or pathway is in fact overrepresented or merely reflects the average percentage of the respective class in the human genome. To address this question, the induced gene IDs were compared to all gene IDs represented on the Human AB1700 Microarray V2.0 (http://www.pantherdb.org/ > Tools > Gene Express Data Analysis > Compare gene lists), and it was calculated whether a specific class was overrepresented relative to the distribution of all genes on the microarray. A p-value, based on the binomial test, is given for the significance of this enrichment. In our experience, p-values of approximately 10^-6 can be regarded as a sign of manifest enrichment in the context of a Panther analysis for biological processes and molecular functions. For Panther Pathway enrichment analysis, we found that the threshold can be slightly increased to p-values of approximately 10^-6. Classification analysis. Classification analysis was performed using the support vector machine (SVM) paradigm. We performed SVM on the highest quality subset of the probes measured in the Human Genome Survey Microarray V2.0. 
This subset comprised those probes that had a signal/noise (S/N) ratio of at least 3 and a flag value of 8192 or more for all samples tested. A total of 4121 probes passed this quality filter. The quantile-normalized expression levels of these probes were then standardized using the function scale within R 2.5.1, resulting in a set of 4121 variables that all had a mean of 0 and a standard deviation of 1, thus giving each probe an equal chance to enter a classifier as a factor. To build the classifiers for the various combinations of outcomes compared, the following procedure was chosen: 1) An ANOVA was applied to the standardized, quantile normalized probe expression levels, and the probes were sorted by the resulting p-values. 2) From the subset of the 300 probes with the lowest p-values, a further subset was chosen by means of a reversible direction Monte-Carlo adding, sampling and reduction method developed especially for this study. 3) The quality of classifiers was judged by the cross-validation error, which was to be minimized. The number of cross-validations for each step was set to half the effective sample size, i.e. the number of samples with class designations that were part of the combination of outcomes investigated. It should be noted that the accuracy of prediction in the total set of data was 100% in all cases (i.e. the training error was 0 in all cases). The relative importance of a probe was calculated as follows: first, the cross-validation error for the total classifier was recorded; then each of the probes in turn was left out of the classifier. For each of these reduced classifiers, the cross-validation error, which was expected and indeed found to be increased in every case, was recorded. 
The cross-validation errors of the reduced classifiers were then normalized such that the probe with the largest increase in cross-validation error was assigned a relative importance of 1.00, with the remaining probes having progressively lower values approaching 0 for the least important variables. This normalization was done by dividing each of the probe-specific increases in error by the maximum increase in error. These normalized values are designated as relative importances. Out of a total of 65 classifiers that were built, the seven most important, named after the diagnostic categories used to build them, are: AB vs CDE, AB vs N, CDE vs N, AB vs CDE vs N, AB vs C, AB vs D, AB vs E.
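The relative-importance normalization described above can be illustrated with a minimal Python sketch. This is not the submitters' code (the record states that R was used); the function name, the probe names and the error values are hypothetical and serve only to show the arithmetic of dividing each probe-specific increase in cross-validation error by the maximum increase.

```python
def relative_importances(full_cv_error, reduced_cv_errors):
    """Scale leave-one-probe-out error increases so the largest becomes 1.0.

    full_cv_error: cross-validation error of the complete classifier.
    reduced_cv_errors: dict mapping probe ID -> cross-validation error of the
        classifier rebuilt without that probe.
    """
    # Increase in cross-validation error caused by leaving each probe out.
    increases = {
        probe: error - full_cv_error
        for probe, error in reduced_cv_errors.items()
    }
    max_increase = max(increases.values())
    if max_increase <= 0:
        raise ValueError("no probe increases the cross-validation error")
    # Divide every increase by the maximum increase.
    return {probe: inc / max_increase for probe, inc in increases.items()}


# Hypothetical values for illustration only; they are not from the study.
example = relative_importances(
    full_cv_error=0.10,
    reduced_cv_errors={"probe_a": 0.30, "probe_b": 0.20, "probe_c": 0.12},
)
```

In this made-up example, probe_a causes the largest increase in error and therefore receives a relative importance of 1.0, while the other probes receive proportionally smaller values.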
 
Submission date May 23, 2008
Last update date May 23, 2008
Contact name George Panayiotis Stathopoulos
E-mail(s) dr-gps@ath.forthnet.gr
Phone ++306937075160
Fax ++302107251736
Organization name Errikos Dunant hospital
Department Oncology Department
Street address Souidias 55
City Athens
State/province Attiki
ZIP/Postal code 106 76
Country Greece
 
Platform ID GPL2986
Series (1)
GSE11545 Gene expression analysis of whole blood samples from patients with single and double primary tumors and healthy controls

Data table header descriptions
ID_REF
VALUE To identify genes with a robust differential expression among the comparator groups, several parameters were taken into account:
Robust detection. A probe was only considered if it was detectable (S/N ≥ 3) in at least two thirds (66.7%) of the samples of at least one comparator group. Furthermore, a probe was excluded if its signal value was flagged (flag value ≥ 8192) in one third or more of the samples of at least one comparator group.
Annotation/biological relevance. A probe was only considered if its sequence was annotated as “current” in the latest annotation file (Human AB1700 Annotations 09 06.txt)………NEJM……..(e address). All sequences annotated as “pseudogenes” or “obsolete” were discarded.
Statistical significance. A probe was only classified as induced in a specific comparison if the adjusted p-value was below 0.1 and its decision matrix cell contained the value 1. Conversely, a probe was classified as repressed if the adjusted p-value was below 0.1 and the decision matrix cell contained the value -1. No fold change criterion was used for filtering, i.e. even genes with a very small but still reproducible and thus significant differential expression were retrieved. This relatively low stringency was applied to allow for the identification of as many deregulated genes as possible, including those with weak significance. Nevertheless, more stringent criteria were also applied in each subsequent comparison, and genes that presented a highly statistically significant value (< 1 x 10^-5) were selected.
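The detection and significance criteria above can be expressed as a short Python sketch. It is an illustration under stated assumptions, not the submitters' pipeline: the function names, the input layout (one list of (S/N, flag) pairs per comparator group) and the label "unchanged" are hypothetical. The annotation criterion is omitted because it depends on the annotation file.

```python
def probe_passes_detection(groups, sn_min=3.0, flag_cutoff=8192):
    """Apply the robust-detection rule to one probe.

    groups: dict mapping comparator group name -> list of
        (signal_to_noise, flag_value) tuples, one per sample (assumed layout).
    """
    detected_somewhere = False
    flagged_somewhere = False
    for samples in groups.values():
        n = len(samples)
        if n == 0:
            continue  # an empty group cannot satisfy either rule
        n_detected = sum(1 for sn, _ in samples if sn >= sn_min)
        n_flagged = sum(1 for _, flag in samples if flag >= flag_cutoff)
        # Detectable in at least two thirds of the samples of this group.
        if 3 * n_detected >= 2 * n:
            detected_somewhere = True
        # Flagged in one third or more of the samples of this group.
        if 3 * n_flagged >= n:
            flagged_somewhere = True
    return detected_somewhere and not flagged_somewhere


def classify_probe(adjusted_p, decision, alpha=0.1):
    """Label one comparison from its adjusted p-value and decision matrix cell."""
    if adjusted_p < alpha and decision == 1:
        return "induced"
    if adjusted_p < alpha and decision == -1:
        return "repressed"
    return "unchanged"  # hypothetical label for everything else


# Hypothetical measurements for illustration only.
clean = {"AB": [(5.0, 0), (4.0, 0), (1.0, 0)], "N": [(1.0, 0)] * 3}
flagged = {"AB": [(5.0, 0), (4.0, 0), (1.0, 0)], "N": [(1.0, 8192), (1.0, 0), (1.0, 0)]}
```

Integer comparisons (3 * count >= 2 * n) are used instead of floating-point fractions so that the two-thirds and one-third boundaries are evaluated exactly.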

Data table
ID_REF VALUE
100002 14.49411695
100003 8.974200848
100027 8.454090303
100036 9.939124657
100037 13.80427258
100039 10.49701255
100044 7.64750949
100045 8.688863923
100051 7.854128401
100052 7.523967214
100057 9.36262507
100058 13.23909762
100060 9.572342663
100062 10.88692769
100064 8.181474503
100079 12.89371975
100089 9.693142374
100093 7.783017226
100095 7.862516382
100100 13.73422553

Total number of rows: 32878

Table truncated, full table size 606 Kbytes.




Supplementary data files not provided
Processed data included within Sample table
