Amygdala-enriched genes identified by microarray technology are restricted to specific amygdaloid subnuclei

Supplementary material for Zirlinger et al. (2001) Proc. Natl. Acad. Sci. USA 98 (9), 5270–5275 (10.1073/pnas.091094698).

Appendix A

Introduction

We used Affymetrix GeneChips to search for differentially expressed genes in the mouse brain. About 34,000 genes and ESTs were interrogated with the Mu11kA, Mu11kB, Mu19kA, Mu19kB, and Mu19kC arrays. We compared the following regions: amygdala, cerebellum, hippocampus, olfactory bulb, and periaqueductal gray (PAG). To search for genes specific to each region, we wrote custom code in MATLAB (The MathWorks).

In Situ Hybridization

Male and female CD-1 mice, 3–4 weeks old, were used. Clones were purchased from Research Genetics when available; otherwise, templates for probes were synthesized by PCR using specific primers and cDNA from mouse brain. For some genes, sense probes were also synthesized to control for nonspecific hybridization. Digoxigenin-labeled RNA probes were made and hybridization was performed essentially as previously described [Henrique, D., Adam, J., Myat, A., Chitnis, A., Lewis, J. & Ish-Horowicz, D. (1995) Nature (London) 375, 787–790], with some modifications. Briefly, fresh-frozen, 20-µm-thick coronal sections were cut with a cryostat. Sections were dried and fixed in 4% paraformaldehyde, washed in PBS, and acetylated with 0.25% acetic anhydride in 1 M triethanolamine-HCl, pH 8.0. Slides were prehybridized for 1–3 hr and hybridized overnight at 70°C, using a probe concentration of 0.5–1 µg/ml. Sections were washed twice in 0.2× SSC at 70°C for 30 min and then incubated with anti-digoxigenin alkaline phosphatase-conjugated Fab fragments (Roche) at a 1:2000 dilution in 0.1 M maleic acid buffer, pH 7.5, with 0.2% Tween-20, 20% sheep serum, and 2% blocking reagent (Roche). Staining was developed for 4–16 hr with NBT and BCIP (Roche) in alkaline phosphatase buffer to yield a purple product. Slides were fixed in 4% formaldehyde and mounted with glycerol. To aid in the visualization of brain regions, Nissl staining was performed on adjacent sections.

How the Program Works

Average difference (D) values for each gene are used for comparisons. The code searches for genes that are highly enriched in the reference sample with respect to all others. Two criteria are applied to identify enriched genes: (1) the D value for the gene in the reference sample; and (2) the ratios (fold differences) of the D value for this gene in the reference sample to its D value in each of the other samples.

Formally, a given gene j, with an average difference value in the reference sample of D_j,ref, is considered to be enriched in the reference sample relative to the other samples examined if:

(i) D_j,ref > minimum

(ii) D_j,ref/D_j,other ≥ threshold or D_j,ref/D_j,other < 0 for all other samples.

The minimum value and the threshold ratio are chosen by the user. All genes that fulfill both the minimum value and the threshold ratio conditions are identified as being enriched in the reference sample.
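The two criteria can be sketched in a few lines of Python. This is an illustration only, not the authors' MATLAB program: the function names are hypothetical, the ratio test assumes "at least the threshold" (as the description of the THRESHOLD field suggests), and treating a zero D value in another sample as passing is an assumption that mirrors floating-point division by zero yielding an infinite ratio.

```python
def is_enriched(d_values, ref, minimum, threshold):
    """Return True if one gene (one row of D values) passes both criteria.

    d_values  : D values for the gene, one per sample
    ref       : 0-based index of the reference sample (hypothetical
                convention; the GUI counts samples from 1)
    """
    d_ref = d_values[ref]
    # Criterion (i): the reference D value must exceed the minimum.
    if not d_ref > minimum:
        return False
    # Criterion (ii): against every other sample, the ratio must reach
    # the threshold or be negative.
    for k, d_other in enumerate(d_values):
        if k == ref:
            continue
        if d_other == 0:
            # Assumption: a zero denominator counts as passing unless the
            # reference value is also zero (ratio undefined).
            if d_ref == 0:
                return False
            continue
        ratio = d_ref / d_other
        if not (ratio >= threshold or ratio < 0):
            return False
    return True


def find_enriched(matrix, ref, minimum, threshold):
    """Return the 0-based row indices of genes enriched in sample `ref`."""
    return [i for i, row in enumerate(matrix)
            if is_enriched(row, ref, minimum, threshold)]


# Illustrative values only (rows = genes, columns = samples).
matrix = [
    [400, 100, 90, 50],   # all ratios >= 3.5: enriched
    [400, 200, 90, 50],   # 400/200 = 2: not enriched
    [100, 10, 10, 10],    # reference value below the minimum
    [400, -20, 100, 50],  # negative ratio is accepted: enriched
]
print(find_enriched(matrix, ref=0, minimum=120, threshold=3.5))  # [0, 3]
```

The fourth row shows why the negative-ratio clause matters: a gene whose D value is negative in another sample is still reported if it passes against the remaining samples.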

In our case, based on in situ hybridization experiments, a threshold ratio of 3.5 was optimal. Higher threshold ratios (e.g., 5- or 6-fold) missed many genes that were found by the lower-stringency search and whose differential expression could be validated by in situ hybridization. Conversely, thresholds below 3.5 were more likely to identify genes whose expression was not region-specific. We set the minimum value to 1/10 of the mean of D values for all genes on each array (which in our case was ~120).
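As a rough illustration of how such a minimum could be derived, the sketch below takes the D values from one array and returns one tenth of their mean. The function name and the example numbers are hypothetical and are not taken from the study's data.

```python
def minimum_d_value(d_values, divisor=10):
    """Return the mean of the D values on one array divided by `divisor`."""
    if not d_values:
        raise ValueError("need at least one D value")
    mean_d = sum(d_values) / len(d_values)
    return mean_d / divisor


# Illustrative values only (not data from the study).
example_array = [1500, 900, 1200, 1200]
print(minimum_d_value(example_array))  # mean is 1200, so this prints 120.0
```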

Code

The program can be downloaded at http://www.its.caltech.edu/~mariela/gene_screen.zip.

Instructions for Use

You need MATLAB (The MathWorks) to run the program.

The data must be supplied as a .txt file, formatted as follows:

The data should be in matrix form and contain average difference (D) values for all genes across all samples compared. The number of columns equals the number of samples to compare, and the number of rows equals the number of genes, so that each column contains the D values for all genes in a given sample. (Make sure the genes are sorted in the same order, e.g., alphabetically, across all samples.)
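To check a file against this layout before loading it, a small parser along the following lines may help. It is a sketch under stated assumptions: the text does not specify the delimiter, so whitespace-separated numbers with no header row are assumed, and the function name is hypothetical.

```python
def parse_matrix(text):
    """Parse whitespace-delimited D values: one row per gene, one column per sample."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append([float(field) for field in line.split()])
    if not rows:
        raise ValueError("no data rows found")
    n_samples = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != n_samples:
            raise ValueError(
                f"row {number} has {len(row)} values, expected {n_samples}")
    return rows


# Two genes (rows) measured in three samples (columns); illustrative values.
example_text = "400 100 90\n120 40 30\n"
data = parse_matrix(example_text)
print(len(data), "genes x", len(data[0]), "samples")  # 2 genes x 3 samples
```

Rejecting rows of unequal length catches the most common formatting slip, a gene missing from one sample, before any comparison is run.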

1. Save the data in the format described above.

2. Open MATLAB.

3. Change to the directory where you downloaded the files.

4. Type start_gs. A GUI with the following fields should appear:

LOAD DATA MATRIX:

Click the Load Data Matrix button and browse to select the file containing your data.

REF. SAMPLE:

Enter here the number of the sample that you want to use as the reference in your comparison. (For example, enter 4 if your reference sample is the fourth column of your data matrix.)

THRESHOLD:

Enter here the value for the threshold ratio. (For example, if you enter 6 here, you will only identify genes that are enriched in your reference sample compared to all other samples by at least 6-fold.)

MINIMUM:

Enter here the minimum D value that an enriched gene can have in your reference sample.

OUTPUT FILE:

The output of this program contains the following:

• The indices of the rows with enriched genes.

• A matrix formed by the D values of enriched genes only.

A .txt file containing this output will be created in the c:/temp directory. Enter here the name you want to give to this text file.

In addition, the conditions you set for the comparison, together with the number of genes and samples analyzed, will be automatically included in this file, which may be useful for future reference.
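The exact layout of the output file is not described here, so the sketch below is only one plausible arrangement of the same information: the comparison settings, the numbers of genes and samples, the indices of enriched rows, and their D values. The function name, the labels, and the 0-based row indices are all assumptions.

```python
def format_report(matrix, enriched_rows, ref, threshold, minimum):
    """Build a plain-text report; the layout is hypothetical."""
    lines = [
        f"reference sample: {ref}",
        f"threshold ratio: {threshold}",
        f"minimum D value: {minimum}",
        f"genes analyzed: {len(matrix)}",
        f"samples analyzed: {len(matrix[0])}",
        "enriched rows: " + " ".join(str(i) for i in enriched_rows),
        "D values of enriched genes:",
    ]
    # One line per enriched gene, values in the original column order.
    for i in enriched_rows:
        lines.append(" ".join(str(value) for value in matrix[i]))
    return "\n".join(lines) + "\n"


# Illustrative values only; rows 0 and 2 are taken as already identified.
matrix = [[400, 100, 90], [400, 200, 90], [500, 100, 100]]
report = format_report(matrix, [0, 2], ref=0, threshold=3.5, minimum=120)
print(report)
```

The returned string can then be written to any path with Python's built-in open(); the sketch leaves the file location to the caller rather than fixing a directory.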

Example

If you want to see an example of how the program runs with a simple dataset, see the supplemental Appendix B.

To download the example dataset, go to http://www.its.caltech.edu/~mariela/example_data.txt.

Requests and Feedback

We recommend that you try running this code with several different settings to get an idea of what your data look like. The default settings are the ones we recommend for initial comparisons. The program takes less than a minute to run for 5 samples and 34,000 genes.

Experimental datasets are available upon request from Mariela Zirlinger. The code runs under MATLAB.

We appreciate your feedback:
Mariela Zirlinger
216-76 Caltech
Pasadena, CA 91125
mariela{at}caltech.edu