| HOME | ARCHIVE | SEARCH | TABLE OF CONTENTS |
|---|
a Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Kentucky
b Department of Anatomy and Cell Biology, McGill University, Montréal, Canada
Eugenia Wang, Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, 570 South Preston Street, Baxter Building Room 304, Louisville, KY 40292 E-mail: eugenia.wang{at}louisville.edu.
Decision Editor: James R. Smith, PhD
| Abstract |
|---|
Over the past century, as a result of increasing medical knowledge, better sanitation, and improved nutrition, developed countries have experienced an unprecedented increase in average human life span, along with a higher incidence of multifactorial diseases such as cardiovascular diseases, neurodegenerative disorders, type 2 diabetes, and cancers (1). These age-dependent diseases, which can afflict people as young as their mid-50s, are products of the combined influences of genetics and environment (2). Nature and nurture together provide predispositions to these diseases, presenting a complex picture for their development in the fast-growing middle- and old-age subpopulations of our society.
Although recent advances in medical research have enabled us to diagnose several age-associated diseases, alleviate pain associated with them, and retard the onset of their acute stages, we remain largely incapable of identifying at an early stage those individuals bearing genetic predispositions to these diseases, and thus of administering preventive medicine or treatment. Because the human genome contains some 30,000 genes (3), and modern industrialized society yields increasing environmental complexity, it is an ever-greater challenge to perceive how the integration of our genes and surrounding environment creates disease-predisposed states. For example, why do certain individuals suffer lung cancer at an early age, after a few years of cigarette smoking, whereas some centenarians tolerate lifelong smoking without dying of the same disease? Such questions led to the idea of the need to identify "genetic signatures." Once genetic signatures are secured, one may develop means to prevent and/or treat diseases in an individual manner, creating individualized medicine for prognostic, diagnostic, and therapeutic purposes.
In general, large-sample gene chips, bearing perhaps 10,000 genes, are applicable only to early-stage screening and may yield voluminous lists of potential positive results; here we describe a next-generation, medium-density microarray approach embodying considerable quality control in both chip design and analysis, which results in fewer hits of enhanced accuracy and pertinence.
| Genetic Signatures and Microarray Technology |
|---|
| The Theory Behind Microarrays |
|---|
Although the principle behind microarrays is simple, creating and implementing microarray technology is difficult, as several parameters (discussed in the paragraphs that follow) can drastically affect the validity of the results. Furthermore, because each microarray platform carries a large number of probes, the volume of data obtained is huge and requires powerful computerized image processing and statistical software for classification and analysis; without these, little of significance can be gained from using microarrays. Thus, microarray core facilities must integrate expertise in biology, computer science, engineering, and statistics. It is with this in mind that we started our quest for designer biochips.
| Platform and Printing Robots |
|---|
Once a microarray platform has been chosen, probes must be attached to the platform. Two obvious methods exist: synthesis of probes directly on the platform (14)(15)(16)(17)(18), and probe-spotting by use of a contact or noncontact printing robot (4)(19)(20)(21)(22). Although leading biochip companies often synthesize oligos directly on their microarrays by using techniques such as photolithography, this method is not easily mastered, nor accessible to most laboratories. In contrast, probe-spotting can be accomplished using any of several commercially available printing robots (22). Because we use membranes attached to glass slides as our platform instead of glass slides directly, we encountered several problems, including skipped dots and uneven printing, when we first attempted to print arrays. We had to substantially modify the printing heads of the first robot we purchased, and we had to build an enclosure over it that permitted maintenance of constant humidity, to ensure even printing of the probes.
| Probes |
|---|
| Thematic Microarray Design |
|---|
Primer design is perhaps the most time-consuming step in our microarray production, because once a primer pair is selected, an analysis must be performed with BLAST (freely available on the National Center for Biotechnology Information website) to ensure that each primer pair amplifies only the gene of interest. This is crucial, because results obtained from the microarray depend on the specificity of the amplicons. However, in some instances, the specificity of the primers may not guarantee the specificity of the generated amplicon, when a conserved or shared domain lies somewhere within the sequence. It is therefore highly recommended that the entire amplicon sequences themselves be searched with BLAST to identify homologous regions, which can cause nonspecific binding. With probes obtained from cDNA libraries this may become a pitfall, especially when the spotted nucleic acid sequence is unknown; highly homologous sequences often result in nonspecific binding between genes of high homology. Stringent hybridization conditions and washing can generally eliminate this nonspecific binding, if the homologous region is not too long.
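The recommendation to screen whole amplicons for shared regions can be illustrated with a toy check. This is only a minimal sketch and not a substitute for a BLAST search: it looks for exact shared stretches between pairs of amplicon sequences, ignoring mismatches and reverse complements. The function names and the 20-base cutoff are illustrative assumptions, not part of our pipeline.

```python
def longest_shared_run(seq_a: str, seq_b: str) -> int:
    """Length of the longest exact stretch of bases present in both sequences."""
    a, b = seq_a.upper(), seq_b.upper()
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous base of both sequences.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def flag_cross_hybridizing_pairs(amplicons: dict, min_run: int = 20) -> list:
    """Return (gene_a, gene_b, run_length) for amplicon pairs sharing a long exact run.

    min_run is an assumed cutoff chosen for illustration only.
    """
    names = sorted(amplicons)
    flagged = []
    for idx, first in enumerate(names):
        for second in names[idx + 1:]:
            run = longest_shared_run(amplicons[first], amplicons[second])
            if run >= min_run:
                flagged.append((first, second, run))
    return flagged
```

Pairs flagged by such a screen would be candidates for primer redesign or for more stringent hybridization and washing conditions.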
Controls
As in any biological experiment, and most importantly for microarrays, controls must be carefully selected. All microarrays should carry negative and positive controls as well as "housekeeping genes," which are also used in more traditional experiments such as quantitative RT-PCR and which show little or no physiological change in expression among the subjects or conditions being studied. The inclusion of housekeeping genes is useful for data normalization; for our designer microarrays targeted to mouse models, we selected six mouse genes (glyceraldehyde phosphate dehydrogenase, ribosomal S6, beta-actin, hypoxanthine-guanine phosphoribosyltransferase, phospholipase A2, and ubiquitin) commonly used in the literature as controls. In general, the validity of these controls must be determined a priori by using independent tests, such as Northern blotting assays or quantitative RT-PCR (24). For instance, EF-1α would be a poor choice for a housekeeping gene if the target nucleic acids were obtained from skeletal muscle, as it is not expressed in adult muscle cells; it would, however, be a good control when cDNA from liver is used as a target (25). The use of housekeeping genes permits changes in gene expression to be measured against a gene whose expression does not vary significantly; in some cases this can be of great value. Negative controls should include buffer, bacterial, and viral DNA, as well as amplicons from genes known not to be expressed in target tissues. Negative controls are used to assess the level of background noise arising from nonspecific nucleic acid binding during probe-target hybridization. Positive controls such as total cDNA or genomic DNA permit the detection of suboptimal conditions of hybridization and staining, which may obscure appropriate signal intensity.
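One way negative-control spots might be used to estimate background noise is sketched below. The rule shown (mean of the negative controls plus k standard deviations) is a common convention that we assume here for illustration; it is not a procedure described in this article, and the function names and the default k = 2 are hypothetical.

```python
from statistics import mean, stdev


def noise_floor(negative_control_signals, k=2.0):
    """Mean plus k sample standard deviations of the negative-control intensities.

    Assumed rule for illustration; needs at least two control spots.
    """
    return mean(negative_control_signals) + k * stdev(negative_control_signals)


def spots_above_noise(signals, negative_control_signals, k=2.0):
    """Keep only the spots whose intensity exceeds the estimated noise floor."""
    floor = noise_floor(negative_control_signals, k)
    return {gene: value for gene, value in signals.items() if value > floor}
```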
Quality Control for Amplicon Production
In order to avoid producing the wrong amplicon for printing as a result of contaminated PCR reactions, the use of dedicated equipment and reagents in the PCR setup and reaction areas is recommended. For each PCR reaction with a unique amplification primer pair, a negative control should be used to ensure the absence of reagent contamination, often caused by the presence of exogenous nucleic acids. This control reaction is identical to the regular reaction, except that no template is present. Agarose gel electrophoresis is used to verify the amplicons and ensure that they are of expected size. In instances where multiple bands result from the PCR, products can be resolved on an agarose gel, and the fragment of expected size excised; these amplicons can then be sequenced to confirm their identity. It is our experience that, when a primer pair is well chosen, multiple bands seldom result from the PCR reaction.
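The acceptance logic described above (clean no-template control, single band, expected size) can be written down as a small check. This is a sketch under stated assumptions: the function name, the band-size inputs read off a gel, and the 10% size tolerance are all hypothetical choices made for illustration.

```python
def amplicon_passes_qc(expected_bp, observed_bands_bp, no_template_bands_bp,
                       tolerance=0.10):
    """Return (passed, reason) for one PCR reaction and its no-template control.

    tolerance is an assumed allowance for gel sizing error, as a fraction
    of the expected amplicon length.
    """
    if no_template_bands_bp:
        return False, "band in no-template control: possible reagent contamination"
    if not observed_bands_bp:
        return False, "no product detected"
    if len(observed_bands_bp) > 1:
        return False, "multiple bands: excise the expected-size fragment and sequence it"
    deviation = abs(observed_bands_bp[0] - expected_bp) / expected_bp
    if deviation > tolerance:
        return False, "band size differs from the expected amplicon length"
    return True, "ok"
```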
Printing the Arrays
Once amplicons have been produced for all genes of interest as well as housekeeping genes, arrays can be printed. To avoid positional bias, arrays should be printed in a scattered fashion, with several repeats of the same amplicon located in different regions of the chip. It is important to avoid positional bias, as uneven distribution of charges on the membrane can result in regions of increased background. A typical microarray manufactured in this fashion carries arrayed triplicates or quadruplicates of amplicons from selected genes, positioned on the array among many control spots. The rationale for triplicate printing is to provide three data points for statistical analysis of significance; ideally, the three could be expanded to four or five repeats, to yield more data points for statistical analysis. This approach of scattered array printing requires carefully designed analytical software to track amplicon repeats across the platform; however, it comes close to eliminating positional bias.
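The bookkeeping that scattered printing demands can be sketched as follows. This is a minimal illustration, not our printing software: it assumes the grid is divided into as many horizontal bands as there are replicates and places one replicate of each amplicon at a random position in each band, so that repeats fall in different regions. The function names, the banding scheme, and the fixed seed are assumptions.

```python
import random


def scattered_layout(genes, replicates, rows, cols, seed=0):
    """Map each gene to one (row, col) spot in each of `replicates` row bands."""
    if rows % replicates != 0:
        raise ValueError("rows must divide evenly into one band per replicate")
    band_height = rows // replicates
    if band_height * cols < len(genes):
        raise ValueError("each band is too small to hold every gene once")
    rng = random.Random(seed)
    layout = {gene: [] for gene in genes}
    for band in range(replicates):
        cells = [(r, c)
                 for r in range(band * band_height, (band + 1) * band_height)
                 for c in range(cols)]
        rng.shuffle(cells)  # random position within this band
        for gene, cell in zip(genes, cells):
            layout[gene].append(cell)
    return layout


def position_index(layout):
    """Invert the layout so each printed position maps back to its gene."""
    return {pos: gene for gene, spots in layout.items() for pos in spots}
```

The inverted index is what analysis software would need in order to regroup the scattered repeats of each amplicon after scanning.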
Although the spots of microarrays printed onto membranes affixed to glass slides are usually colorless, it is possible to monitor quality to detect gross errors in printing, such as missing, smeared, or non-uniform spots; immediately before a batch of microarrays is printed, a colored dye can be used to print a test array of dots onto a membrane. Microscopic visual inspection of the spots enables any necessary adjustments to be made to the robot before sample printing begins. While large batches of chips are being printed, quality can be monitored by inserting poly-L-lysine-coated glass slides among the membrane platforms. Unlike membranes, the clear surface of glass slides permits the researcher to see printed spots by breathing on the slide and viewing it through a transmitted light source.
Once the probes are printed on the membranes, they are cross-linked to the microarray to permit better attachment of the nucleic acids to the substratum; probes are denatured by boiling the membranes before hybridization.
| Target Labeling |
|---|
Using commercially available digoxigenin (DIG)-labeled dUTP (Roche, Palo Alto, CA) and alkaline phosphatase (AP)-conjugated anti-DIG antibody (27), we have developed a new application for DIG in microarrays (28). In our method, the cDNA to each donor RNA is synthesized with a DIG-labeled base. Following hybridization of the DIG-labeled target with the probes, positive reactions are revealed by incubating with anti-DIG antibody conjugated to AP, and subsequent staining with Nitro-blue-tetrazolium/5-Bromo-4-chloro-3-indolyl phosphate (NBT/BCIP, Roche) to detect AP (29). Taking advantage of the fact that two complementary nucleotide strands can hybridize with each other, we generate microarray results by quantifying the signal obtained from the labeled targets bound to the immobilized probes. Thus the positive loci are visible as bluish spots, easily identified as round deposits for each positive locus. The final detection is revealed as a matrix of many round dots of varying intensity of staining.
| Microarray Inventory |
|---|
| Image Acquisition and Data Processing |
|---|
Array Normalization and Background Subtraction
As our arrays are based on a colorimetric detection method, a high-resolution scanner is used to scan them into digital images. Before a normal office scanner is used, it is important to ensure that it digitizes accurately without transforming the image (26). If the image is transformed by the scanner, a mathematical correction should be applied to the result. Following acquisition, the digitized images can be background-subtracted and normalized as desired. We have developed a software program, GeneAnalyzer, which accomplishes background subtraction, array normalization, and quantification. When colorimetric microarrays are analyzed, several types of background must be considered; for instance, regional background subtraction is useful when the background varies within an array, whereas global background subtraction is suitable when the background value is constant within arrays but variable between arrays, as a result of experimental conditions. To support interarray comparison, arrays may be normalized by several methods, including reference to housekeeping gene levels and median chip values. However, investigators should think carefully about the effects of performing such background subtraction or normalization before they start analyzing their results; they should especially consider the effect of background subtraction on diminishing the signal of low-expression genes.
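The distinction between global and regional background subtraction, and between the two normalization references mentioned above, can be made concrete with a short sketch. It is not the GeneAnalyzer implementation; the function names, the dictionary-based data layout, and the choice to clip negative values at zero are assumptions made for illustration.

```python
from statistics import median


def subtract_global_background(spots, background):
    """Subtract one array-wide background value from every spot, clipping at zero."""
    return {gene: max(value - background, 0.0) for gene, value in spots.items()}


def subtract_regional_background(spots, regions, regional_background):
    """Subtract the background measured in each spot's own region of the array."""
    return {gene: max(value - regional_background[regions[gene]], 0.0)
            for gene, value in spots.items()}


def normalize_to_housekeeping(spots, housekeeping_genes):
    """Express every spot relative to the mean housekeeping-gene signal."""
    reference = sum(spots[g] for g in housekeeping_genes) / len(housekeeping_genes)
    if reference <= 0:
        raise ValueError("housekeeping reference must be positive")
    return {gene: value / reference for gene, value in spots.items()}


def normalize_to_median(spots):
    """Express every spot relative to the median signal on the array."""
    reference = median(spots.values())
    if reference <= 0:
        raise ValueError("median reference must be positive")
    return {gene: value / reference for gene, value in spots.items()}
```

Note how the clipping step sets any spot at or below background to zero, which is one concrete way that background subtraction can erase the signal of low-expression genes.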
Software for Microarray Data Acquisition
In general, image acquisition and data analysis include the following processes: (a) image grabbing and digitizing; (b) image processing; and (c) data mining, including a qualitative and quantitative analysis of all digitized images, and a statistical analysis of data. We developed our software with a user-friendly interface and a limited number of preset functions, to enable researchers to analyze their own data. The main features of our program are as follows.
First, we provide users with a personal identification number, which secures their data and gives them access to the interactive functions of our web server facility. Second, users can upload their electronic images from remote sites over the Web. Third, our system processes the users' initial data to enhance the image profile, through standard computer software such as MATLAB. Fourth, our system supports the users' data archiving and database organization for the next stage of data analysis.
| Statistical Analysis and Data Mining |
|---|
Because the statistical analysis of microarrays presents a challenge to many biologists, it is recommended that a statistician be consulted as necessary. Statistical consultants can be extremely helpful, not only at the final stage of data analysis, but also at the initial experimental design step; for example, they may provide answers to fundamental questions, such as how many animals are needed to establish a statistically significant data analysis, or whether or not one may pool RNA samples.
Once microarray data are processed through statistical analysis, data points deemed truly "significant," that is, gene expression changes that result from an experimental physiological change, should be subjected to the next level of data analysis, now popularly termed the data mining process (30). Many established methods have been popularized among microarray users, including principal component analysis (31)(32)(33), hierarchical clustering (34)(35)(36)(37), multidimensional scaling (38)(39)(40), and self-organizing maps (41). In general, the selection of any of these methods is dependent on the individual investigator's preference and expertise. For example, GeneSpring software, sold by Silicon Genetics, Inc. (San Carlos, CA), and Significance Analysis of Microarrays from Stanford University (42), are preferred for many gene screening data mining tasks because they can analyze data generated by several different microarray platforms. These data mining software packages enable researchers to display their data in forms suitable for publication, easily conveying the essence of the results.
Following data mining, microarray data that seem to be significant should be validated by using one of two popular methods: Northern blotting or quantitative RT-PCR. In general, it is advisable that microarray data be validated by the selection of four or five randomly designated genes from each of three categories: those showing high, intermediate, and low levels of significant difference. Because we use amplicons to generate our probes, we can easily validate our results, using the same primers used to generate our amplicons by quantitative or semi-quantitative RT-PCR.
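The selection of validation candidates described above can be sketched in a few lines. How the high, intermediate, and low categories are delimited is not specified here, so the sketch assumes that the significant genes are ranked by the magnitude of their change and split into thirds; the function name, that split, and the fixed seed are illustrative assumptions.

```python
import random


def pick_validation_genes(changes, per_category=4, seed=0):
    """Randomly pick genes for validation from three ranked categories.

    changes maps each significant gene to the magnitude of its difference.
    """
    ranked = sorted(changes, key=lambda gene: abs(changes[gene]), reverse=True)
    third = len(ranked) // 3
    categories = {
        "high": ranked[:third],
        "intermediate": ranked[third:2 * third],
        "low": ranked[2 * third:],
    }
    rng = random.Random(seed)
    return {name: sorted(rng.sample(members, min(per_category, len(members))))
            for name, members in categories.items()}
```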
During data analysis, special consideration must be given to low-expression genes, which generally exhibit the greatest variance in expression levels. On any given microarray, these genes show very weak intensities, and in some cases they are barely visible above the background value. Here, standard global normalization and thresholding are not practical, because the signals are so weak. Often we find that global thresholding is too crude, allowing in one case the gene expression to be quantified as a gain, and in another case allowing the same gene expression to be quantified as a loss. One possible solution for this problem is to use "segmental thresholding," localized thresholding for each individual weak spot. Then the local background level is calculated against the global background level to obtain confidence level indices. The actual gene expression level for these low-abundance genes is then the "minithresholding level" divided by the confidence level. We realize that this is not a perfect solution; often we have to disregard these data points altogether.
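One possible reading of the segmental thresholding arithmetic is sketched below. The text does not define its terms precisely, so the sketch assumes that the "minithresholding level" is the spot intensity minus its own local background, and that the confidence level index is the ratio of local to global background. Both definitions, and the function name, are assumptions rather than the formulas used in our software.

```python
def segmental_threshold(spot_intensity, local_background, global_background):
    """Estimate the expression level of one weak spot.

    Assumed definitions (illustrative only):
      minithresholding level = spot intensity - local background, clipped at zero
      confidence level index = local background / global background
    """
    if local_background <= 0 or global_background <= 0:
        raise ValueError("background levels must be positive")
    mini_level = max(spot_intensity - local_background, 0.0)
    confidence = local_background / global_background
    return mini_level / confidence
```

Under these assumptions, a spot sitting in a region whose background is twice the global level has its locally thresholded signal halved, reflecting lower confidence in that measurement.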
| Conclusions |
|---|
As with all technological advances, the microarray approach is not an end in itself; it is just a beginning. Obviously, one wants to know whether the genes identified as significantly changed at the RNA level are truly manifested at the protein level. The recent explosion of proteomic technology testifies to the need for such follow-up to microarray data. Ultimately, gene expression microarray studies have to be followed with experiments to examine protein changes, thus permitting a comprehensive examination of gene expression changes from RNA to protein levels.
| Acknowledgments |
|---|
We express our sincere gratitude to Ms. Sherry Chen, Ms. Angel Wang, Mr. Keith Liang, Dr. Yih-Jing Tang, Dr. Nagathihalli Nagaraj, Dr. Bo Yu, and Ms. Jane Williams for their excellent technical assistance, and to Mr. Alan N. Bloch for proofreading this manuscript.
Received May 28, 2002
Accepted August 7, 2002
| References |
|---|