The field of functional genomics seeks to annotate genomic sequence with assignments of gene and regulatory element identities and locations, RNA variants and expression levels, and protein variants, abundance and interactions. As functional annotations accumulate, detailed and in-depth studies of specific genes and their products can be placed into a genomic context to reveal the molecular systems in which biological processes occur. In turn, those systems and processes can be studied using functional genomics techniques to identify the genes important for their regulation. One such technique is transcript profiling, the parallel quantitation of RNA abundance for thousands of genes. Transcript profiling can be open-ended, using methods that do not rely on previous knowledge of gene sequences, or closed to a defined set of probes for known RNAs. There are also ways to combine both approaches by starting with an open-ended functional enrichment of transcribed sequences and using them to create probe collections for further profiling before determining their identities.

DNA microarrays have become an established method for generating transcript profiles using probe collections. Like northern blot probes, microarray probes have a known (or knowable) sequence and concentration, but are immobilized in an array on a surface substrate at defined locations. Probes may be PCR products from cDNA clones, long oligomers that are pre-synthesized and then printed as an array, or short oligomers synthesized directly on the array substrate. The target, derived from a complex pool of known and unknown RNAs at unknown concentrations, is labeled with a detection reagent and hybridized to the microarray of probes. Quantitation of the resulting signal from each probe + target reveals the relative RNA abundance for each targeted gene.

Affymetrix GeneChips are commercially available microarrays appropriate for transcript profiling of samples from a number of species. Probe sequences are chosen from the public UniGene databases and synthesized by photolithography directly on the array surface. Depending on the array type, eleven to twenty independent probes are chosen for each unique UniGene accession and synthesized as 25 nucleotide oligomers. A mismatch probe, identical to the perfect match sequence except for a single incorrect base in the middle of the oligomer, is synthesized adjacent to the perfect match probe feature. Hybridization conditions and probe compositions are such that the target sequence should hybridize to the perfect match but not the mismatch probe. Signals from the mismatch features are subtracted from the perfect matches as background or non-specific binding, or in cases of excessive background the gene is labeled as undetected. Separate analysis algorithms are used to make a detected/not detected call and assign a quantitative signal for each targeted gene. Confidence values for the detection calls are based on performance consistency across all the probes synthesized for each gene. Affymetrix maintains databases of target and probe sequences, annotations, and documentation that are readily available to GeneChip users (see

The GeneChip manufacturing strategy can place nearly 500,000 features, each 18 microns across, in an area of about 1 sq. cm. New version GeneChips are now available with 11 micron features and over one million probes per array. The microarray is enclosed in a cassette that contains hybridization, wash and stain solutions and provides a window for scanning. RNA samples to be assayed are converted to cDNA, linearly amplified by in vitro transcription (IVT) that incorporates biotinylated nucleotides, fragmented and hybridized to the microarray. Low and high stringency washes are followed by staining with streptavidin-phycoerythrin and biotinylated anti-streptavidin. A confocal laser scanner quantitates fluorescence emission at each feature, and the average signal from two consecutive scans is computed.

Custom designed arrays can be printed on nylon membranes or coated glass slides. Probe collections were at first PCR amplification products from cDNA templates, but now can include PCR products from BACs, 60-70mer oligonucleotides, or shorter derivatized oligomers that specifically attach to coated surfaces. Robotic printers routinely create 150-micron features on glass substrates and can accurately place more than 20,000 features on a substrate the size of a microscope slide. Target pools can be radiolabeled for hybridization to membrane arrays or be directly or indirectly (with or without IVT amplification) labeled with fluorescent dyes for competitive hybridization to glass microarrays. In the latter case, two target samples are separately labeled with Cy3 or Cy5 dye, combined equally in one hybridization, and the fluorescence ratio at each feature is detected by a dual channel scanner.
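
The fluorescence ratio from a two-color hybridization is commonly expressed on a log2 scale so that equal fold changes in either direction are symmetric around zero. The following is a minimal sketch of that calculation, not the Facility's analysis pipeline; the function name, the `floor` parameter and the intensity values are all hypothetical.

```python
import math

def log2_ratio(cy5_signal, cy3_signal, floor=1.0):
    """Return log2(Cy5/Cy3) for one feature.

    `floor` is a hypothetical lower bound that keeps very low or
    negative background-corrected signals from breaking the logarithm.
    """
    cy5 = max(cy5_signal, floor)
    cy3 = max(cy3_signal, floor)
    return math.log2(cy5 / cy3)

# Hypothetical background-corrected intensities (Cy5, Cy3) per feature.
features = {
    "probe_A": (2000.0, 500.0),   # more signal in the Cy5-labeled sample
    "probe_B": (300.0, 1200.0),   # more signal in the Cy3-labeled sample
    "probe_C": (800.0, 800.0),    # equivalent signal in both samples
}

for name, (cy5, cy3) in features.items():
    print(name, round(log2_ratio(cy5, cy3), 2))
```

A positive value indicates greater abundance in the Cy5-labeled sample, a negative value greater abundance in the Cy3-labeled sample, and a value near zero equivalent abundance.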

Affymetrix GeneChips offer the highest probe densities currently available, targeting all or nearly all known expressed genes for several organisms with multiple probes per gene. The platform is consistent and robust, and provides fast turnaround from RNA to data. GeneChips are expensive, usually costing $375-$425 per array (single use), not including target preparation. Probe design is controlled by Affymetrix and relies on public sequence compilations, although previous design errors have prompted more extensive quality control for probe picking. GeneChips can be customized from existing probe sets or using a client's target sequences, but the setup charges and bulk purchasing requirements make this appropriate only for high volume applications. Glass slide microarrays printed in-house are highly flexible, both for probe composition and target labeling strategies. Per-assay costs (not including probe set development) are cheap; commercial glass substrates are less than $15 each and two samples can be labeled and tested on one array for one-quarter the cost of a single GeneChip. Producing probe sets is costly and/or laborious. If probes are PCR products, an assembly line of library management, plasmid propagation, PCR amplification and probe quality control must be constructed. Commercial oligomer sets offer the advantage of skipping these steps, but require a large initial investment for probe synthesis although the yield is usually sufficient for creating thousands of microarrays. Many transcript profiling projects have adopted a strategy of screening key samples with GeneChips, then using a liberal collection of candidates to develop a focused glass slide microarray for profiling more treatments, time points or replicates.

The Penn Microarray Facility provides instrumentation and expertise for RNA transcript profiling. The Facility primarily supports two microarray formats: oligonucleotide arrays synthesized by Affymetrix and arrays of cDNAs or oligomers printed in-house on glass slides. This reflects our goal of offering a range of cost and performance options suitable for a variety of experimental questions. Researchers from the University and its affiliated institutions are invited to utilize the Microarray Facility as part of their functional genomics efforts, and projects from non-affiliated institutions will be considered by special arrangement. All projects are initiated only after consultation with the facility director; this ideally occurs during the experimental design stage to ensure maximal and meaningful results.

Array Examples

This Affymetrix GeneChip contains oligomer probes for more than 12,000 mouse genes arrayed on a surface of about one square centimeter. Each gene is represented as a set of up to 20 independent oligomers, synthesized by photolithography at 20 different locations or features throughout the array. Every probe is accompanied by a mismatch oligomer to detect background or cross-hybridization, synthesized in a feature adjacent to the perfect match probe. An estimate of the total number of features in this image, therefore, is 12,000 genes × 40 features per gene (20 perfect match/mismatch probe pairs), or 480,000 features, not counting control genes and markers.
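
The feature estimate above can be reproduced with a short calculation. This is only a sketch of the arithmetic; the function name is hypothetical, and 20 probes per gene is the upper bound quoted in the text, so the result is an upper estimate that excludes control features.

```python
def estimated_feature_count(genes, probes_per_gene, features_per_probe=2):
    """Estimate array features: each probe occupies a perfect match
    feature plus an adjacent mismatch feature (2 features per probe)."""
    return genes * probes_per_gene * features_per_probe

# 12,000 genes, up to 20 probes per gene, perfect match + mismatch.
print(estimated_feature_count(12_000, 20))  # 480000
```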

This view of a portion of the Affymetrix Mouse GeneChip is centered on a grid of 16x16 features. Each feature is 20 micrometers across, about one-fourth the width of a human hair. The grid is surrounded by probe pairs for mouse genes; note that the upper perfect match probe usually gives much greater hybridization signal than its mismatch counterpart directly beneath.
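
The relationship between perfect match (PM) and mismatch (MM) signals can be illustrated with a simplified probe set summary. This sketch is not the Affymetrix algorithm, which uses separate and more elaborate procedures for detection calls and signal values; the function name, the `min_fraction` threshold and the intensities are assumptions made for illustration.

```python
from statistics import mean

def probe_set_summary(pm, mm, min_fraction=0.5):
    """Summarize one probe set from paired PM and MM intensities.

    Returns (detected, signal). The gene is called detected when the
    fraction of pairs with PM > MM reaches `min_fraction` (a
    hypothetical cutoff); the signal is the mean PM - MM difference,
    or None when the gene is called undetected.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    diffs = [p - m for p, m in zip(pm, mm)]
    fraction_pm_higher = sum(d > 0 for d in diffs) / len(diffs)
    detected = fraction_pm_higher >= min_fraction
    signal = mean(diffs) if detected else None
    return detected, signal

# Hypothetical intensities for a three-pair probe set.
print(probe_set_summary([100, 200, 300], [50, 100, 150]))
```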

A composite image of a probe microarray printed on a glass microscope slide. PCR fragments were deposited in 1 nl aliquots, producing features about 120 microns in diameter. These probes were used for competitive hybridization of two samples, one labeled with the dye Cy3 and the other Cy5. The ratio of labeled target hybridizing to each probe is reflected by its color: more green means higher amounts of the targeted RNA in sample 1, more red for greater abundance in sample 2, and yellow for equivalent amounts in both samples.

Tests Performed by Facility

Apo E Genotyping
Apolipoproteins are proteins that bind and transport lipids in blood and tissues. The complex of protein and lipid is a lipoprotein. Several major apolipoproteins have been described which differ in their structure, physicochemical behavior, function and distribution. Apolipoprotein E (ApoE) binds to the LDL receptor and serves as a ligand for receptor-mediated endocytosis.
Genetic defects of apolipoproteins cause various abnormalities in lipid metabolism and increased susceptibility to heart disease. The ApoE gene spans 3.7 kb including 4 exons and is located on chromosome 19. Three common alleles of ApoE (e2/e3/e4) have been identified and are expressed codominantly to generate 6 possible genotypes. These 3 alleles differ from each other by single nucleotide substitutions at codons 112 and 158 and are distinguished by cysteine and/or arginine at these positions. ApoE allelic frequencies vary among different populations.
The principle of ApoE genotyping is restriction isoform genotyping by PCR gene amplification and cleavage with the enzyme HhaI (Tsukamoto et al, 1993). Since nucleotide substitutions in the 3 common alleles alter 2 HhaI cleavage sites, all genotypic combinations can be distinguished.
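
The allele logic can be sketched in a few lines. The residue assignments used here (e2: Cys112/Cys158, e3: Cys112/Arg158, e4: Arg112/Arg158) are the standard definitions of the three common alleles and are not spelled out in the text above; the function names are hypothetical, and the sketch does not model HhaI fragment sizes.

```python
from itertools import combinations_with_replacement

# Standard residue assignments at codons 112 and 158 (an addition to
# the text above, which only says the alleles differ by Cys/Arg).
ALLELE_BY_RESIDUES = {
    ("Cys", "Cys"): "e2",
    ("Cys", "Arg"): "e3",
    ("Arg", "Arg"): "e4",
}

def allele_from_residues(codon_112, codon_158):
    """Map the residues at codons 112 and 158 to a common ApoE allele."""
    try:
        return ALLELE_BY_RESIDUES[(codon_112, codon_158)]
    except KeyError:
        raise ValueError("not one of the three common ApoE alleles")

def possible_genotypes(alleles=("e2", "e3", "e4")):
    """Enumerate unordered allele pairs (codominant expression)."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]

print(allele_from_residues("Cys", "Arg"))  # e3
print(possible_genotypes())
```

Enumerating unordered pairs of the three alleles gives the six genotypes mentioned above: three homozygous and three heterozygous combinations.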

Example of ApoE Genotyping

Donor Lymphocyte Infusion Analysis
To evaluate engraftment after donor lymphocyte infusion and to correlate the assay results with the clinical outcome, polymorphic microsatellite loci are amplified by polymerase chain reaction (PCR) from donor and pre-infusion recipient DNA using three sets of fluorescently labeled multiplex or monoplex primers (Powerplex, CTTv, FFFL, and GSTR; GenePrint from Promega). A total of 12 loci are used for pre-infusion or pre-transplant evaluation. These short tandem repeat (STR) loci consist of repetitive sequence elements of 3 to 7 base pairs in length. These abundant repeats are distributed throughout the human genome and are a rich source of highly polymorphic markers, which often may be detected using PCR. After amplification, the fluorescent PCR products are analyzed by capillary electrophoresis on an ABI 3100 automated DNA sequencer to determine the informative allele. Once the informative allele for a donor and pre-infusion recipient pair is established, that particular locus is used to track the ratio of donor and recipient cells after infusion. Data are analyzed using ABI Genescan 3.7 software, and PCR product sizes and peak areas are assigned using Genotyper 3.7 software; results are reported as the percentage of donor signal relative to the total signal.
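
The two steps described above, finding informative alleles and expressing donor signal as a percentage, can be sketched as follows. This is a simplified illustration rather than the Genescan/Genotyper workflow: it assumes one donor-specific and one recipient-specific peak at the chosen locus, and the function names, allele values and peak areas are hypothetical.

```python
def informative_alleles(donor_alleles, recipient_alleles):
    """Return (donor_only, recipient_only) allele sets for one locus.

    A locus is informative for tracking when at least one allele is
    found in only one of the two individuals.
    """
    donor, recipient = set(donor_alleles), set(recipient_alleles)
    return donor - recipient, recipient - donor

def percent_donor(donor_peak_area, recipient_peak_area):
    """Donor signal as a percentage of donor plus recipient signal."""
    total = donor_peak_area + recipient_peak_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * donor_peak_area / total

# Hypothetical locus: alleles are repeat counts, areas are peak areas.
print(informative_alleles({9, 12}, {9, 10}))
print(percent_donor(7500, 2500))  # 75.0
```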

Strategy of Donor Lymphocyte Infusion Analysis

Examples of Donor Lymphocyte Infusion Analysis

Gene Therapy with Avigen Vector

AAV-mediated gene transfer to patient fluids can be demonstrated by PCR amplification of vector sequences present in DNA extracted from the specimen. Following DNA extraction to isolate DNA and to remove PCR inhibitors, DNA is amplified using primers that are specific for the vector sequences. Amplification products are analyzed by agarose gel electrophoresis and visualized by ethidium bromide staining.

Example of Gene Therapy w/Avigen Vector

Identity Testing

STR (short tandem repeat) loci consist of short, repetitive sequence elements of 3 to 7 base pairs in length. These abundant repeats are well distributed throughout the human genome and are a rich source of highly polymorphic markers which often may be detected using polymerase chain reaction. Alleles of these loci are differentiated by the number of copies of the repeat sequence contained within the amplified region and distinguished from one another using fluorescence detection following electrophoretic separation.
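
Because alleles are distinguished by repeat number, an identity comparison reduces to checking whether two samples carry the same alleles at each locus tested. The sketch below shows that comparison under simple assumptions; the function name, locus names and repeat counts are hypothetical, and a real analysis would also have to deal with stutter peaks, allele dropout and population statistics.

```python
def compare_profiles(profile_a, profile_b):
    """Compare two STR profiles locus by locus.

    Each profile maps a locus name to the pair of alleles (repeat
    counts) observed there. Returns the loci typed in both samples
    and the subset at which the allele pairs differ.
    """
    shared_loci = sorted(set(profile_a) & set(profile_b))
    mismatches = [
        locus for locus in shared_loci
        if sorted(profile_a[locus]) != sorted(profile_b[locus])
    ]
    return shared_loci, mismatches

# Hypothetical profiles: locus name -> (allele 1, allele 2).
sample_1 = {"locus_1": (8, 11), "locus_2": (6, 9), "locus_3": (12, 12)}
sample_2 = {"locus_1": (11, 8), "locus_2": (6, 7), "locus_3": (12, 12)}

shared, differing = compare_profiles(sample_1, sample_2)
print(shared)
print(differing)
```

Allele order within a locus is ignored, so `(8, 11)` and `(11, 8)` count as the same genotype; in this example only `locus_2` differs.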

Example of Identity Testing

T Cell Receptor V Analysis (CDR3 Spectratyping)
Through their T-cell receptor (TCR), T cells recognize small peptides that are presented at the cell surface by major histocompatibility complex (MHC) molecules. Peptide-binding specificity is determined by variations in the structure of the TCR. In addition to combinatorial diversity among multiple variable (V), diversity (D), and joining (J) gene segments, addition or deletion of nucleotides in the complementarity-determining region 3 (CDR3) during intra-thymic development also helps shape the TCR. In humans, 65 TCR variable (TCRV) gene segments have been identified by genomic DNA sequencing and classified into 30 TCRV families, ranging in size from one to nine members, based on >75% nucleotide sequence similarity. Of these, 23 families comprising 46 subfamilies are functional, whereas the rest are pseudogenes. TCRV transcripts can be identified by RT-PCR, but multiple reactions are required to detect all genes of the TCRV families. This assay is performed by multiplexing PCR reactions for the 46 functional genes comprising 23 TCRV families in 5 reactions, each containing 4 to 7 specific primers together with a fluorescence-tagged TCRV constant region primer. Between 8 and 10 distinct subtypes within the 23 TCRV families can be identified by analysis of the CDR3 length.