Biochemical Activities and Assays

Assays Using Preintegration Complexes from Infected Cells

The earliest in vitro assays for integration used viral replication intermediates isolated from acutely infected cells as the source for both the integration machinery and the viral DNA substrate for integration (Brown et al. 1987, 1989; Bowerman et al. 1989; Farnet and Haseltine 1990; Lee and Coffin 1990; Kitamura et al. 1992; Pryciak and Varmus 1992a,b). These preintegration complexes are sufficiently stable that they can be substantially purified, allowing their composition and the requirements for their activity to be studied directly (Brown et al. 1987, 1989; Bowerman et al. 1989; Farnet and Haseltine 1991b; Bukrinsky et al. 1993b; Lee and Craigie 1994; Farnet and Bushman 1997). The most sensitive assays have used either genetic selection (Brown et al. 1987) or the polymerase chain reaction (Kitamura et al. 1992; Pryciak and Varmus 1992b) to amplify the recombinant products specifically, allowing their detection and characterization (Fig. 5A,B). The polymerase chain reaction, using primer pairs that recognize specific sequences in the target and the viral DNA ends, respectively, is particularly useful in this setting, because it provides a convenient and accurate way to map the distribution of integration events in a target sequence (Grandgenett and Vora 1985; Kitamura et al. 1992; Pryciak et al. 1992a; Pryciak and Varmus 1992b; Withers-Ward et al. 1994; Smith et al. 1995). When the integration products are sufficiently abundant, simpler and more quantitative but less sensitive methods, including filter blot hybridization to detect products resolved by gel electrophoresis, can be used (Fujiwara and Mizuuchi 1988; Brown et al. 1989; Farnet and Haseltine 1990; Lee and Coffin 1990).
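The mapping logic behind the PCR assay can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not a protocol from the cited studies: it assumes one primer anneals at a known position in the target, the other anneals within the viral DNA end, and the amplicon length has been read from a gel. The function names, the 0-based coordinate convention, and the example numbers are all hypothetical.

```python
from collections import Counter


def junction_position(amplicon_len, target_primer_start, viral_segment_len):
    """Estimate the target coordinate of a viral-target DNA junction.

    amplicon_len        -- length of the PCR product in bp (measured)
    target_primer_start -- 0-based target coordinate of the 5' end of the
                           target-specific primer (assumed known)
    viral_segment_len   -- bp of viral DNA in the product, from the 5' end
                           of the viral-end primer to the joined terminus
    """
    target_segment_len = amplicon_len - viral_segment_len
    if target_segment_len <= 0:
        raise ValueError("amplicon is not longer than its viral segment")
    # The amplified target segment starts at the primer and ends at the junction.
    return target_primer_start + target_segment_len


def site_distribution(amplicon_lengths, target_primer_start, viral_segment_len):
    """Tally inferred junction positions over a set of amplicon lengths."""
    return Counter(
        junction_position(n, target_primer_start, viral_segment_len)
        for n in amplicon_lengths
    )


if __name__ == "__main__":
    # Hypothetical gel readings (bp) for three products.
    print(site_distribution([250, 250, 300], target_primer_start=100,
                            viral_segment_len=50))
```

Each distinct product length corresponds to one inferred junction position, so a ladder of bands translates directly into a distribution of integration sites along the target.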

Figure 5. Assays for preintegration complexes and purified integrase. (A) Genetic assay for integration (Brown et al. 1987). Preintegration complexes are isolated from cells infected with an MLV strain carrying the E. coli SupF amber suppressor tRNA. Incubation ...

Assays with Purified Enzyme and Substrates

Specific Endonuclease: 3′-end Processing

This activity (Fig. 5C), corresponding to the first step catalyzed by integrase in the integration pathway, is usually and most conveniently assayed by using short synthetic oligonucleotide substrates that mimic the ends of the viral DNA molecule (Katzman et al. 1989; Craigie et al. 1990; Sherman and Fyfe 1990), although longer substrates have been used (Vora et al. 1990). Gel electrophoresis is routinely used to resolve the products from the substrate. Typically, the 5′end of the strand corresponding to the viral DNA 3′end is radioactively labeled to allow sensitive detection. Alternatively, the dinucleotide product released from the 3′end can be detected by labeling the 3′end of the processed strand of the substrate (Engelman et al. 1991; Vink et al. 1991b). This latter assay has the virtue of resolving several alternative cleavage products, resulting from the use of either water, glycerol or other alcohols, or the terminal 3′-OH of the viral DNA as the attacking nucleophile. The need for a high-throughput assay to screen for integrase inhibitors as candidate antiviral drugs has led to assays for 3′-end processing that do not require gel electrophoresis to detect products (Fitzgerald et al. 1991; B. Muller et al. 1993).

Nonspecific Endonuclease

An assay for a nonspecific endonuclease activity was the earliest to detect any enzymatic activity in purified preparations of integrase (Grandgenett et al. 1978). The significance of this activity and its relationship to the highly specific endonuclease activity that normally processes the viral 3′ends remain unclear, however. The simplest assays for this activity monitor conversion of a supercoiled, covalently closed circular DNA molecule into a relaxed, nicked form, using agarose gel electrophoresis to resolve the two species (Grandgenett et al. 1978; Sherman and Fyfe 1990). Alternative assays have used end-labeled linear single-stranded or double-stranded DNAs or primer-extension assays to detect and map cleavage sites (Duyk et al. 1983, 1985; Grandgenett and Vora 1985; Grandgenett et al. 1986; Cobrinik et al. 1987). The nonspecific endonuclease activity is stimulated by Mn++ and strongly favors supercoiled or single-stranded substrates over relaxed duplex substrates (Duyk et al. 1983, 1985; Grandgenett and Vora 1985; Cobrinik et al. 1987; Sherman and Fyfe 1990), perhaps reflecting the same specificity for frayed DNA structures that is observed in the specific 3′-end processing reaction (Scottoline et al. 1997). Integrase can also promote nonspecific alcoholysis of nonviral DNA substrates (Katzman and Sudol 1996). Integrase's nonspecific endonuclease activity can present a technical complication in experiments using supercoiled DNA molecules as substrates.

DNA Joining

The DNA joining step of the integration process (Fig. 5D), sometimes called the strand-transfer step, can readily be assayed using oligonucleotide substrates in the same system typically used to assay 3′-end processing (Craigie et al. 1990). A single species of double-stranded oligonucleotide, designed to mimic the viral DNA end, can provide both the viral DNA end and target DNA substrates in the reaction. Typically, the strand that corresponds to the 3′end of the viral DNA is radioactively labeled at its 5′end, to facilitate detection of the products. It is often convenient to bypass the 3′-end processing step by using synthetic substrates which lack the two nucleotides that are removed in the end-processing reaction. An oligonucleotide that differs from the viral DNA end substrate, or a larger DNA molecule (e.g., a plasmid), can also serve as the target (Craigie et al. 1990; Bushman and Craigie 1991; Leavitt et al. 1992; Pryciak and Varmus 1992b; Carteau et al. 1993c). The integration products can be resolved by gel electrophoresis and detected on the basis of their characteristic size (Craigie et al. 1990), specifically amplified in a polymerase chain reaction (Pryciak and Varmus 1992b) or captured on a solid substrate, either by using an immobilized oligonucleotide as a model viral DNA end (Hazuda et al. 1994a) or by using a ligand such as biotin on one of the substrate DNAs as a handle to capture the product (Craigie et al. 1991; Vink et al. 1994b). The capture assays are especially convenient for high-throughput screens for inhibitors.

Concerted Integration of Two Viral DNA Ends

Assays using only oligonucleotide substrates do not distinguish between the concerted integration of two viral DNA ends into a DNA target and independent integration of a single end. Indeed, the predominant product of the model reactions using short model substrates and purified integrase appears to consist of a single viral DNA end joined to one strand of a target DNA duplex. Use of a circular DNA molecule as a target can allow these “single-end” integration products to be resolved by gel electrophoresis from products produced by concerted integration of two viral DNA ends (Craigie et al. 1990). Genetic assays using “mini-viral DNA” substrates, in which a selectable marker is flanked by model viral DNA ends, are highly sensitive and specifically detect concerted integration products (Fujiwara and Craigie 1989; Bushman et al. 1990; Katz et al. 1990; Fitzgerald et al. 1992). However, for most routine purposes, genetic assays for mini-viral DNA integration have been supplanted by more convenient physical assays (Vora et al. 1994).

Disintegration

Integrase can reverse the DNA joining reaction (Fig. 5E), releasing a viral DNA end, and restoring the continuity of the target DNA strand, using a model substrate that mimics the product of integration of a single viral DNA 3′end into one strand of an oligonucleotide target (Chow et al. 1992). This reverse reaction has been called disintegration. Integrase can also mediate a DNA cleavage-ligation reaction on a variant of the disintegration substrate in which the entire viral DNA portion is single-stranded, or even replaced by a single nucleotide, an activity termed DNA splicing because of its formal resemblance to an RNA-splicing reaction (Chow et al. 1992). As exemplified by the DNA-splicing activity, the substrate sequence and structure requirements for disintegration are much less stringent than those for 3′-end processing and DNA joining (Chow et al. 1992; Sherman et al. 1992; Donzella et al. 1993; Jonsson et al. 1993a; Chow and Brown 1994b). Similarly, many genetic variants of integrase that lack detectable activity in the 3′-end processing and DNA-joining assays retain the ability to catalyze disintegration, facilitating functional studies of mutant integrases (Engelman and Craigie 1992; van Gent et al. 1992; Vincent et al. 1993). This assay has therefore been valuable in locating the catalytic domain (Vink et al. 1993; Engelman et al. 1994; Hickman et al. 1994; Bujacz et al. 1995). Although a role in the final 5′-end joining step of integration has been suggested (Chow et al. 1992; Kulkosky et al. 1995; Roe et al. 1997), it remains to be determined whether the disintegration or DNA-splicing activities of integrase play any part in vivo.

DNA Binding

Integrase binds DNA (Grandgenett et al. 1978). Two functionally distinct DNA-binding sites are implicated in the DNA-joining reaction, one for the viral DNA end and a second for the target DNA, but it has been difficult to distinguish these sites in a simple DNA-binding assay. Nonspecific DNA can compete effectively with viral DNA for binding to its cognate site, inhibiting the 3′-end processing reaction, although it cannot replace the viral DNA as a substrate for that reaction (LaFemina et al. 1991; van Gent et al. 1991; Dotan et al. 1995). This suggests that the site which normally serves to bind the viral DNA end has a similar affinity for DNA molecules that have no overt similarity to viral DNA. Alternatively, an allosteric interaction between DNA-binding sites might prevent an unoccupied viral DNA-binding site from binding to a viral DNA end when the target DNA-binding site is filled. Assays that have been used to measure, characterize, and map the DNA-binding activity of integrase include nitrocellulose filter binding of labeled DNA (Knaus et al. 1984; Terry et al. 1988; Khan et al. 1991; Schauer and Billich 1992), affinity chromatography (Mumm and Grandgenett 1991), UV crosslinking of the enzyme to labeled DNA substrates (Drelich et al. 1993; Engelman et al. 1994; Hazuda et al. 1994b; Lutzke et al. 1994; Yoshinaga et al. 1994; Dotan et al. 1995), precipitation of integrase-DNA complexes by divalent cations (van Gent et al. 1991), and Southwestern blotting, in which integrase polypeptides resolved by gel electrophoresis are transferred to a membrane and bind radioactively labeled DNA molecules (Roth et al. 1990; Vink et al. 1993). Gel mobility shift assays have been attempted by numerous groups, but the results, mostly unpublished, have been contradictory and controversial, perhaps reflecting the heterogeneity both in the stoichiometry of integrase multimers and in modes of DNA binding (Basu and Varmus 1990, 1991; Krogstad and Champoux 1990; van Gent et al. 1991; Vincent et al. 1993; Hazuda et al. 1994b).

Interactions between Integrase Protomers

Integrase protomers interact to form dimers and higher-order multimers (Grandgenett et al. 1978; Jones et al. 1992; Engelman et al. 1993; Kalpana and Goff 1993; van Gent et al. 1993b; Vincent et al. 1993; Chow and Brown 1994a; Dyda et al. 1994; Hazuda et al. 1994b; Hickman et al. 1994; Ellison et al. 1995; Jonsson et al. 1996). The protein-protein interactions that mediate dimerization appear to be distinct from those that mediate higher-order multimerization (Engelman et al. 1993; van Gent et al. 1993b; Hickman et al. 1994; Andrake and Skalka 1995; Ellison et al. 1995; Jenkins et al. 1996; Zheng et al. 1996). Diverse assays have been used to detect and study these interactions, their requirements, and the structures of the multimeric forms: (1) gel-exclusion chromatography (Vincent et al. 1993), (2) sedimentation velocity (Sherman and Fyfe 1990; Vincent et al. 1993), (3) sedimentation equilibrium (Jones et al. 1992; Hickman et al. 1994), (4) chemical crosslinking of protomers in integrase multimers (Engelman et al. 1993), (5) “far Western” blot analysis, in which a labeled preparation of integrase is incubated with a membrane onto which various forms of integrase have been transferred from SDS-PAGE, (6) in vitro complementation between different defective variants of integrase (Engelman et al. 1993; van Gent et al. 1993b; Ellison et al. 1995), and (7) the “two-hybrid” genetic assay in Saccharomyces cerevisiae (Chien et al. 1991; Kalpana and Goff 1993).

Biosynthesis of Integrase

The integrase protein is encoded by sequences at the 3′end of the pol gene, immediately downstream from the sequences encoding reverse transcriptase, in the same uninterrupted reading frame (Chapter 2). Integrase is thus initially synthesized as the carboxy-terminal part of the Gag-Pol polyprotein, fused at its amino terminus to the carboxy-terminal portion of reverse transcriptase, the RNase H domain. This organization, although apparently conserved among all retroviruses, is slightly altered in the closely related yeast retrotransposons Ty1 and Ty2, in which the order of the integrase and RT domains in the polyprotein precursor is reversed (Chapter 8). In general, the carboxyl terminus of the retroviral integrase polypeptide coincides with the carboxyl terminus of the Gag-Pol polyprotein precursor. An exception is the ASLV Gag-Pol polyprotein, which includes an additional 37-amino-acid polypeptide fused to the carboxyl terminus of the integrase domain. This small peptide, which appears to be nonessential for viral replication, is cleaved from integrase during proteolytic maturation of the virion (Grandgenett et al. 1985; Alexander et al. 1987; Katz and Skalka 1988; Horton et al. 1991).

The fact that integrase is synthesized as part of the Gag-Pol polyprotein has several corollaries. As with the other Pol proteins, its incorporation into the virion is passively determined by the Gag portion of the precursor. The viral protease cleaves the junction between reverse transcriptase and integrase in the maturation process that follows virion assembly. The stoichiometry of integrase protomers in the virion is 1:1 with reverse transcriptase protomers, or approximately 50–100 protomers per viral particle (Panet et al. 1975; Krakower et al. 1977; see Chapter 2). The quaternary structure of the Gag-Pol precursor in the virion is not known, but the dimeric structure of viral proteases, and of the reverse transcriptases of many retroviruses, may reflect an early dimerization of the Gag-Pol precursor, which could in turn contribute to assembly of integrase dimers (see below).

Interestingly, the ASLV reverse transcriptase dimers are principally found as αβ heterodimers, with the α-subunit containing the polymerase and RNase H domains of a typical reverse transcriptase polypeptide, and the β-subunit identical to an α-subunit with integrase fused to its carboxyl terminus, i.e., a polypeptide in which the protease cleavage site between integrase and the α-subunit is left uncut (Chapter 4). No polypeptide equivalent to β is detectable in mature MLV or HIV virions (Hu et al. 1986; Lightfoote et al. 1986; Tanese et al. 1986). The αβ form of ASLV reverse transcriptase, and the less abundant ββ dimers that can also be isolated from ASLV virions, appear to be deficient in integrase activity, and their role in integration in vivo, if any, remains to be established (Grandgenett et al. 1980; Katzman et al. 1989). Their existence, however, suggests that the folding and assembly of reverse transcriptase and integrase in vivo may be closely linked. Moreover, the apparent stabilization of ASLV reverse transcriptase by the integrase domain of the β-subunit suggests that there may be intimate contact between integrase and reverse transcriptase domains (Hizi and Joklik 1977; Grandgenett et al. 1980; Katzman et al. 1989).

A similar inference can be drawn from the observation that a significant fraction of the integrase protein isolated from MLV virions appears to be linked by disulfide bonds to reverse transcriptase, under some conditions of isolation and immunoprecipitation (Hu et al. 1986) but not others (Tanese et al. 1986). Although the existence of disulfide linkages between integrase and reverse transcriptase in vivo is doubtful, their ready formation in vitro suggests a propensity for close physical association between the two proteins. The final stage of DNA synthesis by reverse transcriptase generates the very DNA substrate that is acted upon by integrase. A hypothetical interaction between the two enzymes could thus facilitate transfer of the completed product of reverse transcription to integrase. The proximity of integrase to reverse transcriptase in the immature virion, and the implied interface between the proteins, may presage or contribute to such an association. Nevertheless, phenotypically mixed mutant MLV virions, in which the only fully functional copies of integrase and reverse transcriptase are provided by separate Gag-Pol precursor polypeptides, appear to have little difficulty in completing reverse transcription and integration (Telesnitsky et al. 1993).

The avian retroviral integrase is unusual in that it can undergo phosphorylation in vivo (Schiff and Grandgenett 1980). The phosphorylation occurs at a serine residue five amino acids from the carboxyl terminus of the protein (Horton et al. 1991) and does not appear to be important for its activity (Terry et al. 1988), nor does phosphorylation appear to be a general characteristic of retroviral integrases. The slight possibility that it might have a regulatory role has not been excluded. No other covalent modifications of integrase have been recognized, and the overall similarity between the properties of integrases purified from virions and recombinant integrases purified from unnatural expression systems argues against a key role for any such modifications, should they exist.

Primary Structure of Integrase

All of the retroviral integrase polypeptides that have been characterized to date are similar in size, typically about 300 amino acids and ranging from approximately 280 to 450 amino acids in length. Many features of the amino acid sequence are highly conserved among the integrases of retroviruses, retrotransposons, and even many prokaryotic transposable elements (Johnson et al. 1986; Fayet et al. 1990; Rowland and Dyke 1990; Khan et al. 1991; Engelman and Craigie 1992; Kulkosky et al. 1992; Doak et al. 1994; Polard and Chandler 1995). The most distinctive of the conserved features are a zinc-finger-like “HHCC” motif, found very near the amino terminus of all retroviral and retrotransposon integrases, and the “DD35E” motif, a universal feature of integrases and transposases defined by a set of three acidic residues with stereotyped spacing (Fig. 6) (Fayet et al. 1990; Rowland and Dyke 1990; Khan et al. 1991; Kulkosky et al. 1992). The sequence conservation among members of the integrase/transposase superfamily suggests that studies of structure-function relationships in other family members will contribute to our understanding of integrase.

Figure 6. Schematic of the domain structure of retroviral integrases. The three domains appear to be stably folded when prepared separately. The amino-terminal-most (HHCC) domain is characterized by pairs of histidine and cysteine residues that are universally ...

HHCC Domain

This feature is typically located in the amino-terminal 20% of the integrase protein (Fig. 6). It is characterized by two histidine and two cysteine residues with the stereotyped spacing H-X(3-7)-H-X(23-32)-C-X2-C (Johnson et al. 1986). The HHCC domain is essential for normal integrase activity in vitro (Khan et al. 1991; Drelich et al. 1992; Schauer and Billich 1992; van Gent et al. 1992; Jonsson and Roth 1993; Leavitt et al. 1993; Vincent et al. 1993; Bushman and Wang 1994; Ellison et al. 1995). Mutations in the conserved histidines and cysteines of HIV-1 integrase (Cannon et al. 1994; Engelman et al. 1995; Wiskerchen and Muesing 1995) and linker insertions disrupting this domain in MLV integrase (Donehower 1988; Roth et al. 1990) block integration, and thus replication, in vivo. The constellation of histidine and cysteine residues is reminiscent of zinc finger domains found in transcription factors and other proteins (Berg 1990). Indeed, the HHCC domain of HIV integrase can bind tightly to a single zinc ion, with a K_D that appears to be less than or equal to 10^-10 M (Burke et al. 1992; Bushman et al. 1993). The conserved histidine and cysteine residues appear to be essential for zinc binding (Bushman et al. 1993). Conversely, zinc appears to stabilize the folded structure of the amino-terminal domain of HIV-1 integrase (Burke et al. 1992; Zheng et al. 1996), and thereby to promote tetramerization of integrase protomers and enhance the enzyme's catalytic activity (Zheng et al. 1996). It is therefore probable that a zinc ion is an intrinsic and essential component of the enzyme.
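The stereotyped spacing H-X(3-7)-H-X(23-32)-C-X2-C translates directly into a pattern search over a protein sequence. The sketch below is illustrative only: it uses Python's standard `re` module, the function name `find_hhcc` is hypothetical, and the test sequence is synthetic filler built to satisfy the spacing rather than a real integrase sequence.

```python
import re

# H-X(3-7)-H-X(23-32)-C-X2-C written as a regular expression over
# one-letter amino acid codes; "." stands for any residue (X).
HHCC_PATTERN = re.compile(r"H(?P<gap1>.{3,7})H(?P<gap2>.{23,32})C.{2}C")


def find_hhcc(sequence):
    """Return (start, gap1_len, gap2_len) for each non-overlapping match.

    start is the 0-based index of the first histidine. Because the
    quantifiers are greedy, the longest admissible spacing is reported
    when more than one spacing would satisfy the pattern.
    """
    return [
        (m.start(), len(m.group("gap1")), len(m.group("gap2")))
        for m in HHCC_PATTERN.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Synthetic example: filler residues around an H-X3-H-X23-C-X2-C arrangement.
    synthetic = "M" * 5 + "H" + "A" * 3 + "H" + "A" * 23 + "C" + "AA" + "C" + "G" * 10
    print(find_hhcc(synthetic))
    # Replacing the final cysteine abolishes the match.
    print(find_hhcc(synthetic.replace("CAAC", "CAAS")))
```

A match shows only that the spacing is present. It says nothing about whether the domain folds or binds zinc, which the text attributes to experiment.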

Considerable effort has been devoted to searching for direct interactions between the HHCC domain and DNA, but no such interactions have been found. Indeed, mutant proteins that lack this domain retain the ability to bind DNA (Roth et al. 1990; Khan et al. 1991; Mumm and Grandgenett 1991; Schauer and Billich 1992; Woerner et al. 1992; Bushman et al. 1993; Vincent et al. 1993; Vink et al. 1993) and to recognize the conserved sequence features of the viral DNA end (Vincent et al. 1993; Katzman and Sudol 1995). Exchanging the HHCC domains between the integrases of HIV-1 and visna virus does not lead to a corresponding switch in viral DNA sequence specificity (Katzman and Sudol 1995). Moreover, mutations in the HHCC domain do not alter the target specificity of the residual integration activity (Leavitt et al. 1993), yet they dramatically alter the specificity of the enzyme for DNA substrates (Khan et al. 1991; Jonsson and Roth 1993; Vincent et al. 1993). HHCC-defective mutants can catalyze disintegration (the reversal of the DNA joining step of integration; Chow et al. 1992) with normal or even enhanced activity, but they are virtually devoid of 3′-end processing or integration activity when assayed using oligonucleotide substrates that mimic the viral DNA end (Engelman and Craigie 1992; van Gent et al. 1992; Bushman et al. 1993; Vincent et al. 1993; Vink et al. 1993). These mutants do not lack inherent ability to integrate a viral DNA end, as they can efficiently reintegrate the excised viral DNA end following disintegration (Vincent et al. 1993). Detailed comparison of the substrate specificity of the HHCC mutant enzymes with wild-type integrase in the disintegration reaction suggests that these mutants are defective in their ability to interact stably with portions of the viral DNA end internal to the conserved terminal sequences (Jonsson and Roth 1993; Vincent et al. 1993; Ellison et al. 1995). Thus, the HHCC domain appears to participate indirectly in stable binding of the enzyme to the viral DNA (Ellison et al. 1995).

Although the HHCC domain does not appear to be important for dimerization, it may have a key role in protein-protein interactions that mediate higher-order multimerization (Ellison et al. 1995; Zheng et al. 1996). The HHCC-domain-dependent multimerization may be necessary to allow the protein to interact with internal sites near the viral DNA ends (Ellison et al. 1995).

Catalytic Core/DDE

The active site of integrase is found in a protease-resistant central domain that comprises approximately half of the integrase polypeptide (Fig. 6) (Engelman and Craigie 1992). Deletion derivatives containing this core domain, but lacking either the HHCC domain or the carboxy-terminal approximately 100 amino acids (Vink et al. 1993), or both (Bushman et al. 1993; Kulkosky et al. 1995), retain vestigial ability to catalyze the disintegration reaction. The catalytic core domain is the most phylogenetically conserved contiguous region in the primary structure of integrases (Johnson et al. 1986; Fayet et al. 1990; Rowland and Dyke 1990; Khan et al. 1991; Engelman and Craigie 1992; Kulkosky et al. 1992; Doak et al. 1994). The distinguishing feature in the sequence of this domain is the motif, D-X(39-58)-D-X35-E, which is universally conserved among retroviral and retrotransposon integrases (Rowland and Dyke 1990; Khan et al. 1991; Engelman and Craigie 1992; Kulkosky et al. 1992). The same or related motifs can also be recognized in the transposases of diverse prokaryotic and eukaryotic transposable elements (Fayet et al. 1990; Rowland and Dyke 1990; Baker and Luo 1994; Doak et al. 1994). The presence of three acidic residues as the key features of the catalytic domain suggests that one or more divalent metal ions bound in the active site by these carboxyl groups might participate in catalyzing the phosphotransfer reactions mediated by integrase (Beese and Steitz 1991; Engelman and Craigie 1992; Kulkosky et al. 1992; Katayanagi et al. 1993).

Genetic evidence supports the view that the three acidic residues (D64, D116, and E152 in HIV-1 integrase) are critical components of the active site. All three residues are required for integration in vivo (LaFemina et al. 1992; Cannon et al. 1994; Shin et al. 1994; Taddeo et al. 1994; Engelman et al. 1995; Englund et al. 1995; Wiskerchen and Muesing 1995) and for catalytic activity in vitro (Engelman and Craigie 1992; Kulkosky et al. 1992; Drelich et al. 1993; Leavitt et al. 1993), although weak residual disintegration activity can be observed when the alternative acidic amino acid is substituted for D116 or E152 (Engelman and Craigie 1992). In the transposases of bacteriophage Mu or Tn7, mutations that lead to substitution of a cysteine for one of the homologous acidic residues dramatically alter the metal ion specificity of the enzyme, strongly suggesting that these residues coordinate a metal ion that participates in catalysis (Baker and Luo 1994; Sarnovsky et al. 1996). Moreover, substitutions for other amino acids in the immediate vicinity of D116 and E152 can alter the specificity of the active site of HIV-1 integrase for the nucleophile in the 3′-end processing reaction and for the metal ion cofactor (Engelman and Craigie 1992; van Gent et al. 1993a). In addition to its putative role in catalysis (or alternatively), the central aspartate of the DD35E triad (D116 of HIV-1 or HIV-2 integrase) may also have an important role in stable binding to DNA substrates (Drelich et al. 1993; Vink et al. 1994a).
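The residue numbers quoted for HIV-1 integrase can be checked against the D-X(39-58)-D-X35-E spacing by simple arithmetic: D64 and D116 are separated by 51 residues, and D116 and E152 by 35. The following sketch makes that check explicit and also scans a sequence for the motif. The function names are hypothetical, and the scanned sequence is synthetic filler with acidic residues placed at positions 64, 116, and 152, not the real HIV-1 sequence.

```python
import re

# D-X(39-58)-D-X35-E as a regular expression over one-letter codes.
DD35E_PATTERN = re.compile(r"D.{39,58}(?P<d2>D).{35}(?P<e>E)")


def spacing(d1, d2, e):
    """Number of residues lying between the three acidic positions."""
    return d2 - d1 - 1, e - d2 - 1


def fits_dd35e(d1, d2, e):
    """True if 1-based positions d1, d2, e satisfy D-X(39-58)-D-X35-E."""
    gap1, gap2 = spacing(d1, d2, e)
    return 39 <= gap1 <= 58 and gap2 == 35


def find_dd35e(sequence):
    """Return 1-based (D, D, E) positions of non-overlapping motif matches."""
    return [
        (m.start() + 1, m.start("d2") + 1, m.start("e") + 1)
        for m in DD35E_PATTERN.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    print(spacing(64, 116, 152), fits_dd35e(64, 116, 152))
    # Synthetic sequence: alanine filler with D, D, E at positions 64, 116, 152.
    residues = ["A"] * 160
    for pos, aa in ((64, "D"), (116, "D"), (152, "E")):
        residues[pos - 1] = aa
    print(find_dd35e("".join(residues)))
```

As with any motif scan, a hit on spacing alone is only suggestive; the genetic and structural evidence described in this section is what ties these residues to the active site.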

Key aspects of DNA substrate specificity are also determined by the core domain. The isolated core domain of HIV-1, comprising residues 50–212, can discriminate strongly against model substrates in which the phylogenetically invariant subterminal CA/TG base pairs are altered (J. Gerton and P.O. Brown, unpubl.). The amino acid residues that recognize this essential feature of the viral DNA substrate are therefore likely to be among the phylogenetically conserved residues of the core domain.

X-ray crystallography of the catalytic core of HIV-1 reveals that the two conserved aspartic acid residues are juxtaposed in the three-dimensional structure, and the conserved glutamic acid, although not well-resolved, appears constrained to be close by (Dyda et al. 1994). The corresponding domain of ASV integrase has an overall structure very similar to that of the HIV-1 integrase core domain. In the more recently determined structure of the ASV integrase core, all three of the conserved acidic residues are resolved (Bujacz et al. 1995, 1996). These conserved carboxylate groups are closely approximated, and the two aspartate residues participate in coordinating a divalent metal ion (Bujacz et al. 1996). The structure of the core domain of integrase closely resembles those of the RNase H domain of HIV-1 reverse transcriptase (Davies et al. 1991), the RuvC protein of Escherichia coli (an endonuclease that resolves Holliday junctions by cleaving parallel strands) (Ariyoshi et al. 1994), and the bacteriophage Mu transposase (Rice and Mizuuchi 1995; Rice et al. 1996), all of which are enzymes that catalyze substitution reactions involving phosphodiester bonds (Fig. 7).

Figure 7. Structure of the integrase core domain. (A) Ribbon diagrams of the crystal structures comparing the core domains of integrase from HIV-1 (left; Dyda et al. 1994) and RSV (right; Bujacz et al. 1995). The catalytically important D,D,35,E residues are shown. ...

Carboxy-terminal Domain

The carboxy-terminal third of the integrase polypeptide is the region that shows the least sequence conservation among integrases (Johnson et al. 1986; Lutzke et al. 1994). This portion of the protein is required for both 3′-end processing and integration activity, but it is not essential for disintegration activity (Bushman et al. 1993; Vink et al. 1993; Bushman and Wang 1994; Kulkosky et al. 1995). The carboxy-terminal region has intrinsic DNA-binding activity, which can be recognized by a variety of assays, including nitrocellulose filter binding, UV crosslinking, and Southwestern blotting (Mumm and Grandgenett 1991; Woerner et al. 1992; Woerner and Marcus-Sekura 1993; Engelman et al. 1994; Lutzke et al. 1994). Interpretation of the DNA-binding activity of the carboxy-terminal region is complicated by the fact that integration involves two different DNA substrates, which have distinctly different roles in the reaction and have different structural requirements. Moreover, there are at least two distinct segments of the integrase polypeptide that can independently bind to DNA, with distinctly different binding properties. The isolated carboxy-terminal region (e.g., a fragment containing only amino acids 235–288 or 220–270 of HIV-1 integrase) does not depend on a metal ion for DNA binding and binds well to simple linear double-stranded DNA oligonucleotides (Vink et al. 1993; Engelman et al. 1994; Lutzke et al. 1994). Deletion derivatives of integrase lacking this carboxy-terminal region (e.g., a polypeptide comprising amino acids 1–186 of HIV-1 integrase) also have inherent DNA-binding activities, which can be manifested by their ability to catalyze disintegration (Bushman et al. 1993; Vink et al. 1993), as well as by their ability to be crosslinked to DNA by UV-irradiation (Engelman et al. 1994).

In contrast to the carboxy-terminal domain, the isolated catalytic core domain binds efficiently only to branched, “Y-mer” DNA substrates (see Fig. 5) that mimic the product of integration of a model viral DNA end into an oligonucleotide target (Engelman et al. 1994). Moreover, the DNA-binding activity of the core domain is strictly dependent on the presence of a divalent metal ion (Engelman et al. 1994).

The strict requirement for the carboxy-terminal domain in 3′-end processing and for binding to nonbranched DNA substrates suggests that it may contribute to binding the viral DNA end. However, there is currently no evidence that this binding site can distinguish between viral DNA and nonspecific DNA sequences, and there is considerable evidence that its contribution to specificity in viral DNA recognition, if any, is minor (Katzman and Sudol 1995; J. Gerton and P.O. Brown, unpubl.). It may nevertheless contribute to the sequence-insensitive binding to sites internal to the viral DNA ends, which is clearly important for activity (see below), and perhaps to target DNA binding.

The isolated carboxy-terminal domain of RSV integrase can also bind to single-stranded DNA or RNA (Mumm and Grandgenett 1991), raising the possibility that this domain could play a part in maintaining the association between integrase and the viral genome before and during viral DNA synthesis.

Three-dimensional Structure

Despite serious efforts for several years by several laboratories, definition of the atomic structure of a retroviral integrase was hampered by the very low solubility of the native integrases that have been studied to date, which impeded crystallization as well as the use of NMR spectroscopy. Although high-resolution structures of the core and carboxy-terminal domains have been determined, the relative organization of these two domains in the integrase protomer, the atomic structure of the amino-terminal domain, and the higher-order structure of the active integrase multimer remain to be solved. The identification of a mutation (the substitution of a lysine for a phenylalanine residue at position 185) that markedly enhanced solubility without a deleterious effect on catalytic activity allowed facile crystallization of the catalytic core domain of HIV-1 integrase (Dyda et al. 1994; Jenkins et al. 1995). Several striking features are apparent from this structure (Fig. 7) and that of the corresponding domain of ASV integrase (not shown) (Bujacz et al. 1995, 1996).

1. As noted above, the highly conserved acidic residues implicated by phylogenetic conservation and genetic studies as critical components of the active site are in close proximity.

2. The protein forms a dimer with twofold rotational symmetry, in which the putative active sites are separated by about 30 Å. This spacing is difficult to reconcile with the approximately 15 Å spacing of the phosphodiester bonds to which the viral DNA 3′ ends are joined by integrase, assuming that the target DNA has a B-form double-helical structure. There are at least four possible explanations for this discrepancy. First, the two putative active sites of the dimer observed in the crystal might not correspond to the two active sites that mediate the joining of the two opposite ends of the viral DNA to host DNA; instead, for example, one active site from each of two dimers might act on each of the two viral DNA ends. Second, the target DNA might deviate dramatically from the B-form structure when it is used as a target for integration. Third, the two ends of the viral DNA might not be joined to target DNA simultaneously, but rather in a stepwise process. Fourth, the dimeric structure seen in the integrase crystals may not represent the conformation of the enzyme that carries out integration.

3. The surface of the enzyme displays large contiguous patches of predicted net positive charge, features often associated with DNA-binding sites.

4. Many of the most phylogenetically conserved residues are clustered in the vicinity of the active site, suggesting possible roles in binding and orienting conserved features of the DNA substrate. The critical conserved acidic residues are better resolved in the more recently solved structure of the core domain of ASV integrase (Bujacz et al. 1995, 1996), in which they appear in close proximity, with the two aspartates coordinating a divalent metal ion (Bujacz et al. 1996).

5. The three-dimensional structure of integrase closely resembles the structures of RNase H (Davies et al. 1991), RuvC (Ariyoshi et al. 1994), and the bacteriophage MuA transposase (Rice and Mizuuchi 1995; Rice et al. 1996) (Fig. 7B), suggesting that structural and mechanistic insights from those related proteins will contribute to our understanding of the catalytic mechanism of integrase.
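
The spacing argument in the second point above can be checked with a rough calculation. The sketch below is only an order-of-magnitude estimate, not a structural model: it assumes a textbook B-form rise of about 3.4 Å per base pair and staggers of 4, 5, and 6 bp between the joining sites for MLV, HIV-1, and ASV (values not stated in this section), and it considers only the separation along the helix axis, ignoring the helical path of the backbone. The names `STAGGER_BP` and `axial_spacing` are illustrative.

```python
# Rough estimate of the axial separation between the two joining sites
# on a B-form target DNA, compared with the ~30 angstrom separation of the
# putative active sites in the crystallographic dimer.

B_FORM_RISE = 3.4  # angstroms per base pair; approximate textbook value (assumption)

# Stagger between the joining sites on the two target strands, in base pairs.
# Assumed values, taken as the size of the target-site duplication.
STAGGER_BP = {"MLV": 4, "HIV-1": 5, "ASV": 6}

ACTIVE_SITE_SEPARATION = 30.0  # angstroms; approximate value quoted in the text


def axial_spacing(stagger_bp, rise=B_FORM_RISE):
    """Distance along the helix axis spanned by `stagger_bp` base pairs."""
    return stagger_bp * rise


def spacing_table():
    """Axial spacing for each virus, rounded to 0.1 angstrom."""
    return {virus: round(axial_spacing(bp), 1) for virus, bp in STAGGER_BP.items()}


if __name__ == "__main__":
    for virus, spacing in spacing_table().items():
        shortfall = ACTIVE_SITE_SEPARATION - spacing
        print(f"{virus}: ~{spacing:.1f} angstroms axial spacing, "
              f"{shortfall:.1f} angstroms less than the active-site separation")
```

Under these assumptions the joining sites lie roughly 14 to 20 Å apart along the helix axis, the same range as the approximately 15 Å figure cited above and well short of 30 Å, which is why one of the four explanations listed in that point, or some combination of them, seems to be needed.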

The solution structure of the carboxy-terminal domain of HIV-1 integrase has been solved by two groups using nuclear magnetic resonance (NMR) methods (Eijkelenboom et al. 1995; Lodi et al. 1995). This domain exists in solution as a dimer, consistent with its role in promoting integrase multimerization, and the folding topology of each protomer is that of a five-stranded β barrel, homologous to an SH3 domain (Eijkelenboom et al. 1995; Lodi et al. 1995).

Higher-order Structure

Direct physical measurements of purified integrase, as well as crosslinking experiments and in vitro complementation between defective variants of integrase, provide compelling evidence that integrase is a multimeric enzyme (Grandgenett et al. 1978; Sherman and Fyfe 1990; Engelman et al. 1993; van Gent et al. 1993b; Vincent et al. 1993; Chow and Brown 1994a; Ellison et al. 1995). Results of in vitro complementation between variants of HIV-1 or HIV-2 integrase demonstrate that (1) the HHCC domain is required in trans to the protomer that contains the active site (Engelman et al. 1993; van Gent et al. 1993b; Ellison et al. 1995); (2) the extreme carboxy-terminal region can function either in cis or in trans to the active site (Engelman et al. 1993; van Gent et al. 1993b); and (3) at least some of residues 187–234 of the carboxy-terminal domain are required in cis to the active site (Engelman et al. 1993). The precise stoichiometry of the native integrase enzyme is not clear. Under many experimental conditions, the most abundant form of the HIV and RSV integrases appears to be a dimer (Sherman and Fyfe 1990; Jones et al. 1992; Vincent et al. 1993). Each active site, however, appears to be composed solely of residues from a single polypeptide chain, as no complementation has been seen between polypeptide chains with different point mutations in putative active site residues (van Gent et al. 1993a). The determinants of dimerization appear to map to the catalytic core domain (Engelman et al. 1993; Dyda et al. 1994; Hickman et al. 1994; Andrake and Skalka 1995) and the carboxy-terminal domain (Andrake and Skalka 1995), and both of these domains in isolation form dimers in solution (Andrake and Skalka 1995; Eijkelenboom et al. 1995; Lodi et al. 1995). The HHCC domain, in the presence of zinc, folds into a structure that promotes assembly of integrase dimers into tetramers (Zheng et al. 1996).

Higher-order multimerization, leading to a rapidly sedimenting structure, occurs readily under the conditions required for activity, notably including the presence of a divalent metal ion (van Gent et al. 1991; Ellison et al. 1995). Two features of the HIV-1 integrase protein, the HHCC domain of one protomer and an N-ethylmaleimide (NEM)-sensitive site in the catalytic core domain of a second protomer, participate in a divalent cation-dependent assembly of integrase dimers into the active multimeric form (Ellison et al. 1995). Although neither of these features is required for dimerization (Engelman et al. 1993; Dyda et al. 1994; Hickman et al. 1994) or for binding and juxtaposition of two viral DNA ends (Chow and Brown 1994a), both are required for assembly of stable complexes between integrase and viral DNA ends, suggesting that higher-order multimerization is a prerequisite for assembly of this stable complex, perhaps serving to extend the viral DNA-binding site (Ellison et al. 1995).
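
The cis and trans requirements inferred from the complementation experiments described above can be restated as a small boolean model. The sketch below is a toy illustration, not a published algorithm: the `Variant` fields, the function `predicted_active`, and the example variants are hypothetical, the model ignores stoichiometry and expression levels, and it assumes that every variant in a mixture is present in multiple copies, so that a second copy of the same polypeptide can act in trans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical description of an integrase variant used in a mixing assay."""
    name: str
    hhcc: bool            # amino-terminal HHCC domain intact
    active_site: bool     # conserved acidic active-site residues intact
    res_187_234: bool     # residues 187-234 present
    extreme_c_term: bool  # extreme carboxy-terminal region present


def predicted_active(mixture):
    """Return True if the toy rules predict activity for this mixture.

    Rules encoded (as summarized in the text):
      * one chain must carry both an intact active site and residues
        187-234 (required in cis to the active site);
      * an HHCC domain must be supplied by some protomer (in trans);
      * the extreme carboxy-terminal region may come from any protomer
        (cis or trans).
    """
    variants = list(mixture)
    has_catalytic_chain = any(v.active_site and v.res_187_234 for v in variants)
    has_hhcc = any(v.hhcc for v in variants)
    has_c_term = any(v.extreme_c_term for v in variants)
    return has_catalytic_chain and has_hhcc and has_c_term


# Illustrative variants (names are placeholders, not specific published constructs).
wild_type = Variant("wild type", True, True, True, True)
no_hhcc = Variant("HHCC deleted", False, True, True, True)
site_mutant_a = Variant("active-site mutant A", True, False, True, True)
site_mutant_b = Variant("active-site mutant B", True, False, True, True)
core_only = Variant("lacks 187-234 and C terminus", True, True, False, False)

if __name__ == "__main__":
    for mix in ([wild_type], [no_hhcc], [no_hhcc, site_mutant_a],
                [site_mutant_a, site_mutant_b], [core_only, site_mutant_a]):
        names = " + ".join(v.name for v in mix)
        print(f"{names}: {'active' if predicted_active(mix) else 'inactive'}")
```

In this model, a variant lacking the HHCC domain is rescued by an active-site mutant that supplies HHCC in trans, whereas two different active-site mutants do not rescue each other, mirroring the observation that each active site appears to be built from a single polypeptide chain.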

Integrase multimerization can occur in the absence of a DNA substrate, in contrast to the Mu transposase, whose multimerization is ordinarily dependent on specific interactions with substrate DNA (Baker and Mizuuchi 1992). The possibility remains open, however, that DNA may influence the multimerization process or alter the structure of the integrase multimers.