HIV chromatin is a preferred target for drugs that bind in the DNA minor groove

The HIV genome is rich in A but not G or U and deficient in C. This nucleotide bias controls HIV phenotype by determining the highly unusual composition of all major HIV proteins. The bias is also responsible for the high frequency of narrow DNA minor groove sites in the double-stranded HIV genome as compared to cellular protein coding sequences and the bulk of the human genome. Since drugs that bind in the DNA minor groove disrupt nucleosomes on sequences that contain closely spaced oligo-A tracts which are prevalent in HIV DNA because of its bias, it was of interest to determine if these drugs exert this selective inhibitory effect on HIV chromatin. To test this possibility, nucleosomes were reconstituted onto five double-stranded DNA fragments from the HIV-1 pol gene in the presence and in the absence of several minor groove binding drugs (MGBDs). The results demonstrated that the MGBDs inhibited the assembly of nucleosomes onto all of the HIV-1 segments in a manner that was proportional to the A-bias, but had no detectable effect on the formation of nucleosomes on control cloned fragments or genomic DNA from chicken and human. Nucleosomes preassembled onto HIV DNA were also preferentially destabilized by the drugs as evidenced by enhanced nuclease accessibility in physiological ionic strength and by the preferential loss of the histone octamer in hyper-physiological salt solutions. The drugs also selectively disrupted HIV-containing nucleosomes in yeast as revealed by enhanced nuclease accessibility of the in vivo assembled HIV chromatin and reductions in superhelical densities of plasmid chromatin containing HIV sequences. A comparison of these results to the density of A-tracts in the HIV genome indicates that a large fraction of the nucleosomes that make up HIV chromatin should be preferred in vitro targets for the MGBDs. These results show that the MGBDs preferentially disrupt HIV-1 chromatin in vitro and in vivo and raise the possibility that non-toxic derivatives of certain MGBDs might serve as a novel class of anti-HIV agents.

The HIV genome is rich in A but not G or U and deficient in C. This nucleotide bias controls HIV phenotype by determining the highly unusual composition of all major HIV proteins. The bias is also responsible for the high frequency of narrow DNA minor groove sites in the double-stranded HIV genome as compared to cellular protein coding sequences and the bulk of the human genome. Since drugs that bind in the DNA minor groove disrupt nucleosomes on sequences that contain closely spaced oligo-A tracts which are prevalent in HIV DNA because of its bias, it was of interest to determine if these drugs exert this selective inhibitory effect on HIV chromatin. To test this possibility, nucleosomes were reconstituted onto five double-stranded DNA fragments from the HIV-1 pol gene in the presence and in the absence of several minor groove binding drugs (MGBDs). The results demonstrated that the MGBDs inhibited the assembly of nucleosomes onto all of the HIV-1 segments in a manner that was proportional to the A-bias, but had no detectable effect on the formation of nucleosomes on control cloned fragments or genomic DNA from chicken and human. Nucleosomes preassembled onto HIV DNA were also preferentially destabilized by the drugs as evidenced by enhanced nuclease accessibility in physiological ionic strength and by the preferential loss of the histone octamer in hyper-physiological salt solutions. The drugs also selectively disrupted HIV-containing nucleosomes in yeast as revealed by enhanced nuclease accessibility of the in vivo assembled HIV chromatin and reductions in superhelical densities of plasmid chromatin containing HIV sequences. A comparison of these results to the density of A-tracts in the HIV genome indicates that a large fraction of the nucleosomes that make up HIV chromatin should be preferred in vitro targets for the MGBDs. These results show that the MGBDs preferentially disrupt HIV-1 chromatin in vitro and in vivo and raise the possibility that non-toxic derivatives of certain MGBDs might serve as a novel class of anti-HIV agents. PLOS

Introduction
The tremendous genetic variation of HIV-1 is responsible for the appearance of drug and antibody resistant forms of HIV-1 that appear during infection, and this genetic swarm is a major obstacle to the treatment of AIDS and to the development of an HIV vaccine. Viral genetic heterogeneity also produces variation in virulence, replication rates, cell tropisms and other properties [1,2]. A drug designed to combat HIV should optimally target a viral feature that is conserved in this heterogeneous viral population, is critical for viral survival, acts on the integrated proviral genome, and is not found or is significantly depleted in the host cell. Although substantial progress has been made during the past three decades in the development of anti-HIV drugs which has culminated in the use of combination antiviral drug therapy for treatment of viremia, HIV-drug-resistant mutants have been found for all nucleoside analogs and nonnucleoside and protease inhibitors used in the treatment of HIV-infected individuals. The emergence of multidrug-resistant variants of HIV-1 is also a concern for the treatment of the disease in the future [3,4].
The major challenge in current HIV research is the development of methods to eliminate the replication competent provirus that has integrated as a double stranded DNA molecule within the host cell chromatin. This viral reservoir is resistant to the host's immune system and is resistant to the current drugs used to treat viremia. Upon session of antiviral drug therapy, competent proviral sequences are activated giving rise to new rounds of replication and the resumption of active infection. Little progress has been made during the past to eliminate or inactivate the replication competent provirus and there is no universal agreement as to the best strategies that should be used to approach the problem in the future [5,6].
The HIV genome and the genomes of other lentiviruses display an unusual variation in nucleotide composition being deficient in C but not G and rich in A but not U [7,8]. We have previously shown that this nucleotide compositional bias is responsible for the unusual composition of HIV proteins which are rich in the polar amino acids encoded by A-rich codons and depleted in amino acids encoded by C-rich codons [8]. The extreme amino acid compositions are characteristic of all major proteins of the viruses including the polypeptides that make up the hypervariable viral envelope and the conserved pol region that codes for reverse transcriptase and integrase. The global nature of these effects makes it likely that the variation in protein composition caused by the biased nucleotide frequencies is an important factor in determining the characteristic phenotype of HIV [8]. The A-bias may also be a driving force that promotes enhance genetic variation of HIV and to increase the DNA curvature of the integrated viral genome that is caused by oligo A-tracts arranged in a 10 bp periodicity [8,9]. As a group, these observations suggest that the conserved A-bias of the HIV genome is critical for viral phenotype and thus the overall A-bias may represent a novel and fixed target for anti-HIV agents.
The minor groove binding drugs (MGBDs) are of interest in clinical medicine because of their antiviral, antimicrobial and antitumor activities [10][11][12]. These ligands preferentially bind to AT-rich DNA sequences that are 4 or more bp in length. The biologically important targets of these drugs are poorly understood although many studies have focused on their abilities to inhibit the interaction of regulatory proteins to short oligonucleotide-length AT-rich sequences. However, the importance of these targets in dictating the biologically relevant targets of these drugs is questionable because of the high frequency of isolated AT-rich oligonucleotide length sequences in the human genome. We previously reported a novel specificity of the MGBDs that requires much longer regions of DNA containing multiple closely spaced oligo A-tracts [13,14]. It was demonstrated that the drugs inhibited the assembly of such unusual DNA segments that are at least 100 bp long into nucleosomes. This inhibitory action may be relevant to HIV since the entire viral genome is A-rich and consequently nucleosomes containing HIV sequences may be targets for the MGBDs that are preferred over host sequences. The studies described in this report were designed to evaluate this possibility.

Bioinformatics studies
For the bioinformatics studies, sequences and gene coordinates for the HIV-1 and HTLV-1 genomes, linked to GenBank accession numbers AF033819.3 and J02029.1, respectively, were acquired from NCBI. These sequences were used as input into the R software package, DNA-shapeR [15], to obtain predictions for their minor groove widths (MGWs) at bp resolution (S1 and S2 Files). MGWs for the hg19 version of the human genome were obtained from the GBshape database [16]. MGWs for all pentamers were determined by inputting all possible 4,194,304 11-mer DNA sequences into DNAshapeR. Regardless of surrounding sequence context, all central pentamers and their reverse complements possessed the same predicted MGWs (S3 File).
Profiles of AnTm occurrence and narrow MGW frequency were computed using in-house Perl scripts, and all plotted values represent the centers of 201 or 1001 bp sliding windows. For Fig 1A and  For Fig 2F, the numbers of narrow MGWs were computed in all 201 bp sliding windows across the human genome, human cDNA, and the HIV pol gene using an MGW cutoff of 3 Angstroms. Note that for the whole human genome, 100 bp steps were used while 1 bp steps were used for human cDNA and the HIV pol gene. For Fig 2G, the center locations of all nonoverlapping AnTm sequences in the HIV-1 and human genomes were determined, where n + m > = 4 (S4 and S5 Files). In other words, an AnTm sequence like AAAATTTT would only be counted as one AnTm sequence, and the centers of AnTm sequences would define their locations. For odd-length AnTm sequences, the central nucleotide would designate its center location, and for even-length AnTm sequences, the internal central position was equal to its length divided by 2. After the center locations of all non-overlapping AnTm sequences were determined, the numbers of AnTm sequences were computed in all 201 bp sliding windows in 1 bp steps across the human genome, human cDNA, and the HIV pol gene. At the same time, the pairwise distances between non-overlapping AnTm sequences were also computed.
For Fig 2F and 2G, the significance of the differences in the fractional distributions were evaluated by Kolmogorov-Smirnov tests in R or by traditional methods. For Fig 2F, narrow MGW frequencies in all 201 bp windows for the 3 reference sequences were input into R, and the "ks.test" function revealed high D-statistic values and P values less than 2.2e-16 for all comparisons. For Fig 2G, D-statistics were determined by calculating the maximum differences between corresponding cumulative distribution functions and were compared to D-critical values. D-critical values were calculated using the KS method's empirical formula, which incorporates sample sizes and a coefficient that is a function of selected α value. For all comparisons, the results showed that the D-statistics were considerably greater than the D-critical values ( Fig 2G). We opted to use this traditional method because the AnTm dataset size for the human genome was too large for input in R as 1 bp sliding windows steps were used.  Fig 2H, the frequencies of non-overlapping AnTm sequences were normalized by the ratio of the total number of non-overlapping AnTm sequences in the reference sequence over the length of the reference sequence. For example, for the human genome AnTm profile in Fig  2H, the AnTm occurrences at distances between +/-4 to +/-100 bps were divided by (898610 37 / 3095677412). Note that the smallest possible distance between two non-overlapping AnTm sequences (n + m > = 4) is 4 bps. However, we included a value of 1 for the distance at 0 in the plots.

DNA preparation and characterization
DNA preparation and characterization of Plasmid PHRT25 [17] containing the HIV-1 pol gene from isolate BH-5 was used as a template in the PCR to amplify HIV fragments 1-5 studied in The amplified fragment was then cut at NTs 1973 and 4004 with EcoRI to give rise to the 2 KB fragment which was inserted into the EcoRI site of plasmid P-2 as described below.
Two parent yeast shuttle vectors were used. P-1 is a pUC18 derivative containing yeast ARS 1 in the EcoRI site and the URA 3 gene in the HindIII site [18]. P-2 is a pUC18 derivative with a 627 bp fragment from yeast ARS 2 in the Xho I site and the URA 3 gene in the HindIII site. HIV fragment 1 was blunt ended into the SmaI site of P-1 to give rise to P-1-H1 or inserted into the EcoRI site of P-2 to give P-2-H1. The 5 S rDNA sequence was inserted into the EcoRI site of P-2 to give P-2-5S rDNA. The 2KB PCR amplified EcoRI fragment from HIV pol (1973-4004 bp) was inserted into the EcoRI site of P-2 to give P-2-H-2 KB. To generate low copy number plasmids in yeast, a 289 bp Bam H1 fragment containing yeast CEN III was inserted into the Bam H1 site of each of the plasmids in the P-2 series. DNA size markers consisting of 20bp, 100bp and 1 KB ladder sequences were end-labeled and used to calibrate the gels in Figs 4 and 5.
In vitro studies with synthetic (Syn # 61 and 67) and natural (5S rRNA gene from sea urchin) control fragments were prepared as described previously [19]. Genomic DNAs from chicken erythrocytes and HeLa cells were digested with a mixture of HaeIII and Hinf I and fragments which comigrated with 200 bp standards were eluted from 1.4% agarose gels and used for the studies in Fig 3A and Fig 4B. DNA fragments were uniformly labeled with [alpha-32P] dATP or uniquely end-labeled by using one 32P-end-labeled primer in the PCR. Genomic DNAs were end-labeled by standard procedures. DNA fragments were purified by densities displayed by HIV-1 pol fragment 5, respectfully (Table 2). B) AnTm and select TA tetranucleotide frequencies were computed within 1001 bp sliding windows along the HIV-1 genome. C) The percentages of the 16 AT tetranucleotides and AnTm sequences were computed within 201 bp segments surrounding the HIV-1 pol gene, and the positions of the 5 HIV-1 pol fragments used in biochemical studies are displayed. D-F) Occurrences of all AT tetranucleotide sequences in the coding strand of the HIV-1 DNA genome (F), the HIV pol region (E), and the five pol fragments (F) described in Table 2 are plotted against the tetranucleotide percentage of A in each region. The numbers adjacent to the tetranucleotide sequences in Fig 1D and 1E are the ratios of the occurrences in the natural to the randomized HIV sequences. https://doi.org/10.1371/journal.pone.0216515.g001

In Vitro Minor Groove Binding Drug (MGBD) studies
MGBDs described in Table 1 were from Sigma and drug solutions in 10 mM Tris-HCl (pH 7.5) containing 10 mM NaCl were stored in the dark at -70˚C. Nucleosomes were reconstituted onto the DNA fragments by the standard exchange-salt dilution method (1.0 M NaCl to 0.1 M NaCl) using H1/H5 deficient nucleosomes from chicken erythrocytes as a source of core histones [19]. For translational mapping studies with HIV fragment 1, the reconstituted nucleosome was digested with exonuclease III and analyzed as detailed previously [19]. For the restriction enzyme studies in Fig 4B, digestions were carried out for 40 min at 37˚C in 10 mM MgCl2 containing 40 mM NaCl and the fragments analyzed on 8% native PA gels (S6 and S7 Files). The nitrocellulose filter binding assay was carried out as described previously [20] using the NaCl concentrations for sample application and washing that were the same as those used in Fig 3C. The relative binding affinities of DAPI for HIV and control DNAs were determined by fluorescence-enhancement at 465 nM as detailed previously [13].

In Vivo MGBD studies in yeast
Studies using yeast-S. cerevisiae strain J17 (MAT a,his2,ade1,trip1,met14,ura3-52) was used as the host for plasmid transformation. The isogenic derivative containing the rad 52 deletion was obtained from K.S. Bloom [21]. Plasmid DNAs were used to transform yeast to uracil prototrophy and transformed cells were grown at 37˚C in synthetic medium lacking uracil [18]. For chromatin studies, cultures (100-200 ml) were diluted to 0.2 A 660/ml, drugs were added as indicated and the cells grown overnight to late log phase corresponding to an A660/ml of 1.3-2.0. DAPI (4 uM) reduced growth rates by approximately 20% while concentrations > 10 uM caused severe growth retardation. Berenil at concentrations up to 150 uM was without effect on the generation time. Spheroplasts were isolated essentially as described by Kent et al. [22] except Lyticase (Sigma) was used in place of Glusulase and spermidine was omitted from the MNase digestion buffer. Spheroplasts were permeabilized with NP-40 and digested with MNase at 37˚C. for 6 minutes. For indirect-end labeling, DNA obtained from MNase digested spheroplasts was secondarily digested with Ava II and fragments were electrophoresed on 1.0% agarose gels, transferred to Zeta probe membranes (Bio-Rad) in 0.4 M NaOH and the membranes hybridized with multiprime labeled 200-300 bp probes that abutted the Ava II restriction sites. Gels were calibrated using plasmid restriction fragments which are not shown in the figures. For the experiments in the top panels of Fig 5A-5I, blots were probed with HIV sequences or with a 190 bp Msp1 fragment from pUC18 which is located 734 bp from the HIV DNA. Blots from chloroquine gels were probed with pUC18. For analysis of DNA topoisomers, DNA was prepared from 10 ml cultures as described previously [18] and electrophoresed on 0.9-1.1% agarose gels in Tris-phosphate-EDTA in the presence of 1.5 ug/ml of pentamers with and without TA steps and for G+C pentamers are displayed in a level plot. C) Occurrences of AnTm sequences in 1001 bp sliding windows are plotted along the HIV-1 and HTLV-1 genomes. D) Narrow MGW frequencies in 1001 bp sliding windows are plotted along the HIV-1 and HTLV-1 genomes, and the average narrow MGW frequency in human cDNA in 1001 bp sliding windows was also plotted. E) Narrow MGW frequencies in 201 bp sliding windows were plotted along the HIV-1 and HTLV-1 genomes, and the average narrow MGW frequencies in human cDNA and genomic DNA in 201 bp sliding windows were also plotted. F) The fractional distributions of narrow MGW frequencies in 201 bp sliding windows were plotted for the human genome, human cDNA, and the HIV-1 pol gene. G) The fractional distributions of non-overlapping AnTm sequences in 201 bp sliding windows were plotted for the human genome, human cDNA, and the HIV-1 pol gene. H) The pairwise distances between the centers of non-overlapping AnTm sequences were computed for every AnTm sequence in the HIV-1 pol gene and in the human genome.
https://doi.org/10.1371/journal.pone.0216515.g002 chloroquine [23]. Under these conditions, the bulk of the topoisomers are negatively supercoiled. Values for drug-induced loss of superhelical turns similar to those given in Fig 5J were obtained when 90 ug/ml chloroquine was used. Under these conditions, the bulk of the  Fig 3A. C) Selective destabilization by DAPI on nucleosomes preassembled on HIV fragments were quantified. Labeled HIV fragments 1-5, synthetic fragments 61 and 67, the 5 S rDNA sequence, and genomic DNA from chicken were assembled into nucleosomes as in Fig 3A and then incubated in the absence and in the presence of 10 uM DAPI at the indicated NaCl concentrations for 1 hour at 37˚C. The percentage of the nucleosomes that were lost in the presence of the drug was then determined following electrophoretic analysis. With all sequences, more than 90% of the nucleosomes were retained in the absence of drug in all salt levels.
https://doi.org/10.1371/journal.pone.0216515.g003 topoisomers were positively supercoiled, reflecting the differences in negative supercoiling [24]. Plasmid copy number was determined as described previously [18]. Densitometric scans from multiple exposures of films were used to obtain quantitative estimates of band intensities for result in

Patterns of DNA sequence and shape in the HIV-1 genome
There are several characteristics of the HIV-1 genome which should make it a preferred target for the MGBDs. As shown in Fig 1A, approximately one fourth of the nucleotides in the genome consist of AT tetranucleotides, the minimal binding-site size for most of the MGBDs. The AT tetranucleotide content is nearly 2-fold greater than that of the human genome (24.2% versus 13.5%). The coding strands of these AT oligonucleotides (> = 4 bp) are A-rich (59.9% A) reflecting the compositional bias of the genome [7,8]. The AT tetranucleotide and the AnTm tetranucleotide frequencies (n + m = 4, no TA steps) as well as the A-bias are strongest in the pol and env regions and are reduced only in regions of overlapping reading frames and in the long terminal repeats (Fig 1A & 1B, Table 2). The pol and env genes (within regions absent of overlapping reading frames) display a high density of AT sites. On average, the distances between the centers and between the starts and ends of the AT tracts (> = 4 bp) are~18.8 bp and~12.8 bp, respectively. There are multiple nucleosome-length stretches in pol and env where more than 40% of their sequences are composed of AT oligonucleotides (Fig 1A and 1C), which corresponds to even tighter spacing between AT tracts. Nucleosomes that contain such high levels of AT oligonucleotides should be drug sensitive provided that the AT sequences have narrow minor grooves [13,14]. Taken together, these results imply that the A-bias favors a high frequency of potential drug binding sites in the HIV genome.
The occurrences of all AT tetranucleotide sequences are shown for the HIV-1 genome ( Fig  1D), for the HIV pol region ( Fig 1E) and also for five short HIV pol fragments (Fig 1F), which were used in biochemical studies described below. The effect of the genomic bias on the composition is clearly evident with the level of homopolymeric tracts of A being about 3-fold and 15-fold greater than those of G and C, respectively (Fig 1D-1F). The underlined sequences lack a TA step, and these AnTm tetranucleotide sequences are the most prevalent in the mixed AT sequences at each percentage of A in the three sequence sets (Fig 1D-1F). Each of these AnTm sequences is also overrepresented in HIV DNA relative to randomized HIV DNAs as shown by the numbers in Fig 1D and 1E, which represent the ratios of the occurrences in the natural to the randomized sequences.
These findings may be relevant since the TA step widens the minor groove of AT sites [25][26][27][28], and the compressed groove is required for high affinity drug binding and nucleosome disruption by the MGBDs [10,13,29,30]. In contrast, most of the sequences with TA dinucleotides are underrepresented in HIV DNA relative to the shuffled sequences, and this underrepresentation is particularly pronounced in the tetranucleotide with two TA steps and those with a 1-H1 (A), plasmid P-2-H1 (B) or plasmid P-2-H-2KB (C-E) were grown for about 20 hours in the absence (-) and in the presence (+) of 4 uM DAPI. Permeabilized spheroplasts were digested with MNase (100 and 350 Units/ml) and the digested DNA was electrophoresed, blotted and probed with HIV DNAs (A-C), PUC18 (D) or run on a gel stained with ethidium bromide (E). F-I) Indirect end-labeling analysis was performed on MNase digests obtained from cells that carried the longer 2 KB HIV segment on P-2-H2KB. The map of the region indicates the positions of the AvaII sites (arrows), hybridization probes, pUC18 DNA (single line), the 2 KB HIV segment and a portion of the URA3 gene. HIV fragments 1-4 are contained within this 2 KB segment. Chromatin samples from control (-) and DAPI treated (+) cells were digested with increasing amounts of MNase (50, 100 and 200 U/ml). DNA was then digested with AvaII, electrophoresed on 1% gels and blots probed with the URA 3 probe (F) and the pUC18 probe (G). Chromatin samples from control (-) and berenil treated (+) cells were digested with MNase (100 U/ml) and the blots probed with the URA 3 probe (H) and the pUC18 probe (I). Naked DNAs carried through the procedure are indicated. The approximate gel positions of the 2 KB HIV segment are indicated by the vertical brackets. J) Cells that carried the indicated plasmids were grown in the presence (+) and in the absence (-) of 4 uM DAPI for 18 hours and the distributions of topoisomers were analyzed in chloroquine-agarose gels as described in Materials and Methods. The DNA at the top of the gels, which is particularly prevalent in the P2-H2KB +DAPI lane, represent relaxed/nick circular plasmid [21,32,33]  single TA step in the central region (Fig 1D-1F). The relative occurrences of the sequences that make up the AT sites with 50% A were AATT >> ATTA > TTAA~TAAT > ATAT > TATA (Fig 1D and 1E). This order is inversely related to the DNA minor groove width of these sequences [25][26][27][28] and parallels MGBD binding affinity [13,29,30] and the effectiveness of the drugs for disrupting nucleosomes on DNA that is rich in AT-sites [13,14]. The overrepresentation of AAAA, AATT and other AnTm sequences that lack TA dinucleotides also follow the occurrences of the AT sites along the viral genome while the AT tetranucleotides with TA steps display occurrence patterns that are lower and more uniform throughout the viral genome ( Fig 1B). Consequently, the segments containing clustered AT sites in pol and env of HIV should be especially sensitive to MGBDs. Table 2 gives additional characteristics for the five DNA fragments from HIV pol used in the biochemical studies, and these characteristics are graphically illustrated in Fig 1C. Fig 1C  shows that the five HIV pol fragments are enriched in AT tetranucleotides and AnTm sequences relative to the entire HIV-1 genome. However, considerable variation exists in AT tetranucleotide and AnTm content amongst the five fragments ( Fig 1C, Table 2). These fragments extend from the beginning of reverse transcriptase coding region into the viral integrase gene (Fig 1C), and these~200 bp fragments have a high density of oligonucleotide AT tracts. On average,~13.8 oligonucleotide AT tracts that are~5.6 bp in length comprise~37.9% of the nucleotides among the five HIV-1 pol fragments (Fig 1C, Table 2). Additionally, in these fragments, the frequency of the all oligonucleotide AT tracts (> = 4 bp) is about 6-fold greater than the frequency of all oligonucleotide GC tracts (> = 4 bp) as the GC tracts make up onlỹ 5.9% of the nucleotides in the fragments on average ( Table 2). The fragments also show an enrichment of AnTm tracts, AT sites that are not interrupted by TA steps. Although AnTm tetranucleotides represent only 5 of all 16 AT tetranucleotides,~69.6% of the AT sites in the fragments consist of AnTm tracts (Fig 1C, Table 2).
In order to directly relate the DNA sequence patterns of HIV to the structure of the DNA minor groove, we performed a systematic and comprehensive bioinformatic investigation to further our understanding of the relationship between DNA sequence and minor groove width. The biophysical parameters of dinucleotides such AA, AT, and TA are influenced by the surrounding sequence context of DNA. As a result, models based on longer DNA sequences have been proposed to more accurately estimate the shape of the DNA helix in 3-dimensional space. In one particular model, a combination of computational and experimental measurements was used to predict the shapes of all 512 unique pentamer sequences in terms of roll, propeller twist, helical twist, and minor groove width (MGW) [23,24]. We utilized this pentamer model to predict the MGWs along the HIV-1 and HTLV-1 genomes and also leveraged the model's predicted MGWs across the entire human genome.

Basic Sequence Information AT-Sites (> = 4 bp) AnTm-Sites (n +m > = 4 bp) GC-Sites (> = 4 bp)
HIV In order to illustrate how this model predicts AnTm sequences (n + m = 5, A+T pentamers without TA steps) to be associated with significantly narrow minor grooves, we plotted the cumulative fraction of all 512 pentamers as a function of decreasing MGW (Fig 2A). This plot shows that all 3 unique AnTm pentamer sequences are among the top 5 (or 1%) of all pentamers with the narrowest minor grooves. Notably, A+T pentamers with TA steps and G+C pentamers are characterized by much wider minor grooves (Fig 2B).
We then sought to compare the AnTm content and narrow MGW frequency between the HIV-1 and HTLV-1 genomes, which have similar sequence lengths. Fig 2C shows greater AnTm occurrence along the HIV-1 genome as compared to HTLV-1 in 1001 bp sliding windows. The most considerable increases in AnTm content in HIV-1 over HTLV-1 appear in the gag, pol, and env genes, especially outside of overlapping reading frames among other HIV-1 genes. In parallel, similar patterns of elevated narrow MGW frequency occur along the HIV-1 genome relative to not only HTLV-1 but also to the average narrow MGW frequency in human cDNA (Fig 2D).
In order to assess the patterns of narrow MGW frequency in HIV-1 at the scale of nucleosomal DNA, we executed the same analysis conducted in Fig 2D using 201 bp sliding windows instead of 1001 bp ones (Fig 2E). The results from this analysis revealed several hot spots of high narrow MGW density in HIV-1, which are virtually absent in HTLV-1. Moreover, several sections of HIV-1 are observed to occupy 2-8-fold increases in narrow MGW frequency at the nucleosome level over human genomic DNA (Fig 2E). Among the several spikes of AnTm and narrow MGW frequency along the HIV-1 genome, the most striking examples are observed in the HIV-1 pol gene with the largest spike in narrow MGW density encompassing HIV fragment 1 (Fig 2C-2E). Subsequently, we computed the fractional distributions of narrow MGW frequency and AnTm frequency in 201 bp windows for the HIV-1 pol gene, human genomic DNA, and human coding DNA (Fig 2F and 2G). As expected, these distributions indicated that the HIV-1 pol gene encompasses greater narrow MGW and AnTm density than human genomic DNA, which showed even greater concentration of these features over human coding DNA. Indeed, Kolmogorov-Smirnov tests on corresponding empirical cumulative distribution functions showed that the differences between these distributions were statistically significant.
After examining AnTm frequency and narrow MGW density in HIV-1, we investigated the spacing of AnTm sequences, and the results revealed extraordinarily organized spacing along the HIV-1 pol gene as compared to the bulk of the human genome ( Fig 2H). Notably, resulting from an accumulation of AnTm pairwise distances, strong~10 bp periodic patterns can be detected in AnTm spacing along the HIV-1 pol gene, which can be inferred by the presence of peaks near positions +/-20, +/-30, +/-40, and +/-50 in Fig 2H. In contrast, this analysis of AnTm spacing also shows that AnTm centers are often separated by a distance of 4-5 bps ( Fig  2H). Upon closer inspection of the HIV-1 pol sequence, one can identify several examples of short AnTm sequences that are either separated by TA steps or interrupted by short C/G-containing mono and di nucleotides (data not shown).

Effects of Minor Groove Binding Drugs on HIV-1 chromatin
Since MGBDs inhibit the assembly of nucleosomes on sequences with multiple closely spaced oligo-AT sequences with narrow minor grooves which serve as drug binding sites [13,14], it was of interest to determine if the drugs exert their selective inhibitory action on HIV chromatin. To study this possibility, nucleosomes were reconstituted onto the five fragments in the presence and absence of several MGBDs. Free DNA was then separated from nucleosome DNA by electrophoresis on polyacrylamide-glycerol gels. A sample of the data is shown in Fig  3A, and the results of several similar experiments are summarized in Fig 3B. The studies demonstrated that the MGBDs inhibited the assembly of nucleosomes onto all five of the HIV segments. The marked specificity of the drugs for HIV DNA is illustrated by the results in Fig  3A which show that these ligands had no detectable effects on the assembly of nucleosomes onto a control fragment and genomic DNA. For all HIV fragments, the relative order of inhibition was DAPI (4,6-diamidino-2-phenylindole) > distamycin > Hoechst 22358 > berenil >> pentamidine. For all drugs, the relative order of inhibition was HIV fragment 1 > 2, 3, 4 > 5. This order parallels the density of AT sites > 4 bp as can be seen from the data in Table 2. This finding is in agreement with our previous studies which demonstrated that drug sensitivity is directly related to the frequency of AT sites in nucleosome-length DNA [13,14]. Selective inhibition of nucleosome formation was seen with HIV fragment 5 which contains 29% of nucleotides in AT sites. As indicated by the dotted line in Fig 1A, approximately 65% of all of the possible 200 bp segments in HIV pol and 35% of all of the possible 200 bp segments in the entire viral genome have AT site densities that are equal to or greater than this level. These same sites also tend to be enriched in AT sequences without TA steps as shown by the example given in Fig 1B. These observations suggest that a relatively large fraction of the nucleosomes along HIV chromatin should be a preferred target for MGBDs.
Previous studies have shown that the MGBDs promoted the destabilization of nucleosomes that were pre-assembled onto DNA molecules rich in AT sites and this effect was most pronounced at high ionic strength [13,14]. This approach was taken in the present study to determine if MGBDs destabilize nucleosomes assembled onto the five HIV fragments. Fig 3C shows the fate of preassembled nucleosomes containing the five HIV sequences and several control DNAs when they were incubated for 1 hour at 37˚C in 0.1-0.5 M NaCl in the presence and in the absence of DAPI. Nucleosomes assembled onto the HIV DNAs were preferentially destabilized by the drug and the effect was most apparent in 0.5 M NaCl where up to 95% of the histone octamers were lost from DNA. The relative order of drug-induced nucleosome loss (Fragment 1 > 2, 3, 4 > 5) was the same as the order for drug-induced inhibition of nucleosome assembly, and density of AT sites as can be seen by comparing the results in Table 2 and Fig 3B and 3C. The template selectivity is also clearly evident from the results since octamers were retained on all control fragments and genomic DNA in all salt levels.
Natural nucleosome positioning sequences are enriched in AnTm sequences which are arranged to give rise to two regions of curvature of 40-60 bp separated by a non-curved segment of 30-40 bp [31]. Similarly, HIV fragment 1 is highly enriched in these AnTm sequences and displays two curved regions, and was therefore, previously predicted to position an assembled nucleosome onto a single site [9]. This prediction of HIV fragment 1's translational positioning ability was supported by the exonuclease III digestion studies shown in Fig 4A and enabled us to study the effects of DAPI on restriction nuclease accessibility. The central position of this nucleosome along with the positions of restriction sites for HaeIII and RsaI is shown in the map at the bottom of Fig 4B. In order to see if a MGBD destabilizes this nucleosome in low ionic strength, a preassembled nucleosome containing this sequence was incubated for 40 min at 37˚C in 40 mM NaCl in the presence and in the absence of DAPI. Under these conditions, no more than 5% of the DNA was rendered histone-free by drug treatment. Restriction enzymes were then added and the reaction terminated after an additional 40 min. In the absence of the drug, HaeIII site 1 which lies outside of the major positioned core was cleaved extensively while HaeIII site 2 and the RsaI sites were more protected from enzyme cleavage. The analysis demonstrated that DAPI increased the extent of cleavage of these internal nucleosome sites by 3-8-fold (Fig 4B).
The above studies established the selective inhibitory effects of the MGBDs on HIV-1 containing nucleosomes in vitro. To provide a first step to evaluate the effects of the drugs on HIV chromatin in vivo, HIV segments were cloned into yeast shuttle vectors and cells transformed with the plasmids were grown for about 20 hours in the absence and in the presence of 4 uM DAPI. Permeabilized spheroplasts were digested with micrococcal nuclease (MNase) and the digested DNA was electrophoresed, blotted and probed with HIV DNAs. The MNase ladder observed in the absence of drug is indicative of regularly spaced nucleosomes and the definition of the nucleosome repeat was reduced by DAPI as seen in Fig 5A-5I. Reduction in the resolution of the ladders was seen with the chromatin assembled onto the 199 bp HIV fragment 1 in two different sequence contexts (Fig 5A and 5B). Partial but not complete loss was noted in the chromatin assembled onto the longer 2 KB HIV segment, and this reduction was more clearly seen in the minimally digested samples (Fig 5C). This result is expected since some but not all of the nucleosomes assembled on this long sequence are expected to be drug sensitive. The drug had little effect on the nucleosome ladders containing bulk yeast DNA and little or no effect on the nucleosome profile of sequences from the blot probed with a segment of pUC18 (Fig 5D and 5E). The reduction in the nucleosomal repeat in Fig 5A-5C is interpreted as representative of either the preferential loss or disruption of nucleosomes along the HIV sequences.
To further characterize this effect, indirect end-labeling analysis was performed on MNase digests obtained from control and DAPI treated cells that carried the longer HIV segment ( Fig  5F and 5G). In the absence of drug, the HIV sequences resided within a nuclease protected region which is nuclease sensitive in naked DNA. DAPI enhanced the nuclease cutting of the HIV sequences in chromatin rendering the digestion pattern similar, but not identical, to that seen in the protein-free samples. The cutting pattern of sequences adjacent to the HIV segment were similar in the presence and in the absence of the DAPI providing evidence for the template selectivity of this drug action. Essentially the same results were seen in experiments where 60uM berenil was used in place of DAPI (Fig 5H and 5I).
In order to provide an alternative strategy to study the effects of the MGBDs on the in vivo assembled chromatin, DNA topoisomers from drug-treated and control cells were resolved in agarose gels containing chloroquine (Fig 5J). DAPI promoted a topological change toward a loss of superhelical turns of the HIV-containing plasmids while having little or no effect on control plasmids that lacked HIV sequences. Such a reduction is most often interpreted as a loss of nucleosomes since each nucleosome induces a single negative superhelical turn in a closed circular DNA molecule [21,32,33]. Densitometric scans of these and other autoradiograms indicated an average loss of 1-2 nucleosomes on the plasmids with the~200 bp HIV sequence 1 and up to 4-6 nucleosomes on the plasmid containing the larger segment of HIV DNA as seen in the histogram in Fig 5J. The value for the larger segment is significantly less than the 13 nucleosomes that should be present on this sequence in the absence of drug which implies that a fraction of the nucleosomes on this HIV segment are drug resistant. This result is consistent with the partial loss of the nucleosome ladder induced by drug treatment (Fig 5C) and with calculations of the AT-tract densities of 200 bp segments that make up this fragment which indicate that only about 2/3 of the nucleosomes in this region of HIV should display detectable drug sensitivity (Fig 1A). The results shown in Fig 5J (bottom panel) also demonstrate that DAPI was several fold more effective than berenil at inducing nucleosome loss in vivo in agreement with the relative potencies of these drugs determined by the in vitro studies shown in Fig 3B and 3C.
Since DNA that is devoid of nucleosome is hypersensitive to nucleases [34,35], it was of interest to determine if loss or disruption induced by a MGBD might render the HIV DNA nuclease hypersensitive in vivo. To test this possibility, HIV fragment 1 and the longer HIV segment on centromere-containing plasmids were introduced into yeast cells that lacked the RAD 52 gene, which is required for repair of DNA double-strand breaks [36]. DAPI and berenil had no effect on plasmid copy number or HIV sequence integrity in these cells or in wild-type cells nor did the drugs preferentially affect the growth rate of these cells in selective media lacking uracil relative to those that were transformed with control plasmids that lacked the HIV DNA. Likewise, the drugs did not increase the ratio of relaxed circular/supercoiled DNA of the HIV-containing plasmids. Thus, it appears that nucleosome loss in vivo does not necessarily render the HIV DNA sensitive to cleavage by cellular nucleases under the conditions used in these studies.

Discussion
Most of the parent MGBDs were developed as trypanocides and berenil remains used for this purpose today in Africa and Asia. Pentamidine has also been used to prevent and treat opportunistic pneumonia that is caused by Pneumocystis carinii which is common in AIDS patients. The drugs have found additional applications in clinical medicine including use as general antitumor, antibacterial and antiviral agents [12,[37][38][39][40]. However, the targets and modes of action of the drugs remain, for the most part, unknown. Although the MGBD distamycin and related analogues show anti-HIV activity in cultured cells, the MGBDs have received little attention as anti-HIV agents because many of the distamycin-related compounds are expensive and toxic to cells [41]. Likewise, to our knowledge, MGBDs have not been tested for activity against simian immunodeficiency viruses or the lentiviruses that infect non-primate mammals, though the nucleotide composition of these viral genomes are nearly identical to that of HIV-1 [8]. Much more emphasis has been devoted to the synthesis of distamycin conjugates and analogues as anti-HIV drugs with the goal of targeting the ligands to defined features of the viral life cycle [12]. For example, several distamycin-related polyanionic conjugates are inhibitors of HIV-induced cell killing and mechanistic studies suggest that the inhibition is due to the inhibition of virus attachment to CD4+ susceptible cells [42]. Another series of distamycin derivatives was designed to target the binding sites for several cellular transcription factors and a combination of these ligands inhibited HIV replication in lymphocytes [43]. However, as discussed by Tutter and Jones [44], the most significant problems associated with these analogs is the ease with which HIV-1 evolves to evade chemical inhibitors and the fact that no single sequence in the HIV genome is absolutely essential for virus replication. The studies described in this report point to an alternative global specificity of the MGBDs for the HIV provirus DNA that is based on the characteristic A-bias of the lentivirus genome. The bias dictates the unusual composition of essentially all lentiviral proteins and may also serve to enhance genetic variation and curvature of the HIV genome [8,9]. Consequently, the bias is considered to be critical for HIV phenotype making it unlikely that HIV variants that lacked this genome-wide sequence feature would appear during the course of an infection. Drugs whose anti-viral action is dependent on this global bias may thus provide a novel class of antilentiviral agents.
Nucleosome-length sequences with AT site densities as high as those seen in HIV are rare in the eukaryotic genome as implied from the results in Figs 3A, 3C, and 5A-5I which show that the MGBDs did not noticeably inhibit nucleosome assembly or promote nucleosome loss on genomic DNAs from human, chicken or yeast. The unusual nucleotide bias of HIV is the major driving force behind this specificity for it dictates the high frequencies of AT sites and their narrow minor groove character. All lentiviruses exhibit the strong A-bias and display little or no selection against the high levels of AT-trinucleotides [8] and tetranucleotides ( Fig  1D-1F) in their coding regions. In fact, the frequency of AT sites with narrow minor grooves in HIV is higher than in random sequence DNA (Fig 1D-1F) which may be related to the propensity of the lentiviral reverse transcriptase to elongate preexisting oligo-A tracts with A at the expense of G [45,46]. In contrast, less than 0.1% of the sequences in the primate subdivision of GenBank display an A-bias that is extreme as that seen in HIV [8]. In addition, oligonucleotide-length AT sequences are counter selected in cellular genes [47]. Thus, the unusual viral bias results in a high genomic levels of AT sites which are expected to be binding sites for the MGBDs. DNase1 footprinting studies confirmed this expectation by showing that DAPI protected nearly all of the AT sites > 2-3 bp in the HIV pol fragment 1. The compositional bias of the viral genome also favors AT sites with narrow minor grooves since homopolymeric A-tracts comprise a relatively large fraction of the AT sites. In addition, ATcontaining oligonucleotide sequences with TA steps are underrepresented in HIV DNA as compared to random sequence and to AT sites without TA steps (Fig 1D-1F). This viral characteristic is also unusual since the AT-containing oligomeric sequences in cellular DNA are enriched in TA but not AT dinucleotides [48]. This feature contributes to the specificity of the ligands for the HIV genome since the narrow minor groove of AT sites is required for high affinity binding of most of the MGBDs [3,29,30]. In addition, our previous studies have shown that extent of drug-induced inhibition of nucleosome assembly on synthetic repetitive fragments followed the order (CGGGAAAACC)n >(CGGGAATTCC)n >> (CGGGATATCC) n > (CGGGTTAACC)n~(CGGGACAACC)n [13]. This order roughly follows the relative frequencies of these AT-sequence motifs in the HIV-1 genome (Fig 1D-1F). Thus, the A-bias favors the HIV genome that is rich in clusters of AT sites that preferentially bind to MGBDs and these features are responsible for the marked specificity of the MGBDs for the disruption of chromatin [13,14].
The aromatic diamidine MGBDs including berenil, DAPI, and pentamidine are a group of related low molecular weight drugs that bind to AT-rich sequences [37][38][39][40]. As shown in Fig  3A and 3B, DAPI was the most effective drug tested at inhibiting nucleosome formation on HIV DNA. DAPI was also more effective than distamycin and Hoechst at blocking the assembly of nucleosomes and negating the effects of intrinsic DNA curvature and anisotropic bendability in studies that used other naturally occurring and synthetic DNA sequences with multiple closely spaced A-tracts with narrow minor grooves [13,14]. The high activity of DAPI as compared to distamycin and Hoechst has been attributed to the short length of this ligand and its absolute specificity for A/T sites with narrow minor grooves [13]. This specificity may ensure that only particular minor grooves that give rise to curvature and anisotropic bendability are occupied by the drug. Berenil shows the same AT-specificity as seen with DAPI although the binding affinity of this ligand for AT sites is lower than for DAPI [29]. This observation may have clinical implications since berenil was active in disrupting HIV-containing nucleosomes in vitro and in vivo (Fig 3A and 3B & Fig 5) and this drug is currently used to treat a variety of parasitic infections in humans and other mammals [38,49,50]. In addition, in contrast to many of the MGBDs, berenil and pentamidine display low toxicity and both drugs have been reported to be non-mutagenic in mammalian cells [48,51,52]. Distamycin and Hoechst 33258 have most often served as the starting point for synthesis of most MGBD conjugates and analogs [12,52]. A similar approach could be adopted for the synthesis of conjugates of the structurally simpler drugs such as berenil, pentamidine, or DAPI in order to optimize the specificity of these compounds for the clustered AT-sites in the HIV genome. In addition, berenil has previously been used to target cytotoxic platinum to AT rich regions in the genome [53]. Such conjugates targeted to the bone marrow using bone marrow targeting systems [54] might also be useful in promoting selective nucleosome disruption and lethal mutagenesis of the integrated HIV provirus. Similarly, MGBDs could conceivably be used in combination with CRISPR/Cas9-based HIV-1/AIDS therapeutic strategies, especially since recent studies have suggested that nucleosomes can strongly inhibit CRISPR-Cas9 targeting and cleavage efficiency [55][56][57][58]. The positioned nucleosome that assembles on HIV fragment 1 might represent an ideal locus for CRISPR-Cas9 targeting since this region in the reverse transcriptase gene is among the most conserved sequences in the HIV genome [8]. In comparison to nucleosomes assembled on fragments 2-5, this nucleosome was the most sensitive to disruption by the MGBDs and has the highest AnTm occurrence as well as the greatest narrow MGW frequency (Figs 1-3, Table 2).
Dimers of distamycin and neutropsin connected by a diammine bridge displayed a higher affinity for long (N = 8) A/T sites in synthetic DNA than the corresponding monomers. These and similar bridged molecules show heighten anti-HIV and 2 activity in culture cells relative to the monomers [59,60]. However, such long A/T tracts are rare in the HIV genome. Rather, the A richness of the HIV genome is due principally to A-tracts that are 3-5 bp in length and these tracts are frequently arranged in a 10 bp periodicity (Fig 2H) [9], a period that is coincident with the pitch of the DNA helix. In addition, the specificity of the MGBDs for promoting nucleosome disruption requires long (>120 bp) stretches of DNA containing such phased Atracts [13,14]. These considerations raise the interesting possibility that dimers or higher multimers of the smaller MGBDs, such as berenil, with linker lengths that correspond to the spacing of narrow minor groove sites might be more effective and specific than the monomers for HIV nucleosome disruption.
The MGBDs prevented nucleosome formation on the five segments of HIV pol studied in this report, and it is likely that the drugs would exert this selective inhibitory action on the bulk of HIV chromatin. It is also clear from the data in Fig 3C that the MGBDs promoted a preferential loss of histone octamers from HIV DNA in high salt since the DNA released from these particles comigrated with free DNA on PA gels and failed to bind to nitrocellulose filters which bind protein-DNA complexes. Whether the drugs induced a loss of octamers from HIV sequences in yeast is more uncertain. The results in Fig 5A-5I are consistent with such a loss but the presence of loosely bound disrupted nucleosomes at these sites cannot be formally excluded. Perhaps the most direct evidence supporting nucleosome loss is provided by the results in Fig 5J which show that the drugs reduced the number of superhelical turns of HIVcontaining plasmids. However, the human SWI/SNF complex has been reported to promote a reduction in DNA supercoiling per nucleosome without causing nucleosome loss [61] so that the data in Fig 5J do not prove that the drugs cause nucleosome loss in yeast.
Nucleosome loss and/or disruption has been correlated with cell death, promoter activation, changes in higher order chromatin structure, and enhanced nuclease accessibility [34,35]. The importance of chromatin disruption in the mechanism of action of anti-cancer DNA-binding drugs has also recently been highlighted by Gurova [62]. Disruption of HIV chromatin by MGBDs is expected to be detrimental to the expression of the provirus and perhaps even to the cell that contains this integrated sequence since AT sites with narrow minor grooves are a major determinant for nucleosome stability and positioning [63,64]. In addition, the HIV genome is highly deficient in CpG dinucleotides and methylation of this dinucleotide enhances nucleosome stability [7,8,65,66].
The results in Fig 3 show that the MGBDs have no detectable effects on the assembly of nucleosomes on genomic DNA illustrating the specificity of the drugs for HIV sequences. Furthermore, the genomics studies carried in Figs 1 and 2 provide additional support for how these HIV-1 sequences are distinguished from bulk of the human genome in terms of AnTm and narrow minor groove width density. However, it is likely that there are drug sensitive genomic sequences and it might be informative to understand the nature of these sequences. We previously studied this question by first performing a computer search of all sequences in the bacterial and invertebrate subdivision of GenBank in order to identify all genes that have an unusual nucleotide composition similar to HIV [67]. The results revealed that about 80% of the identified genes coded for membrane protein antigens (virulence factors) from mammalian pathogens and the amino acid composition of these proteins were similar to those from HIV. We also demonstrated that the MGBDs inhibited nucleosome formation on these sequences with a drug specificity similar to that displayed by the HIV sequences analyzed in Fig 3 [68]. These membrane antigen genes included those from Pneumocystis jirovecii, Entamoeba histolytica, and Plasmodium vivax. Whether this effect is involved in the anti-pathogenic action of the MGBDs in these organisms is a topic for future investigation.
Human fragile sites are segments of chromatin that fail to compact during mitosis and are seen as weak staining gaps and breaks in metaphase chromosomes [69,70]. A subset of these sequences is AT rich and the expandable microsatellite FRA 16B is induced MGBD [70]. This site was predicted to be packaged into MGBD sensitive nucleosomes and this prediction has been confirmed by experimental studies [14,71]. Berenil has been recommended for detection of this site as compared to the other MGBDs in metaphase chromosomes because of its lower cost, it induces higher levels of the site and has no negative morphological effect on metaphase chromosomes in optimal concentration [72]. The possibility was considered that the drugs might promote the breakage of HIV sequences in yeast as a result of the chromatin disruption effect but the data described in the Results Section suggest that this is not the case at least under the conditions used in this work. Thus, although the consequences of nucleosome disruption by the MGBDs remain to be clarified, they point to the need for future studies on the analysis of their effects on HIV infected vs non-infected CD4+human cells in terms of cell viability, proviral sequence integrity and packaging of the sequences into nucleosomes. The results of such studies might then provide an assay for the optimization of desired effects using existing MGBDs, and testing new drug derivatives and drug multimers.