Identification of Binding Targets of a Pyrrole-Imidazole Polyamide KR12 in the LS180 Colorectal Cancer Genome

Pyrrole-imidazole polyamides are versatile DNA minor groove binders and attractive therapeutic options against oncological targets, especially upon functionalization with an alkylating agent such as seco-CBI. These molecules also provide an alternative for oncogenes deemed “undruggable” at the protein level, where the absence of solvent-accessible pockets or structural crevices prevent the formation of protein-inhibitor ligands; nevertheless, the genome-wide effect of pyrrole-imidazole polyamide binding remain largely unclear to-date. Here we propose a next-generation sequencing-based workflow combined with whole genome expression arrays to address such issue using a candidate anti-cancer alkylating agent, KR12, against codon 12 mutant KRAS. Biotinylating KR12 enables the means to identify its genome-wide effects in living cells and possible biological implications via a coupled workflow of enrichment-based sequencing and expression microarrays. The subsequent computational pathway and expression analyses allow the identification of its genomic binding sites, as well as a route to explore a polyamide’s possible genome-wide effects. Among the 3,343 KR12 binding sites identified in the human LS180 colorectal cancer genome, the reduction of KR12-bound gene expressions was also observed. Additionally, the coupled microarray-sequencing analysis also revealed some insights about the effect of local chromatin structure on pyrrole-imidazole polyamide, which had not been fully understood to-date. A comparative analysis with KR12 in a different human colorectal cancer genome SW480 also showed agreeable agreements of KR12 binding affecting gene expressions. Combination of these analyses thus suggested the possibility of applying this approach to other pyrrole-imidazole polyamides to reveal further biological details about the effect of polyamide binding in a genome.


Introduction
While protein-level inhibitors aimed to inhibit enzyme activity or disrupt binding enjoy a large degree of commercial success, pyrrole-imidazole polyamides ("PIPs") and their accurate DNA base pair recognition [1][2][3][4][5] provide a promising option against a large class of so-called "undruggable" targets such as KRAS [6][7][8] which lack solvent-accessible surfaces or pockets for ligand binding.From paired heterocyclic amide building blocks of pyrrole and imidazole, PIPs can distinguish G/C and A/T (T/A) pairs via DNA minor groove binding with Im/Py and Py/Py pairs, respectively, down to the precision of a single hydrogen bond between the bases.Chemical functionalization provides versatility in regulating biological events [9][10][11][12][13][14][15] such as cell reprogramming, NF-κB-dependent gene transcription, retinal development, and the inhibition of specific biological targets [16,17].Nevertheless, as PIPs principally bind typically well less than 20 bp, the impact of binding at genomic locations other than the intended design site remains unclear to-date.Without exploring other possible binding sites, it may be too premature to conclude that unanticipated binding to non-KRAS loci has no effect, since off-target effects could also contribute to cancer cell death and mask the true efficacy of a therapeutic candidate.In spite of recent developments in next-generation sequencing, PIP binding studies, especially those involving alkylating functional groups, remain few and far between at the genome level.Hopefully with the procedure discussed in this manuscript, we can introduce a useful addition in the design pipeline to understand whether off-target binding may have a large effect on the tumor before more costly studies, e.g.animal models.Sequencing studies involving bioactive DNA-targeting polyamides conjugated with a psoralen moiety for photocrosslinking and biotin as affinity tag [18] so far have been largely restricted to chromatinized regions in the living cell genome until the latest report with a 6-bp alkylating PIP [19], albeit in an ex vivo setting.Following our previous study of a PIP with 9-bp recognition demonstrating toxicity against KRAS G12 mutant cell lines [20], we seek to evaluate, via a new sequencing procedure with the biotinylated KR12 to affinity capture nucleotides bearing binding sites for the polyamide in vivo, and computational workflow to determine whether other genomic binding sites in living cells may present an issue in the effectiveness of alkylating PIPs as therapeutic candidates.Corroborating findings here with our previous publication in which KR12 demonstrated distinguishable toxicities in KRAS G12D/G12V over wild-type cells suggested specific targeting of the mutant driver gene by the polyamide.Results here also revealed, for the first time at 9-bp precision, insights into the manner in which PIPs access the human genome in cells.

Synthesis of KR12
The backbone chain (ImPy-β-ImImPyIm-γ-PyIm-β-PyPy) for the (+) strand motif 5'-TGWWGGCGW-3' was synthesized in a stepwise Fmoc solid-phase synthesis reaction on a PSSM-8 peptide synthesizer (Shimadzu) with a custom operating system on 10-μmol scale with Fmoc-β-alanine Wang resin.Cleavage from solid support was achieved with 0.5N LiOH/NMP at 55°C for 1 h.Purification of the cleavage product was performed using highperformance liquid chromatography on a LC-20 (Shimadzu), using a 10 mm × 150 mm Gemini-NX3u 5-ODS-H reverse-phase column (Phenomenex) using 0.1% acetic acid in Milli-Q water and a linear gradient of 0-66.7% acetonitrile over 20 m at a flow rate of 10 ml m -1 with detection at 310 nm.Collected fractions were analyzed by LC-MS2020 (Shimadzu).After purification, N-terminal biotinylation and C-terminal conjugation of seco-CBI to produce KR12 was achieved by incubating the cleavage product in the presence of biotin, PyBOP and DIEA for 2 h to allow reaction; after the conversion of the C-terminal carboxylic acid to the activated 1-hydroxybenzotriazole ester, the reaction was mixed with 5.1 μmol NH 2 -seco-CBI in 100 μL NMP [36] and stirred overnight to allow coupling.After solvent evaporation, the reaction product was characterized and purified by liquid chromatography and a linear gradient of 0.1% acetic acid in 30%-75% acetonitrile and water over 0.5 h with detection at 310 nm.The final product (t R = 16.87 m) after purification was lyophilized and stored.

Cell Cultures and Viability Assay
Colorectal adenocarcinoma LS180 and SW480 cells were cultured and maintained at 37°C, 5% CO 2 under humid conditions in Eagle's Minimum Essential Medium (MEM, Gibco) and Dulbecco's Modified Eagle's Medium (DMEM, Gibco), respectively.Media were additionally supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin/streptomycin (Gibco).For sequencing experiments, cultures were prepared in replicates at 6 × 10 5 cells/dish prior to KR12 administration.For WST viability assays, cells were seeded at a final density of 10 3 cells per sample in a 96-well plate for overnight attachment prior to treatment with 0.1% DMSO, 300 and 1000 nM KR12 as well as non-biotinylated variant (KR12 N/B) in 0.1% DMSO for 48 h to assess compound toxicity.After compound administration, WST-8 reagent (Dojindo) was added to each sample for incubation at 37°C for 2 h prior to absorbance measurement at 450 nm on a MTP-310 photospectrometer (Corona).

Ion Torrent Sequencing
LS180 and SW480 cultures were treated with 500 nM KR12 in 0.05% DMSO for 6 h before washing, detachment in ice-cold phosphate-buffered saline and collection by centrifugation.Pellets were resuspended in a lysis buffer of 5 mM PIPES, 0.5% NP40 and 1% Complete Protease Inhibitor, incubated on ice for 10 m to extract nuclei by ultracentrifugation.After resuspension in nuclease lysis buffer of 50 mM Tris-HCl, 10 mM EDTA, 1% SDS and 1% Complete Protease Inhibitor at pH 8.1, samples were again incubated on ice for 10 m prior to overnight storage at -80°C.Since KR12 formed alkylation adducts with DNA, crosslinking was not performed during the DNA enrichment process.Genomic DNA was fragmented by sonication on a Covaris M220 for two 10 m runs of 200 cycles (peak power of 75, duty level of 26) before precipitation in a solution of 300 mM NaOAc in 70% ethanol (JIS Special Grade, Wako) at 30°C and isolation of genomic DNA by Micro-Vac evaporation (Tomy Digital Biology).Samples were resuspended in nuclease-free water (Millipore) for 15 m before collecting the supernatant for enrichment with Dynabeads MyOne streptavidin C1 beads (Invitrogen) over 15 m.Beads were washed twice in a buffer of 20 mM Tris-HCl, 0.1% SDS, 1.1% Triton X-100, 2 mM EDTA and 500 mM NaCl at pH 8.0, and five additional times in detergent-free buffer (5 mM Tris-HCl, 0.5 mM EDTA, 1 M NaCl at pH 7.5) prior to elution with 2% SDS, 0.1 mM NaHCO 3 and 3 mM biotin at 65°C for 4 h under agitation.
Enriched DNA was purified by RNase A (5 μg, Invitrogen) for 10 m at 37°C, followed by proteinase K (50 μg, Merck Millipore) treatment at 55°C for 40 m.Samples were mixed with glycogen (Ambion), resuspended prior to phenol extraction.The supernatant was retained, from the phenol layer, after centrifugation in 1:1 TE:NaCl and stored as aliquots in absolute ethanol at -20°C prior to precipitation by centrifugation (17,000 × g, 4°C for 0.5 h).Samples were evaporated in open atmosphere, reconstituted in 20 μL TE and quantified by Qubit 2.0 Fluorometer (ThermoFisher).Samples were end-repaired by 30 m incubation in the supplied end-repair buffer and enzyme (2 μL) in Ion XpressTM Plus Fragment Library Preparation Kit; the reaction mixtures were suspended with AMPure XP Reagent Beads (Agencourt), washed in ethanol (70%) and eluted in Low TE buffer before adapter ligation for 30 m and again purified with AMPure beads.Nick repair and amplification with Platinum PCR SuperMix High Fidelity mix (ThermoFisher) proceeded at 72°C for nick repair (20 m), 95°C for denaturation (5 m), 18 cycles of 97°C denaturation, annealing at 60°C and 70°C extension (15 s, 15 s and 1 m, respectively).Library preparation proceeded with 1.68 ng of enriched DNA with Ion Plus Fragment Library Kit (ThermoFisher) and AMPure XP Reagent Beads (Agencourt) per manufacturer's recommendations.Library sizes were validated on a Bioanalyzer 2100 (Agilent) and quantified using the Ion Library Quantitation Kit (ThermoFisher); after emulsion PCR of the prepared sample on Ion OneTouch 2 (ThermoFisher), a final quantity of 0.8 fmol DNA was used for sequencing on an Ion Proton sequencer (ThermoFisher).Data acquisition and alignment to hg19 genome were performed using Torrent Suite 5.0.4.

Differential Identification and Statistical Validation of KR12 Sites from Sequencing Results
Ambiguous reads, i.e. mapping quality = 0, were removed from the BAM files before summarizing base calls of aligned reads (mpileup) with samtools, with BAQs recalculated on the fly and genotype likelihood output in uncompressed BCF format.Sequencing data (treatment and control) were then converted to BED format before differential calling by diffReps with hg19 as the genome size, method of chi-square, fragment size of 0 and no annotations; window sizes varied from run to run within the range of 400-1200 bp.Sequences were regenerated over such diffReps-assigned regions from the source FASTA file with indels ignored (see General Computational Section), by bedtools, ignoring strand information.Matches to the KR12 motif ("TGWWGGCGW" on the (+) strand), in both forward and reverse complementary orientations, were identified by searching against the reconstructed sequence files, followed by a backtransformation to hg19 coordinates.Positions were again inspected against pileup results for indels; sequences from such sites were reconstructed by variant calling with bcftools over given regions by multi-allelic calling, with compressed output containing only variant sites.Indel information was used to reconstruct a small region for motif searches and translated to hg19 coordinates.Transcript positions and gene symbols from hg19 RefFlat (last accessed Feb 2015) were used to generate an annotation including transcript locations and promoter regions, defined as 1000 bp region upstream of transcription start site (TSS), based on RefFlat strand information.KR12 candidates were subsequently annotated to produce a list of all positions of candidate KR12 sites from sequencing results annotated by their respective gene or promoter symbols.
For statistical validation, a list of nonbinding regions was generated by subtracting candidate KR12 sites from the list of hypothetical sites via bedtools (removing entire feature if any overlap was present).Randomizations were performed to generate null foreground and background positions in R (sampling without replacement, with an equal number of null and candidate sites).Per-base coverage was calculated over candidate regions as well as paired null foreground and background regions from sequencing data, before unity normalization of each region.A differential distribution was determined by subtracting per-base coverage between the input and pulldown groups; for the null differential distributions, shuffling was performed to pair randomly the null foreground with a background to increase randomness.For each of the candidate differential distribution, two-sample Kolmogorov-Smirnov test was performed for each of the null differential distribution under the hypothesis that there was no difference between the distributions; resultant p-values were adjusted for multiple comparisons by Benjamini-Hochberg corrections.Statistical significance was assessed based on the adjusted p-value at the 99.9 th percentile against a pre-defined α-level of 0.05; sites with a p-value of 0.05 < p < 0.055 were considered marginally significant and likewise annotated.Fold enrichment was calculated based on the log 2 ratio of the maximum coverage within a given window in the pulldown and input samples.Results from multiple runs of varying window sizes for diffReps were compiled to produce the final output of 3,343 KR12 binding sites.

Expression Microarrays
LS180 and SW480 cultures at 9.6 × 10 4 cells per sample were plated in a 6-well microtiter plate for overnight attachment prior to treatment with 500 nM KR12 or 0.05% DMSO for 6 h.After RNA extraction with RNeasy Plus Mini Kit (Qiagen), samples at 100 ng were labeled with RNA Spike-In Kit and analyzed on SurePrint G3 Human GE 8x60K V2 microarrays per Agilent Technologies' recommendations.Arrays with sample replicates (2 × 2 for LS180, 3 for SW480) for each condition (DMSO, control; treatment, 500 nM KR12) were scanned on an Agilent SureScan microarray scanner, with differential expressions calculated by the LIMMA package [37] (background correction by the "normexp" method with an offset of 16, and scale-normalized for replicates between arrays).LIMMA calculated a linear model fit for each gene and utilized the method of empirical Bayes for statistical evaluations and differential expressions, with fold changes calculated from the difference between DMSO and KR12 treatments.Spots with matching RefSeq mRNA and ncRNA identifiers were filtered and retained for subsequent analyses.Statistical significance of gene expressions, unless otherwise specified, was assessed by two-sample t-test under the null hypothesis that there was no difference between their sample means against a predefined α-level of 0.05.For the purpose of comparison, genes with promoter region (defined as 1000 bp upstream of transcription start site) binding were also annotated with corresponding mRNA expressions.

Nucleosome Occupancy Profiling
DHS, H3K27Ac and nucleosome structure information were obtained as described in the General Computational Tools Section.ChromHMM tracks from eight cell lines (GM12878, H1 hESC, HepG2, HMEC, HSMM, K562, Nhek and Nhlf) were combined to identify common overlaps, and heterochromatin regions with !5/8 overlaps were retained to create a consensus heterochromatin track for comparison.LS180 H3K27Ac data were reprocessed in MACS 1.4.2 to generate a region file of signature peaks with default parameters and an effective genome size of 2.7 × 10 9 .H3K27Ac and KR12 binding sites were annotated to the closest transcript features to facilitate the determination of two features occupying the same gene or otherwise, and for each KR12 binding site assigned to a gene (or a promoter region), distance to the closest H3K27Ac feature was determined similarly.

Gene Set Enrichment and Network Analysis
Gene set enrichment analysis was performed with PANTHER 10.0 (May 15, 2015 release) statistical overrepresentation test, using the list generated from sequencing results as well as genes belonging to the Ras pathway (hsa04014, KEGG [38,39]) under default settings (predefined αlevel of 0.05, Bonferroni corrections against a reference list of Homo sapiens genes).A list of downregulated KR12-bound genes was selected by cross-referencing microarray expression results for negative log 2 FC values for interaction analysis by STRING 10.0 with the following parameters: highest confidence of 0.900, no additional nodes, custom limit of 0 interactors and all active prediction methods [40].The resultant network was reconstructed in Cytoscape [41] using STRING-computed combined scores, and sub-networks with nodes less than or equal to four hidden for illustrative purposes.

KEGG Pathway Significance
The KEGGREST package [42] in R was used to acquire pathway-specific information from the KEGG database.All 301 human pathways and corresponding gene lists were extracted from KEGG.The gene lists were then compared against KR12 binding sites from LS180 sequencing data to test for statistical significance in three different areas: 1) whether the mean gene expression in a given pathway compared to other genes was different; 2) whether mean expressions of KR12-bound genes in that pathway was different from non-KR12-binding genes, and 3) whether mean expressions of KR12-bound genes in the pathway were different from other KR12-bound genes.Pathways with less than 2 KR12-binding genes are excluded from tests of significance (81).For pathways satisfying all three criteria (9/220), they were considered candidates for off-target pathways of KR12.Common (duplicate) KR12-bound genes from candidate pathways were checked against all KEGG human pathways involving KRAS for possible involvement to assess off-target effects of KR12.

Prediction of hg19 binding sites of scrambled KR12 motifs
The last eight bases of the KR12 motif (TGWWGGCGW on the (+) strand) were randomly scrambled for 100 permutations to check for genomic binding sites in hg19, with coding region annotations extracted from the Table Browser from UCSC.To simulate 3' adenine alkylation by the conjugated seco-CBI moiety [43] for the scrambled 9-bp polyamides, the 5' terminal thymine was fixed for all 100 permutations.Counts for each motif were divided by the corresponding KR12 value by the same metric.
Sequence similarity between KR12-bound non-coding RNA and KRAS Annotations for lncRNA (colon lincRNA RNA-Seq Reads [44]) and miRNA (snoRNA and miRNA [45]) were extracted from the Table Browser from UCSC and intersected with the list of KR12 binding sites.Regions with at least 1 base overlap were retained and their sequences, along with flanking up-and down-stream 1000 bp, were extracted from the LS180 genomic sequence to generate a fasta file containing those 2009 bp regions for 9-mer composition analysis by jellyfish [46] with an initial hash size of 3G.The KRAS transcript (chr12: 25357722-25403865) was also extracted for comparison.Tantan [47] was used to perform masking of simple repeats with the letter N.

Data Availability
Sequence read datasets for the KR12-enriched ("pulldown") and unenriched ("input") LS180 and SW480 genomes are available in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA342228; expression microarray datasets are available at the NCBI Gene Expression Omnibus (NCBI GEO) database (accession number GSE86599).

Synthesis of KR12 and Cell Toxicity
We functionalized the original PIP (KR12 N/B) with biotin to create KR12 (Fig 1A ) from a modified Fmoc solid-phase peptide synthesis procedure [20].The one-pot preparation of KR12 via the use of excess PyBOP to allow simultaneous activation of the polyamide backbone C-terminus and biotin yielded the biotinylated intermediate 2, and the subsequent seco-CBI coupling and liquid chromatography successfully afforded the final product (Fig 1B).Since KR12 exhibited similar cell toxicity as the original PIP in LS180 cell lines (Fig 1C ), we proceeded with ligand affinity-captured DNA from LS180 6 h after KR12 (500 nM) administration for sequencing.

Design of Sequencing Pipeline
We then devised a method to enrich KR12-bound nucleotides to sequence for possible KR12 binding sites in the genome of LS180, a human colorectal cancer cell lines with KRAS G12V mutation.Upon DNA extraction and fragmentation by sonication, KR12-alkylated nucleotides underwent streptavidin enrichment and Ion Torrent sequencing to yield ~12 Gb per run of sequencing data.Initial inspection showed a mixture of broad and narrow peaks that deviated from typical TFBS ChIP-seq or histone modification, e.g.H3K36me3, experiments, and were also uncharacteristic of DNase-seq [25,[48][49][50].Since PIP-DNA interactions would have a smaller binding surface than their protein-DNA counterparts, it was unclear whether popular methods such as MACS or ZINBA would perform optimally with this dataset.
Additionally, our decision to omit formaldehyde crosslinking also attributed to the uncharacteristic peak distributions observed in the sequencing results.As the purpose of crosslinking was to preserve the interaction of DNA with protein targets, e.g.transcription factors, chromatin, etc., via the formation of a methylene bridge between a nucleoside and a nucleophilic nitrogen or sulfur of an amino acid, this additional step was unnecessary with alkylating PIPs due to the presence of an alkylating agent capable of forming a covalent linkage with the template 3' terminal adenine upon binding.Since concerns about the effectiveness of crosslinking to preserve all protein-DNA ligands in certain cases had also been raised, e.g.[51], it was unclear whether the added step will provide additional fidelity of securing polyamide-DNA interactions.The use of formaldehyde would most likely crosslink DNA to chromatin and other transcription factors, complicating analysis with alkylating polyamides.As previous methods, e.g., COSMIC [18], did not describe whether it was possible to isolate polyamide binding sites outside chromatinized regions, we decided to omit chemical crosslinking in this current procedure.
While ChIP-seq methods utilizing chemical crosslinking had been relatively mature and widely adopted, PIP or non-crosslinking sequencing experiments were relatively rare and thus less understood; as such, the extent of noise and peak qualities remained to be elucidated.Since the effect of omitting chemical crosslinking on sequencing results was unclear, we devised a different approach to identify genomic regions enriched by KR12, initially deploying a slidingwindow comparison by diffReps [24] followed by a PIP-specific filtering process to find sites in differentially enriched regions of the LS180 genome (Fig 2A).We reassembled genomic sequences in the enriched peaks for matches to 5'-TGWWGGCGW-3' (+), the KR12 9-bp motif previously found to be a full-match binder by Bind-n-Seq analysis [19].Following sequence reassembly over indels from multi-allelic variant calling, we designed an additional statistical validation routine to determine whether candidates from peak calling did possess sufficient experimental evidence to suggest enrichment over non-binding KR12 sites.While the extra rigor might appear to be unnecessarily conservative and complicate the peak discovery process, the added safeguard allowed us to approach the exclusion of a traditionally important step more cautiously.We constructed a null background population consisting of randomly sampled predicted KR12 sites based on motif matches in the hg19 genome for differential coverage comparison by two-sample Kolmogorov-Smirnov test, a non-parametric method to account for possible deviations from normality in the heterogeneous peak distributions.After Benjamini-Hochberg corrections for multiple comparisons, we assessed statistical significance for each candidate by their 99.9 th -percentile adjusted p-value.

KR12 Binding in the Human Colorectal Cancer LS180 Genome
In total, we identified 3,343 KR12 binding sites (S1 Appendix; sample spectra in Fig 2B ) in the LS180 genome, compared to 29,263 hypothetical positions in hg19.Among those, 1,556 and 23 were located within the transcripts and promoter regions (defined here as 1,000 bp upstream of transcription start site), respectively; 65 were on the list of 409 cancer-related genes [52].These numbers were in sharp contrast to a mere list of 20 sites by MACS (S1 Table ); this level of underestimation also suggested the possibility of encountering an elevated level of false negatives when adopting typical ChIP-seq procedures for PIP-based applications without modifications.Presumably, this would become more apparent with longer PIPs when their longer base-pair recognition and stronger interaction with DNA further reduced the number of possible sites.
As part of the validation, we performed PCR to check whether gene expressions of candidate genes such as KRAS and PIK3CA were disrupted by KR12 binding compared to sites that could potentially be alkylated yet not found to do so, e.g.GUSB (Fig 2C).Reduced transcript levels were indeed observed in KRAS and PIK3CA while GUSB and RPS18 failed to show discernible differences.These results also indicated that the enzymatic repair step, performed during sequencing but not PCR, did seem to restore the covalently modified nucleotides to a sequencer-compatible stage; this was evident from the presence of differential coverage and in a previous report [19].

Effect of KR12 Binding on Gene Expressions and Implications on KRAS
Gene expression microarrays showed a significant difference in the mean expression of KR12-bound genes (-log 2 FC of 0.658) and those without (0.339, Fig 3A left), echoing our Enrichment by streptavidin allows the capture of KR12-bound nucleotides.A computational routine maps candidate regions in sequencing data, followed by site calling via motif matching and statistical validation.Subsequent microarray analyses provide the means to confirm binding data with genome-level changes.(B) Sample sequencing coverage of PIK3CA and KRAS, genes with KR12 binding sites ("KR12+"), and for reference, a predicted site by motif matching but non-binding ("KR12-") in GUSB; windows centered around a KR12 site (black arrow, dashed line) are within -500 to +500 bp; blue and orange arrows indicate cumulative coverage for the pulldown and input tracks, respectively.(C) Semi-quantitative PCR of RPS18, KRAS, PIK3CA and GUSB in the presence of 500 nM KR12 or DMSO as control."Motif match" and "binding" indicate whether a particular gene contains a computationally predetermined match to the KR12 motif in the hg19 genome or a KR12 binding site determined by sequencing analysis, respectively.doi:10.1371/journal.pone.0165581.g002previous report that KR12 was capable of inducing toxicity in cancer cells.A significant decrease over the nonbinding sites was also observed (Fig 3A right).Comparing expressions of KR12-bound genes with all possible candidates prior to statistical validation also expectedly showed distinguishable reductions (Fig 3B).Coupled with a priori knowledge from previous Bind-n-Seq results [19], the workflow proposed here appeared to eliminate noise and false-positives by combining gene expressions.Further optimizations should allow us to address and reduce the level of false-negatives in the form of rare mismatch binding events.While gene expressions decreased upon KR12 binding, changes at the protein and subsequently the phenotypic levels were otherwise minimal due to the small fraction (~10%) of binding sites within protein coding regions.Comparing 9-mer frequencies of noncoding sections in the genome (2,000 bp centered around KR12 sites within lncRNA) with the KRAS transcript showed a relative low number of sites (138) contained with stretches of lncRNA and suggested that the possibility of KR12 affecting gene transcription via lncRNA interference was low, even given sequence similarities at the 9-mer resolution between lncRNA and the KRAS transcript (S2 Appendix).Coupled with our previous observations that KR12 treatment led to a reduction in Ras-GTP levels in vivo as well as phenotypic changes that were strongly Ras-dependent [20], it appeared that while KR12 did have a number of possible genomic targets, cellular responses to the polyamide remained overwhelmingly due to the transcriptional disruption of KRAS mutant codon 12. Previous results in which KR12 demonstrated appreciable toxicities in cell lines bearing KRAS G12D/G12V mutations over cells with wild-type or non-D/V mutations inferred the possibility of KR12 effectively targeting the mutant driver gene.We tried to confirm our analytical procedure with the SW480 genome, in which KRAS carried a homozygous codon 12 G12V mutation (S3 Appendix).Under the same criteria, site characterization results showed degrees of similarity between the makeup of KR12-bound genes in the two genomes and significance between the expression of KR12-bound and unbound genes.Considering that the transcriptomes of SW480 and LS180 were dynamic and thus understandably different, agreement between the two mutant KRAS colorectal cancer genomes at the same time infers that the approach can be extended to other genomes to identify binding sites by alkylating polyamides.
We then sought to understand if additional data exploration via interaction-association pathway analyses and a series of statistical assessments could substantiate results obtained from the workflow.From gene set enrichment analysis, we observed no statistical significance in the mean expressions of KR12-bound genes or otherwise (Table 1; S1 Fig) , indicating the absence of specific effect by KR12 in those pathways despite enrichment.At the same time, KR12-bound genes exhibited an entirely different makeup of pathway overrepresentation compared to members of the Ras pathway (hsa04014 from KEGG).Expressions of KR12-bound genes in other pathways were also evaluated (S2 Table ), and while some pathways did exhibit above-baseline changes in mean expressions, they frequently contained several common KR12-bound genes (S3 Table) either associated with KRAS; e.g.SMURF2 [53], or participated in the same pathways as KRAS.As such, expression changes in those pathways could also be the consequence of KRAS inhibition.As such, these observations would suggest that while KR12 did bind non-KRAS sites non-systematically, most binding events appeared to be relatively random and did not as a whole contribute to phenotypic changes to the extent that binding to KRAS mutant codon 12. Network analysis of down-regulated genes (Fig 3C ; S2 Fig) revealed only a small number of direct interactions stemmed from KRAS, namely SPTBN1, BCL2, TEK, KIT and PIK3CA.However, these first neighbors were heavily associated with other down-regulated genes in the same network, corroborating previous findings that KR12 binding to KRAS would then induce changes along the downstream pathway cascades.From these results, it appeared possible to use binding data with pathway information to generate statistically based criteria to elucidate the extent of non-systematic binding of a PIP.

Chromatin Accessibility and Other Implications
The broadened search space in our pipeline led us to explore chromatin accessibility in the context of KR12 binding.Comparing site overlaps with DHS and heterochromatin mappings suggested some propensity for KR12 to access condensed chromatin (Fig 4A ; ~80% DHS and 65% heterochromatin), a surprising departure from the earlier hypothesis of preferential binding in loose chromatin [54].A weak correlation between gene expression and the relative distance from H3K27Ac peaks for intragenic KR12 binding sites (Fig 4B) also suggested some degree of binding independence from chromatin structure.Upon examining the expressions of 161 transcription factors as listed in the ENCODE clustered transcription factor binding site data, we found no significant differences in the mean or variance of expressions between KR12-bound transcription factors or otherwise (p = 0.142 and p = 0.121 from two-sample t-and F-tests, respectively); this further suggested that KR12 did not have a direct role in influencing largescale gene transcription.Furthermore, out of the 161 transcription factors, only 67 contained predicted KR12 binding sites, and only 12 contained sites not located in the introns (S4 Table ).These observations echoed a recent proposal that sufficiently small transcription factors (~5 nm in diameter, or < 50 kDa) could penetrate and reorganize local chromatin structure to enable transcription [55].PIPs in general have molecular weights an order of magnitude lower than the smallest transcription factors, and thus are able to bind its targets in condensed chromatin; nonetheless, their high affinity to DNA may still induce steric hindrance to hamper transcription, leading to reduced expressions (Table 2).While such common measures to predict chromatin accessibility were sufficiently reliable, the dynamic nature behind nucleosome arrangement would still require further experimentation to verify so.Considering that previous reports of chromatin accessibility experiments only utilized synthetic or purified nucleotides in ex vivo settings, results here could potentially be one of the first sequencing-and expressionbased evidence of the genome-wide accessibility of PIP in living cells, which further improved their attractiveness as therapeutic candidates than previously imagined.
As PIP binding appeared to be chromatin-independent and only weakly correlated with transcriptional regulation, the disconnection between gene-and protein-level changes still remained to be explored.A possibility raised in the earlier section was the surprisingly small number of binding sites in coding regions leading to fewer disruptions of changes at the protein  level.We then compared the total number of possible binding sites in the hg19 genome for KR12 against 100 randomly scrambled motifs revealed a relatively small number of binding sites (~30 percentile) and a small fraction of coding regions (~43 percentile, Fig 4C); KR12's relatively infrequent binding in the coding region could attribute to some extent the discriminating toxicity it had on cell lines with G12D/G12V mutations over wild-type.The concept of "oncogenic addiction, " in which tumors relied on one dominant oncogene, e.g.KRAS, for growth and survival [56], has been advocated as possible approaches to molecular therapy; as such, the use of KR12 presents a viable option for treating mutant KRAS genotypes.While alkylation of occupied sites by KR12 did contribute to DNA damage and reduced gene expressions, the relatively few binding sites in promoter and coding regions perhaps explained the highly specific and directed suppression of mutant KRAS codon 12 by the polyamide.The relatively small subset of genomic binding by KR12 in contrast to the nonspecific alkylation of guanines and adenines by conventional alkylating neoplastic agents such as melphalan or mechlorethamine [57] could also lead to a potential reduction in the amount of adverse side effects.As more applications of PIP are explored, the need for similar sequencing and expression-driven exploratory analyses will also rise; incorporation of the workflow herein could then enhance the throughput of designing and optimizing new and existing polyamide designs, especially with further optimizations in the enrichment procedure and computational peak identification.

Conclusion
The use of biotinylated alkylating polyamides in next-generation sequencing, followed by Kolmogorov-Smirnov-validated sliding-window enrichment identification and statistical assessment of coupled site-expression microarray data, provides a workflow to characterize possible genome-wide effect of a PIP.Assessment of statistically significant pathways based on the impact of PIP binding, intra-association of genes with binding sites in a network analysis as well as chromatin accessibility and coding-region binding frequencies also provide a basis of deciding whether a polyamide will have large-scale genetic level effects as well.The analytical steps proposed in this workflow are sufficiently modular and implementable in-line with an existing PIP design protocol.
While the identification of KR12 binding sites in the LS180 genome suggested that PIP binding was not unique only to the oncogene, evaluation of gene expressions and pathway analyses revealed that such binding events appeared to be non-systematic.In conjunction to previous reports where KR12 exerted toxicity against specific mutant cancer cell lines and demonstrate changes in cell cycle, it also suggested that non-KRAS binding overall had little contribution to changes at the phenotypic level.

Fig 2 .
Fig 2. KR12 binding in the human colorectal carcinoma LS180 genome.(A) Workflow to identify KR12 binding sites in the LS180 genome by IonTorrent sequencing.Cells are administered with KR12 (500 nM, 6 h) prior to genomic DNA extraction and fragmentation by sonication.Enrichment by streptavidin allows the capture of KR12-bound nucleotides.A computational routine maps candidate regions in sequencing data, followed by site calling via motif matching and statistical validation.Subsequent microarray analyses provide the means to confirm binding data with genome-level changes.(B) Sample sequencing coverage of PIK3CA and KRAS, genes with KR12 binding sites ("KR12+"), and for reference, a predicted site by motif matching but non-binding ("KR12-") in GUSB; windows centered around a KR12 site (black arrow, dashed line) are within -500 to +500 bp; blue and orange arrows indicate cumulative coverage for the pulldown and input tracks, respectively.(C) Semi-quantitative PCR of RPS18, KRAS, PIK3CA and GUSB in the presence of 500 nM KR12 or DMSO as control."Motif match" and "binding" indicate whether a particular gene contains a computationally predetermined match to the KR12 motif in the hg19 genome or a KR12 binding site determined by sequencing analysis, respectively.

Table 1 .Fig 4 .
Fig 4. Chromatin accessibility of KR12.(A) Position-coverage plot of KR12 binding (9 bp per point, black and red) in relation to DHS regions (10,000 bp per point, gray) per chromosome.Horizontal axes, relative position along chromosome; vertical axes, relative coverage of a feature; for DHS regions, a coverage of 100% indicates that all 10,000 bp of a genomic region have DHS, while for KR12 sites, black spots (0%) indicate that a particular site is outside the boundaries of a DHS region, in contrast to the red sites (> 0%).(B) Gene expressions (vertical,-log 2 FC) as a function of KR12 site proximity (horizontal, 1000 bp) to the histone modification feature H3K27Ac.Spearman's correlation coefficient (ρ = -0.171)suggests a weak correlation with histone modification.(C) Comparison of predicted binding site counts in coding regions (CDS) and in the hg19 genome for KR12 (red) against 100 randomly selected motifs (black, see S4 Table); horizontal axis, ratio of binding sites per motif/KR12 binding sites; vertical axis, ratio of binding sites in CDS per motif/KR12.Dashed line with slope of 1 (gray) is provided as reference.

Table ) ;
horizontal axis, ratio of binding sites per motif/KR12 binding sites; vertical axis, ratio of binding sites in CDS per motif/KR12.Dashed line with slope of 1 (gray) is provided as reference. doi:10.1371/journal.pone.0165581.g004

Table 2 . Expressions and distributions of KR12 binding in local chromatin structures.
Chromatin ratio, odds of loose/condensed chromatin for DHS (within/outside) and heterochromatin (outside/inside); LC+ and LC-, mean expressions of KR12-bound genes in loose and condensed chromatin, respectively; corresponding values from hg19 predictions are listed as reference.Statistical significance between experiment (LS180) and predictions (hg19) is assessed by Welch's two-sample t-test.p-value is estimated from 1000 bootstrapping trials of n = 100; n.s., not significant.doi:10.1371/journal.pone.0165581.t002