Transcription apparatus of the yeast killer DNA plasmids: Architecture, function, and evolutionary origin

Transcription of extrachromosomal elements such as organelles, viruses, and plasmids is dependent on cellular RNA polymerase (RNAP) or intrinsic RNAP encoded by these elements. The yeast Kluyveromyces lactis contains killer DNA plasmids that bear putative non-canonical RNAP genes. Here, we describe the architecture and evolutionary origin of this transcription machinery. We show that the two RNAP subunits interact in vivo, and this complex interacts with another two plasmid-encoded proteins - the mRNA capping enzyme, and a putative helicase which interacts with plasmid-specific DNA. Further, we identify a promoter element that causes 5’ polyadenylation of plasmid-specific transcripts via RNAP slippage during transcription initiation, and structural elements that precede the termination sites. As a result, we present a first model of the yeast killer plasmid transcription initiation and intrinsic termination. Finally, we demonstrate that plasmid RNAP and its promoters display high similarity to poxviral RNAP and promoters of early poxviral genes, respectively.


INTRODUCTION
Linear double-stranded DNA plasmids were found in the cytoplasm of several yeast species.
Structural organization of these plasmids is quite uniform and they are often present as two or three differently sized DNA elements in yeast host cells (Jeske et al. 2007). Characteristic features of these plasmids are terminal proteins covalently linked to the 5′ ends of their DNA, terminal inverted repeats, and their cytoplasmic localization (Gunge et al. 1982;Kikuchi et al. 1984;Stam et al. 1986). Yeast linear plasmids of Kluyveromyces lactis, termed pGKL1 (or K1) and pGKL2 (or K2), have become a model system to study such DNA elements. These plasmids have compact genomes with occasional overlaps of open reading frames (ORFs) and a high AT content of ~74% (Sor and Fukuhara 1985;Tommasino et al. 1988). The presence of both pGKL plasmids in several K. lactis strains is associated with the extensively studied yeast killer phenotype (Gunge et al. 1981).
Functions of protein products for most ORFs encoded by the pGKL plasmids were predicted using bioinformatics approaches and some of these proteins were characterized by biochemical and genetic analyses (Jeske et al. 2007). Both pGKL1 and pGKL2 encode their own DNA polymerase and a terminal protein, and it is assumed that the mechanism of their replication is similar to the replication of viruses of the Adenoviridae family or Bacillus subtilis bacteriophage φ 29 (Stark et al. 1990). Consequently, marked sequence similarities between viral enzymes and putative products of several linear plasmid ORFs with expected function in replication and transcription resulted in yeast linear DNA plasmids being called virus-like elements (Satwika et al. 2012). Hence, it is believed that these linear plasmids may have originated from endosymbiotic bacteria or a virus (Kempken et al. 1992). Nevertheless, the exact evolutionary origin of the yeast linear plasmids remains unclear.
Transcription of plasmid-specific genes has been shown to be independent of mitochondrial (Gunge and Yamane 1984) and nuclear RNA polymerases (Romanos and Boyd 1988; Kämper et al. 1989; Stark et al. 1990; Kämper et al. 1991), and probably utilizes a plasmid-specific RNA polymerase (RNAP). Experiments with bacterial reporter and yeast nuclear genes fused with pGKL plasmid sequences identified an upstream conserved sequence (5′-ATNTGA-3′) preceding each of the open reading frames. This upstream conserved sequence (UCS), usually located 20-40 nucleotides upstream of the start codon, is essential for cytoplasmic transcription of the downstream gene (Schründer and Meinhardt 1995; Schickel et al. 1996; Schründer et al. 1996). Sequences located farther upstream of the UCS element have been shown to have no effect on transcription (Schickel et al. 1996). The UCS element is highly conserved among all yeast linear plasmids and the UCS sequence derived from the Pichia etchellsii pPE1B plasmid acts as a functional promoter when transplanted into the pGKL1 plasmid (Klassen et al. 2001). Thus, the UCS element is a universal cis-acting component of the plasmid-specific transcription system and it is essential for transcription initiation. Transcription apparently terminates after each gene, because only monocistronic transcripts were detected by Northern blot analyses of transcripts derived from ten pGKL-encoded ORFs (Romanos and Boyd 1988; Tommasino et al. 1988; Schaffrath et al. 1995b; Schaffrath et al. 1997; Jeske et al. 2006). This suggests the existence of a defined, yet unknown mechanism of transcription termination.
Unique RNAP subunits, and possibly also a putative helicase and the mRNA capping enzyme are the key elements of the plasmid cytoplasmic transcription machinery. Protein products of ORF6 (K2ORF6p; large subunit) and ORF7 (K2ORF7p; small subunit) of the plasmid pGKL2 should form a non-canonical RNAP. K2ORF6p was found to have a sequence similarity to three conserved regions of the two largest subunits (β and β ′ in bacteria) of canonical multisubunit RNAPs (Wilson and Meacock 1988). Sequence similarity of subunit, which are usually located at the C-terminus of β ′ (Schaffrath et al. 1995a).
The ORF4 sequence of the plasmid pGKL2 shows striking sequence similarity to viral helicases from the superfamily II of DEAD/H family helicases involved in transcription. The K2ORF4 protein product (K2ORF4p) displays similarity with two Vaccinia virus helicases: (i) NPH-I, which is encoded by the D11L gene, and (ii) the small subunit of the heterodimeric Vaccinia virus early transcription factor (VETF) encoded by the D6R gene (Wilson and Meacock 1988). NPH-I is known to provide the energy for elongation of transcription and for the release of RNA during transcription termination (Deng and Shuman 1998). VETF functions as a transcription initiation factor that binds and bends the promoter region of early genes (Broyles et al. 1991).
The protein product of ORF3 (K2ORF3p) encoded by the plasmid pGKL2 shows sequence similarity to the Vaccinia virus mRNA capping enzyme encoded by the D1R gene that consists of three domains responsible for the three enzymatic activities necessary to form the 5′ mRNA cap structure (Larsen et al. 1998). The methyltransferase activity of the D1 protein of the poxvirus Vaccinia is allosterically stimulated by heterodimerization with a smaller protein encoded by the D12L gene (Mao and Shuman 1994;Schwer et al. 2006). The complex of D1 and D12 proteins is sometimes also referred to as the vaccinia termination factor (VTF) because, together with NPH-I, it also acts as a transcription termination factor of early genes (Luo et al. 1995). Triphosphatase and guanylyltransferase activities of K2ORF3p have been already confirmed experimentally in vitro (Tiggemann et al. 2001).
As reported previously, the K2ORF3, K2ORF4, K2ORF6 and K2ORF7 genes are indispensable for the maintenance of the pGKL plasmids in the cell (Schaffrath et al. 1995a, 1995b; Schaffrath et al. 1997; Tiggemann et al. 2001). However, it is not understood how their protein products interact with each other and with plasmid DNA in the cell, nor which linear plasmid DNA sequence elements are required for transcription initiation and termination.
Here, we present a systematic in vivo study focusing on the architecture of the transcription complex of the yeast linear plasmids. Moreover, we identify a new promoter DNA element which is associated with 5′ mRNA polyadenylation of most pGKL-encoded genes and we uncover a link between RNA stem loop structures and 3′ end formation of plasmid-specific mRNAs in vivo. Further, we present an extensive phylogenetic analysis of amino acid sequences of linear plasmid RNAP subunits. Finally, we provide a detailed sequence analysis of pGKL promoters. Collectively, these analyses strongly suggest that the linear plasmid transcription machinery is closely related to the transcription machinery of poxviruses.

Plasmid RNAP subunits, mRNA capping enzyme, and helicase associate in vivo
To start characterizing the transcription machinery of the yeast linear plasmids we first tested whether the plasmid RNAP subunits (K2ORF6p, K2ORF7p), the mRNA capping enzyme (K2ORF3p), and the putative helicase (K2ORF4p) form a complex in vivo.
Initially, we tested interactions between K2ORF3p, K2ORF6p and K2ORF7p using a yeast two-hybrid system and its fluorescence variant called bimolecular fluorescence complementation but we failed to detect any interaction with either approach (data not shown).
This was most likely caused by the high AT content of linear plasmid genes that was shown recently to impair their nuclear expression due to RNA fragmentation mediated by the polyadenylation machinery (Kast et al. 2015). Therefore, we decided to prepare modified pGKL plasmids expressing the putative transcription machinery components containing various tags.
Next, we prepared a strain encoding yeast enhanced green fluorescent protein 3 (yEGFP3) (Cormack et al. 1997) that was fused to the N-terminus of the large RNAP subunit K2ORF6p (yEGFP3-K2ORF6p; strain IFO1267_pRKL2-4). We immunoprecipitated (IP) yEGFP3-K2ORF6p from this strain using GFP-Trap_A agarose beads that contain a monoclonal antibody against common GFP variants. As a control, we used the wt strain IFO1267 that had no modifications. Extensive washing was used to remove weakly bound proteins. The bound proteins were eluted, resolved on SDS-PAGE, and stained with Coomassie Brilliant Blue G-250 (Figure 1). Gel lanes or bands of interest were excised from the gel and analysed by mass spectrometry (MS). As shown in Table 1, peptides corresponding to yEGFP3-K2ORF6p, K2ORF7p (RNAP small subunit) and K2ORF3p (mRNA capping enzyme) were detected, whereas no such peptides were identified in the parallel-treated IFO1267 control sample.
Then, to verify the interaction between the large RNAP subunit and the mRNA capping enzyme we decided to perform immunoprecipitation using tagged K2ORF3p as the bait.
Hence, we prepared a strain encoding yEGFP3 fused to the C-terminus of K2ORF3p (IFO1267_pRKL2-11 strain). Interestingly, selective cultivation of clones after transformation led to a loss of the pGKL1 plasmid (Supplementary Figure S1). Immunoprecipitation using GFP-Trap_A agarose beads and subsequent MS analysis revealed peptides corresponding to K2ORF6p, K2ORF7p, and K2ORF4p. For the fourth protein, the putative helicase K2ORF4p, the MS results suggesting it as part of the complex were not convincing due to low protein coverage (Supplementary Figure S2). To address whether it does, although perhaps weakly, interact with these proteins, we prepared a strain expressing HA-K2ORF4p together with yEGFP3-K2ORF6p (IFO1267_pRKL2-7), and a control strain expressing HA-K2ORF4p only (IFO1267_pRKL2-13). After IP and Western blotting we found HA-K2ORF4p to associate with yEGFP3-K2ORF6p (Figure 2B). The results clearly showed that the putative helicase was specifically associated with the large RNAP subunit (or was present in a complex containing this subunit) because yEGFP3-K2ORF6p and HA-K2ORF4p were not bound to the empty agarose beads, and HA-K2ORF4p alone did not bind to the GFP-Trap antibody (Figure 2B).
Because the association of the putative helicase with the large RNAP subunit seemed rather weak, we decided to perform reciprocal immunoprecipitation. Further, we also tested whether the mRNA capping enzyme associated with the putative helicase. Strains expressing (i) yEGFP3-K2ORF4p together with HA-K2ORF6p (IFO1267_pRKL2-9), and (ii) yEGFP3-K2ORF4p together with K2ORF3p-HA (IFO1267_pRKL2-10) were prepared. With the first combination we confirmed that HA-K2ORF6p associated with yEGFP3-K2ORF4p (Figure 2D). With the second combination we found that K2ORF3p-HA associated with yEGFP3-K2ORF4p (Figure 2C). The detected interactions were specific because yEGFP3-K2ORF4p and HA-K2ORF6p were not bound to the empty agarose beads and K2ORF3p-HA alone did not bind to the GFP-Trap antibody (Figures 2C and 2D).
As an additional control to demonstrate that the observed interactions were specific, we used another pGKL-encoded protein with a function unrelated to transcription. We selected K1ORF4p, a subunit of the toxin responsible for the plasmid-associated killer yeast phenotype (Tokunaga et al. 1989). A strain co-expressing yEGFP3-K2ORF6p together with K1ORF4p-HA (IFO1267_pRKL1-4/2-4) was prepared. We found that K1ORF4p-HA was not associated with yEGFP3-K2ORF6p (Figure 2E).
Finally, we wanted to know whether the association of the putative helicase and mRNA capping enzyme with the large RNAP subunit was dependent on nucleic acids. We prepared lysates from IFO1267_pRKL2-6 (yEGFP3-K2ORF6p, K2ORF3p-HA) and IFO1267_pRKL2-7 (yEGFP3-K2ORF6p, HA-K2ORF4p) strains. The lysates were incubated with GFP-Trap_A beads and, after washing, the beads were split into two parts which were treated or mock-treated with Benzonase Nuclease to digest DNA and RNA. Then, the beads were again extensively washed and the bound proteins were eluted. Subsequent Western blot analysis revealed both K2ORF3p-HA and HA-K2ORF4p to associate with yEGFP3-the IFO1267 control strain. First, we verified that the mouse monoclonal anti-HA HA-7 agarose efficiently immunoprecipitated HA-K2ORF4p (Figure 3A). Then, we performed chromatin immunoprecipitation of HA-K2ORF4p from formaldehyde cross-linked cells using mouse monoclonal anti-HA HA-7 agarose. The immunoprecipitated and input DNAs were used as templates for subsequent PCR analysis using primers designed to detect chromosomal or linear plasmid DNA. We used primers specific for K. lactis actin (ACT) and high-affinity glucose transporter (HGT1) genes as markers of chromosomal DNA, and toxin immunity (K1ORF3) and mRNA capping enzyme (K2ORF3) genes as markers of pGKL plasmids. We found that HA-K2ORF4p was specifically associated with pGKL plasmids and not with chromosomal DNA (Figure 3B). These results were also confirmed by semiquantitative real-time PCR (data not shown).
We concluded that the previously uncharacterized helicase was associated with plasmid-specific DNA in vivo. Because the putative helicase interacts only weakly with the core of the plasmid-specific transcription complex in vivo, we assume that it possibly acts as a dissociable transcription factor.

Slippage of RNAP at the initiation site results in 5′ polyadenylation of plasmid mRNAs
Next, we wished to characterize transcription initiation of the linear plasmid genes. Our previous 5′ RACE-PCR experiments had revealed 5′ cap structures on the plasmid-specific mRNAs, likely synthesized by the plasmid-encoded K2ORF3p mRNA capping enzyme, and also the presence of non-templated 5′ poly(A) leaders of heterogeneous lengths in mRNAs of 12 of the 15 pGKL genes, the exceptions being K2ORF2, K2ORF3 and K2ORF8 (manuscript submitted).
Interestingly, heterogeneous 5′ poly(A) leaders are a known feature of poxviral intermediate and late transcripts (Bertholet et al. 1987;Schwer et al. 1987 pGKL genes whose transcripts were 5′ polyadenylated ( Figure 4C). The first adenosine residue of the motif was considered to be the initiating nucleotide for the TSS annotation.
Subsequently, we tested whether the putative INR was responsible for the 5′ end polyadenylation of the pGKL-derived transcripts. We prepared three K. lactis strains with modified pGKL1 plasmids encoding the G418 resistance marker under the control of the K1UCR2 promoter. We prepared three variants of the K1UCR2 promoter that differed in the INR sequence: (i) TAAAA (wt; strain IFO1267_pRKL1-1); (ii) TAACA (strain IFO1267_pRKL1-2); and (iii) TACCA (strain IFO1267_pRKL1-3). Then, we purified total RNA from the three strains, prepared cDNA, and performed 5′ RACE-PCR to determine the 5′ end sequences. The results showed that the 5′ poly(A) leader was present when the K1UCR2 sequence contained the putative wt INR (TAAAA INR) (Figure 4D). When TAACA was used, the length of the 5′ end poly(A) was significantly reduced (Figure 4E). When TACCA was used, the poly(A) leader disappeared altogether (Figure 4F).
Therefore, we concluded that slippage of plasmid RNAP at the initiation site was the mechanism responsible for 5′ polyadenylation of the transcripts. Moreover, the identified INR sequence constituted an independent DNA element, not influenced by the sequence of the gene because the pattern of the sequenced 5′ RACE-PCR clones for K1ORF2 transcripts was the same as for G418 R transcripts produced from the K1UCR2 with the wt INR (manuscript submitted).
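As an illustration of how non-templated 5′ adenosines could be scored from sequenced 5′ RACE clones, the following minimal Python sketch counts the leading A residues of each read and subtracts the A run encoded by the INR. It is not the procedure used in this study. It assumes that reads are already trimmed to begin at the mRNA 5′ end and that the templated A run is fully contained in the INR; the function names and the example reads are hypothetical.

```python
def a_run_from(seq: str, start: int = 0) -> int:
    """Count consecutive A residues in seq beginning at index start."""
    count = 0
    for base in seq[start:].upper():
        if base != "A":
            break
        count += 1
    return count


def templated_a_run(inr: str) -> int:
    """Length of the A run encoded by the INR, starting at its first A.

    The first adenosine of the motif is taken as the initiating nucleotide.
    Assumption: the template does not continue with A after the INR.
    """
    first_a = inr.upper().find("A")
    return 0 if first_a < 0 else a_run_from(inr, first_a)


def nontemplated_a(read: str, inr: str) -> int:
    """Leading A residues in a 5' RACE read beyond those encoded by the INR."""
    return max(0, a_run_from(read) - templated_a_run(inr))


if __name__ == "__main__":
    # Hypothetical reads, for illustration only.
    for inr, read in [("TAAAA", "AAAAAAAAGTC"), ("TACCA", "ACCAGT")]:
        print(inr, nontemplated_a(read, inr))
```

Under these assumptions, a TAAAA promoter encodes four adenosines, TAACA two and TACCA one, so any additional leading adenosines in a read would be attributed to slippage.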

RNA stem loop structures influence 3′ end formation of plasmid-specific mRNAs in vivo
Our previous 3′ RACE-PCR experiments had revealed the absence of 3′ poly(A) tails in mRNAs of all 15 pGKL ORFs (manuscript submitted). To shed light on the mode of transcription termination of the linear plasmids, we tried to identify sequence/secondary structure elements/signals near the 3′ termini. First, we searched for sequence motifs within the last 150 nt of each transcript that would be shared among the 15 pGKL ORFs but we detected none. Second, we searched for secondary structure motifs using the RNAstructure Server (Reuter and Mathews 2010). We identified putative RNA stem loop structures close to the experimentally determined 3′ ends of cDNA (Supplementary Figure S4). The putative RNA stem loops were typically in the vicinity of the respective ORF's stop codon with the median distance of 26 nt, and Gibbs free energy of −7.5 kcal/mol.
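The idea of screening sequences near the 3′ ends for stem loops can be sketched in a few lines of Python. The sketch below is only a crude base-pairing scan and is not a substitute for the free-energy-based prediction performed by the RNAstructure Server; it reports perfect stems (Watson-Crick and G-U pairs) separated by a short loop. The thresholds `min_stem`, `min_loop` and `max_loop` are illustrative assumptions, not values from this study.

```python
# Allowed base pairs, including the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, min_stem=4, min_loop=3, max_loop=8):
    """Return (start, end, stem_length, loop_length) for candidate hairpins.

    start and end are 0-based inclusive coordinates of the whole stem loop.
    Only uninterrupted stems are considered; no energies are computed.
    """
    rna = seq.upper().replace("T", "U")
    n = len(rna)
    hits = []
    for i in range(n):  # i is the last base of the 5' arm
        for loop in range(min_loop, max_loop + 1):
            j = i + loop + 1  # j is the first base of the 3' arm
            stem = 0
            while (i - stem >= 0 and j + stem < n
                   and (rna[i - stem], rna[j + stem]) in PAIRS):
                stem += 1
            if stem >= min_stem:
                hits.append((i - stem + 1, j + stem - 1, stem, loop))
    return hits


if __name__ == "__main__":
    # Toy sequence: GGGC stem, AAAA loop, GCCC stem.
    print(find_hairpins("GGGCAAAAGCCC"))  # [(0, 11, 4, 4)]
```

Candidates from such a scan would still need to be evaluated with a thermodynamic model before any Gibbs free energy could be assigned to them.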
Hence, we tested whether the predicted RNA stem loop structures influenced the 3′ mRNA end formation. Because pGKL plasmids contain almost no intergenic regions, the putative RNA stem loops are localized in the coding sequences of adjacent ORFs or within the terminal inverted repeats. This means that their sequences cannot be subjected to mutagenesis without the possibility of altering plasmid functions. Therefore, we prepared a K. lactis strain with a modified pGKL1 plasmid encoding the G418 resistance marker under control of K1UCR2, followed by the 3′ UTR of the K2ORF5 gene (strain IFO1267_pRKL1-5; Figure 5A). The distal part of the K2ORF5 3′ UTR contained two putative partially overlapping RNA stem loops termed Stem loop 1 and 2 (Supplementary Figure S4I). We concluded that RNA stem loop structures were essential for the 3′ end formation of plasmid-specific mRNAs in vivo, presumably acting as factor-independent intrinsic terminators. Moreover, this termination was independent of the promoter and the gene used both with respect to its sequence and length.

Plasmid-specific RNA polymerase has unique architecture
To facilitate interpretation of the experimental data we created a 3D model of the pGKL-specific linear plasmid RNAP. This was feasible due to sequence similarity between parts of K2ORF6p, K2ORF7p and conserved regions of the canonical multisubunit RNAPs (Schaffrath et al. 1997; Ruprich-Robert and Thuriaux 2010 Figure S6).
The overall distribution of the conserved regions within K2ORF6p and K2ORF7p is depicted in Figure 6A. It should be noted that K2ORF6p displayed a unique fusion between β and β′ subunit conserved regions, which is not known to be present in any other canonical or a14 regions), are missing in plasmid RNAP Figure 6D.
We concluded that the linear plasmid RNAP displayed a unique and novel architecture. We performed a detailed phylogenetic analysis to delve deep into the evolutionary past of yeast linear plasmids. We used sequences of Taken together, our phylogenetic analysis surprisingly points to a viral origin of plasmid RNAPs close to poxviruses, which is in contradiction to all previous hypotheses about the origin of these enzymes.

Plasmid promoters have a viral origin
Although the UCS (5′-ATNTGA-3′) essential for linear plasmid transcription was identified a long time ago, no similarities with known promoters that would indicate its origin were reported. We extended the UCSs preceding all pGKL-encoded ORFs both upstream and downstream by ~10 bp, and we created a consensus motif. This consensus was then used to search for similar elements that were associated with transcription by multisubunit RNAPs.
We particularly focused on promoters of viral RNAPs because our phylogenetic analysis of plasmid RNAPs had suggested a viral origin.
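The extension of each UCS by flanking sequence and the construction of a consensus can be illustrated with a short Python sketch. It is a simplified stand-in for the motif analysis described above, not the tool that was actually used: it locates 5′-ATNTGA-3′ in an upstream region with a regular expression, extracts the match with a fixed flank on each side, and takes the most frequent base per column. The flank of 10 bp follows the text; the function names and example sequences are hypothetical.

```python
import re
from collections import Counter

# Lookahead so that overlapping ATNTGA matches are all reported.
UCS = re.compile(r"(?=(AT[ACGT]TGA))")


def extended_ucs_sites(upstream, flank=10):
    """Return each ATNTGA match together with `flank` nt on both sides.

    Matches too close to either end of the sequence are skipped so that
    all returned sites have the same length (6 + 2 * flank).
    """
    seq = upstream.upper()
    sites = []
    for match in UCS.finditer(seq):
        start = match.start(1)
        end = start + 6
        if start - flank >= 0 and end + flank <= len(seq):
            sites.append(seq[start - flank:end + flank])
    return sites


def consensus(sites):
    """Most frequent base at each column of equally long sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))


if __name__ == "__main__":
    # Artificial upstream region, for illustration only.
    region = "A" * 10 + "ATCTGA" + "T" * 10
    print(extended_ucs_sites(region))
```

A simple majority consensus discards information about how strongly each position is conserved; a position frequency matrix or sequence logo would retain it.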
Notably, we detected great sequence similarity between the extended UCS (Figure 8C) and the upstream control element (UCE), which is a promoter element of poxviral early genes (Figure 8A). The UCE motif is a 15-nt long AT-rich element with any nucleotide at the 5th position followed by TGA (Yang et al. 2011). This perfectly matched the extended UCS motif.
The distances from the 3′ ends of the UCEs to the annotated TSSs of 84 Vaccinia virus ORFs had a median of 12 nt (Figure 8B) (Yang et al. 2011 Figure 8D had a median distance of 11 nt. Moreover, we found an adenosine residue to be the TSS nucleotide in all pGKL-encoded ORFs (Supplementary Table S4), similar to the TSSs of the poxviral early genes where purines are the dominant TSS bases (Yang et al. 2011).
To conclude, it appears that promoters of poxviral early genes and linear plasmid genes are similar both with respect to their sequence and their spacing to the TSSs, implying a common origin.

DISCUSSION
In this study we characterized the considerably underexplored transcription machinery of the yeast cytoplasmic linear double-stranded DNA plasmids. We used both experimental and bioinformatic approaches, and determined the composition and interactions of the transcription complex and presented a 3D model of its two main subunits. Further, we defined DNA sequences required for initiation and termination. For a model of the key aspects of transcription of the linear plasmid see Figure 9. Finally, our analyses provided evidence strongly suggesting that poxviruses and the yeast cytoplasmic DNA plasmids have a common origin.

Composition of the linear plasmid transcription machinery
Biochemical characterization of proteins encoded by yeast linear plasmids was shown to be challenging in the past. Expression of genes located on the pGKL plasmids seemed to be rather weak (Schründer and Meinhardt 1995;Schickel et al. 1996;Schründer et al. 1996).
Also, it has been shown that expression of the K2ORF3p mRNA capping enzyme in routinely used E. coli systems was not possible, most likely due to the different codon usage dictated by the high AT content of plasmid genes (Tiggemann et al. 2001 remarkably if not entirely self-sufficient, because we did not find any cellular proteins to be specifically associated with the large plasmid RNAP subunit using mass spectrometry analysis.

Transcription initiation
Our previous 5′ RACE-PCR experiments revealed short poly(A) leaders at the 5′ mRNA ends of most pGKL-encoded genes (manuscript submitted). These 5′ poly ( (Bertholet et al. 1987;Schwer et al. 1987;Schwer and Stunnenberg 1988;Davison and Moss 1989). Some promoters of Vaccinia virus early genes containing the INR element can also produce mRNAs with short 5′ poly(A) leaders (Ahn et al. 1990;Yang et al. 2011). It has been shown that 5′ untranslated regions composed of 5′ poly(A) leader sequences prior to start codon have a regulatory role in translation initiation (Shirokikh and Spirin 2008;Xia et al. 2011). Using bioinformatics, we detected a putative INR element in promoters of plasmid genes whose transcripts were 5′ polyadenylated. Using

Transcription termination
We mapped the 3′ mRNA ends of all linear plasmid genes using 3′ RACE-PCR experiments.
We identified 1-4 putative RNA stem loops close to the 3′ mRNA terminus of each ORF.
Although the putative stem loops displayed relatively high values of Gibbs free energy (median of −7.5 kcal/mol), these values were comparable to the genome-wide predicted intrinsic terminators in Mycoplasma hyopneumoniae (median of −8.0 kcal/mol), an organism with a similarly high AT content (Fritsch et al. 2015). Further, RNA stem loops of bacterial intrinsic transcription terminators are usually followed by the typical 7-8 nt U-tract that promotes RNAP pausing at the weak dA-rU DNA-RNA hybrid (Martin and Tinoco 1980; d'Aubenton Carafa et al. 1990; Gusarov and Nudler 1999). Interestingly, we detected T nucleotide enrichment in the terminal 8 nt of plasmid-specific 3′ cDNA ends that corresponds to a putative U-tract (Supplementary Figure S8). Importantly, we revealed a direct link between the putative RNA stem loop structure and the transcription termination pattern in vivo, suggesting an intrinsic transcription termination model for the yeast linear plasmids, similar to that in bacteria (reviewed in Ray-Soni et al. 2016). Even though we did not analyse the termination efficiency, we assume that transcription reads through at least some of the putative terminator sequences. Otherwise, functional expression of approximately half of the ORFs would not be possible due to the compact genomic organization inherent to pGKL plasmids. Future experiments will be required to understand this mechanism in more detail.
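The T enrichment at the 3′ cDNA ends can be expressed as a simple pooled fraction, as in the following minimal Python sketch. It assumes that each input sequence is a cDNA read ending exactly at the mapped 3′ terminus; the 8-nt window follows the text, while the function name and the example sequences are hypothetical. For a genome with ~74% AT content, a background T fraction of roughly 0.37 would be a reasonable point of comparison, assuming A and T are equally frequent on the coding strand.

```python
def terminal_t_fraction(ends, window=8):
    """Pooled fraction of T residues in the last `window` nt of each sequence."""
    t_count = 0
    total = 0
    for seq in ends:
        tail = seq.upper()[-window:]  # shorter sequences contribute fully
        t_count += tail.count("T")
        total += len(tail)
    return t_count / total if total else 0.0


if __name__ == "__main__":
    # Artificial 3' ends, for illustration only.
    example_ends = ["ACGTTTTTTTT", "AAAAAAAATTTTAAAA"]
    print(terminal_t_fraction(example_ends))  # 0.75
```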
Even though Vaccinia virus RNAP, presumably related to plasmid-specific RNAP, terminates transcription of early genes in a factor-dependent manner it was recently shown that RNA stem loops can influence both efficiency and location of transcription termination in vitro (Tate and Gollnick 2015). Therefore, proposed substantial reduction of plasmid-specific RNAP ancestor might have contributed to adaptation of this enzyme to transcription termination induced by RNA stem loops and possible loss of auxiliary factors required for this process.

In silico 3D model of linear plasmid RNAP
Bioinformatic analysis of plasmid-specific RNAP proved to be challenging due to its unique reduced architecture and great evolutionary distance from other multisubunit RNAPs. From the 3D model it is evident that plasmid RNAP significantly differs from canonical RNAPs in several aspects: (i) Almost the entire clamp structure element is absent. Only a basal portion of the clamp formed by β a15, β a16 and β ′ a20 conserved regions is maintained. The clamp is a mobile RNAP element and its closure is important for high stability and processivity of the enzyme.
The clamp conformation is regulated by interaction of universally conserved elongation factors NusG and Spt4/5 with the clamp coiled-coil motif (Hirtreiter et al. 2010), an element likely missing in plasmid RNAP. Therefore, it is highly unlikely that plasmid transcription machinery could use cellular Spt4/5 to increase processivity.
(ii) The lid and rudder elements are likely missing. The lid acts as a wedge to facilitate dislocation of RNA from the DNA-RNA hybrid molecule, and thereby maintains a constant size of the DNA-RNA hybrid between 7 to 10 base pairs (Vassylyev et al. 2007 templates. The lid was also suggested to participate in bacterial intrinsic termination using stem loops (Vassylyev et al. 2007). However, bacterial RNAP without the lid was capable of intrinsic termination in vitro (Toulokhonov and Landick 2006). Therefore, we hypothesise that this structural feature is not crucial for intrinsic termination by plasmid RNAP in vivo.
The rudder element interacts with the upstream edge of the DNA-RNA hybrid (Vassylyev et al. 2007). Experiments using bacterial RNAP with a deleted rudder reported defects in transcription initiation and less stable elongation complexes (Kuznedelov et al. 2002). This may correlate with the linear plasmid-specific termination of transcription.
(iii) The secondary-channel rim helices are missing. These helices are the binding sites for some transcription factors of multisubunit RNAPs, such as the transcription elongation factor TFIIS (Kettenberger et al. 2004). Therefore, it is highly unlikely that plasmid transcription machinery could use cellular TFIIS to overcome pause sites and increase proofreading.

The evolutionary origin of the linear plasmids transcription machinery
A viral origin of the linear plasmid genes encoding the mRNA capping enzyme and the putative helicase was suggested previously (Jeske et al. 2007). However, the same origin for RNAP genes was not expected because previous hypotheses proposed that those genes originated from ancestral yeast RNAP genes (Jeske et al. 2007; Ruprich-Robert and Thuriaux 2010) or that they were ancient representatives of multisubunit RNAP diversification due to their simplified architecture (Iyer and Aravind 2012). However, no phylogenetic analysis was conducted to support the aforementioned hypotheses. Our results indicate that all the plasmid-specific transcription machinery components of the yeast linear plasmids have the same origin close to poxviruses. Poxviral RNAP, such as that of Vaccinia virus, lacks obvious α subunit homologs (Knutson and Broyles 2008), similarly to linear plasmid RNAP, and is more simplified than eukaryotic RNAPs. Therefore, a reduction of poxviral RNAP instead of yeast RNAP to give rise to the plasmid RNAP seems more plausible.
Our promoter analysis suggests that not just the plasmid RNAP, but also the plasmid promoters are related to nucleo-cytoplasmic viruses. We noticed sequence similarity between the UCE motif of Vaccinia virus early genes and the extended UCS motif of pGKL plasmids, as well as their similar location prior to the TSSs. Invariant G residue and several A residues in the minor groove were proposed to be the UCE nucleotides contacted by Vaccinia virus VETF helicase (Broyles et al. 1991). Presence of the invariant G residue and AT residues at other positions of the extended UCS motif suggests that K2ORF4p might contact UCS of the pGKL plasmids.
In conclusion, the transcription apparatus of the yeast linear plasmids has most likely an origin close to poxviruses and uses transcription initiation mechanisms similar to those used by poxviral genes. Unlike poxviruses, however, linear plasmids are beneficial for the cell, and this exemplifies the ability of the cell to domesticate originally harmful elements.

Strains, plasmids and growth conditions
All of the strains used in this study are listed in Table 2. Escherichia coli cells were grown at 37°C in 2xTY medium which was supplemented with kanamycin (50 µg/ml) or ampicillin (100 µg/ml) for selection of transformants. Transformations of E. coli cells were performed by electroporation using Gene Pulser Xcell™ (BIO-RAD). K. lactis cells were grown at 28°C in YPD medium which was supplemented with G418 (250 µg/ml) and/or hygromycin B (200 µg/ml) for selection of transformants. Transformations of K. lactis cells were performed using the one-step LiCl method (Gietz and Woods 2002), followed by a five-hour incubation in non-selective conditions immediately after transformation. For detailed descriptions of plasmids used in this study see Supplementary Table S1. Constructed pGKL plasmids were verified by PCR and subsequent sequencing of amplified products.
The nucleotide sequences of the primers used for construction, verification and sequencing of recombinant pGKL plasmids, and RACE-PCR amplification are listed in Supplementary Table S2. All polymerase chain reactions (PCRs) were performed using Taq DNA polymerase (Roche). PCRs for construction of recombinant pGKL plasmids were performed using a mixture of Taq DNA polymerase (Roche) and Pwo DNA polymerase (Roche) in a 99:1 volume ratio, respectively.

Modification of pGKL plasmids using homologous recombination in vivo
The K. lactis IFO1267 strain was transformed with a PCR-generated fragment consisting of 5′ and 3′ ends homologous to the part of the pGKL plasmid to be modified and a non-homologous part that introduced a purification and/or detection tag (yEGFP3, HA-tag, Flag-tag) into a plasmid-specific ORF together with a gene encoding a resistance marker (G418 or hygromycin B) whose expression is driven by a pGKL1-derived upstream control region (UCR, the sequence extending from the AUG initiation codon up to and including the UCS of the selected ORF).
This type of construct was prepared by PCR or fusion PCR methodology (for details see Supplementary Table S3).
After PCR amplification and gel electrophoresis, corresponding fragments were purified
Eluted immunocomplexes and 50 µl of the input clarified lysates were mixed with 400 µl of TBS (50 mM Tris-HCl, pH 7.5, 150 mM NaCl) supplemented with 5 µl of proteinase K (20 mg/ml; Sigma Aldrich), and the cross-linking was reversed by incubation for 5 hr at 65°C.

Co-immunoprecipitation and mass spectrometry
The immunoprecipitated and input DNA was isolated by phenol-chloroform extraction followed by ethanol precipitation supplemented with 1 µl of linear polyacrylamide (25 mg/ml; Sigma Aldrich), and then used for PCR amplification for 25 cycles followed by electrophoresis. PCR amplifications were carried out on 1/30 of the chromatin immunoprecipitation (ChIP) and 1/1200 of the chromatin before immunoprecipitation (Input) using primers listed in Supplementary Table S2.
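The template fractions quoted above (1/30 of the ChIP sample, 1/1200 of the Input sample) imply a fixed dilution factor between the two PCR templates. The following minimal Python sketch works through that arithmetic. The function name is hypothetical, and the interpretation in the last step assumes that the ChIP and Input samples derive from equal amounts of lysate, which the text does not state.

```python
from fractions import Fraction


def template_ratio(chip_fraction, input_fraction):
    """Return how many times more of the ChIP sample than of the Input
    sample is loaded into the PCR (hypothetical helper)."""
    return chip_fraction / input_fraction


# Fractions of each sample used as PCR template, as given in the text.
CHIP_FRACTION = Fraction(1, 30)
INPUT_FRACTION = Fraction(1, 1200)

ratio = template_ratio(CHIP_FRACTION, INPUT_FRACTION)

# ASSUMPTION: ChIP and Input originate from equal amounts of lysate.
# Under that assumption, bands of equal intensity would correspond to
# recovery of 1/ratio of the input DNA in the immunoprecipitate.
recovery_at_equal_signal = 1 / ratio

print(f"ChIP template is {ratio}x more concentrated than Input template")
print(f"Equal band intensity ~ {float(recovery_at_equal_signal):.1%} of input")
```

With the fractions from the text, the ChIP template represents 40 times more of its sample than the Input template does. End-point PCR after 25 cycles is at best semi-quantitative, so this factor is only a rough guide for comparing band intensities.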
The membranes were blocked in 5% non-fat dry milk (Hero) in a TBS-Tween buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl and 0.5% Tween-20) and incubated with a primary antibody overnight at 4°C. After washing in TBS-Tween buffer and blocking with 5% non-fat milk, the membranes were incubated with a goat anti-mouse HRP-conjugated antibody (1:5,000 dilution; Santa Cruz Biotechnology). Finally, after washing in TBS-Tween buffer, the membranes were immersed in a luminol detection solution and the signal was detected using ImageQuant™ LAS 4000 (GE Healthcare). To confirm the expression of the target protein and the successful immunoprecipitation, a mouse monoclonal anti-Flag M2 antibody (1:1,000; Sigma Aldrich), a mouse monoclonal anti-HA 6E2 antibody (1:1,000; Cell Signaling), and a mouse monoclonal anti-GFP B-2 antibody (1:1,000; Santa Cruz Biotechnology) were used.

RNA isolation, electrophoresis, reverse transcription, 5′ and 3′ RACE-PCR
Yeast cells (25 ml) from the exponential growth phase (OD600 = 0.5-1) were quickly pelleted and frozen. Total yeast RNA was isolated by the hot acidic phenol procedure followed by ethanol precipitation (Lin et al. 1996). Remaining DNA was removed using the DNA-free™ Kit (Ambion). The quality of RNA was assessed by electrophoresis and UV spectrophotometry (Mašek et al. 2005).
In the case of 5′ RACE, subsequent reverse transcription was carried out as follows: 1 μg of total yeast RNA and 0.15 μg of random hexamer primers (Invitrogen) were used for cDNA synthesis using 100 U of SuperScript® III Reverse Transcriptase (Invitrogen). The cDNA was purified using High Pure PCR Product Purification Kit (Roche) and used for cDNA tailing using 800 U of rTdT (Fermentas) and 0.5 mM dGTP (Roche) in a 50 µl reaction for 30 min at 37°C with subsequent heat inactivation of rTdT for 10 min at 70°C. For PCR amplification of cDNA ends, 2.5 μl of the reaction mixture was used with olig2(dC)anchor primer and an appropriate gene-specific primer for 35 cycles.
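Because the cDNA is tailed with dGTP and amplified with a (dC) anchor primer, a sense-oriented sequence of a 5′ RACE product is expected to contain the anchor, a run of C residues, and then the 5′ end of the transcript. The following minimal Python sketch illustrates how such a read could be split at the tail and how leading A residues could be counted. It is not the analysis used in this study: the function name, the minimum tail length, and the example read (including its anchor sequence) are hypothetical.

```python
import re


def split_race_read(read, min_tail=8):
    """Split a sense-oriented 5' RACE read at its poly(C) tail.

    Returns (transcript_part, leading_a_count), or None when no run of at
    least `min_tail` C residues is found. `min_tail` is an arbitrary
    threshold chosen for illustration.
    """
    read = read.upper()
    tail = re.search(r"C{%d,}" % min_tail, read)
    if tail is None:
        return None
    # Everything after the C run is taken as the transcript 5' end.
    transcript = read[tail.end():]
    # Count consecutive A residues at the very start of the transcript.
    leading_a = len(transcript) - len(transcript.lstrip("A"))
    return transcript, leading_a


# HYPOTHETICAL read: made-up anchor, 12 C residues, then the transcript.
example_read = "GACTCGAGTCGACATCG" + "C" * 12 + "AAAAAATGTGA"
result = split_race_read(example_read)
print(result)
```

Two limitations apply. A transcript that itself starts with C cannot be separated from the tail by this approach, and the count of leading A residues says nothing about whether they are templated; that requires comparison with the plasmid sequence upstream of the ORF.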
In the case of 3′ RACE, 1 μg of total yeast RNA was polycytidinylated using Poly(A) Tailing Kit (Applied Biosystems) and 2 mM CTP (Thermo Scientific) for 90 min at 37°C.
This RNA can be subsequently 5′ capped by the K2ORF3p viral-like mRNA capping enzyme (ORF3p, orange). Transcription termination most likely proceeds in a factor-independent manner that involves intrinsic terminators consisting of RNA stem loop structure(s) and 3′ terminal U-tract.
Detailed description of all plasmids used in this study is listed in Supplementary Table S1.
UCR - sequence located between AUG initiation codon and UCS (including) of the selected ORF
* - UCR sequence bearing one point mutation in putative initiator region (INR)
** - UCR sequence bearing two point mutations in putative initiator region (INR)
° - 3′ UTR of K2ORF5 gene bearing mutations in putative Stem loop 2
°° - 3′ UTR of K2ORF5 gene bearing rescue mutations in putative Stem loop 2