TReSR: A PCR-compatible DNA sequence design method for engineering proteins containing tandem repeats

Protein tandem repeats (TRs) are motifs comprised of near-identical contiguous sequence duplications. They are found in approximately 14% of all proteins and are implicated in diverse biological functions facilitating both structured and disordered protein-protein and protein-DNA interactions. These functionalities make protein TR domains an attractive component for the modular design of protein constructs. However, the repetitive nature of DNA sequences encoding TR motifs complicates their synthesis and mutagenesis by traditional molecular biology workflows commonly employed by protein engineers and synthetic biologists. To address this challenge, we developed a computational protocol to significantly reduce the complementarity of DNA sequences encoding TRs called TReSR (for Tandem Repeat DNA Sequence Redesign). The utility of TReSR was demonstrated by constructing a novel constitutive repressor synthesized by duplicating the LacI DNA binding domain into a single-chain TR construct by assembly PCR. Repressor function was evaluated by expression of a fluorescent reporter delivered on a single plasmid encoding a three-component genetic circuit. The successful application of TReSR to construct a novel TR-containing repressor with a DNA sequence that is amenable to PCR-based construction and manipulation will enable the incorporation of a wide range of TR-containing proteins for protein engineering and synthetic biology applications.


Section 1. TReSR Computer Code and Documentation
TReSR Instructions. The TReSR package included with this manuscript contains the executable script TReSR.py, the input files that were used to make the scDBD sequence shown in Fig 2, and the output files generated by the calculation. All scripts were written for and executed using the Python 3.10 programming language (https://www.python.org/downloads/). The TReSR algorithm is executed in a series of five steps, as indicated in the comments included with the TReSR.py script (Script 1). Comments (red), built-in Python functions (purple), functions coded for TReSR (blue), characters and strings (green), and built-in expressions (orange) are annotated by colour for ease of interpretation. To generate a redesigned TR-encoding DNA sequence for a different target protein, the TReSR.py script must be modified as specified below, and the protocol executed according to these instructions:
Step 1. Specify the directory location (Line 6) and encode the single-letter amino acid sequence for your TR protein (1 copy), separated into segments 5-7 residues in length (Line 9). Ensure that each segment has a unique identification character or string (e.g., 'A', 'B', 'C', …) specified in Line 8. When TReSR is executed, the comma-delimited file 'segments.csv' is created, listing the set of all DNA sequence codon combinations encoding the amino acid sequence of each segment (Line 15).
Step 2.1. For each sequence in 'segments.csv', calculate the melting temperature (Tm) for each thermodynamic parameter using the two-state melting hybridization application of the UNAFold webserver (http://www.unafold.org/Dinamelt/applications/two-state-melting-hybridization.php): homodimerization of the forward sequence (TFF), homodimerization of the reverse complement sequence (TRR), hybridization of the forward and reverse complement sequences (TFR), and, if needed, hybridization of the forward sequence with the reverse complement of the WT sequence (TWT). If the wild-type sequence is to be excluded from the thermodynamic parameter analysis, enter a TWT value of 0.0 for all sequence entries. Add the Tm values to the file 'segments.csv' and save it as 'evaluated_segments.csv', ensuring that the top row includes the labels defining each thermodynamic parameter: T(FF), T(RR), T(FR), and T(WT).
Step 2.2. Reinitialize execution of TReSR to implement the percentile-based (Line 26) pruning of segment sequences and write the retained sequences to the file 'filtered_segments.csv'. The fraction of filtered sequences can be altered by adjusting the percentile rank passed to the function ComputePercentile (Line 26) or the thermodynamic parameter thresholds passed directly to the function FilterSegment (Line 31).
Step 3. Conduct a pair-wise comparison of sequence identity for each segment's set of filtered sequences and write the results to the file 'paired_segments.csv'. The results of this pair-wise comparison are then read back into TReSR to perform a grouping analysis assigning a group identification index number for each sequence (in square parentheses following the segment identifier assigned in Step 1) whose results are then written to the file 'grouped_segments.csv'. Grouping analysis identifies segment-specific sets of similar (same group index number) and dissimilar (different group index number) sequences by comparing the identity between sequence pairs in addition to the cosine similarity of their identities with respect to every other sequence in the segment. The degree of cluster aggregation can be adjusted by altering the parameter thresholds set in the GroupAnalysis function (Line 57).
Step 4.1. To reduce the size of the computational problem, the total number of filtered and grouped sequences listed in 'grouped_segments.csv' can be reduced by pruning the number of sequence groupings considered by TReSR (performed external to and independent of Script 1). For the purposes of this manuscript, TReSR was restricted to sequences belonging to four randomly selected groups for each segment, written to the file 'select_grouped_segments.csv'.
Step 4.2. Modify the list of paired adjacent segments in Line 71 of TReSR.py so that it specifies the adjacent pairs of segments for your sequence, following the format provided in the script (originally applied to the DNA sequence encoding the 89-residue TR gene from the 11 segments that had been defined in Step 1). Resume execution of TReSR to create a list of adjacent segment sequence pairs from the set of select grouped and filtered sequences compiled in the file 'select_grouped_segments.csv'. The list of adjacent segment sequence pairs is written to the file 'select_adjacent_segments.csv'.
Step 4.3. For each sequence listed in 'select_adjacent_segments.csv' calculate the Tm values for TFF, TRR, and TFR thermodynamic parameters using the UNAFold webserver as was done previously in Step 2.1. Add these Tm values to the file 'select_adjacent_segments.csv' along with their respective labels (e.g., 'T(FF)', 'T(RR)', and 'T(FR)') to the top row and save the file as 'evaluated_adjacent_grouped_segments.csv'.
Step 4.4. Reinitialize execution of TReSR to implement the pruning of the adjacent segment sequence pairs listed in the file 'evaluated_adjacent_grouped_segments.csv' using percentile-based thresholds for TFF and TRR (Line 86) and a fixed Tm value of 80.0 °C for TFR (Line 92) thermodynamic parameters. The fraction of filtered sequences can be altered by adjusting the percentile rank passed to the function ComputePercentile (Line 86) or the thermodynamic parameter thresholds passed directly to the function FilterSegment (Line 91). The list of filtered adjacent segment sequence pairs is then written to the file 'filtered_adjacent_grouped_segments.csv'.
Step 5.1. Generate a set of segment sequences suitable for encoding the TR gene by a depth-first search (Line 110). The search follows the path specified on Line 103, which describes the order of the segments and groups used to reassemble the full-length DNA sequence. The path is given as a nested list of adjacent segment pairs, each identified by its segment designation and group assignment (e.g., ['A [1]', 'B [2]'] to employ codon combinations belonging to segment A, group 1 and segment B, group 2). The list of segment paths solved by the depth-first search is then written to the file 'segment_paths.csv'. The DepthFirstSearch function includes settings (Line 111) specifying the target number of paths to assemble (goalCount) as well as the permitted number of unsuccessful attempts (failCount), which can be adjusted according to the difficulty of the computational problem.
Step 5.2. The TReSR algorithm is terminated with a final step compiling the set of segmented paths specified in 'segment_paths.csv' into assembled DNA sequences encoding the TR gene, which are then written to the file 'sequence_paths.csv'. The sequence output is organized according to the cassette (set of tandem repeat duplications specified as a dictionary, Line 120) and fragment (assembled set of contiguous segments specified as a dictionary, Line 122) formatting passed to the function CompilePaths (Line 130). This formatting assigns a tandem repeat identifier (e.g., 'N' or 'C' representing N-terminal and C-terminal domain duplications comprised of fragments 2 and 4, and 5 and 7, respectively) to the cassette dictionary (Line 120). The formatting also requires that the list of segments belonging to each fragment be specified in the fragments dictionary (Line 122) using the fragment nomenclature employed in the cassette dictionary (Line 120).

Script 1. TReSR.py (partial listing; only fragments of the original script are preserved)
cols= [4,5,6,7]
idxT_WT=7, delim=',')
39 WriteSegmentCodons(directory + 'filtered_segments.csv',filterSequences,\
40 'Segment(Codons),FWD,REV,FRQ,T(FF),T(RR),T(FR),T(WT)')
41 # Step 3| Generate all DNA sequence codon combination pairs for each protein
42 # Step 3| segment and evaluate their percent sequence identity, then assign
43 # Step 3| group identities according to their comparative percent identities
44 sequences = ReadSegments(directory + 'filtered_segments.csv', delim='(',\
45 name=
(directory + 'paired_segments.csv', pairs,\
52 'iSegment,jSegment,iFWD,iREV,jFWD,jREV,BP(n/N)')
53 pairs = ReadPairs(directory + 'paired_segments.csv',iName='iSegment',\
54 jName='jSegment', BP='BP(n/N)'

TReSR Technical Description. The TReSR algorithm is executed from a Python script (Script 1) over a series of five steps. The TReSR algorithm employs dictionary objects and a functional programming architecture to read, execute, and save data, producing comma-delimited output files (i.e., .csv files).
The TReSR algorithm employs 18 functions, whose code and technical descriptions are provided in the section (TReSR Functions) following this technical description of the TReSR script.
The first step (Lines 6 through 16) involves the generation of all codon combinations (sequences, executed by calling the function CreateCodonCombinations, Line 13) encoding a list of amino acids (protein, Line 9) belonging to a series of protein segments (segments, Line 8). These codon combinations are built using definitions provided via an input file (codons, the space-delimited input file 'codons.txt' is read into a dictionary using the function CodonTable, Line 7). Each codon combination is assigned a designation consisting of its segment name followed by an identification code, enclosed in curved parentheses, listing the index (an integer, 1 ≤ index ≤ 6) of the synonymous codon selected at each residue. For example, the designation A (1.2.3.4.1.6) indicates that the codon combination for segment A, encoding the protein sequence MKPVTL, is constructed from the 1st, 2nd, 3rd, 4th, 1st, and 6th codons listed for the respective residues in the codons dictionary. The codon combinations are then written to the file 'segments.csv' via the function WriteSegmentCodons (Line 15).
Initiation of the second step (Lines 20 through 40) first requires that thermodynamic parameters be calculated externally to the program, providing melting temperature (Tm) values for homodimerization and hybridization of the DNA sequences listed in 'segments.csv', including: 1. forward and forward (TFF), 2. reverse complement and reverse complement (TRR), 3. forward and reverse complement (TFR), and 4. forward and wild-type reverse complement (TWT). For the purposes of this study, Tm values were calculated using the DINAMelt two-state melting hybridization application made available through the UNAFold Web Server.
These thermodynamic parameters are incorporated into the 'segments.csv' file, amending the first row to include the identity of each parameter, T(FF), T(RR), T(FR), and T(WT), and the file is saved as 'evaluated_segments.csv'. This file is then read back into TReSR (sequences, executed by calling the function ReadSegments, Line 20). A threshold for each thermodynamic parameter is obtained by computing percentiles (TFF = 0.5, TRR = 0.5, TFR = 0.1, and TWT = 0.5) for each segment (parameters, using the function ComputePercentile, Line 26). These thermodynamic parameter thresholds are then used to prune codon combination sequences (filterSequences, using the function FilterSegment, Line 31), which are then written to the file 'filtered_segments.csv' using the function WriteSegmentCodons (Line 39).
The third step of the TReSR algorithm (Lines 44 through 63) generates pairs of codon combinations (pairs, by calling the function GenerateSegmentsPairs, Line 50) restricted to filtered sequences belonging to the same segment, which are read into TReSR (sequences, using the function call ReadSegments, Line 44) from the file 'filtered_segments.csv'. The function GenerateSegmentsPairs (Line 50) compares the sequence identity of each codon combination pair (reported as a fraction), and the results are written to the file 'paired_segments.csv' using the function WriteSegmentCodons (Line 51). This output is then read back into the program (pairs, using the function call ReadPairs, Line 53) for grouping analysis. Grouping analysis uses the percent sequence identities to define the edges of a network graph whose vertices represent individual codon combinations. This graph is used to group codon combinations belonging to the same segment according to their pair-wise similarity (groups, using the function call GroupAnalysis, Line 57). Codon combinations are considered to be similar (i.e., belong to the same group) if: 1. they share at least 0.80 sequence identity, and 2. they share a highly similar profile of percent sequence identities with the remaining codon combinations in the segment, as determined by cosine similarity (threshold set to 0.9975, function ComputeCosine is called within the function GroupAnalysis). Codon combination designations are then amended to include their group assignment, indicated as an integer enclosed in square parentheses. For example, the codon combination A [2] (1.2.3.4.1.6) belongs to the 2nd grouping of codon combinations for segment A. The results of this grouping analysis are then written to the file 'grouped_segments.csv' by calling the function WriteSegmentCodons (Line 62). Incorporating the group assignment in this manner allows for the rapid recall and identification of dissimilar codon combinations that encode the same amino acid sequence.
The fourth step of the TReSR algorithm is conducted in four parts (Lines 67 through 76 and Lines 80 through 96, for Steps 4.2 and 4.4, respectively). Step 4.1 reduces the total sequence space considered in the calculation by randomly selecting sequences from four groups from each segment, which are written to the file 'select_grouped_segments.csv'. Step 4.2 reads this reduced sequence space into TReSR (sequences, using the function call ReadSegments, Line 67) to compile a list of sequences assembled from pairs of adjacent segments (adjacents, using the function GenerateAdjacents, Line 73) for a set of user-specified segment pairs (adjacentMap, Line 71), which defines the order and assembly of the redesigned TR DNA sequence. This set of adjacent sequences is then written to the file 'select_adjacent_segments.csv' using the function WriteSegmentCodons (Line 75). Step 4.3 involves the external-to-program evaluation of thermodynamic parameters (TFF, TRR, and TFR) for the sequences listed in 'select_adjacent_segments.csv' using the UNAFold Web Server (as was performed in Step 2).
The list of adjacent codon combinations and their thermodynamic parameters is then saved to the file 'evaluated_adjacent_grouped_segments.csv' and read back into the program (sequences, executed by calling the function ReadAdjacents, Line 80) to initiate Step 4.4. This list is then pruned (filtered, implemented by calling the function FilterSegment, Line 91) employing thermodynamic parameter thresholds computed from percentiles for each adjacent segment pair (parameters, percentiles set at TFF = 0.2 and TRR = 0.2 using the function call ComputePercentile, Line 86) or set at a fixed value (parameters, TFR = 80.0 °C). This pruned list of grouped sequences from adjacent segment pairs is then written to the file 'filtered_adjacent_grouped_segments.csv' using the function WriteSegmentCodons (Line 95).
The fifth and final step of the TReSR algorithm (Lines 99 through 131) solves for adjacent segment sequence paths that assemble the full-length DNA sequence encoding the TR gene using the set of pruned and grouped adjacent sequence pairs listed in the file 'filtered_adjacent_grouped_segments.csv' (sequences, read into TReSR by calling the function ReadAdjacents, Line 99). Solution of adjacent segment sequence paths is guided by a user-specified list of segments that includes specific and unique grouping designations (groupPath, Line 103). This path search is conducted using a depth-first-search algorithm (paths, executed by calling the function DepthFirstSearch, Line 110) to generate a list of segments representing DNA sequences that encode duplicated amino acid sequences. The result is a list of paths assembled from dissimilar codon combinations (i.e., sequences belonging to unique segment groupings) that are then written to the output file 'segment_paths.csv' using the function WritePaths (Line 113).
The list of segment paths is then read back into TReSR using the function ReadPaths (Line 115) and transcribed into DNA sequences (sequences, executed by calling the function CompilePaths, Line 130) serving as candidate DNA templates for construction of the TR gene. The DNA sequences are then written to the file 'sequence_paths.csv' using the function WriteSequences (Line 131), with the output organized as a list of fragments (e.g., 'Fragment.2', 'Fragment.4', 'Fragment.5', and 'Fragment.7') conforming to user-specified definitions describing the identity (e.g., 'N' or 'C' for N-terminal or C-terminal domain, respectively) of duplicated cassettes (cassettes, Line 120) matching the names of their constituent segments (fragments, Line 122) as written in the file 'segment_paths.csv'.
TReSR Calculation Considerations. The complexity of design problems solved using the TReSR protocol is determined by multiple factors, including: the number of amino acid residues encoded for each protein segment, the number of codon combination pairs to be evaluated, the number of adjacent segments used to assemble the gene sequence, and the efficiency of the thresholds employed to eliminate non-productive codon combinations and adjacent segment sequence pairs. While it may initially appear advantageous to design TR DNA sequences by simply considering segments encompassing full-length domain sequences, the number of codon combinations grows rapidly with segment length. This number is calculated as the product of the number of degenerate codons encoding each residue in the segment, and it quickly becomes computationally prohibitive (e.g., there are 1,024 and 1,048,576 unique DNA sequence combinations built from the four degenerate codons encoding poly-alanine segments 5 and 10 residues in length, respectively).
The size of this problem is further compounded by the task of evaluating pairs of codon combinations prior to grouping analysis, with the number of compared pairs computed as n × (n − 1) ÷ 2, where n is the number of codon combinations (e.g., there are 523,776 and 549,755,289,600 paired sequences generated from unique codon combinations encoding poly-alanine segments 5 and 10 residues in length, respectively). To simplify the scope of the design problem, thresholds for the calculated thermodynamic parameters (TFF, TRR, TFR, and TWT) were applied to filter codon combinations based on their suitability for PCR. Importantly, while stringent thresholds reduce the computational bottleneck associated with evaluating and grouping pairs of codon combinations, severely selective thresholds may reduce the diversity of codon combinations, impacting the feasibility of identifying unique DNA sequence groupings. To circumvent this difficulty, we chose to implement percentile-based filtering thresholds, which ensured selection of PCR-compatible primers, while simultaneously restricting the adjacent segment path search space by randomly selecting four groups of unique codon combinations. Because it is difficult to anticipate the parameters and scope of the codon design space for future applications, the computer code and documentation for the TReSR protocol have been included, allowing users to independently select which parameters and protocols best suit their calculation needs.
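The two counts quoted above can be checked with a few lines of Python. This is only a worked-arithmetic sketch; the helper names count_codon_combinations and count_pairs are hypothetical and are not part of TReSR.py.

```python
from math import prod

def count_codon_combinations(degeneracies):
    """Product of the number of synonymous codons available at each residue."""
    return prod(degeneracies)

def count_pairs(n):
    """Number of unordered pairs among n codon combinations: n(n - 1)/2."""
    return n * (n - 1) // 2

# Alanine has four synonymous codons, so a poly-alanine segment of
# length L has 4**L codon combinations.
ala5 = count_codon_combinations([4] * 5)
ala10 = count_codon_combinations([4] * 10)
print(ala5, count_pairs(ala5))    # 1024 523776
print(ala10, count_pairs(ala10))  # 1048576 549755289600
```

Doubling the segment length squares the number of codon combinations, which is why the protocol works on short segments rather than on full-length domains.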
TReSR Functions. This section provides computer code and technical descriptions for the 18 programmed functions employed by the executable script TReSR.py. Function names (blue), input arguments and keywords (red), output variables (red), built-in Python functions (purple), characters and strings (green), and built-in expressions (orange) are annotated by colour for ease of interpretation.

Function 1. Technical description:
The function CodonTable reads a user-generated input text file (with location and file name specified via the argument inputFile) containing the codon information used to generate codon combinations for all protein segments. The input text file (inputFile) is organized with a unique codon on each line, with in-line information separated by a delimiter (specified via the keyword delim) and organized in columns in the following order: the amino acid character code (aa, a single-character or multiple-character string), the codon index number (no, an integer: 1 ≤ no ≤ number of degenerate codons), the codon's forward DNA sequence (fwd, three characters from {'A', 'C', 'G', 'T'} organized 5′ to 3′ in the forward sequence direction), the codon's reverse complement DNA sequence (rev, three characters from {'A', 'C', 'G', 'T'} organized 5′ to 3′ in the reverse complement direction), and the codon's frequency (frq, a floating point number: 0 ≤ frq ≤ 1). The information is compiled into a set of nested dictionary objects (Line 9, codons) such that each codon is referenced using three dictionary keys: its amino acid character code (aa), its codon number (no), and one of the keys 'FWD', 'REV', or 'FRQ' to retrieve the codon's forward sequence (fwd), reverse complement sequence (rev), or codon frequency (frq), respectively. The function returns this compiled set of nested dictionary objects (codons).
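The nested-dictionary layout described above can be illustrated with a minimal sketch. It parses lines held in memory rather than a file, the name parse_codon_table is hypothetical, and the frequency values in the example are illustrative placeholders, not data from the manuscript.

```python
def parse_codon_table(lines, delim=" "):
    """Build codons[aa][no] = {'FWD': ..., 'REV': ..., 'FRQ': ...} from delimited lines."""
    codons = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        aa, no, fwd, rev, frq = line.split(delim)
        codons.setdefault(aa, {})[int(no)] = {
            "FWD": fwd,          # forward codon, 5' to 3'
            "REV": rev,          # reverse complement, 5' to 3'
            "FRQ": float(frq),   # codon frequency between 0 and 1
        }
    return codons

# Illustrative input: column order is aa, no, fwd, rev, frq.
example_lines = [
    "M 1 ATG CAT 1.00",
    "K 1 AAA TTT 0.70",
    "K 2 AAG CTT 0.30",
]
table = parse_codon_table(example_lines)
print(table["K"][2]["FWD"], table["K"][2]["REV"])  # AAG CTT
```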

Function 2. Technical description:
The function CompilePaths transcribes and returns sequences in a nested dictionary (seqdict, keyed using the path index string and fragment name) from a set of segments (paths, passed to the function as a dictionary and keyed using the path index integer) for segment paths which assemble to produce the full-length DNA template encoding the TR gene computed by the TReSR algorithm. CompilePaths utilizes a dictionary of segment sequences passed to the function as an argument (segments, keyed using the segment name delivering a nested list of sequences for that segment). The function keywords segIndex and seqIndex identify the index of list elements corresponding to the sequence identification (e.g., A [2] (1.2.3.4.1.6), representing the sequence encoding segment A constructed from the set of 1st, 2nd, 3rd, 4th, 1st, and 6th codon combinations) and forward DNA sequence for each list element in the dictionary segments, respectively. CompilePaths operates by iteratively (Line 4) extracting the segment names (nameSeg, Lines 9 and 10) from each path in paths. These segment names are iteratively (Line 6) matched (Lines 13 through 17) to the segment names in fragments (a dictionary passed to this function as an argument), organized by the dictionary cassettes (a dictionary passed to this function as an argument). Upon identification of a matched segment, CompilePaths then appends the forward sequence (seq, Line 18) to the string of sequences (ss, Line 21) encoding the current fragment (frg, Line 6) following the removal (Lines 19 and 20) of any characters belonging to the list rmv (a function keyword). The completed sequence for each fragment is then written to the output nested dictionary seqdict (keyed using the path and fragment names, Line 22) and returned by the function.

Function 3. ComputeCosine (partial listing, Lines 7 through 12)
7     iiCos = iiCos + (pow(i, 2))
8     jjCos = jjCos + (pow(j, 2))
9 iiCos = sqrt(iiCos)
10 jjCos = sqrt(jjCos)
11 ijCos = ijCos / (iiCos * jjCos)
12 return(ijCos)

Function 3. Technical description:
The function ComputeCosine calculates and returns the cosine similarity (ijCos, a floating point number: −1.0 ≤ ijCos ≤ +1.0) between two equal-length (paired) input vectors (iVec and jVec, each passed to the function as arguments organized as a list of floating point numbers). The calculation iteratively (Lines 5 through 8) evaluates the dot product between the input vectors (Line 6) and their respective magnitudes (iVec: Lines 7 and 9, jVec: Lines 8 and 10). The cosine similarity (ijCos) is then calculated (Line 11) by normalizing the dot product by the product of each vector's magnitude: $\cos\theta = \dfrac{\mathbf{i}\cdot\mathbf{j}}{\lVert\mathbf{i}\rVert\,\lVert\mathbf{j}\rVert}$, where $\mathbf{i}$ and $\mathbf{j}$ denote iVec and jVec. Two vectors pointing in the same direction are reported as having a cosine similarity value of one (cos θ = 1), while two completely dissimilar (orthogonal) vectors have a cosine similarity value of zero (cos θ = 0), and two completely opposite vectors have a cosine similarity value of negative one (cos θ = −1).
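A compact sketch of this calculation is given below. It follows the accumulate-then-normalize structure described above, but the function name compute_cosine and the choice to raise an error on a zero-magnitude vector are assumptions, since the original listing is only partially preserved.

```python
from math import sqrt

def compute_cosine(i_vec, j_vec):
    """Cosine similarity between two equal-length lists of numbers."""
    ij = ii = jj = 0.0
    for i, j in zip(i_vec, j_vec):
        ij += i * j    # dot product
        ii += i ** 2   # squared magnitude of i_vec
        jj += j ** 2   # squared magnitude of j_vec
    if ii == 0.0 or jj == 0.0:
        # Assumption: cosine similarity is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return ij / (sqrt(ii) * sqrt(jj))

print(compute_cosine([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors: 0.0
print(compute_cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite vectors: -1.0
```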

Function 4. Technical description:
The function ComputePercentile calculates and returns a dictionary of threshold values (thresholds, Lines 14 and 15) for a corresponding list of user-specified percentiles (prcs, set as list of floating point numbers passed to the function as a keyword) across data sets belonging to a dictionary of codon combinations (codons, passed to the function as an argument). Each dictionary entry in codons is keyed by the codon combination designation (i.e., a string of characters indicating the segment name in square parentheses followed by the codon combination in curved parentheses) that includes a list of parameters, whose identities (headers, list of strings) and list element indices (cols, list of integers) are specified via function keywords. The function compiles a dictionary of parameter lists (values, Lines 7 through 9) and sorts (sorted, Lines 10 through 13) these lists from least to greatest (reverse is False) or greatest to least (reverse is True) based on a list of Boolean elements (reverse, passed to the function as a keyword) having the same organization as the function keywords headers, cols, and prcs.
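The sort-then-index logic can be sketched as follows. The source does not state how a percentile is mapped onto a list position, so the nearest-rank index used here (int(prc × N), capped at the last element) is an assumption, as are the name percentile_thresholds and the example Tm values.

```python
def percentile_thresholds(rows, headers, cols, prcs, reverse):
    """Return {header: threshold} from one percentile per parameter column."""
    thresholds = {}
    for header, col, prc, rev in zip(headers, cols, prcs, reverse):
        # Sort least-to-greatest (rev False) or greatest-to-least (rev True).
        values = sorted((row[col] for row in rows), reverse=rev)
        # Assumed nearest-rank rule; the original rule is not documented here.
        index = min(int(prc * len(values)), len(values) - 1)
        thresholds[header] = values[index]
    return thresholds

# Illustrative rows: [name, T(FF), T(FR)] with made-up Tm values in deg C.
rows = [["A (1)", 50.0, 70.0], ["A (2)", 40.0, 72.0], ["A (3)", 30.0, 74.0],
        ["A (4)", 20.0, 76.0], ["A (5)", 10.0, 78.0]]
limits = percentile_thresholds(rows, ["T(FF)", "T(FR)"], [1, 2],
                               [0.5, 0.1], [False, True])
print(limits)  # {'T(FF)': 30.0, 'T(FR)': 78.0}
```

Sorting in descending order for a parameter that should be maximized (here T(FR)) lets the same percentile rule serve both "keep the lowest" and "keep the highest" cases.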

Function 5. Technical description:
The function CreateCodonCombinations generates all codon combinations (codons, a dictionary of DNA codons passed to this function as an argument) for a user specified amino acid sequence (sequence, a string composed of characters matching keys belonging to the codons dictionary). The function names each codon combination beginning with the segment designation (segment, an identifying character or string unique to the segment and passed to this function as an argument) and includes the integer index of the codons used to assemble the sequence, flanked by the characters idStart and idEnd (set as function keywords). The function returns a comma-delimited list of codon combinations (output) which includes the name, forward DNA sequence (terminated with a semicolon), reverse complement DNA sequence (terminated with a semi-colon), and average codon frequency for the assembled sequence (Lines 18 and 19).
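The enumeration can be sketched with itertools.product. This is a simplified illustration, not the original function: it returns tuples rather than delimited strings, the name create_codon_combinations is hypothetical, and the small codon table and its frequencies are illustrative placeholders.

```python
from itertools import product

def create_codon_combinations(segment, sequence, codons, id_start="(", id_end=")"):
    """Enumerate every codon combination encoding `sequence`."""
    choices = [sorted(codons[aa]) for aa in sequence]  # codon indices per residue
    output = []
    for combo in product(*choices):
        picked = [codons[aa][no] for aa, no in zip(sequence, combo)]
        fwd = "".join(c["FWD"] for c in picked)
        # The reverse complement of the joined sequence is the codon reverse
        # complements concatenated in reverse order.
        rev = "".join(c["REV"] for c in reversed(picked))
        frq = sum(c["FRQ"] for c in picked) / len(picked)  # average frequency
        name = f"{segment} {id_start}{'.'.join(map(str, combo))}{id_end}"
        output.append((name, fwd, rev, frq))
    return output

# Illustrative codon table (frequencies are placeholders).
codons = {
    "M": {1: {"FWD": "ATG", "REV": "CAT", "FRQ": 1.0}},
    "K": {1: {"FWD": "AAA", "REV": "TTT", "FRQ": 0.7},
          2: {"FWD": "AAG", "REV": "CTT", "FRQ": 0.3}},
}
combos = create_codon_combinations("A", "MK", codons)
for entry in combos:
    print(entry[0], entry[1], entry[2])
# A (1.1) ATGAAA TTTCAT
# A (1.2) ATGAAG CTTCAT
```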

Function 6. Technical description:
The function DepthFirstSearch solves for and returns a list of paths of adjacent segment codon combinations (adjacents, a dictionary passed to the function as an argument) which are assembled to form a larger sequence via a user-defined order of segments belonging to specific groupings (groupPath, a nested list of grouped-segment pairs passed to the function as an argument). For example, if groupPath is passed to the function as: 1]', 'C [2]']], the search function will be constrained to codon combinations that assemble a path which has the structure: [2], where the character specifies the segment and the number in square parentheses indicates the group. The function conducts this search by first assembling the set of all adjacent codon combinations for the current iteration (curAdj, Lines 10 through 32). If this is the first iteration of the search function (Lines 26 through 32) or if the current iteration visits a segment which was not present in the previous iteration (Lines 19 through 25) then the search function collects any adjacent codon combination satisfying the pair of segment and group constraints. However, if the search function must continue to assemble a sequence path through a contiguous set of segments (Lines 11 through 18) then the function must compile adjacent segment pairs which begin with the identical codon combination terminating the path from the previous iteration. If no adjacent segment codon combinations can be found (Line 33), then a failure counter is updated (fails, Line 34) and the terminal codon combination from the previous iteration is stored in a list (rejects, Lines 35 and 36) to be avoided in future solution paths. If the number of failed paths encountered (fails) exceeds the maximum allotment (failCount, Line 37) then the function terminates returning the set of all solutions (paths, Line 38) compiled to that instance. 
Should the search algorithm remain within the maximum allotment of failed paths encountered, the function recursively calls itself (Lines 39 through 46) passing the updated list of rejected codon combinations to be avoided (rejects). Conversely, if the search function finds one or more adjacent codon combinations satisfying the segment and group constraints specified in groupPath for the current iteration, the function will randomly select an adjacent codon combination from the compiled list (curAdj, Line 48) to append it to the end of the path and continue on to the next iteration. Should a full-length path be assembled, the current path solution is appended onto a list of solutions (paths, Line 50). If the number of solved paths (goals) exceeds the user specified amount (goalCount) the function terminates returning the list of solved paths (Lines 51 and 52). However, if more paths may be searched, the function recursively calls itself updating the list of solved paths and the goal counter (Lines 54 through 59).
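The core idea, extending a path only with adjacent pairs that begin with the codon combination ending the current path, can be sketched with a small backtracking search. This is a deliberately simplified stand-in: it assumes every step of groupPath is contiguous with the previous one, explores candidates in listed order instead of selecting them at random, and omits the failCount and rejects bookkeeping. The name depth_first_paths and the example data are hypothetical.

```python
def depth_first_paths(adjacents, group_path, goal_count=1):
    """adjacents maps (left_group, right_group) -> list of (left_name, right_name)."""
    paths = []

    def extend(step, path):
        if len(paths) >= goal_count:
            return  # enough solutions collected
        if step == len(group_path):
            paths.append(list(path))  # full-length path assembled
            return
        for left, right in adjacents.get(tuple(group_path[step]), []):
            # Contiguity: the pair must start with the combination ending the path.
            if path and left != path[-1]:
                continue
            added = [right] if path else [left, right]
            path.extend(added)
            extend(step + 1, path)
            del path[-len(added):]  # backtrack

    extend(0, [])
    return paths

adjacents = {
    ("A [1]", "B [2]"): [("A [1] (1.1)", "B [2] (2.1)"),
                         ("A [1] (1.2)", "B [2] (2.2)")],
    ("B [2]", "C [1]"): [("B [2] (2.2)", "C [1] (1.3)")],
}
group_path = [["A [1]", "B [2]"], ["B [2]", "C [1]"]]
solutions = depth_first_paths(adjacents, group_path)
print(solutions)
# [['A [1] (1.2)', 'B [2] (2.2)', 'C [1] (1.3)']]
```

In the example the first candidate for the A/B step is a dead end, because no B/C pair starts with 'B [2] (2.1)', so the search backtracks and succeeds with the second candidate.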

Function 7. FilterSegment
for codon in segment:

Function 7. Technical description:
The function FilterSegment prunes a list of codon combinations passed to the function as an argument (segment, a nested list of lists with each entry containing a unique codon combination and its set of thermodynamic parameter values) based on a set of user-specified thresholds. These thresholds are passed to the function as keywords, are set as melting temperatures (Tm, °C), and include: the maximum Tm of homodimerization for the forward sequence (T_FF), the maximum Tm of homodimerization for the reverse complement sequence (T_RR), the minimum Tm of heterodimerization between the forward and reverse complement sequences (T_FR), and the maximum Tm of hybridization between the forward sequence and the reverse complement sequence of the wild-type gene (T_WT). The function accesses each thermodynamic parameter value using an index provided by a set of keywords: the index for the T_FF parameter (idxT_FF), the index for the T_RR parameter (idxT_RR), the index for the T_FR parameter (idxT_FR), and the index for the T_WT parameter (idxT_WT). The function performs a comparison for the parameters T_FF, T_RR, and T_FR (Lines 7 through 9) if the keyword ADJ is True, and for T_FF, T_RR, T_FR, and T_WT (Lines 12 through 15) if the keyword ADJ is False. The function returns the pruned list of codon combinations (output) with each entry included as a text string, delimited using a character keyword (delim, Lines 17 through 19), having the same organization as the list of codon combinations in the input argument segment.
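The threshold comparisons can be sketched as below. Whether the original comparisons are inclusive or strict is not stated, so the inclusive form used here is an assumption; the name filter_segment is hypothetical, the Tm values are made up, and rows are returned as lists rather than delimited strings for brevity. The default column indices 4 through 7 follow the cols list visible in the preserved fragment of Script 1.

```python
def filter_segment(rows, t_ff, t_rr, t_fr, t_wt=None,
                   idx_ff=4, idx_rr=5, idx_fr=6, idx_wt=7, adj=False):
    """Keep rows whose Tm values satisfy the maximum/minimum thresholds."""
    kept = []
    for row in rows:
        ok = (row[idx_ff] <= t_ff        # forward homodimer: at most t_ff
              and row[idx_rr] <= t_rr    # reverse homodimer: at most t_rr
              and row[idx_fr] >= t_fr)   # forward/reverse duplex: at least t_fr
        if not adj:
            # The wild-type cross-hybridization check applies only when adj is False.
            ok = ok and row[idx_wt] <= t_wt
        if ok:
            kept.append(row)
    return kept

# Illustrative rows: [name, FWD, REV, FRQ, T(FF), T(RR), T(FR), T(WT)].
rows = [
    ["A (1.1)", "ATGAAA", "TTTCAT", 0.85, 10.0, 12.0, 45.0, 20.0],
    ["A (1.2)", "ATGAAG", "CTTCAT", 0.65, 30.0, 12.0, 47.0, 20.0],
    ["A (1.3)", "ATGAAA", "TTTCAT", 0.85, 10.0, 12.0, 30.0, 20.0],
]
kept = filter_segment(rows, t_ff=20.0, t_rr=20.0, t_fr=40.0, t_wt=25.0)
print([row[0] for row in kept])  # ['A (1.1)']
```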

Function 8. Technical description:
The function GenerateAdjacents assembles the complete list of codon combination pairs (output) from a dictionary (segments, passed to the function as an argument) listing codon combinations keyed by their respective segment names. Pairs are generated (Lines 6 through 11) only for the segment pairs specified by the user as an argument (adjmap, a nested list of lists containing pairs of segment names corresponding to the keys in the segments dictionary). The compiled list of adjacent codon combinations (output) is returned as a list of strings, with each entry containing the codon pair name (a contraction of the names of the two codon combinations joined by the delimiter specified by the keyword adjDelim, Line 8), the forward sequence, and the reverse complement sequence, delimited according to the keyword strDelim. Additional function keywords (idxName, idxFwd, and idxRev) indicate the index of the name, forward sequence, and reverse complement within each entry of the lists stored in segments.
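A minimal sketch of the pairing step is given below. It assumes that the forward sequence of an adjacent pair is the concatenation of the two forward sequences, so that its reverse complement is the second segment's reverse complement followed by the first's; the source does not state this explicitly. The function name generate_adjacents, the default delimiters, and the example sequences are illustrative only.

```python
from itertools import product

def generate_adjacents(segments, adjmap, idxName=0, idxFwd=1, idxRev=2,
                       adjDelim="-", strDelim=","):
    """Pair every codon combination of one segment with every one of its neighbour."""
    output = []
    for left, right in adjmap:
        for a, b in product(segments[left], segments[right]):
            name = a[idxName] + adjDelim + b[idxName]
            fwd = a[idxFwd] + b[idxFwd]
            # Reverse complement of a concatenation: second part first (assumption above).
            rev = b[idxRev] + a[idxRev]
            output.append(strDelim.join([name, fwd, rev]))
    return output

# Made-up segments: name, forward sequence, reverse complement
EXAMPLE_SEGMENTS = {
    "A": [["A1", "ATGAAA", "TTTCAT"]],
    "B": [["B1", "CCGGTT", "AACCGG"], ["B2", "CCTGTG", "CACAGG"]],
}
print(generate_adjacents(EXAMPLE_SEGMENTS, [["A", "B"]]))
```

Only the segment pairs listed in adjmap are expanded, which keeps the output limited to segments that are actually neighbours in the repeat.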

Function 9. GenerateSegmentPairs
for i in range(len(codons) + 1):
    for j in range(i + 1, len(codons)

Function 9. Technical description:
The function GenerateSegmentPairs compiles and returns a list of paired (Lines 4 and 5) codon combinations (output) organized as a list of delimited strings (strDelim), where each element includes the names of the paired codon combinations (idxName) and their forward (idxFwd) and reverse complement (idxRev) sequences, in addition to a comparison of their percent identity (Lines 8 through 11). The function operates on codon combinations that belong to the same segment, passed to the function as an argument (codons, a list of codon combinations belonging to the same segment).
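The pairwise comparison can be sketched as below, assuming that percent identity is the fraction of matching positions between two equal-length forward sequences (codon combinations for one segment encode the same residues and so share a length). The helper percent_identity, the output column order, and the example sequences are assumptions for illustration.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

def generate_segment_pairs(codons, idxName=0, idxFwd=1, idxRev=2, strDelim=","):
    """Compare every unordered pair of codon combinations within one segment."""
    output = []
    for a, b in combinations(codons, 2):  # each pair (i < j) is visited once
        identity = percent_identity(a[idxFwd], b[idxFwd])
        output.append(strDelim.join([
            a[idxName], b[idxName],
            a[idxFwd], b[idxFwd],
            a[idxRev], b[idxRev],
            f"{identity:.1f}",
        ]))
    return output

# Made-up codon combinations: name, forward sequence, reverse complement
EXAMPLE_CODONS = [
    ["A1", "GCTAAA", "TTTAGC"],
    ["A2", "GCCAAG", "CTTGGC"],
    ["A3", "GCAAAA", "TTTTGC"],
]
for line in generate_segment_pairs(EXAMPLE_CODONS):
    print(line)
```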

Function 10. Technical description:
The function GroupAnalysis groups a list of codon combinations (segments, passed to this function as an argument) belonging to a specific protein segment (name identified by the argument key) by comparing the percent sequence identity between pairs of codon combinations (pairs, passed to this function as an argument). The function begins by compiling a dictionary (BP, Lines 4 through 6) to facilitate the rapid look-up of percent sequence identity values, by extracting the metric (ijBP, keyword indicates the nested index location in the pairs list) for the first (iCmp, keyword indicates the nested index location for the codon identifier in the pairs list) and second (jCmp, keyword indicates the nested index location for the codon identifier in the pairs list) codon combinations. The first codon combination in segments is automatically assigned to the first grouping (Line 8) while all successive codon combinations are subjected to the grouping analysis (Lines 9 through 29). This grouping analysis is conducted in two steps, with the first step identifying pairs of codon combinations which meet or exceed a threshold (set as the keyword prct) for percent sequence identity (Line 14). This is followed by a second step where the cosine similarity (function ComputeCosine, Line 21) is computed for the pair of codon combinations, using a pair of vectors generated by retrieving their percent sequence identity with the remaining codon combinations (vecName, Lines 16 through 18) in the segment (vectors iBP and jBP, Lines 19 through 21). If the compared pair of codon combinations share similar percent sequence identity values with the remaining codon combinations in the segment (specified by the threshold set as the cos keyword for this function) then the pair of codon combinations are considered to belong to the same group (Lines 28 through 29).
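The two-step test (percent identity threshold, then cosine similarity of identity profiles) can be illustrated with the simplified sketch below. It is not the GroupAnalysis code itself: compute_cosine stands in for ComputeCosine, the look-up is keyed by unordered name pairs, group identifiers are plain integers, and the threshold defaults and example identities are invented.

```python
import math

def compute_cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_codons(names, BP, prct=80.0, cos=0.99):
    """Assign a group number to each codon combination (simplified sketch)."""
    groups = {names[0]: 0}  # the first entry seeds the first group
    for cur in names[1:]:
        assigned = None
        for other, gid in groups.items():
            # Step 1: the pair must meet the percent identity threshold.
            if BP[frozenset((cur, other))] < prct:
                continue
            # Step 2: compare identity profiles against the remaining entries.
            rest = [n for n in names if n not in (cur, other)]
            iBP = [BP[frozenset((cur, n))] for n in rest]
            jBP = [BP[frozenset((other, n))] for n in rest]
            if compute_cosine(iBP, jBP) >= cos:
                assigned = gid
                break
        # No compatible partner found: open a new group.
        groups[cur] = assigned if assigned is not None else max(groups.values()) + 1
    return groups

# Made-up percent identities for four codon combinations
EXAMPLE_BP = {
    frozenset(("a", "b")): 90.0, frozenset(("c", "d")): 90.0,
    frozenset(("a", "c")): 50.0, frozenset(("a", "d")): 50.0,
    frozenset(("b", "c")): 50.0, frozenset(("b", "d")): 50.0,
}
print(group_codons(["a", "b", "c", "d"], EXAMPLE_BP))
```

In this toy data set a and b fall into one group and c and d into another, because each pair is both highly identical and related to the other entries in the same way.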
If the codon combination pair is considered to be distinct (i.e., different groups) then the comparison is completed for the remaining codon combinations in the segment. If no grouping can be assigned to the codon combination, then the codon combination is assigned to its own unique group (Lines 25 through 27). This process is iterated until all codon combinations belonging to segments have been assigned a group identifier delimited between start (groupStart) and end (groupEnd) characters and formatted as a list (groupList) of these assignments. The function then inserts the corresponding group designation into each entry of segments, using the character string specified via the keyword codonDelim to identify the point of insertion, and returns a list of these concatenated comma-delimited entries (output).

Function 11. Technical description:
The function ReadAdjacents extracts and returns a dictionary of adjacent codon combinations (segments, dictionary containing adjacent codon combination lists keyed by their appended adjacent segment names, Lines 26 through 29) belonging to multiple segments from a .csv file (inputFile, detailing the file name and location passed to this function as an argument). This function utilizes keywords (name, fwd, rev, T_FF, T_RR, and T_FR for adjacent codon combination identifier, forward DNA sequence, reverse complement DNA sequence, melting temperature for forward sequence homodimerization, melting temperature for reverse complement sequence homodimerization, and melting temperature for heterodimerization of the forward and reverse complement sequences, respectively) identical to the column headers in the first line of inputFile to index (Lines 7 through 13) and extract the corresponding information (Lines 15 through 25).

Function 12. Technical description:
The function ReadPairs extracts and returns a dictionary of paired codon combinations (pairs, dictionary containing paired codon combination lists keyed by their segment name, Lines 18 through 21) belonging to multiple segments from a .csv file (inputFile, detailing the file name and location passed to this function as an argument). This function utilizes keywords (iName, jName, and BP for the codon combination identifier of the first segment and second segment in addition to the percent sequence identity, respectively) identical to the column headers in the first line of inputFile to index (Lines 7 through 9) and extract the corresponding information (Lines 11 through 17).

Function 13. Technical description:
The function ReadPaths extracts and returns a dictionary containing segment sequences that reconstitute segment paths that assemble the DNA template for the TR gene (paths, triple-nested set of dictionary objects keyed by path, segment, and cassette identifiers, Lines 17 and 18) from an input .csv file (inputFile, detailing the file name and location passed to this function as an argument). This function employs a list of strings defining the names of TR cassettes (cassettes, passed to the function as an argument) whose elements are duplicated for every TR segment (casslist, Lines 7 through 11). This function utilizes the keywords delimLine and delimEntry to specify how file contents are interpreted, defining the characters that represent new lines (e.g., '\n') and the characters that delimit column entries (e.g., ','). Path index (element, Line 13) in addition to segments (key), cassettes (cas), and their sequences (val) are iteratively extracted (Line 16) and reorganized for compilation into the output dictionary paths (Line 18).

Function 14. Technical description:
The function ReadSegments extracts and returns a dictionary of codon combinations (segments, dictionary containing codon combination lists keyed by their segment name, Lines 30 through 33) belonging to multiple segments from a .csv file (inputFile, detailing the file name and location passed to this function as an argument). This function utilizes keywords (name, FRQ, fwd, rev, T_FF, T_RR, T_FR, and T_WT for codon combination identifier, average codon frequency, forward DNA sequence, reverse complement DNA sequence, melting temperature for forward sequence homodimerization, melting temperature for reverse complement sequence homodimerization, melting temperature for heterodimerization of the forward and reverse complement sequences, and melting temperature of the codon combination with the reverse complement sequence belonging to the wild-type sequence, respectively) identical to the column headers in the first line of inputFile to index (Lines 7 through 15) and extract the corresponding information (Lines 17 through 29).

Function 17. Technical description:
The function WritePaths creates a file (outputFile, a character string specifying the location and name of an output file passed to this function as an argument, Line 2) and writes the set of paths (a list of character strings passed to this function as an argument, Lines 8 through 11) to the file. The file has a specific comma-delimited organization, with the index of each path followed by the segment names, group identifiers (delimGroup, a keyword indicating the character string used to identify the group designation), and codon combinations for each segment in the path (delimPair, a keyword indicating the character string used to distinguish successive segments belonging to the same path), as specified by the header written in the first line of the file (Lines 3 through 6).

Function 18. Technical description:
The function WriteSequences creates a file (outputFile, a character string specifying the location and name of an output file passed to this function as an argument, Line 2) and iteratively (Lines 7 through 10) writes the DNA sequence content from a nested dictionary (outputData, passed to this function as an argument and keyed by path and fragment names) to the file.

Section 2. TReSR Results, Thermodynamic Parameters for PCR Primers, and Synthesis of scDBD Constructs
Organization of the Supplementary Information for TReSR Implementation. The main text discussion of TReSR algorithm implementation (i.e., Implementation of the TReSR Protocol) is supported with supplementary results included in this section. Specifically, a summary of the TReSR output file data is provided in supplementary tables and figures, detailing: 1. the values for thermodynamic parameter thresholds employed to prune sequences from each segment and the remaining number of filtered sequences (S1 Table), 2. the number of segment group sets identified by grouping analysis with the group identity and sequence count for selected groups used in the remainder of the calculation (S1 Table), 3. the values for thermodynamic parameter thresholds employed to prune sequences from paired adjacent segments and the remaining number of filtered sequences (S2 Table), 4. the thermodynamic parameters for solution sequences designed by TReSR for each segment (S3 Table), and 5. the melting temperatures for hybridization between primers designed by TReSR (S1 Fig).

S3 Fig. Agarose gel electrophoresis of PCR products used in synthesis of scDBD constructs.
PCR products listed by expected (Exp) and observed (Obs) length examined by 2% (w/v) agarose gel electrophoresis against Invitrogen 1 Kb Plus DNA ladder. The synthesis of single-chain tandem repeat repressor constructs was conducted in three steps, starting first with synthesis and gel purification of gene fragments (A) encoding the promoter (A1) and template sequence unaltered fragments (A4 and A5) by PCR amplification using the plasmid pDBD(eGFP)•pGFP(dGFP)•pLacI(LacIW220F). Synthesis of these fragments was accompanied by the assembly PCR and gel purification of gene fragments using primers designed by the TReSR protocol producing Fragments 2 (A2 and A3) and 5 (A8 and A9) encoding LacI residues 1 through 30, delivering the DFT (D17/F18/T22) and IAN (I17/A18/N22) triple-mutations, respectively, and Fragments 3 (A4) and 7 (A7) encoding LacI residues 60 through 89. Cassettes (B) encoding N-terminal (B1 and B2) and C-terminal (B3 and B4) duplicated DNA binding domains were constructed in a second step by SOE PCR using the corresponding gene fragments from the first step. The third step concludes the synthesis with SOE of DNA binding domain cassettes producing full-length single-chain tandem repeat repressor constructs (

Section 3. Engineering and Optimization of the three-Component Genetic Circuit
Architectures and Performance Requirements for Genetic Circuit Promoters. For a genetic circuit to reliably measure the ability of an engineered protein to repress transcription, expression of this repressor protein must be tightly regulated such that differences between maximum and minimum output signals are reproducible, significant, and directly dependent on the controllable expression and function of the experimental repressor. This requires that the promoter controlling expression of the engineered repressor (pDBD) and the regulatory component that binds this promoter (LacI) be tuned to minimize expression from pDBD in the absence of inducer (i.e., leaky expression) while simultaneously maximizing output expression upon induction, thereby increasing the dynamic range between off (i.e., absence of inducer) and on (i.e., presence of inducer) states. Maximum pDBD expression levels for the experimental repressor (scDBD) should ideally saturate the promoter (pGFP) responsible for expression of the reporting gene (eGFP) [1]. Accordingly, a three-component genetic circuit was designed (Fig 3A) with promoters driving scDBD (Cloning Site I: pDBD) and eGFP (Cloning Site II: pGFP) expression incorporating two copies of the lacO sym and lacO tta operator sequence, respectively [2]. These lacO sym and lacO tta operators were inserted at core and proximal positions relative to the RNA polymerase recruitment site (Fig 3B) to maximize repressor occupancy and reduce leaky expression [3,4].

Reporting Strategy for Quantifying Genetic Circuit Outputs. To evaluate the minimum and maximum expression output from Cloning Site I we constructed a genetic circuit outfitting pDBD with a functional copy of eGFP while simultaneously pairing the pGFP promoter of Cloning Site II with a decoy reporting protein (called dGFP) encoded by a copy of eGFP with the point-mutation R96A [5].
The dGFP gene product exhibits delayed maturation of the chromophore (t½ ≈ 3 months), resulting in a gene product that is expected to burden the host organism comparably without producing a quantifiable green fluorescence signal [6]. The use of dGFP in this genetic circuit setup, annotated in S5 Fig, allowed us to employ a reporting strategy quantifying output signal from Cloning Site I while accounting for metabolic burden imposed by protein expression from Cloning Site II. The output signal (F) for any given genetic circuit is quantified as the culture fluorescence (GFP, λex = 485 nm, λem = 510 nm, gain = medium, 30 flashes per read) normalized by culture density, as measured by optical density (OD, λOD = 600 nm) (Equation S1).

$$F = \frac{\mathrm{GFP}}{\mathrm{OD}}$$

Equation S1
Overview of Experiments Used to Tune Expression Performance of pDBD by Evaluating Cloning Site III Regulatory Elements. In order to tune expression of the engineered repressor such that leaky expression is minimized and induced expression is maximized, LacI under the control of its weak constitutive promoter pLacI was incorporated into Cloning Site III of the pDBD(eGFP)•pGFP(dGFP) genetic circuit. In addition to wild-type LacI, we also tested the W220F single-mutant variant (LacIW220F) since it had been demonstrated to improve dynamic range of expression [7]. Either wild-type or W220F mutant LacI was paired with the weak constitutive pLacI promoter or its variant pLacI Q (C/T mutation to −35 promoter box) that had previously been shown to increase constitutive expression of LacI by ten-fold [8]. Fluorescence output and culture densities were monitored over a 6- to 9-hour time-course grown in the presence of various inducer concentrations (1 μM to 25 mM IPTG). The experiment was performed across the combinatorial set of four LacI promoter-repressor pairs inserted into Cloning Site III, giving pDBD expression results for genetic circuits reported in: S4  (LacIW220F). The following sections detail how the results from these experiments were used to optimize dynamic range of expression from Cloning Site I, LacI expression levels from Cloning Site III, and IPTG inducer concentrations required for maximal expression from Cloning Site I, and to identify steady-state growth conditions needed to ensure the collection of reproducible results.

Tuning Dynamic Range of Expression from pDBD. To identify the LacI repressor variant that gives the best dynamic range of expression from Cloning Site I, density-normalized fluorescence was measured for cultures transformed with the pDBD(eGFP)•pGFP(dGFP) circuit containing either LacI or LacIW220F repressors, each constitutively expressed with the same pLacI promoter.
At non-inducing concentrations of IPTG (i.e., < 10 µM), density-normalized fluorescence was very low across all time-points tested (i.e., 7, 8, and 9 hours).
There was no statistically significant difference in normalized fluorescence values between the circuits containing wild-type and W220F mutant LacI repressors. For example, at 1 μM IPTG normalized fluorescence was measured at 363 ± 5 and 368 ± 3 for wild-type and W220F LacI circuits at the 8-hour time-point, illustrating that there is no significant difference (p-value = 0.343, two-tailed homoscedastic t-test) in baseline expression output between the two constructs. In contrast, a significant difference in normalized fluorescence output (i.e., p-values < 0.001, two-tailed homoscedastic t-test) was observed when inducing concentrations of IPTG were used, with the W220F mutant giving rise to a larger dynamic range. For example, at the 8-hour time-point, cultures grown in the presence of 10 mM IPTG yielded 5,827 ± 99 (wild-type LacI) and 7,681 ± 138 (W220F LacI) normalized fluorescence units, resulting in a 16.1 ± 0.4 and 20.9 ± 0.4 fold gain over their baseline expression output (1 μM IPTG), respectively.

Dynamic Range Conclusion: These results demonstrate that while no discernible difference can be observed for basal expression output between pLacI(LacI) and pLacI(LacIW220F) regulatory elements, incorporation of pLacI(LacIW220F) into Cloning Site III of the genetic circuit does improve the dynamic range of expression from the pDBD promoter.

Tuning Repressor Saturation of pDBD. The pDBD promoter architecture incorporates two copies of the lacO sym operator, which raised the possibility that the levels of LacI expression under its native constitutive promoter may not be sufficient to saturate the pDBD promoter. Therefore, we evaluated the utility of incorporating the stronger constitutive promoter pLacI Q into Cloning Site III, as the approximately 10-fold greater cellular concentrations of LacI may help to further reduce baseline expression.
As shown in panels D through F in S6−S9 Figs, baseline expression in the presence of non-inducing concentrations of IPTG (<10 μM IPTG) was the same for all pLacI promoter and LacI repressor combinations. For example, at the 8-hour time-point output from the pDBD promoter under repression by pLacI Q (LacI) and pLacI Q (LacIW220F) was 375 ± 6 and 380 ± 6 density-normalized fluorescence units, respectively. These values were not significantly different relative to baseline expression output achieved using regulatory repressor expressed with pLacI (p-values = 0.022 and 0.012 for two-tailed homoscedastic t-tests, respectively). However, growth in the presence of inducing concentrations of IPTG (e.g., >1 mM) showed significantly reduced normalized fluorescence for both pLacI Q constructs compared to their counterparts with pLacI. For example, induction of pLacI Q (LacI) and pLacI Q (LacIW220F) genetic circuits with 10 mM IPTG produced 506 ± 7 and 613 ± 6 density-normalized fluorescence units, having only 1.35 ± 0.03 and 1.66 ± 0.03 fold gain over baseline expression output (1 μM IPTG), respectively. This reduction in pDBD output expression upon induction may be attributed to the increased cellular concentration of LacI that may exceed accumulated intracellular concentrations of IPTG.
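The fold-gain values quoted above are ratios of induced to baseline output. The source does not state how their uncertainties were obtained; the sketch below assumes independent errors combined by first-order propagation for a ratio, which reproduces the reported 1.35 ± 0.03 for the pLacI Q (LacI) circuit. The function name fold_gain is hypothetical.

```python
import math

def fold_gain(induced, induced_err, baseline, baseline_err):
    """Ratio of induced to baseline output with first-order propagated error.

    Assumes the two measurements are independent (an assumption of this sketch).
    """
    ratio = induced / baseline
    relative = math.sqrt((induced_err / induced) ** 2
                         + (baseline_err / baseline) ** 2)
    return ratio, ratio * relative

# pLacI Q (LacI) circuit, 8-hour time-point: 506 ± 7 induced, 375 ± 6 baseline
gain, gain_err = fold_gain(506, 7, 375, 6)
print(f"{gain:.2f} ± {gain_err:.2f}")  # 1.35 ± 0.03
```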
Repressor Saturation Conclusion: Together, these results suggest that the incorporation of pLacI(LacIW220F) into the genetic circuit at Cloning Site III achieves cellular repressor concentrations sufficient to saturate the pDBD promoter without curtailing pDBD expression output upon induction.

Tuning Inducer Concentration for pDBD Repression and Induction. Because the host organism for these experiments includes endogenous and interacting LacI-type regulatory and regulated elements, it is critical to ensure that experiments are conducted with inducer concentrations that will allow reproducible measurement of reporter gene expression output. The expression output of pDBD from the four genetic circuits outfitted with the combinatorial set of LacI and LacIW220F repressors and pLacI and pLacI Q promoters was recorded under an inducer series spanning 1 μM to 25 mM concentrations of IPTG. Evaluation of the inducer series at 6-, 7-, 8-, and 9-hour time-points required that the density-normalized fluorescence output (Equation S1) from the genetic circuit (y-axis: F) be plotted against the IPTG concentration transformed into a unitless negative base-10 logarithm (x-axis: x) as described in Equation S2.

$$x = -\log_{10}\left([\mathrm{IPTG}]\right)$$

Equation S2

Inducer expression curves were fit to a generalized logistic function representing a sigmoid function (curve) adapted with a linear function (line) manipulating the maximum asymptote (Equation S3).
The sigmoid function (curve) has the form represented in Equation S4 adjusting the growth rate (k) of the curve toward the maximum asymptote (sym). Because this function is asymmetric, the midpoint of the curve (mid) must be calculated adjusting the midpoint (kD) of an unmodified symmetric sigmoid function, as shown in Equation S5.

Equation S4
The linear adjustment of the maximum asymptote of the sigmoid curve is achieved using the function in Equation S6 (omitted for genetic circuits incorporating the pLacI Q promoter). This function applies the linear correction to the maximum asymptote based on a reference point (ref, Equation S7) suitably distinguished from the midpoint as calculated in Equation S5.

Equation S6
Sigmoid curves were fit to experimental data by non-linear regression with error on parameters estimated by bootstrap analysis (S8 Table). Bootstrapping was conducted by randomly applying observed error on experimentally collected data to generate 300 independent resampled data sets. Non-linear regression across the resampled data sets provided a set of parameter estimates from which averages and standard deviations were computed. As indicated by the coefficients of determination (R²), expression output as a function of the inducer series shows strong agreement with the sigmoidal function (Equation S4) for all time-points and genetic circuits (0.9581 ≤ R² ≤ 0.9998), with the exception of the 6-hour time-point for the pDBD(eGFP)•pGFP(dGFP)•pLacI Q (LacI) genetic circuit (R² = 0.868). The midpoint for the sigmoid function fitted to the 8-hour time-point inducer series was identified at 53.7 ± 0.8 and 263 ± 3 μM IPTG for pLacI(LacI) and pLacI(LacIW220F) genetic circuits, respectively. Importantly, growth with inducer concentrations in excess of 25 mM IPTG exerts considerable metabolic burden on the host, suppressing culture densities, consistent with previously published results [6].
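The resampling procedure described above can be sketched as follows. This illustrates only the bootstrap loop: it assumes the observed error is applied as Gaussian noise with the measured standard deviation at each point, and it substitutes a closed-form straight-line fit (fit_line) for the non-linear sigmoid regression used in the study so that the example stays self-contained. All names and data values are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for the sigmoid regression)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def bootstrap_parameters(xs, means, sds, fit, n_resamples=300, seed=0):
    """Refit resampled data sets; return (mean, standard deviation) per parameter."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Perturb each observation by its own measurement error (Gaussian assumption).
        ys = [rng.gauss(m, s) for m, s in zip(means, sds)]
        estimates.append(fit(xs, ys))
    return [(statistics.fmean(col), statistics.stdev(col))
            for col in zip(*estimates)]

# Made-up data lying on y = 2x + 1 with a measurement error of 0.1 at each point
params = bootstrap_parameters([0, 1, 2, 3], [1, 3, 5, 7], [0.1] * 4, fit_line)
for label, (mean, sd) in zip(("slope", "intercept"), params):
    print(f"{label}: {mean:.3f} ± {sd:.3f}")
```

Replacing fit_line with a routine that fits the sigmoid model would give parameter averages and standard deviations in the same form as those reported in S8 Table.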

Section 4. Expression Controls for the three-Component Genetic Circuit
Overview of Experiments to Define Genetic Circuit Expression Controls. To demarcate the minimum and maximum output signals produced by the genetic circuit in both the absence and presence of inducer (10 mM IPTG), density-normalized culture fluorescence was monitored for a 6- to 9-hour time-course for genetic circuits outfitted with combinations of the reporter protein (eGFP) and its decoy counterpart (dGFP) inserted into Cloning Sites I and II under the control of promoters pDBD and pGFP, respectively. The identity and purpose of these control plasmids include: S10 Fig: the genetic circuit pDBD(dGFP)•pGFP(dGFP)•pLacI(LacIW220F) establishing output of basal culture fluorescence originating from non-reporter sources, S11 Fig: the genetic circuit pDBD(eGFP)•pGFP(dGFP)•pLacI(LacIW220F) establishing expression output from the pDBD promoter of Cloning Site I responsible for expressing the experimental single-chain tandem repeat repressor, S12 Fig: the genetic circuit pDBD(dGFP)•pGFP(eGFP)•pLacI(LacIW220F) establishing expression output from the pGFP promoter of Cloning Site II responsible for expressing the reporting protein for the genetic circuit, and S13 Fig: the genetic circuit pDBD(eGFP)•pGFP(eGFP)•pLacI(LacIW220F) establishing expression output from both the pDBD and pGFP promoters of Cloning Sites I and II thereby establishing output signal fidelity and stability of the genetic circuit.

Establishing Baseline Fluorescence for the Genetic Circuit. To define the baseline fluorescence output that occurs in the absence of eGFP expression, the dGFP gene was inserted into both Cloning Sites I and II producing the pDBD(dGFP)•pGFP(dGFP)•pLacI(LacIW220F) genetic circuit (S10A Fig). Density-normalized culture fluorescence (F) was measured as a function of time in the absence (F−IPTG) and presence (F+IPTG) of 10 mM IPTG inducer (S10 Fig, panels C and D).
For all time-course measurements, no significant difference in fluorescence between induced and repressed growth conditions was observed, as expected. For example, density-normalized culture fluorescence recorded at the 8-hour time-point in the absence of IPTG (F−IPTG = 360 ± 10) did not differ, at the p-value < 0.001 significance threshold applied in this study, from the output signal obtained for cultures grown in the presence of 10 mM IPTG (F+IPTG = 390 ± 10; p-value = 0.005, two-tailed homoscedastic t-test).
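The two-tailed homoscedastic (pooled-variance) t-test used for these comparisons can be reproduced approximately from the summary values quoted above. The sketch assumes the ± values are sample standard deviations and that each condition was measured in quadruplicate (n = 4); because the quoted means and deviations are rounded, the result is only approximate. The p-value is obtained by numerically integrating the Student t density with Simpson's rule, and all function names are illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and degrees of freedom for an equal-variance two-sample test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

def t_pdf(x, df):
    """Probability density of the Student t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability that |T| < |t|
    return 1 - central

# F+IPTG = 390 ± 10 versus F−IPTG = 360 ± 10, assuming n = 4 per condition
t_stat, dof = pooled_t(390, 10, 4, 360, 10, 4)
print(f"t = {t_stat:.3f}, df = {dof}, p = {two_tailed_p(t_stat, dof):.4f}")
```

With these inputs the sketch gives a p-value of about 0.005, in line with the value reported for this comparison.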
Baseline Fluorescence Conclusion. These measurements establish the range in baseline output signal for background fluorescence that defines the minimum values that can be obtained using this genetic circuit setup.

Characterization of Expression Output for the pDBD Promoter. To evaluate pDBD promoter activity, the dGFP gene at Cloning Site I was replaced with a fluorescently active copy of eGFP to create the pDBD(eGFP)•pGFP(dGFP)•pLacI(LacIW220F) genetic circuit (S11 Fig). Density-normalized culture fluorescence measured at the 8-hour time-point demonstrated a significant (p-value < 0.001, two-tailed homoscedastic t-test) 19.7 ± 0.8 fold increase in output signal upon induction with 10 mM IPTG (F+IPTG = 7,300 ± 200 and F−IPTG = 370 ± 10). Importantly, the low F−IPTG value was indistinguishable from baseline values (p-value = 0.207, two-tailed homoscedastic t-test), confirming the absence of leaky expression originating from the pDBD promoter as also demonstrated in the previous section (Tuning Repressor Saturation of pDBD).
Cloning Site I Promoter Output Conclusion. These observations establish the utility of pDBD, having negligible leaky expression and significant dynamic range for the expression of the gene that is placed under the control of this promoter, validating its utility for the evaluation of experimental single-chain tandem repeat repressors created in this study.

Characterization of Expression Output for the pGFP Promoter. The genetic circuit was designed to function through the concerted activity of Cloning Sites I and II to control pDBD-driven expression of an experimental repressor (scDBD in this study) and report on its ability to repress expression of the fluorescent reporter from pGFP. Because the pDBD-pGFP promoter pair functions interdependently, there is the potential that induction of expression from pDBD may impose a metabolic burden that would reduce expression from pGFP by competing for limited endogenous transcription and translation factor resources, which would complicate interpretation of the genetic circuit output data [6]. To investigate this possibility, pGFP-driven expression activity was evaluated in the presence of constitutive pDBD expression. This involved a control genetic circuit constructed with copies of dGFP and eGFP genes inserted into Cloning Sites I and II, respectively, to produce pDBD(dGFP)•pGFP(eGFP)•pLacI(LacIW220F) (S12 Fig).

a inducible expression from Cloning Site I under the control of the pDBD promoter
b constitutive expression from Cloning Site II under the control of the pGFP promoter
c data collected at 6-, 7-, 8-, and 9-hour time-points in quadruplicate (n = 4)
d plots shown in panels C and D of S10-S13 Figs
e plots shown in panels E and F of S10-S13 Figs

S14 Fig. Summary of expression controls for the three-component genetic circuit.
A comparison of expression output from Cloning Sites I and II of the genetic circuit was compiled by monitoring normalized output fluorescence for arrayed combinations of eGFP and dGFP. A two-tailed homoscedastic t-test was performed comparing genetic circuit output in the absence and presence of inducer (10 mM IPTG) with p-values reported for populations exhibiting statistically significant difference (p-value < 0.001, two-tailed homoscedastic t-test, replicates = 4).