On an algorithmic definition for the components of the minimal cell

Living cells are highly complex systems comprising a multitude of elements that are engaged in the many convoluted processes observed during the cell cycle. However, not all elements and processes are essential for cell survival and reproduction under steady-state environmental conditions. To distinguish essential from expendable cell components, and thus define the ‘minimal cell’ and the corresponding ‘minimal genome’, we postulate that the synthesis of all cell elements can be represented as a finite set of binary operators. Within this framework we show that the cell elements whose synthesis depends on their own previous existence are those that are essential for cell survival. An algorithm to distinguish essential cell elements is presented and demonstrated within an interactome. Data and functions implementing the algorithm are given as supporting information. We expect that this algorithmic approach will lead to the determination of the complete interactome of the minimal cell, which could then be experimentally validated. The assumptions behind this hypothesis, as well as its consequences for experimental and theoretical biology, are discussed.

Table S1-1. Results of applying the 'condensed to expanded' (C2E) algorithm to each element of S_i in the RNA polymerase SI (see Table 2 in the main text). Notes: 'Name': name of each element in the set of internal elements, S_i, of the SI for the RNA polymerase (pol; see Table 2 in the main text for the keys of the named elements). 'Condensed': condensed formula for the element in 'Name', given by its binary operator (column 'Binary operator' in Table 2 of the main text). 'Expanded': expanded formula found for the corresponding element in 'Name', i.e., the result of the C2E algorithm applied to that structure. Note also that the expanded expression for pol given in Table S1-1 differs from the one that we previously found 'by hand' in equation 3 of the main text. This is because the C2E algorithm exits as soon as a term equal to the input name is found in the expanded formula. This does not alter the fact that all the expanded formulae found by the C2E algorithm and presented in Table S1-1 are recursive. The classification of a formula as recursive or non-recursive can be computed automatically.

S1-2. The C2E algorithm applied to the streptomycin SI
To give examples of formulae that are not recursive, we present the synthesis of streptomycin. Streptomycin is a secondary metabolite with antibiotic activity that is produced by bacteria in the genus Streptomyces (Schatz et al., 1944). The SI for streptomycin synthesis was summarized from Flatt and Mahmud (2007) and is presented in Table S1-2. Table S1-3 gives the full names of the elements shown in the 'Name' column of Table S1-2.
Note that the set of external elements of the SI in Table S1-2, labeled S_e in columns '1st Set' and '2nd Set', gives the 'root' of the SI. Apart from the ribosome (rib) and the RNA polymerase (pol), this set includes three secondary metabolites, (28), (34) and (40). These correspond to 'Streptidine 6-phosphate', 'dTDP-L-dihydrostreptose bound to StrH' and 'NDP-N-methyl-L-glucosamine', respectively. In contrast with the essential elements rib and pol, these secondary metabolites are not essential for cell survival.
Applying the C2E algorithm to the SI in Table S1-2, we obtain the expanded formulae for each of the internal elements of that interactome; these formulae are shown in Table S1-4.
Notes to Table S1-2: full names of the elements are given in Table S1-3. Gene names begin with 'g.', while transcript names begin with 't.'. Columns '1st Set' and '2nd Set' give the sets to which the first and second operands of 'Binary operator' belong. Columns 'Type Name', '1st Type' and '2nd Type' give the types of the element in column 'Name' and of the first and second operands of 'Binary operator', respectively.
Table S1-3. Names of elements involved in the synthesis of the secondary metabolite streptomycin (STR).

Name    Full name
rib     Ribosome.
STR     Streptomycin.
pol     RNA polymerase (RNA-pol).
StrH    StrH protein, predicted to mediate the ligation of (28) with (34).
DeH     Dehydrogenase involved in the step from (42) to STR.
StrK    Enzyme streptomycin phosphatase.
…       Streptomycin 6-phosphate.
Notes: Main elements involved in the synthesis of streptomycin (STR), compiled and simplified from Flatt and Mahmud (2007). Column 'Name' gives the shortened element identifier, as used in the 'Name' column of Table S1-2. Column 'Full name' gives the biochemical designation of the elements.
The first row of Table S1-4 shows the condensed and expanded formulae for the transcript of the protein StrH, 't.StrH'. Both formulae are identical. This happens in the C2E algorithm when no substitutions are possible to expand the condensed formula. More interestingly, none of the expanded formulae in the 11 rows of Table S1-4 contains among its operands the element being defined; i.e., none of the expanded formulae for the synthesis of streptomycin components is recursive. In particular, the last row, showing the expanded formula for …
Table S1-4. Results of applying the 'condensed to expanded' (C2E) algorithm to each element of S_i in the streptomycin (STR) SI (Table S1-2).
As noted in the main text, essentiality rule ER1 can classify as essential only elements in S_i, while ER2 can also classify as essential elements in G or S_e. A first example can be seen in Table S1-1, which presents the expanded formulae for the elements of S_i in the RNA polymerase SI. All formulae in the 'Expanded' column of this table include among their operands the corresponding element in the 'Name' column. That is, all the elements in the 'Name' column have a recursive formula, and thus by ER1 all of them are essential.

By applying ER2, we found that all non-internal elements named in the SI are also classified as essential. These are the elements in G = { g.bp, g.b, g.o, g.a } (the genes for each of the RNA polymerase subunits) and the only external element named in the SI, S_e = { rib }. They are classified as essential because they appear as operands in one or more of the expanded formulae in column 'Expanded'.

In contrast, none of the elements that intervene in the synthesis of streptomycin ('STR'), whose expanded formulae are shown in Table S1-4, is recursive, and thus none is classified as essential. The ribosome ('rib'), an intrinsically essential element of the cell, appears as an operand in several of the expanded formulae of Table S1-4. Even so, its essentiality is not evident here. None of the internal structures of this SI is classified as essential, so there is no information to judge the essentiality of rib by either ER1 or ER2. This shows that our essentiality rules depend on the amount of information given by the SI analyzed.
In conclusion, elements that are not classified as essential from the information in an SI could in fact be essential for cell functioning; the SI simply does not give enough information for such a classification. In contrast, if an element is classified as essential, the conclusion is definitive: no further information added to the SI can change that judgment.
Essentiality rule ER3 can be given a mathematical representation by defining the genome replication 'operation' D(G) ⇒ 2G, meaning that the complete genome, G, is duplicated. Furthermore, the synthesis of all the elements needed to perform this operation in a given cell can be written as a list of binary operators. This list is an SI, say SI**, containing the synthesis definitions of all elements necessary for genome replication. ER3 can then be trivially rewritten as:
Essentiality rule 3' (ER3'): Essentiality of the genome replication machinery. s* is an essential element for genome replication if the absence of s* impedes the D(G) ⇒ 2G operation.
Nevertheless, this rule gives no algorithmic guideline for discovering the elements of this group by computational manipulation, as rules ER1 and ER2 do. We must therefore use experimental methods to search for such elements. This has been done with different approaches and in different organisms (e.g., Gerdes et al., 2003; Kobayashi et al., 2003; Berquist et al., 2007; Blomen et al., 2015; Wang et al., 2015).

S1-4. Notes and prerequisites
The remaining part of this document contains computational examples for "On an algorithmic definition for the components of the minimal cell". It can be read in two ways. The first is to concentrate on the concepts and results presented, disregarding the computational details, which are given as examples with the R (R Core Team, 2016) package 'InterPlay'. The second is to also follow the step-by-step R examples.
In the latter case you must install 'InterPlay' using the supplementary file 'S1 Binary' included with the manuscript "On an algorithmic definition for the components of the minimal cell". The file corresponding to 'S1 Binary' was named S1Binary.tar.gz, and it must be present in your R working directory for installation. To install the package, you can use the command 'R CMD INSTALL S1Binary.tar.gz' in your operating system's command window. Alternatively, follow the instructions in the R help page 'Install Packages from Repositories or Local Files' (?install.packages). If you have any problem with the installation, please send an e-mail to Octavio Martinez, including the word 'InterPlay' in the subject.
You will also need to install the R package 'igraph' (Csardi and Nepusz, 2006) and include in your R environment the code in the supplementary file 'S2 Text' (which was uploaded to PLOS ONE as S2Text.txt). To do this you must copy that file into your R working directory and type 'source("S2Text.txt")' at the R prompt ('>').
R commands and output are given here in numbered boxes delimited by lines of '#' characters. In these boxes (see BOX 1 below) the R prompt, '>', is included, but it must be excluded when reproducing the commands in R. You can copy and paste each command to obtain the output shown within the box. We assume that you are familiar with basic R use. If you want to follow and reproduce the R commands given in the numbered boxes, do so in the order in which the boxes are presented (BOX 1, BOX 2, · · · , etc.). In some cases the results of a box depend on operations performed in a previous one.
For comments or problems regarding our package, please send an e-mail to Octavio Martinez, including the word 'InterPlay' in the subject.
Table S1-5. General form of an SI.
Name    Binary operator
s_1     <s_1a, s_1b>
s_2     <s_2a, s_2b>
· · ·   · · ·
s_i     <s_ia, s_ib>
· · ·   · · ·
s_k     <s_ka, s_kb>

The conditions for a well-formed SI are that (i) all the names in column 'Name' must be different, and (ii) all operators in column 'Binary operator' must also be different (see the legend of Table 1 as well as the 'Mathematical addendum' in the main text). Condition (i) implies that the synthesis of a given structure, say s*, is defined in a unique way. Condition (ii) implies that a binary operator always results in the synthesis of the same element.

SIs can be represented in R (or other computing environments) as tables with three columns. One column contains the names of the elements (column 'Name' in Table S1-5; elements s_i). The other two contain the left- and right-hand operands of each binary operator (s_ia and s_ib in Table S1-5). The table has k rows, one for each binary operator i = 1, 2, · · · , k. BOX 2 shows examples of tables that do not represent well-formed SIs.
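Conditions (i) and (ii) are easy to check mechanically. The following Python sketch is a hypothetical illustration, not part of InterPlay (which is written in R). It checks a table of (name, o1, o2) rows for repeated names and repeated operators. It assumes that operators are ordered pairs, so <b, c> and <c, b> count as different operators; the function name check_well_formed is illustrative only.

```python
from collections import Counter


def check_well_formed(si_rows):
    """Return a list of problems; an empty list means the SI is well formed.

    si_rows is a list of (name, o1, o2) tuples, mirroring the three
    columns 'name', 'o1' and 'o2' of an SI table (hypothetical layout).
    """
    problems = []
    # Condition (i): every element is synthesized in a unique way.
    name_counts = Counter(name for name, _, _ in si_rows)
    for name, n in name_counts.items():
        if n > 1:
            problems.append(f"name '{name}' defined {n} times")
    # Condition (ii): a binary operator always yields the same element.
    # Operators are treated as ordered pairs (an assumption).
    op_counts = Counter((o1, o2) for _, o1, o2 in si_rows)
    for (o1, o2), n in op_counts.items():
        if n > 1:
            problems.append(f"operator <{o1}, {o2}> used {n} times")
    return problems


good = [("a", "b", "c"), ("b", "a", "d")]
bad = [("a", "b", "c"), ("a", "b", "d"), ("x", "b", "c")]
print(check_well_formed(good))
print(check_well_formed(bad))
```

The second table violates both conditions. Element a is defined twice, and the operator <b, c> is used to define both a and x.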
The meaning of these columns, and of the sets inferred from the SI by 'normalize.SI', is presented in Table S1-6. All but the first three columns (name, o1 and o2) are optional in the initial input of a pre-formatted SI. Columns set.o1, set.o2, type.n, type.o1, type.o2 and structure are added to the output when they are not already present in the input. Columns set.o1 and set.o2 are filled automatically by 'normalize.SI', following the rules given in Table S1-6.
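The set-assignment rules can be sketched in Python as follows, assuming the same rules described for Table S1-6. G holds operands whose names begin with 'g.', S_i holds the names defined in column 'name', and S_e holds every other operand. This is a hypothetical illustration, not the code of 'normalize.SI'. The rows are invented, loosely modeled on the peptide-synthesis rows described for dummy3.SI, and their operand order is an assumption.

```python
def classify_sets(si_rows):
    """Return (G, S_i, S_e) for an SI given as (name, o1, o2) rows.

    Assumed rules: G holds operands whose names begin with 'g.';
    S_i holds the names defined in column 'name'; S_e holds every
    other operand (not in G or S_i).
    """
    s_i = {name for name, _, _ in si_rows}
    operands = {op for _, o1, o2 in si_rows for op in (o1, o2)}
    g = {op for op in operands if op.startswith("g.")}
    s_e = operands - s_i - g
    return g, s_i, s_e


def set_label(element, g, s_i):
    """Label an operand the way the set.o1 / set.o2 columns would."""
    if element in g:
        return "G"
    if element in s_i:
        return "Si"
    return "Se"


# Hypothetical rows: transcription by pol, translation by rib.
rows = [
    ("t.b", "g.b", "pol"),  # transcript of b from its gene
    ("b", "t.b", "rib"),    # peptide b from its transcript
    ("t.c", "g.c", "pol"),
    ("c", "t.c", "rib"),
    ("a", "b", "c"),        # heterodimer formed by b and c
]
G, Si, Se = classify_sets(rows)
print(sorted(G), sorted(Si), sorted(Se))
print([(set_label(o1, G, Si), set_label(o2, G, Si)) for _, o1, o2 in rows])
```

Here pol and rib end up in S_e simply because no row of this small SI defines their synthesis.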
For real or realistic SIs, the columns can be entered in a spreadsheet program (such as Excel) and imported into R. In that case, extra columns such as 'Comments' can be added for the researcher's reference. However, such columns are deleted and ignored by the InterPlay functions (for an example, see the help for dataset 'dummy3.SI' and BOX 4 below).

Table S1-6. Components of the output of 'normalize.SI'.
First component
Column      Content
name        A character vector with the names of the elements defined in the SI.
o1          A character vector with the left-hand operand of the binary operator for the element in column name.
o2          A character vector with the right-hand operand of the binary operator for the element in column name.
set.o1      A character vector with the name of the set to which o1 belongs.
set.o2      A character vector with the name of the set to which o2 belongs.
type.n      A character vector with the type of component of the element in column name.
type.o1     A character vector with the type of component of the element in column o1.
type.o2     A character vector with the type of component of the element in column o2.
structure   A character vector with the name of the structure defined in the SI.
Second component (a list with 3 sets)
Component   Content
G           A character vector with the names in set G (elements whose names begin with 'g.').
Si          A character vector with the names in set S_i (the same elements that appear in column 'name').
Se          A character vector with the names in set S_e (elements in columns 'o1' or 'o2' that are not in G or S_i).

• Assume that a row of an SI contains 'Name' = a and 'Binary operator' = < b, c >.
• If another SI row contains 'Name' = b and 'Binary operator' = < d, e >, then a 'valid substitution' for a in the formula < b, c > is given by << d, e >, c >, and
• if another SI row contains 'Name' = c and 'Binary operator' = < f, g >, then a 'valid substitution' for a in the formula < b, c > is given by < b, < f, g >>.
Applying both definitions, we obtain an 'expanded' formula for a, given by << d, e >, < f, g >> ⇒ a, and so on (see also 'Mathematical addendum'). In fact, the C2E algorithm implements nested substitutions to 'expand' the formulae of all the elements in column 'Name' of a given SI. An example is presented in BOX 5.
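Valid substitutions can be sketched in Python by representing a formula as a nested pair of strings. This is a hypothetical illustration, not InterPlay's R code. substitute_left and substitute_right perform one valid substitution each. substitute replaces every internal operand at once, which corresponds to the operation the text below denotes S(X).

```python
def substitute(formula, si):
    """Replace every operand that is defined in the SI (belongs to S_i)
    by its binary operator, one level deep.

    Formulae are nested 2-tuples of strings; si maps names to (o1, o2).
    """
    if isinstance(formula, str):
        return si.get(formula, formula)
    left, right = formula
    return (substitute(left, si), substitute(right, si))


def substitute_left(formula, si):
    """Valid substitution of the left operand only: <b, c> -> <<d, e>, c>."""
    left, right = formula
    return (si.get(left, left), right)


def substitute_right(formula, si):
    """Valid substitution of the right operand only: <b, c> -> <b, <f, g>>."""
    left, right = formula
    return (left, si.get(right, right))


def fmt(formula):
    """Render a formula in the <x,y> notation used in the text."""
    if isinstance(formula, str):
        return formula
    return "<" + fmt(formula[0]) + "," + fmt(formula[1]) + ">"


# The three rows used in the definitions above.
si = {"a": ("b", "c"), "b": ("d", "e"), "c": ("f", "g")}
print(fmt(substitute_left(si["a"], si)))   # <<d,e>,c>
print(fmt(substitute_right(si["a"], si)))  # <b,<f,g>>
print(fmt(substitute(si["a"], si)))        # <<d,e>,<f,g>>
```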
From BOX 5 we can easily 'guess' how the C2E algorithm works. Given an initial formula for an element in the form of a binary operator, C2E greedily makes all possible substitutions in the initial formula by iteratively reviewing the rows of the SI. This is explained in detail and illustrated by pseudo-code in the following paragraphs.

The formulae for the elements s_i in Table S1-5, denoted here C(s_i) (column 'Binary operator' in Table S1-5), are 'condensed', in the sense of having only two operands, s_ia and s_ib. Assume that by inspecting Table S1-5 we find that the formula for s_ia (only if s_ia ∈ S_i) is C(s_ia) = <s_iaa, s_iab>. Then we can substitute C(s_ia) into the original expression for C(s_i) to obtain a new expression, C_1(s_i) = <<s_iaa, s_iab>, s_ib>. The subindex 1 in C_1 denotes that one substitution was carried out to obtain the formula. We can repeat this substitution process to obtain more 'expanded' formulae, until the possibilities for substitution are exhausted. Iterating the substitution process always yields 'well-formed strings', which can be converted back into the original condensed formula.

To formalize the substitution process, we define E(X) as the set of operands that exist in a well-formed formula X. For example, assume that at some point in the substitution process we find an expression X given by
X = <<<s_3, s_4>, s_5>, <<s_3, s_4>, s_2>>;
then E(X) is the set of all distinct operands found in X, that is, E(X) = {s_2, s_3, s_4, s_5}. We also define the operation S(X) as the substitution, in X, of every element of S_i present in X by its corresponding binary operator.
For example, take X as above, and assume that only s_2 ∈ S_i and that the formula for s_2 in the interactome is <s_x, s_z>. Then we obtain
S(X) = <<<s_3, s_4>, s_5>, <<s_3, s_4>, <s_x, s_z>>>.
With this notation we can define an algorithm that finds an 'expanded' formula from a 'condensed' one: the C2E algorithm (from 'condensed' to 'expanded'). C2E is presented in the following pseudo-code and is implemented in 'InterPlay' as function 'C2E'.

C2E takes as input an element s, the expressions for binary operators in 'In', the interactome, and 'm.c', the maximum number of cycles ('loops') of nested substitutions allowed. In the C2E algorithm, steps 3 and 5.3 are set to avoid a primary infinite loop. Such a primary infinite loop will happen if the formula defining s, f or nf, contains s itself among the operands of the formula.
If that is the case, i.e., if s ∈ E, and step 5.3 were not included, the substitutions would continue ad infinitum within the loop defined in step (5) of the C2E algorithm. Note that steps 3 and 5.3 always give a recursive structure as output.
There is a second possibility for an infinite loop within this algorithm. This secondary infinite loop happens whenever, at any point, the expanded formula for an element contains a recursive structure among its operands. In that case the C2E algorithm keeps substituting the recursive element ad infinitum. To escape this possibility, only a maximum number of nested substitutions, 'm.c', is allowed. If that number of loops is reached within the 'WHILE' loop (5), C2E returns at step 5.2 a list with two components: the word 'UNDECIDED' and the current formula for the structure, nf. In the next section we will see how to deal with such undecided formulae.
If the possible substitutions in a formula have been exhausted, nf does not change between two successive evaluations of the 'WHILE' loop defined in step (5). The condition 'nf ≠ f' then becomes false (i.e., nf = f), so the loop is exited and the final nf is output at step (6).
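The behavior of C2E described above (steps 3, 5.2, 5.3 and 6) can be sketched in Python. This is a reconstruction from the prose, not the InterPlay implementation. Formulae are nested pairs, every internal operand is substituted in each cycle (the S(X) operation), and the parameter m_c plays the role of 'm.c'. The five-row SI is hypothetical. It was chosen so that it reproduces the outputs reported for dummy1.SI in this section, but the actual contents of dummy1.SI may differ.

```python
def operands(formula):
    """E(X): the set of distinct operands in a well-formed formula."""
    if isinstance(formula, str):
        return {formula}
    return operands(formula[0]) | operands(formula[1])


def substitute(formula, si):
    """S(X): replace every operand that belongs to S_i by its binary operator."""
    if isinstance(formula, str):
        return si.get(formula, formula)
    return (substitute(formula[0], si), substitute(formula[1], si))


def fmt(formula):
    """Render a formula in the <x,y> notation used in the text."""
    if isinstance(formula, str):
        return formula
    return "<" + fmt(formula[0]) + "," + fmt(formula[1]) + ">"


def c2e(s, si, m_c=50):
    """Expand the condensed formula of s; return (outcome, formula)."""
    f = si[s]
    if s in operands(f):        # step 3: condensed formula already recursive
        return "RECURSIVE", f
    for _ in range(m_c):        # WHILE loop (5), capped at m.c cycles
        nf = substitute(f, si)
        if nf == f:             # no substitution possible: step 6
            return "EXPANDED", nf
        if s in operands(nf):   # step 5.3: s reappeared as an operand
            return "RECURSIVE", nf
        f = nf
    return "UNDECIDED", f       # step 5.2: m.c reached


# Hypothetical SI; c, d, e and h are external elements.
si = {"a": ("b", "c"), "b": ("a", "d"), "f": ("e", "h"),
      "g": ("f", "f"), "i": ("a", "b")}
for name in si:
    outcome, formula = c2e(name, si)
    print(name, outcome, fmt(formula) if outcome != "UNDECIDED" else "")
```

The sketch returns three kinds of result. Elements a and b come back as recursive ('<<a,d>,c>' and '<<b,c>,d>'). Elements f and g come back fully expanded ('<e,h>' and '<<e,h>,<e,h>>'). Element i is 'UNDECIDED', because a and b keep regenerating each other inside its formula without i ever appearing.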
Thus, the C2E algorithm has three possible results:
(i) an expanded recursive formula, when the output occurs at step 3 or 5.3;
(ii) an 'UNDECIDED' formula, when the output occurs at step 5.2; or
(iii) a non-recursive expanded formula, when the output occurs at step 6.
In BOX 5 we saw an example (with 'dummy1.SI') in which applying 'C2E' to each row of the SI produced all three types of output. 'expandAll(dummy1.SI)' is a wrapper that applies C2E to each of the structures of the SI. Examining its results, we see that the formulae for elements 'a' and 'b', '<<a,d>,c>' and '<<b,c>,d>', are recursive: they contain the elements being defined ('a' and 'b', respectively). These two formulae were therefore output at step (5.3) and are of type (i). When making substitutions for the structure 'i', C2E falls into an infinite loop and outputs 'UNDECIDED' when the maximum number of loops is reached (type (ii), output at step 5.2). In contrast, the outputs for elements 'f' and 'g', the formulae '<e,h>' and '<<e,h>,<e,h>>', are of type (iii). They are produced at step (6), when the formula does not change in two successive loops.

S1-5.3. Avoiding infinite loops in the search for expanded formulae
In the previous section we saw that the C2E algorithm performs as many substitutions as possible on an initial condensed formula. However, it falls into an infinite loop when a recursive element is present within a formula that is being expanded.
To avoid 'UNDECIDED' formulae, recursive elements must be detected 'on the fly' and excluded from subsequent substitution in the expanding formulae. This requires examining the complete SI as a whole, and not one element at a time, as C2E does.
To solve this problem, one of us (M.H R-V) designed the "turtle" algorithm, which performs a double cycle of operand substitutions. As soon as an operand is found within the formula for the corresponding element, that element is "frozen" and is not substituted in any of the following cycles, which avoids infinite loops. The algorithm outputs a list of all operands, including a list of the cases in which an operand (in column 'Name' of the SI) was frozen; a frozen operand signals that its corresponding formula is recursive. In parallel, the algorithm keeps track of the formulae being expanded and outputs those expanded formulae at the end.

In BOX 6 we present the results of the "turtle" algorithm, as implemented in the function 'findEssential'; see also the 'Mathematical addendum' for a more detailed explanation. The example uses the SI 'dummy1.SI', already analyzed with C2E in BOX 5. In BOX 6 we can see that 'findEssential(dummy1.SI)' gives an explicit expanded formula for the structure 'i', equal to '<<<a,d>,c>,<a,d>>' (row 5 of the last table in BOX 6). In contrast with C2E (implemented within 'expandAll'), the turtle algorithm was able to give this explicit formula for the synthesis of component 'i'. This is because component 'a' was "frozen" as soon as its formula was recognized as recursive, which avoided infinite loops. Note that components 'a' and 'b' have the value 'TRUE' in column 'Recursive', pointing to the fact that their corresponding formulae are recursive.
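The turtle algorithm is only outlined here, so the following Python sketch does not reproduce it. It illustrates, under stated assumptions, why limiting re-expansion guarantees termination. If each element is expanded at most once, the operands that can appear while expanding s form a finite set: the elements reachable from s through the 'name → (o1, o2)' relation. In that view, s is recursive exactly when s is reachable from itself. The function names and the five-row SI are hypothetical.

```python
def reachable_operands(s, si):
    """Elements that can appear as operands at some point while expanding s.

    si maps each internal element (S_i) to its (o1, o2) pair. Each element
    is expanded at most once, which keeps the search finite.
    """
    seen = set()
    stack = list(si[s])
    while stack:
        x = stack.pop()
        if x in seen:      # already expanded; expanding again adds nothing new
            continue
        seen.add(x)
        if x in si:        # internal element: its operands appear next
            stack.extend(si[x])
    return seen


def is_recursive(s, si):
    """ER1 test: s is recursive if it appears among its own operands."""
    return s in reachable_operands(s, si)


# Hypothetical SI; c, d, e and h are external elements.
si = {"a": ("b", "c"), "b": ("a", "d"), "f": ("e", "h"),
      "g": ("f", "f"), "i": ("a", "b")}
for name in si:
    print(name, is_recursive(name, si), sorted(reachable_operands(name, si)))
```

Unlike C2E, this terminates for 'i' and reports it as non-recursive, while still flagging 'a' and 'b' as recursive. It does not, however, build an expanded formula.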
A fundamental difference between the C2E and turtle algorithms lies in what they output. C2E gives only expanded formulae for the elements in set S_i, and only when the algorithm does not leave such formulae undecided. Turtle gives, apart from the final expanded formulae, the complete list of operands that entered, at any point, into the derivation of the expanded formulae.

S1-6. Distinguishing essential cell elements in SIs
We have seen that the turtle algorithm within 'findEssential' marks as recursive those elements whose expanded formula includes, as an operand, the element being defined. This implements the first essentiality rule, ER1 (see main text), which can be stated as 'Elements with a recursive expanded formula are essential'. Although not mentioned until now, 'findEssential' also implements the second essentiality rule, ER2 (see main text): 'All operands that appear in the expansion of the formula for an essential element are also essential elements'. Table S1-7 explains the contents of the output given by the 'findEssential()' function. BOX 7 demonstrates the implementation of the second essentiality rule in another example.

Table S1-7. Output of the function 'findEssential()'.
First component
Column      Content
name        Character; names of the elements defined in the SI (the set of internal elements S_i).
Recursive   Logical; is the corresponding expanded formula recursive (and thus the corresponding element essential by rule ER1)?
Essential   Logical; is the corresponding element essential (by rule ER1 or ER2)?
o1          A character vector with the left-hand operand of the binary operator for the element in column name.
o2          A character vector with the right-hand operand of the binary operator for the element in column name.
Operands    Character strings with the operands that appeared during formula expansion.
Expanded    Character strings with the final expanded formulae for the corresponding elements.
Second component (a list with 8 sets)
Component   Content
G           A character vector with the names in set G (elements whose names begin with 'g.').
EG          Character vector with the names of the essential genomic elements (a subset of G).
Si          A character vector with the names in set S_i (the same elements that appear in column 'name').
ESi         Character vector with the names of the essential elements of S_i.
Se          Character vector with the names in set S_e (elements in columns 'o1' or 'o2' that are not in G or S_i).
ESe         Character vector with the names of the essential elements in set S_e.
E           Character vector with the names of the essential elements contained in the SI.
NE          Character vector with the names of the non-essential elements contained in the SI.
Note that column 'Operands' in the 'findEssential' output lists all the operands found during formula expansion, not only those present in the final expanded formula in column 'Expanded'. As an example, take row 3 of the 'main' result, for element 'd'. Its expanded formula, '<<<b,c>,d>,<f,g>>', contains the operands b, c, d, f and g, but not the element a. Element a is nevertheless part of the list in column 'Operands': 'a;d;f;g;b;c'. This is because the operand a appeared at some point in the expansion of the formula for d, but was subsequently substituted (by the formula <b,c>). Thus a is not part of the final expanded formula.

ER2, however, concerns the operands that appear at any point in the expansion of a formula. Given that the expanded formula for d is recursive, this element was classified as 'essential' by ER1. Because a appears as an operand in the expansion of the formula for d, a is classified as essential by ER2. Indeed, if d is essential, a must also be essential, because to synthesize d we need, at some point, the existence of a, as stated by ER2.

The final result of 'findEssential(dummy2.SI, give.sets = T)' is that all 8 elements named in the SI are essential. Elements a.a and d are essential because their expanded formulae are recursive (ER1). The remaining 6 elements (a, b, c, e, f and g) are essential because they appear as operands in the expansions for a.a or d (ER2).
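ER1 and ER2 can be sketched in Python by treating the operands that appear 'at some point' in an expansion as the elements reachable from the element being expanded. The function find_essential_sketch is hypothetical. It is not InterPlay's findEssential and does not build expanded formulae. The nine-row SI is written from the description of dummy3.SI given in the comments of BOX 9; its rows 1 to 4 also reproduce the expanded formula quoted above for d. Its exact contents, particularly the operand order, are assumptions.

```python
def reachable_operands(s, si):
    """Operands that can appear at any point while expanding s."""
    seen, stack = set(), list(si[s])
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(si.get(x, ()))  # only internal elements expand further
    return seen


def find_essential_sketch(si):
    """Apply ER1 and ER2 to an SI given as {name: (o1, o2)}.

    Returns (recursive, essential, non_essential) as sets.
    """
    reach = {s: reachable_operands(s, si) for s in si}
    recursive = {s for s in si if s in reach[s]}      # ER1
    essential = set(recursive)
    for s in recursive:                               # ER2
        # Reachability is transitive, so the operands of any element made
        # essential here are already contained in reach[s].
        essential |= reach[s]
    named = set(si) | {op for pair in si.values() for op in pair}
    return recursive, essential, named - essential


# Hypothetical SI modeled on the description of dummy3.SI.
si3 = {"a": ("b", "c"), "a.a": ("a", "d"), "d": ("a.a", "e"),
       "e": ("f", "g"), "x": ("a.a", "y"),
       "t.b": ("g.b", "pol"), "b": ("t.b", "rib"),
       "t.c": ("g.c", "pol"), "c": ("t.c", "rib")}
rec, ess, non_ess = find_essential_sketch(si3)
print(sorted(rec), sorted(non_ess))

# Hypothetical SI where pol and rib only make a non-recursive product.
si_s = {"t.s": ("g.s", "pol"), "s": ("t.s", "rib"), "m": ("s", "p")}
rec2, ess2, non_ess2 = find_essential_sketch(si_s)
print(sorted(rec2), sorted(non_ess2))
```

The first SI yields a.a and d as recursive and only x and y as non-essential. The second SI yields no essential element at all, so pol and rib end up in the non-essential set. This mirrors the context dependence discussed for str.SI.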
We can use R functions to define the different sets present in the SI 'dummy2.SI' and corroborate the results obtained by 'findEssential(dummy2.SI, give.sets = T)'; this is useful for more complex SIs. BOX 8 presents these calculations.
BOXes 9 and 10 present the analysis of 'dummy3.SI', an SI obtained from 'dummy2.SI' by adding elements and detailing the types of elements present.
1 'a' is an inactive enzyme
2 'a.a' is the active enzyme
3 'd' is a cofactor synthesized from 'e' by 'a.a'
4 'e' is a pre-cofactor synthesized from 'f' and 'g'
5 'x' is a metabolite produced from 'y' by 'a.a'
6 The m-RNA for 'b' is obtained from its gene
7 Peptide 'b' is synthesized
8 The m-RNA for 'c' is obtained from its gene
9 Peptide 'c' is synthesized
# Normalizes dummy3.SI
> normalize.SI(dummy3.SI)
Argument "SI" has more than 9 columns; extra columns will be deleted in the output!
###################################### END BOX 9 #####################################
From BOX 9 we can see that columns name, o1 and o2 in rows 1 to 4 are equal in both SIs (dummy2.SI and dummy3.SI). However, dummy3.SI includes all the columns needed for a normalized SI (see Table S1-6), plus an extra column named 'Comment'. We can also see that dummy3.SI includes 'biological details' absent from the purely abstract case of dummy2.SI. We now know that a.a is a holoenzyme, i.e., the active form of an enzyme that needs a cofactor (d), and that element a is a heterodimeric enzyme formed by two peptides, b and c. Moreover, a.a is needed to obtain its own cofactor, d, from a pre-cofactor, e. It also catalyzes a reaction that gives the product x from y (row 5 of dummy3.SI). Rows 6 to 9 define the synthesis of peptides b and c from their corresponding genes. The RNA polymerase (pol) participates in forming the mRNAs (t.b and t.c), and the ribosome (rib) in synthesizing the peptides from those mRNAs. Column 'Comment' can be useful for the reader, but it is not a formal part of the SI (see Table S1-6). It is therefore removed when the SI is processed by any of the InterPlay functions.
In BOX 10 we can see the results of the analysis performed by 'findEssential' in 'dummy3.SI'.
Going back to the sets obtained in BOX 11 from findEssential(dummy3.SI, give.sets = T), we can see that the only non-essential structures (in set NE) are x and y. x is an internal element (in set Si) whose synthesis is defined in row 5 of dummy3.SI by the binary operator <a.a, y>. y is an external element, in set Se.
Note that 'essentiality', as given in the output of the function 'findEssential', depends on the full content of the particular SI analyzed. For example, in the context of dummy3.SI, the (intrinsically) essential elements RNA polymerase (pol) and ribosome (rib) are classified as 'essential' (in set E). They are so classified only because they participate as operands in the synthesis of essential structures. However, it is easy to construct SIs in which intrinsically essential structures such as pol and rib play exactly the same roles, i.e., synthesis of mRNAs from genes and of polypeptides from mRNAs, respectively. In such SIs, pol and rib are not classified as essential, because they do not participate as operands in the synthesis of essential structures. This is illustrated in BOX 13.
BOX 14 shows the first steps to summarize the information present in the SI int.SI.
[1] 184
# Let's normalize this SI, including its sets:
> int.SI.n <- normalize.SI(int.SI, give.sets = TRUE)
# We know by nrow(int.SI) that there are 184 elements defined within int.SI
# But, what are the numbers of other structures (in G and Se)?
> names(int.SI.n$sets) # To remind us about the names of the sets
[1] "G"  "Si" "Se"
> length(int.SI.n$sets$G) # Number of genomic elements.
…
[1] 3
# Let's define the set of all elements named in int.SI
> S.int <- union(int.SI.n$sets$Si, union(int.SI.n$sets$Se, int.SI.n$sets$G))
> length(S.int)

From BOX 14 we see that int.SI is an SI with 184 rows that contains information on a total of 249 elements. Of these, 62 (≈ 25%) are genomic elements (in set G) and 184 (≈ 74%) are internal elements defined within the SI (in set S_i). Only 3 (≈ 1%) are external elements not defined within the SI (in set S_e).
We also see that int.SI contains information on the synthesis of three 'structures':
pol (RNA polymerase), determined by 12 rows of int.SI (≈ 7% of the total);
rib (ribosome), determined by 161 rows (≈ 88%); and
STR (streptomycin), determined by 11 rows (≈ 6%).
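The proportions quoted above can be checked with a few lines of Python. The script uses only the counts stated in the text. It also makes explicit that the row counts of pol, rib and STR add up to the 184 internal elements, since each row of an SI defines exactly one element of S_i.

```python
# Counts reported for int.SI in the text (BOX 14 and the paragraphs above).
set_sizes = {"G": 62, "Si": 184, "Se": 3}
rows_per_structure = {"pol": 12, "rib": 161, "STR": 11}

# G, S_i and S_e are disjoint, so their sizes add up to all named elements.
total_elements = sum(set_sizes.values())
# Each SI row defines exactly one element of S_i.
total_rows = sum(rows_per_structure.values())


def percent(part, whole):
    """Percentage rounded to the nearest integer.

    Python's round() sends exact halves to the even neighbor;
    161/184 is exactly 87.5%, which becomes 88.
    """
    return round(100 * part / whole)


print(total_elements, {k: percent(v, total_elements) for k, v in set_sizes.items()})
print(total_rows, {k: percent(v, total_rows) for k, v in rows_per_structure.items()})
```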

From the analysis of 'str.SI' performed in BOX 16, we can see that none of the elements named in this SI is classified as essential. First, none of the expanded formulae for the synthesis of elements in the interactome (set S_i, column 'name') is recursive. As a consequence, none of these elements can be classified as essential by rule ER1. For this reason, ER2 is not applied to any of the elements named in the SI, i.e., no operands are classified as essential.

The set of external elements of the SI includes "(28)", "(34)", "(40)", "pol" and "rib". The first three are metabolites (intermediates in streptomycin synthesis; see Table 5 in the main text for the full names of these compounds). "pol" and "rib" are intrinsically essential structures, yet in the context of the synthesis of streptomycin they are classified as 'non-essential'. This is not a contradiction because, as remarked before, essentiality depends on context. If none of the elements defined within the SI ('str.SI' in this case) is classified as essential, then no other element named in the SI (in G or S_e) will be classified as essential either.

Note that 'findEssential' declares the element in 'name' essential by ER1 when the value of 'name' is present in the corresponding 'Operands'. The 'Expanded' formula, however, is obtained within 'findEssential' under the guidance of the turtle algorithm. Because of this, the formula shown in 'Expanded' may not contain the term 'name' among its operands. This happens in highly complex SIs (such as 'rib.SI') when the element in 'name' has not yet appeared in the growing expanded formula at the time it is frozen. The expanded formulae output by 'findEssential' are also not unique: the expansion process can stop at different points. A formula is labeled as 'expanded' even when further substitutions are possible, for example with the C2E algorithm, which exhausts all possible substitutions. We have seen before that the expanded formulae obtained by 'findEssential' and 'C2E' can differ, and this fact is also illustrated in BOX 18.
One of the important points of comparing the analysis of the integrated interactome 'int.SI' with the partial analyses of 'pol.SI', 'str.SI' and 'rib.SI' is to see how the integration of information within the whole SI classifies the roles and essentiality of the structures "rib" and "pol", which, as mentioned before, are 'intrinsically essential'. In the analysis of the whole int.SI both structures are 'internal', because there is synthesis information for both of them, and 'essential', because the sets of operands of their corresponding formulae include the structures themselves, fulfilling ER1. In contrast, in the analysis of pol.SI "rib" is classified as an external and essential structure; in the analysis of rib.SI "pol" is classified as an external and essential structure; and in the analysis of str.SI both "pol" and "rib" are classified as external and non essential structures, because none of the structures named in str.SI is essential. Thus the completeness of the information about a structure partly determines the ability of the algorithms to classify it as 'essential' or 'non essential'. If enough information on the synthesis of a structure or component is present within the SI, the algorithms can categorically and correctly classify it into one of the classes; when no synthesis information about the structure is present, however, its classification can vary, depending on whether the element in question appears within the operands of an essential element (as "rib" in pol.SI or "pol" in rib.SI) or not (as "rib" and "pol" in str.SI).
From the analyses of both 'int.SI' and 'str.SI' we confirm that the secondary metabolite streptomycin (STR) is non essential, and so are all the primary elements that enter into its synthesis, except for the intrinsically essential structures "rib" and "pol", for which there is synthesis information in 'int.SI'. In fact, all elements in the set 'NE' from the analysis of int.SI (see the result of 'fe.int.SI$sets$NE' in BOX 19) are exclusively involved in the synthesis of streptomycin.
An interesting complementary result of the analysis of SIs is that the number of distinct operands that enter into the synthesis of a given structure gives a relative measure of the complexity of that structure. Thus, by comparing the number of operands used in the synthesis of a given element (or structure) with the total number of possible operands (the number of elements in the set S), we obtain a relative measure of the complexity of that element or structure. BOX 20 presents calculations of element complexity using the function 'mea.compl', which is included in the supplementary file 'S2 SupplFunct.txt'; this file must be loaded before running the code in BOX 20.
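As a rough illustration of this measure, the sketch below computes, for each internal element, the percentage of the elements of S that can appear as operands in its expanded formula. It is an assumption about how such a measure could be computed, not the code of 'mea.compl'; in particular, whether the element itself is counted (here only when its formula is recursive) may differ from the authors' definition. The toy SI and its 'name'/'op1'/'op2' layout are hypothetical.

```r
# Toy, acyclic SI (hypothetical encoding and data).
si <- data.frame(
  name = c("x2", "str"),
  op1  = c("x1", "x2"),
  op2  = c("pol", "rib"),
  stringsAsFactors = FALSE
)

# Distinct operands reachable from 'el' through repeated substitution.
upstream <- function(si, el) {
  seen <- character(0)
  frontier <- el
  while (length(frontier) > 0) {
    rows <- si[si$name %in% frontier, , drop = FALSE]
    ops <- unique(c(rows$op1, rows$op2))
    frontier <- setdiff(ops, seen)
    seen <- union(seen, ops)
  }
  seen
}

# Relative complexity: distinct operands of 'el' as a percentage of |S|.
rel_complexity <- function(si, el) {
  S <- unique(c(si$name, si$op1, si$op2))
  100 * length(upstream(si, el)) / length(S)
}

cx <- vapply(si$name, function(n) rel_complexity(si, n), numeric(1))
cx   # x2 = 40, str = 80
c(median = median(cx), mean = mean(cx))
```

Here S has five elements; "x2" draws on two of them and "str" on four, so "str" is the more 'complex' element in this toy.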
By obtaining the relative complexity of all elements defined in 'int.SI' (BOX 20), we find that it ranges between 10 and 93 percent, with a median of 54 and a mean of 42. By measuring the relative complexity of the elements that enter into the synthesis of each structure, we also confirm what was previously observed: the 7 elements that enter into the synthesis of pol are on average less 'complex' (28.88) than the 6 that enter into the synthesis of STR (40.96), and the average complexity of the 88 elements that enter into the synthesis of rib is the highest (43.55).

S1-7. Visualizing SI relations with graph theory
Mathematically, a graph is an ordered pair G = (V, E), where V is a set of vertices or nodes and E is a set of edges or lines, which are 2-element subsets of V (Bang-Jensen and Gutin, 2008); in a directed graph the edges are instead ordered pairs of vertices. In the interactome, cell elements are nodes (elements of the set V), while binding and synthesis define two different types of edges (the set E).
We have seen that algebraic manipulations can determine the essentiality of cell elements defined within an SI. However, visualizing the networks derived from SIs greatly helps to understand the relations that exist in SIs, as well as to corroborate the essentiality of SI elements. At least two relations can be visualized as a network from an SI: the 'binding' and the 'synthesis' relations between elements. In both cases the elements of the SI are 'nodes', while the relations are visualized as 'edges' between nodes. The edges are non-directed lines in the case of the 'binding' relation, and directed lines or 'arrows' in the case of the 'synthesis' relation.

S1-7.1. 'Binding' and 'synthesis' networks from the RNA polymerase SI
BOX 21 presents a first example with the networks generated by the RNA polymerase SI.
In BOX 21 we see how to use the function 'SI2d' to extract from an SI either the binding relations (shown in the example) or the synthesis relations (to be seen in BOX 22) that exist within it. 'SI2d' returns a list whose first component is a data.frame named 'd', with columns 'from' and 'to', indicating that an edge (an undirected line, in the binding case) must join the corresponding elements in each row. The second component of the list is another data.frame named 'nod.at' (node attributes), whose use will be seen later. The 'd' component of the list returned by 'SI2d' can be passed to the igraph function 'graph_from_data_frame' to obtain a graph, which can then be plotted (see the help page of that function, which will be used repeatedly in the following examples). For example, the first line printed when the command '> g.pol.bind' is given, 'IGRAPH a5feae0 UN-- 17 12 --', means that we have an undirected graph (the 'U'; the 'N' indicates that the vertices are named) with 17 nodes and 12 edges or lines.
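The kind of output described for 'SI2d' can be mimicked with a short base-R sketch. The function 'si_to_edges', the 'name'/'op1'/'op2' layout of the toy SI and the rule that gene names start with "g." are assumptions made for illustration; the real 'SI2d' may build its tables differently. It is also assumed here that each binary operator contributes one undirected binding edge between its two operands, or two directed synthesis edges from its operands to the product.

```r
# Toy SI (hypothetical encoding and data); '2a' is built from two 'a' units.
si <- data.frame(
  name = c("t.a", "a", "t.b", "b", "2a", "pol"),
  op1  = c("g.a", "t.a", "g.b", "t.b", "a", "2a"),
  op2  = c("pol", "rib", "pol", "rib", "a", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical analogue of SI2d: returns an edge list 'd' and node attributes.
si_to_edges <- function(si, which_one = c("synthesis", "binding")) {
  which_one <- match.arg(which_one)
  d <- if (which_one == "binding") {
    # one undirected edge joining the two operands of each binary operator
    data.frame(from = si$op1, to = si$op2, stringsAsFactors = FALSE)
  } else {
    # two arrows per operator, from each operand to the synthesized element
    data.frame(from = c(si$op1, si$op2), to = rep(si$name, 2),
               stringsAsFactors = FALSE)
  }
  nodes <- unique(c(si$name, si$op1, si$op2))
  set <- ifelse(nodes %in% si$name, "internal",
                ifelse(startsWith(nodes, "g."), "genomic", "external"))
  list(d = d, nod.at = data.frame(name = nodes, set = set, stringsAsFactors = FALSE))
}

bind <- si_to_edges(si, "binding")
nrow(bind$d)                        # 6: one edge per binary operator
bind$d[bind$d$from == bind$d$to, ]  # row 5: the self-loop a -- a
```

If igraph is installed, 'igraph::graph_from_data_frame(bind$d, directed = FALSE, vertices = bind$nod.at)' would turn these tables into a graph whose vertices carry the 'set' attribute.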
By plotting the graph with the command 'plot(g.pol.bind)' we obtain the result shown in Figure 1. Your result may differ in the positions of the nodes, because the layout has a random component, but the 17 elements of the SI and the 12 binding connections between them will be shown as in Figure 1. By obtaining and plotting the binding network of 'pol.SI' we show explicitly (Figure 1) the binding operators that exist in this SI. For example, the 'a' node has a line connecting it with itself, because the binary operator that determines the structure '2a' contains two 'a' elements; see 'pol.SI[2,]' in your R command window.
We will not give more examples of 'binding' graphs arising from SIs, but will concentrate on the more interesting 'synthesis' graphs latent in SIs. BOX 22 presents the synthesis network determined by 'pol.SI'.

######################################## BOX 22 ######################################
# Obtains the object containing data frames for the synthesis interactome in pol.SI.
> d.pol <- SI2d(pol.SI)
########################################

In the case of graphs obtained for the synthesis of elements, such as 'd.pol', the first component, a data frame with column names 'from' and 'to', determines a directed graph, meaning that instead of simple lines as edges we have directed edges shown as 'arrows'. In the context of an SI, the function 'SI2d' with the (default) option 'which.one="synthesis"' gives in the 'd' component the synthesis paths of the elements; in this context each row means that the element in column 'to' is obtained by combining the element in column 'from' with another operand. In BOX 23 we will see the plotting of the synthesis graph obtained in 'd.pol', and in BOX 24 we will see how to decorate this plot with particular attributes of the nodes, present in 'd.pol$nod.at', which contains columns 'name', 'set', 'type' and 'structure'.
Before proceeding, we can look at Figure 3 (presented as Fig. 2 in the main text) to recall that external elements (in set S_e; box (A) in Figure 3), like genomic elements (in set G; box (B) in Figure 3), can only have 'departing' or 'out' arrows, while elements whose synthesis is described in the SI, i.e., internal elements in set S_i (box (C) in Figure 3), have exactly 2 'incoming' or 'in' arrows, because synthesis in an SI is defined by binary operators; these internal elements can have any number of departing arrows. Figure 2 can be improved by denoting with different colors, for example, the set to which each node belongs; BOX 24 shows how to do this with the node attributes stored in the component 'd.pol$nod.at'.
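These degree constraints give a quick consistency check on a synthesis edge list. The sketch below assumes an edge list with columns 'from' and 'to', arrows pointing from operands to products, and a toy SI in a hypothetical 'name'/'op1'/'op2' layout; the helper 'degree_table' is illustrative and not one of the supplied functions.

```r
# Toy SI (hypothetical) and its synthesis edges: operand -> product.
si <- data.frame(
  name = c("t.a", "a", "t.b", "b", "2a", "pol"),
  op1  = c("g.a", "t.a", "g.b", "t.b", "a", "2a"),
  op2  = c("pol", "rib", "pol", "rib", "a", "b"),
  stringsAsFactors = FALSE
)
d <- data.frame(from = c(si$op1, si$op2), to = rep(si$name, 2),
                stringsAsFactors = FALSE)

# In- and out-degree of every node (zero counts kept via factor levels).
degree_table <- function(d) {
  nodes <- unique(c(d$from, d$to))
  data.frame(node = nodes,
             in_deg = as.vector(table(factor(d$to, levels = nodes))),
             out_deg = as.vector(table(factor(d$from, levels = nodes))),
             stringsAsFactors = FALSE)
}

deg <- degree_table(d)
internal <- deg$node %in% si$name
all(deg$in_deg[internal] == 2)   # TRUE: one binary operator per internal element
all(deg$in_deg[!internal] == 0)  # TRUE: external and genomic nodes only send arrows
```

Note that '2a' still receives exactly two arrows, both from 'a', because its binary operator uses 'a' twice.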
In BOX 24 we have seen how to color the synthesis network of the RNA polymerase (from the SI 'pol.SI') by the set of origin of each element (Figure 4) or, alternatively, by the 'type' of component of each element (Figure 5).
Figures 2, 4 and 5 show that all the elements of the internal set of 'pol.SI', i.e., the elements for which synthesis information exists in the interactome (bp-b, 2a, ba, pol, t.bp, t.b, t.a, t.o, bp, b, a and o), form part of one or more cycles or closed walks. This confirms that all these elements fulfill ER1: if an element lies on a cycle, its pre-existence is needed for its own synthesis. In these figures we also see that all genomic elements in the interactome, the genes g.a, g.b, g.bp and g.o, which code for the corresponding essential peptides a, b, bp and o, are also essential by fulfilling ER2, i.e., they are direct operands of essential structures.
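The correspondence between cycles and ER1 can be checked without plotting. The sketch below, written for a hypothetical toy SI and a 'from'/'to' edge list with arrows from operands to products, computes the transitive closure of the synthesis graph with Warshall's algorithm: a node lies on a cycle exactly when it can reach itself. Nodes outside the cycles that point directly into them are the operands picked up by ER2.

```r
# Toy SI (hypothetical) and its synthesis edges: operand -> product.
si <- data.frame(
  name = c("t.a", "a", "t.b", "b", "2a", "pol"),
  op1  = c("g.a", "t.a", "g.b", "t.b", "a", "2a"),
  op2  = c("pol", "rib", "pol", "rib", "a", "b"),
  stringsAsFactors = FALSE
)
d <- data.frame(from = c(si$op1, si$op2), to = rep(si$name, 2),
                stringsAsFactors = FALSE)

# Nodes lying on at least one directed cycle (closed walk).
on_cycle <- function(d) {
  nodes <- unique(c(d$from, d$to))
  n <- length(nodes)
  reach <- matrix(FALSE, n, n, dimnames = list(nodes, nodes))
  reach[cbind(match(d$from, nodes), match(d$to, nodes))] <- TRUE
  # Warshall: allow paths through intermediate node k, one k at a time.
  for (k in seq_len(n)) {
    reach <- reach | outer(reach[, k], reach[k, ], "&")
  }
  nodes[diag(reach)]
}

er1 <- on_cycle(d)                          # "t.a" "t.b" "a" "2a" "pol" "b"
er2 <- setdiff(d$from[d$to %in% er1], er1)  # "g.a" "g.b" "rib"
```

In this toy, as in Figures 2, 4 and 5, every internal element lies on a loop through "pol", and the genes together with the external "rib" are recovered as direct operands of those loops.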
Figures 3 to 5 in the main text were produced following the same steps delineated here: first obtaining the data frames of a synthesis network with our auxiliary function 'SI2d', then building from them an igraph object with the function 'graph_from_data_frame', which was then plotted with elements colored by the colors obtained with our auxiliary function 'map2rain'.
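A helper in the spirit of 'map2rain' can be sketched with base R's 'rainbow' palette; the name 'map_to_rainbow', its return value and the toy node table are hypothetical, since the behavior of 'map2rain' is not described here. It assigns one color per distinct category (for instance the 'set' or 'type' column of a node-attribute table) and returns the vector of node colors together with a legend.

```r
# Hypothetical node attributes, e.g. the 'set' column of a node table.
nod.at <- data.frame(
  name = c("g.a", "t.a", "a", "2a", "pol", "rib"),
  set  = c("genomic", "internal", "internal", "internal", "internal", "external"),
  stringsAsFactors = FALSE
)

# One rainbow color per category; returns node colors and a legend.
map_to_rainbow <- function(x) {
  f <- factor(x)
  pal <- grDevices::rainbow(nlevels(f))
  list(col = pal[as.integer(f)], legend = stats::setNames(pal, levels(f)))
}

cols <- map_to_rainbow(nod.at$set)
cols$legend   # external, genomic and internal each get their own color
```

With igraph, 'cols$col' could be passed as 'vertex.color' when plotting, provided the node order matches that of the graph's vertices.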
From Figures 6 and 7 we can see that none of the elements of the streptomycin synthesis network forms part of a cycle or closed walk, reflecting the fact that, within the context of 'str.SI', none of its elements can be classified as essential.

Figure 7. Streptomycin synthesis network with elements colored by type of element (for definitions of names see Table 5 in the main text).
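The absence of cycles seen in Figures 6 and 7 can likewise be verified programmatically. The sketch below applies Kahn's topological-sort algorithm to a synthesis edge list with columns 'from' and 'to': if every node can be removed in turn once all its incoming arrows are gone, the graph is acyclic and, under ER1, no element is essential. The streptomycin-like and polymerase-like edge lists are hypothetical toys, not data from 'str.SI' or 'pol.SI'.

```r
# Kahn's algorithm: TRUE if the directed graph d$from -> d$to has no cycle.
is_acyclic <- function(d) {
  nodes <- unique(c(d$from, d$to))
  indeg <- stats::setNames(integer(length(nodes)), nodes)
  for (tgt in d$to) indeg[tgt] <- indeg[tgt] + 1L
  queue <- nodes[indeg == 0]
  removed <- 0L
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    removed <- removed + 1L
    for (w in d$to[d$from == v]) {   # drop every arrow leaving v
      indeg[w] <- indeg[w] - 1L
      if (indeg[w] == 0) queue <- c(queue, w)
    }
  }
  removed == length(nodes)   # leftover nodes lie on a cycle or downstream of one
}

str_like <- data.frame(from = c("x1", "pol", "x2", "rib"),
                       to   = c("x2", "x2", "str", "str"),
                       stringsAsFactors = FALSE)
pol_like <- data.frame(from = c("g.a", "pol", "t.a", "rib", "a", "a"),
                       to   = c("t.a", "t.a", "a", "a", "pol", "pol"),
                       stringsAsFactors = FALSE)
is_acyclic(str_like)  # TRUE
is_acyclic(pol_like)  # FALSE
```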

S1-8. Conclusions and perspectives
We have seen how the functions contained in our 'InterPlay' package help to classify the elements of a synthesis interactome (SI) as 'essential' or 'non essential', always within the framework of the information contained in the SI. We have also seen how plotting the networks formed by the elements of an SI helps to understand the information it contains, reflecting the essentiality of elements as cycles of synthesis.
We expect that the gathering of interactome information from different databases can be automated, yielding a comprehensive SI for the minimal cell, as proposed in the main text.