Unsupervised tensor decomposition-based method to extract candidate transcription factors as histone modification bookmarks in post-mitotic transcriptional reactivation

The histone group added to a gene sequence must be removed during mitosis to halt transcription during the DNA replication stage of the cell cycle. However, the detailed mechanism of this transcription regulation remains unclear. In particular, it is not realistic to reconstruct all appropriate histone modifications throughout the genome from scratch after mitosis. Thus, it is reasonable to assume that there might be a type of “bookmark” that retains the positions of histone modifications, which can be readily restored after mitosis. We developed a novel computational approach comprising tensor decomposition (TD)-based unsupervised feature extraction (FE) to identify transcription factors (TFs) that bind to genes associated with reactivated histone modifications as candidate histone bookmarks. To the best of our knowledge, this is the first application of TD-based unsupervised FE to the cell division context and phases pertaining to the cell cycle in general. The candidate TFs identified with this approach were functionally related to cell division, suggesting the suitability of this method and the potential of the identified TFs as bookmarks for histone modification during mitosis.


Introduction
During the cell division process, gene transcription must be initially terminated and then reactivated once cell division is complete. However, the specific mechanism and factors controlling this process of transcription regulation remain unclear. Since it would be highly time-and energy-consuming to mark all genes that need to be transcribed from scratch after each cycle of cell division, it has been proposed that genes that need to be transcribed are "bookmarked" to easily recover these positions for reactivation [1][2][3][4]. Despite several proposals, the actual mechanism and nature of these "bookmarks " have not yet been identified. [5] suggested that condensed mitotic chromosomes can act as bookmarks, some histone modifications were suggested to serve as these bookmarks [6][7][8], and some transcription factors (TFs) have also been identified as potential bookmarks [9][10][11][12][13]. Recently, [14] suggested that histone 3 methylation or trimethylation at lysine 4 (H3K4me1 and H3K4me3, respectively) can act as a "bookmark" to identify genes to be transcribed, and that a limited number of TFs might act as bookmarks. However, there has been no comprehensive search of candidate "bookmark" TFs based on large-scale datasets.
We here propose a novel computational approach to search for TFs that might act as "bookmarks" during mitosis, which involves tensor decomposition (TD)-based unsupervised feature extraction (FE) (Fig 1). In brief, after fragmenting the whole genome into DNA regions of 25,000 nucleotide, the histone modifications within each region were summed. In this context, each DNA region is considered a tensor and various singular-value vectors associated with either the DNA region or experimental conditions (e.g., histone modification, cell line, and cell division phase) are derived. After investigating singular-value vectors attributed to various experimental conditions, the DNA regions with significant associations of singular-value vectors attributed to various experimental conditions were selected as potentially biologically relevant regions. The genes included in the selected DNA regions were then identified and uploaded to the enrichment server Enrichr to identify TFs that target the genes. To our knowledge, this is the first method utilizing a TD-based unsupervised FE approach in a fully unsupervised fashion to comprehensively search for possible candidate bookmark TFs.

Materials and methods
Sample R code is available in S1 Text.

Histone modification
The whole-genome histone modification profile was downloaded from the Gene Expression Omnibus (GEO) GSE141081 dataset. Sixty individual files (with extension .bw) were extracted from the raw GEO file. After excluding six CCCTC-binding factor (CTCF) chromatin immunoprecipitation-sequencing files and six 3rd replicates of histone modification files, a total of 48 histone modification profiles were retained for analysis. The DNA sequences of each chromosome were divided into 25,000-bp regions. Note that the last DNA region of each chromosome may be shorter since the total nucleotide length does not always divide into equal regions of 25,000. Histone modifications were then summed in each DNA region, which was used as the input value for the analysis. In total, N = 123,817 DNA regions were available for analysis. Thus, with approximately 120,000 regions of 25,000 bp each, we covered the approximate human genome length of 3 × 10 9 .

Tensor data representation
Histone modification profiles were formatted as a tensor, x ijkms 2 R N�2�4�3�2 , which corresponds to the kth histone modification (k = 1: acetylation, H3K27ac; k = 2: H3K4me1; k = 3: H3K4me3; and k = 4:Input) at the ith DNA region of the jth cell line (j = 1: RPE1 and j = 2: USO2) at the mth phase of the cell cycle(m = 1: interphase, m = 2: prometaphase, and m = 3: anaphase/telophase) of the sth replicate (s = 1,2). x ijkms was normalized as ∑ i x ijkms = 0 and P i x 2 ijkms ¼ N (Table 1). There are two biological replicates for each of the combinations of one of cell lines (either RPE1 or USO2), one of ChIP-seq (either acetylation or H3Kme1 or H3Kme4 or inout), and one of three cell cycle phases.

Tensor decomposition
Higher-order singular value decomposition (HOSVD) [15] was applied to x ijkms to obtain the decomposition

PLOS ONE
Tensor decomposition applied to histone modification bookmarks in post-mitotic transcriptional reactivation where G 2 R 2�4�3�2�N is the core tensor, and and u ' 5 i 2 R N�N are singular-value vector matrices, which are all orthogonal matrices. The reason for using the complete representation instead of the truncated representation of TD is that we employed HOSVD to compute TD. In HOSVD, the truncated representation is equal to that of the complete representation; i.e., u ℓ 1 j , u ℓ 2 k , u ℓ 3 m , and u ℓ 4 s are not altered between the truncated and the full representation. For more details, see [15].
Here is a summary on how to compute Eq (1) using the HOSVD algorithm, although it has been described in detail previously [15]. At first, x ijkms is unfolded to a matrix, x iðjkmsÞ 2 R N�48 . Then SVD is applied to get Then, only u ℓ 5 i is retained, and v ℓ 5 , jmks is discarded. Similar procedures are applied to x ijkms by replacing i with one of j, k,m, s in order to get u ℓ 1 j , u ℓ 2 k , u ℓ 3 m , u ℓ 4 s . Finally, G can be computed as

TD-based unsupervised FE
Although the method was fully described in a recently published book [15], we summarize the process of selecting genes starting from the TD.
• To identify which singular value vectors attributed to samples (e.g., cell lines, type of histone modification, cell cycle phase, and replicates) are associated with the desired properties (e.g., "not dependent upon replicates or cell lines," "represents re-activation," and "distinct between input and histone modifications"), the number of singular value vectors selected are not decided in advance, since there is no way to know how singular value vectors behave in advance, because of the unsupervised nature of TD.
• To identify which singular value vectors attributed to genomic regions are associated with the desired properties described above, core tensor, G, is investigated. We select singular value vectors attributed to genomic regions that share G with larger absolute values with the singular value vectors selected in the process mentioned earlier, because these singular value vectors attributed to genomic regions are likely associated with the desired properties.
• Using the selected singular value vectors attributed to genomic regions, those associated with the components of singular value vectors with larger absolute values are selected, because such genomic regions are likely associated with the desired properties. Usually, singular value vectors attributed to genomic regions are assumed to obey Gaussian distribution (null hypothesis), and P-values are attributed to individual genomic regions. P-values are corrected using multiple comparison correction, and the genomic regions associated with adjusted P-values less than the threshold value are selected.
• There are no definite ways to select singular value vectors. The evaluation can only be done using the selected genes. If the selected genes are not reasonable, alternative selection of singular value vectors should be attempted. When we cannot get any reasonable genes, we abort the procedure.
To select the DNA regions of interest (i.e., those associated with transcription reactivation), we first needed to specify the singular-value vectors that are attributed to the cell line, histone modification, phases of the cell cycle, and replicates with respect to the biological feature of interest, transcription reactivation. Consider selection of a specific index set ℓ 1 , ℓ 2 , ℓ 3 , ℓ 4 as one that is associated with biological features of interest, we then select ℓ 5 that is associated with G with larger absolute values, since singular-value vectors u ℓ 5 i with ℓ 5 represent the degree of association between individual DNA regions and reactivation. Using ℓ 5 , we attribute P-values to the ith DNA region assuming that u ℓ 5 i obeys a Gaussian distribution (null hypothesis) using the χ 2 distribution where P χ 2[> x] is the cumulative χ 2 distribution in which the argument is larger than x, and s ' 5 is the standard deviation. P-values are then corrected by the BH criterion [15], and the ith DNA region associated with adjusted P-values less than 0.01 were selected as those significantly associated with transcription reactivation. Algorithm displayed with mathematical formulas can be available in Fig 2.

Enrichment analysis
Gene symbols included in the selected DNA regions were retrieved using the biomaRt package [16] of R [17] based on the hg19 reference genome. The selected gene symbols were then uploaded to Enrichr [18] for functional annotation to identify their targeting TFs.

DESeq2
When DESeq2 [19] was applied to the present data set, six samples within each cell lines measured for three cell cycles and associated with two replicates were considered. Three cell cycles were regarded to be categorical classes associated with no rank order since we would like to detect not monotonic change between cell cycles but re-activation during them. All other parameters are defaults. Counts less than 1.0 were truncated so as to have integer values (e.g., 1400.53 was converted to 1400).

csaw
Since csaw [20] required bam files not available in GEO, we first mapped 60 fastq files to hg38 human genome using bowtie2 [21] where 60 fastq files in GEO ID GSE141081 were downloaded from SRA. Sam files generated by bowtie2 were converted and indexed by samtools [22] and sorted bam files were generated. Generated bam files that correspond to individual

PLOS ONE
combinations of cell lines and ChIP-seq were loaded into csaw in order to identify differential binding among three cell cycle phases.

Identification of overlapping regions between peak call
We retrieved 36 peak call data set (with extension peaks.txt.gz) that correspond to 48 Chip-Seq files with excluding 12 input files. Starting from these 48 peak call files, using findOver-lapsOfPeaks function included in ChIPpeakAnno package in R, we selected overlap regions step by step as follows.
• Identify overlap regions between two biological replicates; this results in 9 regions for U2OS cell lines and RPE1 cell lines, respectively, in total 18 peak calls.
• Identify overlap regions among three cell cycles; retrieve regions commonly expressed in three cell cycle phases for H3K4me1 and H3K4me3 whereas those expressed only in interphase and anaphase/telophase; this results in three regions, each of which was attributed to H3K4me1, H3K27ac, or H3K4me3, for U2OS cell lines and RPE1 cell lines, respectively, in total 6 peak calls.
This process was illustrated in Fig 3.

Results and discussion
We first attempted to identify which singular-value vector is most strongly attributed to transcription reactivation among the vectors for cell line (u ℓ 1 j ), histone modification (u ℓ 2 k ), cell cycle phase (u ℓ 3 m ), and replicate (u ℓ 4 s ) (Fig 4). First, we considered phase dependency. Fig 5 shows the singular-value vectors u ℓ 3 m attributed to cell cycle phases. In the case that there are a set of genes that share some dependence, singular value vectors reflect their mean behaviour. Specifically, singular value vectors act as some kind of pseudo representative genes. Thus, by investigating singular value vectors, we can find what kind of cell cycle dependence can appear in the group of genes. Since the reactivation means that being expressive in inter and ana/telophases whereas not expressive in prometapahse, singular value vectors supposed to be related to be reactivation take opposite signs between inter/ana/telophased and prometaphase. Thus, u 3m are most likely associated with reactivation. Although u 2m and u 3m were associated with reactivation, we further considered only u 3m since it showed a more pronounced reactivation profile. Next, we investigated singular-value vectors u ℓ 2 m attributed to histone modification (Fig 6). There was no clearly interpretable dependence on histone modification other than for u 1k , which represents the lack of histone modification, since the values for H3K27ac, H3K4me1, and H3K4me3 were equivalent to the Input value that corresponds to the control condition; thus, u 2k , u 3k , and u 4k were considered to have equal contributions for subsequent analyses. By contrast, since u 1j and u 1s showed no dependence on cell line and replicates, respectively, we selected these vectors for further downstream analyses (Fig 7). Finally, we evaluated which vector u ℓ 5 i had a larger P 4 ' 2 ¼2 jGð1; ' 2 ; 3; 1; ' 5 Þj a ; a ¼ 1; 2; 3 ( Fig 8); in this case, we calculated the squared sum for 2 � ℓ 2 � 4 to consider them equally.

PLOS ONE
Although we do not have any definite criterion to decide α uniquely, since ℓ 5 = 4 always takes largest values for α � 1, ℓ 5 = 4 was further employed. The P-values attributed to the ith DNA regions were calculated using Eq (4), resulting in selection of 507 DNA regions associated with adjusted P-values less than 0.01. We next checked whether histone modification in the selected DNA regions was associated with the following transcription reactivation properties: 1. H3K27ac should have larger values in interphase and anaphase/telophase than in prometaphase, as the definition of reactivation. 3. H3K4me1 and H3K4me3 should have larger values than the Input; otherwise, they cannot be regarded to act as "bookmarks" since these histones must be significantly modified throughout these phases.
To check whether the above criteria are fulfilled, we applied six t tests to histone modifications in the 507 selected DNA regions ( Table 2). The results clearly showed that histone modifications in the 507 selected DNA regions satisfied the requirements for transcription

PLOS ONE
Tensor decomposition applied to histone modification bookmarks in post-mitotic transcriptional reactivation reactivation; thus, our strategy could successfully select DNA regions that demonstrate reactivation/bookmark functions of histone modification.
After confirming that selected DNA regions are associated with targeted reactivation/bookmark features, we queried all gene symbols contained within these 507 regions to the Enrichr server to identify TFs that significantly target these genes. These TFs were considered candidate bookmarks that remain bound to these DNA regions throughout the cell cycle and trigger reactivation in anaphase/telophase (i.e., after cell division is complete). Table 3 lists the TFs  Among the many TFs that emerged to be significantly likely to target genes included in the 507 DNA regions selected by TD-based unsupervised FE, we here focus on the biological functions of TFs that were also detected in the original study suggesting that TFs might function as histone modification bookmarks for transcription reactivation [14]. RUNX was identified as an essential TF for osteogenic cell fate, and has been associated with mitotic chromosomes in multiple cell lines, including Saos-2 osteosarcoma cells and HeLa cells (Young et al. 2007). Table 4 shows the detection of RUNX family TFs in seven TF-related categories of Enrichr; three RUNX TFs were detected in at least one of the seven TF-related categories. In addition, TEADs (Kegelman et al. 2018), JUNs [23], FOXOs [24], and FosLs citepKang01072020 were reported to regulate osteoblast differentiation. Tables 5-8 show that two TEAD TFs, three JUN TFs, four FOXO TFs, and two FOSL TFs were detected in at least one of the seven TF-related categories in Enrichr, respectively.
Other than these five TF families reported in the original study [14], the TFs detected most frequently within seven TF-related categories in Enrichr were as follows (Table 9): GATA2 Table 2. Hypotheses for t tests applied to histone modification in the selected 507 DNA regions. The null hypothesis was that the inequality relationship of the alternative hypothesis is replaced with an equality relationship. int: interphase, ana: anaphase, tel: telophase, pro: prometaphase.

Table 4. Identification of RUNX transcription factor (TF) family members within seven TF-related categories in Enrichr.
Roman numerals correspond to the first column in Table 3.

PLOS ONE
Tensor decomposition applied to histone modification bookmarks in post-mitotic transcriptional reactivation Table 5

. Identification of TEAD transcription factor (TF) family members within seven TF-related categories in Enrichr.
Roman numerals correspond to the first column in Table 3.

Table 7. Identification of FOXO transcription factor (TF) family members within seven TF-related categories in Enrichr.
Roman numerals correspond to the first column in Table 3.

Top 10 most frequently listed transcription factor (TF) families (at least four, considered the majority) within seven TF-related categories in Enrichr.
Roman numerals correspond to the first column in Table 3.

TF (I) (II) (III) (IV) (V) (VI) (VII)
https://doi.org/10.1371/journal.pone.0251032.t009 [25], ESR1 [26], TCF21 [27], TP53 [28], WT1 [29], NFE2L2 (also known as NRF2 [30]), GATA1 [10], and GATA3 [31]. All of these TFs have been reported to be related to mitosis directly or indirectly, in addition to JUN and JUND, which are listed in Table 6. This further suggests the suitability of our search strategy to identify transcription reactivation bookmarks. One might wonder why we did not compare our methods with the other methods. As can be seen in Table 1, there are only two samples each in as many as 24 categories. Therefore, it is difficult to apply standard statistical tests for pairwise comparisons between two groups including only two samples. In addition, the number of features, N, which is the number of genomic regions in this study, is as many as 1,23,817, which drastically reduces the significance of each test if we consider multiple comparison criteria that increase P-values that reject the null hypothesis. Finally, only a limited number of pairwise comparisons are meaningful; for example, we are not willing to compare the amount of H3K4me1 in the RPE1 cell line at interphase with that of H3K27ac in the U2OS cell line at prometaphase. Therefore, usual procedures that deal with pairwise comparisons comprehensively, such as Tukey's test, cannot be applied to the present data set as it is. In conclusion, we could not find any suitable method applicable to the present data set that has a small number of samples within each of as many as 24 categories, whereas the number of features is as many as 1,23,817.
In order to demonstrate inferiority of other method compared with our method, we applied DESeq2 [19] to the present data set, although DESeq2 was designed to not ChIP-seq but RNAseq. The outcome is disappointing as expected (Table 10) if it is compared with Table 2. First of all, there are no coincidences between two cell lines. Although there are as many as 4227 regions within which H3K4me1 is distinct among three cell cycle phases when RPE1 is considered, there were no regions associated with distinct H3K4me1 when U2OS was considered. In addition to this, although only H3K27ac among three histone modifications measured is expected to be distinct during three cell cycle phases, other histone modifications are sometimes detected as distinct during three cell cycle phases. Finally, the number of genomic regions considered in each comparison varies, since DESeq2 automatically discarded regions associated with low variance among distinct classes. The reason why there are no regions associated with distinct histone modification for Input and H3K4me1 when RPE1 was considered is definitely because almost all genomic regions were considered for these two comparisons; too many comparisons increase the P-values because of multiple comparison corrections. On the other hand, our proposed TD based unsupervised FE can deal with all of the genomic regions, which resulted in more stable outcomes. Thus, it is obvious that DESeq2 was inferior to TD based unsupervised FE when it is applied to the present data set.
One might still wonder if it is because of usage of DESeq2 not designed specific to ChIP-seq data. In order to confirm this point, we sought integrated approaches designed specific to treatment of ChIP-seq data. In addition, we need some approaches that enable us not only pairwise comparison but also comparisons among more than two categories, since we have to compare among three cell cycle phases, i.e., terphase, prometaphase, and anaphase/telophase. There are not so many approaches satisfying these conditions [32][33][34]. For example, although DBChIP [35] was designed to treat ChIP-seq data set, since it was designed to be specific to TF binding, it required to input single nucleotide positions where binding proteins bind, Thus, it is not applicable to histone modification measurements where not binding points but binding regions are provided. On the other hand, although DiffBind [36] was designed to deal with histone modification, it can accept only pairwise comparisions. SCIFER [37] can identify enrichment within single measurement compared with input experiment, MACS2 which is modified version of MACS [38], can also accept only pairwise comaprisons, ODIN [39] also can accept only pairwise comparisons, RSEG [40] also can accept only pairwise comparisons, MAnorm [41] also can accept only pairwise comparisons, HOMER [42] also can accept only pairwise comparisons, QChIPat [43] also can accept only pairwise comparisons, diffReps [44] also can accept only pairwise comparisons, MMDiff [45] also can accept only pairwise comparisons, PePr [46] does not perform even pairwise comparison. ChIPComp [47] was tested toward only pairwise comparisons when it was applied to real data set. Although MultiGPS [48] can deal with multiple files, they must be composed of condition A and its corresponding input vs condition B and its corresponding input, it cannot be applied to the present case composed of three cell cycle phases and their corresponding inputs. Thus as far as we investigated there are no approaches designed to be applicable to three independent conditions, each of which is composed of a pair of treated and input experiments. This difficulty is because of two kinds of distinct differential binding analyses required (Fig  9), one of which is the comparison between treated and input experiments and another of which is the comparison between two experimental conditions (e.g., patients versus healthy control, two different tissues) whereas they are easily performed in tensor representation as shown in the above. Nevertheless, in order to emphasize the inferiority of ChIP-seq specific pipeline aiming differential binding analysis toward TD based unsupervised FE, we considered csaw [20] as a representative since it accepts, at least, not pairwise but comparisons among multiple conditions as performed by DESeq2 (Table 10). Table 11 shows the results. It is very disappointing as expected. For example, although H3K27ac is expected to support reactivation, differential binding region among distinct cell cycle phases in U2OS cell line is almost none (only 0.1% of whole tested regions). Although H3K4me3 should not distinctly bind to chromosome among thee cell cycles since it is expected to play a role of bookmark, it distinctly binds Fig 9. Schematics that illustrates the difficulty of differential binding analysis. In contrast to differential expression analysis that requires only inter conditions comparisons (displayed by broken bidirectional arrows), differential binding analysis requires additional intra conditions comparisons between treated and input experiment (displayed by bidirectional solid arrows). There are no pipelines that aim to identify differential binding considering simultaneously more than two conditions. https://doi.org/10.1371/journal.pone.0251032.g009 to chromosomes among three cell cycle phases for two cell lines. These behaviours are very contrast to those in Table 2 which exhibits the expected differential/undifferential binding to chromosome. Thus, in conclusion, even if we employ pipelines specifically designed to ChIP-Seq data analyses, they cannot outperform the results obtained by TD based unsupervised FE.
Since one might wonder why we have specifically used region length of 25,000 nucleic acid length, we discuss about it as follows.
• We have successfully used the region length [49,50]. When started to employ this procedure, we tried multiple values and identified that it is most successful.
• Optimizing region length from studies to studies is not a good way to identify something biological. Region length should not be optimization parameters. If the optimal region length vary from studies to studies, we need to rationalize it. Nevertheless, the fact that employment region of 25,000 nucleic acid length was successful in three independent studies (including the present one) definitely suggest that this choice is reasonable.
• We expected that each region is coincident with one gene in average. Since the number of selected regions, 507, is almost equivalent to the number of gene symbols in these 507 regions, 525 (see S1 Table), employment of region of 25,000 nucleic acid length seems to be reasonable.
• Since average gene length on human genome is * 3 × 10 4 , the selection of region of 25,000 nucleic acid length is supposed to be association of one gene in each region. As denoted in the above, this expectation was fulfilled.
Although the above discussion might be enough to rationalize the usage of region of 25,000 nucleic acid length, we tried an alternative strategy as described in Materials and Methods. We downloaded peak call data set from GEO and tried to identify overlaps between peak regions. As a result, we could find only 22 regions of mean length of 5000 nucleic acid, with which only 13 gene symbols were associated. This tells us two things. Smaller region length, 5,000, results in regions without gene symbols. Shorter region length reduces the number of commonly identified regions between multiple experiments. This prevents us from performing downstream analysis. This failure of an alternative approach definitely suggests the suitability of the selection of region of 25,000 nucleic acid length.
One might also wonder if TFs can also work in cell line specific ways; thus there might be no reasons to select TFs common between two cell lines. It is really true that TFs can work in cell line specific ways; nevertheless, what we are interested in is a more robust bookmark that can likely work in mitoic process universally. If we selected TFs that work in cell line specific manner, it reduces the possibility that selected TFs work universally in mitoic process. The reason why we validated the selected genes based upon Enrichr that might include the results for other cell lines than U2OS and RPE1 is similar; if the selected genes are coincident with data

PLOS ONE
Tensor decomposition applied to histone modification bookmarks in post-mitotic transcriptional reactivation bases retrieved from other cell lines, results are more unlikely accidental and are more likely robust and universal. In this study, reliability of selected genes was evaluated by enrichment analysis. Since we have selected very small amount of genes, as small as c.a. 500, it is very unlikely for them to be associated with numerous enrichment. In spite of that, since our selected genes are associated with so many TF activities, we can assume that our selection of genes are reasonable. In the case that we cannot find any enrichment, we regard that our selection of singular value vectors is failure and we try to check if other selections can work better or not. It is worth noting that because other methods are not designed to deal with the studied problem, applying these methods generate inferior outcomes.
We show that selected TFs are expressive in cell lines as follows: First of all, we evaluated TFs by not only binding to genome but also co-occurrence with selected genes (e.g. (III) and (V) in Table 3). Thus, it is very likely that some TFs are expressive in cell lines where the selected genes are expressive. Second, we seek GEO Profiles in order to see if these TFs are expressive in U2OS cell lines and RPE cells. Then, we have found that almost all TFs were expressive in both U2OS cell lines and RPE cells in GEO profiles (see S3 Table). Thus, it is not unreasonable to expect the expression of these TFs in two cell lines used in this study.

Conclusions
We applied a novel TD-based unsupervised FE method to various histone modifications across the whole human genome, and the levels of these modifications were measured during mitotic cell division to identify genes that are significantly associated with histone modifications. Potential bookmark TFs were identified by searching for TFs that target the selected genes. The TFs identified were functionally related to the cell division cycle, suggesting their potential as bookmark TFs that warrant further exploration.
Supporting information S1