TMEM106B, a risk factor for FTLD and aging, has an intrinsically disordered cytoplasmic domain

TMEM106B was initially identified as a risk factor for FTLD, but recent studies highlighted its general role in neurodegenerative diseases. Very recently TMEM106B has also been characterized to regulate aging phenotypes. TMEM106B is a 274-residue lysosomal protein whose cytoplasmic domain functions in the endosomal/autophagy pathway by dynamically and transiently interacting with diverse categories of proteins but the underlying structural basis remains completely unknown. Here we conducted bioinformatics analysis and biophysical characterization by CD and NMR spectroscopy, and obtained results reveal that the TMEM106B cytoplasmic domain is intrinsically disordered with no well-defined three-dimensional structure. Nevertheless, detailed analysis of various multi-dimensional NMR spectra allowed defining residue-specific conformations and dynamics. Overall, the TMEM106B cytoplasmic domain is lacking of any tight tertiary packing and relatively flexible. However, several segments are populated with dynamic/nascent secondary structures and have relatively restricted backbone motions on ps-ns time scale, as indicated by their positive {1H}-15N steady-state NOE. Our study thus decodes that being intrinsically disordered may allow the TMEM106B cytoplasmic domain to dynamically and transiently interact with a variety of distinct partners.


Introduction
Transmembrane protein 106B (TMEM106B) is 274-residue protein whose gene locus was initially identified by a genome-wide association study to be a risk factor for the occurrence of frontotemporal lobar degeneration (FTLD), which is the second most common form of progressive dementia in people under 65 years [1,2]. In 2012 it was reported that TMEM106B SNPs might be also critical for the pathological presentation of Alzheimer disease [3]. Remarkably, the risk allele of TMEM106B was found to accompany significantly reduced volume of the superior temporal gyrus, most markedly in the left hemisphere [4], which includes structures critical for language processing, usually affected in FTLD patients [5]. TMEM106B was also reported to be a genetic modifier of disease in the carriers of C9ORF72 expansions [6]. PLOS  Very recently, differential aging analysis further identified that variants of TMEM106B are responsible for aging phenotypes. Briefly, TMEM106B risk variants were shown to be associated with inflammation, neuronal loss, and cognitive deficits for older individuals (> 65 years) even without any known brain disease, and their impact is particularly selective for the frontal cerebral cortex [7]. TMEM106B is a member of the TMEM106 family of proteins of 200-300 amino acids with largely unknown function, which consists of TMEM106A, TMEM106B, and TMEM106C. Immunohistochemical assessments showed that TMEM106B is expressed in neurons as well as in glial and endothelial cells [8]. The expression levels of TMEM106B appear to be critical for lysosome size, acidification, function, and transport in various cell types; and its upregulation leads to lysosomal defects such as improper lysosomal formation, poor lysosomal acidification, reduced lysosomal transport, and increased lysosomal stress [9][10][11][12][13][14]. These studies revealing the functions of neuronal TMEM106B in regulating lysosomal size, motility and responsiveness to stress, therefore underscore the key role of lysosomal biology in FTLD and aging.
Recently, biochemical characterizations revealed that TMEM106B is a single-pass, type 2 integral membrane glycoprotein predominantly located in the membranes of endosomes and lysosomes with a transmembrane domain predicted to be over residues 96-118 ( Fig 1A). However, the molecular mechanisms underlying the physiological functions of TMEM106B remain largely elusive. So far, no protein binding partner has been identified to interact with the luminal C-terminus of TMEM106B. On the other hand, the N-terminal cytoplasmic domain of TMEM106B has been amazingly implicated in dynamically and transiently interacting with very diverse proteins or complexes, which include endocytic adaptor proteins such as clathrin heavy chain (CLTC) [12]; proteins including CHMP2B of the endosomal sorting complexes required for transport III (ESCRT-III) complex [13); and microtubule-associated protein 6 (Map6) [14]. In this regard, characterization of its high-resolution structure is expected to provide a key foundation for addressing how the TMEM106B cytoplasmic domain can interact with significantly distinctive partners.
So far no biophysical/structural study has been reported on the cytoplasmic domain of TMEM106B in the free state, or in complex with binding partners. In the present study, we aimed to determine its solution conformation and dynamics by CD and NMR spectroscopy. Our results reveal that despite having relatively random sequence, the TMEM106B cytoplasmic domain is intrinsically disordered without any stable secondary and tertiary structures. Nevertheless, residue-specific NMR probes allowed the identification of several regions of the TMEM106B cytoplasmic domain which are weakly populated with either helical or extended conformations to different extents. Therefore, our present study deciphers that being intrinsically disordered appears to offer a unique capacity for the TMEM106B cytoplasmic domain to interact with diverse binding partners dynamically and transiently, which appears to be required for implementing its physiological functions.

Bioinformatics analysis, cloning, expression and purification of the cytoplasmic domain of TMEM106B
The sequence of the TMEM106B cytoplasmic domain over residues 1-93 was extensively analyzed by various bioinformatics programs to assess its hydrophobicity [15], Disorder score by IUPred [16], and secondary structures by GOR4 [17].
The DNA encoding residues 1-93 of TMEM106B was purchased from GeneScript and subsequently cloned into a modified vector pET28a with a C-terminal His-tag [18,19]. The recombinant protein was found in supernatant and thus first purified by Ni2+-affinity column under native condition. His-tag was removed by in-gel cleavage with thrombin, and the released TMEM106B protein was further purified with FPLC on a Superdex-200 column. To produce isotope-labeled proteins for NMR studies, the same procedures were used except that the bacteria were grown in M9 medium with addition of (15NH4)2SO4 for 15N-labeling; and (15NH4)2SO4/13C-glucose for 15N-/13C-double labeling [18,19].

CD and NMR experiments
All CD experiments were performed on a Jasco J-1500 spectropolarimeter equipped with a thermal controller [18,19]. Far-UV CD spectrum was collected on a protein sample at 10 μM in 5 mM phosphate buffer (pH 6.5) using 1-mm path length cuvettes, while near-UV CD spectra in the absence and presence of 8 M urea were acquired on protein samples at 100 μM in the same buffer with 10-mm path length cuvettes. Data from five independent scans were added and averaged. Deconvolution of the far-UV CD spectrum to estimate the contents of secondary structures was conducted by K2D2 program [20].
All NMR experiments were acquired on an 800 MHz Bruker Avance spectrometer equipped with pulse field gradient units as described previously [18,19,21]. NMR experiments including HNCACB and CCC(CO)NH were acquired on 15N-/13C-double labelled samples for achieving sequential assignments, while 15N-edited HSQC-TOCSY and HSQC-NOESY (with a mixing time of 300 ms) were collected on 15N-labelled samples to identify NOE connectivities at protein concentrations of 200 μM in 5 mM phosphate buffer (pH 6.5) at 25˚C. For assessing the backbone dynamics on the ps-ns time scale, {1H}-15N steady-state NOEs were obtained by recording spectra on 15N-labeled samples at 200 μM in 5 mM phosphate buffer (pH 6.5) at 25˚C as previously described [22]. NMR data were processed with NMRPipe [23] and analyzed with NMRView [24]. Obtained chemical shifts of HN, N, CA, CB and HA atoms were further analysed by SSP program [25] to obtain residue-specific score of secondary structure propensity (SSP). The chemical shifts were deposited into BMRB database with the entry code 27565.

Bioinformatics analysis of the TMEM106B cytoplasmic domain
In the present study, to facilitate experimental investigations, the cytoplasmic domain of TMEM106B was first assessed by a variety of bioinformatics tools to gain insights into its amino acid composition, ordered/disordered regions and secondary structure. As shown in Fig 1B, it has a relatively random composition of amino acids similar to the TDP-43 Nterminal domain adopting an ubiquitin-like fold [18], which is not highly biased to a certain type of amino acids as observed in the TDP-43 prion-like domain [19]. On the other hand, except for residues Tyr50 and Leu78-Tyr84 with small positive hydrophobicity scores, all other residues have negative hydrophobicity scores (Fig 1C), suggesting that it is overall hydrophilic. Furthermore, the disorder scores of most residues predicted by IUPred are > 0.5 (Fig 1D), implying that it might be intrinsically disordered as we previously observed on Nogo domains [16,26]. Indeed, based on the secondary structure prediction result (Fig 1E), a large portion of the sequence is in random coil conformation. However, two very short fragments, namely Thr22-Arg27 and Gln74-Leu78, are predicted to form α-helices, while five short fragments, namely Leu30-Val31, Tyr50-Thr54, Ser58-Thr60, Val79-Leu81 and Leu89-Pro91, adopt extended conformations.

Conformational properties of the TMEM106B cytoplasmic domain
The recombinant protein of the TMEM106B cytoplasmic domain is highly soluble in buffer at neutral pH. Its far-UV CD spectrum has the negative signal maximum at~208 nm and a shoulder negative signal at~222 nm, as well as a positive signal at~190 nm (Fig 2A), which indicates that it contains helical conformations to some degree. We thus conducted further deconvolution of the CD spectrum by K2D2 program [20], which estimated the TMEM106B cytoplasmic domain to contain 35.7% α-helix, 13.2% β-stands and 51.1% random coil. We further conducted thermal unfolding experiments and the result showed that upon increasing temperatures, the protein appeared to become more disordered (Fig 2B). Unfortunately, at and above 50˚C, the protein became severely aggregated and precipitated, and consequently the CD spectra are not reliable due to very large noise.
While far-UV CD spectroscopy reflects secondary structure features of a protein, near-UV CD is very sensitive to the tight tertiary packing [26]. Therefore, we subsequently recorded the near-UV CD spectra of the TMEM106B cytoplasmic domain in the absence and presence of 8 M denaturant urea (Fig 2C). The high similarity between two near-UV spectra clearly suggested that it is lacking of any tight tertiary packing even under the native condition [26]. Completely consistent with this results, its two-dimensional 1H-15N NMR HSQC spectrum has very narrow spectral dispersions over both dimensions, with only~0.9 ppm over 1H and 19 ppm over 15N dimensions (Fig 2D). Therefore, preliminary CD and NMR characterizations revealed that the TMEM106B cytoplasmic domain is an intrinsically disordered domain (IDD), which is lacking of any tight tertiary packing but populated with secondary structures to different extents.

Residue-specific conformation and dynamics of the TMEM106B cytoplasmic domain
NMR spectroscopy is the most powerful biophysical tool to pinpoint residual structures existing in the unfolded proteins [18,19,26,27]. Therefore, in order to gain insights into residuespecific conformational of the TMEM106B cytoplasmic domain, we acquired and analyzed three-dimensional NMR spectra including CCC(CO)NH, HN(CO)CACB, HSQC-TOCSY and HSQC-NOESY spectra. Triple resonance NMR spectra CCC(CO)NH and HN(CO)CACB allow the assignment by providing the cross-bond sequential connectivities, which was further confirmed by the cross-space sequential NOE connectivities in HSQC-TOCSY and HSQC-NOESY spectra. Although several HSQC peaks are relatively weak and some are significantly overlapped as shown in Fig 2D, we managed to successfully achieved sequential assignments of all residues except for the first residue and six Pro residues without HSQC peaks, whose side chain resonances, however, were also assigned. Fig 3A presents (ΔCα-ΔCβ) chemical shift, which is a sensitive indicator of the residual secondary structures in disordered proteins [18,19,[27][28][29][30]. The very small absolute values of (ΔCα-ΔCβ) over the whole sequence indicate that the TMEM106B cytoplasmic domain is lacking of well-formed secondary structures. Nevertheless, several regions show relatively large deviations, thus suggesting that these regions may be populated with dynamic secondary structures to some degree (Fig 3A).
To obtain quantitative insights into the populations of different secondary structures, we further analyzed chemical shifts of the TMEM106B cytoplasmic domain by SSP program [25], and the results are shown in Fig 3B. Indeed, all residues have the absolute values of SSP scores less than 0.3, confirming that the whole domain has no well-formed secondary structure. Nevertheless, several regions including Gly20-Gly29, Glu38-Gly40, Val59-Gly66, Asn76-Gln77 and Ser85-Leu89 have residue with SSP scores larger than 0.1, thus implying that they are populated with dynamic helical conformations. On the other hand, several very short segments including Gly3, Arg69-Arg72 and Pro91-Arg93 also have SSP scores < -0.1, implying that these regions might have intrinsic capacity to adopt extended conformations.
We also assessed the backbone rigidity of the TMEM106B cytoplasmic domain by collecting heteronuclear NOEs, which provides a measure to the backbone flexibility on the ps-ns timescale [18,19,22,[27][28][29][30]. As shown in Fig 3C, the backbone appears to be overall flexible on ps-ns time scale as judged from the relatively small or even negative heteronuclear NOEs (hNOEs), with an average value of only 0.15 (Fig 3C), which is much smaller than that for a well-folded protein [18]. Nevertheless, this average value is much larger than those for the unfolded states of ALS-causing SOD1 with an average value of -0.1 [28] and C71G-PFN1 with an average value of -0.15 [29] also collected at 800 MHz. Very interesting, its hNOE values are very similar to those of the inactive Dengue NS3 protease domain without its co-factor NS2B [30]. Strikingly, although no long-range NOE was identified in HSQC-NOESY spectrum, the Dengue NS3 protease domain did own long-range interactions as revealed by paramagnetic relaxation enhancement (PRE) [30]. Therefore, it appears that the backbone motions of the TMEM106B cytoplasmic domain on ps-ns time scale are partially restricted most likely due to the formation of dynamically-populated secondary structures, and dynamic tertiary packing as observed on the Dengue NS3 protease domain.
Indeed, further analysis of HSQC-NOESY spectrum allowed the identification of many sequential and medium-range NOE connectivities (NOEs) characteristic of helical conformations, which include dNN(i, i+1), dαN(i, i+2), dNN(i, i+2), dαN(i, i+3) and even dαN(i, i+4) NOEs (Fig 4). Briefly several fragments appear to be populated with helical conformations. In particular, the fragment Ser12-Asn25 is populated with relatively high helical conformations as judged from the existence of many characteristic NOEs, which include dNN (i, i+2), dαN(i, i+3) and dαN(i, i+4) NOEs. Interestingly, however, the helical regions defined by NOE patterns show some inconsistency with those predicted by SSP. One possibility is that these helical conformations are dynamic ensembles and consequently the observed results might be different because NOE and chemical shifts are averaged with distinct mechanisms.

Discussion
Although TMEM106B variants were initially identified as risk factors for FTLD [1], recent studies suggest that it may have more general involvements in neurodegenerative diseases including Alzheimer disease [3], and other in carriers of C9ORF72 expansions such as amyotrophic lateral sclerosis (ALS) [6]. Intriguingly, TMEM106B variants have been very recently identified to regulate aging phenotypes, thus implying that the normal aging and Solution conformation of TMEM106B cytoplasmic domain neurodegenerative disorders might have some convergent mechanisms/pathways [7]. Indeed, TMEM106B has been characterized to be a lysosomal protein whose variants are associated with defects of the endolysosomal pathway, supporting the emerging view that lysosomal biology plays key roles in neurodegenerative diseases and aging [1][2][3][4][5][6][7][8][9][10][11][12][13][14].
The TMEM106B cytoplasmic domain functions in the endosomal/autophagy pathway by interacting with various protein partners. Briefly, the TMEM106B cytoplasmic domain has been functionally mapped out to interact with three diverse categories of proteins: endocytic adaptor proteins [12]; proteins of the endosomal sorting complexes [13]; and microtubule-associated protein 6 (Map6) [14]. One emerging feature characteristic of these interactions is their dynamic and transient nature. In fact, the risk factor T185 was shown to be associated with CHMP2B more tightly than S185, thus slightly reducing autophagic flux [13]. Strikingly, the TMEM106B cytoplasmic domain appears to bind the intrinsically disordered region of MAP6 [14].
Elucidation of the three-dimensional structure of the TMEM106B cytoplasmic domain will not only provide mechanistic insights how the small TMEM106B cytoplasmic domain can interact with such diverse partners, but may also offer the clues for future discovery/design of molecules to interfere in the interactions for therapeutic applications. In the present study, we first assessed the structural properties of the TMEM106B cytoplasmic domain by both bioinformatics analysis and biophysical characterization including CD and NMR HSQC spectroscopy. The results together indicated that the TMEM106B cytoplasmic domain is an intrinsically disordered domain with no well-folded three-dimensional structure.
Nevertheless, by analyzing a set of multi-dimensional NMR spectra, we have successfully obtained residue-specific NMR parameters, which thus define conformational and dynamic properties of the TMEM106B cytoplasmic domain. Overall, the TMEM106B cytoplasmic domain is lacking of any tight tertiary packing as evidenced by the lack of any long-range NOE, as well as relatively flexible as judged by its small average value of hNOEs. Nevertheless, many segments are populated with dynamic/nascent secondary structures particularly helical conformations, and have relatively restricted backbone motions as evidenced by positive hNOE values for a large portion of residues. Furthermore, the TMEM106B cytoplasmic domain might have dynamic tertiary packing as we previously detected on the isolated Dengue NS3 protease domain. In this context, the TMEM106B cytoplasmic domain might be regarded to be a molten globule like state which has both secondary structures and dynamic tertiary packing as we previously found on a small protein [31]. Indeed, one group of intrinsically disordered proteins has been shown to be molten globule like [32].
Our present study thus offers the structural basis for the TMEM106B cytoplasmic domain to dynamically and transiently interact with such diverse protein partners. It has been wellestablished that one unique advantage for a protein to be intrinsically disordered is the capacity in interacting with a large and diverse set of binding partners dynamically and transiently [32]. One scenario is "binding-induced folding": the intrinsically disordered domain with dynamically populated secondary structures becomes folded only upon binding to its partners, as we previously found on the regulatory domain of PAK4 kinase, which is intrinsically disordered but forms a well-defined helix in complex with the kinase domain [33]. Alternatively, the intrinsically disordered domain may still remain largely disordered and dynamic even upon forming a "fuzzy complex" with its partners, as we previously observed on the hepatitis C virus NS5A protein which could form a "fuzzy complex" with the human host factor VAPB [34]. Therefore, upon binding to its partners the TMEM106B cytoplasmic domain might fold into a well-defined three-dimensional structure, or still remain disordered and dynamic. However, it is also possible that the TMEM106B cytoplasmic domain folds into a well-defined structure upon binding to one or one category of partners, while remain largely disordered and dynamic after interacting with another or another category of partners. Therefore in the future, it is of fundamental and therapeutic interest to devote extensive efforts to first mapping out the binding domains/regions of the partners of the TMEM106B cytoplasmic domain, and to subsequently determining their complex structures.