THAP11F80L cobalamin disorder-associated mutation reveals normal and pathogenic THAP11 functions in gene expression and cell proliferation

Twelve human THAP proteins share the THAP domain, an evolutionarily conserved zinc-finger DNA-binding domain. Studies of different THAP proteins have indicated roles in gene transcription, cell proliferation and development. We have analyzed this protein family, focusing on THAP7 and THAP11. We show that human THAP proteins possess differing homo- and heterodimer formation properties and differing abilities to interact with the transcriptional co-regulator HCF-1. HEK-293 cells lacking THAP7 were viable but proliferated more slowly. In contrast, HEK-293 cells were very sensitive to THAP11 alteration. Nevertheless, HEK-293 cells bearing a THAP11 mutation identified in a patient suffering from cobalamin disorder (THAP11F80L) were viable, although they proliferated more slowly. Cobalamin disorder is an inborn vitamin deficiency characterized by neurodevelopmental abnormalities, most often owing to biallelic mutations in the MMACHC gene, whose gene product MMACHC is a key enzyme in the cobalamin (vitamin B12) metabolic pathway. We show that THAP11F80L selectively affected promoter binding by THAP11, having more deleterious effects on a subset of THAP11 targets and resulting in altered patterns of gene expression. In particular, THAP11F80L exhibited a strong effect on association with the MMACHC promoter and led to a decrease in MMACHC gene transcription, suggesting that the THAP11F80L mutation is directly responsible for the observed cobalamin disorder.

… fully one-third of the total 1114 THAP11 WT peaks. Thus, the THAP11 F80L mutant protein binds to only a subset of the THAP11 WT promoter targets and does not exhibit any evident de novo promoter binding compared to the THAP11 WT protein.

Fig 7C illustrates THAP11 WT vs. THAP11 F80L DNA-site specificity with a 3.3 Mb view of Chromosome 1 that includes the MMACHC gene. The view covers nine THAP11 WT TSS-associated peaks (1-9), four of which (peaks 1, 5, 7 and 8) were selected for close-ups to illustrate the points discussed below, and one non-TSS-associated peak (labeled *). Two TSS-associated peaks (peaks 7 and 8) and the one non-TSS-associated THAP11 peak fell into the THAP11 F80L-absent category, and the remainder fell into the common category (see S3 Table).
Among the peaks in the common category, some (e.g., peak 1 in Fig 7C) remained largely the same size between the THAP11 WT and THAP11 F80L samples, whereas others (e.g., peak 5) were smaller, albeit still present, in the THAP11 F80L sample. In the analyses described below, we do not distinguish between these two common-peak subcategories. In conclusion, the THAP11 F80L mutation results in a selective disruption of THAP11 DNA binding at specific promoters in HEK-293 cells and does not create de novo THAP11 promoter-binding sites.
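The split into common and F80L-absent peaks, together with the check for de novo binding, amounts to an interval-overlap comparison between two peak sets. The sketch below assumes that a WT peak counts as common when any F80L peak overlaps it by at least one base. The criterion actually used is described in Materials and Methods and may differ (for example, a minimum overlap or a summit-distance cutoff). The `Peak` type and the function names are hypothetical, and the coordinates are toy values, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """A ChIP-seq peak as a half-open interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_wt_peaks(wt_peaks, mutant_peaks):
    """Label each WT peak 'common' if any mutant peak overlaps it, else 'F80L-absent'."""
    return {
        wt: "common" if any(overlaps(wt, m) for m in mutant_peaks) else "F80L-absent"
        for wt in wt_peaks
    }


def de_novo_peaks(wt_peaks, mutant_peaks):
    """Mutant peaks with no overlapping WT peak (candidate de novo binding sites)."""
    return [m for m in mutant_peaks if not any(overlaps(m, w) for w in wt_peaks)]


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    wt = [Peak("chr1", 100, 300), Peak("chr1", 1000, 1200), Peak("chr2", 50, 150)]
    f80l = [Peak("chr1", 150, 250)]
    labels = classify_wt_peaks(wt, f80l)
    absent = sum(1 for v in labels.values() if v == "F80L-absent")
    print(f"{absent}/{len(wt)} WT peaks are F80L-absent")  # 2/3 WT peaks are F80L-absent
    print(de_novo_peaks(wt, f80l))  # []
```

The pairwise scan is quadratic in the number of peaks. For genome-wide peak sets, an interval index or a tool such as `bedtools intersect` would be the more usual choice.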
Restricted DNA-sequence recognition by the THAP11 F80L protein

To examine more broadly the nature of the effect of the THAP11 F80L mutation, we calculated size scores for each peak (see Materials and Methods for peak score determination) and separately plotted the distributions of the common and F80L-absent peak scores for both the THAP11 WT parental and THAP11 F80L mutant samples (Fig 8A). Consistent with their a priori categorization, the THAP11 F80L peaks scored higher in the common group than in the F80L-

In addition to a weaker TAM consensus sequence associated with the F80L-absent peaks, fully one-quarter of the F80L-absent peaks had no discernible nearby TAM (i.e., within 1000 bp on either side of the peak maximum), whereas essentially all common peaks were associated with one or more TAMs (Fig 8C). These analyses further emphasize the importance of a strong TAM consensus sequence for promoter recognition by THAP11 F80L.
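The "no discernible nearby TAM" criterion is a window test around each peak maximum. The following is a minimal sketch that assumes TAM occurrences have already been called elsewhere, for example by a motif scanner at some score threshold that this section does not specify, and that each occurrence is represented by a single genomic position. The function names and the toy positions are hypothetical. The window is inclusive at ±1000 bp to match the text.

```python
from bisect import bisect_left


def has_motif_nearby(summit, motif_positions, window=1000):
    """True if any motif position lies within +/- `window` bp of `summit` (inclusive).

    `motif_positions` must be sorted ascending and refer to one chromosome.
    """
    # First motif at or after the left edge of the window.
    i = bisect_left(motif_positions, summit - window)
    return i < len(motif_positions) and motif_positions[i] <= summit + window


def fraction_without_motif(summits, motifs_by_chrom, window=1000):
    """Fraction of (chrom, summit) peaks lacking a motif within +/- `window` bp."""
    if not summits:
        return 0.0
    sorted_motifs = {c: sorted(p) for c, p in motifs_by_chrom.items()}
    missing = sum(
        1
        for chrom, summit in summits
        if not has_motif_nearby(summit, sorted_motifs.get(chrom, []), window)
    )
    return missing / len(summits)


if __name__ == "__main__":
    # Toy positions for illustration only.
    motifs = {"chr1": [5000, 20000]}
    peaks = [("chr1", 5500), ("chr1", 12000), ("chr1", 19000), ("chr2", 100)]
    print(fraction_without_motif(peaks, motifs))  # 0.5
```

Sorting the motif positions once per chromosome and using binary search keeps each lookup logarithmic rather than scanning every motif for every peak.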

In addition to examining the TAMs of THAP11-peak-containing promoters, we asked whether the nature of the genes associated with the common and F80L-absent peaks differs, using gene-ontology (GO) analysis as summarized in Fig 8D …

To examine the consequences of the THAP11 F80L mutation at the gene-expression level, we analyzed the RNA-seq results (S5 Table). Figure …

Among the 14 genes most downregulated in the THAP11 F80L cells, fully one-half were canonical S-phase histone-encoding genes (Fig 9A, right). In our THAP11 WT ChIP-seq data, the THAP11 protein was absent from canonical histone-encoding gene promoters (S1 Table), indicating that this downregulation is an indirect effect. Such downregulation may be linked to the slower proliferation rate of THAP11 F80L cells, either as a cause or as a consequence.
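The GO comparisons in this section rest on term-enrichment statistics. This section does not name the tool or test used. A common choice is a one-sided hypergeometric test, sketched below under that assumption. The gene identifiers are placeholders. In practice, the p-values across all tested terms would also be corrected for multiple testing (for example, with Benjamini-Hochberg), which is not shown.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_term_enrichment(study_genes, background_genes, term_genes):
    """Return (observed overlap, one-sided enrichment p-value) for one GO term.

    Study and term gene sets are first restricted to the background universe.
    """
    background = set(background_genes)
    study = set(study_genes) & background
    term = set(term_genes) & background
    k = len(study & term)
    return k, hypergeom_sf(k, len(background), len(term), len(study))


if __name__ == "__main__":
    background = [f"gene{i}" for i in range(10)]
    term = ["gene0", "gene1", "gene2"]
    study = ["gene0", "gene1"]
    print(go_term_enrichment(study, background, term))  # (2, 0.06666666666666667)
```

Restricting both gene lists to the same background matters: enrichment depends on which genes could have been observed in the first place, such as genes expressed in HEK-293 cells.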

Separate GO analyses of the upregulated and downregulated genes (Fig 9B and S6 Table) …

… we observed that a significant portion (one quarter) of these direct-effect genes are associated with regulation of transcription (Fig 10, blue arrows). This observation suggests that THAP11 plays a higher-order role in the regulation of gene transcription by directly regulating (most often activating) the expression of genes that encode secondary transcriptional regulators.
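The "direct-effect" genes combine the ChIP-seq and RNA-seq results. The following is a minimal sketch that assumes a gene counts as a direct effect when its promoter carries a THAP11 WT peak and its expression changes significantly in THAP11 F80L cells. This section does not give the significance thresholds or the exact definition used. Under that assumption, a direct target whose expression falls in the mutant cells is consistent with THAP11 acting as an activator at that promoter. All gene names and fold-changes are hypothetical.

```python
def direct_effect_genes(bound_genes, de_genes):
    """Genes both bound at the promoter by THAP11 WT and differentially expressed."""
    return set(bound_genes) & set(de_genes)


def fraction_annotated(genes, annotated_genes):
    """Fraction of `genes` that carry a given annotation (e.g., a GO term)."""
    genes = set(genes)
    if not genes:
        return 0.0
    return len(genes & set(annotated_genes)) / len(genes)


def split_by_direction(genes, log2_fold_change):
    """Split genes into (down, up) sets by the sign of log2(F80L / WT)."""
    down = {g for g in genes if log2_fold_change[g] < 0}
    up = {g for g in genes if log2_fold_change[g] > 0}
    return down, up


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    bound = {"geneA", "geneB", "geneC", "geneD", "geneE"}
    de = {"geneB", "geneC", "geneD", "geneE", "geneX"}
    lfc = {"geneB": -1.2, "geneC": -0.8, "geneD": -2.0, "geneE": 0.9, "geneX": 1.5}
    tf_like = {"geneB", "geneZ"}
    direct = direct_effect_genes(bound, de)
    print(sorted(direct))                       # ['geneB', 'geneC', 'geneD', 'geneE']
    print(fraction_annotated(direct, tf_like))  # 0.25
    down, up = split_by_direction(direct, lfc)
    print(len(down), len(up))                   # 3 1
```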

To investigate the consequences of the THAP11 F80L mutation, we probed THAP11 DNA binding using THAP11 ChIP-seq. We observed no significant new THAP11 TSS-associated binding sites, but did observe that the mutation causes the loss of THAP11 DNA binding at a specific subset of TSS-associated sites. A detailed analysis of the altered binding-site patterns revealed that some TSS-associated sites are particularly sensitive to the THAP11 F80L mutation. Overall, these are sites exhibiting a lower affinity for THAP11 binding, with a weaker THAP11 motif consensus sequence, particularly in the 3' half of the consensus sequence. These observations suggest that THAP11-binding sites come in two classes that may respond differently to wild-type THAP11 activity, for example by being more or less sensitive to the activity of regulatory co-factors or to dimer formation.
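The finding that F80L-sensitive sites have a weaker consensus, particularly in the 3' half of the motif, can be expressed as a position-weight-matrix score split into two halves. The sketch below builds a log-odds matrix from per-position base counts and scores the 5' and 3' halves of an aligned site separately. The four-position count matrix is a toy placeholder, not the THAP11 motif. The pseudocount and the uniform background are assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2(p / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2((column.get(b, 0) + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def half_scores(site, pwm):
    """Return (5' half score, 3' half score) for a site aligned to the matrix.

    For odd-length matrices the middle position is counted in the 3' half.
    """
    site = site.upper()
    if len(site) != len(pwm) or any(b not in BASES for b in site):
        raise ValueError("site must be A/C/G/T and the same length as the matrix")
    per_position = [pwm[i][base] for i, base in enumerate(site)]
    mid = len(pwm) // 2
    return sum(per_position[:mid]), sum(per_position[mid:])


if __name__ == "__main__":
    # Toy 4-position matrix with consensus ACGT (NOT the THAP11 motif).
    toy_counts = [{"A": 10}, {"C": 10}, {"G": 10}, {"T": 10}]
    pwm = log_odds_matrix(toy_counts)
    print(half_scores("ACGT", pwm))  # both halves score high
    print(half_scores("ACAA", pwm))  # same 5' half, much lower 3' half
```

Comparing the two half-scores across common and F80L-absent sites, rather than a single total score, is what would distinguish a 3'-half-specific weakening from a uniformly weaker motif.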