Genome-wide identification of ZF-HD gene family in Triticum aestivum: Molecular evolution mechanism and function analysis

ZF-HD family genes play important roles in plant growth and development. Studies about the whole genome analysis of ZF-HD gene family have been reported in some plant species. In this study, the whole genome identification and expression profile of the ZF-HD gene family were analyzed for the first time in wheat. A total of 37 TaZF-HD genes were identified and divided into TaMIF and TaZHD subfamilies according to the conserved domain. The phylogeny tree of the TaZF-HD proteins was further divided into six groups based on the phylogenetic relationship. The 37 TaZF-HDs were distributed on 18 of 21 chromosomes, and almost all the genes had no introns. Gene duplication and Ka/Ks analysis showed that the gene family may have experienced powerful purification selection pressure during wheat evolution. The qRT-PCR analysis showed that TaZF-HD genes had significant expression patterns in different biotic stress and abiotic stress. Through subcellular localization experiments, we found that TaZHD6-3B was located in the nucleus, while TaMIF4-5D was located in the cell membrane and nucleus. Our research contributes to a comprehensive understanding of the TaZF-HD family, provides a new perspective for further research on the biological functions of TaZF-HD genes in wheat.


Introduction
The developmental processes of plant are controlled by various regulatory proteins. The transcription factors (TFs) are one kind of special regulatory proteins which regulate other gene expression, and they are characterized by one or more DNA binding domains (DBDs) [1]. TFs can bind with specific DNA sequences to regulate the response of plants to environmental stimuli, and further play important roles in plant growth and development processes [2]. The Zinc finger homeodomain (ZF-HD) TFs, containing a conserved zinc finger (ZF) domain in the gene structure, subcellular localization, phylogenetic relationship, expression profiles of ZF-HD family genes in wheat were systematically analyzed. At the same time, the transcriptional autoactivation activity of two genes were verified by yeast two-hybrid system. In conclusion, this study lays the foundation for analyzing the function of TaZF-HD proteins, and also provides a theoretical reference for mining stress resistance genes in wheat breeding for disease resistance.

TaZF-HDs motifs, gene structures and three-dimensional structure analysis
The conserved motifs of TaZF-HD proteins were identified using MEME Suite (http://memesuite.org) [32] in discriminative mode. The parameters were set as: Each sequence may contain any number of non-overlapping motif; The number of different motifs up to 20, and the motif widths range from 6 to 50 amino acids. Then these motif patterns were drawn by TBtools software [33]. According to the annotation information, the gene structures of TaZF-HDs were displayed using the online software Gene Structure Display Server (GSDS2.0, http://gsds.cbi. pku.edu.cn/) [34]. The three-dimensional structures of the TaZF-HD proteins were analyzed by SWISS-MODEL (https://swissmodel.expasy.org).

Promoter analysis of TaZF-HD genes
The 1500 bp upstream sequences of the TaZF-HD gene were extracted from the wheat genome. Then cis-elements were identified using PlantCARE online tool (http://bioinformatics.psb. ugent.be/webtools/plantcare/html/) [35]. And the results were shown using R package 'pheatmap'.

Chromosomal location and gene duplication analysis of TaZF-HDs
The chromosomal physical location of the ZF-HD genes was displayed using MapInspect software basing on the wheat reference genome annotation information. Gene duplication events were divided into tandem duplication events and segmental duplication events [36]. Tandem duplication events were identified using the following evaluation criteria: (1) Length of the aligned sequence > 80% of the length of each sequence; (2) Identity > 80%; (3) Threshold � 10 −10 ; (4) Only one duplication can be recognized when genes are linked closely; (5) Intergenic distance is less than 25 kb. If genes satisfied criteria (1), (2), and (3), and were located on different chromosomes, they were judged to be segmental duplications [37]. In addition, the Dna Sequence Polymorphism Version 6 (DnaSP6) was used to calculate the non-synonymous (Ka) and synonymous (Ks) substitution rates of duplicated gene pairs [38].

Homology and synteny analysis of TaZF-HD genes
Orthologous homology genes between allohexaploid common wheat and sub-genome donor species (T. urartu, T. dicoccoides, and Ae. Tauschii) were identified by BLASTp analysis, and the cutoff values (e-value < 10 −10 , identity > 80%) were used to ensure the reliability of orthologs. The homology relationships were displayed using the R package 'circlize' [39]. In addition, basing on the multiple alignments among wheat and other species (Arabidopsis, S. lycopersicum, Z. mays, and O. sativa), the synteny relationship within species of ZF-HDs was analyzed using MCScanX [40].

MicroRNA targeting and protein interaction analysis of TaZF-HD genes
MicroRNA (miRNA) has an important function of regulating gene expression, in order to identify miRNAs that target the TaZF-HD genes, we submitted the TaZF-HD gene sequences to psRNATarget (http://plantgrnbb.noble.org/psRNATarget) with default parameters [41]. And Cytoscape software (https://www.omicshare.com/tools/) was used to drew the linkage between the predicted miRNA and corresponding target genes [42]. Protein-protein interaction (PPI) can regulate a variety of biological activities and functions [43]. In order to better understand the biological functions and regulatory network of TaZF-HD genes, the online STRING database(https://string-db.org) was used to predict the PPI network.

Multiple conditional transcriptome analysis of TaZF-HDs
Original RNA-seq data of wheat from multiple conditional transcriptome analyses were download from the NCBI Short Read Archive (SRA) database and mapped to the wheat reference genome through hisat2 [44]. After that, the expression levels of TaZF-HDs (FPKM, normalized fragments per kilobase of the exon model per million mapped reads) were calculated by Cufflinks [45]. The heatmap of wheat ZF-HD genes was generated using the R package 'pheatmap'.

Preparation of wheat materials and stress treatments
Seeds of Yangmai20 (a hexaploid common wheat variety) were surface sterilized with 1% hydrogen peroxide, then rinsed with distilled water, and germinated for 2 days in an incubator at 26˚C [46]. The seedlings were transferred and cultured in a continuously ventilated 1/4 Hoagland nutrient solution [47]. The stress treatments in present study included three abiotic stresses (hormonal, drought, and high salt) and one biotic stress (Fusarium graminearum). After 5 days, the stems of wheat seedlings were inoculated with hypha of F. graminearum (PH-1). The stems were collected at 24, 48, 72, and 120 h after inoculation. Moreover, wheat seedlings were treated with 150 mM sodium chloride (NaCl), 20% polyethylene glycol (PEG), 100 μM abscisic acid (ABA). After treatments, leaves and roots were collected at 2, 4, 8, 12, 24, 72, 120 h. During the treatment, the growth environment of wheat seedlings was set at 25˚C and 16 h /8 h (day/night). Three biological replicates were set up for each treatment. The samples were immediately frozen with liquid nitrogen and stored at -80˚C.

Real-time quantitative PCR and data analysis
In order to clarify the development and tissue-specific expression profiles of TaZF-HD, 4 genes with high expression in transcriptome data were selected to perform quantitative realtime PCR (qRT-PCR) to detect their expression levels. Gene-specific primers were designed using Primer Premier 5.0 and listed in S7  [48]. The ADP-ribosylation factor Ta2291, which stable expressed under various conditions was used as an internal reference gene for qRT-PCR analysis (Forward: GCTCTCCAACAACA TTGCCAAC, Reverse: GCTTCTGCCTGTCACATACGC).

Transient expression and subcellular localization analysis
In this study, Agrobacterium-mediated tobacco transient transformation system was used for subcellular localization analysis. The full-length sequences of TaTaMIF4-5D and TaZHD6-3B genes without stop codon were cloned, and then the genes were combined with pART27-GFP vector to construct transient expression vector pART27-TaMIF4-5D-GFP and pART27-TaZHD6-3B-GFP. Then the Agrobacterium (GV3101) containing recombinant vector was injected into Nicotiana benthamiana leaves. The fluorescence signal was observed 72 hours later using a fluorescence confocal microscope DMi8 (Leica, Germany).

Verification of transcriptional autoactivation activity
In order to further verify whether TaMIF4-5D and TaZHD6-3B have transcriptional autoactivation activity, yeast two-hybrid (Y2H) system was used. Primers were designed for the TaMIF4-5D and TaZHD6-3B genes and the full-length sequences were cloned into the pGBKT7 vector. Then the recombinant vectors (pGBKT7-TaMIF4-5D and pGBKT7-TaZHD6-3B) and pGADT7 were transformed into yeast AH109 chemically competent cell. The vector pGBKT7-53 and pGBKT7-Lam were used as positive control and negative control, respectively. The transformed yeast cells were cultured at 30˚C for three days.

Identification of ZF-HDs and conservative domain analysis in wheat
After detecting domains by Pfam and InterProScan, we finally identified 37 TaZF-HDs from the wheat genome, which is relatively more in numbers than that of rice and maize. Multiple sequence alignment of the 37 proteins was performed using DNAMAN software, and the align results of conserved domains were combined the online serve WebLogo3 (http://weblogo. threeplusone.com/) (S1 Fig). Based on the domains of ZF-HD, the 37 TaZF-HD genes could be divided into TaZHD and TaMIF subfamily. The ZHD subfamily has ZF domain and HD domain, while MIF subfamily only has ZF domain. According to the distribution of genes on the wheat chromosome, genes of TaZHD and TaMIF subfamily were renamed ( TaZHD1-1A to TaZHD14-U, TaMIF1-3A to TaMIF5-5D), and the wheat triplet genes were distinguished by chromosomal location (TaZHD2-1A, TaZHD2-1B, TaZHD2-1D). The results of multiple sequence alignment indicated that both HD and ZF domain are highly conserved in wheat.
From the preliminary analysis of the ZF-HD gene domain by Pfam and DNAMAN, the ZF-HD family members obtained in this study were highly reliable and could be used for subsequent bioinformatics analysis.

Construction of the phylogenetic tree and prediction of TaZF-HD protein characteristics
To better understand the evolutionary history and relationships of the ZF-HD genes in wheat, a phylogenetic tree was generated using NJ method with the full-length amino acid sequences from Arabidopsis, rice, maize, and wheat. According to the phylogenetic tree (Fig 1), the TaZF-HD family could be clustered into six groups, named as Groups I-VI. Group IV contained the most members of ZF-HD proteins (24), followed by group III with 20 members. Moreover, Group V included the least members (10). Group I, IV, V, and VI only contained ZHD subfamily members, group II had three members of TaZHD subfamily and TaMIF subfamily, respectively, and group III had only TaMIF subfamily members. The prediction results of 37 protein features and subcellular localization were listed in Table 1. The length of protein sequences of the TaZF-HDs was between 93-465 amino acids. Among them, because the members of the MIF subfamily contained only one ZF domain, their protein sequences are shorter than ZHD subfamily, and the longest sequence in MIF subfamily was 96 amino acids. The predicted molecular weight (MW) of ZF-HDs was ranging from 9.84197 kDa to 50.6004 kDa. The theoretical isoelectric point (pI) of the ZF-HD proteins was range from 6.17 to 9.89. In this study, most ZF-HD proteins were alkaline proteins or weak alkaline proteins. At the same time, only one of 37 genes had an instability index lower than 40, which was relatively stable. The GRAVY value was range from -1.14 to -0.309, which showed that all members of the ZF-HD family were hydrophilic proteins. According to the results of subcellular localization, all the predicted genes were located in the nucleus except TaZF-HD4-1D, which was located in chloroplast.

Conserved motifs, gene structure and three-dimensional structure analysis
Motif is a highly conserved amino acid sequence. It is generally believed that a conserved motif will make a protein have a corresponding function or structure. According to the predicted motif of the MEME server (Fig 2C), it can be seen that although the protein length of the TaZF-HD genes ranged from 93 to 465 amino acids, the members of the TaZF-HD family had similar conserved motifs. For example, all members of ZF-HD family had motif 1, which showed that motif 1 may represent the ZF domain. And all the ZHD subfamily members had motif 2 and motif 3, which represent the HD domain.
In order to study the intron-exon structure of the TaZF-HD genes, the gene structures were drawn using GSDS2.0 based on the TaZF-HD genome annotation information. The result showed that many TaZF-HD genes lack introns, only seven TaZF-HD genes have introns, while all members of the TaMIF subfamily had no introns (Fig 2B). Among the 37 TaZF-HDs, TaZHD11-5B had the longest intron sequence and the longest untranslated regions (UTR), 16 of the 37 genes only had coding sequence (CDS). The lengths of the CDS in the same subfamily were highly similar, indicating that the TaZF-HDs structures were relatively conserved.

PLOS ONE
The SWISS-MODEL in the intensive mode was used to analyze the three-dimensional structure of the 37 TaZF-HD proteins. The result showed that the three-dimensional structures of the same subfamily are very similar. And the structures of the ZHD subfamily members are more complex than that of the MIF subfamily members, which is because the proteins of the ZHD family have longer sequences (Fig 2D).

Key cis-elements in the promoters of TaZF-HD genes
The cis-acting regulatory element is a specific motif that can be bound with appropriate transcription factors to regulate gene transcription in plants, and facilitate plant response to environmental stimuli [49]. We could identify the differences in function and regulation basing on the distribution of different cis-acting elements in promoter region of genes. To identify putative cis-acting elements in the promoter region of TaZF-HDs, the 1500 bp upstream regions of genes before transcriptional start site (ATG) were scanned. The promoters of TaZF-HDs contained 53 kinds of putative response factors, including G-box involved in light responsiveness, CAAT and TATA-box involved in growth and development, ABRE involved in the abscisic acid responsiveness, TGA-element related to auxin-responsive, GARE-motif and P-box involved in gibberellin-responsive, etc (Fig 3). Among of these cis-acting elements, the types of cis-elements related to biotic/abiotic stress were the most, but the most abundant cis-elements were related to growth and development. Furthermore, the most widely distributed element was CAAT-box, which was contained in all TaZF-HD genes. Also, the TATA box, though it is not present in all genes, it has the largest total number. In the members of the TaZF-HD family, the promoter upstream sequence of TaZHD12-5A had the largest number of responsive elements (91); while TaMIF5-5B and TaZHD8-4D only contained 11 elements each.

Chromosomal location and gene duplication analysis of TaZF-HDs
Generally, gene duplications are considered to be one of the primary driving forces of species evolution [50]. Gene duplication includes segmental duplication and tandem duplication.

PLOS ONE
Before analyzing the gene duplication events, we should determine the location of genes on the chromosome. Chromosome mapping revealed that the 37 ZF-HD genes were unevenly distributed across among the 18 chromosomes of wheat, there was no ZF-HD gene on 7 chromosomes (Fig 4A). While the 3A, 5A, 5B, and 5D chromosome had the highest number ZF-HD genes (four on each chromosome). Due to the incomplete sequencing of wheat genome, TaZHD14-U distributed on the unknown chromosome. There were 16 groups of gene duplication events, of which one group is a tandem duplication and the remaining groups are segmental duplication events (S5 Table). Moreover, there were some homologous genes, such as TaZHD6-3A, TaZHD6-3B, and TaZHD6-3D, derived from 3A, 3B, and 3D chromosomes of wheat, respectively. To better understand the evolutionary factors affecting TaZF-HD gene family, we calculated Ka/Ks (nonsynonymous substitution rates/synonymous substitution rates) value to infer the selection force during the evolution of genes [51]. In general, Ka/ Ks > 1 represented positive selection of evolution acceleration; Ka/Ks = 1 represented neutral gene drift; and Ka/Ks < 1 represented purifying selection under functional constraints. The Ka/Ks values of the 13 TaZF-HDs homologous pairs were less than 1, and the values of other 3 pairs were 0 ( Fig 4B and S6 Table). These results indicate that the gene family may have experienced powerful purification selection pressure during the process of wheat evolution to eliminate harmful mutations at the protein level, which may help to promote maintain the stability of the gene functions.

Homologous gene pairs and synteny analysis of TaZF-HDs
In our research, 7, 22, 11 members of the ZF-HD family were identified in T. urartu, T. dicoccoides, and Ae. Tauschii, respectively. A comprehensive phylogenetic tree was established by NJ method (Fig 4C). Then a homologous analysis of wheat and three sub-genome donor species was performed, and the orthologs and paralogs were clustered. Orthologous refered that two genes in different species but derive from a common ancestor, paralogs described homologous genes within a single species and generated by gene duplication [52]. A total of 82 homologous genes were identified (Fig 4D), 65 homologous gene pairs were related to T. aestivum, of which 16 were paralogs gene pairs. Moreover, there were 9, 26, and 14 orthologs between T. aestivum and T. urartu, T. dicoccoides, and Ae. Tauschii.
In order to further analyze the evolution and homology relationship of the wheat ZF-HD family, the synteny analysis between wheat and other four species (Arabidopsis, rice, maize and tomato) were analyzed. The synteny analysis showed that there were multiple pairs of ZF-HD orthologous genes in wheat and other species except Arabidopsis. The number of orthologous genes were 23 (between wheat and rice), 36 (between wheat and maize), and 6 (between wheat and tomato) (Fig 4E). These genes may come from the same ancestral genes. Notably, there were more homologous genes between wheat and maize and rice (Wheat, maize, and rice are all monocotyledonous plants.), which is consistent with the evolutionary subordination of species.

MiRNA targets and protein interaction network analysis of TaZF-HD genes
MicroRNA (miRNA) is a class of non-coding single-stranded RNA molecules with a length of about 22 nucleotides encoded by endogenous genes. They have a variety of important regulatory effects in animal and plant cells, and can degrade mRNA of target gene or inhibit protein translation [53]. In the complex regulatory network of target genes and miRNAs, each miRNA can regulate multiple target genes, and a gene also could be precisely regulated by several miR-NAs. The study of the regulatory relationship between miRNAs and genes will help us further understand the function of the target genes. Therefore, we systematically analyzed potential miRNAs that may target the wheat ZF-HD gene family and hope to provide information support for gene regulation mechanisms. We identified 14 miRNAs targeted to 20 TaZF-HD genes, and then used Cytoscape software to establish a relationship network (Fig 5A). By analyzing the regulatory network, we found that tae-miR9664-3p targets six TaZF-HD genes (TaZHD12-5A, TaZHD12-5D, TaZHD12-5B, TaMIF1-3A, TaMIF2-3A, TaZHD11-5A); tae-miR5384-3p targets four genes (TaZHD5-2D, TaZHD5-2A, TaZHD5-2B, TaZHD6-3B). Besides, tae-miR9776, tae-miR9666a-3p, tae-miR531, tae-miR1119 and tae-miR1133 have three targeted genes, respectively. Overall, by predicting the potential miRNAs of genes, understanding their important roles in plant development and stress resistance, has an important role in the study of gene function. According to the PPI prediction results (Fig 5B), we found 10 proteins that interact with the TaZF-HD proteins. Among these 10 proteins, 4 are homeobox leucine zipper (HD-Zip) protein, 3 are mediator complex 25 (MED25), and the remaining three proteins include a GATA-family transcription factors, a cytochrome P450 (CYP450) enzyme and a macrophage migration inhibitory factor (MIF). Besides, there are multiple interactions between TaZF-HD genes.

Multiple conditional transcriptome analysis of TaZF-HDs
High-throughput technologies, such as RNA-Seq analysis, have been widely used for differential expression analysis [28]. In order to investigate the specific expression profiles of ZF-HD genes, the original RNA-seq data were downloaded. Then these data were sorted and expression heatmap of the TaZF-HD genes were generated using the R package 'pheatmap'. As shown in Fig 6, three genes, TaMIF4-5A, TaMIF4-5B, and TaMIF4-5D, had the higher expression levels and were expressed in almost treatments and tissues in different wheat development stages, which indicated that these genes may play critical roles in wheat developmental stages or under various stresses. Besides, the expression levels of TaZHD6-3A, TaZHD6-3B, TaZHD6-3D, TaZHD11-5D, TaZHD11-5A and TaZHD5-2D were also relatively high in biotic stresses (F. graminearum, Powdery mildew, Stripe rust) and abiotic stresses (PEG6000, Phosphorous starvation, drought). In addition, most genes (such as TaZHD1-1A, TaZHD2-1B, TaZHD9-4A and TaMIF1-3A) had low expression levels under the various conditions and were expressed only in specific periods or under specific treatments, which indicated these genes may have potential functions.

Real-time quantitative PCR and data analysis
The expression patterns of genes are often closely related to their functions. Our analysis indicated that most of the ZF-HD genes were expressed at low levels, but with tissue-specificity. In order to further reveal the expression level and potential functions of the TaZF-HD genes involved in biotic stresses (F. graminearum) and abiotic stresses (NaCl, PEG, and ABA), qRT-PCR was used to analyze expression patterns of four selected TaZF-HD genes (TaMIF4-5D, TaZHD6-3B, TaZHD8-4D, TaZHD11-5D). As shown in Fig 7, TaMIF4-5D was up-regulated at 12 h and 24 h after ABA treatment in root compared with control; The expression levels of TaZHD6-3B and TaZHD11-5D were higher than that of control, TaZHD6-3B had the highest expression at 48 h, while TaZHD11-5D had the highest expression at 12 h. Under ABA treatment in leaf, the results showed that the expression level of TaMIF4-5D was not significantly different with control at 72 h and 120 h, but lower than control at other points; the expression levels of TaZHD6-3B and TaZHD8-4D were up-regulated at 6 h and 120 h, while TaZHD11-5D was up-regulated at 72 h and 120 h. Under PEG-induced drought stress condition in root, the expression levels of TaMIF4-5D and TaZHD11-5D were up-regulated at 12 h and 72 h, and the expression level at other points were fluctuating. Compared with the control, the expression level of TaZHD8-4D was low at each stage. As for PEG stress in leaf, the expression of TaMIF4-5D, TaZHD6-3B, and TaZHD8-4D were lower than that of control, and expression of TaZHD11-5D was increased at 72 h and 120 h. In the case of NaCl stress in root, the expression of TaMIF4-5D was lower than that of control, TaZHD8-4D exhibited a significant improvement at 24 h, and the expression of TaZHD11-5D was up-regulated at 12 h and Fig 7. qRT-PCR analysis of TaZF-HD genes under biotic/abiotic stresses. The data were analyzed by three independent repeats, and standard deviations were shown with error bars. The expression level of TaZF-HDs genes was drawn using Origin software. � indicates significant differences that compared with control at p < 0.05.
https://doi.org/10.1371/journal.pone.0256579.g007 48 h. Under NaCl stress in leaf, the expression of TaMIF4-5D was significantly up-regulated at 72 h, while the expression level of TaZHD6-3B was decreased than that of control throughout the treatment, TaZHD11-5D was up-regulated at 6 h, down-regulated at 12 h, and then but decreased at 120 hours. After inoculation with F. graminearum, the fluctuations of the expression levels of the four genes were light at different periods. The expression levels of TaMIF4-5D and TaZHD8-4D were down-regulated at 48 h and 120 h. The expression levels of TaMIF4-5D at other time points were not significantly different from the control, while the expression levels of TaZHD8-4D were up-regulated at 48 h and 72 h; the expression of TaZHD6-3B was increased than control at 24 h, while had no significant difference with control at 48 h and 72 h. The expression of TaZHD11-5D was increased at 24 h and 48 h compared with control, after that, the expression reduced to some extent. Considering, these results suggested that different TaZF-HD genes had various response to different stress treatments.

Verification of subcellular localization and transcriptional autoactivation activity
Transcription factors is a class of DNA-binding proteins that can specifically interact with ciselements in DNA or RNA, and can activate or inhibit the transcription and expression of certain genes. A typical transcription factor is composed of four functional regions: nuclear localization signal region (NLS), DNA binding region (BD), transcription control region and oligomerization site. In this study, the results of subcellular localizations of TaMIF4-5D and TaZHD6-3B are predicted to be in the nucleus. In order to further validate the location of TaMIF4-5D and TaZHD6-3B, we performed the transient expression in N. benthiana. Our study indicated that TaZHD6-3B was located in nucleus, while TaMIF4-5D was located in the cell membrane and nucleus (Fig 8A).
The transcriptional autoactivation activity result showed that yeast cells with fusion expression vectors pGBKT7-TaMIF4-5D and pGBKT7-TaZHD6-3B can grow on SD-Trp-Leu plates, but cannot grow on SD-Trp-Leu-His-Ade plates (Fig 8B), indicating that these two transcription factors have no transcriptional autoactivation activity.

Discussion
As one of the most important transcription factor families, ZF-HD family has been reported in many plants. The ZF-HD family can be divided into ZHD subfamily and MIF subfamily, and they all have ZF domain, while the MIF subfamily lacks the HD domain. Unlike the ZHD gene, MIF is only found in seed plants. Previous study found that the ZHD gene appeared earlier than MIF in land plants, indicating that the original MIF gene may be derived from the ZHD gene [4]. Although previous studies have identified 17 in Arabidopsis, 15 in rice, 24 in maize, 20 in tartary buckwheat, 22 in tomato, and 31 in Chinese cabbage, there is a lack of research on the ZF-HD family in wheat. Based on the edible and economic value of wheat, it is necessary to study the function of ZF-HD family in wheat. In this study, 37 candidate ZF-HD genes were identified in wheat genome, including 28 members of ZHD subfamily, 9 members of MIF subfamily. Compared with Arabidopsis, rice, and maize, wheat has relatively more members in the ZF-HD family. In order to reveal the phylogenetic relationship of the ZF-HD family, a phylogenetic tree was constructed, and a total of 93 members from wheat, Arabidopsis, rice and maize were divided into six groups (Fig 1). In group 2 and group 5, there were genes from wheat, rice and maize, but no genes from Arabidopsis. It is speculated that this may be due to the separate evolution of the species after the differentiation of monocotyledon and dicotyledon.
Gene structure analysis is a key to reveal the function of genes. In present study, our results show that multiple TaZF-HD genes (30) do not contain intron, which is consistent with the results in Arabidopsis, rice, maize, and tartary buckwheat [18]. Studies have shown that genes with smaller or fewer introns may be more effectively expressed in stressful environments [54]. Therefore, these TaZF-HD genes may respond quickly under the external environment. For instance, TaMIF4-5A, TaMIF4-5B, and TaMIF4-5D had no introns, and they had efficient  (Figs 2B and 6). Besides, a total of 20 conserved motifs were identified in wheat ZF-HD family (Fig 2C). Among them, motif 1 present in all TaZF-HD members and correspond to the ZF domain, motif 2 and motif 3 correspond to the HD domain. In general, the members of ZF-HD family in the same group shared similar gene structure and motif composition, which indicates that they may play similar functions. Three genes (TaMIF4-5A, TaMIF4-5B, and TaMIF4-5D) were highly expressed in almost all selected wheat growth and development stage and stress treatments (Fig 6), which indicated that these genes may play important roles in growth and development. According to the analysis results of cis-acting elements, we found several mainly enriched elements of TaZF-HDs: TATA-box, G-box, CAAT-box, ABRE, TGA-element, GARE-motif and P-box, which were related to biotic/abiotic stress or plant growth and development. The multiple cis-elements of TaZF-HD gene promoters signify their regulatory effects on gene expression during different growth stages and under environmental stimuli.
In this study, a total of 16 duplication events with 19 TaZF-HD genes were found from wheat genome, of which 15 were segmental duplication. The Ka/Ks analysis of these gene pairs showed the selective power during the evolution of genes. In these duplication events, Ka/Ks value of 13 events was less than 1 (Fig 4B), which indicated that ZF-HD family may have experienced strong purification selection pressure during the evolution of wheat to eliminate harmful mutations at the protein level, which may help to promote the stability of the gene functions. The homologous analysis showed that 65 orthologous gene pairs were found between wheat and three sub-genome donor species. According to the syntenic analysis, there are 0, 23, 36, 6 orthologous gene pairs between wheat and Arabidopsis, rice, maize, tomato, respectively. These genes may come from same ancestral genes. Moreover, some TaZF-HD have orthologous gene pairs in tomato, maize, and rice, such as TaZHD2-1D, TaZHD10-4B, and TaZHD6-3B, indicating that these genes could already exist before speciation. Some TaZF-HD genes only found orthologous gene pairs in rice and maize, such as TaZHD5-2A and TaZHD6-3B, suggesting that these orthologous gene pairs appeared after the divergence of monocotyledonous and dicotyledonous plants.
As a small non-coding RNA, miRNA plays an important role in eukaryotes. Here, a total of 14 miRNAs were identified. Previous studies have shown that the expression of tae-miR9664 and its target genes play a key role in the resistance to wheat stripe rust [55], and we found that tae-miR9664 regulates six genes in TaZF-HDs. The prediction results of PPI found that there are interactions between members of the TaZF-HD family, and the TaZF-HDs also interacts with other proteins. It is speculated that the proteins of TaZF-HD family may regulate gene expression or participate in different physiological functions by forming homotypic or heterodimers.
Studies have shown that when plants are stimulated by unfavorable environments, ZF-HD gene expression usually responds to adversity [3]. In this study, qRT-PCR analysis was used to identify the expression patterns of four TaZF-HD genes under biotic and abiotic stresses. The results showed that the expression of the four genes (TaMIF4-5D, TaZHD6-3B, TaZHD8-4D, and TaZHD11-5D) were significantly down-regulated under F. graminearum inoculation, indicating that they played a critical role in responding to F. graminearum infection. In addition, these four genes were significantly up-regulated at different periods under ABA treatment, indicating that these genes can respond quickly to ABA stress. Three genes (TaMIF4-5D, TaZHD6-3B, and TaZHD11-5D) were up-regulated under PEG-simulated drought treatment, indicating that the three genes may have effects on responding to drought stress. TaZHD8-4D and TaZHD11-5D were up-regulated under NaCl treatment at a certain point, indicating that they may play roles in responding to salt stress (Fig 7).
Since transcription factors have nuclear localization signal regions (NLS), most transcription factors are located in the nucleus, but there are exceptions. For example, in Arabidopsis, the heat shock transcription factor AtHsfA6a was located in the cytoplasm [56]. We identified the TaZHD6-3B located in the nucleus, and the TaMIF4-5D located in the nucleus and cell membrane. The results showed that these two genes can be located in the nucleus, indicating that they may have functions as transcriptional factor. The transcriptional autoactivation experiment using yeast two-hybrid showed that both TaZHD6-3B and TaMIF4-5D have no autoactivation activity.

Conclusion
In this experiment, 37 TaZF-HD genes were identified, which can be divided into ZHD subfamily and MIF subfamily according to the domains. The chromosome mapping result showed that except for TaZHD14-U, other genes are located on 18 wheat chromosomes. Ka/Ks analysis showed that the TaZF-HD family have experienced powerful purification selection during the evolution process. The qRT-PCR of four selected TaZF-HD genes showed that these genes almost all responded significantly to biotic and/or abiotic stresses, and different members showed different expression patterns. In addition, we have performed some basic molecular biology experiments to verify function of the TaZF-HD gene family members. The results of this study provide the original information and theoretical basis for the subsequent study of TaZF-HDs, and will lay a foundation to improve wheat quality traits in molecular breeding program.