Genetic diversity and population structure analyses of tropical maize inbred lines using Single Nucleotide Polymorphism markers

  • Rodreck Gunundu ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing

    rodreck.gunundu@seedcogroup.com

    Affiliations African Centre for Crop Improvement (ACCI), College of Agriculture, Engineering and Science (CAES), University of KwaZulu-Natal, Scottsville, Pietermaritzburg, South Africa, Seed Co, Rattray Arnold Research Station, Harare, Zimbabwe

  • Hussein Shimelis,

    Roles Conceptualization, Investigation, Methodology, Supervision, Validation, Visualization, Writing – review & editing

    Affiliation Seed Co, Rattray Arnold Research Station, Harare, Zimbabwe

  • Seltene Abady Tesfamariam

    Roles Conceptualization, Investigation, Methodology, Validation, Visualization, Writing – review & editing

    Affiliation Seed Co, Rattray Arnold Research Station, Harare, Zimbabwe

Correction

20 Jan 2026: Gunundu R, Shimelis H, Tesfamariam SA (2026) Correction: Genetic diversity and population structure analyses of tropical maize inbred lines using Single Nucleotide Polymorphism markers. PLOS ONE 21(1): e0341231. https://doi.org/10.1371/journal.pone.0341231

Abstract

Analyses of the genetic distance and composition of inbred lines are a prerequisite for parental selection and for exploiting heterosis in plant breeding programs. The study aimed to assess genetic diversity and population structure of a maize germplasm panel comprising 182 founder lines and 866 derived inbred lines using Single Nucleotide Polymorphism (SNP) markers to identify genetically unique lines for hybrid breeding. The founder lines were genotyped with 1201 SNPs, and the derived lines with 1484 SNPs. Moderate genetic variation, with genetic diversity ranging from 0.004 to 0.44 with a mean of 0.25, was recorded for the founder lines, while corresponding values of 0.004 to 0.34 with a mean of 0.13 were recorded for the derived lines. Heterozygosity values ranging from 0.00 to 0.24, with a mean of 0.08, were recorded for both sets of lines. Of the SNP markers used, 82% of the 1201 markers and 84% of the 1484 markers exhibited polymorphism information content ranging from 0.25 to 0.50. Analysis of molecular variance revealed significant genetic differences (P ≤ 0.001) among and within populations in the founder and derived lines. Most of the detected variation, i.e., 97% and 88.38%, was attributed to within-population differences in the founder and derived lines, respectively. Population structure analysis identified three distinct subpopulations among founder lines and two among derived lines. Cluster analysis supported the population structure. The following genetically distant founder and derived inbred lines were selected: G15NL337 and G15NL312 (Cluster 1), 15ARG152 and RGS-PL44 (Cluster 2), RGS-PL44 and 15ARG149 (Cluster 2), and RGS-PL33 and RGS-PL44 (Cluster 2), respectively. The selected lines are genetically distinct and recommended for marker-assisted hybrid maize breeding to exploit the frequency of beneficial alleles.
This study provides valuable insights for maize breeding programs, enabling the exploitation of beneficial alleles and contributing to improved crop yields and food security through hybrid breeding.

Introduction

Globally, maize (Zea mays L., 2n = 2x = 20) is the second most widely cultivated cereal crop after wheat, with an estimated production area of 204 million hectares and 1,163 million tonnes of grain production annually [1]. Maize plays a pivotal role in food security and global economies. It is the leading staple food in developing countries and a crucial raw material for the livestock and processing industries [2, 3]. Maize is a strategic crop in Africa’s food systems, providing approximately 30% of the continent’s energy intake. In Africa, smallholder farmers produce maize and trade it in local markets for household food security and cash income.

South Africa is the leading maize producer in Africa, with a production of 16.1 million metric tonnes per annum, and ranks as the world’s ninth largest maize exporter after the USA, Brazil, Argentina, Ukraine, Romania, France, Paraguay and Poland [4, 5]. Other important maize producers in Africa are Nigeria (12.9 million tonnes of grain per year), Ethiopia (10.2 million tonnes), and Egypt (7.5 million tonnes) [1]. The total maize production in Africa is 93 million tonnes annually, far below the demand of approximately 150 million tonnes. The grain deficit is met through imports notably from Argentina, Ukraine and Brazil. In 2020, Argentina was the top maize exporter to Africa, accounting for a monetary value of $1.8 billion, followed by Ukraine ($856 million) and Brazil ($824 million) [1].

Africa accounts for only 8% of the global maize production. The mean maize grain yield in the region is low at 2.1 t/ha compared to the worldwide average of 5.8 t/ha. This yield gap is attributed to a combination of abiotic stresses (e.g., heat and drought stress, flooding, waterlogging, poor soil fertility, and soil erosion) [6–9] and biotic stresses, including plant diseases (e.g., grey leaf spot, maize streak virus, maize lethal necrosis disease, Phaeosphaeria leaf spot and northern corn leaf blight), parasitic weeds (e.g., Striga species) and insect pests (e.g., the fall armyworm, stem borers, cutworms, termites and leafhoppers). These stresses contribute to substantial yield losses and crop failure in the region [3, 10]. Therefore, genetic innovations and modern production technologies are crucial to improving the yield potential and closing the yield gap.

Hybrid maize breeding is critical in developing resilient, locally adapted, and high-performing cultivars. Hybrids achieve increased yields, withstand diseases and pests more effectively, and provide better nutritional content. Furthermore, they demonstrate exceptional resilience to drought, heat, flooding, and poor soil quality [11]. The success of hybrid breeding depends on heterosis, or hybrid vigour, which is maximized by selecting genetically distant and contrasting inbred lines [12].

Genetic variation allows for the selection of favourable genetic combinations among different and complementary parents [13–16]. A well-characterized genetic resource is essential for identifying potential parents and heterotic groups, and for guiding conservation [17–19]. Analyses of genetic structure and diversity provide valuable insights into the relationships between breeding lines, which can guide hybrid breeding [20, 21]. Several studies have assessed the population structure and genetic diversity of varied maize populations across contrasting test environments and marker systems. For instance, Lu et al. [13] analyzed 770 maize lines, identifying distinct population structures and genetic divergence between temperate and subtropical/tropical germplasm using 1,034 SNPs. Yan et al. [22] reported well-delineated genetic structures between temperate and tropical lines among 632 inbred lines. Adu et al. [20] assessed genetic diversity in 94 tropical maize inbred lines, clustering them by pedigree, selection history, and endosperm color. Furthermore, Wen et al. [23] examined 359 maize inbred lines developed by CIMMYT and IITA, displaying variable tolerance to abiotic and biotic stresses. The present and past findings indicate the need for rigorous genetic diversity analysis of candidate test populations using high-throughput SNP markers to appraise the genetic structure and lineage and guide selection and breeding. Notably, hybrids developed from diverse heterotic groups consistently outperform their parents in grain yield and yield-component traits [24].

Elite inbred lines can be assigned to distinct heterotic groups using phenotyping, pedigree analyses, and genetic distance estimates [20]. Morphological, biochemical, and molecular markers are commonly used in genetic diversity analysis and genetic grouping [25]. DNA markers (e.g., Single Nucleotide Polymorphisms (SNPs), Simple Sequence Repeats (SSRs), Restriction Fragment Length Polymorphisms (RFLPs), Randomly Amplified Polymorphic DNAs (RAPDs), and Amplified Fragment Length Polymorphisms (AFLPs)) have become complementary tools to phenotyping tools. Genetic markers have high repeatability with limited influence from genotype x environment interaction effects. SNPs have become the preferred choice of markers to genotype maize populations due to their low cost per data point, widespread presence in the genome, specific location at genetic loci, co-dominance, amenable for high-throughput analysis, and lower rates of genotyping errors [26, 27]. SNPs have been used to identify distinct subpopulations [28], determine genetic diversity within and between landraces [29], assess genetic diversity of early maturing white and yellow tropical maize inbred lines [20], determine the rate of decay of linkage disequilibrium [30], and discern population structure [31].

Analyses of the genetic distance and composition of inbred lines are a prerequisite for parental selection and for exploiting heterosis in hybrid breeding programs. Seed Co Ltd systematically bred and selected founder and derived elite maize inbred lines from two major heterotic groups to develop high-performing commercial single-cross and three-way hybrids. However, there is a lack of information on the genetic diversity and relationships of these lines to guide the regional maize breeding program. In this regard, the test lines should be characterized with diagnostic SNP markers to select genetically distinct and complementary lines for marker-assisted hybrid maize breeding to exploit the frequency of beneficial alleles. Therefore, this study aimed to assess the genetic diversity and population structure of 182 founder lines and 866 derived inbred lines of maize using SNP markers to identify genetically complementary lines for hybrid breeding.

Materials and methods

Plant material

The study used 182 elite founder inbred lines and 866 derived inbred lines of maize from tropical and subtropical genetic lineages. The founder lines, sourced from the Seed Co Ltd maize germplasm pool, are elite parental lines selected to develop improved maize varieties. These lines are widely utilized in breeding programs and are prominent in most released hybrid varieties in Zimbabwe. The 866 tropical inbred lines were developed from diverse source populations created by crossing founder lines, selected for their adaptability to tropical and subtropical environments in sub-Saharan Africa (SSA) after rigorous testing. The lines were selected based on desirable agronomic characteristics, such as high yield potential, drought tolerance, and disease resistance. Table 1 summarises the list of lines and their respective heterotic groups [N3 (group 1) and SC (group 2)]. The N3 heterotic group originated from the Salisbury white landrace, which was cultivated in Salisbury (now known as Harare) before the introduction of hybrid maize in 1960. The SC group, on the other hand, was obtained from a landrace grown on Mr Southey’s farm and was named "Southern Cross". The N3 group was specifically designated as "Northern Cross" to highlight the contrast with the SC inbreds.

Table 1. List of the founder and derived lines used in the study.

https://doi.org/10.1371/journal.pone.0315463.t001

Genotyping of inbred lines

Sample collection

The lines were field-established at Rattray Arnold Research Station (RARS) in Zimbabwe. RARS is situated at Longitude 31°12′ 41.35″ E, Latitude 17°40′ 20.07″ S, at an altitude of 1360 metres above sea level. The climate is sub-tropical, with average monthly temperatures ranging from 28 to 32°C between November and April. The total annual precipitation received at RARS is 865mm, mostly between November and April. RARS is located in the mid-altitude moist environments, which are the primary maize-growing areas in Southern Africa. Ten kernel samples were collected for each line. The samples were placed in envelopes, which were sealed and accurately labelled. Healthy and disease-free kernels were sampled for genotyping.

DNA extraction

DNA extraction and SNP genotyping were performed in the Limagrain Laboratory in France following established protocols. Genomic DNA was isolated from maize kernels using the Kompetitive Allele-Specific (KASP) custom method, and genotyping was facilitated by the LGC KASP system (accessible at http://www.lgcgenomics.com). DNA was extracted from ten kernels per inbred line, with positive controls included in the initial six wells of a 96-well plate. DNA purity and concentration were assessed using a Nanodrop device (Nano Vue Plus), and the DNA was diluted to concentrations ranging from 81 to 188 μg/L. The DNA was then arrayed onto 384-well PCR plates and genotyped using 1201 and 1484 SNP KASP markers for the 182 founder lines and 866 derived tropical lines, respectively, covering all ten maize chromosomes. Polymerase Chain Reaction amplification was performed using Kompetitive Allele-Specific PCR (KASP) primers, generating sufficient DNA for genotyping. Plates were scanned using PHERAstar SNP software and processed using KlusterCaller software (http://www.lgcgenomics.com).

Data analysis

Analysis of molecular variance

Analysis of molecular variance (AMOVA) was conducted using GenAlEx 6.51b2 software, following the protocol outlined in [32], to partition the genetic variation detected by the SNPs into among-population and within-population components.

Characterization of SNP markers and inbred lines

SNP markers with poor amplification, uncertain allele identification, or excessive missing data (>10%) were excluded, as were markers with a minor allele frequency of less than 5%. Inbred lines with more than 10% missing data were removed from the analysis. After filtering, 1201 SNP markers (S1 Table in S1 File) were used to genotype the 182 founder lines, and 1484 markers (S2 Table in S1 File) were used to genotype the 866 derived lines. The inbred lines were genotyped using KlusterCaller software (http://www.lgcgenomics.com) to detect single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), enabling a thorough evaluation of genetic variation. The DARwin6 software [33] was used to identify unique clustering patterns and structure, and to generate dendrograms and phylogenetic trees to visualize the relationships between populations.
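The marker filters described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes genotypes are coded as 0, 1 or 2 copies of one allele with `None` for a missing call, and the function name `filter_snps` and the small `demo` matrix are hypothetical.

```python
def filter_snps(genotypes, max_missing=0.10, min_maf=0.05):
    """Return the indices of markers that pass both filters.

    genotypes: list of rows (one per inbred line); each row holds one call
    per marker, coded as 0, 1 or 2 copies of one allele, or None when the
    call is missing (an assumed coding, not taken from the paper).
    """
    n_lines = len(genotypes)
    n_markers = len(genotypes[0])
    kept = []
    for m in range(n_markers):
        calls = [row[m] for row in genotypes if row[m] is not None]
        missing_rate = 1 - len(calls) / n_lines
        if not calls or missing_rate > max_missing:
            continue  # too many missing calls for this marker
        # Allele frequency from called genotypes (two alleles per line).
        freq = sum(calls) / (2 * len(calls))
        maf = min(freq, 1 - freq)
        if maf >= min_maf:
            kept.append(m)
    return kept


# Hypothetical data: 10 lines x 3 markers.
# Marker 0 is balanced, marker 1 has 20% missing calls, marker 2 is monomorphic.
demo = [[0, 0, 0]] * 5 + [[2, 2, 0]] * 3 + [[2, None, 0]] * 2
print(filter_snps(demo))  # [0]
```

The same thresholds could be applied to rows instead of columns to drop inbred lines with more than 10% missing data.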

The 1201 and 1484 markers selected were distributed among the 10 maize chromosomes for the 182 founder and 866 derived lines, respectively. The number of markers per chromosome varied from 59 to 215 in the founder lines and 76 to 231 in the derived lines. The markers were spaced at regular intervals along each chromosome, based on genetic distance expressed in centimorgan (cM). This arrangement ensured full coverage and prevented the clustering of markers, giving a complete overview of the entire genome (Table 2).

Table 2. Positioning of the 1201 and 1484 SNP markers on the maize chromosomes using linkage analysis.

https://doi.org/10.1371/journal.pone.0315463.t002

Genetic parameters were computed, including major allele frequency (MAF), genetic distance (GD), polymorphic information content (PIC) and heterozygosity (He) using Power-Marker (version 3.2.5) statistical software [34]. GD represents the likelihood of two randomly chosen individuals being different at a specific locus, measuring expected heterozygosity [35]. Gene diversity (GD) was calculated as follows [36]:

$$GD = 1 - \sum_{u=1}^{k} p_u^2$$

where

$k$ = number of alleles,

$p_u$ = frequency of the uth marker allele.

Polymorphic information content (PIC) values estimate a marker’s discriminating power by considering the number of alleles. The formula for calculating the PIC was as follows [36]:

$$PIC = 1 - \sum_{u=1}^{k} p_u^2 - \sum_{u=1}^{k-1} \sum_{v=u+1}^{k} 2 p_u^2 p_v^2$$

where:

$k$ = number of alleles,

$p_u$ = frequency of the uth marker allele,

$p_v$ = frequency of the vth marker allele.

The PIC values were categorized as highly informative (PIC > 0.50), moderately informative (0.25 to 0.50) or slightly informative (< 0.25) [34].
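As a worked illustration of the gene diversity and PIC definitions and the three PIC categories, the sketch below computes both statistics from a list of allele frequencies. It assumes the standard forms GD = 1 − Σp² and PIC = GD − ΣΣ 2p_u²p_v²; the function names and the example frequencies (0.7 and 0.3) are hypothetical and not taken from the study's data.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content for one marker."""
    k = len(freqs)
    cross = 0.0
    for u in range(k - 1):
        for v in range(u + 1, k):
            cross += 2.0 * freqs[u] ** 2 * freqs[v] ** 2
    return gene_diversity(freqs) - cross


def pic_class(value):
    """Map a PIC value onto the three categories used in the text."""
    if value > 0.50:
        return "highly informative"
    if value >= 0.25:
        return "moderately informative"
    return "slightly informative"


# Hypothetical bi-allelic SNP with allele frequencies 0.7 and 0.3.
example = [0.7, 0.3]
print(round(gene_diversity(example), 4))  # 0.42
print(round(pic(example), 4))             # 0.3318
print(pic_class(pic(example)))            # moderately informative
```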

Marker call rate

The marker call rate was calculated by determining the proportion of successful genotyping calls for each marker across all samples.

Marker call rate = (Number of successful genotyping calls / Total number of samples) x 100

Where:

  • Number of successful genotyping calls = the number of samples for which a genotype was successfully called for a given marker.
  • Total number of samples = the total number of samples genotyped for a given marker.
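The call-rate formula can be sketched in a few lines. The function name `marker_call_rate` and the use of `None` to mark a failed call are illustrative assumptions.

```python
def marker_call_rate(calls):
    """Percentage of samples with a successful genotype call for one marker.

    calls: one entry per sample; None marks a failed call (assumed coding).
    """
    if not calls:
        raise ValueError("at least one sample is required")
    successful = sum(1 for c in calls if c is not None)
    return 100.0 * successful / len(calls)


# Hypothetical marker scored on 20 samples, 2 of which failed.
calls = ["A:A"] * 10 + ["G:G"] * 8 + [None] * 2
print(marker_call_rate(calls))  # 90.0
```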

Population structure analysis

Genetic data from 1201 and 1484 SNP markers were analyzed using the admixture model-based clustering method in STRUCTURE 2.3.4 software [37] to infer the population structure of 182 founder lines and 866 derived lines, respectively. The analysis settings included a burn-in period of 20,000 iterations, a Markov chain Monte Carlo (MCMC) simulation length of 100,000, and six independent runs for each K value (ranging from 1 to 7) for the 182 founder lines and ten independent runs for each K value (ranging from 1 to 11) for the 866 derived lines. The optimal number of populations (K) was estimated using the Evanno et al. (2005) method, as implemented in the online STRUCTURE Harvester tool [37].
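The ΔK criterion used to choose K can be illustrated with a short sketch. It assumes the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), where L(K) is the mean Ln Pr(Data) across runs at a given K. The function name `delta_k` and the log-probability values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of Ln Pr(Data) values, one per run.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the ends of the K range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result


# Hypothetical Ln Pr(Data) values from two runs per K.
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-888.0, -892.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k)  # 2
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, which is one reason to test a range of K wider than the expected number of subpopulations.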

Cluster analysis

The genetic relationships among the inbred lines were discerned using DARwin software [33]. The neighbor-joining (NJ) method [38] was used to construct phylogenetic trees with 500 bootstraps using the 1201 SNP marker data for the 182 founder lines and the 1484 SNP marker data for the 866 derived lines. Continuous dissimilarity indices were generated using the standard Euclidean dissimilarity measure, following the formula described by Perrier and Jacquemoud-Collet [33], enabling the construction of dendrograms that illustrate the genetic distances between the lines:

where d_ij is the dissimilarity between units i and j; x_ik and x_jk are the values of variable k for units i and j; x̄ represents the global mean; and K indicates the number of variables. The output from DARwin was imported into FigTree version 1.4.3 software [39] to construct the final phylogenetic trees.
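For orientation, a plain Euclidean dissimilarity between two genotype profiles can be sketched as below. This assumes the textbook form d_ij = sqrt(Σ_k (x_ik − x_jk)²) on numerically coded genotypes. DARwin's exact scaling is not confirmed here, and the line names and codings are hypothetical.

```python
import math
from itertools import combinations


def euclidean_dissimilarity(a, b):
    """Euclidean distance between two equally long numeric profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same markers")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(profiles):
    """Pairwise dissimilarities keyed by (name_i, name_j)."""
    return {
        (i, j): euclidean_dissimilarity(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


# Hypothetical lines scored at four markers (0, 1 or 2 copies of one allele).
profiles = {
    "line_A": [0, 2, 2, 0],
    "line_B": [0, 2, 0, 2],
    "line_C": [0, 2, 2, 0],
}
matrix = dissimilarity_matrix(profiles)
print(round(matrix[("line_A", "line_B")], 3))  # 2.828
print(matrix[("line_A", "line_C")])            # 0.0
```

A matrix of this kind is the input that a neighbor-joining routine would turn into a tree.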

Results

Summary statistics for the 182 founder and 866 derived lines

Table 3 presents the genetic diversity parameters based on SNP markers for the 182 founder and 866 derived lines. The observed heterozygosity for the 182 founder lines averaged 0.08, ranging from 0.01 to 0.24. Gene diversity ranged from 0.00 to 0.44, with a mean of 0.25, indicating a moderate level of genetic variation. The mean major allele frequency was 0.85, ranging from 0.50 to 0.99. The polymorphic information content (PIC) ranged from 0.00 to 0.50, with a mean of 0.37. Most markers (82%) were moderately informative (PIC ≥ 0.25), while 18% were slightly informative (PIC < 0.25) (Fig 1). Minor allele frequencies ranged from 0.10 to 0.50, with a mean of 0.28. The marker call rate was high, with a mean of 89.99%, varying from 85.00% to 95.00%.

Fig 1.

Frequency distribution and marker frequency of 182 parental inbred lines (a and b) and 866 derived inbred lines (c and d) calculated using 1201 and 1484 SNP markers, respectively, based on polymorphic information content.

https://doi.org/10.1371/journal.pone.0315463.g001

Table 3. Genetic diversity parameters for 182 founder and 866 derived inbred lines of maize calculated using 1201 and 1484 SNP markers, respectively.

https://doi.org/10.1371/journal.pone.0315463.t003

For the 866 derived lines, the major allele frequencies based on the 1,484 SNP markers ranged from 0.50 to 0.99, with a mean of 0.82. Gene diversity ranged from 0.00 to 0.34, with a mean of 0.25, while observed heterozygosity averaged 0.08, ranging from 0.00 to 0.21. The PIC values ranged from 0.00 to 0.50, with a mean of 0.41. Most markers (84%) were moderately informative (PIC ≥ 0.25), while 16% were slightly informative (PIC < 0.25) (Fig 1). The minor allele frequencies ranged from 0.08 to 0.50, with a mean of 0.30. The marker call rate ranged from 37.22% to 100%, with a mean of 96.64%, indicating a high genotyping success rate.

Analysis of molecular variance

Analysis of molecular variance (AMOVA) revealed significant genetic differences (P ≤ 0.001) among and within populations (Table 4). Most genetic variation was attributed to within-population variation, accounting for 97% and 88.38% of the total variation in the founder and derived lines, respectively (Table 4).

Table 4. Summary of analysis of molecular variance comparing among and within maize populations of 182 founder inbred lines and 866 derived inbred lines based on 1201 and 1484 SNP markers, respectively.

https://doi.org/10.1371/journal.pone.0315463.t004

Population structure of the germplasm panel

The population structure analysis demarcated the 182 maize lines into three distinct subpopulations (Fig 2), with ΔK peaking at K = 3. The tripartite division captured the underlying genetic diversity, with each cluster varying in size and composition. Subpopulation 1 consisted of 38 lines, Subpopulation 2 had 100 lines, and Subpopulation 3 had 44 lines (S4 Table in S1 File). Further, the 866 maize lines were partitioned into two subpopulations (Fig 3), with the highest ΔK value at K = 2, indicating a bipartite division. The two clusters, identified by the highest median log-probability values (Ln(Pr(Data))), differed in size and composition, with cluster 1 composed of 328 lines and cluster 2 having 538 lines (S5 Table in S1 File).

Fig 2. Three sub-populations discerned for the 182 founder inbred lines of maize genotyped using 1201 SNP markers.

A: Best Delta K estimation via the Evanno method. B: Estimated population structure of 182 maize inbred lines revealed by 1201 SNP markers for K = 3, where I = Sub-population 1, II = Sub-population 2, III = Sub-population 3.

https://doi.org/10.1371/journal.pone.0315463.g002

Fig 3. Two sub-populations resolved among the 866 derived inbred lines of maize using 1484 SNP markers.

A: Best Delta K estimation via the Evanno method. B: Estimated population structure of 866 maize inbred lines revealed by 1484 SNP markers for K = 2, where I = Sub-population 1, II = Sub-population 2.

https://doi.org/10.1371/journal.pone.0315463.g003

Cluster analysis

Genetic distance and cluster analysis of 182 founder inbred lines

Genetic distance estimates based on SNP markers among the 182 maize inbred lines revealed variable genetic diversity, ranging from 0.006 (16AG16785 vs 16AG16786) to 0.435 (RGS-PL33 vs RGS-PL44) (S6 Table in S1 File). The mean genetic distance for all pairwise comparisons was 0.25, indicating moderate genetic diversity among the lines. Low genetic distances were detected between several pairs of inbred lines, including 16AG16786 and 16AG16785 (0.006), 16AG16801 and 16AG16802 (0.013), and RGS-PL17 and RGS-PL55 (0.021), suggesting a high degree of genetic similarity between these lines. In contrast, high genetic distances were estimated between RGS-PL33 and RGS-PL44 (0.435) and between 15ARG152 and RGS-PL44 (0.432), indicating a more distant genetic relationship.

Genetic grouping based on population structure analysis confirmed the results of the cluster diagram that resolved the 182 genotyped inbred lines into three major clusters (Fig 4). Each cluster was partitioned into sub-clusters, with Cluster II being the largest (comprising 55% of parental inbred lines), followed by Cluster III (24% of inbred lines), and Cluster I (21% of inbred lines). These clusters corresponded to sub-populations 1 (red), 2 (green), and 3 (blue) from the structure analysis, respectively. The phylogenetic tree provided a visual representation of the genetic relationships among the inbred lines, supporting the findings of the population structure analysis.

Fig 4. Cluster diagram showing the relationships between 182 inbred lines based on 1201 SNP markers.

See S3 Table in S1 File for codes of genotypes.

https://doi.org/10.1371/journal.pone.0315463.g004

Genetic distances and cluster analysis of 866 derived inbred lines

The pairwise genetic distances based on 1484 SNP markers for the 866 derived lines ranged from 0.004 to 0.336 (S7 Table in S1 File), with a mean genetic distance of 0.14. Most lines (77%) had genetic distances ranging from 0.004 to 0.20, while 33% had distances above 0.20, ranging from 0.20 to 0.34 (Fig 5). The lowest genetic distance (0.004) was recorded between the inbred lines G17NL211 and G17NL210. Other pairs of lines with low genetic distances included G17NL473 and G17NL472 (0.005), G17NL194 and G17NL472 (0.008), G16NL854 and G16NL857 (0.009), G17NL602 and G17NL603 (0.011), and G16NL919 and G16NL920 (0.015). Conversely, the highest genetic distance was recorded between lines G15NL337 and G15NL312 (0.336), followed by G15NL349 and G15NL310 (0.31), G15NL303 and G15NL357 (0.303), G15NL327 and G15NL353 (0.301), G15NL292 and G15NL284 (0.299), and G15NL355 and G15NL301 (0.298). Cluster analysis of the 866 derived lines based on SNP marker genetic distance estimates grouped the lines into two distinct clusters (Fig 6), agreeing with the structure analysis (Fig 3). Each cluster was partitioned into sub-clusters. Cluster I, corresponding to Sub-population 1 from the population structure analysis (highlighted green), was designated as heterotic group 1 and comprised 328 lines (37.88%). Cluster II, corresponding to Sub-population 2 (highlighted red), was designated as heterotic group 2 and comprised 538 lines (62.12%).

Fig 5. The distribution of pairwise genetic distance calculated for 866 maize inbred lines genotyped with 1484 SNPs.

https://doi.org/10.1371/journal.pone.0315463.g005

Fig 6. Phylogenetic relationships among 866 derived inbred lines, as inferred from 1484 SNP markers revealing distinct clusters and branches.

See S5 Table in S1 File for codes of genotypes.

https://doi.org/10.1371/journal.pone.0315463.g006

Discussion

Knowledge of genetic diversity, structure, and genetic relationships among elite inbred line populations of maize is vital for selecting genetically diverse and complementary lines for hybrid breeding and for enhancing heterotic groups. Hybrid varieties are the highest yielders, buffer biotic and abiotic stresses, and deliver economic returns. The best cross combinations are selected based on maximum heterosis, which requires the development of homozygous elite lines and their subsequent assignment to contrasting heterotic patterns. Inbred lines with shared ancestry may lead to heterogeneous populations, limiting hybrid vigour. Inbred lines can be crossed, and a new generation of derived lines can be selected that are genetically complementary and distinct from other heterotic populations [20, 27].

High-throughput genotyping methods effectively determine and delineate the genetic relationships among inbred populations. Diagnostic molecular markers provide a high level of accuracy in identifying the genetic constitution of individuals to guide hybrid breeding [13, 16, 40]. SNP markers are widely used to assess genetic diversity and relationships [28, 41], select breeding parents [42, 43], and identify novel genes linked to economically important traits [44, 45]. SNPs are also the genetic markers of choice due to several advantages, including their low cost per data point, widespread presence in the genome, specific location at genetic loci, co-dominance, and lower rates of genotyping errors [26, 27].

The present study evaluated the extent of genetic diversity, population structure, and genetic lineage of a maize germplasm panel consisting of 182 founder lines and 866 derived inbred lines using 1201 and 1484 SNP markers, respectively. The analysis revealed insights into allele frequency, which is crucial for understanding the extent of genetic variation among populations [46]. The major allele frequency varied from 0.50 to 0.99 (Table 3), revealing adequate genetic variation among the test populations for selection and potential for marker-assisted breeding programs. The mean major allele frequencies for the 1201 and 1484 SNP markers were 0.82 and 0.85, respectively, indicating a high frequency of the major alleles. The mean minor allele frequencies were 0.28 and 0.30 for the 1201 and 1484 markers, respectively (Table 3), indicating a moderate distribution of alleles among the inbred lines. This was consistent with previous reports by [15, 47], but lower than those reported by [48, 49]. Minor alleles indicate genetic variations of a specific gene or genetic marker crucial for preserving diversity within a population [15, 23, 50].

The assessed tropical elite maize lines had an average observed heterozygosity of 0.08 (Table 3), indicating a high degree of homozygosity and genetic stability. However, since inbred lines with heterozygosity > 5% are considered impure, some of the lines in this study may need additional selfing to reach the desired level of genetic purity. These findings align with [20], who reported lower heterozygosity in early-maturing tropical maize inbred lines using SNP analysis. Low mean heterozygosity values of 0.20 were also reported by [51, 52]. The polymorphism information content predicts the relevance of a genetic marker for linkage analysis [31, 53]. In this study, the mean PIC values for the 1201 and 1484 markers were 0.37 and 0.41, respectively (Table 3), indicating that the SNP markers used were moderately informative. The moderate genetic polymorphism detected in this study is in line with the expected characteristics of bi-allelic SNP markers, which are limited to a maximum PIC value of 0.5. Nevertheless, the PIC values obtained in this study are still informative and can be used to evaluate the genetic diversity and relationships among the tropical elite maize lines [54, 55]. SNPs also have lower mutation rates than other genetic markers [56] and are more precise in genetic analysis, although they show lower PIC values than other markers, such as SSRs [57].

Genetic distance (GD) measures the genetic difference or dissimilarity among genotypes in a population and can be used to infer their genetic relationships [20, 24, 49]. The present results showed considerable genetic variability among the inbred lines (Table 3). Genetic distances among the founder lines ranged from 0.006 to 0.44, with a mean of 0.25. The values suggest high genetic diversity and differentiation levels in the founder parental lines. The genetic distance values detected in the founder lines agree with previous studies in maize [20, 58]. However, these values were lower than those reported by other researchers [13, 28], who reported average genetic distance values of 0.32, as well as [59, 60], who reported values of 0.37. The derived elite lines showed moderate genetic distances (mean of 0.13), indicating a reasonable degree of genetic variation among individuals while sharing significant genetic similarity (Table 3).

The SNP panels used in this study identified adequate genetic polymorphisms among and within the inbred line populations (Table 4). The low to moderate diversity detected within the founder and derived inbred lines was attributable to genetic drift, founding effects, artificial selection, genetic recombination and linkage disequilibrium. In the founder inbred line populations, within-population variation accounted for 97% of the total genetic variation. In contrast, the variation among populations was low, at 3%, suggesting high gene flow due to outcrossing, genetic drift, founding effect and artificial selection. In the derived lines, among-population variation was 11.62%, while the within-population variation was higher at 88.38% (P ≤ 0.001), signifying that most of the genetic variation was partitioned within populations and suggesting a high level of genetic heterogeneity within them. The high within-population genetic diversity exhibited by both elite sets of lines is a valuable asset for breeding and conservation purposes.

Population structure analysis determines the genetic ancestry of inbred lines [61, 62]. The SNP-based analysis delineated three subpopulations (K = 3) for the founder lines (Fig 2). The SNP markers effectively categorized the inbred lines into heterotic groups based on their source populations, grouping individuals with similar genetic backgrounds into the same subpopulations. The identified population groups can guide the selection of parental lines in breeding programs. The current results agree with [20], who reported three subpopulations among 94 early maturing tropical maize inbred lines using SNP markers. Similarly, the dendrogram from the neighbour-joining cluster analysis (Fig 4) allocated the founder inbred lines into three genetic groups. Population structure analysis of the 866 derived lines revealed two subpopulations (K = 2) (Fig 3). The inbred lines were assigned to heterotic groups based on similarity of ancestry and selection history. The neighbour-joining cluster analysis supported the findings of the population structure analysis. The grouping of the derived lines into two subpopulations is consistent with findings by [13], who identified two groups (K = 2) among 770 maize inbred lines using SNPs.

Conclusion

The present study assessed the genetic diversity and population structure of a maize germplasm panel comprising 182 founder lines and 866 derived inbred lines using diagnostic SNP markers and identified genetically unique lines for hybrid breeding. Most of the genetic variation, 97% and 88.38%, was attributed to within-population differences in the founder and derived lines, respectively. Population structure analysis identified three distinct genetic groups among founder lines and two among derived lines. Based on pairwise genetic comparison, the following founder and derived inbred lines were selected: G15NL337 and G15NL312 (Cluster 1), 15ARG152 and RGS-PL44 (Cluster 2), RGS-PL44 and 15ARG149 (Cluster 2), and RGS-PL33 and RGS-PL44 (Cluster 2), respectively. The selected lines are genetically distinct and recommended for marker-assisted hybrid maize breeding to exploit the frequency of beneficial alleles. The study identified novel genetically distant founder lines (i.e., 15ARG152 and RGS-PL44, and RGS-PL44 and 15ARG149) and derived lines (G15NL337 and G15NL312). The SNP markers identified with high polymorphism information content are valuable for genomic selection, genetic analysis and breeding. The core findings of the study are valuable references for maize breeding programs in Africa when using the current and related tropical-adapted populations.

Acknowledgments

The authors thank Seed Co Limited for supporting the first author's PhD study and the African Centre for Crop Improvement (ACCI), University of KwaZulu-Natal, for technical and scientific support.

References

  1. FAOStat. (2022). FAOSTAT. FAO Stat. FAO, Rome. https://www.fao.org/faostat/en/
  2. Cairns J. E., Hellin J., Sonder K., Araus J. L., MacRobert J. F., Thierfelder C., & Prasanna B. M. (2013). Adapting maize production to climate change in sub-Saharan Africa. Food Security, 5(3), 345–360.
  3. Erenstein O., Jaleta M., Sonder K., Mottaleb K., & Prasanna B. M. (2022). Global maize production, consumption and trade: trends and R&D implications. Food security, 14(5), 1295–1319.
  4. Dragomir V., Ioan Sebastian B., Alina B., Victor P., Tanasă L., & Horhocea D. (2022). An overview of global maize market compared to Romanian production. Romanian Agriculture Research, 39, 535–544.
  5. Geyser J. M., Pretorius A., & Fourie A. (2024). Trends in and determinants of South African maize exports in the post-deregulation era. Journal of Economic and Financial Sciences, 17(1), 862.
  6. Leitner S., Pelster D. E., Werner C., Merbold L., Baggs E. M., Mapanda F., & Butterbach-Bahl K. (2020). Closing maize yield gaps in sub-Saharan Africa will boost soil N2O emissions. Current Opinion in Environmental Sustainability, 47, 95–105.
  7. Santpoort R. (2020). The drivers of maize area expansion in sub-Saharan Africa. How policies to boost maize production overlook the interests of smallholder farmers. Land, 9(3), 68.
  8. Siatwiinda S. M., Supit I., van Hove B., Yerokun O., Ros G. H., & de Vries W. (2021). Climate change impacts on rainfed maize yields in Zambia under conventional and optimized crop management. Climatic Change, 167, 1, 39–23.
  9. Thomas A. (2020). Improving Crop Yields in Sub-Saharan Africa—What Does the East African Data Say. IMF Working Papers, 20(95).
  10. Beyene Y., Gowda M., Suresh L. M., Mugo S., Olsen M., Oikeh S. O., Juma C., Tarekegne A., & Prasanna B. M. (2017). Genetic analysis of tropical maize inbred lines for resistance to maize lethal necrosis disease. Euphytica, 213(9), 1–13. pmid:32009665
  11. Varshney R. K., Bohra A., Roorkiwal M., Barmukh R., Cowling W. A., Chitikineni A.,… & Siddique K. H. (2021). Fast-forward breeding for a food-secure world. Trends in Genetics, 37(12), 1124–1136. pmid:34531040
  12. Swarup S., Cargill E. J., Crosby K., Flagel L., Kniskern J., & Glenn K. C. (2021). Genetic diversity is indispensable for plant breeding to improve crops. Crop Science, 61(2), 839–852.
  13. Lu Y., Yan J., Guimarães C. T., Taba S., Hao Z., Gao S., Chen S., Li J., Zhang S., Vivek B. S., Magorokosho C., Mugo S., Makumbi D., Parentoni S. N., Shah T., Rong T., Crouch J. H., & Xu Y. (2009). Molecular characterization of global maize breeding germplasm based on genome-wide single nucleotide polymorphisms. Theoretical and Applied Genetics, 120(1), 93–115. pmid:19823800
  14. Melani M. D., & Carena M. J. (2005). Alternative Maize Heterotic Patterns for the Northern Corn Belt. Crop Science, 45(6), 2186–2194.
  15. Prasanna B. M. (2012). Diversity in global maize germplasm: characterization and utilization. Journal of Biosciences, 37(5), 843–855. pmid:23107920
  16. Zhou S., Wei F., Nguyen J., Bechner M., Potamousis K., Goldstein S., Pape L., Mehan M. R., Churas C., Pasternak S., Forrest D. K., Wise R., Ware D., Wing R. A., Waterman M. S., Livny M., & Schwartz D. C. (2009). A single molecule scaffold for the maize genome. PLoS Genetics, 5(11). pmid:19936062
  17. Giordani W., Scapim C. A., Ruas P. M., Ruas C. de F, Contreras-Soto R., Coan M., Fonseca I. C. de B., & Gonçalves L. S. A. (2019). Genetic diversity, population structure and AFLP markers associated with maize reaction to southern rust. Bragantia, 78(2), 183–196.
  18. Legesse B. W., Myburg A. A., Pixley K. V., & Botha A. M. (2006). Genetic diversity analysis of CIMMYT-mid-altitude maize inbred lines using AFLP markers. South African Journal of Plant and Soil, 23(1), 49–53.
  19. Reif J. C., Xia X. C., Melchinger A. E., Warburton M. L., Hoisington D. A., Beck D., Bohn M., & Frisch M. (2004). Genetic Diversity Determined within and among CIMMYT Maize Populations of Tropical, Subtropical, and Temperate Germplasm by SSR Markers. Crop Science, 44(1), 326–334.
  20. Adu G. B., Badu-Apraku B., Akromah R., Garcia-Oliveira A. L., Awuku F. J., & Gedil M. (2019). Genetic diversity and population structure of early-maturing tropical maize inbred lines using SNP markers. PLoS ONE, 14(4), 1–12. pmid:30964890
  21. Buckler E. S., Gaut B. S., & McMullen M. D. (2006). Molecular and functional diversity of maize. Current Opinion in Plant Biology, 9(2), 172–176. pmid:16459128
  22. Yan J., Shah T., Warburton M. L., Buckler E. S., McMullen M. D., & Crouch J. (2009). Genetic characterization and linkage disequilibrium estimation of a global maize collection using SNP markers. PloS one, 4(12), e8451. pmid:20041112
  23. Wen W., Araus J. L., Shah T., Cairns J., Mahuku G., Bänziger M.,… & Yan J. (2011). Molecular characterization of a diverse maize inbred line collection and its potential utilization for stress tolerance improvement. Crop Science, 51(6), 2569–2581.
  24. Talabi A. O., Badu‐Apraku B., & Fakorede M. A. B. (2017). Genetic variances and relationship among traits of an early maturing maize population under drought‐stress and low nitrogen environments. Crop Science, 57(2), 681–692.
  25. Govindaraj M., Vetriventhan M., & Srinivasan M. (2015). Importance of genetic diversity assessment in crop plants and its recent advances: an overview of its analytical perspectives. Genetics research international, 2015(1), 431487. pmid:25874132
  26. Mammadov J., Aggarwal R., Buyyarapu R., & Kumpatla S. (2012). SNP markers and their impact on plant breeding. International journal of plant genomics, 2012(1), 728398. pmid:23316221
  27. Semagn K., Magorokosho C., Vivek B. S., Makumbi D., Beyene Y., Mugo S., Prasanna B. M., & Warburton M. L. (2012). Molecular characterization of diverse CIMMYT maize inbred lines from eastern and southern Africa using single nucleotide polymorphic markers. BMC Genomics, 13(1). pmid:22443094
  28. van Inghelandt D., Melchinger A. E., Lebreton C., & Stich B. (2010). Population structure and genetic diversity in a commercial maize breeding program assessed with SSR and SNP markers. Theoretical and Applied Genetics, 120(7), 1289–1299. pmid:20063144
  29. Arca M., Mary-Huard T., Gouesnard B., Bérard A., Bauland C., Combes V.,… & Nicolas S. D. (2021). Deciphering the genetic diversity of landraces with high-throughput SNP genotyping of DNA bulks: methodology and application to the maize 50k array. Frontiers in Plant Science, 11, 568699. pmid:33488638
  30. Lu Y., Shah T., Hao Z., Taba S., Zhang S., Gao S.,… & Xu Y. (2011). Comparative SNP and haplotype analysis reveals a higher genetic diversity and rapider LD decay in tropical than temperate germplasm in maize. PloS one, 6(9), e24861. pmid:21949770
  31. Zhang X., Zhang H., Li L., Lan H., Ren Z., Liu D., Wu L., Liu H., Jaqueth J., Li B., Pan G., & Gao S. (2016). Characterizing the population structure and genetic diversity of maize breeding germplasm in Southwest China using genome-wide SNP markers. BMC Genomics, 17(1) 1–16. pmid:27581193
  32. Excoffier L., Smouse P. E., & Quattro J. M. (1992). Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131(2), 479–491. pmid:1644282
  33. Perrier X., & Jacquemoud-Collet J. P. (2006). DARwin software: Dissimilarity analysis and representation for windows. http://darwin.cirad.fr/darwin
  34. Liu K., & Muse S. V. (2005). PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics, 21(9), 2128–2129. pmid:15705655
  35. Ellegren H., & Galtier N. (2016). Determinants of genetic diversity. Nature Reviews Genetics, 17(7), 422–433. pmid:27265362
  36. Botstein D., White R. L., Skolnick M., & Davis R. W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. American journal of human genetics, 32(3), 314. pmid:6247908
  37. Earl D. A., & vonHoldt B. M. (2012). STRUCTURE HARVESTER: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 4(2), 359–361.
  38. Saitou N., & Nei M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution, 4(4), 406–425. pmid:3447015
  39. Rambaut A. (2016). FigTree 1.4.3. http://tree.bio.ed.ac.uk/software/figtree
  40. Badu-Apraku B., Garcia-Oliveira A. L., Petroli C. D., Hearne S., Adewale S. A., & Gedil M. (2021). Genetic diversity and population structure of early and extra early maturing maize germplasm adapted to sub-Saharan Africa. BMC Plant Biology, 21(1), 1–16. pmid:33596835
  41. Singh N., Choudhury D. R., Singh A. K., Kumar S., Srinivasan K., Tyagi R. K.,… & Singh R. (2013). Comparison of SSR and SNP markers in estimation of genetic diversity and population structure of Indian rice varieties. PloS one, 8(12), e84136. pmid:24367635
  42. Chen H., He H., Zou Y., Chen W., Yu R., Liu X.,… & Deng X. W. (2011). Development and application of a set of breeder-friendly SNP markers for genetic analyses and molecular breeding of rice (Oryza sativa L.). Theoretical and applied genetics, 123, 869–879. pmid:21681488
  43. Spindel J., Wright M., Chen C., Cobb J., Gage J., Harrington S.,… & McCouch S. (2013). Bridging the genotyping gap: using genotyping by sequencing (GBS) to add high-density SNP markers and new value to traditional bi-parental mapping and breeding populations. Theoretical and applied genetics, 126, 2699–2716. pmid:23918062
  44. Mammadov J. A., Chen W., Ren R., Pai R., Marchione W., Yalçin F.,… & Kumpatla S. P. (2010). Development of highly polymorphic SNP markers from the complexity reduced portion of maize [Zea mays L.] genome for use in marker-assisted breeding. Theoretical and applied genetics, 121, 577–588. pmid:20401646
  45. Rafalski J. A. (2002). Novel genetic mapping tools in plants: SNPs and LD-based approaches. Plant science, 162(3), 329–333.
  46. Chacón S M. I., Pickersgill B., Debouck D. G., & Arias J. S. (2007). Phylogeographic analysis of the chloroplast DNA variation in wild common bean (Phaseolus vulgaris L.) in the Americas. Plant Systematics and Evolution, 266, 175–195.
  47. Suwarno W. B., Pixley K. V., Palacios-Rojas N., Kaeppler S. M., & Babu R. (2015). Genome-wide association analysis reveals new targets for carotenoid biofortification in maize. Theoretical and Applied Genetics, 128(5), 851–864. pmid:25690716
  48. Muthusamy V., Hossain F., Thirunavukkarasu N., Choudhary M., Saha S., Bhat J. S., Prasanna B. M., & Gupta H. S. (2014). Development of β-carotene rich maize hybrids through marker-assisted introgression of β-carotene hydroxylase allele. PLoS ONE, 9(12), 1–22. pmid:25486271
  49. Oyekunle M., Badu-Apraku B., & Hearne S. (2015). Genetic diversity of tropical early-maturing maize inbreds and their performance in hybrid combinations under drought and optimum growing conditions. Field Crops Research, 170, 55–65. https://www.sciencedirect.com/science/article/pii/S0378429014002834
  50. Reif J. C., Xia X. C., Melchinger A. E., Warburton M. L., Hoisington D. A., Beck D., Bohn M., & Frisch M. (2004). Genetic Diversity Determined within and among CIMMYT Maize Populations of Tropical, Subtropical, and Temperate Germplasm by SSR Markers. Crop Science, 44(1), 326–334.
  51. Yao Q., Yang K., Pan G., & Rong T. (2007). Genetic diversity of maize (Zea mays L.) landraces from Southwest China based on SSR data. Journal of genetics and genomics, 34(9), 851–860. pmid:17884695
  52. Musundire L., Derera J., Dari S., Tongoona P., & Cairns J. E. (2019). Molecular characterisation of maize introgressed inbred lines bred in different environments. Euphytica, 215(3).
  53. Meti N., Samal K. C., Bastia D. N., & Rout G. R. (2013). Genetic diversity analysis in aromatic rice genotypes using microsatellite based simple sequence repeats (SSR) marker. African Journal of Biotechnology, 12(27), 4238.
  54. Eltaher S., Sallam A., Belamkar V., Emara H. A., Nower A. A., Salem K. F.,… & Baenziger P. S. (2018). Genetic diversity and population structure of F3: 6 Nebraska winter wheat genotypes using genotyping-by-sequencing. Frontiers in genetics, 9, 76. pmid:29593779
  55. Luo Z., Brock J., Dyer J. M., Kutchan T., Schachtman D., Augustin M.,… & Abdel-Haleem H. (2019). Genetic diversity and population structure of a Camelina sativa spring panel. Frontiers in plant science, 10, 184. pmid:30842785
  56. Coates B. S., Sumerford D. V., Miller N. J., Kim K. S., Sappington T. W., Siegfried B. D., & Lewis L. C. (2009). Comparative performance of single nucleotide polymorphism and microsatellite markers for population genetic analysis. Journal of Heredity, 100(5), 556–564. pmid:19525239
  57. Helyar S. J., Hemmer‐Hansen J., Bekkevold D., Taylor M. I., Ogden R., Limborg M. T.,… & Nielsen E. E. (2011). Application of SNPs for population genetics of nonmodel organisms: new opportunities and challenges. Molecular ecology resources, 11, 123–136. pmid:21429169
  58. Dao A., Sanou J., Mitchell S. E., Gracen V., & Danquah E. Y. (2014). Genetic diversity among INERA maize inbred lines with single nucleotide polymorphism (SNP) markers and their relationship with CIMMYT, IITA, and temperate lines. BMC Genetics, 15(1), 1–14. pmid:25421948
  59. Wu X., Lund M. S., Sun D., Zhang Q., & Su G. (2015). Impact of relationships between test and training animals and among training animals on reliability of genomic prediction. Journal of Animal Breeding and Genetics, 132(5), 366–375. pmid:26010512
  60. Yang X., Gao S., Xu S., Zhang Z., Prasanna B. M., Li L., Li J., & Yan J. (2011). Characterization of a global germplasm collection and its potential utilization for analysis of complex quantitative traits in maize. Molecular Breeding, 28(4), 511–526.
  61. Dube S. P., Sibiya J., & Kutu F. (2023). Genetic diversity and population structure of maize inbred lines using phenotypic traits and single nucleotide polymorphism (SNP) markers. Scientific Reports, 13(1), 17851. pmid:37857752
  62. Nkhata W., Shimelis H., Melis R., Chirwa R., Mzengeza T., Mathew I., & Shayanowako A. (2020). Population structure and genetic diversity analyses of common bean germplasm collections of East and Southern Africa using morphological traits and high-density SNP markers. Plos one, 15(12), e0243238. pmid:33338076