Last-percent improvement in eligibility rates of crop seeds based on quality evaluation using near-infrared imaging spectrometry

As the world population continues to grow, the need for high-quality crop seeds that promise stable food production is increasing. Conversely, excessive demand for high quality is causing “seed loss and waste” due to slight shortfalls in eligibility rates. In this study, we applied near-infrared imaging spectrometry combined with machine learning techniques to evaluate germinability and paternal haplotype in crop seeds from 6 species and 8 cultivars. Candidate discriminants for quality evaluation were derived by linear sparse modeling using the seed reflectance spectra as explanatory variables. To systematically proceed with model selection, we defined the sorting condition where the recovery rate of seeds matches the initial eligibility rate (iP) as “standard condition”. How much the eligibility rate after sorting (P) increases from iP under this condition offers a reasonable criterion for ranking candidate models. Moreover, the model performance under conditions with adjusted discrimination strength was verified using a metric “relative precision” (rP) defined as (P–iP)/(1–iP). Because rP, compared to precision (= P), is less dependent on iP in relation to recall (R), i.e., recovery rate of eligible seeds, the rP-R curve and area under the curve also offer useful criteria for spotting better discriminant models. We confirmed that the batches of seeds given higher discriminant scores by the models selected with reference to these criteria were more enriched with eligible seeds. The method presented can be readily implemented in developing a sorting device that enables “last-percent improvement” in eligibility rates of crop seeds.


Introduction
The world population continues to grow, reaching 8 billion in 2022, and is expected to reach 8.5 billion in 2030 and 9.7 billion in 2050 [1]. The issue of stable production and supply of food is thus becoming more and more important for the sustainability of human society. Developing high-yielding cultivars and spreading them around the world would be one of the effective measures to address this issue. However, no matter what cultivar is used for crop production, the superior traits of that cultivar will not be exhibited without high-quality seedlings. Since ancient times in Japan, there has been a term "nae-hansaku", meaning that "half of the growth […]
Competing interests: The authors have read the journal's policy and have the following competing interests: YO is an employee of the commercial company, Tokita Seed Co., Ltd. OM has a patent application pending (application number 2023-89036) related to the methods of seed quality evaluation based on near-infrared imaging spectrometry. The patent does not restrict use of the methods for academic research and education, and other noncommercial purposes. The authors declare that these do not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.
practical phase, or its social implementation has not progressed smoothly. One factor pointed out as an impediment to the spread of this technique is the high cost of the required equipment, namely NIR multi- or hyperspectral cameras [9]. Given their active use in the food industry [10], however, this should not be the root cause. Although handling this technique requires multidisciplinary expertise, the time has come for individuals and teams with such aptitude in the field of seed science and industry to undertake a consistent effort from research through to practical application.
Although this may be just one of many attempts around the world, we are aiming to develop a sorting device based on NIR imaging spectrometry to abolish "seed loss and waste". Here we demonstrate a practical workflow from the occurrence of an ineligible seed lot to the start of sorting by the envisioned device, which will pave the way to improving the eligibility rate of seeds to the last percent.

Seed materials
Seeds were prepared and provided by Tokita Seed Co., Ltd., Japan. Information for all seed materials used in this study, including harvest locations and years, is summarized in Table 1.

Near-infrared hyperspectral imaging
NIR hyperspectral images of seeds were captured using an imaging system composed of a push-broom NIR hyperspectral camera (CV-N801HS, Sumitomo Electric Industries Ltd., Japan), a NIR lens with a focal length of 30 mm (Sumitomo Electric Industries Ltd.), and a motorized shutter device (Lambda SC + IQ35-SA SmartShutter, Sutter Instrument Company, CA, USA) attached in front of the lens. Illumination was supplied through a facing pair of anhydrous synthetic quartz-transmission lights (PDL-S-250VAB, NIPPON P·I Co., Ltd., Japan) placed adjacent to DC halogen lamps with aluminum reflectors (PLL-250/AL, NIPPON P·I Co., Ltd.). Seeds were aligned on shallow wells (1-mm depth) of microplate-formatted (24, 48 or 96 wells) matt black-finished custom polyvinyl chloride (PVC) trays. They were transferred by a stepping motor-driven (RKS545AAD-FC30LA-1, Oriental Motor Co., Ltd., Japan) conveyor device (CSSK50-T-100-100, NKE Corporation, Japan). Images were taken twice with the trays rotated 180°, in preparation for mutual outer cross-validation in the subsequent discriminant modeling. The camera was equipped with a 16-bit cooled indium-gallium-arsenide and gallium-arsenide-antimonide (InGaAs/GaAsSb) type-II superlattice (T2SL) focal-plane array (FPA) sensitive in the wavelength range of 950-2,350 nm, and was capable of acquiring a line image 320 pixels wide with 256 spectral bands in a single exposure. As the pixel size of the FPA was 30×30 μm and the horizontal field-of-view (FOV) was set to 89.6 mm by applying a working distance of 280 mm, the resulting images had a spatial resolution of ca. 90 ppi with a spectral sampling pitch of 6 nm. Brightness of the images was calibrated to absolute reflectance by linear interpolation using an image of the 99% reflectance area of a multistep diffuse reflectance standard (SRT-MS-050, Labsphere Inc., NH, USA) and a darkfield image (0% reflectance).
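The two-point reflectance calibration described above can be sketched as follows. This is a minimal illustration, not the C# implementation used in the study: the function name `calibrate_reflectance`, the per-band list layout and the example counts are all assumptions made for this sketch.

```python
def calibrate_reflectance(raw, dark, white, white_reflectance=0.99):
    """Two-point linear calibration of one pixel spectrum.

    raw, dark, white: per-band intensities (same length) of the sample,
    the darkfield image (0% reflectance) and the white reference area.
    """
    reflectance = []
    for r, d, w in zip(raw, dark, white):
        span = w - d
        if span <= 0:
            raise ValueError("white reference must be brighter than darkfield")
        # Linear interpolation between (d, 0.0) and (w, white_reflectance).
        reflectance.append(white_reflectance * (r - d) / span)
    return reflectance


# Hypothetical 16-bit counts for three wavebands of a single pixel.
raw = [26000.0, 11000.0, 51000.0]
dark = [1000.0, 1000.0, 1000.0]
white = [51000.0, 51000.0, 51000.0]
print(calibrate_reflectance(raw, dark, white))
```

A pixel as bright as the white reference maps to 0.99 and a pixel at the darkfield level maps to 0, so the scale is anchored at both ends for every waveband independently.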

High-resolution imaging of seed appearance
A portion of the seeds that underwent NIR hyperspectral imaging was subjected to high-resolution appearance photography using an imaging system composed of an 8k color CMOS linescan camera (EV71YC4CCL8005-BA0, Teledyne e2v Ltd., CA, USA), a 0.4× object-side telecentric lens (MPHC-04F-150, Optart Corporation, Japan), and a pair of high-uniformity white-LED bar lights (IDBA-HMS150WHV-S, Leimac Ltd., Japan). Seeds were placed in KingFisher 96 microplates (Thermo Fisher Scientific Inc., MA, USA) or aligned on shallow wells (1-mm depth) of microplate-formatted (24, 48 or 96 wells) matt black or white-finished custom PVC trays. They were transferred by a stepping motor-driven (RKS564AAD-TS30-1, Oriental Motor Co., Ltd.) conveyor device (GVHA-100-1000-6-NV-NM-NH-D-R, MISUMI Group Inc., Japan). As the pixel size of the line sensor was 5×5 μm, the resulting images had a spatial resolution of 2,032 ppi. White balance correction of the images was carried out by using appropriate reflectance areas of the multistep diffuse reflectance standard (SRT-MS-050, Labsphere Inc.).
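The spatial resolutions quoted for the two imaging systems follow from simple arithmetic, which the sketch below reproduces. The helper names are hypothetical; the inputs (320 pixels over an 89.6 mm FOV, and 5 μm pixels behind a 0.4× lens) are taken from the text.

```python
MM_PER_INCH = 25.4


def ppi_from_fov(pixels_across, fov_mm):
    """Pixels per inch when a known pixel count spans a known field of view."""
    return pixels_across / (fov_mm / MM_PER_INCH)


def ppi_from_sensor(pixel_pitch_um, magnification):
    """Pixels per inch on the object side of a lens with the given magnification."""
    object_pixel_mm = pixel_pitch_um / magnification / 1000.0
    return MM_PER_INCH / object_pixel_mm


print(round(ppi_from_fov(320, 89.6), 1))    # NIR hyperspectral system
print(round(ppi_from_sensor(5.0, 0.4), 1))  # 8k line-scan system
```

The two values come out at about 90.7 ppi and 2,032 ppi, in line with the "ca. 90 ppi" and "2,032 ppi" stated for the respective systems.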

Germination test and haplotype analysis
After NIR hyperspectral and/or high-resolution imaging, seeds were subjected to germination tests as prescribed by ISTA [2]. In brief, seeds were sown on a sheet of moist filter paper laid in a plastic container (lettuce, bunching onion, tomato and cauliflower) or on soil (squash and pea), and incubated until the germination deadline under light and temperature conditions suitable for each crop species: continuous light at 20°C for 6 d (lettuce) or 9-10 d (bunching onion); 16/8 h light/dark photoperiod at 30/20°C for 5 d (cauliflower) or 10-11 d (tomato); greenhouse under natural daylength where the temperature was within the range of 20-30°C for 7-8 d in summer (squash), or 12-22°C for 13-16 d in fall to winter (pea). No pretreatment for germination was applied in this study.
The cauliflower cultivar No. 47 is intended to be used as an intraspecific F1 hybrid. Because self-incompatibility in female flowers relaxes as they approach the end of their lifetime, complete prevention of undesired inbred seed formation has yet to be achieved in this cultivar. To clarify whether each cauliflower seed was an F1 hybrid or inbred, paternal haplotype was determined by zymogram banding patterns of phosphoglucomutase isozymes [11] extracted from seedlings post-germination.

Discriminant modeling for seed quality evaluation
Datasets prepared through the analysis of each seed batch in Table 1 are summarized in Table 2. A schematic of the following procedures is shown in S1 Fig.
Dataset preparation. NIR reflectance spectra (R) of individual seeds were obtained by averaging the spectra recorded in seed-occupied pixels in the NIR hyperspectral image. Eleven additional spectra ($X_2$ to $X_{12}$) were derived from R (= $X_1$) by applying reciprocal ($R^{-1}$) [12,13] or pseudo-absorbance ($-\log R$) [14] transformation aimed at improving linearity, in combination with or without standard normal variate (SNV) transformation for baseline correction and/or Savitzky-Golay (SG) smoothing filter [15] for denoising (S1 Fig, step 1). SNV transformation was carried out using a valid spectral range defined as 980-2,200 nm (corresponding to 201 wavebands). The SG filter was applied with a 3rd-order polynomial and a filter length of 5. All 12 spectra were examined as explanatory variables for predicting quality traits of individual seeds. For training discriminant models, the quality trait of seeds to predict (objective variable y) was assigned a dummy variable: 1 for seeds that were later clarified as eligible and −1 for others (S1 Fig, step 1). For prediction of germinability, seeds that germinated normally were assigned 1, while those that germinated abnormally, i.e., the ones that did not grow into healthy seedlings after emergence, and non-germinated seeds, which we regarded as dead, were assigned −1. For prediction of paternal haplotype of cauliflower seeds, F1 hybrid and inbred seeds were assigned 1 and −1, respectively. For direct prediction of fully eligible cauliflower seeds with commercial value, F1 hybrids that germinated normally were assigned 1, and all others −1.
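The derivation of the 12 candidate spectra can be sketched in plain Python as below. Several details are assumptions made for illustration and are not stated in the text: the order of operations (SG smoothing before SNV), the base-10 logarithm, leaving the two edge points on each side unsmoothed, and applying SNV to whatever range is passed in (the study restricts it to 980-2,200 nm). The variant labels are hypothetical as well. The filter weights (−3, 12, 17, 12, −3)/35 are the standard SG smoothing coefficients for a window of 5 and a cubic polynomial.

```python
import math

# Savitzky-Golay smoothing weights for window length 5, polynomial order 3.
SG_COEFFS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum."""
    mean = sum(spectrum) / len(spectrum)
    var = sum((v - mean) ** 2 for v in spectrum) / (len(spectrum) - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in spectrum]


def savgol5(spectrum):
    """Smooth interior points; the two points at each edge are left as-is (assumption)."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window))
    return out


def derive_spectra(reflectance):
    """Return 12 variants: 3 base transforms x (SG or not) x (SNV or not)."""
    bases = {
        "R": list(reflectance),
        "1/R": [1.0 / v for v in reflectance],
        "-logR": [-math.log10(v) for v in reflectance],  # log base is an assumption
    }
    variants = {}
    for name, base in bases.items():
        for use_sg in (False, True):
            for use_snv in (False, True):
                s = savgol5(base) if use_sg else list(base)
                s = snv(s) if use_snv else s
                label = name + ("+SG" if use_sg else "") + ("+SNV" if use_snv else "")
                variants[label] = s
    return variants


# Hypothetical mean reflectance of one seed over seven wavebands.
example = [0.30, 0.32, 0.35, 0.33, 0.31, 0.36, 0.40]
print(sorted(derive_spectra(example)))
```

Three base transforms combined with two on/off options give 3 × 2 × 2 = 12 spectra, matching the count of $X_1$ to $X_{12}$.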
Derivation of candidate discriminant models. Datasets composed of y and each of the 12 NIR spectra ($X_1$ to $X_{12}$) were subjected to discriminant modeling for seed quality evaluation. Models were basically derived by partial least squares discriminant analysis (PLS-DA) [16,17], but to improve their interpretability and performance, we introduced a step for preselecting explanatory variables (wavebands) by L1 regularization. Specifically, an ordinary PLS-DA model was first derived, and its standardized partial regression coefficients (SPRCs) were denoted β (S1 Fig, step 2). The reciprocal of the absolute values of β was then set as the penalty factor to execute the adaptive least absolute shrinkage and selection operator (LASSO) [18] (S1 Fig, step 3). Finally, the reduced explanatory variable combinations presented in the LASSO solution path were reapplied to PLS-DA to complete a series of candidate discriminant models (S1 Fig, step 4). Scripts and datasets for replicating these procedures are provided as S1 File. How to select superior models from the candidates is one of the subjects of this study, and so will be addressed in the Results section.
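The variable-preselection step (step 3) can be illustrated with a small coordinate-descent solver for the weighted L1 problem. This is only a sketch under stated assumptions: it is not the 'glmnet' code used in the study, the initial coefficients are simply passed in as a list (standing in for the SPRCs of a PLS-DA fit, which is not reproduced here), the columns of X and y are assumed to be centred so that no intercept is needed, and all function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def adaptive_lasso(X, y, initial_beta, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam * sum_j w_j*|b_j|, w_j = 1/|initial_beta_j|."""
    n, p = len(X), len(X[0])
    weights = [1.0 / max(abs(b), 1e-12) for b in initial_beta]
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def selection_path(X, y, initial_beta, lambdas):
    """For each penalty strength, list the indices of variables kept (non-zero)."""
    path = []
    for lam in lambdas:
        beta = adaptive_lasso(X, y, initial_beta, lam)
        path.append((lam, [j for j, b in enumerate(beta) if b != 0.0]))
    return path


# Toy data: y depends on the first column only; the second is irrelevant.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1.0, 1.0, -1.0, -1.0]
print(selection_path(X, y, initial_beta=[1.0, 0.05], lambdas=[2.0, 0.1, 0.001]))
```

Variables with a small initial coefficient receive a large penalty and drop out first. Each distinct variable set along the path would then be refitted with PLS-DA to form a candidate model, a step not shown here.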

Software
Software for operating the NIR hyperspectral and 8k line-scanning camera systems, and for analyzing hyperspectral images, was developed in managed C# code using Microsoft Visual Studio Community edition (version 2022, https://visualstudio.microsoft.com/) with the extensions Emgu CV (version 4.7.0, https://www.emgu.com/), a .NET wrapper to the OpenCV image processing library (version 4.7.0, https://www.opencv.org/), and Extreme Optimization Numerical Library for .NET (version 8.1.19, https://www.extremeoptimization.com/). The software for directly visualizing discriminant scores of seeds within hyperspectral images is provided as S2 File, which works on 64-bit versions of Microsoft Windows operating systems with the .NET 6 runtime environment. Discriminant modeling with adaptive LASSO and PLS-DA algorithms was performed in R statistical software (version 4.3.1, https://www.r-project.org/) using the packages 'glmnet' and 'pls'. Variable importance in projection (VIP) [19,20] of the resulting PLS-DA models was calculated using the R package 'plsVarSel'. All graphs were drawn using 'ggplot2' and related R packages, except the heat maps, which were drawn using RINEARN Graph 3D software (version 5.6.32, https://www.rinearn.com/graph3d/).

Appearance and NIR reflectance spectra of eligible and ineligible crop seeds
Fig 1 shows high-resolution images of seeds from all crop species used in this study, which are arranged according to their quality as clarified by subsequent germination test and/or haplotype analysis. For all crop species, there were no features in the seed appearance that could guide us to discriminate whether each is eligible or not. In fact, […]

Defining metrics for discriminant model selection
(1) "Standard condition" and related metrics. To predict quality traits of individual seeds from their NIR reflectance spectra with practical accuracy, machine learning techniques were employed to derive discriminant models for seed quality evaluation. What is desired of the models is that they give high and low scores, respectively, to seeds in the eligible and ineligible classes. But how can we spot the best few among many candidates? To find good criteria for model selection, we start with a thought experiment using three hypothetical models with different discriminability: near-perfect (NP), medium- (MP) and poor-performance (PP). For eligible and ineligible seeds, respectively, the model NP gives scores according to normal distributions N(1, 0.25) and N(−1, 0.25), MP according to N(0.2, 0.25) and N(−0.2, 0.25), and PP according to N(0.05, 0.25) and N(−0.05, 0.25) (Table 3). In real scenes of seed management, iP in each seed lot is usually known through small-scale preliminary tests conducted shortly after harvest. What is desired for post-harvest seed sorting […]. We therefore define the condition where the recovery rate of seeds matches iP as "standard condition". Metrics definable under this condition include "standard LST" (sLST) as the lowest score of seeds to be recovered, and "standard precision" (sP) as the eligibility rate after sorting (Fig 3; rows 1, 4 and 5). Clearly, the better the discrimination performance of the model applied, the higher the sP when tested on the same seed batch (Table 3). Below, we may also use the term "standard" to refer to other concepts and metrics under this condition.
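As a minimal sketch of the standard condition, the function below recovers the top-scored seeds until the recovery rate equals iP and reports sLST and sP. The function name and the toy scores are hypothetical, labels follow the 1/−1 coding used for y, and ties at the threshold are ignored for simplicity.

```python
def standard_condition(scores, labels):
    """Return (iP, sLST, sP) for seeds scored by a discriminant model.

    labels: 1 for eligible seeds, -1 for others.
    """
    n = len(scores)
    n_eligible = sum(1 for y in labels if y == 1)
    if n_eligible == 0:
        raise ValueError("no eligible seeds in the batch")
    ip = n_eligible / n
    # Recovery rate equal to iP means recovering as many seeds as are eligible.
    n_recover = n_eligible
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    recovered = ranked[:n_recover]
    slst = recovered[-1][0]  # lowest score among recovered seeds
    sp = sum(1 for _, y in recovered if y == 1) / n_recover
    return ip, slst, sp


scores = [0.9, 0.8, 0.1, 0.7, -0.5]
labels = [1, 1, 1, -1, -1]
print(standard_condition(scores, labels))
```

Because the number of recovered seeds equals the number of eligible seeds under this condition, precision and recall share the same denominator, so sP coincides with the recall at sLST.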
(2) "Relative precision" and related metrics. Even with the PP model, P is adjustable; it increases as the recovery rate of seeds is reduced by raising LST (Fig 3C; rows 4 and 5). If sP falls short of a desired level, discrimination strength should be tightened by raising LST above sLST. Then, how can we evaluate the overall performance of discriminant models, including the performance under LST-adjusted "non-standard" conditions?
The performance of discriminant models is often evaluated using ROC (receiver operating characteristic) or PR (precision-recall) curves. In non-defective item selection, as with seed sorting, acceptance of ineligible items (false positives) is usually less favorable than rejection of eligible ones (false negatives). For such purposes, the PR curve is preferred over the ROC curve due to its high sensitivity to false-positive errors [21]. For both curves, a larger area under the curve (AUC) is taken to indicate better discrimination performance.
The area under the PR curve (AUC-PR), also called average precision, is a useful metric in comparing the performance of multiple discriminant models applied to the same dataset. On the other hand, it has the drawback of being dependent on iP, making it unsuitable for verifying whether a particular model performs equally on different datasets. To remedy this problem, we introduce "relative precision" (rP) defined as:
$$\mathit{rP} = \frac{P - \mathit{iP}}{1 - \mathit{iP}} \tag{1}$$
which is equivalent to P scaled to the closed interval of [0, 1]. In connection, we also define the rPR curve as the relationship between rP and R, and the area under the rPR curve (AUC-rPR) as an alternative to AUC-PR. Fig 4 shows PR and rPR curves for discrimination when individuals in seed batches with 80% or 60% iP are scored by the three hypothetical models. Additionally, Table 3 summarizes AUC-PR and AUC-rPR in these cases along with metrics related to standard performance. Though AUC-rPR is not fully independent of iP, the effect of iP on it is far smaller than on AUC-PR. Accordingly, AUC-rPR reduces the risk of overestimating the performance of poor models when tested on seed batches with higher iP. We also paid attention to the difference between standard (sA) and maximum accuracy (Amax) to check whether sorting under standard condition sufficiently draws out the potential of discriminant models (Table 3). As mentioned, however, what is desired for seed sorting is not maximizing the accuracy (A) of discrimination, but raising P to the desired level.
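The relationship between the PR and rPR curves can be sketched as follows, using rP = (P − iP)/(1 − iP). The step-wise integration used here mirrors the usual definition of average precision; how the areas were integrated in the study is not stated, so this is an assumption, as are the function names.

```python
def relative_precision(p, ip):
    """rP = (P - iP) / (1 - iP)."""
    return (p - ip) / (1.0 - ip)


def rpr_curve(scores, labels):
    """(recall, precision, relative precision) at each rank where an eligible seed is recovered."""
    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0 or n_pos == n:
        raise ValueError("need both eligible and ineligible seeds")
    ip = n_pos / n
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)]
    points, tp = [], 0
    for k, y in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            p = tp / k
            points.append((tp / n_pos, p, relative_precision(p, ip)))
    return points


def step_area(points, index):
    """Step-wise area under column `index` of the curve plotted against recall."""
    area, prev_recall = 0.0, 0.0
    for point in points:
        area += (point[0] - prev_recall) * point[index]
        prev_recall = point[0]
    return area


curve = rpr_curve([0.9, 0.4, 0.6, -0.2, 0.1], [1, 1, -1, -1, 1])
print("AUC-PR :", step_area(curve, 1))
print("AUC-rPR:", step_area(curve, 2))
```

With this integration rule the two areas are linked by the same rescaling as the pointwise metrics, AUC-rPR = (AUC-PR − iP)/(1 − iP), so a model that ranks seeds at random scores near 0 rather than near iP.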

Features of selected discriminant models
(1) Effects of waveband selection. While discriminant models can be derived by any algorithm such as PLS-DA, support vector machine (SVM), and even deep learning, PLS-DA has been preferred due to the relatively high performance and interpretability of its models. Yet, PLS-DA models with non-zero partial regression coefficients (PRCs) for hundreds of explanatory variables (wavebands) are still too complex to grasp the basis for discriminability. We therefore introduced a step for variable selection before executing PLS-DA to reduce the number of wavebands incorporated in the models. As this procedure is somewhat specific to this study, we evaluated how it affected the structure and performance of the resulting PLS-DA models. The models examined were basically selected for being frequently high-ranked in iterative derivation.
Figs 5 and 6 show standardized PRCs (SPRCs) and VIP for each waveband in sparse and non-sparse discriminant models derived with and without waveband selection, respectively. The magnitude of their absolute values can be interpreted as the relative importance of each waveband in discrimination. As summarized in Table 4, the NIR spectra ($X_1$ to $X_{12}$) employed as the explanatory variable varied between sparse and non-sparse models, as well as between models for different crop species and cultivars. For all corresponding pairs, the number of wavebands incorporated in sparse models was 6-10 times smaller than in their non-sparse counterparts. Accordingly, it is easier with sparse models to identify the wavebands where the quality of seeds is reflected.
Table 5 compares performance of sparse and non-sparse models using metrics devised above.In all cases, sparse models outperformed their non-sparse counterparts as evaluated by sP and AUC-rPR.We will only focus on sparse discriminant models hereafter.
(2) Distribution of discriminant scores. As evident in Table 5, all selected models worked as expected in that sP exceeded iP, albeit to varying extents.
Fig 4 caption (fragment): […] Relative precision (rP) is defined in Eq 1. The X-axis in all plots represents recall (R, recovery rate of eligible seeds), while the Y-axis in (1) and (2) represents precision (P, eligibility rate after sorting) and rP, respectively. https://doi.org/10.1371/journal.pone.0291105.g004
Of the 10 pairs of models listed in Table 5, 8 are intended for predicting germinability of seeds. In deriving these models, both non-germinated and abnormally germinated seeds were equally labeled as ineligible. Nevertheless, normally germinated seeds occupied the highest score range, followed by abnormally germinated and then non-germinated seeds, in all cases regardless of target species and cultivars (Fig 7A-7F, 7J and 7K; rows 2 and 3). Thus, scores given by these models may reflect not only the probability of normal germination but also the inherent vigor of seeds.
(3) Application domain of discriminant models. The performance evaluation of discriminant models in Table 5 and Fig […] As usual in discriminant analysis, the models performed worse in external than in internal validation in most cases (Tables 5 and 6). Nevertheless, insofar as the dataset was from the cultivar the model was intended for, the deterioration in discrimination performance was not so severe as to impair its practicality. This is easier to grasp by comparing AUC-rPR.
(4) Consistency of discriminant scores. While discriminant scores of eligible seeds were always distributed over a higher range than those of ineligible seeds, the absolute scores deviated on each testing occasion, even in cases where seed batches from the same lot were used. This is evident from the discrepancy in sLST between internal and external validations and, where applicable, between multiple external validations (Tables 5 and 6), or can be noticed by comparing the corresponding sets of graphs in […] Since NIR hyperspectral images of seeds, the source of discriminant scores, were taken at each testing occasion, changes in the condition of seed materials or the imaging system over time could have compromised the consistency of absolute scores. Such state changes cannot be completely prevented in practice, so the validity of absolute discriminant score values such as sLST is ad hoc. As mentioned, however, LST is convertible to seed recovery rate, and similarly, absolute discriminant scores to relative quality rank of seeds. Thus, the adjustment of discrimination strength should be done each time by setting an LST back-calculated from the target seed recovery rate, not an LST that worked well in the past.
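Back-calculating LST from a target recovery rate amounts to taking a rank-based cut-off, as in the sketch below. The function names are hypothetical and the constant offset between the two "occasions" is an invented example of score drift, not data from the study.

```python
def lst_for_recovery(scores, recovery_rate):
    """Lowest score to be recovered so that about recovery_rate of the seeds pass."""
    n_keep = min(len(scores), max(1, round(recovery_rate * len(scores))))
    return sorted(scores, reverse=True)[n_keep - 1]


def recovered_indices(scores, lst):
    """Indices of seeds whose score is at or above the threshold."""
    return {i for i, s in enumerate(scores) if s >= lst}


# Same five seeds scored on two occasions; the second run is shifted by -0.4.
occasion_a = [0.9, 0.2, 0.5, -0.3, 0.7]
occasion_b = [s - 0.4 for s in occasion_a]

lst_a = lst_for_recovery(occasion_a, 0.6)
lst_b = lst_for_recovery(occasion_b, 0.6)
print(lst_a, lst_b)
print(recovered_indices(occasion_a, lst_a) == recovered_indices(occasion_b, lst_b))
```

The two thresholds differ in absolute value, yet the same seeds are recovered because only the rank order matters. Reusing the first threshold on the shifted scores would have recovered fewer seeds than intended.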

Quality evaluation of cauliflower seeds for commercial value
As mentioned, each seed of cauliflower cultivar No. 47 is regarded as fully eligible, i.e., commercially valuable, only if it is an F1 hybrid and can germinate normally. Whether these two requirements are met can be predicted either separately or together. Then which works better in terms of performance and usability in discriminating fully eligible cauliflower seeds? Below, we compare methods for seed quality evaluation involving multiple trait requirements.
(1) Evaluation by successive per-trait discrimination.Among discriminant models derived above, the model m-Ca47g predicts whether a cauliflower seed can germinate normally regardless of its paternal haplotype.The model m-Ca47h, on the other hand, was derived to predict only whether a cauliflower seed is an F1 hybrid, while ignoring its germinability.Unlike other models, these two need to be used in combination to discriminate whether a seed of interest is fully eligible.
In case such a "successive" method is taken, the relationship between LST settings for the two models and the metrics related to discrimination performance for fully eligible seeds becomes three-dimensional. This can be represented planarly using heatmaps as in Fig 9. PR and rPR relationships will also be three-dimensional, i.e., curved surfaces instead of curved lines, as depicted in S4 Fig. Table 7 shows the volume under the PR and rPR curved surfaces (denoted as "AUC-PR" and "AUC-rPR" for consistency), along with performance metrics when both m-Ca47g and m-Ca47h models are used under their respective standard conditions. While AUC-rPR is sufficiently high, a severe disadvantage of this method is the difficulty in adjusting discrimination strength to achieve the desired P in case sP falls short of it.
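The two-threshold nature of the "successive" method can be sketched as a grid evaluation, where a seed is accepted only if it passes both per-trait thresholds. The function names and toy scores are hypothetical; the grid returned here is the kind of table a heatmap would display.

```python
def successive_metrics(scores_g, scores_h, fully_eligible, lst_g, lst_h):
    """Precision, recall and recovery rate when both per-trait thresholds must be met."""
    accepted = [g >= lst_g and h >= lst_h for g, h in zip(scores_g, scores_h)]
    n_accepted = sum(accepted)
    n_eligible = sum(fully_eligible)
    tp = sum(1 for a, e in zip(accepted, fully_eligible) if a and e)
    return {
        "recovery_rate": n_accepted / len(accepted),
        "precision": tp / n_accepted if n_accepted else float("nan"),
        "recall": tp / n_eligible if n_eligible else float("nan"),
    }


def threshold_grid(scores_g, scores_h, fully_eligible, lsts_g, lsts_h):
    """Metrics for every combination of the two thresholds."""
    return {
        (lg, lh): successive_metrics(scores_g, scores_h, fully_eligible, lg, lh)
        for lg in lsts_g
        for lh in lsts_h
    }


# Hypothetical germinability (g) and haplotype (h) scores for four seeds.
scores_g = [1.0, 0.5, -0.5, 0.8]
scores_h = [0.9, -0.2, 0.7, 0.6]
fully_eligible = [True, False, False, True]

grid = threshold_grid(scores_g, scores_h, fully_eligible, [-1.0, 0.0], [-1.0, 0.0])
for key in sorted(grid):
    print(key, grid[key])
```

Several threshold pairs can yield the same recovery rate with different precision, which is one way to see why tuning discrimination strength is harder with two thresholds than with one.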
(2) Direct evaluation omitting per-trait discrimination. In evaluating the commercial value of seeds, it is not necessarily required to discriminate their quality per trait. In contrast to the preceding models, the model m-Ca47 only predicts whether a cauliflower seed is fully eligible, regardless of whether its germinability or paternal haplotype is ineligible. The properties of this model have already been presented alongside those of other models (Figs 7H and 8J and S2H and S3J Figs).
As summarized in Table 7, metrics related to standard performance were comparable between the "direct" method with the model m-Ca47 and the "successive" method referring to both m-Ca47g and m-Ca47h scores. As seen in Fig 7H and S2H Fig, m-Ca47 has the advantage that the discrimination strength can be adjusted through a single LST, while the performance is practically high. Another advantage of the "direct" method is that only one discriminant model needs to be derived regardless of the number of quality traits of interest. This method can be taken if its disadvantage, namely that per-trait quality of seeds (germinability and haplotype) cannot be predicted, is acceptable.
(3) Evaluation by unified score of multiple per-trait discrimination. To combine the advantages of the "successive" and "direct" methods described above, i.e., predictability of per-trait quality and high adjustability of discrimination strength, we explored how to integrate scores from multiple discriminant models into a single representative score.
Table 6 footnote: Abbreviations not mentioned are common to those in Tables 3 and 4. https://doi.org/10.1371/journal.pone.0291105.t006
As mentioned, the distribution of discriminant scores can vary between testing occasions, even for scores given by a single model to seed batches from the same lot. Also, the scores given by different discriminant models are not comparable with each other. To compensate for such inconsistency of primary discriminant scores, we propose the following score conversion analogous to variable standardization:
$$\hat{z}_{(M,D)} = \frac{\hat{y}_{(M,D)} - \mathit{sLST}(\hat{y}_p)}{s(\hat{y}_p)} \tag{2}$$
where (M, D) denotes combined use of the model M and dataset D (usually devoid of y), and $\hat{y}$ denotes the primary scores. $\hat{y}_p$ is a subset of $\hat{y}$ for predicting the standard deviation (SD) and sLST of $\hat{y}$, with the respective estimates denoted as $s(\hat{y}_p)$ and $\mathit{sLST}(\hat{y}_p)$.
Fig 8 caption (fragment): […] 2 and 4-6. Different line colors are used according to the type of validation (dark gray for internal validation) and, for the cases of external validation, whether the dataset under analysis is from a different batch (blue), lot (green), or cultivar (magenta) compared to the internal dataset for model M. The range of the X-axis is common to that shown at the bottom (C, G and K). https://doi.org/10.1371/journal.pone.0291105.g008
In evaluating seed quality based on multiple traits, it is reasonable to take the lowest rating as the representative, as rejection should be preferred over acceptance in case of doubt. The "standardized" nature of $\hat{z}$ is an advantage in unifying scores from multiple discriminant models. When the quality of seeds in dataset D is evaluated using n models, $M_1$ to $M_n$, the unified discriminant score can be represented as:
$$\hat{u} = \min\left(\hat{z}_{(M_1,D)}, \ldots, \hat{z}_{(M_n,D)}\right) \tag{3}$$
The score $\hat{u}$ can again be treated like the score $\hat{y}$, and sLST for this can also be determined. The m-Ca47gh is an integrated model of m-Ca47g and m-Ca47h, in which the primary $\hat{y}$ scores from the respective source models are unified through the above procedures. […] 7, performance deterioration in external compared to internal validation, a measure of model overfitting, was less pronounced with the model m-Ca47gh than with m-Ca47. This "unification" method can be taken if a set of discriminant models with reasonable performance is available, each for predicting one of multiple quality traits.
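The unification procedure can be sketched as below, under explicit assumptions. The conversion is taken to subtract the sLST estimate and divide by the SD estimate, both obtained from a pilot subset of scores, as the description of s(ŷp) and sLST(ŷp) suggests. iP is assumed to be known from a preliminary test, the sample SD is used, and the unified score is the per-seed minimum of the converted scores. All function names and numbers are hypothetical.

```python
import statistics


def standard_lst(scores, ip):
    """Lowest score recovered when the recovery rate equals iP."""
    n_keep = min(len(scores), max(1, round(ip * len(scores))))
    return sorted(scores, reverse=True)[n_keep - 1]


def standardize(scores, pilot_scores, ip):
    """Shift scores so that sLST maps to 0, then scale by the pilot SD."""
    sd = statistics.stdev(pilot_scores)  # sample SD; the choice is an assumption
    slst = standard_lst(pilot_scores, ip)
    return [(y - slst) / sd for y in scores]


def unify(z_lists):
    """Per-seed minimum over the converted scores of several models."""
    return [min(zs) for zs in zip(*z_lists)]


# Hypothetical primary scores from two models for the same five seeds.
y_germ = [2.0, 1.0, 0.0, -1.0, -2.0]
y_hapl = [0.3, 0.9, 0.5, 0.1, 0.7]

z_germ = standardize(y_germ, y_germ, ip=0.6)
z_hapl = standardize(y_hapl, y_hapl, ip=0.8)
print(unify([z_germ, z_hapl]))
```

After conversion, a score of 0 marks the standard threshold of each model, so taking the minimum compares the models on a common footing, and a seed rated below 0 by either model ends up below 0 in the unified score.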

Visualization of discriminant scores and verification of seed quality
To verify the feasibility of automatic eligible seed sorting based on the discriminant models derived above, we developed a system to directly visualize the discriminant scores of seeds within the FOV of NIR hyperspectral images. Fig 10 shows the quality of squash and pea seeds as predicted using the models m-Sq5 and m-Pe19, respectively. The program and data files for retesting are provided as S2 File. The same is possible for seeds from crop species other than squash and pea.
We then separated high- and low-ranked seeds on the basis of score images and tested for their germinability and/or paternal haplotype. Table 8 summarizes the results of cultivation tests where seeds of different predicted quality ranks were sown simultaneously. Fig 11 shows germination beds photographed before the deadline. For all crop species, normal germination was observed more frequently from batches of high-ranked than of low-ranked seeds. For cauliflower cultivar No. 47, more inbred seedlings emerged from batches of seeds with lower m-Ca47h scores (Table 8).
As anticipated from the relatively low AUC-rPR (0.4176, see Table 6), the efficacy of seed sorting in lettuce was inferior to that in other crop species listed in Table 8. Nevertheless, growth of lettuce seedlings from high-ranked seeds was apparently more vigorous than that of seedlings from low-ranked seeds.

Discussion
We demonstrated a method for seed quality evaluation widely applicable to a variety of crop species, which is based on NIR imaging spectrometry and machine learning techniques. In most cases presented here, seed batches with higher discriminant scores were more enriched in eligible seeds (Table 8 and Fig 11). On the other hand, we also encountered cultivars for which no satisfactory models for discriminating eligible seeds could be derived (e.g., bunching onion cultivar No. 51; see Table 5 and Fig 8E). Even including such unfortunate cases, the advantage of the method is that it takes only a couple of hours from the completion of routine seed quality inspections, such as germination tests, until it becomes clear whether discrimination of eligible seeds is possible for the given seed lot. Where possible, the derived discriminant model can be implemented in a sorting device to immediately start "repairing" seed lots that otherwise would not meet quality standards and would have to be discarded. The only additional operation required is to capture a NIR hyperspectral image of seeds before subjecting them to quality inspection; this allows a dataset to be established linking their spectroscopic and quality traits as the explanatory and objective variables, respectively (S1 Fig, step 1).
When aiming for pass/fail discrimination by machine learning techniques, we tend to expect that a discriminant model, once derived, will be usable semi-permanently for the same purpose. However, at least in our experience, the crop species and cultivars that produce ineligible seed lots are not necessarily fixed; the occurrence of such lots is often accidental and becomes the very first opportunity for troubleshooting in that cultivar. In the case of cauliflower seeds, differences in physicochemical properties between F1 hybrid and inbred seeds should be consistent regardless of seed lot. By contrast, there are various causes for the occurrence of poorly germinating seeds, and hence it makes sense to derive an optimal model for discriminating eligible seeds on a case-by-case basis. For example, the properties of seeds that have lost the potential to germinate normally due to high temperature differ from those of seeds damaged by over-humidity, and therefore the boundaries (possibly defined as hyperplanes or hypersurfaces) separating them from eligible seeds should also be different from each other. This seems consistent with the observation that combined use of the models (m-Ca47g and m-Ca47h) specialized for predicting germinability or paternal haplotype was somewhat better in overall performance than using a single model (m-Ca47) when discriminating fully eligible cauliflower seeds (Table 8). Even eligible seeds themselves, unlike industrial products, can have somewhat different properties from lot to lot, so it would also be difficult to define an enclosed boundary separating them from ineligible seeds of any kind.
Although we mentioned that it takes only a short while, the procedure we employed to derive discriminant models for seed quality evaluation is not necessarily computationally efficient (S1 Fig, steps 2-4; S1 File). To respond promptly to case-by-case needs for seed quality reparation, efficient and rapid procedures should be explored continually. Sparse PLS and its variants [22,23] are among the methods we should try early for deriving discriminant models with high interpretability and performance; these may help reduce or even eliminate the need for iterative model derivation due to their high reproducibility. Whatever methods are used for deriving discriminant models, though, the concepts and metrics presented in this study, such as "standard" condition and performance as well as "AUC-rPR", will help systematically proceed with model selection.
One of the issues not fully addressed in this study is exploring the implications of the derived discriminant models (Fig 5). Near to mid-wave infrared (IR) light induces vibrational level changes in wavelength-specific chemical bonds, and hence the IR spectrum of a substance acts like a fingerprint of its chemical composition. Additionally, IR light up to the shortwave range is suitable for non-destructive measurement [6]. Based on an observation that coniferous tree species for afforestation (Cupressaceae and Pinaceae) tend to form sterile seeds with abnormality in endosperm storage lipids, we have developed and put into practice a method and device for isolating fertile seeds in these species. There we focused on the property of the methylene group (-CH2-), which is abundant in fatty acid side chains and preferentially absorbs NIR light centered at 1,730 nm (1st overtone of antisymmetric stretching) or 1,200 nm (2nd overtone of symmetric stretching) [24,25]. Conversely, in an attempt to estimate the elemental composition of seeds (in rice and cypress), we noticed that absorption wavebands of the peptide group (-CONH-) were preferentially selected in sparse linear models for regressing nitrogen content (e.g., around 1,920 nm attributable to the 2nd overtone of amide I (C=O) stretching). Peptide groups are abundant in proteins, the major storage forms of nitrogen in seeds. In this way, dissection of the wavebands selected in sparse discriminant models and the magnitude of SPRC and/or VIP for each waveband as in Fig 5 helps to clarify the substances that make the difference between eligible and ineligible seeds. Accumulation of such information will form a valuable knowledge base for realizing a management system to stably maintain and supply high-quality seeds.
Another technical issue that remains to be solved is how to spot an appropriate LST to achieve P (eligibility rate after sorting) nearly in line with expectations. Though sP (P under standard condition) is somewhat affected by iP (eligibility rate before sorting), it roughly averaged 86% when discriminant models were externally validated on seed batches from the cultivars they were intended for (Table 6). This rate is likely unsatisfactory for most seedling producers, so LST must be raised at the cost of some decline in R (recovery rate of eligible seeds), i.e., more eligible seeds being discarded. The relationship between P and the recovery rate of seeds usually has a convex shape as in Fig 3B (row 5; see also Fig 7 and S2 Fig) unless a discriminant model with poor performance as in Fig 3C is used (e.g., see Fig 7J). Hence, this may be approximated at worst as a straight line passing through the points (0, 1) and (1, iP). However, while making efforts on verifying the feasibility of seed quality evaluation in as many crop species and cultivars as possible, we could not spare much effort on replicate tests using as many seed batches from the same cultivar as possible. We are not yet at the stage where we can propose a reliable means to adjust the discrimination strength as intended.
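The straight-line approximation mentioned above can be turned into a rough worked example. The following Python sketch is only an illustration under that assumption: the line through (0, 1) and (1, iP) is P(x) = 1 - (1 - iP)x, where x is the seed recovery rate, and recall follows from R = P·x/iP. The function name worst_case_operating_point is hypothetical, and this is not a procedure proposed or validated in this study.

```python
def worst_case_operating_point(iP, target_P):
    """Conservative seed recovery rate and recall for a target P.

    Assumes (as an illustration only) that P falls linearly from 1 to iP
    as the seed recovery rate x goes from 0 to 1.
    """
    if not 0.0 < iP < 1.0:
        raise ValueError("iP must lie strictly between 0 and 1")
    if not iP <= target_P <= 1.0:
        raise ValueError("target_P must lie between iP and 1")
    # Line through (0, 1) and (1, iP): P(x) = 1 - (1 - iP) * x
    recovery = (1.0 - target_P) / (1.0 - iP)
    # Eligible seeds recovered = P * x * N; eligible seeds in total = iP * N
    recall = target_P * recovery / iP
    return recovery, recall
```

For example, with iP = 0.8 and a target P of 0.95, the sketch returns a recovery rate of 0.25 and a recall of about 0.30. If the real P-recovery curve lies above the straight line, as the convex shape implies, a better model would reach the same P while recovering more seeds.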
Ever since the technology of seed quality evaluation using NIR imaging spectrometry began to emerge in the mid-2000s, the high cost of the equipment required has been pointed out as an impediment to its widespread adoption. What we think is yet another impediment is that there is a limit to the number of seeds that can be handled by manual or semi-mechanical operations, whether for industrial applications or basic research. In fact, the task of capturing hyperspectral images itself is not that hard at all. Instead, what is arduous is preparing the aforementioned datasets in which the spectroscopic and quality traits of individual seeds are linked one-to-one; they all need to be processed consistently in strict order. Realizing a mechanical device that can freely manipulate (align and separate) lightweight, irregular-shaped seeds is a real challenge that requires interdisciplinary efforts. However, the time has come to confront and overcome this challenge if we are to further deepen the technology and promote its social implementation.
Single-kernel sorters equipped with NIR point spectrometers have been commercialized, and industrial use of models that achieve extremely high throughput is progressing [26,27]. However, seeds are not homogeneous, and so the trait at a single point within a seed does not necessarily represent the trait of the whole seed. Compared to spectrometers, hyperspectral cameras require a longer time for a single exposure, but have an advantage in that they can acquire spatial and spectral information of a large number of seeds at once, providing a chance to accurately estimate the biological traits of individual seeds. As presented in this study, NIR hyperspectral images allow seed quality evaluation in a wide variety of crop species and cultivars according to the same principles and procedures. We are working toward early realization of a single-kernel sorter equipped with a NIR hyperspectral camera, which will pave the way to solve the problem of "seed loss and waste" in crop production.
Fig 1 shows high-resolution images of seeds from all crop species used in this study, which are arranged according to their quality as clarified by subsequent germination test and/or haplotype analysis. For all crop species, there were no features in the seed appearance that could guide us to discriminate whether each is eligible or not. In fact, Fig 1 selectively shows eligible

Fig 1. Appearance of eligible and ineligible crop seeds. Seeds were subjected to germination test after capturing NIR hyperspectral and 8k color images. For cauliflower seeds, paternal haplotype was also determined. Images are arranged according to the results of these quality inspections. Numbers below the images represent the discriminant score for each seed given by the corresponding discriminant models in Table 4. https://doi.org/10.1371/journal.pone.0291105.g001
Fig 3 shows distribution of scores given by each model in case the initial eligibility rate (iP, initial precision) was 80% (rows 1-3). Assuming a scheme of eligible seed sorting where seeds are recovered in descending order of score, metrics related to discrimination performance can be plotted as a function of lower score threshold (LST) (row 4) or the recovery rate of seeds (row 5). LST and seed recovery rate are interconvertible, as the latter is trivially derived from the empirical cumulative distribution function (eCDF) of discriminant scores given to individual seeds (row 1). With NP model, precision (P), i.e., eligibility rate after sorting, keeps close to 100% until the recovery rate of seeds reaches iP (Fig 3A; row 5). With PP model, on the other hand, P immediately converges to iP (Fig 3C; row 5).
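The sorting scheme described above (seeds recovered in descending order of score, with metrics expressed against LST or seed recovery rate) can be sketched in Python as follows. This is a minimal illustration assuming NumPy and no tied scores; the function names sorting_metrics and standard_condition are hypothetical and are not taken from the software provided with this study.

```python
import numpy as np

def sorting_metrics(scores, eligible):
    """Precision, recall and seed recovery rate when the top-k seeds are recovered."""
    scores = np.asarray(scores, dtype=float)
    eligible = np.asarray(eligible, dtype=bool)
    order = np.argsort(-scores, kind="stable")   # highest score first
    hits = np.cumsum(eligible[order])            # eligible seeds among the top-k
    k = np.arange(1, scores.size + 1)            # number of seeds recovered
    return {
        "lst": scores[order],                    # LST that recovers exactly the top-k seeds
        "recovery": k / scores.size,             # seed recovery rate
        "precision": hits / k,                   # P, eligibility rate after sorting
        "recall": hits / eligible.sum(),         # R, recovery rate of eligible seeds
    }

def standard_condition(scores, eligible):
    """Metrics at the point where the seed recovery rate equals iP."""
    eligible = np.asarray(eligible, dtype=bool)
    m = sorting_metrics(scores, eligible)
    n_eligible = int(eligible.sum())
    i = n_eligible - 1   # recover as many seeds as there are eligible seeds
    return {
        "iP": n_eligible / eligible.size,
        "sLST": m["lst"][i],
        "sP": m["precision"][i],
        "sR": m["recall"][i],
    }
```

At the standard condition the number of seeds recovered equals the number of eligible seeds, so P and R share the same numerator and denominator; this is why sP = sR regardless of the model.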

Fig 2. NIR reflectance spectra of eligible and ineligible crop seeds. (A-I) Average (±SD) reflectance spectra of seeds by quality class for the indicated crop cultivars. (J) Reflectance spectra of 5 individual seeds from each quality class in cauliflower cultivar No. 47. The range of X-axis is common to that shown at the bottom (C, G and J). https://doi.org/10.1371/journal.pone.0291105.g002

Fig 3. Thought experiments of eligible seed sorting using hypothetical models with different discriminability. Properties of the three hypothetical models NP (A), MP (B) and PP (C) are shown in Table 3. Results expected when these models were applied to a seed batch with 80% initial precision (iP, eligibility rate before sorting) are shown. X-axis is common from (1) to (4) representing discriminant score itself, or lower score threshold (LST) set for seed sorting. (1) Density distribution of scores for all seeds (gray filled curves) and empirical cumulative distribution function (eCDF) for seeds above LST (black lines). The latter provides a means to convert between LST and seed recovery rate. (2) Density distribution of scores for seeds in each quality class. Curve heights are scaled by class. (3) Boxplot and scatterplot representations of (2). (4-5) Discrimination performance plotted as a function of LST setpoint (4) or seed recovery rate (5). sP, standard precision (eligibility rate after sorting under standard condition); sLST, standard LST (LST that matches seed recovery rate with iP).
Fig 7 shows distribution of discriminant scores given by each model to individual seeds (rows 1-3) and performance metrics in relation to LST (row 4) or recovery rate of seeds (row 5) as in Fig 3. For most models, the graphs were similar to Fig 3B drawn by the hypothetical model MP, suggesting that they have practical, if not perfect, discrimination performance. By contrast, those for the model m-Bo51 derived for bunching onion cultivar No. 51 were more like Fig 3C drawn by the PP model. The inferior performance of this model is also evidenced by the low relative value of AUC-rPR (0.3993; see Table

Fig 4. Precision- and relative precision-recall curves drawn by hypothetical models with different discriminability. Properties of the three hypothetical models NP (A), MP (B) and PP (C), and quality breakdown in hypothetical seed batches with initial precision (iP, eligibility rate before sorting) of 0.8 (80%) or 0.6 (60%) are shown in Table 3. Relative precision (rP) is defined in Eq 1. X-axis in all plots represents recall (R, recovery rate of eligible seeds), while Y-axis in (1) and (2) represents precision (P, eligibility rate after sorting) and rP, respectively.
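As a hedged illustration of the quantities plotted in Fig 4, the Python sketch below computes rP = (P - iP)/(1 - iP) along the sorting order and an area under the rP-R curve. The trapezoidal integration and the extra starting point at R = 0 are assumptions made for this example; the exact numerical procedure behind the AUC-rPR values in this study is not restated here, and the function names are hypothetical.

```python
import numpy as np

def rpr_curve(scores, eligible):
    """Recall and relative precision when seeds are recovered in descending score order."""
    scores = np.asarray(scores, dtype=float)
    eligible = np.asarray(eligible, dtype=bool)
    iP = eligible.mean()                          # initial precision
    if iP >= 1.0 or iP <= 0.0:
        raise ValueError("rP is undefined when iP is 0 or 1")
    order = np.argsort(-scores, kind="stable")
    hits = np.cumsum(eligible[order])
    k = np.arange(1, scores.size + 1)
    precision = hits / k
    recall = hits / eligible.sum()
    rP = (precision - iP) / (1.0 - iP)            # relative precision
    return recall, rP

def auc_rpr(scores, eligible):
    """Trapezoidal area under the rP-R curve (assumed convention, see lead-in)."""
    recall, rP = rpr_curve(scores, eligible)
    # Assumption: extend the curve to R = 0 using the first observed rP value
    r = np.concatenate(([0.0], recall))
    y = np.concatenate(([rP[0]], rP))
    return float(np.sum(np.diff(r) * (y[1:] + y[:-1]) / 2.0))
```

With this convention a model that ranks every eligible seed above every ineligible one yields an area of 1, and rP always falls to 0 once all seeds are recovered because P then equals iP.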
7 is based on outer cross-validation set to avoid overestimation, but is not more than internal validation using data from the seeds with which the models were trained.To better estimate the generalization performance and application domain of each model, seeds from (1) different batch of the same lot, (2) different lot of the same cultivar, and (3) different cultivar of the same species, relative to those used for model training, were applied to external validation.The results are shown in Table 6 and S2 Fig, which are arranged in correspondence with Table 5 and Fig 7, respectively.

Fig 5. Variable importance plot of sparse discriminant models for seed quality evaluation. NIR spectrum employed as the explanatory variable and the number of wavebands selected in each model are shown in Table 4. Standard partial regression coefficients (SPRCs) and variable importance in projection (VIP) for each waveband in the indicated discriminant models are drawn in green and orange lines, respectively. The ranges of SPRC and VIP are shown on the left and right Y-axes, respectively. The range of X-axis is common to that shown at the bottom (C, G and J). https://doi.org/10.1371/journal.pone.0291105.g005
Fig 7 and S2 Fig.

Fig 6. Variable importance plot of non-sparse discriminant models for seed quality evaluation. NIR spectrum employed as the explanatory variable in each model is shown in Table 4. Without variable selection, the number of wavebands incorporated in the model was 201 for all cases. The figure is drawn in the same style as Fig 5. https://doi.org/10.1371/journal.pone.0291105.g006
Fig. Optimizing the discrimination strength is equivalent to finding coordinates on Fig 9 where P is above the desired level and R within an acceptable range. The fact is that, however, the relationships as in Fig 9 are yet black boxes when a discriminant model is first applied to a novel seed batch for which the individual eligibility has not been
Abbreviations not mentioned are common to those in Tables 3 and 4. https://doi.org/10.1371/journal.pone.0291105.t005

Fig 7. Internal validation of discriminant models for seed quality evaluation. The figure illustrates the results of internal validation with real discriminant model M applied to dataset D in the same style as Fig 3. Summary of datasets, discriminant models, and the performance of discrimination with each (M, D) combination are shown in Tables 2, 4 and 5, respectively. In the cases of predicting germinability of seeds (A-K, J, K), score distribution of ineligible seeds is drawn separately for abnormally germinated and non-germinated seeds. https://doi.org/10.1371/journal.pone.0291105.g007

Fig 8. Relative precision-recall (rPR) curves in eligible seed sorting by discriminant models derived for various crop cultivars. Shown are rPR curves for discrimination with model M applied to dataset D. Summary of datasets, discriminant models, and the performance of discrimination with each (M, D) combination in internal and external validations, respectively, are shown in Tables 2 and 4-6. Different line colors are used according to the type of validation (dark gray for internal validation) and, for the cases of external validation, whether the dataset under analysis is from different batch (blue), lot (green), or cultivar (magenta) compared to the internal dataset for model M. The range of X-axis is common to that shown at the bottom (C, G and K).
The properties of m-Ca47gh in internal and external validations are shown in Fig 7I and S2I Fig, and PR and rPR curves for discrimination by this model in S3K Fig and Fig 8K, respectively. According to rPR curves (Fig 8J and 8K) and AUC-rPR in Table

Fig 9. Performance of quality sorting for fully eligible cauliflower seeds by successive per-trait discrimination. Shown are the results of internal (A) and external (B) validations. The value range for the color scale is indicated at the top right of each plot. The X- and Y-axes represent lower score threshold (LST) for discrimination with the models m-Ca47g and m-Ca47h, respectively. Gray arrowheads indicate standard LST (sLST) for the respective model when applied to the indicated dataset D. The ranges of both axes are common to all plots. https://doi.org/10.1371/journal.pone.0291105.g009
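The kind of two-threshold map shown in Fig 9 can be sketched as follows. This Python example assumes that a seed is recovered only if it clears the LST of both per-trait models and that P and R are computed against full eligibility (eligible for both traits); the function name successive_sorting_grid and its arguments are hypothetical and are not taken from the software provided with this study.

```python
import numpy as np

def successive_sorting_grid(score_g, score_h, fully_eligible, lst_g, lst_h):
    """P and R over a grid of LST pairs for two successive discriminations.

    Returns arrays indexed as [i, j] for thresholds (lst_g[i], lst_h[j]).
    """
    score_g = np.asarray(score_g, dtype=float)
    score_h = np.asarray(score_h, dtype=float)
    ok = np.asarray(fully_eligible, dtype=bool)
    lst_g = np.asarray(lst_g, dtype=float)
    lst_h = np.asarray(lst_h, dtype=float)
    # passed[i, j, s] is True if seed s clears both thresholds
    passed = (score_g[None, None, :] >= lst_g[:, None, None]) & \
             (score_h[None, None, :] >= lst_h[None, :, None])
    n_pass = passed.sum(axis=2)
    n_hit = (passed & ok[None, None, :]).sum(axis=2)
    with np.errstate(invalid="ignore", divide="ignore"):
        # P is undefined (NaN) where no seed is recovered
        precision = np.where(n_pass > 0, n_hit / n_pass, np.nan)
    recall = n_hit / ok.sum()
    return precision, recall
```

Given such grids, candidate operating points could be listed with a mask such as (precision >= target_P) & (recall >= min_R), where target_P and min_R are user-chosen values. This only works for batches whose individual eligibility is already known.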

Fig 10. Visualization of predicted quality rank of crop seeds by direct processing of NIR hyperspectral images. Each seed of squash (A) and pea (B) was scored by discriminant models (M) m-Sq5 and m-Pe19, respectively. Summary of the models is shown in Table 4. Seeds within the same field-of-view (FOV) are filled with a pseudo-color according to the relative score of each seed. The figure can be reproduced using the software and data files provided as S2 File. https://doi.org/10.1371/journal.pone.0291105.g010

Fig 11. Germination and post-emergence growth of crop seeds with different predicted quality ranks. The test results are summarized in Table 8. Seeds were scored by the discriminant model (M) intended for each crop cultivar and classified into ranks from top (T) to bottom (B). They were then sown on germination beds at the same time. The images were taken before the test deadline. For peas, the original image (A) along with the extracted image of seedlings (B) are shown. Cauliflower seeds in (E) were scored by two discriminant models m-Ca47g and m-Ca47h, and ranked for each quality trait of germinability and paternal haplotype. https://doi.org/10.1371/journal.pone.0291105.g011
7 and S2 Fig) unless using a discriminant model with poor performance as in Fig 3C (e.g., see Fig 7J)

Table 3. Thought experiment using hypothetical seed batches and discriminant models.
P, precision (eligibility rate after sorting); R, recall (recovery rate of eligible seeds); A, accuracy; iP, initial P (eligibility rate before sorting); LST, lower score threshold; sP/sR/sA, P/R/A under standard condition; Amax, maximum A; AUC-PR/rPR, area under PR/rPR (precision/relative precision-recall) curve. * Condition where the recovery rate of seeds matches iP, and P and R equalize (sP = sR). https://doi.org/10.1371/journal.pone.0291105.t003
with recall (R), i.e., recovery rate of eligible seeds, kept as high as possible. It is worth noticing in this respect that P and R equalize when the recovery rate of seeds reaches iP, which always holds no matter what discriminant model is used (Fig 3; row 5).
sP, standard precision (eligibility rate after sorting under standard condition); sLST, standard LST (LST that matches seed recovery rate with iP). https://doi.org/10.1371/journal.pone.0291105.g003
is to raise P

Table 5. Performance of sparse and non-sparse discriminant models in internal validation compared on the same dataset.
*1 G and H denote germinability and paternal haplotype, respectively. *2 Symbols match with those in S1 Fig. *3 Base spectra to which SG filtering and/or SNV transformation were applied. R denotes raw reflectance spectra. *4 Number of wavebands incorporated in each model. https://doi.org/10.1371/journal.pone.0291105.t004

Table 3 .
* Upper and lower denote the values for m-Ca47g and m-Ca47h models, respectively.