Comparison of Hybrid Classifiers for Crop Classification Using Normalized Difference Vegetation Index Time Series: A Case Study for Major Crops in North Xinjiang, China

A range of single classifiers have been proposed to classify crop types using time series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multiple single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series was tested with different training sample sizes at both pixel and object levels, and two representative counties in North Xinjiang were selected as study areas. The single classifiers employed in this research included Random Forest (RF), Support Vector Machine (SVM), and See5 (C5.0). The results indicated that classification performance improved substantially with the training sample number (the mean overall accuracy increased by 5%~10%, and the standard deviation of overall accuracy decreased by around 1%), and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers, with a mean overall accuracy that was 1%~2% higher. However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performances. Additionally, although object-based classification did not improve accuracy, it resulted in greater visual appeal, especially in study areas with a heterogeneous cropping pattern.


Introduction
Crop-type information is important for the global food security system, and there is an urgent demand for accurate crop classification data [1][2][3]. Since the crop calendar varies among different crops, phenology is the basis of crop classification [4]. Vegetation indices (VI), which can be calculated from remote sensing images, measure vegetation coverage, and VI time series can describe crop phenology [5][6][7]. Thus, VI time series have been employed widely to produce crop classification data [8][9][10][11][12]. In addition, the Normalized Difference Vegetation Index (NDVI) has a higher importance score than other features, such as multi-spectral bands, calculated indices and ancillary data [13].
Coarse spatial resolution data, such as Moderate Resolution Imaging Spectroradiometer (MODIS) (250-500 m) data, are characterized by high temporal resolution and have shown potential for identifying crops [14,15]. However, one drawback is that the spatial resolution of MODIS data is relatively coarse, and the classification accuracy is affected by mixed pixels when the field size is small. For sensors at finer spatial resolution, such as Landsat-5/7 TM/ETM+ (at 30-m resolution), the revisit period is relatively long (e.g., 16 days for Landsat). Thus, Landsat cannot provide enough cloud-free imagery for most regions of the world [16]. As some other sensors (such as Huan Jing and DEIMOS) can provide images at medium spatial resolution [17,18], the use of multi-sensor merged image time series has improved the performance of crop identification at medium resolution [19,20].
A range of classifiers, such as Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and C5 [14,21,22], have been employed for crop type classification. In addition, to take advantage of single classifiers and increase the classifiers' discriminatory power, some hybrid classifiers have been proposed based on product and sum rules [23]. For example, Du et al. [24] mixed Spectral Information Divergence (SID) and Spectral Angle Mapper (SAM), and Ghiyamat and Shafri [25] mixed SID and Spectral Correlation Measure (SCM), both utilizing product rules, and the mixed classifiers have shown good performance in hyperspectral classification. However, a drawback is that the improvement of the product rule is limited when multiple (more than three) classifiers are used. Various other hybrid strategies, such as the multi-classifier approaches multiple voting (M-voting) and probabilistic fusion (P-fusion), have been employed to combine the classification results obtained from multi-spectral and textural features and have improved the performance of the classification [26]. However, few studies have tested how well multi-classifier systems perform when they are used to integrate single classifiers.
Pixel-based image analyses (PBIA) have led to some misclassifications, which are due mainly to 1) the similar spectral characteristics of some crop classes, 2) the spectral variability due to the canopy and bare soil background reflectance and different crop development schedules, and 3) the mixed pixels located at the boundaries between classes [8]. Object-based image analyses (OBIA) were proposed to solve these problems [13], and a number of studies have found that OBIA methods are effective for classifying both high- and moderate-resolution imagery [27]. In addition, several studies have compared the performance of OBIA with that of PBIA. For images with high spatial resolution (finer than 10 m), OBIA outperformed PBIA with better classification accuracy and the capacity to extract patch boundaries [28,29]. At medium spatial resolution (10-100 m), OBIA generally provides a visually appealing appearance, but conclusions about statistical accuracy have varied. Some studies have shown that OBIA achieved higher classification accuracy than PBIA [30,31]; some reached the opposite conclusion, with PBIA obtaining better accuracy [32,33]; and others concluded that OBIA and PBIA achieved similar classification accuracy [34,35]. Thus, additional study remains essential to test the potential of OBIA in improving the classification of specific crop classes.
Xinjiang is an important cotton-production region, and the crop fields in North Xinjiang are characterized by large size (larger than 20 ha). Two representative counties in North Xinjiang, Bole and Manas, were selected in this research to test the potential of NDVI time series of 30-m spatial resolution to classify crops at both the pixel and object level. The objectives of this study were to 1) estimate the potential of NDVI time series obtained by merging Landsat-5 and Huan Jing (HJ)-1 data for crop classification at medium resolution [17]; 2) compare the performance of hybrid strategies (M-voting and P-fusion) with the single classifiers for crop classification; and 3) compare the performance of OBIA and PBIA for crop classification.
The dominant crops grown in the study areas include cotton, maize (spring maize and summer maize), watermelon, grape, tomato, and wheat. The vegetation cover fraction for each crop type over the growing season is presented in Fig 2. Cotton, spring maize, watermelon, tomato, and grape are planted in early April and begin their growth mostly during the June-July period. Watermelon and tomato are harvested in August, spring maize is harvested in early September, and grapes and cotton are harvested during the August-September and September-October periods, respectively. Winter wheat is planted in early November, begins its growth in the following April, and is harvested in late June. After that, some fields are in rotation, and others are planted with summer crops such as summer maize. Thus, we divided the winter wheat into two classes depending on whether summer crops are planted in the same field or not.

Datasets
Satellite images and vegetation indices. We selected largely cloud-free (cloud cover less than 20%) satellite images in both study regions covering the entire growing season (Table 1), including both the Landsat TM CDR land surface reflectance product and the Huan Jing (HJ) CCD level 2 product of 2011 [36]. The HJ-1 satellite constellation, launched by the Chinese Government in 2008, has two CCD cameras that provide a broad coverage of 720 km and a spatial resolution of 30 m. The CCD cameras have four visible and near-infrared bands: B1 (0.43-0.52 μm), B2 (0.52-0.60 μm), B3 (0.63-0.69 μm), and B4 (0.76-0.90 μm) [37]. In this research, we intended to obtain an NDVI time series at 30-m spatial resolution and at approximately 15-day temporal resolution during the growing season. However, TM images cannot cover the entire growing season of the study areas at such a high temporal resolution. Therefore, we employed cloud-free Landsat TM and HJ CCD (with similar spatial resolution to Landsat TM) images to build the image time series. All HJ images were georeferenced to UTM WGS84 zone 44N (Bole) and zone 45N (Manas), and the HJ images were then registered to the TM images, achieving an RMSE of less than 0.3 pixels using a second-order polynomial transformation and bilinear resampling. Then, radiance calibration and Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) atmospheric correction were performed using Environment for Visualizing Images (ENVI) for all images [38][39][40]. NDVI was calculated from the reflectance of the visible (red) and near-infrared (NIR) bands for both TM and HJ images using Eq (1):

\[ \mathrm{NDVI} = \frac{\rho(\mathrm{NIR}) - \rho(\mathrm{Red})}{\rho(\mathrm{NIR}) + \rho(\mathrm{Red})} \tag{1} \]

where $\rho(\mathrm{NIR})$ and $\rho(\mathrm{Red})$ are the reflectances of the NIR and red bands.

Ground reference data. Ground-truth data were obtained by fieldwork in the study regions during 2011. Fields were selected to represent the full variety of crop types and an even distribution across the study areas. The selected fields, 457 fields in Bole and 435 fields in Manas, were then surveyed.
For each field, the crop type was collected as attribute information. Field boundaries were recorded using GPS and digitized as polygons in ArcGIS. To avoid boundary pixels, polygons were converted to a raster format using the TM grid. In total, 10,855 sample pixels for Bole and 9,991 pixels for Manas were obtained. The distribution of ground-truth data is shown in Fig 1. The numbers of training and validation samples for each crop type are shown in Table 2, and the training and validation datasets are provided in S1 Dataset.
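The NDVI calculation of Eq (1) can be sketched per pixel as follows. This is a minimal illustration rather than the processing chain used in the study: the reflectance values are hypothetical, and returning NaN when both bands are zero is an assumption of the sketch.

```python
def ndvi(nir, red):
    """Return NDVI from NIR and red surface reflectance, as in Eq (1)."""
    denom = nir + red
    if denom == 0:
        # Assumption of this sketch: NDVI is undefined when both bands are zero.
        return float("nan")
    return (nir - red) / denom


def ndvi_series(nir_values, red_values):
    """Compute one NDVI value per acquisition date for a single pixel."""
    return [ndvi(n, r) for n, r in zip(nir_values, red_values)]


# Hypothetical reflectances for one pixel on four acquisition dates.
nir = [0.20, 0.35, 0.50, 0.30]
red = [0.15, 0.08, 0.05, 0.12]
print([round(v, 3) for v in ndvi_series(nir, red)])
```

Because NDVI is a ratio of band differences to band sums, it always lies between -1 and 1 for non-negative reflectances, which is what makes values from two sensors directly comparable once their consistency has been checked.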

Methods
The methodology of the study is presented in Fig 3. First, we measured the consistency between Landsat TM NDVI and HJ CCD NDVI, and then utilized Landsat TM and HJ CCD images to build time series NDVI covering the entire growing season. Then, we used the single classifiers to classify crop types and obtained both a classification map and probabilistic outputs for each classifier. Afterward, the performance of two fusion strategies, M-voting and P-fusion, at both pixel and object levels was assessed using both statistical and visual analyses. The training sample size varied from 50 to 4,000 (50, 100, 250, 500, 750, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, and 4,000) in each study area, and for each training sample size, the samples were randomly selected from the training sample set listed in Table 2. Additionally, all classifiers were trained ten times for each training sample size, and both the average and standard deviation of the classification accuracy were reported.
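The repeated-subsampling protocol described above (twelve training sample sizes, ten random draws each, mean and standard deviation of accuracy) can be sketched as follows. The function name `repeated_subsampling` and the `evaluate` callable are hypothetical; in practice `evaluate` would train a classifier on the drawn samples and return its overall accuracy on the validation set. The toy evaluator at the end is only a stand-in so that the sketch runs.

```python
import random
import statistics

SAMPLE_SIZES = [50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
N_RUNS = 10


def repeated_subsampling(pool, evaluate, sizes=SAMPLE_SIZES, n_runs=N_RUNS, seed=0):
    """Draw n_runs random training subsets per size and summarize the accuracy.

    pool     : list of candidate training samples
    evaluate : hypothetical callable, training subset -> overall accuracy
    returns  : {size: (mean accuracy, standard deviation of accuracy)}
    """
    rng = random.Random(seed)
    summary = {}
    for size in sizes:
        if size > len(pool):
            continue  # cannot draw more samples than the pool holds
        scores = [evaluate(rng.sample(pool, size)) for _ in range(n_runs)]
        summary[size] = (statistics.mean(scores), statistics.stdev(scores))
    return summary


def toy_evaluate(train):
    """Stand-in for 'train and validate'; NOT a real classifier."""
    return 0.95 - 1.0 / len(train) ** 0.5 + random.uniform(-0.01, 0.01)


if __name__ == "__main__":
    results = repeated_subsampling(list(range(5000)), toy_evaluate)
    for size, (mean_oa, sd_oa) in results.items():
        print(size, round(mean_oa, 3), round(sd_oa, 3))
```

Fixing the seed makes the random draws reproducible, which matters when the same subsets are to be reused across several classifiers.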

Similarity Measure between Landsat and HJ NDVI
We evaluated the NDVI consistency between Landsat-5 TM and HJ-1 CCD data by comparing the NDVI of the two sensors for similar dates (Table 3). To reduce the potential impacts of registration inaccuracy, we defined subsets of homogeneous regions of interest (ROI) with 3 × 3 windows located in the middle of larger homogeneous "patches" [41]. Average values of these sampling windows were used to compare the consistency between Landsat-5 TM and HJ-1 CCD NDVI. Through this process, we defined subsets of the ROIs for different crop types, selecting 170 and 153 windows within Bole and Manas, respectively. Scatter plots and linear relationships were used to examine how Landsat-5 TM and HJ-1 CCD NDVI differed in performance.
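The consistency check described above, averaging NDVI over 3 × 3 windows and then relating the two sensors with a linear fit, can be sketched as follows. This is an ordinary least-squares illustration with hypothetical NDVI values; the function names are not from the study.

```python
def window_mean(values):
    """Average the NDVI values of one 3 x 3 sampling window (9 pixels)."""
    return sum(values) / len(values)


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical window-averaged NDVI for the two sensors on a matching date.
tm_ndvi = [0.21, 0.35, 0.48, 0.62, 0.77, 0.83]
hj_ndvi = [0.23, 0.34, 0.50, 0.60, 0.75, 0.85]
slope, intercept, r2 = linear_fit(tm_ndvi, hj_ndvi)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

A slope near 1, an intercept near 0, and a high R² together indicate that the fitted line lies close to the 1:1 line, which is the criterion the comparison relies on.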

Image segmentation
To compare the performances of different classifiers at pixel and object level, we segmented the image using the multi-resolution segmentation (MRS) algorithm of eCognition [42]. The MRS algorithm is a "bottom-up" approach that begins with pixel-sized objects and iteratively grows through the pair-wise merging of neighboring objects based on several user-defined parameters, including scale, color/shape, weights of spectral bands, and smoothness/compactness [42]. Landsat images of three time periods were used during the segmentation process in both of the study regions; and a summary of the segmentation parameters used is presented in Table 4.

Classifiers
Single classifiers. The performances of the following three single classifiers were evaluated: 1) Random Forest (RF), 2) Support Vector Machine (SVM), and 3) C5.0 Rule-Based Model (C5.0). SVM is a non-parametric supervised classifier derived from statistical learning theory [10]. In the simplest form, SVMs are linear binary classifiers that assign a given test sample to one of two possible classes. The classifiers use training samples to find the optimal hyperplanes that separate classes with minimum classification error. Significantly, only the training samples that lie on the margin (called support vectors) are used to define the hyperplanes. The simplest SVMs assume that the problems are linearly separable. However, data points of different classes overlap in practice. As a result, the basic linear decision boundaries cannot classify patterns with high accuracy. Thus, linearly inseparable problems are solved by mapping the data into a higher-dimensional (Euclidean or Hilbert) space using a kernel function [21]. Another problem is that remote sensing classifications almost always involve multi-class situations. In this case, the binary classifier (simplest SVM) is used as a multi-class classifier based on one-against-one and one-against-others methods [21,43]. Generally, SVM is not based on the assumption that the data are normally distributed for a particular image; thus, SVM could outperform classifiers based on maximum likelihood theory [10]. Additionally, SVM has the advantages of being able to deal with high-dimensional datasets and small training samples [44][45][46], and SVM has been utilized widely for crop classification [10,20,47]. In this study, SVM was implemented with libSVM (through the e1071 library for R) [48]. The widely used radial basis function (RBF) kernel was selected, and the data space was normalized to a common scale [0, 1].
In addition, training of the SVM included choosing a kernel parameter "γ" (gamma) and a regularization parameter "C" (cost). "C" controls the penalty associated with misclassified training samples, and "γ" determines the width of the kernel function [49]. The parameters "C" and "γ" were then selected by a systematic search of the 2-D space spanned by "γ" and "C".

The RF algorithm is an ensemble machine learning technique that combines multiple trees [22]. Each tree is constructed using about two-thirds of the original cases. The remaining one-third of the cases are then employed to generate a test classification, with an error referred to as the "out-of-bag error" (OOB error). Subsequently, the model output is determined by the majority vote of the classifier ensemble [50]. Two free parameters can be optimized in the RF algorithm: the number of trees (ntree) and the number of features used to split the nodes (mtry). The advantages of the RF algorithm, such as the relatively high efficiency with large datasets, the probability output for each class, and the generated OOB error (an internal unbiased estimate of the generalization error), make it suitable for remote sensing applications [51]. In this study, both the crop label and the probabilistic output were obtained using the Random Forest library for R [52]. The "ntree" parameter was set to a relatively high value of 1000 to allow convergence of the OOB error statistic, and "mtry" was set to the square root of the total number of input features [53].
C5.0 is a decision tree algorithm developed from C4.5. In C4.5, when training the model, all training samples are placed at the root of the decision tree. Then, the information gain ratio of every feature is calculated based on entropy, and the feature with the highest information gain ratio is selected to split the data into multiple subsets. The algorithm repeats this procedure on each subset until all instances in the subset belong to the same class and a leaf node is created [27,54]. C5.0 advanced C4.5 by adding a boosting technique (generating and combining multiple classifiers to improve predictive accuracy) [55]. C5.0 has several advantages, including that it is fast to train and that it produces a set of rules. Thus, it has been employed widely in land use and land cover (LULC) classification [56,57]. In this study, C5.0 was implemented using the library C50 for R [58], and the parameter "trials", which specifies the number of boosting iterations, was set to 10.
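The SVM preprocessing and parameter selection described above, scaling each feature to [0, 1] and then searching the 2-D space of "C" and "γ", can be sketched as follows. The study used R; this Python sketch only illustrates the search logic. The exponent ranges are an assumption borrowed from common libSVM practice, not values reported in the study, and `score` is a hypothetical callable that would return a cross-validated accuracy for one parameter pair.

```python
def min_max_scale(column):
    """Rescale one feature column (e.g., NDVI on one date) to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: map everything to 0
    return [(v - lo) / (hi - lo) for v in column]


def log2_grid(c_exponents, gamma_exponents):
    """Enumerate (C, gamma) pairs on a base-2 logarithmic grid."""
    return [(2.0 ** c, 2.0 ** g) for c in c_exponents for g in gamma_exponents]


def grid_search(grid, score):
    """Return the (C, gamma) pair with the highest score.

    score is a hypothetical callable (C, gamma) -> cross-validated accuracy.
    """
    return max(grid, key=lambda pair: score(*pair))


# ASSUMED exponent ranges (not from the study): C = 2^-5..2^15, gamma = 2^-15..2^3.
grid = log2_grid(range(-5, 16, 2), range(-15, 4, 2))


def toy_score(c, gamma):
    """Stand-in scoring function that peaks at C = 8, gamma = 0.5."""
    return -abs(c - 8.0) - abs(gamma - 0.5)


print(len(grid), grid_search(grid, toy_score))
```

A logarithmic grid is the usual choice because the useful values of both parameters span several orders of magnitude.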
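The split criterion used by C4.5 can be illustrated with a small calculation of the information gain ratio. This is a minimal sketch for a categorical (or already discretized) feature; C4.5 and C5.0 additionally search thresholds for continuous features such as NDVI, which is not shown here, and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain ratio of splitting `labels` by a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the class labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: a binned NDVI feature and crop labels.
ndvi_bin = ["high", "high", "low", "low"]
crop = ["cotton", "cotton", "wheat", "wheat"]
print(gain_ratio(ndvi_bin, crop))
```

Dividing the information gain by the split information is what distinguishes the gain ratio from plain information gain, which tends to favor features with many distinct values.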
Uncertainty of the classification result obtained from the single classifier. All three single classifiers employed in this research can provide a probabilistic output, $p(x) = (p_1(x), \cdots, p_k(x), \cdots, p_K(x))$, $k = 1, 2, \cdots, K$, which reflects the classification uncertainty of the classifier. In this paper, we used the α quadratic entropy to calculate the uncertainty [59], as in Eq (2), where $H(p(x))$ represents the α quadratic entropy of the probability vector $p(x)$. The parameter α is a user-defined value between 0 and 1, and α = 0.5 was chosen in this study. A smaller $H(p(x))$ indicates a more reliable classification.
The advantage of this measure is that it uses all the information in the probability vector.
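To make the uncertainty measure concrete, the sketch below implements one commonly cited form of the α quadratic entropy, $H(p) = \frac{1}{K \cdot 2^{-2\alpha}} \sum_k p_k^{\alpha} (1 - p_k)^{\alpha}$. Eq (2) is not reproduced in the text, so whether it uses exactly this normalization is an assumption; the probability vectors are hypothetical.

```python
def alpha_quadratic_entropy(probs, alpha=0.5):
    """Alpha quadratic entropy of a class-probability vector.

    ASSUMED form (the exact Eq (2) is not shown in the text):
        H(p) = 1 / (K * 2**(-2*alpha)) * sum_k p_k**alpha * (1 - p_k)**alpha
    Smaller values mean a more confident (more reliable) classification.
    """
    k = len(probs)
    total = sum((p ** alpha) * ((1.0 - p) ** alpha) for p in probs)
    return total / (k * 2.0 ** (-2.0 * alpha))


# Hypothetical classifier outputs for one pixel over three classes.
confident = [0.90, 0.05, 0.05]
ambiguous = [0.40, 0.35, 0.25]
print(round(alpha_quadratic_entropy(confident), 3))
print(round(alpha_quadratic_entropy(ambiguous), 3))
```

Every class probability contributes to the sum, so two vectors with the same maximum probability can still receive different uncertainty values.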
Hybrid strategies: Voting and fusion. We used two hybrid strategies (multiple voting, M-voting, and probabilistic fusion, P-fusion) to integrate single classifiers and then compared their performance with that of the single classifiers [26]. Processing chains of the voting and fusion strategies are shown in Fig 4. In M-voting, each single classifier has a vote, and if a class obtains more votes than all the other classes, the pixel is labeled as the class with the most votes:

\[ C(x) = \arg\max_{k} V_x(k) \tag{3} \]

where $C(x)$ is the label of the pixel $x$ and $V_x(k)$ is the number of votes that pixel $x$ received for class $k$. Otherwise, if two or more classes obtain the same number of votes, the classification uncertainties of the single classifiers are compared, and the pixel is labeled as the class assigned by the single classifier with the least classification uncertainty:

\[ C(x) = C_{\hat{f}}(x), \qquad \hat{f} = \arg\min_{f} S_f(x) \tag{4} \]

where $S_f(x)$ is the classification uncertainty of pixel $x$ with classifier $f$, which is the $H(p(x))$ calculated by Eq (2) in this study; $\hat{f}$ is the optimal classifier with the least classification uncertainty; and $C_{\hat{f}}(x)$ is the label assigned by that classifier. For M-voting at the object level, a similar algorithm is employed. All pixels in an object vote for the final label of the object, and the object is labeled as the class with the most votes, as in Eq (3).

In P-fusion, the probabilistic and uncertainty results of single classifiers are utilized to define the classification label $C(x)$, as in Eq (5), where $p_k^f(x)$ is the probabilistic value of pixel $x$ for class $k$ with classifier $f$, and $S_f(x)$ is the uncertainty value of pixel $x$ for classifier $f$, which is used to enhance the relative weight of the classification information with a low degree of uncertainty. At the object level, the probabilistic and uncertainty results of all pixels in an object are used to define the label of the object. The algorithm is similar to that of the pixel level (Eq (5)).
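The two hybrid strategies can be sketched for a single pixel as follows. M-voting follows the description above directly. For P-fusion, the exact weighting of Eq (5) is not reproduced in the text, so the weight `1 - S_f(x)` used here is an assumption chosen only to illustrate the stated idea that less uncertain classifiers count for more; the function names and toy inputs are hypothetical.

```python
from collections import Counter


def m_voting(labels, uncertainties):
    """Majority vote over the single classifiers for one pixel.

    labels        : class label predicted by each classifier
    uncertainties : S_f(x) of each classifier (smaller = more reliable)
    On a tie, the label of the least uncertain classifier is used. With
    three classifiers, a tie means that all three disagree.
    """
    counts = Counter(labels).most_common()
    top_votes = counts[0][1]
    winners = [label for label, votes in counts if votes == top_votes]
    if len(winners) == 1:
        return winners[0]
    best = min(range(len(labels)), key=lambda f: uncertainties[f])
    return labels[best]


def object_m_voting(pixel_labels):
    """Object-level vote: every pixel label in the object counts once.

    Ties are resolved by first occurrence (an assumption of this sketch).
    """
    return Counter(pixel_labels).most_common(1)[0][0]


def p_fusion(prob_vectors, uncertainties):
    """Uncertainty-weighted fusion of class probabilities for one pixel.

    ASSUMPTION: each classifier is weighted by (1 - S_f(x)); the exact
    form of Eq (5) may differ. Returns the index of the winning class.
    """
    n_classes = len(prob_vectors[0])
    scores = [0.0] * n_classes
    for probs, s in zip(prob_vectors, uncertainties):
        weight = 1.0 - s
        for k, p in enumerate(probs):
            scores[k] += weight * p
    return max(range(n_classes), key=lambda k: scores[k])


# Hypothetical outputs of RF, SVM, and C5.0 for one pixel.
print(m_voting(["cotton", "maize", "grape"], [0.5, 0.2, 0.7]))
print(p_fusion([[0.6, 0.4], [0.1, 0.9]], [0.9, 0.1]))
```

The practical difference is that M-voting uses only the hard labels unless a tie occurs, whereas P-fusion uses the full probability vectors of all classifiers for every pixel.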

Metric for classifier approach comparison
Along with the commonly used accuracy assessment indices, including producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA), and kappa coefficient (kappa), calculated from the error matrix [60], McNemar's test was employed in this study to evaluate the statistical significance of the differences in accuracy between classifiers [61]. McNemar's test is a nonparametric test based on the standardized normal test statistic, as in Eq (6):

\[ z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}} \tag{6} \]

where $f_{12}$ is the number of samples that are correctly classified by classifier 1 and incorrectly classified by classifier 2, and $f_{21}$ is the number of samples that are incorrectly classified by classifier 1 and correctly classified by classifier 2. We defined three cases of differences in accuracy between classifier 1 and classifier 2 according to McNemar's test, which are reported as "S+", "N", and "S-" in the comparison tables.
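The accuracy indices and McNemar's test can be sketched as follows. The error matrix convention (rows are classified labels, columns are reference labels) is stated in the code. The critical value of 1.96 (two-sided 5% level) and the reading of "S+" as classifier 1 significantly better, "S-" as significantly worse, and "N" as no significant difference are assumptions inferred from how the labels are used in the results; the counts are hypothetical.

```python
from math import sqrt


def accuracy_metrics(matrix):
    """OA, per-class PA and UA, and kappa from a square error matrix.

    Convention used here: matrix[i][j] is the number of samples of
    reference class j that were classified as class i.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_tot = [sum(matrix[i]) for i in range(k)]                       # classified totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals
    oa = sum(diag) / total
    pa = [diag[j] / col_tot[j] if col_tot[j] else float("nan") for j in range(k)]
    ua = [diag[i] / row_tot[i] if row_tot[i] else float("nan") for i in range(k)]
    chance = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    kappa = (oa - chance) / (1.0 - chance) if chance != 1.0 else float("nan")
    return oa, pa, ua, kappa


def mcnemar_z(f12, f21):
    """McNemar's z statistic: (f12 - f21) / sqrt(f12 + f21)."""
    if f12 + f21 == 0:
        return 0.0  # the two classifiers never disagree
    return (f12 - f21) / sqrt(f12 + f21)


def compare(f12, f21, z_crit=1.96):
    """Label a comparison as 'S+', 'N', or 'S-' (ASSUMED threshold 1.96)."""
    z = mcnemar_z(f12, f21)
    if z > z_crit:
        return "S+"
    if z < -z_crit:
        return "S-"
    return "N"


# Hypothetical two-class error matrix and disagreement counts.
oa, pa, ua, kappa = accuracy_metrics([[40, 10], [5, 45]])
print(round(oa, 3), [round(v, 3) for v in pa], [round(v, 3) for v in ua], round(kappa, 3))
print(compare(30, 10), compare(12, 10), compare(10, 30))
```

Only the samples on which the two classifiers disagree enter the test statistic, so two classifiers with similar overall accuracy can still differ significantly if their errors fall on different samples.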

Similarity Measure between Landsat and HJ NDVI
Three matching images (acquired in May, July and September) were used to measure the consistency between Landsat-5 TM and HJ-1 CCD NDVI in both study areas. The scatter plots and linear relationships of the NDVI for the matching images are shown in Fig 5. In both study areas, the $R^2$ values were larger than 0.9 for the NDVI of all the time periods. In addition, the fitted lines between Landsat-5 NDVI and HJ-1 NDVI were close to the 1:1 line. These results coincided with those of previous research and indicated that Landsat-5 TM and HJ-1 CCD data had a strong linear relationship [62][63][64][65]. Overall, the HJ-1 CCD and Landsat-5 TM images had similar spatial resolution and high NDVI consistency. Therefore, NDVI from both sensors was utilized to obtain the time series for this research.

The NDVI time series of the major crops in both study areas are shown in Fig 6. Cotton was the major crop, and the highest NDVI value of cotton was between 0.7 and 0.9 around day 200 (early August). For spring maize, the highest value was similar to that of cotton, but after the peak, the NDVI of spring maize decreased faster than that of cotton, and at around day 250 (September), the NDVI of spring maize was lower than the value for cotton. Similar to cotton, the NDVI peaks of tomato and watermelon were around day 200, but the NDVI values were between 0.6 and 0.7, significantly lower than those of cotton and spring maize. For grape, NDVI was high during days 170-270 (from late June to late September). In addition, the NDVI of grape had the largest variability among all crops in the study region (between 0.4 and 0.7). The major winter crop in the study areas was winter wheat, and the time period of high NDVI (above 0.5) was between day 120 and day 130.

Accuracy assessment of classification result
The producer's accuracy (PA), user's accuracy (UA), and overall accuracy (OA) of the different classifiers with different training sample sizes are presented in Tables 5-8, respectively, and the results (mean and standard deviation (SD) of the accuracy) are reported based on ten runs of different training sample sets.
In Bole, when the training sample number was 100, SVM obtained the highest mean OA (79.26%) and C5.0 had the lowest mean OA (77.2%) among the three single classifiers; SVM and RF had similar overall accuracy SDs (around 1.8%), which were slightly lower than that of C5.0 (1.92%). Among the major crops in Bole, wheat and wheat-to-summer crop had high PA and UA (higher than 95%). However, cotton, maize, and grape had lower accuracy (the UA of cotton was around 70% for each classifier, and the PAs of maize and grape were lower than 50% for all classifiers) because the NDVI time series of these crops were easily confused (Fig 6). For the hybrid classifiers, both M-voting and P-fusion outperformed all single classifiers with higher mean OA (80.88% and 81.34%, respectively). The improvement occurred mainly because cotton and maize were better discriminated. At the object level, both the mean OAs and the OA SDs of the hybrid classifiers were similar to those at the pixel level. Although the accuracy of the confused crops (cotton, maize, and grape) increased when using hybrid classifiers, the accuracy remained low (for example, the mean PA of grape was 47.03% for P-fusion), which indicated that these crops were difficult to discriminate with 100 training samples. In addition, the accuracy SDs of the hybrid classifiers were generally lower than those of the single classifiers, especially for the crops with high classification accuracy (such as wheat and watermelon). When the training sample number was 4,000, the OA of all classifiers increased, and the accuracy SD decreased significantly. Compared with the classification results obtained by using 100 training samples, the PA of both maize and grape increased significantly, but the PA of grape remained low (about 55%).
The classification accuracies of two different training sample sizes for Manas are reported in Tables 7 and 8. When the training sample number was 50, the OAs of all classifiers were around 90%. The hybrid classifiers outperformed the single classifiers (both higher OA and lower SD of OA), and the hybrid classifiers at the object level achieved similar performance as those at the pixel level. Similar to Bole, wheat-to-summer crop had high classification accuracy (both PA and UA were higher than 90% for all classifiers), and the misclassifications were due mainly to the low accuracy of maize and tomato. When 4,000 samples were used for training of the classifiers, all classifiers achieved high OA (higher than 98%) and low SD (lower than 0.2%), and the PA and UA were generally higher than 95% for all crop types.

McNemar's test
The results of McNemar's test for Bole and Manas are shown in Tables 9 and 10, in which the results are divided into three parts based on the training sample number. We used twelve training sample sizes ranging from 50 to 4,000 in both study areas and divided them into three groups: sizes from 50 to 500, from 750 to 2,000, and from 2,500 to 4,000 were regarded as the 'small sample size', 'middle sample size', and 'large sample size' groups, respectively. As we had ten model runs for each training sample size, there were 40 model runs in each group. In Bole, SVM outperformed RF, and they both outperformed C5.0; the hybrid classifiers outperformed the single classifiers, as there were more "S-" results in the "single classifiers versus hybrid classifiers" comparison, and M-voting outperformed P-fusion at both the pixel and object level. As for the comparison between the pixel and object levels, both M-voting and P-fusion had better performance at the object level. For example, when the training sample number was between 750 and 2,000 in Bole, "pixel-based voting versus object-based voting" had "4 S+, 23 N, 13 S-", and "pixel-based fusion versus object-based fusion" had "17 N, 23 S-". Additionally, when more training samples were used to train the classifiers in Bole, the number of "N" results between classifier comparisons increased. For instance, there were 141 "N" results when the training sample number was between 50 and 500, but there were 209 "N" results when the training sample number ranged from 2,500 to 4,000. In Manas, when the training sample number was small (ranging from 50 to 500), the result of McNemar's test was similar to that of Bole, and when more training samples were used, the classifiers had similar performances with high classification accuracy, which coincided with the result obtained in the 'Accuracy Assessment' section.

Classifier performance at different training sample sizes
The influence of the training sample number on the classification accuracy is shown in Figs 7 and 8 for Bole and Manas, respectively. The figures indicated that the OA increased with the training sample number until saturation points were reached; after that, the classification accuracy did not increase significantly. For Bole, the accuracy saturated at about 1,500 training samples. For Manas, saturation points were reached at 500 training samples. This was consistent with the accuracy assessment and McNemar's test, which indicated that when the training sample number of Manas was larger than 500, nearly all classifiers had similar performance. In addition, the SD of the OA decreased when more samples were used to train the classifiers. Among all classifiers, M-voting at the object level was least affected by the number of training samples; in Bole, for example, its OA was above 85% when only 250 training samples were employed. Compared with the hybrid classifiers, the single classifiers were more strongly affected by a decrease in training samples. For example, when fewer than 500 training samples were utilized in Bole, the OAs of the single classifiers were lower than 85%, and the accuracy SD of the hybrid classifiers was lower than that of the single classifiers. In Manas, when the training sample size was small, the situation was similar to that in Bole; however, when the training sample number was larger than 2,000, all classifiers obtained similarly low accuracy SDs.

Discussion
In this study, the dominant crops of two representative counties, Bole and Manas, were classified using NDVI time series. Some of these crops were separable from the other crops, but some were confused. In Manas, for example, wheat is a winter crop, and it was well-separated from all summer crops. Thus, wheat obtained high classification accuracy (both PA and UA above 95% for nearly all classifiers), even when only 100 training samples were used (in Bole). In contrast, the NDVI time series of cotton, grape, and maize in Bole were somewhat confused. As shown in Fig 6, the NDVI of grape varied significantly between days 192 and 208. As a result, the NDVI profiles of some grape pixels were similar to those of cotton. In addition, some NDVI profiles of maize were also similar to those of cotton. Therefore, the classification accuracy of these confused crops was relatively low. For instance, the hybrid classifiers (M-voting and P-fusion) could increase the PA of grape by 2-5%, but the PA of grape was still below 60% for all classifiers when all 4,000 training samples were utilized. Furthermore, when the crops were well-separable (such as in Manas), the saturation points of classification accuracy were reached at around 500 training samples; but if the crops were confused (such as in Bole), the classifiers needed 1,500 training samples to reach saturation points. Between pixel-based and object-based classification, on the one hand, classification accuracies were similar in both Bole and Manas. On the other hand, object-based classification provided a more visually appealing result. A series of subset images that were extracted from Bole and Manas are shown in Figs 9 and 10.

Table 7. Class-specific producer's accuracies (PA), user's accuracies (UA), and overall accuracies (OA) (%) for the different classifiers (Manas, training sample number = 50).

Classification results for both Bole and Manas at the object level were less speckled than those at the pixel level, which was consistent with previous findings that object-based classifications could offer a more generalized visual appearance and a more contiguous depiction of land cover [34]. In addition, it is notable that both the voting and fusion methods were used at the object level, which differs from previous studies that used object-level features to classify crops, and this enriches the methods available for object-based crop classification. With respect to training sample size, when a small training sample set was used, the classification accuracy was low, and the hybrid classifiers could improve the classification performance substantially (Tables 9 and 10, training sample numbers from 50 to 500). However, if a large training sample set was employed, single classifiers could achieve high classification accuracy (above 90%); thus, the hybrid classifiers did not improve performance significantly (Tables 9 and 10, training sample numbers from 2,500 to 4,000).
Hybrid classifiers need to use the output of single classifiers, which leads to greater time consumption. However, some national and local authorities do not always pay major attention to collecting ground reference data [66], so the amount of ground reference that could be used for training the classifiers may be limited. Therefore, researchers could benefit from the hybrid classifiers when the number of training samples is small. Nevertheless, if abundant training samples are provided, single classifiers are more suitable because they can achieve high classification accuracy and are more computationally efficient than hybrid classifiers.

Conclusion and Limitation
In this research, we employed two classifier hybrid strategies, M-voting and P-fusion, to integrate single classifiers for crop classification using time series NDVI at both the pixel and object levels. The main conclusions of the research follow:
1. Landsat TM and HJ CCD have similar NDVI; thus, the two data sources could be used together to increase the temporal resolution of the NDVI time series at 30-m spatial resolution.
2. When the training sample number is small (50 or 100), hybrid classifiers outperform single classifiers (higher classification accuracy and lower accuracy SD). A larger training sample set improves classification performance, but the improvement reaches saturation points (such as 1,500 samples for Bole and 500 samples for Manas), and additional training samples cannot further improve classification accuracy. Thus, when abundant training samples (4,000) are used, hybrid classifiers do not substantially improve classification performance compared with single classifiers.
3. OBIA did not improve the classification performance compared with PBIA, especially in the heterogeneous region of Manas. However, OBIA can potentially solve the pixel heterogeneity problem, and less "salt-and-pepper" noise was observed in the classification result at the object level.
Although the hybrid classifiers can improve classification performance, especially when a small training sample set is used, the classification accuracy of some confused classes, such as grape in Bole, may remain low (e.g., less than 60%); therefore, some other features, such as texture features and physical features [14], should be used together with NDVI time series to better discriminate confused crops. Additionally, the feasibility of the hybrid classifiers in other study areas should be further tested.