Land-based crop phenotyping by image analysis: Accurate estimation of canopy height distributions using stereo images

In this paper we report on an automated procedure to capture and characterize the detailed structure of a crop canopy by means of stereo imaging. We focus specifically on one detailed characteristic, the canopy height distribution (canopy shoot area as a function of height), which provides an elaborate picture of canopy growth and health under a given set of conditions. We apply the method to a wheat field trial involving ten Australian wheat varieties subjected to two different fertilizer treatments. A novel camera self-calibration approach is proposed that allows quantitative plant canopy height data (as well as other valuable phenotypic information) to be determined by stereo matching. Using the canopy height distribution to provide a measure of canopy height, the results compare favourably with manual measurements of canopy height (with an R2 value of 0.92), and are shown to be more consistent. By comparing canopy height distributions across varieties and treatments, the methodology shows that different varieties subjected to the same treatment, and the same variety subjected to different treatments, can respond in far more distinctive and quantifiable ways within their respective canopies than can be captured by a simple trait measure such as overall canopy height.

Rahman and Krouglicof [3] investigated the robustness and accuracy of Brown's model for a wide range of lens distortions. They found that, for most cameras other than web cameras, a lower-order lens distortion model produces satisfactory accuracy at a considerable computational saving [3]. In practice, satisfactory radial distortion correction can be achieved using Brown's model with only first- or second-order radial distortion coefficients.
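As an illustration of how lightweight such a low-order correction is, the following sketch applies Brown's model with a first- and an optional second-order radial coefficient only. The distortion centre and the coefficient values in the example are invented for demonstration and do not correspond to any particular camera.

```python
def undistort_point(xd, yd, x0, y0, k1, k2=0.0):
    """Correct radial lens distortion with a low-order Brown's model.

    (xd, yd): observed (distorted) pixel coordinates;
    (x0, y0): distortion centre;
    k1, k2:   first- and (optional) second-order radial coefficients.
    Returns the corrected coordinates (u, v).
    """
    x, y = xd - x0, yd - y0              # centre the coordinates
    r2 = x * x + y * y                   # squared radial distance
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x0 + x * factor, y0 + y * factor

# A point far from the centre is displaced more than a nearby one.
u_far, v_far = undistort_point(600.0, 400.0, 320.0, 240.0, k1=1e-7)
u_near, v_near = undistort_point(330.0, 250.0, 320.0, 240.0, k1=1e-7)
```

With k1 = k2 = 0 the mapping is the identity, and the correction grows with the squared (or fourth power of the) distance from the distortion centre, which is why a first- or second-order model usually suffices.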
There have been many algorithms developed for the estimation of model parameters.
In one class of methods [2,3], the relationship between 3-D reference coordinates and 2-D image coordinates is required. The camera model parameters are then estimated numerically from the set of 3-D to 2-D correspondences. The major advantage of this approach is that the extrinsic and intrinsic camera parameters and the distortion parameters can be estimated simultaneously. However, obtaining accurate coordinates of 3-D scene points is sometimes demanding [4]; usually, a calibration object of known structure is used for this purpose. A second class of methods [5] exploits the regular patterns of planar objects, such as a chessboard, for camera calibration. Zhang [5] used multiple views of planar grids taken from unknown viewpoints to estimate lens distortion parameters as well as camera parameters. In a third class of methods, no calibration object is required. Some methods [4,6] estimate lens distortion parameters from individual images by line rectification. Alvarez and Sendra [6] proposed an algebraic approach for the estimation of lens distortion parameters by line rectification, in which the lines are predetermined by manual selection. Ahmed and Farag [4] proposed a robust approach based on the least median of squares (LMedS) for the estimation of lens distortion parameters (one radial and one tangential distortion coefficient). In theory, the LMedS method can handle up to 50% outliers in the input data; their method [4] can therefore obtain reasonable estimates provided that most extracted curves correspond to straight lines in the undistorted images.
Unfortunately, in many applications, such as the field imaging situation considered here, images contain no straight lines, or too few of them. The methods described above therefore cannot be employed for plant crop phenotyping.
Farid and Popescu [7] developed a method that estimates the lens distortion model parameters from spectral information. However, its accuracy is limited because only substantial lens distortion produces a detectable spectral signature. In this paper, we take full advantage of the stereo camera pair to develop a robust self-calibration algorithm that does not rely on peripheral hardware such as a calibration chart.

The process of self-correction for lens distortion
For a given stereo pair of cameras, the cameras are positioned with their optical axes aligned in one plane. In this application to field phenotyping we exploit the fact that crops are planted on locally flat fields to achieve self-calibration. If the lenses of the stereo cameras are undistorted and the plane of the camera sensors is parallel to the ground plane, the distance between any two points on the ground plane in the right image will be the same as the distance between the same two points in the left image. Let P_1 and P_2 be two arbitrary points in the ground plane. We then have

    \sqrt{(u_{I1} - u_{I2})^2 + (v_{I1} - v_{I2})^2} = \sqrt{(u_{J1} - u_{J2})^2 + (v_{J1} - v_{J2})^2},   (S3)

where (u_{I1}, v_{I1}) and (u_{I2}, v_{I2}) are the coordinates of P_1 and P_2, respectively, in image I; and (u_{J1}, v_{J1}) and (u_{J2}, v_{J2}) are the coordinates of P_1 and P_2, respectively, in image J.
It is well known that zoom cameras deviate from the ideal pinhole model.
Fortunately, in most cases the first-order Brown's model can adequately correct the distortion caused by the camera lenses. In such a case, the first-order Brown's model is

    u = \hat{x} + (\hat{x} - x_0) k r^2,   (S4)
    v = \hat{y} + (\hat{y} - y_0) k r^2,   (S5)

where (\hat{x}, \hat{y}) and (u, v) are the coordinates of a point in the original image and in the distortion-corrected image, respectively, (x_0, y_0) denotes the distortion centre, k is the first-order distortion parameter, and r^2 = (\hat{x} - x_0)^2 + (\hat{y} - y_0)^2. For convenience, let x denote \hat{x} - x_0 and y denote \hat{y} - y_0. By applying (S4) and (S5) to (S3), we have

    [(x_{I1} - x_{I2}) + k (x_{I1} r_{I1}^2 - x_{I2} r_{I2}^2)]^2 + [(y_{I1} - y_{I2}) + k (y_{I1} r_{I1}^2 - y_{I2} r_{I2}^2)]^2
    = [(x_{J1} - x_{J2}) + k (x_{J1} r_{J1}^2 - x_{J2} r_{J2}^2)]^2 + [(y_{J1} - y_{J2}) + k (y_{J1} r_{J1}^2 - y_{J2} r_{J2}^2)]^2.   (S6)

We rewrite this equation as

    a k^2 + b k + c = 0,   (S7)

with

    a = (x_{I1} r_{I1}^2 - x_{I2} r_{I2}^2)^2 + (y_{I1} r_{I1}^2 - y_{I2} r_{I2}^2)^2 - (x_{J1} r_{J1}^2 - x_{J2} r_{J2}^2)^2 - (y_{J1} r_{J1}^2 - y_{J2} r_{J2}^2)^2,
    b = 2 [(x_{I1} - x_{I2})(x_{I1} r_{I1}^2 - x_{I2} r_{I2}^2) + (y_{I1} - y_{I2})(y_{I1} r_{I1}^2 - y_{I2} r_{I2}^2) - (x_{J1} - x_{J2})(x_{J1} r_{J1}^2 - x_{J2} r_{J2}^2) - (y_{J1} - y_{J2})(y_{J1} r_{J1}^2 - y_{J2} r_{J2}^2)],
    c = (x_{I1} - x_{I2})^2 + (y_{I1} - y_{I2})^2 - (x_{J1} - x_{J2})^2 - (y_{J1} - y_{J2})^2.

The parameter k can be obtained from the solution of (S7). From the quadratic we get

    k = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.   (S8)

That is, we get two possible values for k. The relevant root of the quadratic is chosen as the one giving the smallest distance after lens distortion correction, where the distance is defined as

    d(k) = \sqrt{(u_{I1} - u_{I2})^2 + (v_{I1} - v_{I2})^2},   (S9)

the distance between the two corrected points in image I (by construction of (S6), the corresponding distance in image J takes the same value at either root).

Optimization
In a previous subsection, we described a procedure to estimate the lens distortion parameter k from two points matched across a stereo image pair. It is inevitable, however, that errors are incurred in the process of matching points between the two images. As the underlying field is not perfectly flat, both fine- and coarse-scale errors due to deviations from ideality naturally arise. In most cases the feature matching process finds corresponding points with position errors within 1 to 2 pixels.
These errors are classified as fine errors, and we use multiple matched points to estimate the lens distortion parameters with robustness to them. In a few cases, however, the feature matching process matches points falsely; these are classified as coarse errors. It is well known that the RANSAC algorithm [8] is robust to coarse errors provided they constitute less than 50% of the data. The RANSAC algorithm has therefore been adopted here to avoid the selection of falsely matched points.
Let N denote the number of matched points obtained from the feature matching process. From these we form the N(N - 1)/2 point pairs. The error function used in the RANSAC algorithm to evaluate the quality of an estimated parameter k is defined as the sum over the M point pairs with the smallest errors:

    E(k) = \sum_{m=1}^{M} e_{(m)}(k), \qquad e_p(k) = \left| d_I^{(p)}(k) - d_J^{(p)}(k) \right|,   (S10)

where d_I^{(p)}(k) and d_J^{(p)}(k) are the corrected distances of point pair p in images I and J, e_{(1)}(k) \le e_{(2)}(k) \le \ldots denote the pair errors sorted in ascending order, and M < N(N - 1)/2.

The RANSAC algorithm for lens distortion parameter estimation can be summarized as follows.
Step 1: Randomly select a pair of matched image points.
Step 2: Estimate the parameter k from this pair.
Step 3: Calculate the error using (S10).
Step 4: If the error is within a user-specified tolerance, accept the estimated parameter k; otherwise, repeat Steps 1-3.
Step 5: If no estimate is accepted within the allowed number of iterations, select the k with the least error.
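The five steps above can be sketched as follows. This is a minimal, self-contained illustration rather than the authors' implementation: the per-pair quadratic solver is repeated so the sketch runs on its own, and the sample budget, tolerance, and choice of M are illustrative.

```python
import math
import random

def candidate_ks(pI1, pI2, pJ1, pJ2, centre):
    """Real roots of the per-pair quadratic a*k^2 + b*k + c = 0 (may be empty)."""
    x0, y0 = centre

    def terms(p1, p2):
        x1, y1 = p1[0] - x0, p1[1] - y0
        x2, y2 = p2[0] - x0, p2[1] - y0
        r1sq, r2sq = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2
        return x1 - x2, y1 - y2, x1 * r1sq - x2 * r2sq, y1 * r1sq - y2 * r2sq

    dxI, dyI, axI, ayI = terms(pI1, pI2)
    dxJ, dyJ, axJ, ayJ = terms(pJ1, pJ2)
    a = axI**2 + ayI**2 - axJ**2 - ayJ**2
    b = 2.0 * (dxI * axI + dyI * ayI - dxJ * axJ - dyJ * ayJ)
    c = dxI**2 + dyI**2 - dxJ**2 - dyJ**2
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return []
    return [(-b + math.sqrt(disc)) / (2.0 * a),
            (-b - math.sqrt(disc)) / (2.0 * a)]

def corrected_distance(p1, p2, k, centre):
    """Distance between two points after first-order correction."""
    def correct(p):
        x, y = p[0] - centre[0], p[1] - centre[1]
        f = 1.0 + k * (x * x + y * y)
        return x * f, y * f
    (x1, y1), (x2, y2) = correct(p1), correct(p2)
    return math.hypot(x1 - x2, y1 - y2)

def ransac_k(matchesI, matchesJ, centre, M, tol, max_iters=500, seed=0):
    """RANSAC estimation of k from matched point lists for images I and J.
    The error of a candidate k is the sum of the M smallest per-pair
    errors |d_I(k) - d_J(k)|."""
    n = len(matchesI)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    rng = random.Random(seed)
    best_k, best_err = None, float("inf")
    for _ in range(max_iters):
        i, j = rng.choice(pairs)                           # Step 1: random pair
        for k in candidate_ks(matchesI[i], matchesI[j],
                              matchesJ[i], matchesJ[j], centre):  # Step 2
            errs = sorted(abs(corrected_distance(matchesI[p], matchesI[q], k, centre)
                              - corrected_distance(matchesJ[p], matchesJ[q], k, centre))
                          for p, q in pairs)
            err = sum(errs[:M])                            # Step 3: error (S10)
            if err < best_err:
                best_k, best_err = k, err
            if err < tol:                                  # Step 4: accept
                return k, err
    return best_k, best_err                                # Step 5: least error
```

A candidate estimated from a pair that involves a false match fits that outlier rather than the consensus, so its (S10) error over the remaining pairs stays large and it is never accepted, which is exactly the robustness property the RANSAC step is relied on for.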