Improved Climate Risk Simulations for Rice in Arid Environments

We integrated recent research on cardinal temperatures for phenology and early leaf growth, spikelet formation, early morning flowering, transpirational cooling, and heat- and cold-induced sterility into an existing to crop growth model ORYZA2000. We compared for an arid environment observed potential yields with yields simulated with default ORYZA2000, with modified subversions of ORYZA2000 and with ORYZA_S, a model developed for the region of interest in the 1990s. Rice variety ‘IR64’ was sown monthly 15-times in a row in two locations in Senegal. The Senegal River Valley is located in the Sahel, near the Sahara desert with extreme temperatures during day and night. The existing subroutines underestimated cold stress and overestimated heat stress. Forcing the model to use observed spikelet number and phenology and replacing the existing heat and cold subroutines improved accuracy of yield simulation from EF = −0.32 to EF =0.70 (EF is modelling efficiency). The main causes of improved accuracy were that the new model subversions take into account transpirational cooling (which is high in arid environments) and early morning flowering for heat sterility, and minimum rather than average temperature for cold sterility. Simulations were less accurate when also spikelet number and phenology were simulated. Model efficiency was 0.14 with new heat and cold routines and improved to 0.48 when using new cardinal temperatures for phenology and early leaf growth. The new adapted subversion of ORYZA2000 offers a powerful analytic tool for climate change impact assessment and cropping calendar optimisation in arid regions.


Introduction
Heat and cold sterility may limit rice production in the current and future climate. Together with phenology they determine in which period rice can be grown with acceptable yield. Crop models can be used to explore options for larger areas and future climates. The simulations by Matthews et al. [1,2] showed large future yield reductions due to increased heat sterility for several regions in parts Asia. Two recent global studies [3,4] show for arid regions such as the Sahel and Pakistan very different impacts of climate change. In the Gourdji paper [3] the arid regions colour dark red showing large climate risks. The Texeira study [4] shows much smaller climate risks in the same regions. We cannot discuss causes of these different outcomes here, but the large discrepancies clearly show that large uncertainty exists in climate change impacts on rice production in arid regions. A lot of experimental research has been conducted on heat and cold sterility risks in rice over the past decade which has not yet been incorporated into existing crop growth models. This study focusses on the ability of the ORYZA2000 model to simulate yields in two sites in Senegal. Senegal is a country with access to river irrigation water, in the Sahel, close to the Sahara desert. Radiation levels are high, humidity is low, temperatures are often above 40°C for days in a row and dangerously low night temperatures (<15°C) occur during part of the year. The large temperature differences within days and at different times of the year make the Sahel regions an interesting site for model evaluation under a wide range of temperature conditions.
As we will show in this paper, the original ORYZA2000 model could not accurately simulate yields in environments with extreme temperatures. We proposed and tested a series of model improvements based on recent experimental research. In environments without severe heat and cold sterility and when phenology is separately calibrated for experiments, ORYZA2000 has been shown to accurately simulate yields [5][6][7][8][9][10][11][12][13][14][15]. On the other hand, it has been shown that the predecessor of ORYZA2000, the ORYZA1 model [16], could not well simulate yields in arid regions such as the Sahel. Main problems identified were poor simulation of phenology, heat and cold sterility. A separate version of ORYZA1, called ORYZA_S, was developed that resolved those issues [17,18]. It included a new submodel for rice development and sterility (RIDEV). Over the last two decades ORYZA1 was extended with ability to simulate water and nitrogen limited production [8]. New research has led to better insights in heat and cold sterility [19,20] in arid and humid climates. These new insights have been incorporated in RIDEV2 (the successor of RIDEV) and they are currently being incorporated into the SAMARA model (Dingkuhn, pers. comm.). At this stage we could have proceeded with either ORYZA_S or with ORYZA2000. We chose to proceed with the ORYZA2000 because we hope that any improvement in the main model can also be useful under other conditions than those tested here, i.e. with water or nitrogen limitation [8] or in crop rotations [21]. With ORYZA_S such applications would not be possible. All parts of the model ORYZA2000 are well documented in a book [8]. The fact that the source code of the ORYZA2000 model is freely available makes the model easily amenable for possible improvements.
Four bodies of experimental research seemed especially relevant for incorporation into the existing model: (1) cardinal temperatures for phenology, (2) cardinal temperatures for early leaf growth, (3) spikelet formation and (4) heat and cold sterility. Firstly, we revisited assumptions on cardinal temperatures. Van Oort et al. [22] showed that phenology simulation could be improved by assuming, for the variety considered in this paper, cardinal temperatures quite different from the default cardinal temperatures in ORYZA2000: a higher base temperature (14°C vs 8°C), a slightly higher optimum temperature (31°C vs 30°C) and no delay in development above the optimum temperature. Secondly, cardinal temperatures are also relevant for simulating early leaf growth, which is generally assumed to be temperature dependent. Under the default settings ORYZA2000 simulates early leaf growth assuming cardinal temperatures of 8, 30 and 42°C, i.e. no leaf growth below 8°C and above 42°C and highest growth rates at 30°C. Most publications on early leaf growth in tropical environments suggest a higher base temperature: Nishiyama et al. [23]: 7-16°C (same data are cited in [24]); Rebolledo et al. [25]: 12°C; Sanchez et al. [26]: 11.8°C for leaf initiation and 14.5°C for shoot growth). Thirdly, we revisited assumptions on modelling spikelet formation. In ORYZA2000, spikelet number is simulated as the total biomass growth from panicle initiation to flowering multiplied with a constant spikelet growth factor (SPGF). Recent work by Yoshida et al. [27] and Kato and Katsura [28] suggested to split up the phase into two separate phases. First, up to about two weeks after panicle initiation juvenile spikelets are formed. The number of juvenile spikelets depends on total N in the crop (kg N ha −1 ). Next in the approximately 2 weeks before flowering, spikelets may be aborted if biomass growth during that period is insufficient. Fourth, we revisited assumptions on heat and cold sterility modelling. In the current ORYZA2000 model, sterility is simulated using daily maximum temperature (T max ), which generally occurs around 2pm (2 hours after the sun reached its highest point above the horizon). In reality rice more often flowers late in the morning (when temperatures are lower), an adaptive capacity to avoid heat stress [19,29,30]. Another problem with using T max is that it ignores transpirational cooling. Several studies have shown that in arid climates significant transpirational cooling of the panicle can occur [20,31,32,33,34], leading to panicle temperatures up to 7°C cooler than air temperature at flowering time. Ignoring flowering time and transpirational cooling could lead to overestimation of heat sterility, especially in arid environments. ORYZA2000 simulates cold sterility with a cooling degree days approach in which a temperature sum is accumulated for days with average temperature below a threshold temperature [35,36]. A possible drawback of this method is that in environments with a large diurnal temperature amplitude, such as in the arid climate considered in this paper, average temperature may seem "safe" while in reality the low night temperatures have a severe impact. A second critique on the cooling degree days approach is that it uses air temperature, while using a combination of water and air temperatures could result in more accurate predictions [20,37,38]. While this is well accepted, the challenge for modellers is to how to model water temperature. We used the empirical equations presented in [39,40]. We are aware that more mechanistic water temperature models exist [41,42]. But since we were unsure about their validity in the environment considered here whereas the RIDEV2 model was developed based on observations in the same environment as in which our experiments were conducted, we chose to proceed with the RIDEV2 equation for water temperature.
The objectives of this paper are (1) to integrate above mentioned research into the framework of the ORYZA2000 model and (2) to test systematically for each of them individually and combined how much they contribute to increased accuracy in yield simulation.

Methods
In the following sections we will describe the models used and the modifications made ( §2.1 and §2.2), the experimental data ( §2.3) and the methods of model comparison ( §2.4)

ORYZA_S
Although we chose to proceed with ORYZA2000, we did use the ORYZA_S model [17,18] as a benchmark. For this we used the version of ORYZA_S used in a WARDA training course in march 1999. This version contained locally optimised parameters for variety IR64, the same as used in this study. We performed two simulations: heat and cold sterility equations based on Dingkuhn and Sow [17,18]. Cold sterility is calculated based on the average of minimum air temperatures in the period from development stage 0.85 to 1.0, approximately the 14 days before flowering. Heat sterility is calculated based on daily average temperatures, averaged over the period from development stage 0.95 to 1.25, approximately the 12 days centred around the 50% flowering date.

ORYZA2000 with modifications
Our starting point was the ORYZA2000 model version 2 number 13. Subversions of the model were developed incorporating theories described in the following subsections. For naming convention, we suggest to use ORYZA2000v2n13s1 (version 2, number 13, subversion 1), which we will for brevity in this paper refer to as s1 to s26. All acronyms used in the sections below are listed in a supporting table (S1 Table).
The ORYZA2000 model dynamically simulates physiological processes in interaction with their environment [8]. The ORYZA2000 has been extensively validated in other studies (see the list of references cited in the introduction). We used default crop parameters from the crop file for variety IR72, which provide a good base assumption for other high yielding irrigated lowland varieties such as IR64 [15]. Any modifications to parameter values specifically for IR64 are discussed in the following sub sections. We simulate potential production, which is production free from weeds, pests, diseases, water stress and nutrient stress. It is impossible to describe the full model here but a good book and website are available [8], https://sites.google. com/a/irri.org/oryza2000/home). In the following section we document the modifications made.

Phenology
In ORYZA2000 the following developmental phases are discerned: We applied two approaches for simulating phenology (motivation for this is in §2.4). In step 1 we forced phenology to observed values using the very simple approach below. Daily development rates (unit d −1 ) were set to: Where OBSDUR p,i is the duration of phase p in treatment i in days. Note that the point of transition from the BVP to PSP was not observed, therefore these phases were combined. Development stage DVS was then calculated adding up these respective development rates, accumulating one DVR unit per day. This approach ensures exact reproduction of observed dates of emergence, panicle initiation, flowering and maturity without having to go through the complexities of separately calculating temperature sum dependent development rates for each phase and each sowing date.
In step 2 we simulated phenology using the bilinear temperature response model and hourly air temperatures. We created one model subsversion with default parameters: cardinal temperatures (8,30, 42°C for base, optimum and maximum temperature for development), no photoperiod sensitivity and a transplanting shock parameter of 0.4. We also created subversions with optimised cardinal temperatures (14, 31, 999°C), no photoperiod sensitivity and a transplanting shock parameter of 0.0. These parameters were previously determined from the same phenological and climate data as used in this paper, using a phenology calibration program described in [22]. This phenology calibration program calibrates all phenological parameters simultaneously and minimises correlation between phenology errors and temperature. Basically this research showed that with the default cardinal temperatures, duration from emergence to flowering was underestimated at lower temperatures and overestimated at higher temperatures. We will present further on in the paper a comparison of simulated duration from emergence to flowering as obtained with these different cardinal temperatures. The base and optimum temperatures obtained by van Oort et al. [22] are consistent with a recent independent review of base, optimum and maximum temperatures for development [26]. Note that these base temperatures are for a bilinear temperature response model. Other studies have shown that phenology can be more accurately simulated with a sigmoid or bell shaped function [43,44]. In such functions we often find lower base temperatures but with hardly any increase in development rate (low slope) up to somewhere between 10 and 16°C. So the meaning of the base temperature is different depending on the model used. In practice nobody would grow rice under temperatures close to the base temperature for prolonged time, in that sense this base temperature will always remain somewhat hypothetical. Likewise uncertainty exists in the nature of temperature response above the optimum temperature [45], The SIMRIW model and cardinal temperatures calibrated by van Oort et al. [22] suggest no decrease in development while several other phenology models show a sharp decrease in development towards a maximum temperature. Few studies have been conducted in environments with temperatures above the optimum for a long time. The Sahel environment is interesting for phenological studies because such contrasting temperatures occur, often for prolonged times above the optimum temperature for development.

Leaf growth
Leaf growth in ORYZA2000 is split into two phases. When the leaf area index (LAI) is below 1.0 (m 2 leaf m −2 soil) leaf growth is simulated as a function of temperature only. While LAI is above 1.0, leaf growth is simulated as dependent on net photosynthesis and assimilate partitioning to leaves. A mechanism has been implemented in subroutine SUBLAI3 to ensure a smooth transition between temperature and radiation driven LAI growth around the LAI = 1 point. For the first (LAI<1) phase, hourly air temperatures are calculated as [8: p. 31]: The same approach is used for calculating thermal time for phenology. In these equations TBLV, TOD, and TMD are the base, optimum and maximum temperature for leaf development.
From the hourly values, the daily average heat unit (HULV,°Cd) is calculated. Daily LAI growth, GLAI, is calculated as: Where RGRL is the "relative growth rate of the leaves" parameter. For the phase when LAI is greater than 1, LAI keeps growing depending on simulated leaf mass growth (RWLVG) and specific leaf area (SLA) and loss of leaves dues to senescence (LLV). RWLVG depends on total net biomass growth and on the fraction assimilates partitioned to the leaves FLV. In the model FLV and SLA are both functions of development stage. It was noted during simulations that for the crops with very long periods from emergence to flowering, LAI would reach very high values. This occurred in Fanaye with sowing in October to December and in Ndiaye with sowings from October to April. During these periods low temperatures ensured a long vegetative period while high radiation levels allowed for high LAI growth. The highest simulated LAI was 22. Such LAI values are not realistic. What is more realistic is a maximum of around 10. Even an LAI of 10 will only be obtained under exceptional conditions (long growing season, ample nutrition, water and radiation). The unrealistic high LAI values may have different causes. One could be that the simulations overestimate early growth. We first tried modifying the parameter RGRL. This lead to improved predictions for some sowing dates but poorer prediction for the other dates. Replacing the cardinal temperatures was more effective, and as we showed in the introduction, several studies [23][24][25][26] suggested base temperatures higher than the default 8°C for TBLV. A second possible explanation for the unrealistic high LAI is that the default model does not account for reduced assimilate partitioning to the leaves (FLV) at lower temperatures [17,18]. We implemented temperature dependent partitioning as: With these equations FLV is higher for higher daily average temperatures (TAV,°C). FLV decreases with development stage (DVS) to 0 at flowering (DVS = 1.0). The fraction assimilates partitioned to the storage organ (FSO) is a fixed function of development stage. And the fraction assimilates partitioned to the stem is calculated as 1-FLV-FSO. Under default settings FLV also decreases with DVS [8: p. 172] but FLV is not adjusted by TAV as above.
A third possible explanation for the unrealistic high LAI is that leaves in the bottom of the canopy could die when for prolonged time insufficient radiation penetrates through the canopy and when leaves reach a certain age. This process, which we will refer to as "shading to kill leaves", is not present in ORYZA2000 version 2 number 3. Following the SUCROS model [46,47] we used the following function to kill leaves: Where LLVSH (kg DM ha −1 d −1 ) is the loss of leaf dry matter (DM) due to shading, WLVG is total green leaf dry matter (kg DM ha −1 ) on a given day and LAICR = 4.0 (m 2 leaf m −2 soil) is the critical LAI above which leaves start to die. According to this equation, no leaves die below LAICR. Above LAICR the fraction of leaves dying increases linearly with LAI, up to a maximum of 3% a day at LAI = 8 and higher. A very similar shading function is used in the WOFOST crop growth model. Version 3 of the ORYZA2000 model also has a "shading to kill leaves" function that accounts for genetic variation in tolerance to shading and which simulates higher death rates at lower radiation levels (Tao Li pers. comm.). We simulated with these different approaches for "shading to kill leaves" and found that they resulted in almost similar values for accuracy of yield simulation. Since we lacked LAI data for validation, it remains impossible to tell which one is most correct. For simplicity we will present only results of the Equation 9 "shading to kill leaves" function. We created model subversions for early growth (Eq 5-7) by using either (1) the default (8, 30, 42°C) values for TBLV, TOD, and TMD or (2) the values obtained from the phenology calibration exercise discussed in previous section (14, 31, 999°C). We created subversions by using for partitioning either (1) the default non-temperature dependent function or (2) the temperature dependent partitioning function (Eq 8). We created subversions without and with the "shading to kill leaves" function (Eq 9).

Spikelet formation
In ORYZA2000 and in ORYZA_S the number of spikelets formed during the PPP phase (panicle initiation to flowering) is calculated as [8: p. 60]: Where GCR is the gross biomass growth in kg dry matter per day simulated by the model. SPGF, the spikelet growth factor, is a parameter indicating the number of spikelets formed per kilogram of dry matter formed (no kg −1 ). Summation is over the PPP development phase (0.65<DVS<1.0). The experimentally determined SPGF is 64900 for variety IR72 [8: p. 61]. Yoshida et al. [27] and Kato and Katsura [28] split up the PPP phase into two separate phases. First, up to about two weeks after panicle initiation, juvenile spikelets are formed. Their number depends linearly on total nitrogen (N) in the crop (kg N ha −1 ). Next in the approximately 2 weeks before flowering, spikelets may be aborted if biomass growth is insufficient. ORYZA2000 simulates the N content in leaves, in g N m −2 leaf ha −1 . From this in combination with specific leaf area (SLA) we can calculate the new variable for leaf N content (NCLV, kg N per kg leaf). Stem N content (NCST) is assumed to be half of leaf N content (cf [8: p. 101]). We multiply these N contents with leaf and stem biomass (kg DM ha −1 ) to obtain total N of the aboveground biomass (NBIOM, kg N ha −1 ). At DVS 0.825, halfway between panicle initiation and flowering, we simulate the total number of juvenile spikelets as: Where SPGFJ is the spikelet growth factor for juvenile spikelets formed per kilogram of N in the total above ground biomass. This model is a simplification of the Yoshida et al. [27] model in which at very low NBIOM spikelet formation is higher than expected based on a linear Equation (11). We chose for this simpler model for its simplicity and because NBIOM in our experiments was always expected to be high. Fig. 1 shows for variety IR72 the relation between NSPJUV and NBIOM, with SPGFJ = 4,131,400 juvenile spikelets / kg N (or 4,131 juvenile spikelets per gram N as in Fig. 1). The NSPJUV values shown in this figure were not directly measured, but inferred from final NSP taking into account an abortion function for which the parameter was estimated as described in [27].
In the second phase, a fraction of the juvenile spikelets is aborted (see also [48,49]). We denote the days from DVS 0.825 to flowering (DVS = 1.0) as CNT2BF and the sum of daily gross biomass growth during this period as GCR2BF = SGCR. With the Yoshida model [27] the final number of spikelets (no ha −1 ) was calculated at flowering as: Where the exponential term is the fraction aborted spikelets and 0.1 is a unit conversion term. Kato and Katsura [28] proposed an alternative model for abortion: In this equation NSPJUV is multiplied with floret weight, which we assumed to be 3 mg per floret.
Broadly, the three approaches are similar in that more biomass growth from PI to flowering results in more spikelets. Under N limitation, the Yoshida model [27] and the Kato and Katsura model [28] predict lower spikelet formation. But also the default Equation 10 would predict less spikelet formation under N stress, as N stress would also reduce GCR. Also under non-limiting conditions such as in our experiments the three approaches can give different results, due to their sensitivity to the leaf mass: stem mass ratio. Imagine two crops with the same total leaf + stem mass, but one crop has relatively more leaves. Since N content of the leaves is higher than that of the stems, the crop with more leaves will produce more juvenile spikelets (Eq 11). Abortion would also be lower if the higher leaf mass results in a higher GCR2BF. Together, production of mature spikelets (NSP) would be higher. We may therefore expect that the accuracy of the spikelet formation calculation depends strongly on accurate simulation of leaf and stem growth. We compared the three approaches for simulating spikelet number, using either the default value for SPGF (= 64900) or SPGFJ = 4131400 from Fig. 1.

Heat fertility
Heat fertility in ORYZA2000 is simulated as a function of maximum air temperature [8: p. 61] based on earlier work by Horie [36]:  [19,20], van Oort et al. [32] and Jagadish et al. [50], but inspired also by works of Matsui et al. [31] and Weerakoon et al. [51] who showed the importance of correcting for transpirational cooling. Improvements relative to the ORYZA2000 model include consideration of flowering time within the day (earlier flowering leads to exposure to lower temperatures) and transpirational cooling. First, peak flowering time was calculated based on the average of minimum temperatures in the preceding 7 days (T min7 ) and sunrise time (t sunrise ), a function from Julia and Dingkuhn [17]: Air temperature at peak flowering time was calculated by using the Goudriaan and van Laar [52] and Ephrath et al. [53] diurnal temperature model, with T min and T max the minimum and Improved Climate Risk Simulations for Rice in Arid Environments maximum air temperature and DL the daylength (see also [32]): The vapour pressure deficit VPD (in kPa) was calculated from the early morning vapour pressure VPA (which was calculated assuming T dew = T min , see §2.3.2) and the saturated vapour pressure at T air (t peakfl ), Equation 17. Panicle temperature T pan (t peakfl ), was then calculated based on the equation reported in Julia and Dingkuhn [18]: Transpirational cooling with Equation 18 can be as large as 7°C (at RH = 25%, T air = 40°C). A second empirical model for transpirational cooling was recently published [32]. Within the range of relative humidities and temperatures present in our environment this second model predicts similar magnitudes of transpirational cooling. Since the choice between the two is then arbitrary, but the Julia and Dingkuhn [17] empirical model was based on a larger dataset, we chose to simulate only with their model. Different equations have been presented for the relation between panicle temperature and heat fertility. Some of these are based on counts of germinated pollen or counts of pollen on stigma (for example see [31,51,[54][55][56][57][58]). Others are based on the assumption that the fraction filled spikelets (FFS) is the same as the fertility (e.g. [59,60,61]). The latter assumption is valid only when the crop is sink limited, under source limitation FFS will be less than fertility (SPFERT). Especially under favourable conditions, rice can produce more spikelets than it can ever fill [49] leading to source limitation and hence reduced FFS values. Source limitation is also more likely when drought stress occurs, resulting in a lower FFS value [62]. Due to difficulties in quantification of source size, some doubts may always remain on the validity of the FFS = SPFERT assumption. For these reasons, we preferred to use an equation with pollination based parameters. The following equation was derived specifically for variety IR64 and based on measured spikelet temperatures [50]: In this sigmoid 90% of the spikelets are still fertile at 29.7°C, 50% of the spikelets are fertile at 35.0°C and 10% of the spikelets are fertile at 40.0°C. With a sigmoid fertility decreases more and more per unit (°C) temperature increase up to the 50% fertility point. Beyond this point, fertility increases less and less per unit (°C) temperature increase. More realistic might be a continuously increasing sensitivity [51]. But as in our data we never came across the 50% point and Equation 19 was determined particularly for IR64, the variety used in our simulations, we proceeded with this equation. Acknowledging that flowering occurs over a number of days we counted the days from DVS 0.96 and DVS 1.2 (SIMDUR 0.96-1.2 ) and averaged daily SFHEAT values over this period: Cold fertility Cold fertility in ORYZA2000 is simulated as a function of cold temperature sum COLDTT, accumulated from DVS 0.75 to 1.2 when daily average temperature TAV (calculated as (T max -T min )/2)) is below 22°C: Our new method for simulating cold fertility considers two cold sensitive phases [18]. The microspore stage starts at panicle initiation and lasts till approximately halfway flowering. The panicle exsertion stage starts at the end of the microspore stage and continues till flowering. According to Julia and Dingkuhn [18] cold during the microspore stage causes sterility and cold during the panicle exsertion stage causes incomplete exsertion of the panicle. Moreover non exserted spikelets are also infertile. During these two stages the meristem moves up in the canopy from being at flood level at panicle initiation to in the top of the canopy at flowering. Therefore, it would be best to simulate using flood water temperature as input at panicle initiation and air temperature as input at flowering, with in between a weighted average depending on crop stage [37,38]. ORYZA2000 does not simulate panicle position within the canopy. As a simplification we therefore chose to simulate microspore fertility (SFCOLD1) with minimum floodwater temperature and panicle exsertion stage fertility (SFCOLD2) with minimum air temperature. We assumed the transition between these two phases was at DVS 0.825, halfway between panicle initiation (DVS 0.65) and flowering (DVS 1.0). SFCOLD1 was simulated as a function of minimum floodwater temperature, which depends on how much radiation reaches the floodwater and diurnal air temperature amplitude. Radiation reaching the floodwater depends on light transmission through the canopy. ORYZA2000 simulates the leaf area index (LAI). We simulated the light transmission ratio (LTR) assuming an extinction coefficient of k = 0.6 [39,40]: Minimum floodwater temperature (T w,min ) was calculated based on an empirical relation derived based on thousands of observations across a wide range of environments [39,40]: This model (as is any model) is a simplification, it does not account for floodwater depth, turbidity, temperature of inflowing water etcetera. At a later stage, incorporation of more sophisticated models [41,42] may be considered. Julia and Dingkuhn [18] described the following parabolic relation for cold sterility during the microspore stage: SFCOLD1 ¼ maxð0; 1 À ð0:0094 Â T 2 w;min À 0:431 Â T w;min þ 5:039ÞÞ ð25Þ Two conceptual objections against this equation are that SFCOLD1 can never get bigger than 0.9 (90%) and that cold sterility decreases again above T w,min = 23°C. Julia and Dingkuhn [18] assumed fertility was equal to the fraction filled spikelets, FFS. Here we do want to allow our model to reach 100% fertility and we presume the remaining 10% unfilled may have been caused by source limitation. We therefore fitted the following linear model through the data of Julia and Dingkuhn ([18], Fig. 7 of their paper): In this model fertility is 1 for T w,min > 20°C and decreases linearly to 0 at 13°C. Since the microspore stage lasts approximately 14 days, the question is how to calculate aggregate SFCOLD1 over this period. We consider two alternative rules for this: Where either the average or minimum over the duration (in days) of the microspore stage (0.65<DVS<0.825) is taken. The approaches can have a different effect when one or few very cold days happen to occur during the microspore stage, in that case the minimum rule will simulate lower SFCOLD1 than the averaging rule. Such conditions occur when the microspore stage happens to occur at the start or at the end of the coldest part of the year. Such conditions occurred for emergence months 9 and 11 (September and November) in our experiments. We therefore simulated with both approaches and compared their effect on accuracy of yield simulation.
For the panicle exsertion phase we assumed the same fertility function (with 20 and 13°C) but calculated daily fertility using minimum air temperature: Again, discussion may arise on how to calculate aggregate SFCOLD2. Since panicle exsertion is a more continuous processes in which slow exsertion in one day can be compensated by faster exsertion on another day we chose to calculate the aggregate SFCOLD2 as the average of daily SFCOLD2 values over the days from DVS 0.825 to DVS 1.0: Another issue is whether SFCOLD2 should be combined with SFCOLD1 to calculate final cold fertility, or whether SFCOLD2 should be multiplied with spikelet number. Practically for yield simulation and validation this will have no impact. But it does matter when we validate and calibrate spikelet formation. In that case, one should take care of having consistency in definitions. It would be wrong if the model output showed only the exserted spikelets while observations are the sum of exserted and non-exserted spikelets. In our experiment, spikelets observed are only the ones exserted. Non-exserted spikelets were not counted. Therefore for consistency we multiplied the number of mature spikelets (NSP) with SFCOLD2 and calculated SFCOLD as SFCOLD1.

Combined fertility and effect on yield
Combined fertility in ORYZA2000 is calculated as SPFERT = min(SFHEAT, SFCOLD). We proceeded with the same rule substituting SFCOLD (Eq 22) with SFCOLD1 (Eq 27 or 28) and SFHEAT (Eq 14) with SFHEAT (Eq 20). Again, discussion may arise how to calculate aggregate fertility. We therefore also calculated combined fertility as SPFERT = SFHEAT x SFCOLD (product rule). In many cases the distinction between these rules may seem trivial, as it will be very rare to find cold sterility during the microspore stage followed by heat sterility in the flowering stage. In normal cases SFHEAT will equal 1 if SFCOLD is less than 1 and vice versa. Again, with the large diurnal temperature amplitudes of our dataset, it may become an issue. SPFERT sets the upperbound for the fraction filled spikelets (FFS) and for the simulated yield. Yield in the model output is called WRR (weight rough rice, kg DM ha −1 ). From SPFERT we calculate the highest possible WRR as: Where NSP is the number of spikelets per hectare (see §2.2.3) and WGRMX is the maximum grain weight which we assumed to be 25 mg per grain. After flowering ORYZA2000 simulates grain filling based on assimilation and respiration, which in turn depend on environmental conditions, amount of stem reserves, LAI at flowering, senescence and duration of the grain filling phase. When the crop is sink limited (if NSP or SPFERT are low) WRR will not become larger than PWRR. Any excess assimilates produced are in such cases partitioned to the stems.

Experimental data
Sites and crop data. We used experimental data previously reported in de Vries et al. [59], for variety IR64. The crop was sown once per month at two sites in Senegal, for 15 sowing dates in a row in the years 2006-2007, thus there were 30 treatments. The site Ndiaye (16°11'N, 16°1 5'W) is located in the Senegal river delta, the site Fanaye (16°32'N, 15°11'W) is located along the same river approximately 150 km inland. The weather is more extreme in this site. Dates of emergence, panicle initiation, 50% flowering and maturity, spikelet number, grain weight and fraction filled spikelets were recorded. We took averages of observed values from 3 replicates for each treatment. The crops were well fertilised and kept free from weeds. Diseases were not a problem. With 3 replicates per treatment and 30 treatments it was practically and economically not feasible to sample within the growing season. Consequentially calibration and validation of LAI, organ biomass and organ N content during the growing season was not possible. No crop data other than developmental stages (phenology) were recorded for the 18 December emergence date in Ndiaye. Thus we have n = 29 treatments.
Weather data. Minimum and maximum air temperatures were measured from stations located in a rice field adjacent to the experiment. Wind speed and precipitation, dew point temperature and radiation were recorded, but instruments for dew point temperature and radiation failed during part of the season, caused by a storm after which replacement took a long time. In Fanaye, the weather station failed altogether in 2007, for that year we replaced temperatures with those observed for the same dates in 2006. We chose to replace all radiation with radiation from the NASA POWER database. It has been shown that this satellite based dataset provides unbiased and quite accurate radiation data when compared with radiation observed on the ground [63]. The ORYA2000 adjusts maximum photosynthesis rate according to atmospheric CO2 concentration [8: p. 48]. For atmospheric CO2 concentration we assumed a value of 382 ppm (taken from the Mona Lau record) which is representative for the period in which the experiments were conducted. Since water was non-limiting during the growing season, missing precipitation did not affect simulations. For simulation of heat sterility and transpirational cooling we needed to know the actual vapour pressure. We assumed dew point temperatures (T dew ) equal to minimum air temperatures, which results in RH max being 100% early in the morning. This is a fair assumption for rice cultivated in continuously flooded paddy fields. The resulting annual pattern of RH min (Fig. 2, bottom) is similar to that reported in Dingkuhn et al. [60] for the same environment. No soil data were used for the simulations, as these are also not needed for simulating potential production. The problems with weather data are a common problem in much of the experimental research ongoing in developing countries. Good weather stations are expensive and replacement can take a long time. Weather logs may not be sent automatically to a central database and as a result malfunctioning may not be immediately detected. The 2007 missing data for Fanaye in combination with reported growing periods will have an effect on 7 out of 30 sowing dates. We compared accuracies for the first 8 and last 7 dates at Fanaye. Accuracies were not structurally different for the two periods.

Accuracy assessment and comparison
Ideally the proposed modifications in heat and cold sterility should translate into improved yield predictions. However if other components of the model (phenology, leaf growth, spikelet formation) are also inaccurate then effects of modifications in heat and cold sterility may remain obscured. For this reason we validated the model in two steps: • Step 1: simulation of yields with observed phenology and observed spikelet number Forcing phenology and spikelet number to observed values allows us to assess accuracy improvement obtained with the different heat and cold sterility models, without results being affected too much by other possible errors. It is useful for this purpose, but in the end, one wants to know how accurate the model is if all processes are actually simulated. Therefore in step 2, we put the model to a more rigorous test by simulating all processes and with a minimum of additional calibration. Within step 1 and 2, a series of model subversions (s1 to s26) were constructed. Each subversion was used to simulate yields for each of the n = 29 treatments. By changing one process at a time in the different subversions we could clearly retrace causes of increases in accuracy. For each subversion, goodness of fit was calculated as modelling efficiency [64]: Where S i is simulated Yield in treatment i, O i is observed Yield in treatment i and Ō is the average of observed yields. A value of 1 for EF indicates perfect prediction. A value of 0 means the model predicts no better than when we would simply take the mean of observations. Negative values suggest that the average of observed values is a better predictor than the model. One way of looking at the data is through metrics such as EF. Another way of looking at the data is through visual comparison. It is possible that the model simulates the general pattern well, but is shifted away from observed data. For example the model may be systematically a bit too high because simulated potential yields are not so easily practically achieved, but may properly simulate the direction in which yields change with different sowing dates. Or yields may dramatically drop within a short timespan, making the model very sensitive to relatively small errors in phenology simulations and observations. In such cases EF can be less than 1 but visually we might still judge the model as having a good fit. We therefore also present graphics of simulated and observed yields at different sowing dates. Fig. 2 shows the temperatures and humidity in the two sites through the year. Weather is more extreme in Fanaye. Maximum temperatures in Fanaye are consistently higher but especially so during days 75 to 200. From day 320 to 365, the minimum temperatures in Fanaye are lower than in Ndiaye. Relative humidity is lower in Fanaye. Such patterns are consistent with the more inland position of the Fanaye site. More details on weather data and site can be found in de Vries et al. [59,65] and in Dingkuhn et al. [17,18,60] for earlier years in the same two sites.

With observed phenology and spikelet number
Yields were very poorly simulated when using the default heat and cold subroutines, resulting in a modelling efficiency of −0.32 (Table 1, s1), which means that predictions were worse than would be obtained by simply taking the average of observed yields. Modelling efficiency increased to 0.70 (Table 1, s5) with new heat and cold subroutines. Yields versus emergence dates with the old and new sterility models are shown in Fig. 3. The main cause of poor prediction with the default model appears to be the prediction of near zero yields on several emergence dates in Fanaye when observed yields were 4 to 11 t/ha. As a result, EF values for Fanaye are markedly lower than for Ndiaye (Table 1, s1). Fig. 4 shows in the top panes simulated old fertility and in the bottom panes new fertility. From these we can see that the large errors in Fanaye are due to gross overestimation of heat sterility. Fig. 4 also reveals large differences in simulated cold sterility. According to the default model, cold fertility is always 1, while the new cold fertility model predicts severe cold sterility for sowings in the months September (9) and October (10). Replacing the default with the new cold sterility submodel increased overall accuracy from EF 0.43 to 0.70 (s3 to s5) and increased accuracy in both sites (Fanaye: EF 0.25 !0.62, Ndiaye EF 0.55 !0.76).
The minimum rule for SFCOLD (Eq 28, s5, EF = 0.70) gave more accurate simulations than the averaging rule (Eq 27, s4, EF = 0.56). The minimum rule for SPFERT gave slightly more accurate results than the product rule (s5 vs s6). We therefore proceeded with the minimum rule  (Table 1). Additional simulations with different settings for leaf growth (not shown) had little to no effect on accuracy of yield simulations. This is somewhat surprising as we would expect assimilate supply during grain filling to be strongly affected by maximum LAI at flowering and so we would expect stronger effects on accuracy of yield simulation. The only explanation for the absence of sensitivity to pre-flowering leaf growth is that in almost all simulations the crop was sink limited. Indeed this is what we found when we compared simulated WRR and PWRR values. The comparison with ORYZA_S (s7) shows that the new model (s5) is slightly more accurate (EF 0.70 vs 0.63), in both sites. The small difference in accuracy should come as no surprise considering the shared origins of the both models and the fact that their heat and cold sterility sub-models were, at least partially, calibrated for the Sahelian environment. Although the accuracies are similar, we may hope the new ORYZA2000 subversion is more accurate than ORYZA_S when applied in more humid environments. Because the new version explicitly accounts for transpirational cooling, which will be less in humid environments, whereas ORYZA_S does not account for transpirational cooling.  With simulated phenology and spikelet number Forcing the model with observed phenology and spikelet number allowed us to inspect heat and cold sterility without being troubled by errors in biomass simulation. Now we put the model to a more severe test, by also simulating phenology (Fig. 5) and spikelet number. For six  sowing dates the observed yields were much lower than expected, shown as the encircled data points in Fig. 6. We suspect that low observed yields on these dates were due to birds or other animals. Bird damage at similar dates on the same two sites was reported before by Dingkuhn and Sow [17,18] and is a known problem in the region [66]. Tables 2 and 3 show that EF values were much higher with these six outliers removed. Also striking was the fact that observed yields in Ndiaye were systematically lower than in Fanaye, while our simulations showed highest yields in Ndiaye. Previous simulations by [17,18] and observations in the same two sites [67] showed yields for Ndiaye consistently higher than yields in Fanaye. This suggests management or soil conditions were less optimal in our Ndiaye experiment, despite all efforts to avoid this. In any case, the cause of the relatively low observed yields in Ndiaye is unclear and so the issue cannot be resolved through modelling. Considering the differences between the two sites we chose to report accuracies also separately for the two sites. Given the strong evidence of improvements on the heat and cold sterility we proceeded with the new heat and cold subroutines. First, we assessed how well the different model subversions simulated phenology. Fig. 5 shows observed and simulated number of days from emergence to flowering, simulated with ORYZA2000 default (top) and optimised (middle) cardinal temperatures and with the RIDEV model (bottom). Fig. 2 shows that the Fanaye environment is systematically hotter than Ndiaye and that crops emerging in the period from September to December will be exposed to the cool weather in the months November to March. The default cardinal temperatures overpredicted the duration from emergence to flowering in the hotter environment of Fanaye (Fig. 5 top right) and underpredicted the duration in the cooler Ndiaye (Fig. 5 top left), especially in the coolest period (emergence months 9 to 12). In an earlier paper with the same phenology and weather data van Oort et al. [22] plotted observed and simulated duration (y-axis) against average temperature during the pre-flowering phase (x-axis). These plots showed that with average temperature during the pre-flowering phase around 24°C, the default cardinal temperatures systematically underpredicted the duration by *20 days [22: Fig. 3A]. With average temperature during the pre-flowering phase around 31°C, the default cardinal temperatures predicted systematically a *20 days too long duration [22: Fig. 3A]. The RIDEV model (Fig. 4C) reproduces as similar pattern as the ORYZA2000 phenology sub model with optimised cardinal temperatures. Note that the parameters for RIDEV were determined in the early 1990s for the same variety and sites. The validation presented here shows that this model for this variety RIDEV still provides reasonable estimates of duration from emergence to flowering. We proceeded with the most accurate out of three phenology models, in this case the ORYZA2000 phenology sub model with optimised cardinal temperatures. Yield prediction was greatly improved when phenology was simulated with optimised instead of default cardinal temperatures ( Table 2, s8 vs s9, EF 0.14 vs 0.41).
Next we compared, stepwise, the impact of leaf growth simulation methods on accuracy of yield simulation ( Table 2, s9-s16). New (higher) cardinal temperatures for early leaf growth improved yield prediction, with overall EF increased from 0.41 to 0.48 and an increase in EF was noted in both sites. Temperature dependent partitioning and shading to kill leaves had no effect on accuracy of yield prediction. Adding these two components did lower maximum LAI to more realistic values (s10 vs s13,s14,s16). The maximum attainable modelling efficiency with revised leaf growth was 0.48 (s10, s14 and s16).
The ORYZA_S model is in terms of process functions most similar to the s9 subversion of ORYZA2000. Yet predictions with this model ( Table 2, s17) were less accurate than with the newly developed subversions (s9 to s16). Since ORYZA_S differs in a number of ways from the other models it was not possible to exactly pinpoint the causes of the lower accuracy. However a visual comparison of yields over time (not shown) indicated that the main cause of the poorer accuracy was due to less accurate modelling of cold sterility.
Next we compared different methods of spikelet growth simulation (Table 3, s18-s26). Spikelet growth depends on biomass accumulation and hence on leaf area during the period from panicle initiation to flowering. Therefore we also assessed interaction with leaf growth modelling. Consistently the Yoshida et al. model led to poorer predictions than the default (GCR based) model. The Kato and Katsura model gave similar or slightly more accurate  Fig. 6 shows simulated and observed yields with the most accurately simulating model subversion.

Main findings
Our research integrates insights from recent experimental work into an improved subversion of the ORYZA2000 model. The main findings are: 1. The default model overestimates heat sterility and underestimates cold sterility in arid environments.
2. Yield prediction was greatly improved with the new heat and cold subroutines.
3. The key mechanisms of rice to avoid heat sterility are transpirational cooling and early morning flowering. These adaptive mechanisms allow rice to be grown in arid environments with extreme heat (provided that there are enough water and nutrients).
4. The default model overpredicted the length of growing seasons at high temperatures and underpredicted the length of growing seasons at lower temperatures, for the variety considered here (IR64).
5. Use of different cardinal temperatures, in particular a higher base temperature and higher maximum temperature, improved phenology simulation and this in turn lead to improved yield simulation.
6. Although LAI development could not be compared with observed values, the yield simulations were improved when for early leaf growth higher cardinal temperatures than default were used 7. New methods for simulation of spikelet formation led to little or no increase in accuracy of yield simulation.
8. The adapted subversion of ORYZA2000 was more accurate than a previously developed model for rice in the same environments (ORYZA_S).
The model was validated with a relatively large dataset in an arid environment and a wide range of temperatures. Additional validations in other environments are still needed. We expect that with the incorporation of environment dependent transpirational cooling and early morning flowering and with cold sterility simulated with minimum rather than average temperature, the adapted subversion of ORYZA2000 presented in this paper will be more robust than its predecessors which did not account for these processes.

Interpretation of the results
A model is always a simplification of reality. Perfect prediction is suspect, may be caused by over parameterisation on a limited dataset and runs a risk of adjusting parameter values without sound ecophysiological justification. We have tried to avoid this by using a large dataset, by making only modifications substantiated by solid experimental research and by keeping calibration to a minimum. Only phenological parameters were calibrated. In our experiments we were confronted with missing data and practical difficulties in realising potential production. It is interesting to see that despite these uncertainties and with a minimum of calibration, the adapted model subversions predicted yields reasonably well. The systematic stepwise incorporation of new model components allowed for assessing the effect of different individual and combined changes in the model on accuracy of yield simulation. Some modifications did contribute to improved yield simulation (heat and cold sterility; cardinal temperatures for phenology; cardinal temperatures for early leaf growth). Other modifications did not contribute to improved yield simulation (temperature dependent assimilate partitioning to leaves; shading effects on leaf death; spikelet formation).
This study is not a new validation of individual processes (phenology, early leaf growth, assimilate partitioning, leaf senescence, spikelet formation, heat sterility, cold sterility). For phenology, calibration by van Oort et al. [22] was on the same dataset as used in this paper. For the other processes: we did not measure leaf area during growth, biomass growth and nitrogen from PI to flowering and pollination, so we could not directly test the model components evaluated here. What we did was asking the question "suppose that this is a better description of reality for a processes at level n-1, then does this help us better predict yield? (an emerging property at level n)". If so, it adds credibility, but not proof, to the new process description. The results call for further research into the model components addressed in this paper. For example, we propose dedicated experiments on LAI growth at different sowing dates and in different environments to estimate the early growth parameters (TBD, TOD, TMD, RGRL) instead of simulating them as we did in this paper by assuming the same cardinal temperatures as obtained for phenology and using the default value for RGRL. The fact that modifications such as shading to kill leaves and new methods of spikelet formation did not lead to increased accuracy in yield simulation does not invalidate these processes. Possibly if fed with observed crop nitrogen contents in the current study, or in other environments for example with less nitrogen, the Yoshida et al. [27] and Kato and Katsura [28] models would lead to better predictions than the default model. Such hypotheses can be tested more easily now that these sub-models have been incorporated in a larger model. In summary, the modelling work presented here has generated new hypotheses that can be subsequently tested experimentally. This paper has shown the relevance of testing these hypotheses for increased accuracy of yield prediction.
As we noted in the introduction, the default ORYZA2000 model has been calibrated and validated in many environments before, with high accuracies. How is this possible considering the poor performance reported here? There are a number of underlying reasons why issues raised in this paper were not previously detected. Firstly, the arid environment considered here is quite uncommon in Asian countries where ORYZA2000 has mostly been used. As we discussed (and shown before in [17,18]) the arid climate creates some typical conditions affecting heat sterility (strong transpirational cooling) and cold sterility (dangerously low night temperature, acceptable average temperature, due to large diurnal temperature amplitude). The model has rarely before been tested in environments as extreme as considered in this paper and it is only in these extreme environments that errors in the phenology and sterility models become apparent. Secondly the sowing dates considered here were far off from the optimum, exposing the crop to severe cold sterility. Normally in agronomic experiments one would never sow in some of the sowing dates considered in this paper. Then normally also the failures of the model under such cold conditions would not be discovered. They would only be detected in experiments dedicated to impose cold stress. Thirdly, the conditions under which extremely high LAI were simulated are exceptional: low temperatures during the vegetative stage leading to a long duration from emergence to flowering (up to 130-140 days) combined with high radiation levels and high daytime temperatures during this period allowing for high leaf biomass accumulation. Fourthly, acknowledging that any model is a simplification of reality Bouman et al. [8: p. 196] identified those parts of the model which they thought to be less generic, requiring additional calibration specifically for the variety and environment at hand. This list amongst others includes the development rates and the relative leaf growth rate. Possibly the issue of the validity of the cardinal temperatures was never detected because people were resolving the issue through calibration of the rates. Especially when calibrating separately for different environments and varieties, or when calibrating for a limited number of experiments, errors in cardinal temperatures may remain undetected. The interesting question is whether more generic, variety specific development rates and relative leaf growth rates can be found once we recalculate them with more appropriate cardinal temperatures. It seems unlikely that there would exist one unique set of cardinal temperatures valid for all rice varieties; genetic variation is simply too large for this [22,43,44]. What we hypothesise is that once a set of cardinal temperatures has been appropriately estimated for a particular variety, recalibration of development rates and relative leaf growth will not be necessary every time again when the model is applied in new environments. So that parameters become truly genetic parameters, not needing to be recalibrated when applied in different environments. Large datasets for the same varieties in contrasting environments are needed to test the hypothesis that more generically valid crop parameters exist.

Implications
In the previous section we highlighted how modelling exercises such as presented in this paper can be used to increase insights in a particular system and to narrow knowledge gaps. A second more common role for modelling is in decision support. Models can be used in this role if they are sufficiently accurate. In ideotyping, genetic parameters are changed and it is assessed whether these lead to desirable changes such as crop yield [68], in some cases also tailored to specific environments [69]. Models can also be of use for investigating different management options, such as irrigation scheduling and amount, fertilisation scheduling and amount, effects of bund height, etc [5,11,12,13,14,70]. Models can be used to simulate if it is more economical to growth one or two crops per year and for deriving the optimum sowing dates for different sites [17,18]. At the farm level, models can be used to investigate economically optimal water allocation strategies, dividing available water among different crops [21]. Models can be used for climate change impact studies and can in this role help shape the research agenda into adaptation strategies [1,2]. All such model applications are contingent on the accuracy of the model. The most thorough and wide spanning climate change impact assessment in rice was by Matthews et al. [1,2] for Asia. In different parts of Asia negative effects of climate change were predicted with the ORYZA2000 and SIMRIW models which used the same heat sterility subroutines. Analyses of yield declines identified heat sterility as the main cause of yield declines in Southern Japan, India, South Korea and China. Other studies [24,26,71] report temperature thresholds similar to those used in the default ORYZA2000 model. Such temperature thresholds have recently been used in global climate change assessments [3,4]. The recent studies on rice adaptive mechanisms to heat stress [19,20,29,30,32] and the findings of this study call for a re-assessment of climate risks for rice production, in the current and future climate. It is known that genotypic differences exist in cardinal temperatures, heat tolerance and heat avoidance. Selection for appropriate varieties adapted to their climate is therefore possible.

Conclusions
ORYZA2000 could not well simulate yields in two arid environments. New research on several model components was systematically and stepwise incorporated into the existing model and effects on increased accuracy were investigated. The most important finding is that in arid environments the model overestimates heat sterility and underestimates cold sterility. The cause for overestimation of heat sterility was that the default model ignored early morning flowering and transpirational cooling. The cause for underestimating cold sterility was that the default model calculated with daily average rather than daily minimum temperatures. Other model improvements were on cardinal temperatures for phenology [22] and cardinal temperatures for early leaf growth. Alternative models for simulation of spikelet formation did not lead to improved yield prediction.
Supporting Information S1 Table. List of variables and parameters. (DOCX)