The relationship between prenatal heat exposure and birth outcomes: How much does the heat metric matter?

Mary-Alice Doyle; Bernard Leckning

doi:10.1371/journal.pone.0330498

Abstract

The impact of prenatal heat exposure on birth outcomes is well-established, but what is it about heat that drives this relationship? Is it exposure to extreme temperatures, to moderate heat, or the confluence of heat and humidity? Despite the large body of research on heat exposure and birth outcomes, the literature lacks consistent measurement. This means we cannot extract practical recommendations around which heat conditions pose the greatest risk, and hence should be avoided during pregnancy. It also means we cannot predict the implications of climate change on neonatal health and healthcare needs at a population level. This paper has two goals: first, to demonstrate that our conclusions around the existence and magnitude of the impact of heat exposure vary dramatically with the choice of heat exposure metric, and second, to make general recommendations for how heat exposure should be measured in future. We present analysis from Australia’s Northern Territory, a region spanning tropical and arid climates. We compare commonly used heat exposure metrics, alongside additional metrics supported by theory. We find that a metric based on ‘bands’ of exposure and incorporating daily minimum as well as maximum measures provides the best fit; this is consistent with our theoretical understanding that both moderate and extreme heat affect fetal development in different ways. Estimates based on our preferred heat metric suggest that the impact of prenatal heat exposure on preterm birth is orders of magnitude larger than what would be implied by some metrics commonly used in the literature. Our findings underscore the importance of getting the measure of heat right, particularly in tropical climates.

Citation: Doyle M-A, Leckning B (2025) The relationship between prenatal heat exposure and birth outcomes: How much does the heat metric matter? PLoS One 20(9): e0330498. https://doi.org/10.1371/journal.pone.0330498

Editor: Xinde James Ji, UF: University of Florida, UNITED STATES OF AMERICA

Received: April 11, 2025; Accepted: August 1, 2025; Published: September 3, 2025

Copyright: © 2025 Doyle, Leckning. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The study used de-identified linked health data which is hosted on a secure cloud-based platform with restricted access. Access is restricted to Human Research Ethics Committee approved investigators and projects as per conditions set by data custodians to protect sensitive information. Applications for data access should be directed to the CYDRP Research Program Leader, Bernard Leckning: bernard.leckning@menzies.edu.au. Queries regarding ethics approval should be directed to ethics@menzies.edu.au.

Funding: This research was funded by a London School of Economics PhD Studentship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

There is a wealth of evidence that ambient heat exposure adversely affects human health [1,2]. The impact of prenatal heat exposure on a baby’s health at birth is particularly well-studied. Chersich et al.’s [3] systematic review finds that heat exposure is associated with a higher risk of preterm birth, lower average birthweight, and a higher risk of stillbirth. These effects may have long-term consequences: prenatal heat exposure has been linked to lower levels of physical health, mental health, education and earnings in adulthood [4–7].

But what exactly is the nature of the relationship between heat and newborns’ health? Is fetal development damaged by short-lived exposure to extreme temperatures? Or should we be more concerned about prolonged exposure to moderately high temperatures? Or is it instead the interaction of heat with humidity that poses the greatest risk?

We are unable to answer these questions because, to date, research on this topic has lacked consistency in the way that heat exposure is defined and measured. But answers to these questions are important: in order to anticipate the likely impact of climate change on healthcare needs, and mitigate the impacts of heat exposure on population health, it is important to understand what level and duration of heat exposure leads to poorer health at birth.

In this paper, we analyse how much the way we measure heat exposure matters for our conclusions. To do this, we first set out options for measuring heat exposure, including the three measures of heat exposure that are most common in the literature, alongside two additional measures motivated by our conceptual framework: we call these our five ‘heat metrics’. We estimate regressions with each of the five metrics, and combinations of multiple metrics together. We identify a preferred metric in our context, based on standard measures of goodness-of-fit. We then demonstrate how reliance on non-preferred metrics would affect our conclusions about the impact of prenatal heat exposure on birth outcomes.

In our analysis, we use data from the Northern Territory of Australia – a region spanning tropical and arid climates. The fact that we use data from these climate zones is important, because around half of the world’s population lives in climates like these [8,9], yet most empirical research on this topic uses data from cooler, less humid climates (e.g., in Chersich et al.’s [3] systematic review, around one-third of studies relate to subtropical and semi-arid climates, but with just one from an arid climate and none from equatorial/tropical climate zones). Using data from the Northern Territory means we can analyse exposure to very high temperatures (e.g., above 35 degrees Celsius), and exposure to hot, humid conditions – which are either not experienced at all in other climates, or are experienced so rarely that their effects cannot be reliably estimated.

There is good reason to expect that the choice of heat metric is particularly important in tropical climates. In temperate climate zones, which cover much of the USA, Europe, and East Asia, correlations between various measures of heat exposure are high, meaning that one measure of heat exposure (e.g., maximum daily temperature) is a good proxy for others (e.g., minimum daily temperature, or wet bulb temperature). Fig 1 demonstrates this, showing the correlations between daily minimum and daily maximum temperatures in a handful of major cities – the correlations are high in cities with temperate climates.

Fig 1. Scatter plot of daily maximum and minimum temperature, 2020–2023.

Source: NASA reanalysis data.

https://doi.org/10.1371/journal.pone.0330498.g001

However, in tropical and subtropical climates, this is not the case. For instance, in Lagos, Mexico City, Mumbai, Jakarta, and Darwin (the capital city of the Northern Territory), Fig 1 shows that there are relatively low correlations between maximum and minimum daily temperature. The same is true for the correlation between air temperature and wet bulb temperature (see S1 Fig). Correlations are substantially higher in the Northern Territory’s arid climate zones, but still lower than in the temperate zones shown in Fig 1 (see S2 Fig). In tropical climates, one measure is not a good proxy for all others, and therefore we must carefully consider which aspect of heat exposure we want to measure.

Fig 2. Summary of mechanisms through which heat stress and heat strain may affect birth outcomes.

Source: Authors’ analysis.

https://doi.org/10.1371/journal.pone.0330498.g002

This paper makes two main contributions to the knowledge base on the link between prenatal heat exposure and birth outcomes.

First, we quantify how much the way we measure heat exposure matters. Given the large body of research that finds an impact of heat exposure on birth outcomes, we are confident that this causal relationship exists. But every recent review on this topic has highlighted a lack of consistency in choice of heat metric [3,10–14]. The problem with this inconsistency is that when different metrics are applied in different populations, we do not know whether it is the population, the metric, or something else entirely, that explains differences in findings. Our analysis compares alternative heat metrics within a single population. We find that both the existence and size of the estimated relationship between heat exposure and birth outcomes can vary depending on the metric chosen. For example, we find that a metric based on average maximum daily temperatures alone – as is common in the literature – captures less than half of the impact that our preferred metric estimates.

Second, we analyse which measurement choices matter most. Recently, some researchers have questioned the use of air temperature as a default in this literature [15–17]. Their reasoning is that metrics which measure both air temperature and humidity together (e.g., wet bulb temperatures and heat indices) will more accurately reflect people’s experiences of heat than air temperature alone. However, we find that wet bulb temperatures have limited explanatory power in the population we study, and instead, other metrics – e.g., including both maximum and minimum daily temperatures – provide better explanatory power. Unlike wet bulb temperatures and heat indices, these measures are readily available, meaning there is little barrier to their use. Our discussion reflects on these and other findings, setting out some recommendations for researchers analysing the impact of prenatal heat exposure.

This paper also provides new evidence on how heat exposure affects birth outcomes in the Northern Territory of Australia. We estimate that typical seasonal variation in heat exposure contributes to a 4.5 percentage point higher risk of preterm birth at some times of year. This is large – for example, close in magnitude to the risks of frequent smoking during pregnancy [18].

In our analysis we focus on preterm birth, as this is the birth outcome most commonly studied in previous research. Preterm birth is associated with poorer health and developmental outcomes later in childhood [19] and leads directly to an increase in healthcare costs by triggering admission to neonatal intensive care units [20]. However, as we will discuss in our conceptual framework, heat exposure may affect fetal development in a range of ways, most of which do not necessarily lead to preterm birth. It is therefore important to note that we find impacts of heat exposure that are consistent across five different measures of health at birth. This confirms that heat exposure affects fetal development, and hence health at birth, in general; the effect is not isolated to a single outcome.

The rest of this paper proceeds as follows. Section 2 sets out a conceptual framework, outlining the possible causal pathways that may explain the impacts of heat exposure on prenatal development, and how they may be measured. Section 3 provides details on the Northern Territory, the administrative data we use, the heat exposure metrics we consider, and our estimation methods. Section 4 presents our data analysis, in which we identify our preferred heat metric and discuss conclusions we may draw if we instead used alternative heat metrics. Section 5 discusses practical implications of using non-preferred heat metrics, and recommendations for future analysis, before turning to our conclusion in Section 6.

2. Conceptual framework

2.1. What is heat exposure?

Heat exposure is often discussed interchangeably with air temperature – that is, the temperature that can be measured with a standard thermometer. However, while air temperature is a major contributor to the level of heat that a person experiences, there are additional contributors. As McGregor and Vanos [21] explain in their primer on the physiological impacts of heat on the human body, these include other weather conditions (humidity, windspeed, radiation), individual factors (levels of exertion, pre-existing medical conditions, and medications), and the built environment.

Many researchers argue that humidity is particularly important [15,17,22]. This is because one way that the body cools itself is through having sweat evaporated from the skin, and sweat evaporates more slowly when it is humid. Therefore, for a given temperature, higher humidity increases the body’s risk of overheating.

In deciding how to measure heat exposure, we must understand which aspects – such as air temperatures, humidity or other contributors – matter for prenatal development. The difficulty here is that there are multiple mechanisms through which heat exposure during pregnancy can affect the mother and developing fetus [23], and hence lead to poorer birth outcomes. We do not yet know from physiological science which of these mechanisms is most important [24]. In the remainder of this section, we explain these mechanisms (summarised in Fig 2), before returning to the question of which measures may best capture these effects.

2.2. How does heat exposure affect the fetus?

We set out three channels through which heat exposure may affect prenatal development: heat stress, heat strain and maternal behaviour.

It is important to note that while these mechanisms can increase the risk of adverse birth outcomes like preterm birth – which is the focus of this study – they will not necessarily do so. They may affect fetal development in ways that impact a child’s health and development in infancy and/or later in life, but which do not necessarily lead to preterm birth.

Heat stress.

Research on the impact of heat exposure on health tends to focus on heat stress; that is, the acute health conditions that can result from an increase in the core body temperature [21]. These effects include heat stroke, cardiovascular stress, respiratory stress, and acute kidney failure [2,25]. Heat stress typically occurs when air temperatures are around or above the average body temperature (37 degrees C). However, it can occur at lower air temperatures [2], for instance due to high humidity or high levels of exertion.

We know from both animal studies and epidemiological evidence that maternal heat stress during pregnancy can cause birth defects [24,26]. This may happen either through the fetus itself overheating, or indirectly through the heat stress-induced condition that the mother is facing. Some suggestive evidence also indicates that heat stress early in pregnancy may affect the development of the placenta, increasing the likelihood of pre-eclampsia [27], and that it may affect neural tube formation, leading to neural tube defects [28,29].

Heat strain.

Even when conditions are not hot enough to cause heat stress, heat may still impact fetal development because it causes ‘heat strain’ for the mother [21]. Heat strain involves the normal workings of the thermoregulatory system to cool the body, and hence avoid heat stress: that is, by sweating and by redirecting blood flow from the core of the body to the skin to dissipate heat [30].

If heat strain is experienced for a prolonged period, it may affect the fetus. This can happen because of sweat-induced dehydration. It can also happen because when the mother’s blood flow is redirected to her skin, this means reduced blood flow to the placenta, and hence reduced flow of nutrients to the fetus [24]. Both channels have been evidenced in experimental animal studies – where researchers have been able to control heat exposure and fluid intake in a way that is not possible in epidemiological research. These studies show long-term ‘fetal programming’ effects on offspring health [31,32].

These potential fetal programming effects explain how the mother’s natural responses to heat strain may reduce nutrient flow to the fetus – if this continues for a long period, it can affect the fetus’ development. This could happen at any time during pregnancy. In addition, epidemiological evidence suggests that exposure to moderate heat, if experienced towards the end of pregnancy, can bring forward labour, leading to an increased rate of preterm birth [33].

Maternal behaviour.

Beyond physiological effects, prolonged heat exposure can also affect maternal behaviour, with flow-on effects to the fetus. Heat exposure reduces metabolism and appetite [30], which may mean changed food and drink consumption patterns [34]. Heat may also reduce quality and length of sleep [35] and lead to changes in exercise patterns [24].

2.3. Implications for heat exposure measurement

Given the mechanisms outlined above, we draw three takeaways for how we might construct heat exposure metrics.

First, it is not obvious which underlying measure of heat we should use. Within the literature, heat exposure is almost universally measured based on air temperature. But as described above, a person’s experience of heat is a result of a range of factors, including temperature, humidity, wind and radiation. Therefore, in theory, a data series incorporating these variables may provide a more accurate picture of true heat exposure. One option is wet bulb temperature, which can be observed directly, and reflects air temperature and humidity – other options include Apparent Temperature, Heat Index, Universal Thermal Climate Index, and Wet Bulb Globe Temperature [15]. However, the reason that this is not an obvious choice is that, empirically, we do not see a consistent link between high humidity and poorer health outcomes. As Baldwin et al. [16] suggest, there may be a range of explanations for this, including the fact that a build-up of humidity when it is not raining has different effects than high humidity while raining.

Second, the effects of extreme heat – that is, heat at levels around or above normal core body temperatures – may be different from the effects of moderate heat, because extreme heat is more likely to cause heat stress. This means estimates of the impact of heat exposure based on average temperatures over a given period are likely mis-specified, as an average will fail to distinguish between conditions likely to lead to heat strain vs heat stress.

Third, even moderate levels of heat, when experienced for a prolonged period, may affect fetal development by causing heat strain or changes in maternal behaviour. If this is the case, then it may be important to choose metrics that reflect prolonged heat exposure – such as high daily minimums, reflecting at least 24 hours of high temperatures – in addition to the metrics used in the overwhelming majority of studies, which measure peak levels of heat exposure (i.e., high daily maximums). This may be particularly important in tropical climates where daily maximums and minimums are not strongly correlated with one another (as shown in Fig 1).

In our analysis, we estimate the impact of heat exposure on preterm birth, using a range of different heat metrics, to test which fits the data best. In line with this conceptual framework, we test metrics that include both air temperature and wet bulb temperature, metrics based on daily maximums and minimums, and metrics allowing for non-linear effects of heat exposure. We compare these metrics with those most commonly used in prior research (described below). However, our conceptual framework also points out a potential moderator between heat exposure and birth outcomes that we are unable to measure in our data – maternal behaviour. While maternal behavioural responses to heat exposure do not feature in our data analysis, behaviour is likely a key factor, and may help to explain differences in the magnitudes and timing of effects (in terms of whether exposure has different effects depending on the timing of exposure during pregnancy) across different populations and contexts.

3. Materials and methods

3.1. Study context

The Northern Territory (NT) is one of Australia’s eight states and territories. The NT is a large region covering the central part of northern Australia, in which residents face high levels of heat exposure. The NT has a population of around 233,000, 60 percent of whom live in or around the capital city of Darwin [36]. Previous research shows that heat exposure plays a major role in explaining month-to-month variation in average birth outcomes in the NT [37].

The tropical north of the NT, where Darwin is located, is hot and humid. Temperatures vary within season but, on an average day in the wet season (November to April), temperatures range between 25 and 33 degrees Celsius, and in the dry season (May to October), between 20 and 30 degrees. The hottest time of year is October-December, in the ‘build up’ to the wet season – when temperatures and humidity are high, and there is little rainfall. Heavy rainfall usually begins in late December, though the timing can vary from year to year.

The central and southern parts of the NT have an arid climate, with very hot summers and mild or cold winters. In Alice Springs, the largest town in the region, temperatures on an average summer day range between 20 and 35 degrees, and on an average winter day, between 4 and 20 degrees.

Around one-third of the NT population identify as Aboriginal. There are significant differences between the Aboriginal and non-Aboriginal populations in the NT, in terms of geography, heat exposure and economic resources. Eighty percent of Aboriginal residents in the NT live outside of Darwin, many in remote Aboriginal communities which experience more extreme weather conditions. In remote communities, many houses are poorly insulated, and many residents face energy poverty [38].

3.2. Data

We combine administrative birth records with daily weather data.

Our analysis sample includes 95 percent of all babies who were born in the NT and conceived between March 2000 and September 2009, a sample of 34,258 children. There were 35,899 babies born during this period. Our sample excludes 1,020 births for which the mother’s place of residence could not be geocoded and therefore could not be matched to weather data, and 621 births for which some covariates were missing. We define our sample based on date of conception instead of date of birth because definitions based on date of birth will systematically exclude children born preterm at the beginning of the sample period, and exclude those born late term at the end of the sample period – this could lead to bias, especially when analysing the impact of seasonal exposures like heat [39]. We determine date of conception by subtracting gestational age (in weeks) from birthdate.
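The conception-date calculation, and the reason a birth-date window can bias the sample, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example dates and the hypothetical birth-date cutoff are all assumptions introduced here.

```python
from datetime import date, timedelta


def conception_date(birth_date, gestational_weeks):
    """Birthdate minus gestational age in weeks, as described in the text."""
    return birth_date - timedelta(weeks=gestational_weeks)


def birth_date_from(conceived, gestational_weeks):
    """Inverse mapping, used here only to build the illustration."""
    return conceived + timedelta(weeks=gestational_weeks)


# Hypothetical start of a birth-date-based window (illustration only).
BIRTH_WINDOW_START = date(2001, 1, 1)

# Two hypothetical pregnancies conceived on the same day.
conceived = date(2000, 5, 1)
preterm_birth = birth_date_from(conceived, 32)  # born at 32 weeks
term_birth = birth_date_from(conceived, 40)     # born at 40 weeks

# A birth-date rule keeps the term baby but drops the preterm one,
# even though both were conceived on the same day.
preterm_kept = preterm_birth >= BIRTH_WINDOW_START
term_kept = term_birth >= BIRTH_WINDOW_START
```

Under a conception-date rule, both hypothetical pregnancies would fall on the same side of any cutoff, which is the property the sample definition relies on.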

We include only births to mothers whose usual place of residence is in the NT and could be geocoded. There are a small number of births for which the mother’s place of residence as entered in the perinatal data could not be found, either using a fuzzy match with the R package ‘geonames’, or through manual search on Google, the NT Place Names Register (https://www.ntlis.nt.gov.au/placenames/) and BushTel (https://bushtel.nt.gov.au/). We include stillbirths (making up under 1% of births) and plural births (under 2.5% of births) in the analysis sample.

The birth record data were extracted from administrative records and prepared for analysis between 3 July and 31 August 2017. Data analysis for research purposes for this paper was conducted between 26 June 2023 and 19 December 2024 using Stata 17. Analysis data are de-identified, and the authors do not have access to information that could identify individual participants during or after the data were collected. Approval for the research has been provided by the Northern Territory Department of Health and Menzies School of Health Research Human Research Ethics Committee (Ref: 18–3261). The data linkage project’s First Nations Advisory Group have independently reviewed and endorsed the research to ensure it is respectful of Aboriginal perspectives.

We link these records to NASA’s daily weather reanalysis data, based on the mother’s place of usual residence at the time of the birth. We do not have data on place of residence throughout pregnancy, therefore we assume that the place of residence at birth corresponds with the place the mother spent most of her time during pregnancy. The weather data are available at intervals of 0.5 x 0.625 degrees of latitude and longitude (roughly 50x55km). This gives a total of just over 500 cells throughout the NT, 166 of which have births within our analysis period – an average of 203 births per cell. We link data based on date of conception, such that the first trimester is the first 12 weeks from the conception date, the second trimester is the following 14 weeks, and the third trimester is the following 13 weeks. Definitions for timing and length of trimesters vary across contexts, but these are the definitions commonly used in Australia. We analyse 39 weeks of gestation, as this is the average gestational length in our population.
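The trimester windows described above can be sketched as below. The week lengths (12, 14 and 13) come from the text; the function name and the choice to represent each window as a half-open interval of dates are assumptions made for illustration.

```python
from datetime import date, timedelta

# Trimester lengths in weeks, as stated in the text (12 + 14 + 13 = 39).
TRIMESTER_WEEKS = (12, 14, 13)


def trimester_windows(conceived):
    """Return three (start, end) date pairs; end is exclusive (an assumption)."""
    windows = []
    start = conceived
    for weeks in TRIMESTER_WEEKS:
        end = start + timedelta(weeks=weeks)
        windows.append((start, end))
        start = end
    return windows


# Hypothetical conception date, for illustration only.
windows = trimester_windows(date(2005, 1, 1))
lengths_in_days = [(end - start).days for start, end in windows]
```

Because the windows are fixed at 39 weeks from conception, the exposure period in this sketch does not depend on the actual date of birth.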

The weather reanalysis data come from NASA’s model using ground station and satellite data. Some may be concerned about accuracy of these data compared with traditional weather station observations. However, in our context, the NASA data line up very closely with observations from the Australian Bureau of Meteorology’s weather stations (see S1 Appendix for details). We use the reanalysis data because they are highly localised and have no missing observations – which is not the case for weather stations across remote parts of the NT. For instance, over this period some weather stations are decommissioned, and others have systematic missing observations (i.e., in the wet season when they become inaccessible due to flooding and therefore observations are systematically missing during periods of extreme weather).

3.3. Heat exposure metrics

We construct five different heat exposure metrics. The first three are based on common metrics used in prior research, while the next two build on the conceptual framework outlined above. Our metrics are:

Benchmark.

This is a count of the number of days in each trimester with maximum daily temperatures within 5-degree ranges (under 20, 20–24.99, 25–29.99, 30–34.99, 35–39.99 and 40+). In our analysis, the omitted category is 25–29.99 degrees. This metric allows for a flexible functional form, and it is the approach that Dell et al. [40] recommend in cases when researchers are agnostic about how the heat metric should be specified. It has been widely used [4,5,33,41–43].

We take this as our ‘benchmark’ metric, because, out of the metrics that are most commonly used in the literature, this best allows for the non-linearities that we expect to see, based on our conceptual framework.
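As a rough sketch of how the benchmark metric could be built for one trimester, the snippet below counts days in each 5-degree band of maximum temperature and then drops the omitted category. The band edges and omitted band follow the text; the function name, labels and sample temperatures are hypothetical. Since the band counts add up to the number of days in the trimester, one band has to serve as the reference category.

```python
from bisect import bisect_right

# Band edges for daily maximum temperature (degrees Celsius), from the text.
MAX_BAND_EDGES = [20, 25, 30, 35, 40]
MAX_BAND_LABELS = ["<20", "20-24.99", "25-29.99", "30-34.99", "35-39.99", "40+"]
OMITTED_BAND = "25-29.99"  # reference category stated in the text


def band_counts(daily_temps, edges, labels):
    """Count the days falling in each band; a value equal to an edge
    goes to the higher band (e.g. 20.0 counts as 20-24.99)."""
    counts = {label: 0 for label in labels}
    for temp in daily_temps:
        counts[labels[bisect_right(edges, temp)]] += 1
    return counts


# Hypothetical daily maximums for a few days, for illustration only.
sample_max = [18.0, 20.0, 24.99, 27.5, 31.0, 36.2, 40.0, 41.3]
counts = band_counts(sample_max, MAX_BAND_EDGES, MAX_BAND_LABELS)

# Regressors for this trimester: every band except the omitted one.
regressors = {band: n for band, n in counts.items() if band != OMITTED_BAND}
```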

Average temperatures.

This is a simple average of maximum daily temperatures in each trimester in pregnancy. This metric is also widely used [44–49]. It assumes linear effects of each additional degree, and hence does not allow for different effects for higher temperatures most likely to cause heat stress.

Heatwave count.

This is a count of the number of heatwaves in each trimester of pregnancy. We calculate this based on the Excess Heat Factor, which measures the extent to which daily air temperatures are unusually high for a given location and time of year [50]. It is calculated by combining a) average daily temperatures in the previous 3 days, compared with the past 30 days, and b) average daily temperatures in the previous 3 days compared with the long-term location average. A heatwave is defined as three or more consecutive days with a positive Excess Heat Factor. Full details of how this is constructed are available in Nairn and Fawcett [50]. We use this measure as it is the one used by the Australian Bureau of Meteorology.

Counts of extreme heat events (whether defined as heatwaves or not) are common in the literature [51–55]. Such a metric allows for nonlinearities in the effects of heat exposure, but imposes a specific structure on that nonlinearity; it assumes a threshold over which effects occur, and that there is no impact of exposure to more moderate heat conditions below that threshold.
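The counting rule for heatwaves (three or more consecutive days with a positive Excess Heat Factor) can be sketched as follows. The sketch takes a daily Excess Heat Factor series as given and does not reproduce the Nairn and Fawcett [50] calculation; the function name and the example series are hypothetical, and how a heatwave that straddles two trimesters is assigned is not specified in the text.

```python
def count_heatwaves(daily_ehf, min_run=3):
    """Count runs of at least `min_run` consecutive days with positive EHF.
    Each run is counted once, however long it lasts."""
    count = 0
    run = 0
    for ehf in daily_ehf:
        if ehf > 0:
            run += 1
            if run == min_run:  # the run has just qualified as a heatwave
                count += 1
        else:
            run = 0
    return count


# Hypothetical EHF values: a 3-day run, a 2-day run, then a 5-day run.
example_ehf = [0, 1, 2, 3, 0, 1, 1, 0, 5, 5, 5, 5, 5]
n_heatwaves = count_heatwaves(example_ehf)
```

In this example the two-day run does not count, which shows the threshold structure the metric imposes: heat below the cutoff contributes nothing.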

Max and min.

This is an enhanced version of the benchmark metric, where, in addition to maximum temperatures (under 20, 20–24.99, 25–29.99, 30–34.99, 35–39.99 and 40+), we include counts of the number of days in each trimester with minimum air temperatures within 5-degree ranges (under 5, 5–9.99, 10–14.99, 15–19.99, 20–24.99, 25+). In our analysis, the omitted category for minimum temperatures is 15–19.99. This metric reflects the implication from our Conceptual framework that daily minimum temperatures – which, if high, reflect prolonged exposure to heat – may indicate higher risk of heat strain continuing for long enough to affect the developing fetus.

Wet bulb.

This metric is analogous to the ‘Max and min’ metric described above, but using wet bulb temperatures as the underlying data series, instead of air temperature. We construct counts of the number of days within each trimester with daily average wet bulb temperatures within ranges of <10, 10–14.99, 15–19.99, 20–24.99 and 25+, and the number of days with daily maximum wet bulb temperatures within ranges of <10, 10–14.99, 15–19.99, 20–24.99, 25–29.99 and 30+. This metric reflects the possibility highlighted in our Conceptual framework that wet bulb temperatures may measure heat exposure more directly than air temperatures.

Maximum wet bulb temperature is not available directly within the NASA data, but we construct this measure using daily maximum combined with relative humidity, using Stull’s equation [56]. We use average wet bulb temperature instead of minimum because this measure is directly available. However, as daily average is a linear transformation between maximum and minimum, the relationship between maximum and minimum (for air temperature), and maximum and average (for wet bulb temperature) should be constant.
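For readers unfamiliar with it, the empirical wet bulb approximation commonly attributed to Stull can be sketched as below. The coefficients are those of the widely quoted formula and should be checked against [56]; the function name is hypothetical, and which humidity value is paired with the daily maximum temperature (for example, concurrent or daily mean) is not stated in the text and is left to the caller here.

```python
import math


def stull_wet_bulb(temp_c, rh_percent):
    """Approximate wet bulb temperature (degrees Celsius) from air
    temperature (degrees Celsius) and relative humidity (percent)."""
    return (
        temp_c * math.atan(0.151977 * math.sqrt(rh_percent + 8.313659))
        + math.atan(temp_c + rh_percent)
        - math.atan(rh_percent - 1.676331)
        + 0.00391838 * rh_percent ** 1.5 * math.atan(0.023101 * rh_percent)
        - 4.686035
    )


# Illustrative inputs only: 20 degrees Celsius at 50 percent humidity.
example_wet_bulb = stull_wet_bulb(20.0, 50.0)
```

For these inputs the formula gives a wet bulb temperature of roughly 13.7 degrees, well below the air temperature, as expected when the air is far from saturated.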

Table 1 sets out the mean, standard deviation, maximum and minimum of these measures over the full pregnancy, for babies in our sample. During an average pregnancy, there are 2.4 heatwaves and just over one week with maximum temperatures above 40 degrees – but this varies greatly, with some pregnancies experiencing up to 9 heatwaves, and 98 days with maximum temperatures above 40 degrees.

Table 1. Descriptive statistics.

https://doi.org/10.1371/journal.pone.0330498.t001

3.4. Outcome measure

Our main outcome measure is preterm birth, defined as birth before 37 complete weeks of pregnancy. In our population, 10 percent of babies are born preterm (Table 1). We focus on preterm birth for comparability with previous research: it is the most commonly studied birth outcome. However, our conclusions hold across four additional measures of health at birth: birthweight, small for gestational age, Apgar scores, and admission to a special care nursery. Estimates using these outcomes are presented in the supplementary material.

Preterm birth is an important intermediary outcome: a large body of research tells us that children born preterm are more likely to face worse health and educational outcomes [19,57,58]. In the NT, children who were born preterm are more likely to be assessed as developmentally vulnerable at age 5 [59].

Recent research demonstrates that this relationship is not fixed: the predictive power of preterm birth has declined over time [60], and many children born preterm may face no detectable long-term effects [61]. This makes sense, for two reasons. First, advances in neonatal healthcare have greatly improved outcomes for preterm-born infants [62]. Second, preterm birth has many causes [63]. The aetiology of preterm birth, and hence the long-term outcomes associated with it, may differ over time and across contexts. However, even in the desirable situation where preterm birth does not increase the risk of long-term developmental vulnerabilities for an individual, it remains an outcome of interest given the high cost of providing remedial neonatal care [20].

3.5. Analytical methods

Our goal in this analysis is to assess whether and how our estimates of the causal impact of heat exposure on preterm birth change when we use different heat exposure metrics. A challenge in doing so is that heat exposure may be correlated with birth outcomes for reasons which do not necessarily reflect the causal impact of heat itself.

Of primary concern is omitted variable bias. Heat exposure varies both over time, and across locations. But most of the variation in birth outcomes over time and across locations is not due to heat exposure. For instance, socioeconomic status is known to affect birth outcomes, but is also likely to affect both place of residence and timing of conception [64]. A link with place of residence may be particularly important in the NT, given that most non-Aboriginal people live in the capital city of Darwin, whereas most Aboriginal people live in remote communities [65], many of which face higher levels of heat exposure than Darwin. Similarly, we know that there are seasonal risks in addition to heat exposure, such as disease prevalence and economic conditions, which contribute to month-to-month variations in birth outcomes [37].

Our analytical approach is to estimate the causal effect of heat exposure by isolating fluctuations in heat that are exogenous to these (possibly endogenous) sources of time- and place-based variation. Exposure to such exogenous variation is beyond individuals’ control, and cannot be anticipated more than a few days in advance. This variation in exposure is, therefore, as good as random.

To do this, we specify a regression model with fixed effects for time and place. This approach is discussed in detail in Dell et al.’s review [40], and has been widely used in prior research.

As is common practice, we allow for the effects of heat exposure to vary by trimester; this allows for the possibility, as discussed in the Conceptual framework, that the effects of heat exposure may depend on its timing – for instance, exposure in the first trimester may increase the risk of pre-eclampsia [27], and exposure in the third trimester may bring forward labour [33].

We therefore estimate the following linear fixed effects model:

$$Y_{ijt} = \sum_{k=1}^{3} \beta_k \, Heat_{jtk} + \gamma X_i + \mu_{my} + \theta_{jms} + \varepsilon_{ijt} \qquad (1)$$

where $Y_{ijt}$ is an indicator of whether baby i, who was conceived on date t (in month m, year y), was born preterm or not. k is an index of the three trimesters of pregnancy. X is a set of individual-level covariates, which includes an indicator for whether it is the mother’s first pregnancy, the age of the mother (in 5-year age bands), and an indicator for whether the baby is Aboriginal. Standard errors are clustered at the location level. We identify location, j, based on the mother’s suburb/community of residence, which we have geocoded, and grouped into locations of maximum 50-mile distance between points using cluster analysis. There are 106 such clusters in our data, with an average of 311 births per cluster.

Heat represents any one of the five heat metrics outlined above. They are based on daily weather observations from small-area location j, which are summarised over each trimester based on exact date of conception t. Our coefficients of interest are $\beta_k$; $\varepsilon_{ijt}$ is the error term.

In specifying the fixed effects, we follow prior studies in using fixed effects for the month-year of conception, $\mu_{my}$, and for the mother’s place of residence, interacted with the month of conception and the baby’s sex, $\theta_{jms}$. Interacting location fixed effects with month and sex allows for the possibility that there may be different seasonal patterns in birth outcome across locations – this may be the case in the NT given, for example, that some communities experience regular flooding in the wet season, and hence their experience of the wet season may be different from other communities that do not flood. Furthermore, we know that different types of exposures may affect male and female fetuses differently [66]. Interacting these location- and season-specific effects with sex allows for this possibility.

To illustrate what we are comparing with these community-month-sex fixed effects, consider two boys born in Darwin in January of the same year. Both boys were conceived in the same location (Darwin) and the same month (April), so they share the same location-month-sex fixed effect, though they were conceived two weeks apart. They also share the same month-year fixed effects. However, their mothers experienced different heat exposure during pregnancy, because particularly warm weather occurred in January while one baby remained in utero but the other had already been born. The fixed effects allow us to compare outcomes for these two boys, attributing any difference to the variation in heat exposure their mothers experienced, while holding constant all other factors that vary systematically by location, month of conception, and baby’s sex.

This day-to-day variation in weather within the same location-month-sex group is the source of identification in our model. It represents truly exogenous variation: mothers cannot anticipate or control whether their pregnancy will coincide with unseasonably hot days versus cooler days within the same general season and location.
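The within-group comparison described above can be illustrated with a minimal sketch. It is a deliberate simplification of Equation 1, not the estimation code used in this paper: it assumes a single fixed effect (one label per location-month-sex cell), a single heat regressor, no covariates and no clustered standard errors. The function name, the cell labels and all data values are hypothetical.

```python
from collections import defaultdict

def within_slope(groups, x, y):
    """OLS slope of y on x after subtracting group means from both
    (the 'within' estimator for a single set of fixed effects)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_x, sum_y, n]
    for g, xi, yi in zip(groups, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sum_x, sum_y, n = sums[g]
        dx = xi - sum_x / n  # deviation from the cell mean of heat exposure
        dy = yi - sum_y / n  # deviation from the cell mean of the outcome
        num += dx * dy
        den += dx * dx
    return num / den

# Hypothetical toy data: two cells with different baseline risk.
cells = ["darwin-apr-male", "darwin-apr-male", "remote-apr-male", "remote-apr-male"]
hot_days = [2.0, 6.0, 10.0, 14.0]
risk = [0.10, 0.14, 0.30, 0.34]

fe_slope = within_slope(cells, hot_days, risk)              # uses within-cell variation only
pooled_slope = within_slope(["all"] * 4, hot_days, risk)    # ignores cell differences
```

In this toy example the pooled slope is more than twice the within-cell slope, because the hotter cell also has a higher baseline risk. Removing cell means discards that between-cell difference and keeps only the variation in exposure among pregnancies in the same cell.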

Our choice of interacted fixed effects follows prior studies [4,67]. That said, S1 Table shows that our estimates are similar regardless of the specifics of how we define these fixed effects.

Machine learning considerations.

In this analysis, we set out to select a preferred heat metric from among many potential regressors. To do this, we initially considered using machine learning, and in particular lasso regression. However, we determined that a machine learning approach was not appropriate in this case, because of the low explanatory power of heat exposure variables relative to other factors that determine the risk of preterm birth. As we show in our analysis and address in our discussion, heat exposure explains a very small share of variation in outcomes. The differences in explanatory power between alternative metrics are smaller still (see Discussion section below). Hence, when we apply a lasso regression, the penalty structure prioritises parsimony over detecting small but meaningful effects, and eliminates the heat exposure variables. We are aware that double machine learning approaches can help to address this regularisation bias. However, given that the goal of our analysis was to make general recommendations for how heat exposure should be measured, we favoured a more direct approach to comparing alternative metrics.
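The mechanism by which a lasso penalty removes small effects can be sketched in the simplest textbook setting: a single standardised regressor that is orthogonal to the others, where the lasso estimate equals the least-squares estimate shrunk towards zero by the penalty (soft-thresholding). This illustrates the general property only. It does not reproduce the lasso specification considered here, and the coefficient values and penalty level are hypothetical.

```python
def soft_threshold(beta_ols, penalty):
    """Lasso estimate for one standardised, orthogonal regressor:
    shrink the least-squares coefficient towards zero by `penalty`,
    and set it exactly to zero if it is smaller than the penalty."""
    if beta_ols > penalty:
        return beta_ols - penalty
    if beta_ols < -penalty:
        return beta_ols + penalty
    return 0.0

penalty = 0.05  # hypothetical penalty level
ols_coefs = {"strong_covariate": 0.40, "heat_metric": 0.01}  # hypothetical values
lasso_coefs = {name: soft_threshold(b, penalty) for name, b in ols_coefs.items()}
```

Here the large coefficient survives with some shrinkage, while the small heat coefficient is set exactly to zero, even though a small effect on a common exposure may still matter at the population level.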

3.6. Our approach to selecting a preferred heat metric

Part of our goal in this analysis is descriptive: to run the same model sequentially with each heat metric, and learn how choice of heat metric affects our conclusions around the impacts of heat exposure on preterm birth. To do this, we present regression coefficients for the same model, using each of our five heat metrics.

However, we also wish to select a preferred heat metric. Out of our five candidate metrics, we want to know which one best captures the impact of heat exposure on preterm birth in the NT. In doing so, we face a challenge: because each regression uses different heat exposure metrics, the magnitudes and statistical significance of the resulting coefficients cannot be directly compared with each other.

We therefore take a two-stage approach to selecting a preferred metric. First, we present an F-test of joint significance of the regressors included in each heat exposure metric. If the regressors are not jointly statistically significant, we exclude the metric from further comparison. Second, we compare goodness of fit, for the remaining metrics.
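As a sketch of the first stage, the classical F statistic compares the residual sum of squares of a model that omits the heat regressors (restricted) with one that includes them (unrestricted). This is the textbook homoskedastic form, shown only to illustrate the logic. Because standard errors here are clustered by location, the test actually reported may well be a cluster-robust variant, which this sketch does not implement. All input values are hypothetical.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, n_obs, n_params):
    """Classical F statistic for the joint null that `n_restrictions`
    coefficients are zero. `n_params` counts all parameters in the
    unrestricted model."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - n_params)
    return numerator / denominator

# Hypothetical inputs: dropping 5 heat regressors raises the RSS from 100 to 105.
f_value = f_statistic(
    rss_restricted=105.0,
    rss_unrestricted=100.0,
    n_restrictions=5,
    n_obs=1010,
    n_params=10,
)
```

The statistic would then be compared with the critical value of an F distribution with (5, 1000) degrees of freedom at the chosen significance level.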

We use two measures of goodness of fit. The first is the R-squared, which tells us how much variation in the outcome is explained by the regressors. A standard R-squared never decreases when additional regressors are added. To account for different numbers of regressors in each metric, we use an adjusted R-squared, which penalises additional regressors. A higher adjusted R-squared, therefore, indicates that the model fits the data better, when comparing among models with different numbers of regressors. The second is the Akaike Information Criterion (AIC). The AIC helps us to compare several alternative, non-nested models on a single measure: it is calculated from the maximised likelihood of the model, penalised for the number of estimated parameters. It therefore weighs up model fit with model complexity, allowing us to compare multiple working hypotheses against each other [68]. In their review on variable selection, Heinze, Wallisch and Dunkler [69] recommend using AIC in cases such as ours, where theory supports a relatively small set of competing models, and we want to select between them. In comparing AIC scores, a lower score indicates better fit.
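The two fit measures can be written out as a short sketch using their standard textbook definitions. The AIC function assumes a linear model with Gaussian errors, so that the maximised log-likelihood depends only on the residual sum of squares. Software packages differ in whether the error variance is counted as a parameter, so absolute AIC values may differ from those in Table 2 even where rankings agree. All numbers below are hypothetical.

```python
import math

def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared: penalises R-squared for the number of regressors
    (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

def gaussian_aic(rss, n_obs, n_params):
    """AIC = 2k - 2 * log-likelihood, with the Gaussian log-likelihood
    evaluated at the maximum-likelihood variance estimate rss / n_obs."""
    log_lik = -0.5 * n_obs * (math.log(2 * math.pi) + math.log(rss / n_obs) + 1.0)
    return 2 * n_params - 2 * log_lik

# Hypothetical comparison: metric B adds 15 regressors for a tiny gain in R-squared.
adj_a = adjusted_r2(r2=0.040, n_obs=1001, n_regressors=3)
adj_b = adjusted_r2(r2=0.041, n_obs=1001, n_regressors=18)

# With identical fit, each extra parameter raises the AIC by exactly 2.
aic_small = gaussian_aic(rss=100.0, n_obs=1000, n_params=5)
aic_large = gaussian_aic(rss=100.0, n_obs=1000, n_params=10)
```

In the hypothetical comparison, the more parsimonious metric has the higher adjusted R-squared despite a slightly lower raw R-squared, and the larger model is penalised by 10 AIC points for its five extra parameters.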

4. Results

4.1. Goodness of fit comparison

Table 2 presents a comparison of how well each heat metric fits the data.


Table 2. Measures of model fit for each heat metric.

https://doi.org/10.1371/journal.pone.0330498.t002

Across our five metrics, we can first exclude ‘Trimester average’ and ‘Heatwave count’ from comparison: in an F-test of joint significance at the 5% level of significance, we would fail to reject the hypothesis that the coefficients in these metrics are jointly equal to zero, hence that they do not help to explain variation in preterm birth rates.

We then have three remaining metrics. Across all metrics, the adjusted R-squared is low and varies little, ranging from 0.035 to 0.037. However, the metrics produce very different AIC scores. On both the adjusted R-squared and AIC measures, the ‘Max and min’ metric performs best: it has a slightly higher adjusted R-squared and a substantially lower AIC.

We separately consider whether a combination of metrics provides a better fit – for instance, combining the ‘Max and min’ metric, the ‘Wet bulb’ metric and ‘Heatwave count’ metric together. However, while some of the additional coefficients are statistically significant, they do not meaningfully improve model fit (S2 Table), and interpretation of individual coefficients becomes very difficult.

We therefore select the ‘Max and min’ metric as our preferred heat exposure metric.

4.2. Coefficient estimates

Having identified the ‘Max and min’ metric as our preferred one, we now turn to a presentation of the coefficients for the regressors in each heat metric. This tells us both the estimates from our preferred model, and the potential consequences, in terms of inferences we may draw, from selecting a non-preferred metric.

We present estimates graphically, showing coefficients and their 95% confidence intervals. Tables with all coefficients cited and their standard errors can be found in S3–S10 Tables.

Benchmark heat metric.

Fig 3 presents estimates from our benchmark metric. We can see little impact from heat exposure in the first trimester. However, in the second and third trimesters we find that exposure to cooler temperatures (below our omitted category of 25–30 degrees) contributes to lower risk of preterm birth; we see no additional impact of heat exposure above 30 degrees. This suggests that it may be predominantly heat strain (i.e., prolonged exposure to moderate heat), instead of heat stress (i.e., exposure to extreme heat) affecting pregnancies in the second and third trimesters.


Fig 3. Benchmark heat metric.

This figure shows the regression coefficients and 95% confidence intervals from Equation 1, where the heat metric is a count of the number of days within each trimester that the daily maximum temperature fell into the following ranges: <20, 20–25, 25–30, 30–35, 35–40, 40+. The omitted category is 25–30 degrees. Sample size: 34,258.

https://doi.org/10.1371/journal.pone.0330498.g003
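The construction of the benchmark metric described in the caption of Fig 3 can be sketched as follows. The temperature bands are taken from the caption. Two details are assumptions made for illustration only: each trimester is treated as a fixed 91-day window from the date of conception, and each band includes its lower bound. The function names and the toy weather observations are hypothetical.

```python
import bisect
from datetime import date, timedelta

BAND_EDGES = [20, 25, 30, 35, 40]  # degrees, from the Fig 3 caption
BAND_LABELS = ["<20", "20-25", "25-30", "30-35", "35-40", "40+"]

def band_of(tmax):
    """Label of the band containing a daily maximum temperature
    (assumption: lower bound inclusive, so 20.0 falls in '20-25')."""
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, tmax)]

def band_counts_by_trimester(conception, daily_tmax, trimester_days=91):
    """Count days in each band for trimesters 1-3, measured from conception.
    `daily_tmax` maps dates to daily maximum temperature at the mother's location.
    Assumption: fixed-length trimesters of `trimester_days` days."""
    counts = {k: dict.fromkeys(BAND_LABELS, 0) for k in (1, 2, 3)}
    for day, tmax in daily_tmax.items():
        offset = (day - conception).days
        if 0 <= offset < 3 * trimester_days:
            trimester = offset // trimester_days + 1
            counts[trimester][band_of(tmax)] += 1
    return counts

# Hypothetical observations on four days of one pregnancy.
conception = date(2010, 4, 1)
observations = [(0, 28.0), (1, 31.5), (95, 41.0), (200, 19.0)]
daily_tmax = {conception + timedelta(days=d): t for d, t in observations}
counts = band_counts_by_trimester(conception, daily_tmax)
```

When these counts enter a regression, the count for the omitted category (25–30 degrees) is dropped, so each remaining coefficient is read relative to a day in that band.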

If it is the case that maternal heat strain affects the baby through reduced flow of nutrients to the placenta [24], this could explain why we see positive effects of cooler temperatures in the second and third trimesters, but not the first trimester: in the first trimester, the fetus receives nourishment from the yolk sac and not the placenta, and is therefore less reliant on maternal blood flow for nutrients. It is when the placenta takes over, around the beginning of the second trimester, that we might expect an impact of heat strain.