Estimating Indoor PM2.5 and CO Concentrations in Households in Southern Nepal: The Nepal Cookstove Intervention Trials

High concentrations of household air pollution (HAP) due to biomass fuel usage with unvented, inefficient combustion devices are thought to be an important health risk factor in the South Asian population. To better characterize the indoor concentrations of particulate matter (PM2.5) and carbon monoxide (CO), and to understand their impact on health in rural southern Nepal, this study analyzed daily monitoring data collected with the DataRAM pDR-1000 and the LASCAR CO data logger in 2,980 households using traditional biomass cookstoves indoors through the Nepal Cookstove Intervention Trial–Phase I between March 2010 and October 2011. Daily average PM2.5 and CO concentrations measured in the area near the stove were 1,376 (95% CI, 1,331–1,423) μg/m3 and 10.9 (10.5–11.3) parts per million (ppm) among households with traditional cookstoves. The 95th percentile, hours above 100 μg/m3 for PM2.5 or 6 ppm for CO, and hours above 1,000 μg/m3 for PM2.5 or 9 ppm for CO were also reported. An algorithm was developed to differentiate stove-influenced (SI) periods from non-stove-influenced (non-SI) periods in the monitoring data. Average stove-influenced concentrations were 3,469 (3,350–3,588) μg/m3 for PM2.5 and 21.8 (21.1–22.6) ppm for CO. PM2.5 concentrations were significantly higher in the dry season for all metrics; wood was the cleanest fuel for PM2.5 and CO, while adding dung to the fuel increased concentrations of both pollutants. For studies in rural southern Nepal, CO concentration is not a viable surrogate for PM2.5 concentration, given the low correlation between these measures. In sum, this study filled a gap in knowledge on HAP in rural Nepalese households using traditional cookstoves and revealed very high concentrations in these households.


Introduction
Approximately 3 billion people worldwide rely on solid fuels (biomass or coal) for cooking and heating due to lack of access to cleaner fuels [1,2]. Solid fuels are typically used with unvented, inefficient combustion devices, leading to high emissions of toxic pollutants from incomplete combustion, including two main pollutants contributing to morbidity and mortality: particulate matter (PM) of various sizes, and carbon monoxide (CO) [2,3]. High concentration of fine PM is a known risk factor for adverse cardiopulmonary outcomes. CO is associated with fatality and with acute exposure-related reduction of exercise tolerance, and has also been used as a marker for PM exposure in some studies [3,4]. Combining exposure to all related pollutants, household air pollution (HAP) due to solid fuels was estimated to account for 3.5 million deaths across the world in 2010, and was the leading risk factor for death in South Asia [5]. To reduce the disease burden due to HAP in households using solid fuels, understanding the exposure-outcome relationship is critical [3,6].
Differences in stove design, fuels used, and cooking practices across regions can lead to large variability in HAP concentrations. Information is therefore needed to assess exposures across different study locations [6]. Research on solid fuel related pollutants and their health impacts is limited in low-income countries like Nepal [7]. Previous studies documented high HAP concentrations during cooking in Nepalese houses using biomass fuel: 4,741 μg/m3 and 13.7 parts per million (ppm) for PM2.5 and CO, respectively [8,9]. In rural India, recent studies found 24-hour average concentrations of 686 μg/m3 for PM2.5 and 2.6 ppm for CO among households using biomass fuel [10], and 48-hour average concentrations of 1,250 μg/m3 for PM2.5 and 10.8 ppm for CO in households using traditional cookstoves [11]. These levels are many times higher than current air quality guidelines published by the World Health Organization (WHO): 25 μg/m3 for 24-hour average ambient PM2.5 exposure and 6 ppm for 24-hour average indoor CO exposure [4,12].
The Nepal Cookstove Intervention Trial-Phase I (NCIT-I) was designed to assess and reduce the adverse health effects (mainly acute lower respiratory infection) of biomass fuel smoke exposure among women and young children by installing enhanced, ventilated biomass stoves to replace the traditional open burning mud stoves. Continuous daily concentrations of PM2.5 and CO were measured in the area close to the stove in eligible households, before and after installation of the new stove. This paper reports the methods for quantifying daily indoor PM2.5 and CO concentrations using monitoring data collected before enhanced stove installation, in preparation for further analysis of related health outcomes. Issues and systematic solutions regarding data reduction and data analysis for daily continuous HAP concentrations are detailed, as is the method for determining the pollutant concentration during cooking or stove-influenced (SI) times.

Data Collection
NCIT-I was conducted in Sarlahi, a district on Nepal's southern border with Bihar State in India. Residents of all households in four Village Development Committees were screened for enrollment eligibility. Eligible households used only traditional biomass cookstoves indoors and had a married woman aged 15-30 or a child younger than 36 months. Detailed methods for study design and enrollment criteria have been published previously [13]. Between March 2010 and July 2012, all participating households in NCIT-I received two HAP assessments, one before the new stove was installed and one afterwards. Each assessment comprised measurement of PM2.5 and CO concentrations, temperature, and relative humidity every 10 seconds for a period of approximately 21 hours. For each household-day, measurements started at approximately 3:00 pm and stopped around 12:00 pm (noon) the following day. This interval covered nearly all cooking events, since lunch is not a typical meal in rural Nepal. In addition, household characteristics including roof and wall material, room dimensions, and number of external openings in the kitchen were collected during enrollment. Date (season) and fuel type (wood, animal dung, crop waste) were collected during environmental sampling.
HAP concentrations were measured using a package of instruments including the DataRAM pDR-1000 (Thermo Scientific, Franklin, MA), the LASCAR CO data logger (EL-USB-CO300, Erie, PA), and the HOBO U10 Temperature and Humidity (TH) Data Logger (Onset Computer Corporation, Pocasset, MA), all recording data at 10-second intervals. The instrument package was placed approximately 1 meter in front of the stove and approximately 1.5 meters off the floor during each measurement to best capture the exposure of individuals who were cooking.
This study was approved by the institutional review boards (IRB) of the Johns Hopkins Bloomberg School of Public Health and the Institute of Medicine, Tribhuvan University, Kathmandu, Nepal.
As a significant proportion of this population was illiterate, verbal informed consent was received from all participating households and individuals and consent was documented directly on data collection forms and entered into the study database. All IRBs approved the consent procedures and all other procedures used in these studies. The trials are registered at Clinicaltrials.gov (NCT 00786877).

Pre-Processing HAP Signals
To ensure consistent quality control for data collected in NCIT-I, all daily pollution records collected before and after stove installation were pre-processed together, a total of 7,684 PM and 6,615 CO measurements. PM measurements were removed from analysis when a) data were in the wrong format or could not be connected to an eligible household (2.6%), b) total sampling time was shorter than 18 hours (12.2%), or c) the record was judged physically implausible and the result of machine malfunction, because it showed an abrupt change (>5%) from baseline during sampling or an entirely flat line while cooking-time peaks were observed in the corresponding CO results (1.6%). Similar quality control was conducted for CO measurements (3.6%, 19.7%, and 3.3% removed from analysis, respectively). The pDR-1000 is a passive nephelometric device, and measured PM concentrations can be biased by variations in ambient humidity and particulate matter composition. PM concentrations were adjusted to account for these factors using an algorithm described previously, resulting in gravimetric equivalent PM2.5 values [14]. PM2.5 measurements that lacked the concurrent temperature and humidity measurements necessary for adjustment were removed (3.4%). Correspondingly, 10.7% of CO measurements were removed due to missing PM2.5 measurements.
Drift of all PM2.5 measurements was calculated by subtracting the machine-reported internal average concentration from the manually calculated unadjusted time-weighted average concentration of real-time pDR readings [15,16]. In this project, given the high average PM2.5 concentration, the ratio of drift to daily average PM2.5 concentration worked better as an exclusion criterion than the absolute value. Measurements with a drift ratio higher than 50% were removed (0.5%). After eliminating these data, the mean drift ratio was 1.3% and the greatest drift was 262 μg/m3, both acceptable when compared with the unadjusted time-weighted average concentration of 1,712 μg/m3. Since the drift might have occurred at any point after the start of sampling and was proportionally small, no further drift adjustment was performed.
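The drift screen described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the denominator is assumed to be the unadjusted time-weighted average, and taking the absolute value of the drift is also an assumption.

```python
def drift_ratio(internal_average, time_weighted_average):
    """Drift relative to the daily average concentration.

    Drift is the manually calculated time-weighted average minus the
    instrument-reported internal average. Using the absolute value and
    dividing by the time-weighted average are assumptions of this sketch.
    """
    drift = time_weighted_average - internal_average
    return abs(drift) / time_weighted_average


def passes_drift_check(internal_average, time_weighted_average, max_ratio=0.5):
    """Keep a measurement unless its drift ratio is higher than 50%."""
    return drift_ratio(internal_average, time_weighted_average) <= max_ratio


# Hypothetical readings in ug/m3 (not study data).
print(drift_ratio(1000.0, 1250.0))
print(passes_drift_check(400.0, 1000.0))
```

A ratio-based rule scales with the concentration level, which may be why it was preferred here over an absolute cutoff in households with very high averages.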
The measurement range of the pDR-1000 is 1 to 400,000 μg/m3, with 1 μg/m3 resolution; PM2.5 concentrations below the limit of detection were recorded as 0 μg/m3 [17]. To account for possible bias caused by this setting, all 0 μg/m3 PM2.5 measurements were replaced with the value closest to half of the limit of detection (0.5 μg/m3) that is also equal to or higher than the instrument resolution (1 μg/m3 in this case). Similarly, since the limit of detection for the LASCAR CO data logger was calculated to be 1.1 ppm, with 0.5 ppm resolution, all 0 ppm CO measurements were replaced with the resolution (0.5 ppm in this case). In total, 11.3% of PM measurements and 16.1% of CO measurements were replaced.
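The zero-replacement rule can be illustrated with a short sketch. The generalized formula in `substitute_value` is one reading of the rule stated above and is an assumption; both function names are hypothetical.

```python
def substitute_value(limit_of_detection, resolution):
    """Value used in place of a zero reading.

    Assumed rule: take the multiple of the resolution closest to half the
    limit of detection, but never less than one resolution step.
    """
    steps = round((limit_of_detection / 2.0) / resolution)
    return max(resolution, steps * resolution)


def replace_zeros(readings, substitute):
    """Replace readings recorded as exactly 0 with the substitute value."""
    return [substitute if value == 0 else value for value in readings]


pm_substitute = substitute_value(1.0, 1.0)   # pDR-1000: LOD 1 ug/m3, resolution 1 ug/m3
co_substitute = substitute_value(1.1, 0.5)   # LASCAR: LOD 1.1 ppm, resolution 0.5 ppm
print(pm_substitute, co_substitute)
print(replace_zeros([0, 12, 0, 3], pm_substitute))
```

Under these assumptions the sketch reproduces the two substitutes given in the text, 1 μg/m3 for PM2.5 and 0.5 ppm for CO.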
To ensure that concentration excursions shorter than 30 seconds would be discounted, a running median of length 5 was applied to the 10-second data before aggregation into data with 5-minute intervals. These excursions may have been caused by unexpected disturbances to the sampling machine, for example by being bumped. Aggregating the filtered 10-second data into averages of 5 minutes reduced the size of the dataset while preserving the original diurnal trends in concentration.
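The smoothing and aggregation steps can be sketched as below. This is an illustrative sketch rather than the study's implementation: the function names are hypothetical, and letting the median window shrink at the start and end of the record is an assumption, since edge handling is not described.

```python
from statistics import mean, median


def running_median(values, window=5):
    """Centered running median; the window shrinks at the edges (assumption)."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def aggregate(values, samples_per_block=30):
    """Average consecutive blocks; 30 samples of 10 s make one 5-minute value."""
    return [
        mean(values[i: i + samples_per_block])
        for i in range(0, len(values), samples_per_block)
    ]


# Ten minutes of hypothetical 10-second readings with a 20-second bump.
raw = [10.0] * 60
raw[10] = raw[11] = 5000.0
five_minute = aggregate(running_median(raw, window=5))
print(five_minute)
```

With a window of five 10-second samples, an excursion occupying one or two samples never forms the majority of any window, so it is removed before averaging, while an excursion of three or more samples (30 seconds or longer) survives.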

Quantifying Daily Average HAP
The daily average concentration of each pollutant was estimated as the arithmetic mean of the observed 5-minute interval values derived from the smoothed 10-second time series. In addition to daily averages, three metrics were calculated for each household to more fully summarize the distributions of PM2.5 and CO concentrations during each daily measurement period: the 95th percentile; hours above 100 μg/m3 for PM2.5 or 6 ppm for CO; and hours above 1,000 μg/m3 for PM2.5 or 9 ppm for CO. The value 100 μg/m3 is four times the WHO guideline for 24-hour average PM2.5 exposure and was used as a threshold in a previous study of the association with acute lower respiratory infection [18]. The value 9 ppm is the exposure threshold above which the carboxyhemoglobin level is expected to exceed 2% for a normal subject engaging in light or moderate exercise for 8 hours, while 6 ppm is recommended to address the impact of chronic exposure over 24 hours [4].
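The four summary metrics can be computed from a household's 5-minute series as in the sketch below. The function and key names are hypothetical, the percentile uses linear interpolation between order statistics (an assumption, since the percentile definition is not stated), and "above" is taken as strictly greater than the threshold.

```python
from statistics import mean


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def daily_metrics(five_min, moderate, high, minutes_per_interval=5):
    """Daily mean, 95th percentile, and hours above two thresholds."""
    hours = minutes_per_interval / 60.0
    return {
        "mean": mean(five_min),
        "p95": percentile(five_min, 95),
        "hours_above_moderate": sum(v > moderate for v in five_min) * hours,
        "hours_above_high": sum(v > high for v in five_min) * hours,
    }


# One hour of hypothetical PM2.5 values (ug/m3), thresholds 100 and 1,000.
pm = [50.0] * 6 + [200.0] * 3 + [1500.0] * 3
metrics = daily_metrics(pm, moderate=100.0, high=1000.0)
print(metrics)
```

For CO the same function would be called with thresholds of 6 and 9 ppm.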
To further understand the pollutant concentrations attributable to cookstove usage, an algorithm was developed to differentiate SI and non-stove-influenced (non-SI) periods. Since stove-related cooking events elevate the levels of PM2.5 and CO, we first defined a baseline concentration and a threshold above which measurements were candidates to be defined as SI. Since activities like sweeping and smoking could also increase pollution concentrations for short periods, a filtered time series was obtained for each home by applying a running median smoother of length n to the 5-minute average values to eliminate short peaks. The baseline level was then defined as the α (e.g. 10th) percentile of this filtered series. This baseline was meant to represent a typical value during the non-SI period. The SI threshold was then defined as β times the baseline level. The 5-minute intervals for which the filtered values exceeded this threshold were defined as SI, and all other times were defined as non-SI. With this definition of SI, the original 5-minute aggregated data (before the running median smoother of length n) were then used in all subsequent calculations.
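The SI partition described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the running median shrinks its window at the edges, and the percentile is computed by linear interpolation. The default constants are the ones reported for PM2.5 in this paper (n = 7, α = 30%, β = 4.0).

```python
from statistics import mean, median


def running_median(values, window):
    """Centered running median; the window shrinks at the edges (assumption)."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def classify_si(five_min, n=7, alpha=30, beta=4.0):
    """Flag each 5-minute interval as stove-influenced (True) or not (False)."""
    filtered = running_median(five_min, n)   # removes short peaks
    baseline = percentile(filtered, alpha)   # typical non-SI level
    threshold = beta * baseline
    return [value > threshold for value in filtered]


def si_summary(five_min, flags, minutes_per_interval=5):
    """Summaries use the original 5-minute values, not the filtered ones."""
    si = [v for v, f in zip(five_min, flags) if f]
    non_si = [v for v, f in zip(five_min, flags) if not f]
    return {
        "si_mean": mean(si) if si else None,
        "non_si_mean": mean(non_si) if non_si else None,
        "si_hours": len(si) * minutes_per_interval / 60.0,
    }


# Hypothetical series: background 50, a one-hour cooking event at 2,000,
# and a single 5-minute spike (e.g. sweeping) that should not count as SI.
series = [50.0] * 40 + [2000.0] * 12 + [50.0] * 40
series[5] = 3000.0
flags = classify_si(series)
summary = si_summary(series, flags)
print(summary)
```

In this toy series the isolated spike is assigned to the non-SI period, which raises the non-SI mean; the filter only decides the partition, and the unfiltered values still enter the averages.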
The SI partition depends on three constants: n, α, and β. To study the effect of the choice of constants on the characterization of SI concentration, we computed the daily average SI concentration for each combination of n = 5, 7, and 9; α = 10, 20, and 30%; and β = 1.2, 2.0, and 4.0, producing 27 different averages per household. The Pearson correlation coefficient was then estimated for each pair of the 27 averages to determine the influence of the constants on the average SI concentrations.
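The sensitivity analysis can be organized as in the sketch below, which enumerates the 27 combinations and summarizes the pairwise Pearson correlations. The names are hypothetical, and the per-household SI averages are toy numbers standing in for the output of the SI algorithm, not study data.

```python
from itertools import combinations, product
from math import sqrt

N_VALUES = (5, 7, 9)
ALPHA_VALUES = (10, 20, 30)
BETA_VALUES = (1.2, 2.0, 4.0)
GRID = list(product(N_VALUES, ALPHA_VALUES, BETA_VALUES))  # 27 combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_range(averages_by_combo):
    """Lowest and highest pairwise correlation across constant combinations.

    averages_by_combo maps (n, alpha, beta) to a list of per-household SI
    averages, with households in the same order for every combination.
    """
    rs = [
        pearson(averages_by_combo[a], averages_by_combo[b])
        for a, b in combinations(sorted(averages_by_combo), 2)
    ]
    return min(rs), max(rs)


# Toy SI averages for four households under three of the 27 combinations.
toy = {
    GRID[0]: [1.0, 2.0, 3.0, 4.0],
    GRID[1]: [2.0, 4.0, 6.0, 8.0],
    GRID[2]: [1.0, 2.0, 3.0, 5.0],
}
low, high = correlation_range(toy)
print(len(GRID), low, high)
```

With all 27 combinations present there are 351 distinct pairs, and the reported range is the minimum and maximum over those pairs.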
We also estimated the correlation between daily average PM2.5 and CO under different fuel types and seasons using Pearson's correlation coefficients. The association between pollution concentrations and household characteristics was initially assessed by stratification. Logarithm-transformed concentrations were then linearly regressed on household characteristics, controlling for fuel type, season, roof material, kitchen wall material, kitchen size, and presence of external openings in the kitchen. Both daily average PM2.5 and CO concentrations among all households had a more nearly symmetric distribution on the logarithmic scale. Logarithm-transformed values were therefore used in all regression models.
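To show how a coefficient from a regression on log-transformed concentrations is read, the sketch below fits a least-squares line with a single binary covariate. This is a simplification for illustration: the models in this paper include several covariates, and the function name and data are hypothetical.

```python
from math import exp, log


def fit_log_linear(indicator, concentrations):
    """Least-squares fit of log(concentration) on one covariate.

    Returns (intercept, slope) on the natural-log scale.
    """
    y = [log(c) for c in concentrations]
    mx = sum(indicator) / len(indicator)
    my = sum(y) / len(y)
    sxy = sum((x - mx) * (v - my) for x, v in zip(indicator, y))
    sxx = sum((x - mx) ** 2 for x in indicator)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: indicator is 1 for dry season, 0 for rainy season.
season = [0, 0, 1, 1]
pm = [100.0, 400.0, 400.0, 1600.0]
intercept, slope = fit_log_linear(season, pm)
print(exp(intercept), exp(slope))
```

With a binary covariate, the exponentiated slope is the ratio of geometric mean concentrations between the two groups, and the exponentiated intercept is the geometric mean of the reference group.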

Household Characteristics
Applying the inclusion criteria described above, the number of households with pre-stove-installation daily data available for analysis was 2,980 for PM2.5 and 2,013 for CO measurements. The average sampling period per household was 21.7 h (interquartile range: 20.9 h to 22.1 h). Table 1 summarizes the distributions of household characteristics and season of assessment. More than 70% of houses had walls made from bamboo with mud plaster or wood; the vast majority of roofs were tile or tin. Nearly a third of kitchens were internal rooms with no window or door opening to the outdoors. More than half of households burned wood alone or in combination with crop waste or dung, and most of the measurements were conducted during the dry season. Households with missing information on characteristics or environmental factors had PM2.5 and CO concentrations similar to those with such data.

General description of HAP
Table 2 summarizes PM2.5 and CO concentrations in four metrics. The arithmetic mean and median of the daily average PM2.5 concentration exceeded 1,000 μg/m3, while those of the CO concentration were over 8 ppm. The mean and median of the 95th percentile PM2.5 concentration were higher than 4,500 μg/m3, while those of the CO concentration were over 35 ppm. Half of the households in these communities had approximately 15 hours of PM2.5 concentrations over 100 μg/m3 and 5 hours over 1,000 μg/m3. Fifteen percent of households experienced CO levels higher than 9 ppm for more than 8 hours (data not shown).
Correlations among SI concentrations calculated with different sets of constants were high for both pollutants, ranging from 0.92 to 1.00 for PM2.5 and from 0.83 to 1.00 for CO. PM2.5 results were reported for running medians of length 7 (n = 7), a baseline level at the 30th percentile (α = 30%), and SI defined as levels 4 times the baseline (β = 4.0). The corresponding values for CO were chosen as n = 7, α = 20%, and β = 2.0. Qualitatively similar results were obtained for the other parameter values and are available from the authors. An example of the raw and the filtered, then averaged PM2.5 and CO data with SI periods highlighted is shown in Fig 1. Table 3 presents the estimated HAP concentrations during SI and non-SI periods. The SI concentrations were about half of the 95th percentile concentrations for both PM2.5 and CO, and about 20 times the non-SI concentrations. The non-SI CO concentrations were close to zero, while the non-SI PM2.5 concentrations were over 100 μg/m3. Fig 2 presents hourly average PM2.5 and CO concentrations. Both pollutants had two peaks (observed elevation in median pollutant concentrations) corresponding to cooking times between 7:00 am and 11:00 am, and then again between 6:00 pm and 10:00 pm, representing the typical cooking pattern in the study population. Total cooking hours under this pattern were similar to the estimated mean and median SI times in Table 3. As displayed in Fig 3(A), the monthly average PM2.5 concentration was lower in the rainy season and higher in the dry season. For CO (Fig 3(B)), there was no evidence of substantial seasonal variation. Daily average indoor relative humidity and temperature tracked the seasonal outdoor patterns, as shown in Figs 3(C) and 3(D).
[Figure caption, partially recovered: ... [19] to estimate the possible non-linear relationship. Log scales were used because each variable had a more nearly symmetric distribution on the log scale. Also displayed was the fitted linear regression of PM2.5 against CO, which appeared curvilinear on the log ...]

Relationship between HAP and household characteristics
PM2.5 concentrations varied across household characteristics even after stratification by season, and the same held for CO concentrations (Tables 4 and 5). PM2.5 concentrations in the rainy season were roughly half the corresponding concentrations in the dry season, while CO concentrations remained constant across the year. Among all metrics, the non-SI concentration in the rainy season was the only one that yielded PM2.5 lower than the WHO 24-hour guideline, while the non-SI concentration in the dry season was still 10 times higher than the WHO 24-hour guideline. Wood burning produced the lowest PM2.5 concentrations across the year, while dung/wood was associated with the lowest CO concentrations in the rainy season and crop waste burning with the lowest in the dry season. Presence of an external opening in the kitchen was associated with reduced PM2.5 and CO concentrations in the dry season but higher levels in the rainy season. Kitchens with walls of bamboo with mud plaster or wood planks had the lowest PM2.5 and CO concentrations among all kitchen wall materials. Table 6 presents the results of models in which the average daily concentrations were regressed on season and household characteristics. Higher PM2.5 concentrations were associated with fuels other than wood, dry season, having an internal kitchen without an external window/door, larger kitchen size, and wall material other than bamboo with mud plaster or wood planks. Higher CO concentrations were associated with fuels other than crop waste, rainy season, smaller kitchen size, and wall material other than bamboo with mud plaster or wood planks (Table 7).

Discussion
We present a methodology for pre-processing and quantifying average indoor concentrations of PM2.5 and CO in a representative sample of nearly 3,000 households in rural southern Nepal. We established criteria for data pre-processing to ensure the quality of data and efficiency of analysis. Starting from the 10-second samples, we used non-linear filters to eliminate spurious outliers and averaged the resulting values into 5-minute interval data, which were used to estimate SI and non-SI periods. SI and non-SI concentrations were then used to characterize HAP. The method for estimating SI and non-SI periods includes three constants that can be used to tune the method for this or other applications. The first is the length of a running median that eliminates shorter excursions, which are more likely caused by suspension of settled particles through sweeping or other indoor activities than by lighting of the stove. The second establishes the quantile of the 5-minute data series that should be used as the baseline level. This controls the length of time that could potentially be categorized as SI and represents a typical value during the non-SI period. The third constant defines a threshold that is a multiple of the typical non-SI concentration, allowing for fluctuation in the non-SI period. Since this study was conducted in an extremely poor environment with a per capita income of $146 [20], the cookstove is used for both cooking and heating, which makes it the main source of indoor air pollutants. It is reasonable to assume that the relatively long peaks identified were mostly caused by cookstove-related activities. When choosing the values of the three constants, we compared results from different sets of constants to avoid misclassification of smaller peaks from fire start-up and end-of-burn periods.
We also differentiated SI and non-SI exposures by identifying peaks through the change in the difference in value between two neighboring data points, and by setting rules to combine or exclude identified peaks [21]. This method requires two constants to establish the change that qualifies as a peak, one to establish the length for combining peaks, and another to establish the length for excluding peaks. It is more intuitive, but also more sensitive to changes in the constants, and requires more intensive adjustment to achieve an acceptable result. The results from this method had a correlation coefficient of 0.85 for PM2.5 and 0.87 for CO with results from the first method. Although misclassification exists in both methods, we would recommend the first method because it is easier to generalize.
This study revealed extremely high HAP concentrations with traditional biomass cookstoves in rural areas of southern Nepal. It demonstrated patterns with both high concentration peaks and long periods of elevated concentrations. Average PM2.5 concentrations for the daily, SI, and non-SI periods were 40, 100, and 10 times higher than the WHO guidelines. The use of a passive device for measuring PM2.5 concentrations is subject to error when humidity levels are high and when air currents around the device are irregular. To address this concern, we calibrated this passive device against a gold standard gravimetric approach [14]. All concentrations reported herein are adjusted by this calibration. The calibration study was conducted in both a model house and local participating houses typical of the study setting. We did not limit movement of persons in the houses and, thus, some error could have been induced in our measurements by irregular air currents passing through the detection chamber of the device. While we are unable to estimate the level of this potential measurement error, for the purposes of the randomized trial there is no reason to expect that it would differ before and after installation of the improved biomass stove. It is clear that controlling SI PM2.5 could significantly reduce indoor concentration levels. The estimated high non-SI PM2.5 concentrations could be a result of elevated community baseline PM2.5 concentrations due to SI PM2.5 exfiltration from other households, or of a contribution from other sources such as road dust, suggesting that community-wide interventions may be necessary to reduce PM2.5 concentrations to a significant degree.
In contrast, compared with WHO guidelines, average CO concentrations for the daily and SI periods were only 37% and 3 times higher, respectively, while the CO concentration for non-SI periods was close to zero, indicating that successful control of SI CO could reduce the overall concentrations to an acceptable level. Given that most households were sampled between 3:00 pm and 12:00 pm the following day for logistical reasons, and that local people do not cook around noon, the daily HAP reported here might be higher than the actual 24-hour time-weighted HAP in these households. The same situation might exist for non-SI concentrations but not for SI concentrations.
Since few data were available on daily HAP related to biomass fuel combustion with traditional cookstoves in rural Nepal, comparisons were made with studies carried out in rural India, where similar cooking behaviors, fuels, traditional stoves, and geographic and climate characteristics are present. Previous studies reported daily average PM2.5 concentrations ranging from 686 to 1,250 μg/m3 and daily average CO concentrations ranging from 2.6 to 10.8 ppm, both similar to but lower than the 1,377 μg/m3 and 10.9 ppm reported in our study [10,11]. Previous studies in rural Nepal also identified PM2.5 concentrations of 4,741 μg/m3 and CO concentrations of 13.7 ppm during cooking periods, which are also close to the 3,466 μg/m3 and 21.9 ppm reported in our study [8,9]. Season was identified as the most influential factor in PM2.5 concentrations, especially in the non-SI period. Given the relatively low temperature in the dry season, changes in behavior such as closing windows or doors, and prolonged periods of fuel burning as a source of heat, could explain the increase in overall and SI PM2.5 concentrations. Increases in non-SI PM2.5 concentrations could result from the lack of a dampening effect of rain on ambient particulate matter, and from increases in background PM2.5 concentrations in the non-SI period due to closed windows and doors. However, CO concentrations were slightly higher in the rainy season, potentially because of less efficient combustion with wet fuel. Having an external opening to the outdoors in the kitchen also decreased the indoor PM2.5 concentration, while CO was not affected as much. The effect of external openings should be interpreted with caution because the status of these openings was not recorded during sampling. Wood was the cleanest fuel for both pollutants, while adding dung to the fuel led to worse indoor air quality.
Other household characteristics such as the type of roof and wall material did not have consistent associations with concentrations of HAP.
A previous study from Guatemala suggested the use of CO as a surrogate for PM2.5 [22] based on high correlations between these two pollutants [23]. However, the overall correlation between PM2.5 and CO in this study was 0.55, lower than reported in Guatemala [22,23]. The coefficient of determination for this linear regression is estimated to be 0.30, lower than the 0.73 to 0.78 previously reported in Guatemala [24]. The increases in correlation through stratification were also lower than the 2-fold increase reported from China [25]. This suggests that the use of CO concentration as a surrogate for PM2.5 concentration is less appropriate in our setting, even after stratification by fuel type and season.

Conclusions
In this paper, we summarized the time series pre-processing used to estimate daily average concentrations from the original 10-second interval data. A simple, flexible method developed to distinguish periods during which pollution concentrations are SI or non-SI improved the understanding of HAP and could be easily applied to other studies by tuning the three constants in the algorithm.
We also filled a gap in knowledge on HAP in rural Nepal with daily concentration data collected in ~3,000 households and revealed the severity of the HAP problem in rural Nepal. For households using a traditional open burning mud stove, the median daily average PM2.5 concentration was over 40 times higher than the WHO guideline for daily exposure, and the median daily average CO concentration was about 30% higher than the WHO recommended guideline for daily exposure. A detailed description of the concentrations using multiple metrics will also facilitate further analysis of health outcomes for NCIT-I, to be reported in a future manuscript. Exploration of the influence of environmental factors and household characteristics on HAP suggested potential intervention methods for reducing indoor air pollution.