Dynamic Analysis and Pattern Visualization of Forest Fires

This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fires catalogue, containing information on events in Portugal during the period from 1980 up to 2012. The data is analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed close together on the map, forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.


Introduction
Forest fires are a major concern in many countries, such as the United States, Australia, Russia, Brazil, China and the European regions of the Mediterranean Basin [1][2][3]. Every year forest fires consume vast areas of vegetation, compromising ecosystems and contributing to the carbon dioxide emissions that are changing Earth's climate [4][5]. Besides the long-term economic implications associated with climate change, forest fires have a direct impact on the economy due to the destruction of public and private property and infrastructures [6]. Fires are mainly caused by natural factors, human negligence, or even human intent. Fire propagation and burnt area depend on many factors and conditions, not only on the terrain orography and the type of vegetation, but also on the efficacy of detection and suppression strategies. Moreover, fires caused by arsonists increase the complexity of the phenomena. Understanding the underlying patterns of forest fires in terms of their size and spatiotemporal distributions may help decision makers to take preventive measures beforehand, identifying possible hazards and deciding strategies for fire prevention, detection and suppression [7][8].
Forest fires have been studied using classical statistical tools. However, those methods have limitations in capturing both the full set of characteristics underlying forest fire dynamics and their evolution over the years [9]. Forest fire dynamics exhibit correlations in size, space and time. Size-frequency distributions unveil long range memory, which is typical of complex systems. Correlation between data is characterized by self-similarity and the absence of a characteristic length-scale, meaning that forest fires exhibit power-law (PL) behaviour [10][11][12][13].
Several studies on this topic have been published in recent years [14][15][16][17]. In references [18][19] it is shown that forest fires exhibit a PL frequency-size relationship over many orders of magnitude and that such behaviour seems consistent with the self-organized criticality arising in complex systems. The most important practical implication of such results is that the frequency-size distribution of small and medium fires can be used to quantify the risk of large fires [19]. Nevertheless, some authors [15] suggest that a simple PL distribution of sizes may be too simple to describe the distributions of forest fires over their full range.
In reference [20] the time dynamics of forest fires is investigated and it is shown that forest fires exhibit time-clustering phenomena. More recently, the fractality of the forest fires was addressed in [21] using spatial and temporal fractal tools. The authors prove that these phenomena exhibit space-time clustering behaviour.
In this paper we look at forest fires from the perspective of dynamical systems. A public domain forest fires catalogue containing data on events that occurred in Portugal, in the period from 1980 up to 2012, is addressed. The data is analysed on an annual basis, modelling the occurrences as a sequence of Dirac impulses. Therefore, instead of modelling individual forest fires, we describe the global dynamics over several decades. In this perspective, mutual information and visualization trees, generated by hierarchical clustering algorithms, are used. The Multidimensional Scaling (MDS) tool is adopted in order to compare and to extract relationships among the data. Finally, we propose an amplitude-space embedding technique that produces a clear fire pattern classification.

Characterization of the Dataset
Data on forest fires is available online from the Portuguese Institute of Nature and Forest Conservation (ICNF), http://www.icnf.pt/portal/florestas/dfci/inc/estatisticas, and the catalogue contains events from 1980 up to 2012. Ignitions may have different sources, such as natural causes, human negligence or human intent, among others. The data analysed in this paper was retrieved in December 2013. Each record contains information about the event's date, time (with one-minute resolution), geographic location and size (in terms of burnt area). We decided to discard small events, as those are prone to measurement errors, by adopting a cutoff threshold of $A_{min} = 10$ hectares.
Fig. 1 illustrates the temporal evolution and size of the events that occurred in Portugal during 1980-2012 and that meet the cutoff threshold criterion. We adopt the concept of 'circular time', since there is a kind of one-year periodicity, with December close to January and not the opposite, as a Cartesian scale implicitly assumes. The (circular) time scale evolves along an Archimedean spiral with origin at the center of the circumferences, described in polar coordinates $(r, \theta)$, where $r$ is the radius, $\theta$ is the angle, $i = 0, \ldots, 32$ represents the year, and the spiral parameters are $p = q = 1$. The burnt area is expressed in logarithmic units and is encoded by the color of the marks. We can note two annual cycles: the first is weaker and includes the months of February and March; the second is stronger and is due to the major incidence of fires during summer [22].
In Fig. 2 we depict the evolution of the burnt area per year and the number of occurrences versus year. The increasing number of events is visible, as well as the strong activity around the middle of the decade 2000-2009. Nevertheless, the charts reveal a large volatility, which makes it difficult to identify a trend.
We observe minimal values for years 1983, 1988, …, 2008, and maximum values for 2003 and 2005, but there is no straightforward method to correlate the data points. Fig. 3 represents the complementary cumulative distributions of the event sizes and of the time intervals between consecutive events.
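As an illustration of the complementary cumulative distributions mentioned above, the following minimal sketch computes an empirical P(X >= x) from a list of event sizes. The function name `ccdf` and the sample areas are hypothetical and are not taken from the catalogue; the same function could be applied to inter-event times.

```python
def ccdf(values):
    """Return the sorted distinct values x and the empirical P(X >= x) for each."""
    ordered = sorted(values)
    n = len(ordered)
    xs, probs = [], []
    for rank, x in enumerate(ordered):
        if xs and xs[-1] == x:
            continue  # tied values share the probability of their first occurrence
        xs.append(x)
        probs.append((n - rank) / n)
    return xs, probs


# Hypothetical burnt areas in hectares (illustrative only).
areas = [12.0, 15.0, 15.0, 40.0, 250.0]
xs, probs = ccdf(areas)
```

Plotting `probs` against `xs` on log-log axes is the usual way to inspect whether the tail is compatible with power-law behaviour.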
The results shown above illustrate through simple statistics the increasing importance of understanding the behavior of forest fires and characterizing the spatiotemporal distributions unveiled by such a complex phenomenon. For that purpose, in the next sections we adopt several complementary mathematical tools.

Mutual Information Analysis
In this section we adopt the mutual information to correlate forest fires annual patterns. First we compute the mutual information, based on events size (i.e., burnt area), for each pair of years in the time period 1980-2012. Second, we use a hierarchical clustering algorithm to find relationships among the data. Visualization trees are used to highlight the interpretation of the results.

Mutual Information
The mutual information is a measure of the statistical dependence between two random variables, giving the amount of information that one variable "contains" about the other. If $X_i$ and $X_j$ are two discrete random variables, then the mutual information, $I(X_i, X_j)$, is given by:

$$I(X_i, X_j) = \sum_{x_i \in X_i} \sum_{x_j \in X_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)} \tag{3}$$

where $p(x_i, x_j)$ is the joint probability distribution function of $(X_i, X_j)$, and $p(x_i)$ and $p(x_j)$ are the marginal probability distribution functions of $X_i$ and $X_j$, respectively. The concept of mutual information comes from information theory [23] and has been adopted in the study of complex systems from diverse fields, namely in experimental time series analysis, in DNA and symbol sequencing and in providing a theoretical basis for the notion of complexity [24][25][26][27][28][29][30].
In this section, instead of expression (3), we use the normalized mutual information, $I_N(X_i, X_j)$, given by [31]:

$$I_N(X_i, X_j) = \frac{I(X_i, X_j)}{H(X_i, X_j)} \tag{4}$$

where $H(X_i, X_j)$ represents the joint entropy between $X_i$ and $X_j$:

$$H(X_i, X_j) = -\sum_{x_i \in X_i} \sum_{x_j \in X_j} p(x_i, x_j) \log p(x_i, x_j) \tag{5}$$

The normalized mutual information $I_N(X_i, X_j) \in [0, 1]$ simplifies comparison across different conditions and improves sensitivity.
Forest fires are analysed on an annual basis. For each year, $i = 0, \ldots, 32$, in the period 1980-2012 the events are represented by:

$$x_i(t) = \sum_{k} A_k\, \delta(t - t_k), \quad 0 \le t \le T \tag{6}$$

leading to 33 one-year length time series. This means that the events are modelled as Dirac impulses, where $A_k$ represents fire size (i.e., burnt area), $t_k$ is the instant of occurrence (with one-minute resolution), $t$ represents time and $T$ denotes the time period of one year.
The signals $x_i(t)$ are then normalized according to (7):

$$\tilde{x}_i(t) = \frac{x_i(t) - \mu}{\sigma} \tag{7}$$

where $\mu$ and $\sigma$ represent the mean and standard deviation values of all events listed during 1980-2012, with magnitude larger than $A_{min} = 10$ ha. The mutual information is calculated to correlate events that occurred in different years of the analysed time period. Fig. 4 depicts in a contour map the mutual information, $I_N(X_i, X_j)$, between every pair of years $i, j = 0, \ldots, 32$. The probabilities for calculating the mutual information are estimated from the histograms of amplitudes $A_k$, constructed considering 476 bins, each one having width equal to 0.1 ha. To facilitate the comparison, the cases $i = j$ (i.e., those with maximum value of mutual information) are removed from the graph.
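The definitions above can be sketched numerically as follows. This is a minimal illustration, assuming that the normalization divides the mutual information by the joint entropy (the reading suggested by the text) and that natural logarithms are used; the function name and the two toy joint distributions are hypothetical.

```python
import math


def normalized_mutual_information(joint):
    """joint[a][b] approximates p(x_i = a, x_j = b); the entries must sum to 1."""
    p_i = [sum(row) for row in joint]        # marginal distribution of X_i
    p_j = [sum(col) for col in zip(*joint)]  # marginal distribution of X_j
    mutual, joint_entropy = 0.0, 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:  # terms with p = 0 contribute nothing
                mutual += p * math.log(p / (p_i[a] * p_j[b]))
                joint_entropy -= p * math.log(p)
    # Assumed normalization: mutual information divided by joint entropy.
    return mutual / joint_entropy if joint_entropy > 0.0 else 0.0


# Toy 2x2 joint distributions (illustrative only).
independent = [[0.25, 0.25], [0.25, 0.25]]
identical = [[0.5, 0.0], [0.0, 0.5]]
nmi_independent = normalized_mutual_information(independent)
nmi_identical = normalized_mutual_information(identical)
```

Independent variables give a value of 0 and fully dependent variables give a value of 1, which matches the stated range of the normalized measure.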
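The preprocessing described above can be sketched as follows, under stated assumptions: the normalization is applied to the impulse amplitudes, the population standard deviation is used, and the input is a hypothetical dictionary mapping each year to a list of `(t_k, A_k)` pairs. None of these names come from the original catalogue.

```python
import statistics

A_MIN = 10.0  # cutoff threshold in hectares, as stated in the text


def normalise_events(events_by_year, a_min=A_MIN):
    """Map {year: [(t_k, A_k), ...]} to the same structure with z-scored A_k."""
    # Keep only the events with burnt area larger than the cutoff.
    kept = {year: [(t, a) for t, a in events if a > a_min]
            for year, events in events_by_year.items()}
    # Mean and standard deviation are taken over all retained events of all years.
    areas = [a for events in kept.values() for _, a in events]
    mu = statistics.mean(areas)
    sigma = statistics.pstdev(areas)  # assumption: population standard deviation
    return {year: [(t, (a - mu) / sigma) for t, a in events]
            for year, events in kept.items()}


# Hypothetical events: time in minutes since the start of the year, area in ha.
events = {1980: [(100, 20.0), (5000, 5.0)], 1981: [(300, 40.0)]}
normalised = normalise_events(events)
```

The 5 ha event is discarded by the cutoff, so the two remaining amplitudes are mapped to -1 and +1.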
The map reveals strong correlations between certain years, corresponding to higher values of mutual information. This is well

Hierarchical Clustering
To obtain an efficient method for visualizing and comparing results, a hierarchical clustering algorithm is adopted, based on the mutual information, $I_N(X_i, X_j)$, between pairs of objects.
The goal of hierarchical clustering is to build a hierarchy of clusters, in such a way that objects in the same cluster are, in some sense, similar to each other [30][32][33]. Based on a measure of dissimilarity between clusters, those are combined (or, alternatively, split) for agglomerative (or, alternatively, divisive) clustering. This is achieved by using an appropriate metric, quantifying the distance between pairs of objects, and a linkage criterion, defining the dissimilarity between clusters as a function of the pairwise distances between objects. The results of hierarchical clustering are presented in a phylogenetic tree adopting the successive (agglomerative) clustering and average-linkage method (Fig. 5). The software PHYLIP was used for generating both graphs (http://evolution.genetics.washington.edu/phylip.html). Fig. 4 and Fig. 5 can be used to visualise and to compare the events on an annual basis. Fig. 5 leads to a result that is easier to interpret than Fig. 4, as it identifies groups of objects that are similar.
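The agglomerative, average-linkage procedure described above can be sketched in a few lines. This is not the PHYLIP implementation used for the figures, only a minimal illustration; the conversion of a similarity into a distance as one minus the similarity, the function name and the 3 x 3 example matrix are all assumptions.

```python
def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of labels.
    """
    clusters = {i: (labels[i],) for i in range(len(labels))}
    index = {label: i for i, label in enumerate(labels)}

    def between(a, b):
        # Average of all pairwise object distances across the two clusters.
        total = sum(dist[index[x]][index[y]] for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        ids = sorted(clusters)
        # Pick the pair of current clusters with the smallest average distance.
        d, p, q = min((between(clusters[p], clusters[q]), p, q)
                      for k, p in enumerate(ids) for q in ids[k + 1:])
        merges.append((clusters[p], clusters[q], d))
        clusters[next_id] = clusters.pop(p) + clusters.pop(q)
        next_id += 1
    return merges


# Hypothetical similarities between three years (illustrative only).
similarity = [[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]]
distance = [[1.0 - s for s in row] for row in similarity]  # assumed conversion
merges = average_linkage(distance, ["1980", "1981", "1982"])
```

In this toy case the two most similar years are merged first, and the remaining year joins them at the average of its distances to both.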

MDS Analysis and Visualization
In this section we adopt the MDS tools to handle information and the relationships embedded into the data.
MDS is a statistical technique for visualizing data that can reveal similarities between objects. The algorithm requires the definition of a similarity measure (or, inversely, of a distance) and the construction of an $s \times s$ symmetric matrix $D$ of similarities (or distances) between each pair of $s$ objects. MDS assigns a point to each object in an $m$-dimensional space and arranges the set in order to reproduce the observed similarities. A shorter (larger) distance between two points means that the corresponding objects are more similar (distinct). For $m = 2$ or $m = 3$ dimensions the resulting locations may be displayed in a "map" that can be visualized [34][35][36][37][38][39].
In our case, we obtain $D$ ($33 \times 33$ dimensional) by means of the mutual information (4). Fig. 6 and Fig. 7 show the MDS maps for $m = 2$ and $m = 3$, respectively. The Shepard and the stress plots assess the quality of the MDS maps. The Shepard diagrams (Fig. 8 and Fig. 9) show an acceptable distribution of points around the 45 degree line, which means a good fit of the distances to the dissimilarities. On the other hand, the stress plot reveals that a three dimensional space describes the data adequately (Fig. 10). This can be concluded by observing the stress line, which diminishes strongly until the dimensionality is two, moderately towards dimensionality three and weakly from then on. Often, the maximum curvature point of the stress line is adopted as the criterion for deciding the dimensionality of the MDS map.
The MDS maps of Fig. 6 and Fig. 7 confirm the groups previously identified by the hierarchical clustering and, consequently, the relationships between the corresponding yearly patterns. Comparing Fig. 5 with Fig. 6 and Fig. 7, we conclude that all allow an easy interpretation of the results. The MDS maps, in particular the 3D plot, are more intuitive than the phylogenetic tree. Moreover, most software for MDS analysis allows the user to rotate and visualise the maps from different perspectives, easing the identification of clusters. This is especially useful when dealing with large amounts of data.
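To make the role of the stress plot concrete, the sketch below evaluates a Kruskal-type stress value for a given configuration of points. It is a simplified illustration that compares map distances directly with the target dissimilarities (no monotone regression step), and it does not compute the MDS configuration itself; the function name and the three-point example are hypothetical.

```python
import math


def kruskal_stress(points, dissim):
    """Stress-1 between the map distances of `points` and target dissimilarities."""
    num, den = 0.0, 0.0
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            d = math.dist(points[a], points[b])  # Euclidean distance on the map
            num += (d - dissim[a][b]) ** 2
            den += d ** 2
    return math.sqrt(num / den)


# Three objects whose dissimilarities form a 3-4-5 right triangle.
target = [[0.0, 3.0, 5.0],
          [3.0, 0.0, 4.0],
          [5.0, 4.0, 0.0]]
map_2d = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # reproduces the target exactly
map_1d = [(0.0,), (3.0,), (3.0,)]              # same points with one axis dropped
stress_2d = kruskal_stress(map_2d, target)
stress_1d = kruskal_stress(map_1d, target)
```

The two-dimensional map has zero stress while the one-dimensional one does not, which is the kind of drop that a stress plot displays as the dimensionality increases.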

Forest Fires Spatial Patterns
In this section we study forest fires in a complementary line of thought, namely by considering spatial information. First, we divide the geographic territory under study (i.e., $36.95° \le \text{lat} \le 42.15°$; $-9.50° \le \text{lon} \le -6.19°$), using an $M \times N$ ($M = 30$, $N = 15$) rectangular grid, and we determine the 33 bidimensional histograms of relative frequencies for all years in the period 1980-2012. Second, for characterizing the histograms, we calculate the Shannon entropy, $S_i$, given by:

$$S_i = -\sum_{m=1}^{M} \sum_{n=1}^{N} p_i(m, n) \log p_i(m, n)$$

where the probabilities $p_i(m, n)$ are approximated by the relative frequencies.
In Fig. 11, for example, we depict the bidimensional histogram for year 2010. The corresponding entropy is $S_i = 4.08$ ($i = 30$).
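The spatial entropy computation can be sketched as follows. Several choices here are assumptions rather than statements from the text: the longitude bounds are read as -9.50 and -6.19 degrees, the 30 divisions are taken along latitude and the 15 along longitude, natural logarithms are used, and every location is assumed to lie inside the bounding box. The function name and sample coordinates are hypothetical.

```python
import math

LAT_MIN, LAT_MAX = 36.95, 42.15
LON_MIN, LON_MAX = -9.50, -6.19  # assumed reading of the longitude bounds
M, N = 30, 15                    # assumed: M cells along latitude, N along longitude


def spatial_entropy(locations, m=M, n=N):
    """Shannon entropy of the relative frequencies of (lat, lon) events on a grid."""
    counts = [[0] * n for _ in range(m)]
    for lat, lon in locations:
        # Locations are assumed to fall inside the bounding box.
        row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * m), m - 1)
        col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * n), n - 1)
        counts[row][col] += 1
    total = len(locations)
    entropy = 0.0
    for row in counts:
        for c in row:
            if c:  # empty cells contribute nothing
                p = c / total
                entropy -= p * math.log(p)
    return entropy


# Hypothetical event locations (illustrative only).
spread = [(37.0, -9.0), (38.0, -9.0), (39.0, -8.0), (41.0, -7.0)]
concentrated = [(40.0, -8.0)] * 5
s_spread = spatial_entropy(spread)
s_concentrated = spatial_entropy(concentrated)
```

Events concentrated in a single cell give zero entropy, while events spread evenly over four cells give the logarithm of four, so larger values indicate a more dispersed spatial pattern.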
In a more global perspective, we verify that amplitude and space data lead to distinct observations. The conclusions are 'decoupled' and reveal that both directions must be explored, with more data, in order to include all information in a global tool of analysis.
In this line of thought, we embed amplitude and space data into a single graph by adding to the bidimensional MDS plot of Fig. 6 a vertical axis representing the Shannon entropy (Fig. 14).
We note that only two years, Y = {1983} and Z = {2005}, now have a clearly distinct separation from the main cluster, X. In Fig. 13 we observed them to be located at near extreme values but, as mentioned, it is difficult to draw conclusions from that chart due to its large volatility. The embedding of amplitude-space techniques produced a clear classification pattern.

Conclusions
We analysed forest fires from the perspective of dynamical systems. Data from a public domain forest fires catalogue, containing information on events in Portugal during the period 1980-2012, was studied on an annual basis. Mutual information was used to correlate annual patterns. Phylogenetic trees generated by hierarchical clustering algorithms and MDS visualization tools were used to compare and to extract relationships among the data and to identify forest fire patterns. Those tools allow different perspectives on forest fires that may be used to better understand the dynamics emerging in the plethora of phenomena that occur in forest fires.