Meal and habitual dietary networks identified through Semiparametric Gaussian Copula Graphical Models in a German adult population

Gaussian graphical models (GGMs) are exploratory methods that can be applied to construct networks of food intake. Such networks were constructed for meal-structured data, elucidating how foods are consumed in relation to each other at meal level. Meal-specific networks were compared with habitual dietary networks using data from an EPIC-Potsdam sub-cohort study. Three 24-hour dietary recalls were collected cross-sectionally from 815 adults in 2010–2012. Food intake was averaged to obtain the habitual intake. GGMs were applied to four main meals and habitual intakes of 39 food groups to generate meal-specific and habitual dietary networks, respectively. Communities and centrality were detected in the dietary networks to facilitate interpretation. The breakfast network revealed five communities of food groups with other vegetables, sauces, bread, margarine, and sugar & confectionery as central food groups. The lunch and afternoon snacks networks showed higher variability in food consumption and six communities were detected in each of these meal networks. Among the central food groups detected in both of these meal networks were potatoes, red meat, other vegetables, and bread. Two dinner networks were identified with five communities and other vegetables as a central food group. Partial correlations at meals were stronger than on the habitual level. The meal-specific dietary networks were only partly reflected in the habitual dietary network with a decreasing percentage: 64.3% for dinner, 50.0% for breakfast, 36.2% for lunch, and 33.3% for afternoon snack. The method of GGM yielded dietary networks that describe combinations of foods at the respective meals. Analysing food consumption on the habitual level did not exactly reflect meal level intake. Therefore, interpretation of habitual networks should be done carefully. Meal networks can help understand dietary habits, however, GGMs warrant validation in other populations.

Introduction
Diet-disease studies frequently evaluate dietary patterns using data reduction techniques (such as principal component analysis, PCA, or cluster analysis) based on habitual intake. From habitual intake, defined as the long-term average, we cannot infer which foods are eaten together. Therefore, our understanding of how dietary patterns arise from food intake is limited. The composition of meals is influenced by personal beliefs and preferences and by social, cultural, geographical, and economic factors, among others [1,2]. Such influences may affect meal intakes, which in turn may affect habitual dietary patterns. Therefore, considering population-specific meal differences, a healthy and an unhealthy dietary pattern might not be formed in the same way in different populations. Analysing food consumption and relationships between foods on the meal level can help to better understand how foods are consumed in relation to each other. This knowledge can be useful for shaping understandable meal-based dietary advice that is easily adopted by the public.
Exploratory methods can also be applied to meal-specific data. For instance, Woolhead et al. identified 12 meal types from PCA [3]. However, PCA-derived dietary patterns are difficult to interpret as the interrelation between foods is not fully elucidated [4]. Probabilistic Graphical methods such as networks derived through Gaussian Graphical Models (GGMs) offer an insight into the relation between the dietary components and can help understand how foods are consumed in relation to each other during meals. These methods construct conditional independence networks between highly correlated variables in a dataset [5]. GGMs are commonly used in research areas such as omics [6,7] and psychopathology [8,9]. In the field of nutritional epidemiology, these methods have been previously used to construct and visualize dietary networks in specific populations [10]. Semiparametric Gaussian Copula Graphical Models (SGCGMs), a nonparametric extension to GGMs, can be used to analyse skewed data, as is often the case with dietary data [11]. These methods applied to dietary data may help to identify conditional intakes of different foods at meal-level and how those foods appear in habitual dietary patterns.
In this study, we estimated and described meal and habitual dietary networks derived through SGCGMs in a study sample of German adults and compared the relations found in the meal networks with those present in the habitual network. This study will help to better understand the interrelation of foods consumed at meals and provide insight into the information lost or retained when similar analyses are performed using averaged (habitual) daily dietary data.

Sample size
Data collected between 2010 and 2012 from a validation sub-study within the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort were used for this study; 815 men and women participated in this sub-study. After exclusion of one participant due to dementia, a total of 814 participants were included in our analyses (S1 Fig). More details about this study design are available elsewhere [12]. The Ethics Committee of the Medical Association of the State of Brandenburg provided ethical approval. All participants gave their written informed consent.
recorded. The following 24hDRs were collected via telephone on randomly chosen days. All recalls were performed by trained interviewers. Food intake data were recorded in 11 eating occasions throughout one day (S1 Table).

Assessment of other variables
Body weight and height were measured in the study centre during the participants' first visit. Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in metres (kg/m²). Study participants wore a combined heart rate and uniaxial movement sensor (Actiheart, CamNtech, Cambridge, UK) continuously for one week. Physical activity was then calculated as the ratio of total energy expenditure to resting energy expenditure [14].
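The two derived variables described above can be written out as a minimal sketch. The function names and the example values are hypothetical and only illustrate the arithmetic; they are not taken from the study data.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kg divided by the square of height in metres."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def physical_activity_ratio(tee: float, ree: float) -> float:
    """Ratio of total (TEE) to resting (REE) energy expenditure, same units."""
    if tee <= 0 or ree <= 0:
        raise ValueError("energy expenditures must be positive")
    return tee / ree


# Hypothetical participant: 80 kg, 1.75 m, TEE 2400 kcal/d, REE 1600 kcal/d
example_bmi = body_mass_index(80.0, 1.75)
example_ratio = physical_activity_ratio(2400.0, 1600.0)
```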

Modelling food intake
Food intake was collapsed into 39 food groups previously used in other studies (S2 Table) [15,16]. For modelling meal-specific food intake, four eating occasions were chosen: breakfast, lunch, afternoon snack, and dinner, based on four observed peaks in food consumption (S2 Fig). Meal food intakes were analysed separately by meal type to identify foods that were consumed together. For modelling habitual food intake, we averaged all available 24hDRs per participant and all 11 eating occasions in the day were taken into account.
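One way to picture the averaging step for habitual intake is the sketch below. It assumes a hypothetical record layout (participant, recall day, food group, grams per eating occasion) and assumes that a recall day on which a food group was not reported counts as zero intake; neither detail is specified in the text, and the authors' own data handling may differ.

```python
from collections import defaultdict


def habitual_intake(records):
    """Average daily intake per participant and food group.

    records: iterable of (participant_id, recall_id, food_group, grams),
    one entry per reported item at any eating occasion (hypothetical layout).
    """
    daily = defaultdict(float)   # (participant, recall, food) -> grams that day
    recalls = defaultdict(set)   # participant -> recall days available
    for pid, recall, food, grams in records:
        daily[(pid, recall, food)] += grams   # sum over all eating occasions
        recalls[pid].add(recall)

    totals = defaultdict(float)  # (participant, food) -> grams over all days
    for (pid, _recall, food), grams in daily.items():
        totals[(pid, food)] += grams

    # Assumption: days without the food group contribute zero to the mean.
    return {key: total / len(recalls[key[0]]) for key, total in totals.items()}


# Toy data for one participant with three recall days (not study data)
toy_records = [
    ("p1", 1, "bread", 60.0), ("p1", 1, "bread", 40.0),
    ("p1", 2, "bread", 50.0),
    ("p1", 3, "coffee", 300.0),
]
toy_habitual = habitual_intake(toy_records)
```

In this toy case bread is averaged over three days (100 g, 50 g, 0 g), which shows how averaging blurs the information on which foods were eaten at the same occasion.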

Statistical methods and network analysis
GGMs describe conditional independence between variables, i.e., the relationship between two variables after the effect of all other variables has been accounted for. They can be used to produce probabilistic graphs in which nodes represent variables and edges represent a relationship between the variables. These graphs can be quantified using partial correlations, under the assumption of a multivariate normal distribution. In a high-dimensional multivariate data set, few or none of the estimated partial correlations may be exactly zero, which would produce very dense and less informative graphical representations of the networks. For this reason, regularization methods for covariance estimation are available. Regularization is achieved by choosing a penalty parameter (λ > 0), which reduces the variance of the estimates and helps avoid overfitting of the model (i.e., the false inclusion of edges) [11]. Various methods are available for choosing the penalty parameter λ [17].
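The link between a GGM and partial correlations can be illustrated with a short sketch. It uses the standard identity that the partial correlation between variables i and j equals -θ_ij / sqrt(θ_ii θ_jj), where θ is the inverse covariance (precision) matrix. The sketch is unregularized and uses NumPy with a made-up 3 × 3 correlation matrix; it is not the penalized estimation applied in the study.

```python
import numpy as np


def partial_correlations(covariance):
    """Partial correlation matrix derived from the precision matrix."""
    precision = np.linalg.inv(np.asarray(covariance, dtype=float))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Toy chain A - B - C: A and C are correlated (0.25) only through B.
toy_corr = np.array([
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
])
toy_pcor = partial_correlations(toy_corr)
# The A-C entry is zero up to rounding: no edge between A and C in the graph,
# although their marginal correlation is 0.25.
```

The example shows why such networks are sparser than plain correlation networks: a marginal correlation that is fully explained by a third variable does not produce an edge.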
In this study, because the data were highly skewed, the meal and habitual dietary networks were derived through SGCGMs, which are a nonparametric extension of GGMs. SGCGMs apply the nonparanormal SKEPTIC (Spearman/Kendall estimates preempt transformations to infer correlation) transformation, which enables semiparametric analyses suited to highly skewed data [18,19]. This transformation is based on nonparametric rank-based correlation estimators, Spearman's rho and Kendall's tau, and offers an alternative for estimating high-dimensional undirected graphical models without requiring the underlying data to be normally distributed [20].
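As a rough illustration of the rank-based idea, the sketch below computes the Spearman variant of the SKEPTIC estimator, which maps each Spearman correlation ρ to 2·sin(πρ/6). It is written in Python with NumPy for illustration only; the study itself used the R package "huge", and the helper names and toy data here are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)      # highest rank within each group of ties
    first = last - counts + 1     # lowest rank within each group of ties
    return ((first + last) / 2.0)[inverse]


def skeptic_spearman(data):
    """Rank-based correlation matrix: 2 * sin(pi / 6 * Spearman's rho).

    data: 2-D array, rows are observations, columns are variables.
    Columns without any variation would yield NaN and should be removed first.
    """
    data = np.asarray(data, dtype=float)
    ranks = np.column_stack([average_ranks(col) for col in data.T])
    rho = np.corrcoef(ranks, rowvar=False)
    estimate = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(estimate, 1.0)
    return estimate


# Toy skewed intakes with tied zeros (not study data)
x = np.array([0.0, 0.0, 1.0, 2.0, 5.0, 9.0])
toy = np.column_stack([x, np.exp(x), -x])
toy_estimate = skeptic_spearman(toy)
```

Because only ranks enter the calculation, any strictly increasing transformation of a variable (here the exponential) leaves the estimate unchanged, which is what makes the approach attractive for skewed intake data.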
For the analyses presented here, skeptic-transformed inverse covariance matrices were estimated using the "huge" R package [11,19]. The optimal penalty parameter λ was selected with a tenfold cross-validated graphical lasso (glasso), which was run in R with the package "nethet" [21]. Communities, sets of closely related links, were detected within all identified networks to facilitate interpretation using the R package "linkcomm", which is able to detect nested and overlapping communities in networks [22]. For food groups belonging to more than one community, centrality was assessed as a measure of the importance of a node based on the number of communities it belongs to [23]. The identified networks and corresponding communities were exported for formatting to CorelDRAW Graphics Suite X3 (Corel GmbH, Munich; www.corel.de). Food groups were considered to form a network when three or more groups were related to each other. Partial correlations with an absolute value of 0.30 or greater were considered strong. The proportion of (direction-specific) relations (i.e., edges) from the meal-specific networks that were also present in the habitual network was used as a measure of the degree to which each meal network is reflected in the habitual network. All statistical analyses were performed in SAS (Version 9.4, Enterprise Guide 6.1, SAS Institute Inc., Cary, NC, USA) or R (Version 3.1.3, R Foundation for Statistical Computing, Vienna, Austria).
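The strength threshold and the direction-specific overlap measure can be sketched as follows. This is one plausible reading of the description above, in which an edge counts as reflected when the same pair of food groups is linked with the same sign in the habitual network. The function names and the partial correlations are hypothetical toy values, not study results.

```python
STRONG_THRESHOLD = 0.30  # absolute partial correlation regarded as strong


def is_strong(partial_correlation):
    """True if the absolute partial correlation is 0.30 or greater."""
    return abs(partial_correlation) >= STRONG_THRESHOLD


def edge_signs(partial_correlations):
    """Map each unordered food-group pair with a non-zero value to +1 or -1."""
    return {
        frozenset(pair): (1 if value > 0 else -1)
        for pair, value in partial_correlations.items()
        if value != 0
    }


def share_reflected(meal, habitual):
    """Proportion of meal edges present with the same sign in the habitual network."""
    meal_edges = edge_signs(meal)
    habitual_edges = edge_signs(habitual)
    if not meal_edges:
        raise ValueError("the meal network has no edges")
    shared = sum(
        1 for pair, sign in meal_edges.items() if habitual_edges.get(pair) == sign
    )
    return shared / len(meal_edges)


# Toy networks (made-up partial correlations)
toy_meal = {
    ("bread", "butter"): 0.40,
    ("butter", "margarine"): -0.35,
    ("soups", "potatoes"): 0.12,
    ("coffee", "water"): -0.31,
}
toy_habitual = {
    ("butter", "bread"): 0.21,
    ("margarine", "butter"): -0.18,
    ("potatoes", "soups"): -0.09,  # same pair, opposite sign: not counted
}
toy_share = share_reflected(toy_meal, toy_habitual)  # 2 of 4 edges
```

Storing each pair as a frozenset makes the comparison independent of the order in which the two food groups are listed.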

Statement of previously published data
Previous publications have presented GGM dietary networks established from food frequency questionnaire (FFQ) data of the EPIC-Potsdam cohort collected at baseline between 1994 and 1998 [8,20]. In our analysis, we used multiple 24hDRs from a subgroup collected in 2010–2012. Furthermore, the previous publications did not assess communities or centrality of food groups. Another publication used Spearman correlations to understand PCA patterns, as such patterns are also based on correlations [4], whereas this analysis is based on the GGM approach, i.e., it uses partial correlations to identify networks. These networks visualize combinations of foods consumed at the meal level, reflecting the intake patterns.

Results
Baseline characteristics of all 814 participants are shown in Table 1. Participants were on average 65.5 years old, had a mean BMI of 27.5 kg/m², and the majority were sedentary. A total of n = 2,411 breakfast observations (mean time 08:02), n = 2,236 lunch observations (mean time 12:37), n = 2,119 afternoon snack observations (mean time 15:31), and n = 2,346 dinner observations (mean time 18:45) were available. Mean intakes of the food groups per meal type and mean habitual intakes are shown in Table 2.
Table 1 notes: Values are means ± SDs unless otherwise indicated. ¹ n = 685. BMI, body mass index; TEE, total energy expenditure; REE, resting energy expenditure.

Breakfast networks
The SGCGM analysis identified one major breakfast network (Fig 1) in which foods are grouped into five communities. Starting in the lower left, a community is made up of fresh fruits, nuts, legumes, and other cereals, linked by positive correlations, among which the strongest is between nuts and other cereals (partial correlation = 0.47). Next, two partly overlapping communities can be observed: one composed of bread consumed together either with margarine (partial correlation = 0.25) or with butter and sugar & confectionery (partial correlations = 0.30 and 0.34, respectively), and the other composed of bread consumed with processed meat and cheese (partial correlations = 0.38 and 0.34, respectively). Processed meat in turn is consumed with margarine (partial correlation = 0.20) but not with sugar & confectionery (partial correlation = -0.17). The fourth and fifth communities also overlap and describe the dependency structure of the intake of sauces, fish, fruiting & root vegetables, other vegetables, and poultry. Central food groups were, in decreasing order of importance: other vegetables, sauces, bread, margarine, and sugar & confectionery. Not all food groups that are part of this network were represented in the communities; tea and coffee, for instance, are strongly negatively correlated with each other (partial correlation = -0.64) but were not part of a community, suggesting that these food groups are less closely linked to other food groups in the network (Fig 1).

Lunch networks
Our analysis identified one major lunch network characterized by six communities (Fig 2). Overall, with a more complex structure, this network reflects a variable consumption of foods. The community on the left describes the dependency structure between other cereals, condiments, legumes, and soups. In the centre of the network, there is a community composed of other cereals, other vegetables, vegetable oils, margarine, and red meat, with a strong positive correlation between red meat and other vegetables (partial correlation = 0.33) and a negative correlation between margarine and vegetable oils (partial correlation = -0.22). A partially overlapping community describes the dependency structure between other vegetables, vegetable oils, bread, and potatoes. Next, on the right side of the network, a community was detected in which bread correlates strongly positively with cheese (partial correlation = 0.30) and negatively with potatoes and pasta & rice (partial correlations = -0.32 and -0.16, respectively) (Fig 2).

Afternoon snack networks
One afternoon snack network with six communities was identified (Fig 3). Similar to the lunch network, this network reflects a variable food intake, though it revealed stronger partial correlations among intakes. At the bottom of the network, a community was identified in which coffee, cakes & cookies, and milk & dairy correlate strongly positively with each other (partial correlations = 0.46, 0.30, 0.45, respectively) and water correlates negatively with coffee and cakes & cookies (partial correlations = -0.32, -0.25, respectively). This community is linked with the two following communities through a negative correlation between cakes & cookies and bread. Bread, on one side, belongs to a community in which bread is consumed with margarine, processed meat, and cheese and in which fruiting & root vegetables are consumed with processed meat, margarine, and cheese. On the other side, bread also belongs to a community in which it is consumed with butter (partial correlation = 0.56) and butter is consumed with cabbages and with fruiting & root vegetables. The largest community within this network involved potatoes, vegetable oils, other vegetables, fruiting & root vegetables, red meat, cabbages, and soups. Central food groups were, in decreasing order of importance: fruiting & root vegetables (part of five different communities), other vegetables, processed meat, cabbages, cheese, bread, and potatoes. Only tea, fish, leafy vegetables, other cereals, and poultry were part of the network but did not belong to any community (Fig 3).

Dinner networks
The SGCGM analysis identified one major dinner network and a smaller network (Fig 4). The major network shows a complex meal composition with four communities and one central food group, other vegetables, belonging to three communities. On the top right, a community shows that bread is consumed with processed meat and with either margarine or butter. Bread correlated strongly positively with processed meat and margarine (partial correlations = 0.41 and 0.37, respectively), and butter and margarine correlated strongly negatively (partial correlation = -0.37). Another community shows the concomitant consumption of potatoes with cabbages, red meat, and other vegetables. On the upper right, an independent community was found in a smaller dinner network. This final community is composed of beer, tea, and water, which all correlate negatively with each other, indicating that only one of these beverages is chosen at this meal. Sugar & confectionery was also part of this network, correlating positively with tea, but it was not present in any community.
Other food groups such as cheese, soups, pasta & rice, leafy vegetables, and fruiting & root vegetables were also part of the larger dinner network but did not form part of a community, suggesting that these food groups are less closely linked to other food groups in the network (Fig 4).

Habitual diet network
One habitual network was identified by SGCGMs (Fig 5). This network is formed by a complex structure of interrelated food groups, where beer, red meat, fresh fruits, bread, butter, fruiting & root vegetables, potatoes, sauces, and processed meat play central roles, with decreasing importance. Overall, the ten communities identified within this network show: i) positive correlations between legumes, other cereals, and soups; ii) positive correlations between nuts, fruiting & root vegetables, and fresh fruits; iii) positive correlations between fish, fruiting & root vegetables, and vegetable oils; iv) positive correlations between sauces and pasta & rice and with potatoes but a negative correlation between potatoes and pasta & rice; v) positive correlations between cabbages, potatoes, red meat, and sauces; vi) a positive correlation between fresh fruits and milk & dairy as well as between red meat and beer, while fresh fruits and milk & dairy correlated negatively with red meat and beer; vii) positive correlations of beer with bread, processed meat, and butter; viii) negative correlations between beer, water, and tea; ix) positive correlations between bread, butter, and sugar & confectionery; and x) positive correlations between bread, margarine, and processed meat, and a negative correlation between margarine and butter. Out of the 39 food groups, 33 of them were part of this complex network and 22 of them formed part of at least one community. Soft drinks and wine formed part of this network but did not show in any of the meal networks.

Comparison of meal and habitual dietary networks
In general, partial correlations were stronger in the meal-specific dietary networks than in the habitual dietary network, especially in the case of the afternoon snacks. Some food groups that had central roles in meal networks were also central food groups in the habitual network, such as bread and potatoes. Four of the ten communities in the habitual network resembled communities found in the meals: the community formed by beer, water, and tea was also found in dinner; the community formed by bread, processed meat, margarine, and butter was similar to one seen in dinner; the community formed by soups, legumes, and other cereals was similar to one observed in lunch; and the community formed by red meat, cabbages, potatoes, and sauces was part of a larger community found in lunch. A few relationships between food groups that showed strong partial correlations only in a specific meal persisted in the habitual network, such as the relationship between milk & dairy and breakfast cereals seen in the breakfast network. In general, correlations between food groups were in the same direction (positive or negative) in meal and habitual networks, with the exception of the correlation between soups and potatoes, which was positive in the afternoon snack and dinner networks and negative in the habitual network. By estimating the percentage of connections between foods in the meal-specific networks that were also present in the habitual dietary network, we found that the dinner network was best reflected in the habitual network. Specifically, we found that 50.0% of the breakfast, 36.2% of the lunch, 33.3% of the afternoon snack, and 64.3% of the dinner network relations between food groups were present in the habitual network.

Discussion
This study identified meal dietary networks through SGCGMs, an extension of GGMs suited for non-normally distributed data [20]. Communities and centrality of food groups were detected to assist interpretation. GGMs had not yet been used for meal-specific analyses. The meal-specific networks showed clear differences in composition and strength of correlations.
The combination of bread, cheese, processed meat, and margarine or butter was present in most networks (all meals except for lunch). Afternoon snack, which is a smaller but culturally important eating occasion in Germany (equivalent to the British tea time), showed the strongest correlations, where the communities including bread correlated negatively with coffee, milk & dairy, and cakes & cookies, suggesting that either one or the other combination is consumed during this meal. The networks for lunch and afternoon snack showed a complex structure, indicating a more variable food intake. Potatoes, red meat, other vegetables, and bread were often central food groups in the dietary networks, but differences for each meal were evident. Out of the four main meals, dinner networks were best reflected in the habitual dietary network and the afternoon snack network was the least reflected despite the strong partial correlations. Despite a variable food intake at dinner, this network was among the least dense. The variable and substantial intake in this meal may have contributed to the better representation in the habitual dietary network. This analysis revealed different strengths of correlation and combinations of food intake at meal and habitual intake levels. As food is consumed at meal level, habitual-level intake may not give a clear picture of food intake patterns in a population. The interrelation between intakes of different food groups is complex and when analysed using meal-aggregated data such as FFQs, weaker correlation structures are observed, which may arise due to an increased intra-subject variability [24]. This phenomenon is usually seen when building exploratory dietary patterns, such as PCA patterns. In our study, habitual diet takes into account some day-to-day variation within study participants, while the meal intakes were analysed independently of participant to accurately reflect foods eaten together.
This resulted in stronger partial correlations in the meal networks. However, some characteristics of the habitual diet that were not traceable to the meal networks might come from other factors, such as individual characteristics and preferences and eating occasions not considered in this analysis such as smaller snacks [25,26]. Further investigations are required to understand why some of the relationships among food groups at the habitual level did not appear at the meal level.
GGMs were previously applied to dietary data from the EPIC-Potsdam cohort [10], a population of which our participants are a sub-cohort. In that study, data from FFQs were used. Therefore, although the results highlight an overall-diet structure, they do not represent meal-level intake-specific relationships. Despite numerous methodological differences in the dietary intake assessment and pattern analysis (food networks) between that study and ours, we could observe certain similarities between their dietary networks and our habitual dietary network; for example, potatoes and red meat, which often played central roles in our meal networks, were simultaneously linked to multiple food groups in the principal networks. Nevertheless, in order to understand dietary habits, which in turn are the drivers of dietary patterns, food consumption should be analysed in a timing- or meal-specific manner [27].
Established meal-specific habits are known to be present in the human diet. For instance, a few studies have observed a more homogeneous and simple composition of early meals and a more complex and varied composition of later meals [28,29]. Meal setting is an important factor affecting meal composition. For example, breakfast is more likely to be consumed at home and dinner outside of home [30]. Although we did not explore meal setting in our study, we did see a more simple structure of the breakfast network and a dinner network that was closest to the habitual network. In line with these observations, our recent study comparing meal and day level food intake to habitual diet using the same study sample revealed a consistent composition of the breakfast meal and a more variable intake at dinner, which was the meal that contributed the most to the formation of PCA-habitual dietary patterns [4]. Such differences across meals and similarities within them are not visible in day-aggregated data such as data commonly used to derive dietary patterns.
Commonly, the method of choice for deriving dietary patterns is PCA. This method was previously applied to this study sample to derive breakfast patterns [31]. As PCA is also based on correlations, two PCA patterns, i.e., the processed food pattern and the dairy & cereal pattern, shared considerable similarity with our breakfast network. However, GGMs identify sparse networks reflecting patterns of intake and visualize the identified combinations of intakes in relation to each other. Smaller sub-networks (communities) are easier to interpret than PCA patterns, which comprise all the food variables [22], and GGM networks (specifically residualized, or conditional independence, networks) can show conditional independence between food groups [9], which PCA or simple correlation analyses cannot.
Furthermore, consistent intakes are underrepresented in PCA patterns [4], but this was not the case in GGM dietary networks. Therefore, GGMs are a valuable tool for the analysis and interpretation of the complex data structure of food intakes often seen in the field of nutritional epidemiology. To our knowledge, these tools had not yet been applied to dietary networks in the context of meals. Other exploratory methods have been used to explore meal patterns. For example, Hearty et al. [32] used artificial neural networks and decision trees with the purpose of predicting dietary quality in terms of the Healthy Eating Index. These are complicated but interesting applications of predictive machine learning models that may provide a better understanding of how hypothesis-based dietary patterns (i.e., dietary quality indices) [33] arise in a population, but they are not directly comparable to the data-driven GGM dietary networks presented here, which are not intended to identify meal patterns that predict or meet dietary guidelines but rather describe the intake of the studied population in detail. Nevertheless, GGM dietary networks can also be combined with methods to predict disease or adherence to dietary guidelines, similar to what is done with PCA patterns [33]. Such a procedure is exemplified in the recent publication by Iqbal et al. [34] using habitual (non-meal-specific) GGM dietary networks. Overall, hypothesis-based or data-driven methods should be considered based on the research question of interest [1].
Other studies estimated habitual diet with more sophisticated statistical methods such as the National Cancer Institute (NCI) method [35–37]. This method adjusts for day-to-day variation by accounting for food intake in the analysed population. In the present study, we could not apply the NCI method because a very high proportion of zeros at meals resulted in convergence problems of the statistical algorithms. Furthermore, our methods remain consistent with our previous work on meal, day, and habitual intake analyses [4]. Working with non-normal data implies some limitations; in order to circumvent the Gaussian assumption, SGCGMs perform a rank-based transformation of the original variables. Estimates, power, and Type I error can depend on the transformation method, sample size, and degree of non-normality [38,39]. Nevertheless, our sample size was large, with at least 2,119 observations per meal; also, for the descriptive purpose of this study, potential alterations in power and Type I error play a less important role. Still, this should be kept in mind for studies intending to find diet-disease associations. The stability of the resulting networks also depends on the approach used, which can be threshold- or model-based. Threshold-based networks, also called relevance networks, remove correlations weaker than a pre-determined correlation strength [40]. However, this threshold is typically arbitrary and may result in the inclusion of false edges or the exclusion of true edges [41]. In this study, we preferred a model-based approach (lasso using cross-validation), which seeks to identify a sparse model (identifying only important variables) by maximizing the log-likelihood of the data [42,43].
In conclusion, SGCGMs identified meal-specific dietary networks describing combinations of foods that are eaten together. Clear differences were seen across meals. The habitual dietary network retained some but not all information from the meal-specific dietary networks and additionally showed relations not present at the meal level. As a result, interpretation of such habitual networks needs to be done carefully. Analysing food intake using both meal-based and habitual intake data can provide a broader picture about eating behaviour than using one approach only. GGMs and SGCGMs can be used as tools to obtain meal-specific insights of diet, which may be used as a foundation for meal-based recommendations. Nevertheless, these are methods that have not been applied often in the field of nutritional epidemiology and warrant further applications in other populations due to their specific features.
Supporting information
S1 Table. List of all eating occasions with participant-identified labels used to record food intake in the 24-hour dietary recalls.