
LeFood-set: Baseline performance of predicting level of leftovers food dataset in a hospital using MT learning

  • Yuita Arum Sari ,

    Contributed equally to this work with: Yuita Arum Sari, Atsushi Nakazawa

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    * yuita@ub.ac.id; sari0y0mif@s.okayama-u.ac.jp

    Affiliations Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Kita-Ku, Okayama, Japan, Faculty of Computer Science, Brawijaya University, Malang, Indonesia

  • Atsushi Nakazawa ,

    Contributed equally to this work with: Yuita Arum Sari, Atsushi Nakazawa

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliation Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Kita-Ku, Okayama, Japan

  • Yudi Arimba Wani

    Roles Conceptualization, Data curation, Methodology, Validation

    Affiliation Nutrition Department, Faculty of Health Sciences, Brawijaya University, Malang, Indonesia

Abstract

Monitoring the food remaining on patients’ trays is a routine activity in healthcare facilities, as it provides valuable insights into patients’ dietary intake. However, estimating food leftovers through visual observation is time-consuming and biased. To tackle this issue, we have devised an efficient deep learning-based approach to estimating food leftovers. Our first step was creating the LeFoodSet dataset, a pioneering large-scale open dataset explicitly designed for estimating food leftovers, supporting the estimation of both leftover rates and food types. To the best of our knowledge, this is the first comprehensive dataset for this type of analysis. The dataset comprises 524 image pairs representing 34 Indonesian food categories, each with images captured before and after consumption. Our prediction models employ combined visual feature extraction and a late fusion approach utilizing soft parameter sharing. We used multi-task (MT) models that simultaneously predict leftovers and food types during training. In the experiments, we tested the single-task (ST) model, the ST model with Ground Truth (ST-GT), the MT model, and the MT model with Inter-task Connection (MT-IC). Our AI-based models, particularly the MT and MT-IC models, showed promising results, outperforming human observation in predicting leftover food. The best results were obtained with the ResNet101 model, where the mean absolute error (MAE) of the leftover task and the food classification accuracy are 0.0801 and 90.44% in the MT Model and 0.0817 and 92.56% in the MT-IC Model, respectively. These results suggest that the proposed solution is promising for AI-based approaches in medical and nursing applications.

Introduction

Observation of patients’ leftover food is a daily task in hospitals, since it allows patients’ nutritional intake to be evaluated. By analyzing patients’ leftovers, healthcare professionals can obtain valuable insights into a patient’s health status [1], including malnutrition [2] and depression [3]. Excessive leftovers are not only a barrier to cost-saving measures but also negatively impact patients’ recovery rates. Thus, it is essential to improve the quality and taste of meals in nutrition facilities to reduce food waste and improve patient outcomes [4]. In a hospital context, analyzing meal leftovers gives significant information regarding numerous areas of the hospital’s food service [5]. A food waste survey conducted at a regional public hospital in Sidoarjo, Indonesia, found a high residual rate of patient food, above the 20% standard. Limited meal variation, differences between inpatient classes, a lack of diversity in menu items, and patients not touching the same food due to surgery all contribute to this high residue. As a result, there is a massive loss of food that patients did not consume [6]. Hospitals also assess patient satisfaction with the meals delivered by studying food leftovers [7]. Thus, recording and evaluating leftovers in hospital and long-term care settings is an important task in terms of quality of care and the environment. It is also important for the general consumer market.

Human visual estimation, known as subjective evaluation, simplifies the weighing process. Dietitians or nutritionists in hospital settings rely on qualified assessors to weigh food or perform visual examinations such as Comstock level analysis [8, 9]. Comstock level prediction is widely used in food consumption surveys to assess leftover food and is the most popular approach in hospitals [10]. However, it not only requires a meticulous, skilled, and trained estimator, but is also prone to frequent overestimation or underestimation, resulting in uncertainty about the actual food intake of patients. Against this background, several researchers have leveraged artificial intelligence (AI) in an attempt to minimize the estimation errors of visual observation [11–13]. Such methods could serve as decision support tools for dietitians and nutritionists estimating the food leftovers of hospitalized patients.

In this paper, we propose a method to solve this problem with deep learning-based image recognition. Specifically, we have developed a method that estimates the amount of food eaten with higher accuracy than human observation, simply from images taken before and after the meal. Existing studies used simple image processing techniques, such as counting areas using image segmentation; however, these methods had large measurement errors [14]. In recent years, deep learning has been used in research on areas such as daily food consumption, improving bodily health, and reducing the occurrence of diseases. These universal methods can be applied to various domains without explicit feature extraction, allowing for extensive application in food computing, especially through image recognition for nutrition analysis [15]. [11] used a convolutional neural network (CNN) to estimate the amount of leftover liquid meals in hospitals. In deep learning projects, CNN models such as VGG and ResNet are often used as baselines. The VGG architecture primarily comprises convolutional layers with 3x3 filters along with max-pooling layers, illustrating that increasing depth can lead to better overall performance [16]. The ResNet architecture addresses the vanishing gradient issue in deep neural networks through residual (skip) connections; these pathways allow information to traverse layers directly, enhancing the network’s learning capabilities. By preserving or boosting performance, ResNets can reach a greater depth than VGG networks [17]. VGG networks are commonly used as a standard for comparison due to their high performance in image classification tasks [18], while ResNet is frequently used as a benchmark in experiments designed to test the scalability of new topologies or their effectiveness in deeper networks [17]. However, CNN models have not yet been applied to a limited dataset with imbalanced data in each food category to predict leftover food objectively. Therefore, in this study, a multi-task model was implemented to estimate leftover food. Against this background, this study has the following highlights:

  1. The first large-scale open dataset for leftover food prediction was presented. The dataset contains 524 pairs of images from 34 food categories showing the food portions before and after consumption. In addition, there is data on the observed amount of leftover food and the results of weighing using digital scales both before and after consumption.
  2. A novel method for predicting leftovers and food categories using deep learning, which outperforms human observation.
  3. In an experiment, we implemented and compared several network architectures. As a result, the methods using a multi-task (MT) learning architecture performed better than those without MT learning.
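As background for the ResNet baselines discussed above, the residual shortcut can be sketched in a few lines of NumPy. This is a conceptual illustration only: the real ResNet blocks use convolutions, batch normalization, and trained weights, whereas here the transformations are toy random linear maps.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Conceptual residual block: the input x bypasses the two
    transformations via an identity shortcut, so information (and
    gradients) can flow directly through the addition."""
    out = relu(x @ w1)    # first transformation
    out = out @ w2        # second transformation
    return relu(out + x)  # identity shortcut: output = F(x) + x

rng = np.random.default_rng(0)
x = rng.standard_normal(8)              # toy 8-dimensional feature vector
w1 = rng.standard_normal((8, 8)) * 0.1  # placeholder weights
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)
# Even if w1 and w2 were zero, the output would equal relu(x): the
# block can fall back to (near-)identity, which eases optimization
# in very deep networks.
```

This fallback-to-identity property is why ResNets can be made much deeper than plain VGG-style stacks without suffering from vanishing gradients.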

The remainder of this paper is organised as follows. Sect 1 outlines the related research on this topic. In Sect 2, a detailed description of the dataset is presented. Sect 3 explains the methodology of this research. In Sect 4, the experimental results used as a baseline are presented. A discussion is provided in Sect 5, and limitations are elaborated in Sect 6. Finally, the article is concluded in Sect 7.

1. Related work

Relevant literature on image-based food analysis and leftover prediction is reviewed in this section. Aizawa et al. [19] proposed an approach that uses images to obtain individual dietary patterns to improve food balance estimation. Their food recording system included food image extraction [20], food balance analysis [21], and summarization and visualization [22]. Food photos act as input to identify categories and weights in the nutritional evaluation system developed in [23]. Shape templates and area-based weights are used to increase the accuracy of the system [21, 24, 25]. Several methods compute food regions pixel-wise or use segmentation-based algorithms [26]. [26–28] attempted to estimate food waste from images of plates before and after a meal, taking into account background variations and the need for background removal. The algorithm identified utensils as part of the plate during and after a meal, resulting in an estimation inaccuracy of 45.8%. Hence, due to the inherent limitations in accuracy and adaptability of pixel-wise leftover calculation, an alternative strategy is required.

Recently, deep learning has also been applied in this area; for example, [11] compared the accuracy of estimating liquid food residue using computer vision. The study focuses only on liquid foods and recommends that future research determine whether AI-based measurement techniques can be readily used by medical professionals in clinical settings, as the practical usefulness of the proposed AI-based measurement method is currently unknown. The AI estimation approach had lower errors than the visual estimation approach for fermented milk, peach juice, and the total, although there was no significant difference in errors for thin rice gruel. Errors in AI estimation were biased towards leftovers. Overall, AI estimation had smaller mean squared errors and higher coefficient-of-determination values for fermented milk and peach juice compared to visual estimation, while the total R² values were equal in accuracy for both AI and visual estimation.

Another study used a deep learning object identification network to identify food categories and calculate meal consumption levels in pre- and post-meal photographs. The method uses Mask R-CNN to identify food categories and a homography transformation to correct post-meal images based on the meal plate regions. Meal intakes are estimated by comparing pre-meal and post-meal images of measured food volumes. However, this method is limited by the need to define the 3D shape of the food in the container, whether a spherical cap, cone, or cuboid [29].

Artificial intelligence has also been utilized to measure food intake [30]. A smartphone-based system named FoodIntech uses AI to recognize and calculate food leftovers without human intervention. The system uses images of a QR code on the meal tray to detect and recognize food items. Testing showed the method to be reliable, with excellent results for 39% of dishes and good results for 19%. Implementing this method can improve dish recognition, and it can be expanded with more photos to include new dishes and foods. This study employs deep learning techniques for the segmentation of food areas. However, the calculation of food waste is based on pixel-wise resolution, which presents a limitation: the accuracy of the pixel-wise calculation heavily relies on the quality of the segmentation of food areas.

The hospital can utilize various datasets based on its specific requirements and different methodological approaches to address this issue. The primary focus of this research is on analyzing remaining food from 2D images, enabling the predictive learning model to accurately predict leftovers and recognize different types of food simultaneously. We encountered challenges when searching for a comparable dataset that includes information on food consumption before and after specific weight measurements and leftovers, particularly within a hospital environment. A paper closely related to our research has been previously published by [11]. However, the dataset in the study mainly focuses on liquid foods, such as 432 servings of rice, 72 servings of fermented milk, and 72 servings of peach juice. Unfortunately, this dataset is not available for public use, preventing us from utilizing it for future research and development.

2 Dataset

This chapter describes the details of the dataset we have constructed. The dataset can be accessed at the following link: https://data.mendeley.com/datasets/cchsk79jkt/1.

2.1 Data acquisition

Images were taken at a hospital in Malang, Indonesia, using a Nikon D3200 DSLR camera. The focus of the study was a single food item with solid and liquid components. Solid and liquid foods were served on plates and in bowls, respectively. White dishes were used throughout. To photograph the food, we used a camera, tripod, external flash, table, wooden placemat, reflector, and sheet of paper as illustrated in Fig 1. The food was presented on a 78×78cm wooden placemat with a black and white checkerboard pattern, with each grid measuring 2×2cm. The placemat was placed on a table measuring 80 cm in length, 80 cm in width, and 75.5 cm in height to ensure optimal and consistent plate or bowl placement during data collection. The camera was set as follows: F = 4.5, ISO 100, exposure = -2, manual flash mode with REPEAT 1/128, wide camera range, AF-2 focus mode, and positioned at a 45-degree angle to the diagonal meeting point of a placemat. The camera was then mounted on a tripod at maximum height and positioned 53.5 cm from the diagonal meeting point of a placemat. We used a white cloth reflector stretched on three sides of the placemat opposite the camera to improve the image quality. The data consisted of images of meals taken before and after consumption and the difference in weight obtained by physical measurements. The weight was measured by weighing the plate containing food using a digital scale.

2.2 Dataset annotation

The dataset comprises 524 image pairs before and after consumption. For each image pair, the following information is annotated as shown in Table 1.

Table 1. Details of the dataset showing the ID, food name, and the number of example pairs in total and at each food leftover level given by a human observer, where the leftover level ranges from 1 (not consumed at all) to 7 (zero remaining).

https://doi.org/10.1371/journal.pone.0320426.t001

  1. id: the ID of the food category.
  2. name: the name of the food.
  3. weight_before: the weight of the meal before eating (in grams).
  4. weight_after: the weight of the meal after eating (in grams).
  5. visual observation of leftover: the leftover level given by a trained observer based on visual estimation, ranging from 1 (not consumed at all) to 7 (zero remaining). Samples for each level are presented in Fig 2.
Fig 2. Example of pictures taken before and after eating based on seven different levels of leftovers.

https://doi.org/10.1371/journal.pone.0320426.g002
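The two weight fields in the annotation list above determine the consumption outcome directly. As a minimal sketch (the field names follow the annotation list; the plate or bowl tare weight is assumed to have already been subtracted, and the clamping bound is an assumption to absorb measurement noise):

```python
def leftover_fraction(weight_before: float, weight_after: float) -> float:
    """Fraction of the served food left uneaten, in [0.0, 1.0].

    weight_before and weight_after are the food weights in grams,
    with the container weight assumed already removed.
    """
    if weight_before <= 0:
        raise ValueError("weight_before must be positive")
    frac = weight_after / weight_before
    return min(max(frac, 0.0), 1.0)  # clamp measurement noise into [0, 1]

# A tray weighing 250 g before and 50 g after eating:
print(leftover_fraction(250.0, 50.0))  # 0.2 -> 20% of the meal left over
```

Normalizing by the before-meal weight makes trays with different portion sizes directly comparable, which is what allows a single regression target across all 34 food categories.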

2.3 Dataset analysis

Fig 3 and Table 2 show the correlation between the leftover level (human observation) and the weight eaten (normalized from 0 to 1). As seen, a strong correlation was found between the two; the Pearson correlation r between them is 0.95. It is also found that the standard deviations of the weights at levels 1 and 7, as described in Fig 2, are relatively smaller than those at levels 2 to 6. These statistics suggest greater variability in the observers’ judgments at the intermediate levels.

Fig 3. The relation between leftover levels from human visual observation (x-axis) and weight (y-axis), where Pearson’s correlation r = 0.95.

https://doi.org/10.1371/journal.pone.0320426.g003

Table 2. The relation between leftover levels and weight.

https://doi.org/10.1371/journal.pone.0320426.t002
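The reported correlation can be computed directly from the annotations. A minimal sketch, with synthetic stand-in values in place of the actual annotation files:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic stand-ins: observed leftover level (1-7) and the
# corresponding normalized weight eaten (0-1) for six trays.
levels = [1, 2, 3, 4, 5, 6]
eaten  = [0.05, 0.20, 0.35, 0.55, 0.70, 0.90]
print(round(pearson_r(levels, eaten), 3))  # ~0.999 for this synthetic sample
```

Applied to the full set of 524 annotated pairs, this computation yields the r = 0.95 reported above.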

3. Method

3.1 Network architectures

We designed four end-to-end convolutional neural networks to infer the amount of food consumed from two images (before and after eating). Fig 4 shows the network structures. In all architectures, image features are extracted by feature extraction networks, concatenated, and transferred to fully connected (FC) network(s).

Fig 4. Network architectures.

(a) Single task (ST) model, (b) Multi-task (MT) model, (c) Single task model with Ground Truth food category (ST-GT), and (d) Multi-task model with inter-task connection (MT-IC). The feature extraction layers are pre-trained CNNs, including VGG11, VGG13, VGG16, VGG19, ResNet50, ResNet101, or ResNet152. FC1, FC2, FC3, FC4, and FC1+ are fully connected layers whose input and output dimensions are (4096, 1024), (1025, 512), (512, 1), (512, 50), and (4096+34, 1024), respectively. ylo and ycat are the normalized leftover level (0.0 - 1.0) and the food category, and labelcat in model (c) is the label of the ground-truth food category.

https://doi.org/10.1371/journal.pone.0320426.g004

The single-task (ST) model is a naive regression model for leftovers, and the multi-task (MT) model outputs the amount of leftovers and the food category simultaneously. The single-task model with ground truth (ST-GT) is intended for the case where the food category is given; namely, the ground-truth food category is concatenated with the input of the first fully connected layer. The multi-task model with inter-task connection (MT-IC) feeds the inference result of the food category back into the network so that it is explicitly used for leftover estimation. A model with the ground-truth food classification was also incorporated, designated the MT model with Food Category Ground Truth Feedback (MT-GT). This model is employed for the general evaluation of performance in comparison to the MT-IC model. Since food classification in the MT-GT model is determined by the provided label, it is expected to result in a lower error rate than the MT-IC model.
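The late-fusion multi-task structure can be sketched framework-agnostically as a NumPy forward pass. All weights here are random placeholders and the 64-dimensional "images" are toy stand-ins, not the pre-trained backbones or the trained models; only the shapes and wiring (two per-stream extractors, concatenation, a shared trunk, two task heads) follow the MT architecture of Fig 4.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, HID, N_CAT = 2048, 512, 34  # backbone feature size, hidden size, food classes

def extract_features(image_vec, w):
    """Stand-in for a pre-trained CNN backbone (soft parameter sharing:
    each image stream has its own extractor weights)."""
    return np.maximum(image_vec @ w, 0.0)

w_before = rng.standard_normal((64, FEAT)) * 0.01
w_after  = rng.standard_normal((64, FEAT)) * 0.01
img_before = rng.standard_normal(64)  # placeholder for the before-meal image
img_after  = rng.standard_normal(64)  # placeholder for the after-meal image

# Late fusion: concatenate the two feature vectors -> (4096,)
fused = np.concatenate([extract_features(img_before, w_before),
                        extract_features(img_after, w_after)])

w_shared = rng.standard_normal((2 * FEAT, HID)) * 0.01
hidden = np.maximum(fused @ w_shared, 0.0)  # shared fully connected trunk

# Two task heads on the shared representation (the MT model):
w_lo  = rng.standard_normal((HID, 1)) * 0.01
w_cat = rng.standard_normal((HID, N_CAT)) * 0.01
y_lo  = 1.0 / (1.0 + np.exp(-(hidden @ w_lo)))  # leftover level in (0, 1)
y_cat = hidden @ w_cat                          # food-category logits

print(y_lo.shape, y_cat.shape)  # (1,) (34,)
```

The MT-IC variant would additionally concatenate y_cat back into the input of the leftover head, which is the inter-task connection described above.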

3.2 Loss function and training

Regarding the leftover value, the networks are trained to output the leftover value normalized from 0.0 to 1.0. Assuming the output of the network is ylo and the ground-truth normalized value is labello, the loss function Llo is given as the squared error in Equation 1,

Llo = (ylo − labello)^2 (1)

Regarding the inference of the food category, the loss function Lcat is given as the binary cross entropy, as stated in Equation 2,

Lcat = −[labelcat log ycat + (1 − labelcat) log(1 − ycat)] (2)

where ycat and labelcat are the inference result and the label of the food category.

In training the MT models (MT and MT-IC), both losses are linearly combined with a weight λ, as in Equation 3:

L = λLlo + (1 − λ)Lcat (3)

In the implementation, we set λ to 0.9. The models are optimized using the Adam optimizer with the learning rate set to 0.0001. Early stopping was used to stop training if performance did not improve for 20 consecutive epochs. The primary goal of early stopping was to monitor the model’s performance on a separate validation dataset distinct from the training dataset. The data augmentation used included random horizontal flip, random vertical flip, random rotation, random padding, random Gaussian blur, random sharpness adjustment, and random contrast, each with a probability of 1/7.
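The combined objective can be sketched as follows. This is a minimal sketch under stated assumptions: the leftover term is taken as a mean squared error (the exact form of Equation 1 is not spelled out in the text), the category term is a binary cross entropy as stated, and the λ/(1 − λ) weighting scheme with λ = 0.9 follows the linear combination described above.

```python
import numpy as np

def leftover_loss(y_lo, label_lo):
    """Regression loss for the normalized leftover value (assumed MSE)."""
    return float(np.mean((np.asarray(y_lo, dtype=float)
                          - np.asarray(label_lo, dtype=float)) ** 2))

def category_loss(y_cat, label_cat, eps=1e-7):
    """Binary cross entropy over the food-category outputs."""
    y = np.clip(np.asarray(y_cat, dtype=float), eps, 1.0 - eps)
    t = np.asarray(label_cat, dtype=float)
    return float(np.mean(-(t * np.log(y) + (1 - t) * np.log(1 - y))))

def combined_loss(y_lo, label_lo, y_cat, label_cat, lam=0.9):
    """Linear combination of the two task losses (lambda = 0.9 above)."""
    return lam * leftover_loss(y_lo, label_lo) \
        + (1 - lam) * category_loss(y_cat, label_cat)

# Toy example: predicted leftover 0.30 vs ground truth 0.25, and a
# two-class category output close to its one-hot label.
loss = combined_loss(y_lo=[0.30], label_lo=[0.25],
                     y_cat=[0.9, 0.1], label_cat=[1.0, 0.0])
print(round(loss, 4))  # ~0.0128
```

With λ = 0.9, the regression term dominates the gradient, so the classification head acts mainly as an auxiliary signal shaping the shared features.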

4 Results

The data were divided into three sets: 7/10 for training, 2/10 for validation, and 1/10 for testing. Each model was evaluated with 10-fold cross-validation, and the results were averaged.

4.1 Leftover level prediction

Table 3 and Fig 5 show the mean absolute error (MAE) of leftover prediction for all models (ST, ST-GT, MT, and MT-IC) with different feature extraction networks (ResNet50, ResNet101, ResNet152, VGG11, VGG13, VGG16, and VGG19). For reference, the MAE of human visual observation (Fig 2) was computed by mapping the visual leftover levels 1, ..., 7 to 1/7, 2/7, 3/7, ..., 1.0 and comparing these values against the normalized weight values. Overall, the ResNet group outperformed the VGG group in MAE and classification accuracy. Regarding the network architecture, the MT models outperformed the others and human visual observation; namely, the MAE of human observation is 0.0926, while the MT model, MT-GT, and MT-IC with ResNet101 achieve 0.0801, 0.07826, and 0.0816, respectively.

Fig 5. Result of food level estimation (mean absolute error (MAE), smaller is better).

https://doi.org/10.1371/journal.pone.0320426.g005

Table 3. Result of leftover level estimation (MAE (S.D)).

https://doi.org/10.1371/journal.pone.0320426.t003
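The human-observation reference described above can be reproduced with a short sketch. Synthetic values stand in for the actual measurements; the level-to-value mapping i/7 follows the text.

```python
import numpy as np

def human_mae(levels, normalized_weights):
    """MAE of visual estimation: observer level i in 1..7 is mapped to
    i/7 and compared against the normalized weight values."""
    pred = np.asarray(levels, dtype=float) / 7.0
    true = np.asarray(normalized_weights, dtype=float)
    return float(np.mean(np.abs(pred - true)))

# Synthetic stand-ins for three trays:
levels  = [2, 5, 7]            # observer's leftover levels
weights = [0.30, 0.70, 1.00]   # normalized ground-truth values
print(round(human_mae(levels, weights), 4))  # ~0.0095
```

Run over the full dataset, this procedure gives the 0.0926 human baseline that the MT and MT-IC models improve upon.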

Fig 6 shows the MAE of each food category, which reveals how the food category affects the leftover prediction results. In 22 of the 34 food categories, the MT model outperformed human observation, while the MT-GT and MT-IC models were better than humans in 20 food categories. This indicates that the proposed MT model can accurately predict the majority of food leftover levels.

Fig 6. Class-wise MAE (smaller is better).

x-axis shows the food ID.

https://doi.org/10.1371/journal.pone.0320426.g006

4.2 Food category classification

Table 4 and Fig 7 show the accuracy of food classification. Similar to the leftover level prediction, the ResNet group outperformed the VGG group; the accuracy reached 92.56% when ResNet152 was used for the MT and MT-IC models.

Fig 7. Result of food classification accuracy (higher is better).

https://doi.org/10.1371/journal.pone.0320426.g007

5 Discussion

In general, the proposed method, comprising the ST, ST-GT, MT, and MT-IC Models utilizing ResNet101, exhibits a smaller error than a human observer’s visual estimation. Automatically incorporating the food-classification information from the fully connected layer (the MT-IC Model) has also been observed to enhance the precision of the estimates, as evidenced by smaller standard deviation values compared to the MT Model.

Proposed vs. human observation. The results of the analysis were further investigated through visual explanations, which elucidate the contribution of feature extraction. The explainable visualizations were produced with GradCAM [31]. In most cases, the proposed algorithm grasped the features of the food. However, in some instances it could not extract features and attended to the wrong portions, such as empty regions; in these cases human observers were superior, as illustrated in Fig 8.

Fig 8. Visualization of leftover prediction in MT Model vs Visual Estimation by Human Observer.

https://doi.org/10.1371/journal.pone.0320426.g008

Methods comparison. Incorporating food type information into the regression model that predicts leftovers can improve the effectiveness of the regression procedure. This is evidenced by Fig 9, which illustrates that GradCAM with the MT-IC Model (see Fig 9(b)) can effectively delineate the object. In contrast, the GradCAM feature maps of the MT Model (see Fig 9(a)) also cover the plates and tables in the before image, which impairs feature extraction. Under such conditions, the MAE of the MT Model is greater than that of the MT-IC Model. The impact of feeding food type information from the food classification task into the leftover prediction regression task is also evident in the GradCAM results. These results are corroborated by the MAE values: cases in which the GradCAM visualization effectively identifies the object have a lower MAE than cases in which feature extraction fails.

Fig 9. GradCAM of leftover prediction in MT Model and MT-IC Model of Tahu Bulat (Round Tofu).

https://doi.org/10.1371/journal.pone.0320426.g009

Food Classification. Overall, the accuracy of food classification based on experimental findings in multi-task learning is notably high, exceeding 90%. In this instance, MT-IC has a competitive edge over MT by incorporating food classification data into the regression function to predict leftovers. Fig 10 demonstrates that errors in food classification can occur when items belonging to various categories share resemblances in terms of both color and shape.

Adding class information into the regression also has positive impacts on the classification task, including feature extraction. In Fig 11, for food type 12, Ayam Bumbu Laos (Galangal Seasoned Chicken), the feature map of the MT-IC Model matches the shape of the object, as does that of the MT Model. For this category, the accuracy of the MT-IC Model is 92.86%, while the MT Model achieves 71.43%. Performance on food classification is somewhat affected by the amount of data: the correlations between accuracy and the number of samples for the MT Model and MT-IC Model are 0.511 and 0.480, respectively. These correlations also show that the MT-IC Model has a lower dependence on data size than the MT Model. On the other hand, the amount of data has less impact on the leftover estimation results, as indicated by correlations of 0.061 and 0.163 for the MT Model and MT-IC Model, respectively.

Fig 11. Impact in Food Classification Task: GradCAM of leftover prediction in MT Model, and MT-IC Model of 12.Ayam Bumbu Laos (Galangal Seasoned Chicken).

https://doi.org/10.1371/journal.pone.0320426.g011

6 Limitations

The proposed method is susceptible to error when predicting leftovers that contain excessive oil or a minimal amount of sauce (see Fig 12). In such cases, human observers indicate that the food has been completely consumed (100% consumed), whereas the multi-task learning models (MT Model and MT-IC Model) indicate that there is still food remaining on the plate. This is evidenced by GradCAM failing to generate a heatmap in the leftover prediction task, indicating that no features are extracted effectively to predict leftovers. Despite this failure of the leftover task, the food recognition task still identifies the type of food accurately.

Fig 12. Failure cases (too oily dishes).

While human observers judged the food to be fully consumed, the proposed methods detect the residue (oil) as leftover food.

https://doi.org/10.1371/journal.pone.0320426.g012

In instances involving images of rice and rice porridge, human predictions typically outperform computer algorithms due to the difficulties in accurately extracting features from the objects. The generated heatmap does not emphasize the object itself but focuses on the surrounding areas, including post-consumption changes undetectable by GradCAM (see Fig 13). This discrepancy is largely attributed to the similarity in color between the background and the object, thus complicating the analysis process.

Fig 13. Failure case of extracting feature in rice (above) and rice porridge (bottom).

https://doi.org/10.1371/journal.pone.0320426.g013

In addition, the amount of sample data affects the performance of the proposed model. Ideally, a larger amount of data will lead to more accurate predictions than a smaller amount. However, in this study, the sample data were imbalanced across several food categories, resulting in inconsistencies in the number of samples. In practice, this phenomenon occurs in hospital settings, where patients may receive different types of menus or foods based on their individual dietary needs, and the quantity of food in certain categories may be limited by a patient’s diet. While this research addresses the issue, further enhancements to the model or the performance of the proposed architecture could be achieved by increasing the number of samples through image generation techniques. However, the concern is not only generating the images but also the information attached to this data, such as the weight of food before and after consumption.

7 Conclusion

This paper introduces a novel dataset featuring images captured before and after food consumption. The dataset is augmented with subjective evaluations of the leftover food by an observer and objective weight measurements obtained using digital scales commonly found in hospitals. The research findings illustrate the capability of an AI approach that uses deep learning techniques to predict the amount of leftover food in both subjective and objective evaluation. Overall, the MT learning approach is superior to human visual estimation for predicting leftover amounts. We propose two multi-task learning models: the MT and MT-IC models. The optimal results were achieved using ResNet101, with an MAE of 0.0801 for the leftover task and 90.44% food classification accuracy in the MT Model, and 0.0817 and 92.56%, respectively, in the MT-IC Model. In general, the MT Model is more effective than the ST Model. In the MT-IC Model, the food category information is extracted automatically, so the leftover prediction task is fed directly from the food recognition task. The MT-IC Model is more precise than the MT Model, as evidenced by its smaller standard deviation values. In addition, MT-IC is also effective for datasets with a limited number of samples.

This paper is not without its shortcomings. Firstly, the presence of oil or sauce in the image data disrupts the detection of the objects to be estimated; this can lead to discrepancies between the assessment made by humans and that made by the proposed model, with the latter classifying such residue as leftover food. Secondly, feature extraction remains an ongoing challenge when dealing with objects and backgrounds of the same color. Lastly, the data are imbalanced across food categories and levels of leftovers, which negatively impacts the accuracy of leftover predictions.

In future studies, it will be valuable to enlarge the dataset and investigate multiple potential applications, including food recognition, food image segmentation, food image retrieval, food image generation, and other techniques that can precisely forecast leftover foods. Further research is required in the field of nutritional analysis in hospitals using this dataset as an alternative for decision support. Additional data pertaining to the composition of food ingredients are necessary to predict the nutritional needs of patients.

Acknowledgments

We thank the students of the Faculty of Health Sciences at Brawijaya University in Malang, Indonesia, for assisting in data collection at the hospital during the research process. We thank Brawijaya University and Okayama University for their support in creating the baseline performance and facilitating the further development of this research. The authors would also like to thank Sigit Adinugroho from the Informatics Department at Brawijaya University, Indonesia, for his supportive comments during the discussion and analysis.

References

  1. 1. Sumardilah DS. Analisis Sisa Makanan Pasien Rawat Inap Rumah Sakit (Food Leftovers Analysis of Hospital Inpatients). Jurnal Kesehatan. 2022;13(1):101–9.
  2. 2. Simzari K, Vahabzadeh D, Saeidlou S, Khoshbin S, Bektas Y. Food intake plate waste and its association with malnutrition in hospitalized patients. 2017.
  3. 3. Zhang Y, Hou F, Cheng J, Chen G, Wang L, Jiang X, et al. The association between leftover food consumption and depression among older adults: Findings from a cross-sectional study. J Affect Disord. 2022;307:157–62. pmid:35390351
  4. 4. Mustafa AW, Amra N. Overview of usual food leftovers in inpatients at the Jailolo regional general hospital, West Halmahera Regency. Int J Sci Res Manag. 2021;09:(12):08–12.
  5. 5. Collins J, Porter J. Quantifying waste and its costs in hospital foodservices. Nutr Diet. 2023;80(2):192–200. pmid:36690908
  6. 6. Fadilla C, Rachmah Q, Juwariyah J. Description of Food Leftovers for Inpatient Hospitals in Sidoarjo District Hospital. (Gambaran Sisa Makanan Pasien Rawat Inap RSUD Kabupaten Sidoarjo). Amerta Nutrit. 2020;4(3):198.
  7. 7. Setianto B, Adriansyah AA, Hanik U, Bistara DN. The Correlation Between Patient Satisfaction Regarding Nutrition Service And Hospital Length Of Stay With Food Waste In Covid–19 Patients. JHS. 2021;14(02):147–52.
  8. 8. Pramandari HW, Astawan M, Palupi NS, et al. The role of cook-chill and cook-freeze methods as indicators of quality of nutrition services in hospital. J Med Health Stud. 2023;4(2):86–100.
  9. 9. Wirasamadi NLP, Adhi KT, Weta IW. Analysis of inpatients food leftover at Sanglah hospital Bali province. Publ Health Prevent Med Archiv. 2015;3(1):72–7.
  10. 10. Parent M, Niezgoda H, Keller HH, Chambers LW, Daly S. Comparison of visual estimation methods for regular and modified textures: real-time vs digital imaging. J Acad Nutr Diet. 2012;112(10):1636–41. pmid:23017574
  11. 11. Tagi M, Tajiri M, Hamada Y, Wakata Y, Shan X, Ozaki K, et al. Accuracy of an Artificial Intelligence-Based Model for Estimating Leftover Liquid Food in Hospitals: Validation Study. JMIR Form Res. 2022;6(5):e35991. pmid:35536638
  12. 12. Sari Y, Adinugroho S, Maligan J, Candra E, Utaminingrum F, Nur’Aini N. Leftovers food recognition using deep neural network and regression approach for objective visual analysis estimation. In: 2021 4th International Conference of Computer and Informatics Engineering (IC2IE). IEEE. 2021. 24–9.
  13. 13. Sari Y, Maligan J, Prakoso A. Improving the elementary leftover food estimation algorithm by using clustering image segmentation in nutrition intake problem. In: 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM). IEEE. 2020. 435–9.
  14. 14. Sari YA, Saputra VW, Agustina A, Wani YA, Bihanda YG. Comparison of image thresholding and clustering segmentation methods for understanding nutritional content of food images. In: Proceedings of the 5th International Conference on Sustainable Information Engineering and Technology. ACM. 2020. 124–9. https://doi.org/10.1145/3427423.3427441
  15. 15. Mansouri M, Benabdellah Chaouni S, Jai Andaloussi S, Ouchetto O. Deep Learning for Food Image Recognition and Nutrition Analysis Towards Chronic Diseases Monitoring: A Systematic Review. SN Comput Sci. 2023;4(5).
  16. 16. Simonyan K. Very deep convolutional networks for large-scale image recognition. arXiv preprint. 2014. https://arxiv.org/abs/1409.1556
  17. 17. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 770–8.
  18. 18. Iqbal S, Qureshi AN, Li J, Choudhry IA, Mahmood T. Dynamic learning for imbalanced data in learning chest X-ray and CT images. Heliyon. 2023;9(6):e16807. pmid:37313141
  19. 19. Aizawa K, Maruyama Y, Li H, Morikawa C. Food Balance Estimation by Using Personal Dietary Tendencies in a Multimedia Food Log. IEEE Trans Multimedia. 2013;15(8):2176–85.
  20. 20. Aizawa K, Maeda K, Ogawa M, Sato Y, Kasamatsu M, Waki K, et al. Comparative Study of the Routine Daily Usability of FoodLog: A Smartphone-based Food Recording Tool Assisted by Image Retrieval. J Diabetes Sci Technol. 2014;8(2):203–8. pmid:24876568
  21. 21. Kitamura K, de Silva C, Yamasaki T, Aizawa K. Image processing based approach to food balance analysis for personal food logging. 2010. p. 625–30.
  22. 22. Amato G, Bolettieri P, Monteiro de Lira V, Muntean CI, Perego R, Renso C. Social media image recognition for food trend analysis. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017. p. 1333–6. https://doi.org/10.1145/3077136.3084142
  23. 23. He Y, Xu C, Khanna N, Boushey CJ, Delp EJ. Food image analysis: segmentation, identification and weight estimation. In: 2013 IEEE International Conference on Multimedia and Expo (ICME). 2013. p. 1–6. https://doi.org/10.1109/ICME.2013.6607548 pmid:28572873
  24. 24. Kitamura K, Yamasaki T, Aizawa K. Food log by analyzing food images. In: Proceedings of the 16th ACM international conference on Multimedia. 2008. p. 999–1000. https://doi.org/10.1145/1459359.1459548
  25. 25. Kitamura K, Yamasaki T, Aizawa K. Foodlog: Capture, analysis and retrieval of personal food images via web. In: Proceedings of the ACM multimedia 2009 Workshop on Multimedia for Cooking and Eating Activities; 2009. p. 23–30.
  26. 26. Sari YA, Dewi RK, Maligan JM, Ananta AS, Adinugroho S. Automatic food leftover estimation in tray box using image segmentation. In: 2019 International Conference on Sustainable Information Engineering and Technology (SIET). 2019. p. 212–6.
  27. 27. Adinugroho S, Sari YA, Maligan JM, Sari K, Bihanda YG, Nuraini N, et al. Nutrition estimation of leftover using improved food image segmentation and contour based calculation algorithm. J Environ Eng Sustain Technol. 2022;9(01):30–40.
  28. 28. Lubura J, Pezo L, Sandu MA, Voronova V, Donsì F, Šic Žlabur J, et al. Food Recognition and Food Waste Estimation Using Convolutional Neural Network. Electronics. 2022;11(22):3746.
  29. 29. Kim J, Lee D, Kwon S. Food Classification and Meal Intake Amount Estimation through Deep Learning. Applied Sciences. 2023;13(9):5742.
  30. 30. Van Wymelbeke-Delannoy V, Juhel C, Bole H, Sow A-K, Guyot C, Belbaghdadi F, et al. A Cross-Sectional Reproducibility Study of a Standard Camera Sensor Using Artificial Intelligence to Assess Food Items: The FoodIntech Project. Nutrients. 2022;14(1):221. pmid:35011096
  31. 31. Selvaraju R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 618–26.