
Towards automatic farrowing monitoring—A Noisy Student approach for improving detection performance of newborn piglets

  • Martin Wutke ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    mwutke@tierzucht.uni-kiel.de

    Affiliations Institute of Animal Breeding and Husbandry, Faculty of Agricultural and Nutritional Sciences, University of Kiel, Kiel, Germany, Faculty of Agriculture, South Westphalia University of Applied Sciences, Soest, Germany

  • Clara Lensches,

    Roles Data curation, Investigation

    Affiliation Department of Animal Sciences, Georg-August University, Göttingen, Germany

  • Ulrich Hartmann,

    Roles Investigation

    Affiliation Chamber of Agriculture Lower Saxony, Division Agriculture, Oldenburg, Germany

  • Imke Traulsen

    Roles Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing – review & editing

    Affiliations Institute of Animal Breeding and Husbandry, Faculty of Agricultural and Nutritional Sciences, University of Kiel, Kiel, Germany, Department of Animal Sciences, Georg-August University, Göttingen, Germany

Abstract

Nowadays, video monitoring of farrowing and automatic video evaluation using deep learning have become increasingly important in farm animal science research and open up new possibilities for addressing specific research questions, such as the determination of husbandry-relevant indicators. A robust detection performance for newborn piglets is essential for reliably monitoring the farrowing process and for assessing important information about the welfare status of the sow and piglets. Although object detection algorithms are increasingly being used in various scenarios in the field of livestock farming, their usability for detecting newborn piglets has so far been limited. Challenges such as frequent animal occlusions, high overlapping rates or strongly heterogeneous animal postures increase the complexity and place new demands on the detection model. Typically, new data is manually annotated to improve model performance, but the annotation effort is expensive and time-consuming. To address this problem, we propose a Noisy Student approach to automatically generate annotation information and train an improved piglet detection model. Using a teacher-student model relationship, we transform the image structure and generate pseudo-labels for the object classes piglet and tail. As a result, we improve the initial detection performance of the teacher model from 0.561, 0.838 and 0.672 to 0.901, 0.944 and 0.922 for the performance metrics Recall, Precision and F1-score, respectively. The results of this study can be used in two ways. Firstly, they contribute directly to the improvement of piglet detection in the context of birth monitoring systems and the evaluation of farrowing progress. Secondly, the presented approach can be transferred to other research questions and species, thereby reducing the problem of cost-intensive annotation processes and increasing training efficiency.
In addition, we provide a unique dataset for the detection and evaluation of newborn piglets and sow body parts to support researchers in the task of monitoring the farrowing process.

Section 1: Introduction

Monitoring farrowing in livestock farming is of crucial importance to gain deeper insights into the physiological nature and complex behavioral characteristics of the birth process and to access critical information such as the onset of farrowing, the specific time intervals between consecutive birth events or the overall farrowing duration [1]. As the intensity of pig production systems has increased over the years, new challenges in farrowing and concerns about reduced animal welfare have emerged [2, 3]. For instance, space limitations can lead to increased stress levels and a reduced expression of natural behaviors [4], which is crucial for an adequate welfare assessment [5, 6]. In addition, the challenge of adequately monitoring the status of sows grows as the animal-to-staff ratio increases [7], making the early detection of problems during farrowing more difficult. As both direct observation and manual video-based observation require extensive monetary and time-related resources, innovative methods from the fields of computer vision (CV) and deep learning (DL) offer the potential to automatically monitor the animals and to provide important information about their welfare status and the overall farrowing process. Previous studies have shown promising results in automatically monitoring different production stages within pig farming systems, for example to analyze the behavior of prepartum sows [8, 9], to assess essential welfare indicators like the development of the animal's body mass [10–14] or to detect individual pigs [15].

Although various studies have successfully demonstrated the potential of DL and CV for the extraction of husbandry-related information, most CV approaches have been designed and conducted based on a large number of manually annotated images. For example, Ho et al. (2021) implemented a CV-based framework for monitoring the lactation frequency of sows by detecting and tracking newborn piglets [16]. In the course of their study the authors manually annotated 1611 images containing a varying number of piglets. Furthermore, Liu et al. (2023) increased the usability of a regular bounding box detection by determining the animals' orientation using rotated bounding boxes, for which 38,699 pig instances within 3,123 images had to be manually annotated [17]. Although it is well known that the underlying annotated data forms the basis and backbone of every supervised machine learning algorithm, the manual annotation process limits the power and generalizability of these methods for future research and their transferability into practice.

To achieve a high model performance, sufficient high-quality labeled data samples are required [18]. Particularly in situations such as the application of early warning systems, where accurate object detection results are crucial for avoiding false alarms, a high data quality is necessary. However, the corresponding manual annotation process is cumbersome, error-prone and cost-intensive [19, 20]. Despite the promising results of recent DL-based object detection studies, the detection of dense target objects like newborn piglets is challenging for many state-of-the-art detection frameworks [21], as the number, shape, position or posture of the piglets can vary significantly. In addition, free farrowing systems are increasingly being used to enhance animal welfare, but these systems increase the complexity of automatically monitoring the entire pen due to the potentially higher risk of piglet occlusion and a larger variation of possible sow postures.

In recent years, the importance of object detection in livestock husbandry has increased significantly. Here, object detection refers to the task of locating multiple objects within an image, each belonging to a distinct category of interest [22]. Especially in the scientific community, object detection is increasingly applied to support researchers in tackling various research questions. For example, Qiao et al. (2023) use object detection to determine individual body parts of cattle, which, in the authors' view, would improve automatic phenotyping or enhance the evaluation of specific behavioral situations [23]. Küster et al. (2021) also use object detection in the form of a bounding box annotation to quantify and evaluate the frequency of interactions between sows and their respective pen facilities [9]. Another promising area of application that is currently being actively researched is the use of object tracking algorithms to evaluate animal behavior over time. Current state-of-the-art tracking applications incorporate convolutional neural networks (CNNs) and a tracking-by-detection approach in which object detection provides the basic information [24–28]. Although the aforementioned studies show the potential of object detection algorithms to address specific problem scenarios, the applicability of most methods within a commercial setting is still challenging. On the one hand, most approaches yield satisfying results within the boundaries of their respective study design, but their performance decreases significantly under new conditions due to a lack of model generalizability. On the other hand, the rising cost of human labor [29] and the complex nature of animal behavior increase the need for sufficient annotated data samples, making manual model adjustments inefficient, cumbersome and cost-intensive.

One way of mitigating the labor-intensive process of creating large-scale datasets is to deploy a self-training method like the Noisy Student Training algorithm, which was originally proposed by Xie et al. in 2020 for image classification [30]. The general idea of the Noisy Student method is that a supervised model is trained as an initial teacher model entirely on manually labeled data samples and is then applied to new, unseen samples to create pseudo-labels [31]. After that, a second model, referred to as the student model, is trained using the large-scale pseudo-annotations as well as the manual annotations. During the student model training, additional noise such as random data augmentation, stochastic depth and random dropout is introduced to increase the generalization performance of the student compared to the teacher [29].
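The teacher-student cycle described above can be sketched as a small generic loop. This is a minimal sketch, not the authors' implementation: `train_fn` and `predict_fn` are hypothetical placeholders for a real training and inference routine, and noise injection is only indicated by a comment.

```python
# Generic Noisy Student self-training loop (sketch).
# train_fn(data) -> model, where data is a list of (sample, label) pairs.
# predict_fn(model, sample) -> pseudo-label for an unlabeled sample.

def noisy_student(train_fn, predict_fn, labeled, unlabeled, rounds=3):
    """Train a teacher on labeled data, then iteratively pseudo-label the
    unlabeled pool and retrain a student on the combined set."""
    teacher = train_fn(labeled)                   # initial supervised teacher
    for _ in range(rounds):
        # Teacher generates pseudo-labels for the unlabeled samples.
        pseudo = [(x, predict_fn(teacher, x)) for x in unlabeled]
        # Student is trained on manual + pseudo labels; in a real setup,
        # noise (dropout, augmentation) would be injected in train_fn here.
        student = train_fn(labeled + pseudo)
        teacher = student                         # student becomes new teacher
    return teacher
```

The loop is repeated until the student's performance is satisfactory; in this study a single teacher-to-student step is used.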

Despite the significant performance improvements reported by Xie et al. (2020), the Noisy Student concept has so far received little attention in agricultural research. As one of the few studies utilizing this method, Duong et al. (2024) compared the Noisy Student concept for the multi-class classification of weed images to other transfer learning strategies [32]. Emphasizing the complexity of real-world scenarios, the Noisy Student method achieved the highest classification performance with an accuracy score of 99.26%. In addition, Keh (2020) used the Noisy Student method to compare different CNN architectures such as VGG16, ResNet101 and DenseNet161 for classifying plant pathogens in order to address the problem of crops lost to pests and other pathogens [31]. As a result, they identified the EfficientNet structure to be the most efficient model for the given task and reported that applying the Noisy Student approach significantly improved the robustness and convergence rate of the model during training.

As Liu et al. (2021) point out, a large share of previous studies have focused on transfer learning methods for image classification tasks, while object detection tasks are underrepresented due to the higher complexity of instance annotation [33]. In this work we aim to address this limitation and additionally leverage the Noisy Student method to improve the detection performance for newborn piglets. More specifically, the contribution of this work is threefold. First, we extend the scope of the Noisy Student method by introducing a novel image transformation based on the pseudo-labels of the teacher model for subsequently training the student model. This transformation allows us to reduce the possibility space of the detection model and to focus on the target class in the form of newborn piglets. To the best of our knowledge, this is the first study combining the self-learning Noisy Student approach with an image transformation that reduces the observable space to a restricted target area. As we demonstrate, this transformation step reduces the influence of complex pen environments and increases the detection efficiency for newborn piglets without the need to monitor the entire farrowing pen. It is therefore possible in subsequent studies to change this target area to any image area in order to focus on different target classes and to address alternative research questions. Second, there are currently hardly any publicly available datasets for newborn piglet detection, which limits the power of existing machine learning methods for object detection. To address this limitation, we created a novel manually annotated dataset for the detection of sow body parts and newborn piglets and made this dataset publicly available. Third, automatic piglet detection is an intensively studied field and an important step towards the implementation of early warning systems and automatic monitoring solutions.
To the best of our knowledge this is the first study in the area of livestock research to leverage and extend the self-learning Noisy Student Training concept for the detection of newborn piglets. With this work, we aim to demonstrate the potential of our approach which can be used to address various research questions within the livestock domain like behavioral analysis, automatic phenotyping or the detection of agonistic anomalies.

This work is structured as follows. Section 2 describes the data used for this analysis and illustrates the methodical foundation of the Noisy Student training rationale as well as the model evaluation process applied in this study. The results of the comparison between the teacher model, trained on manually annotated images, and the student model, trained on automatically generated pseudo-labels, in detecting newborn piglets are presented and discussed in Section 3. Furthermore, a model comparison for different dataset sizes is presented. Section 4 concludes this work and provides an outlook for future research.

Section 2: Materials and methods

Section 2.1: Data acquisition and processing

In this study, an extensive video dataset was collected between March 2021 and September 2023 as part of the collaborative project DigiSchwein (funding code: 28DE109G18) at the research farm of the Chamber of Agriculture Lower Saxony in Wehnen, Germany. In addition to other work packages of DigiSchwein that focus, for example, on the prevention of tail biting outbreaks or the analysis of nutrient flows, our aim is the development of a birth monitoring tool that assists farm staff by providing critical information about the health of the animals and the entire farrowing process. In the course of this study, parts of the data have been utilized to develop and evaluate the proposed Noisy Student piglet detection model. For data acquisition, a static camera of the type AXIS M3206-LVE (Axis Communications AB, Lund, Sweden) was mounted 3 m above the floor of each of eight farrowing pens. Over a period of 23 batches, each lasting 40 days, each camera recorded the farrowing process at a frame rate of 20 frames per second (FPS) and a display resolution of 1920*1080 pixels.

The dataset used for this study comprises video sequences containing both situations before farrowing, in which only the sow is visible, and situations during and after farrowing, in which both the sow and newborn piglets are visible. Both day and night recordings with varying lighting conditions have been used. In order to avoid a potential bias of the detection models towards different color settings, each video frame was grayscaled prior to model training and evaluation [34, 35]. Furthermore, to increase computational efficiency, the original image dimension was downscaled to 640*640 pixels and used as the input dimension for the training process. Fig 1 shows four example video frames under different lighting conditions and with a varying number of newborn piglets, manually highlighted by yellow circles.
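The preprocessing described above (grayscale conversion and downscaling) can be illustrated with a minimal pure-Python sketch. A real pipeline would use a library such as OpenCV or Pillow; the luminance weights here are the common ITU-R BT.601 coefficients and the resize is nearest-neighbour, both illustrative assumptions rather than details taken from the paper.

```python
# Sketch of frame preprocessing on nested lists (images as H x W grids).

def to_grayscale(rgb):
    """Convert an H x W grid of (R, G, B) tuples to H x W luminance values
    using the BT.601 weights (assumed here for illustration)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]

def downscale(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W grid, e.g. 1080p -> 640x640."""
    h, w = len(img), len(img[0])
    return [[img[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]
```

In practice this would be a single `cv2.cvtColor` plus `cv2.resize` call per frame; the sketch only makes the two steps explicit.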

Fig 1. Example frames of different farrowing pens and a varying intensity of piglet occlusion.

The occlusion can be caused by the pen infrastructure, the sow or other piglets. The number of piglets (yellow circles) ranges from one piglet in (a) and three in (b) to twelve in (c) and ten in (d). Frames (a–c) have been captured in daylight, while frame (d) has been taken at night-time.

https://doi.org/10.1371/journal.pone.0310818.g001

As can be seen in Fig 1, monitoring the entire farrowing pen inevitably leads to a high number of occluded piglets, depending on the length of the observation time, the camera's recording angle, the equipment in the pen, the posture of the sow or the number of piglets. Therefore, the decision to monitor the entire pen or only parts of it depends on the research question and the aim of the study. Since one of the main tasks of the collaborative project DigiSchwein is to assess critical farrowing information about newborn piglets, such as individual birth intervals, a close observation of a restricted birth area within the video frame is desired without the need to monitor the entire pen, thus reducing the complexity of the scene.

In order to generate the required image structure, a Noisy Student approach is chosen, in which an initial object detection model (hereafter referred to as the teacher model) first detects the head and rear region of the sow and then uses this information to select a distinct birthing area at the rear of the sow. This image area is then cropped and rescaled to a dimension of 640*640 pixels. The subsequent detection models are trained and evaluated on this target image structure. An illustration of the original image and the final target image structure is provided in Fig 2. Here, two example frames from the teacher model training set have been selected and annotated using bounding box annotations for the object classes head, rear, tail and piglet. A detailed description of the training process for both the teacher and the student model is given in Section 2.2.
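One way the birthing area could be derived from the head and rear detections is sketched below. The exact geometry used in the study is not specified here, so the crop size and offset along the sow's orientation axis are illustrative assumptions, not the authors' parameters.

```python
# Sketch: place a square crop behind the sow's rear, along the
# head-to-rear axis (sow orientation). Boxes are (x1, y1, x2, y2).

def center(box):
    """Center point (x, y) of a bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def birth_region(head_box, rear_box, size=640, offset=0.5):
    """Return a square crop of `size` pixels shifted `offset * size`
    beyond the rear center, in the direction pointing away from the head."""
    hx, hy = center(head_box)
    rx, ry = center(rear_box)
    dx, dy = rx - hx, ry - hy                    # orientation vector
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0     # avoid division by zero
    cx = rx + offset * size * dx / norm          # crop center beyond the rear
    cy = ry + offset * size * dy / norm
    half = size / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

The returned region would then be clipped to the frame, cropped and rescaled to 640*640 pixels as described above.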

Fig 2. Example of the image structure transformation.

The image dataset of the teacher model consists of images with four distinct object classes, namely the head of the sow (green box), the rear region of the sow (green box), the sow's tail (yellow box) and newborn piglets (red box). The head and tail information have been used to compute the orientation of the sow (green connection line) and the restricted birth region (blue circle). For the generation of the target image structure of the student model, only the restricted birth region has been used, reducing the possibility space to only two object classes, tail and piglet. All subfigures in this illustration are daylight recordings that have been converted to grayscale images for further processing.

https://doi.org/10.1371/journal.pone.0310818.g002

In the course of this study, two types of datasets have been employed. To train the teacher model, 1100 images of the whole farrowing pen were manually labeled using bounding box annotations for the above-mentioned object classes (head, rear, tail and piglet). For the image annotation, the software LabelMe (version 5.2.1) was used [36]. In contrast, the training of the student model was carried out using the target image dataset, which consists of 9,800 images that have been automatically annotated using the Noisy Student approach (Section 2.2). Since the student model is designed to monitor the target area, the possibility space was further reduced to only two object classes, namely tail and piglet. Additionally, we performed simple image augmentations in the form of horizontal and vertical flips for both datasets to increase the dataset size by a factor of four. To stimulate further research, we made the manually annotated dataset for the teacher model publicly available [37].
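The flip augmentations, which quadruple the dataset size, also require the bounding box annotations to be mirrored. A minimal sketch, assuming boxes as `(x1, y1, x2, y2)` pixel tuples (an illustrative format, not the study's annotation schema):

```python
# Mirror a bounding box together with its image for flip augmentation.

def hflip_box(box, width):
    """Horizontal flip: x-coordinates are mirrored around the image width."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)

def vflip_box(box, height):
    """Vertical flip: y-coordinates are mirrored around the image height."""
    x1, y1, x2, y2 = box
    return (x1, height - y2, x2, height - y1)

def augment(boxes, width, height):
    """Return the four flip variants: original, horizontal, vertical, both."""
    h = [hflip_box(b, width) for b in boxes]
    v = [vflip_box(b, height) for b in boxes]
    hv = [vflip_box(b, height) for b in h]
    return [boxes, h, v, hv]
```

The same flips would be applied to the image pixels, so each annotated image yields four consistent training samples.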

Section 2.2: Noisy Student training

The concept of Noisy Student training belongs thematically to the area of semi-supervised learning (SSL), and the theoretical foundation of a self-training framework was already described in 1965 by Henry Scudder [38]. In recent years, SSL-based object detection has shown great potential due to its straightforward approach and reduced dependency on resource-intensive annotations [39–41]. Although Noisy Student training has been used extensively in the field of computational linguistics, for example in speech recognition or analysis (e.g. [42–46]), it is nowadays increasingly being used in other disciplines as well. In the area of CV, Xie et al. (2020) [30] showed that the addition of noise during model training has a positive effect on the quality of the pseudo-label generation and the performance of the subsequent student model. Furthermore, using Noisy Student as a self-learning approach, they were able to improve the classification performance on the well-known ImageNet dataset [47] by two percent compared to state-of-the-art models at that time. The Noisy Student algorithm is illustrated in Fig 3.

Fig 3. Illustration of the Noisy Student algorithm proposed by [30].

https://doi.org/10.1371/journal.pone.0310818.g003

In the Noisy Student setting an initial teacher model is first trained on a small set of labeled information and is then applied to generate pseudo-labels for a larger set of unlabeled data [48]. After that, a second model, denoted as the student model, is trained on the pseudo-labeled data and is subjected to noise in the form of dropout and random augmentation. In an iterative manner, the student is then considered as the new teacher to obtain a more robust student model. Finally, this training procedure is repeated until the student’s performance is satisfactory for a given task [49]. Fig 4 illustrates the basic idea of the Noisy Student method applied in this work.

Fig 4. Overview of the Noisy Student concept.

While the teacher model was trained entirely on manually annotated images, we extended the Noisy Student concept and trained the student model only on the new image structure based on the pseudo-labels within the designated target area. All images shown in this figure are from the teacher and student training dataset. All subfigures in this illustration are daylight recordings that have been converted to grayscale images for further processing. This figure is based on [30].

https://doi.org/10.1371/journal.pone.0310818.g004

As can be seen in Figs 2 and 4, we extended the Noisy Student method by introducing a novel image transformation and the usage of two distinct image structures. The teacher model was trained on the original images, which covered the whole farrowing pen but were therefore more susceptible to object occlusion and misclassification. In contrast, in order to improve the detection rate of newborn piglets, we focused only on the designated target area and generated the pseudo-labels for the student model using only high-confidence detections above a 0.9 confidence threshold. Additionally, as suggested by previous studies [50–52], we manually checked the generated pseudo-labels and verified the correctness of the class assignments and bounding box localizations. Image instances containing incorrectly labeled information were removed from the training set of the student model accordingly.
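The confidence-based filtering step can be sketched as a simple cut over the teacher's detections; the `(class, confidence, box)` tuple format is an illustrative assumption, not the actual model output format.

```python
# Keep only teacher detections with confidence >= 0.9 as pseudo-labels
# for the student, as described in the text.

CONF_THRESHOLD = 0.9

def filter_pseudo_labels(detections, threshold=CONF_THRESHOLD):
    """Drop low-confidence detections; the survivors become training labels."""
    return [(cls, box) for cls, conf, box in detections if conf >= threshold]
```

The filtered labels were additionally checked manually before being used for student training.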

As shown by [30], adding noise such as data augmentation and random dropout to the training process of the student model can improve the performance and generalization ability of the student. Therefore, we adopted this strategy and additionally used random dropout [53] and mixup augmentation [54], each with a rate of 0.5, for the student training only. The general purpose of data augmentation is to train a model on instances that are similar to, but different from, those present in the training data [55]. The mixup augmentation was originally proposed for image classification tasks and proved successful in reducing adversarial perturbation in the network structure [54]. The basic idea is to linearly interpolate multiple image samples and their corresponding labels to generate new training instances [56]. As an extension, Zhang et al. (2019) successfully demonstrated the implementation of a mixup augmentation for the task of object detection without adding overhead costs during inference [57]. By merging multiple images with their corresponding labels, they tested various mixup sampling distributions and reported a final improvement of up to 5% absolute precision compared to state-of-the-art baselines at that time.
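A toy sketch of the mixup idea on grayscale images represented as nested lists: pixels are linearly interpolated with weight `lam`, and, following the detection variant of mixup [57], the bounding boxes of both source images are retained. This is a minimal illustration, not the augmentation code used in the study.

```python
# Mixup for detection (sketch): blend two images, keep both label sets.

def mixup(img_a, img_b, boxes_a, boxes_b, lam=0.5):
    """Linearly interpolate two same-sized grayscale images with weight
    `lam` and return the union of their bounding boxes."""
    mixed = [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(img_a, img_b)]
    return mixed, boxes_a + boxes_b
```

In a full pipeline, `lam` would typically be drawn from a Beta distribution per sample rather than fixed.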

Section 2.3: Model training and evaluation

In the course of this study, all detection models were trained using the YOLOv8 architecture, published by Ultralytics in 2023 [58]. YOLOv8 is a state-of-the-art object detection model that is able to handle the multiscale nature of objects [59]. Since this study aims at leveraging the Noisy Student Training concept to improve the detection performance for newborn piglets, the largest available model version (YOLOv8x) was selected. While the first version of YOLO (You Only Look Once) was developed by Redmon et al. in 2016 [60] as an anchor-based method, YOLOv8 follows an anchor-free approach, which leads to a reduced number of box predictions and improves processing speed [61, 62].

Both the teacher and the student model were trained for 300 iterations with a batch size of 16 images and an input image dimension of 640*640. The decision to train the models for 300 iterations was determined empirically, as no significant improvements were achieved with a higher number of iterations. We followed previous studies and used the Stochastic Gradient Descent optimizer with a learning rate of 0.01 [63–66] to train all models. The model training was carried out on a workstation equipped with two Intel Xeon Gold 6230 CPUs, 1024 GB RAM and an Nvidia Quadro RTX 8000. The models were implemented using the programming language Python (version 3.9), the deep learning framework PyTorch (version 2.0.1) and Ubuntu 22.04.3 as the operating system. During training, no signs of overfitting were detected for either the teacher or the student model.
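The stated hyperparameters can be collected in a plain configuration dict; the commented lines sketch how such a configuration might be passed to the Ultralytics API, where the dataset file name `farrowing.yaml` is a placeholder, not a name taken from the paper.

```python
# Training configuration as reported in the text.
train_cfg = dict(
    epochs=300,       # determined empirically; no gains beyond 300
    batch=16,         # images per batch
    imgsz=640,        # 640*640 input resolution
    optimizer="SGD",  # Stochastic Gradient Descent
    lr0=0.01,         # initial learning rate
)

# Hypothetical invocation (dataset path is a placeholder):
# from ultralytics import YOLO
# model = YOLO("yolov8x.pt")   # largest YOLOv8 variant, as used in the study
# model.train(data="farrowing.yaml", **train_cfg)
```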

In order to assess the overall detection performance and to adequately compare the teacher and the student model, we followed previous studies and evaluated each model based on the well-known performance metrics Recall, Precision and F1-score [67–70]. Here, we first determined the number of True Positives (TP), False Positives (FP), and False Negatives (FN) over all test images. The evaluation metrics are defined as follows:

Recall = TP / (TP + FN)  (1)

Precision = TP / (TP + FP)  (2)

F1-score = 2 · (Precision · Recall) / (Precision + Recall)  (3)
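The three metrics can be computed directly from the aggregated TP/FP/FN counts, for example:

```python
# Recall, Precision and F1-score from aggregated detection counts.

def detection_metrics(tp, fp, fn):
    """Return (recall, precision, f1) for the given TP/FP/FN counts."""
    recall = tp / (tp + fn)          # share of ground truth objects found
    precision = tp / (tp + fp)       # share of detections that are correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, precision, f1
```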

For the evaluation process we generated a distinct evaluation dataset comprising 650 randomly selected images with ground truth annotations. Since the student model is trained on a reduced number of classes, we used the manual annotations of the sow's head and rear region to generate the required target image structure. For the teacher model, we only used the detection results within the target region and compared them with the detection results of the student model (see Fig 2). Table 1 provides an overview of the class distribution of the training and evaluation sets.

Table 1. Distribution of target classes in absolute (percentage) frequencies for the training data of the teacher and student model and for the evaluation dataset.

https://doi.org/10.1371/journal.pone.0310818.t001

Section 3: Results and discussion

In the following section, the results of the detection evaluation are presented and critically discussed on the basis of the methodology described in Section 2. Existing challenges are addressed in detail and design choices are motivated.

To assess the suitability of the proposed Noisy Student approach for improving the detection performance for newborn piglets, we were interested in the performance of the student model, trained on pseudo-annotated data samples, compared to a model trained on manually annotated data. Since the initial teacher model was trained on 1100 human-annotated images, we used this model as a proxy for a manual labeling approach and performed the comparison between the final student model and the initial teacher model. For the creation of the evaluation dataset, the two target classes piglet and tail as well as the body parts head and rear have been manually annotated in 650 test images. The evaluation dataset is made publicly available [37].

To ensure an objective assessment, both models were evaluated only on the basis of their detection performance for the piglet and tail object classes in the target area. For the teacher model, only the detection results within the defined birth area were included. The results of the model evaluation are shown in Table 2, where the detection results are presented separately for each target class as well as in an aggregated form, in which both target classes have been consolidated prior to the calculation of the evaluation metrics.

Table 2. Results of the detection evaluation for the teacher and the student model on the evaluation set.

Each model performance is listed both as a total result and differentiated by the target object class.

https://doi.org/10.1371/journal.pone.0310818.t002

As can be seen in Table 2, focusing on the consolidated results, both models achieved very good Precision values, indicating that most objects detected by the models have been correctly localized and classified. With a value of 0.944, the student model scored almost ten percentage points higher than the teacher model with a value of 0.838. In terms of Recall, the difference between the teacher and student model is greater, with the student model achieving a score of 0.901 compared to 0.561 for the teacher model. Since Recall is the proportion of true positive detections over all positive ground truth instances, it reflects a detector's ability to detect an object of interest, without taking into account the correctness of the detection [71]. Since both models have high Precision values, the difference in Recall can be explained by the higher number of objects missed by the teacher model, which is reflected in higher FN values. Given that the F1-score is the harmonic mean of Recall and Precision, the teacher model's lower Recall is responsible for its F1-score of 0.672, in contrast to the student's higher F1-score of 0.922.

From a more differentiated perspective, it can be seen that both models perform better in detecting the piglet class than the tail class. In line with the previous results, the student model, with a Recall score of 0.965 for the piglet class, exceeds the performance of the teacher model, which achieved a Recall value of 0.639. For the tail class, the student model achieved a Recall of 0.727, whereas the teacher achieved a value of 0.352. Fig 5 provides sample images from the manually generated evaluation set in which the results of both the teacher and the student model are visualized.

Fig 5. Visualized results of the model comparison.

Samples from the evaluation set showing the detection results from the teacher model (red bounding box) and the student model (green bounding box). (a-b): In cases of a good spatial separation between the objects, both detection models achieve satisfying results in detecting the piglet and tail classes. (c-d): In situations of overlaying objects or close proximity the detection performance of the teacher model decreases. For the interpretation of the color information in this illustration, the reader is referred to the web version of this article.

https://doi.org/10.1371/journal.pone.0310818.g005

By visualizing samples of the detection results, it can be seen that the low detection performance of the teacher model for both the tail and the piglet classes occurs mainly in situations in which the object to be detected is in close proximity to another target object. The proximity effect is observable in the student model as well, albeit to a lesser extent. However, the problem of close proximity and object overlap is not exclusive to this study and is frequently reported to be one of the main obstacles in object detection studies as well as video object tracking applications [34, 72–74]. In cases of good separation, both models are able to successfully localize and classify most of the tail objects. Since the motivation for adding the tail class was to reduce the risk of false detections and potential confusion with the piglet class, it was not the aim of this study to optimize detection performance for this particular class. In contrast, the situation is different when looking at the piglet detection performance. As the correct detection of newborn piglets is essential in the automatic assessment of the farrowing process, a robust model performance is required. Here, it can be seen that in many cases the close proximity of multiple piglets poses a major challenge for the teacher model.

To additionally investigate the potential of the presented extended Noisy Student method to improve the generalization properties, we performed another model comparison with different dataset sizes. Here, we created a subset of fixed size for each model type by randomly sampling images from the respective training set. Subset sizes of 100, 250, 500 and 1000 training images were investigated. A new teacher and student model was then trained for each dataset size and evaluated on the evaluation set presented above. The results of this second model comparison step are shown in Table 3 and illustrated in Fig 6.
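The subset construction described above can be sketched as follows. This is a hedged illustration: the file names, the fixed random seed and the exact sampling routine are assumptions for reproducibility, not details reported by the study.

```python
import random

def make_subsets(annotated_images, sizes=(100, 250, 500, 1000), seed=42):
    """Randomly draw one fixed-size training subset per investigated size.

    Each subset is sampled independently (without replacement) from the
    full annotated training set, mirroring the dataset-size comparison.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    subsets = {}
    for n in sizes:
        if n > len(annotated_images):
            raise ValueError(f"requested {n} images, only {len(annotated_images)} available")
        subsets[n] = rng.sample(annotated_images, n)
    return subsets

# Usage: train one teacher and one student model per subset size.
images = [f"frame_{i:05d}.png" for i in range(1200)]  # placeholder file names
subsets = make_subsets(images)
print({n: len(s) for n, s in subsets.items()})  # {100: 100, 250: 250, 500: 500, 1000: 1000}
```

Sampling each subset independently keeps the comparison between sizes unbiased, at the cost of the smaller subsets not necessarily being nested within the larger ones.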

Fig 6. Results of the model comparison with different dataset sizes.

When comparing different dataset sizes, the Noisy Student model (blue line) trained on the transformed target images shows a superior detection performance compared to the teacher model (red line).

https://doi.org/10.1371/journal.pone.0310818.g006

Table 3. Results of the model comparison with different dataset sizes.

https://doi.org/10.1371/journal.pone.0310818.t003

As Table 3 and Fig 6 show, the student model outperforms the teacher model at every dataset size and in each of the three evaluation metrics Recall, Precision and F1-score. This indicates the superior detection and generalization ability of the student model and demonstrates the positive effect of the proposed image transformation.

As this study proposes a complexity reduction of the image scene by transforming the original image structure before training the subsequent student model, the corresponding design choices were motivated by the increasing need to monitor free farrowing environments and to precisely capture the birth process. With enhanced welfare requirements on the one hand and the current trend of increasing livestock intensity on the other, free farrowing systems are gaining in importance, but animal observation poses greater challenges. Although earlier studies adopted a strategy of automatically monitoring the entire pen area [16, 75, 76], this approach leads to increased computational complexity due to the problem of blind spots and occluded piglets. Object occlusion is currently one of the biggest challenges in CV systems and can influence detection results and hinder the assessment of the farrowing status [77–79]. Therefore, to reduce the need to monitor the entire pen and to increase the potential generalization ability of the student model across multiple pen structures, piglet detection was focused on the selected target area, which appears to be beneficial for the detection performance of the student.

In recent years, object detection has become increasingly popular and is nowadays used for a variety of tasks in academic research. However, current approaches in the agricultural context mainly focus on fully supervised learning methods, for which data annotation is challenging and time-consuming [80]. Although self-learning methods are used more frequently, they are still underrepresented. Compared with other self-learning studies for object detection tasks, our findings are in line with previous results reporting an increased performance of the student model. For example, Zhang et al. (2023) focused on the knowledge transfer from the teacher to the student using knowledge distillation methods and reported an enhanced detection performance of the student as it learns to emphasize important pixels [81]. Similar findings are reported by Zhu et al. (2023), who applied a teacher-student relationship to the VisDrone [82] dataset, which is mainly used for small object detection tasks [83]. In line with our results, the studies cited above report an improved detection rate of the student model for smaller and dense objects, which is beneficial for the detection and behavioral analysis of newborn piglets, as they are usually small in size and tend to appear in close proximity to each other.

The aim of this study was to improve the detection performance of newborn piglets by proposing a novel extension of the Noisy Student self-learning framework combined with a distinct key focus in the form of the selected target area. Furthermore, we wanted to address the limitation of the cumbersome annotation process of supervised object detection methods in the field of animal research. To the best of our knowledge, this is the first application of the Noisy Student methodology in the area of livestock science. We demonstrate the improved detection performance of the student model, which is not exclusively limited to the use case presented in this work. The selection of the target area enables the methodological transfer to other species, application disciplines and research questions. In the context of farrowing, the focus of the framework could be shifted so that the student is trained on a different target area selection, for example, to gain deeper insights into lactation behavior and sow-piglet interaction. Furthermore, in a broader research setting, the student-teacher relationship could be used to first detect larger objects like whole animal instances and then analyze key body parts or monitor specific areas of interest to address current research questions such as agonistic interactions or automatic phenotyping.

Considering the background of increasing animal welfare regulations and more open farrowing systems, it can be assumed that the number of unfixed sows and piglets to be monitored will increase in the long term. In Germany in particular, a maximum restraining period of five days is prescribed by law from 2036 [84]. A larger number of free-ranging animals generally increases the complexity of the scene and thus the detection performance required of the model used. Most established methodological approaches for object detection address such challenges by manually re-annotating more heterogeneous image data. The approach presented here contributes to this issue: by defining specific target regions of interest that provide automatically generated annotation information, it reduces the need to manually annotate new data samples.
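The automatically generated annotation information corresponds to the pseudo-labels produced by the teacher model for the piglet and tail classes. The following is a minimal, library-free sketch of the confidence-based filtering step; the threshold value, the detection tuple format and the class ordering are assumptions for illustration, while the actual study pipeline uses a YOLO-style detector whose raw outputs would be converted in a similar way.

```python
def pseudo_labels(detections, conf_threshold=0.5, classes=("piglet", "tail")):
    """Keep only confident teacher detections as pseudo-labels for the student.

    detections: iterable of (class_name, confidence, (x, y, w, h)) tuples,
                with the box center and size in normalized YOLO format.
    Returns YOLO-style label lines: "<class_id> <x> <y> <w> <h>".
    """
    class_ids = {name: i for i, name in enumerate(classes)}
    lines = []
    for name, conf, (x, y, w, h) in detections:
        # Discard low-confidence or unknown-class detections to keep the
        # pseudo-label noise level manageable for the student model.
        if name in class_ids and conf >= conf_threshold:
            lines.append(f"{class_ids[name]} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return lines

dets = [("piglet", 0.92, (0.31, 0.42, 0.10, 0.08)),
        ("tail", 0.35, (0.55, 0.60, 0.03, 0.03)),   # dropped: below threshold
        ("piglet", 0.78, (0.64, 0.40, 0.11, 0.09))]
print(pseudo_labels(dets))  # two confident piglet labels remain
```

The confidence threshold trades label coverage against label noise: a stricter threshold yields fewer but cleaner pseudo-labels for the student to train on.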

Section 4: Conclusion

The presented study utilizes the innovative concept of Noisy Student training and leverages its characteristics to propose a novel approach for improving the early detection of newborn piglets within a free farrowing system. To the best of our knowledge, this is the first application of the Noisy Student concept within the livestock domain. By using the self-learning paradigm, we were able to combine the distinct teacher-student model relationship with an image transformation technique that reduces the influence of the environment-specific pen infrastructure and the risk of object occlusion, thus demonstrating the applicability of our approach in a multi-pen environment. As a result, we increased the detection performance from an initial Recall of 0.561 to 0.901 while at the same time improving the Precision from 0.838 to 0.944.

The findings of this study not only contribute to the current challenges of appropriately capturing critical information during farrowing, but can also be transferred to other species and application areas in livestock management. Furthermore, the utilization of the Noisy Student approach reduces the time and monetary costs inherent to manual image labeling, which usually acts as a limiting factor and weakens the potential of DL approaches.

Acknowledgments

The authors wish to thank the project partners and the farm staff of the experimental research farm of the Chamber of Agriculture Lower Saxony in Bad Zwischenahn-Wehnen, Germany, for their support and for providing animals and housing for the project.

References

  1. Vargovic L, Athorn RZ, Hermesch S, Bunter KL. Improving sow welfare and outcomes in the farrowing house by identifying early indicators from pre-farrowing assessment. Journal of Animal Science. 2022;100(11):skac294. pmid:36062853
  2. van Erp-van der Kooij E, de Graaf LF, de Kruijff DA, Pellegrom D, de Rooij R, Welters NI, et al. Using Sound Location to Monitor Farrowing in Sows. Animals. 2023;13(22):3538. pmid:38003155
  3. Oliviero C, Pastell M, Heinonen M, Heikkonen J, Valros A, Ahokas J, et al. Using movement sensors to detect the onset of farrowing. Biosystems Engineering. 2008;100(2):281–285.
  4. Maes DG, Dewulf J, Piñeiro C, Edwards S, Kyriazakis I. A critical reflection on intensive pork production with an emphasis on animal health and welfare. Journal of Animal Science. 2020;98(Supplement_1):S15–S26. pmid:31784754
  5. Rose P, O’Brien M. Welfare assessment for captive Anseriformes: A guide for practitioners and animal keepers. Animals. 2020;10(7):1132. pmid:32635313
  6. Ryan M, Waters R, Wolfensohn S. Assessment of the welfare of experimental cattle and pigs using the Animal Welfare Assessment Grid. Animals. 2021;11(4):999. pmid:33918263
  7. Benjamin M, Yik S. Precision livestock farming in swine welfare: a review for swine practitioners. Animals. 2019;9(4):133. pmid:30935123
  8. Yang R, Chen Z, Xu H, Shen M, Li P, Norton T, et al. Recognizing the rooting action of prepartum sow in free-farrowing pen using computer vision. Computers and Electronics in Agriculture. 2023;213:108167.
  9. Küster S, Nolte P, Meckbach C, Stock B, Traulsen I. Automatic behavior and posture detection of sows in loose farrowing pens based on 2D-video images. Frontiers in Animal Science. 2021;2:758165.
  10. Liu J, Xiao D, Liu Y, Huang Y. A Pig Mass Estimation Model Based on Deep Learning without Constraint. Animals. 2023;13(8):1376. pmid:37106939
  11. Tan Z, Liu J, Xiao D, Liu Y, Huang Y. Dual-Stream Fusion Network with ConvNeXtV2 for Pig Weight Estimation Using RGB-D Data in Aisles. Animals. 2023;13(24):3755. pmid:38136793
  12. Fernandes AF, Dórea JR, Fitzgerald R, Herring W, Rosa GJ. A novel automated system to acquire biometric and morphological measurements and predict body weight of pigs via 3D computer vision. Journal of Animal Science. 2019;97(1):496–508. pmid:30371785
  13. Zhang J, Zhuang Y, Ji H, Teng G. Pig weight and body size estimation using a multiple output regression convolutional neural network: A fast and fully automatic method. Sensors. 2021;21(9):3218. pmid:34066410
  14. Bhoj S, Tarafdar A, Chauhan A, Singh M, Gaur GK. Image processing strategies for pig liveweight measurement: Updates and challenges. Computers and Electronics in Agriculture. 2022;193:106693.
  15. Nasirahmadi A, Sturm B, Edwards S, Jeppsson KH, Olsson AC, Müller S, et al. Deep learning and machine vision approaches for posture detection of individual pigs. Sensors. 2019;19(17):3738. pmid:31470571
  16. Ho KY, Tsai YJ, Kuo YF. Automatic monitoring of lactation frequency of sows and movement quantification of newborn piglets in farrowing houses using convolutional neural networks. Computers and Electronics in Agriculture. 2021;189:106376.
  17. Liu D, Parmiggiani A, Psota E, Fitzgerald R, Norton T. Where’s your head at? Detecting the orientation and position of pigs with rotated bounding boxes. Computers and Electronics in Agriculture. 2023;212:108099.
  18. Kurita Y, Meguro S, Tsuyama N, Kosugi I, Enomoto Y, Kawasaki H, et al. Accurate deep learning model using semi-supervised learning and Noisy Student for cervical cancer screening in low magnification images. PLOS ONE. 2023;18(5):e0285996. pmid:37200281
  19. Englbrecht F, Ruider IE, Bausch AR. Automatic image annotation for fluorescent cell nuclei segmentation. PLOS ONE. 2021;16(4):e0250093. pmid:33861785
  20. Kölle M, Walter V, Schmohl S, Soergel U. Remembering both the machine and the crowd when sampling points: active learning for semantic segmentation of ALS point clouds. In: Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10-15, 2021, Proceedings, Part VII. Springer; 2021. p. 505–520.
  21. Chauhan J, Varadarajan S, Srivastava MM. Semi-supervised Learning for Dense Object Detection in Retail Scenes. arXiv preprint arXiv:210702114. 2021;.
  22. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision. 2015;115(3):211–252.
  23. Qiao Y, Guo Y, He D. Cattle body detection based on YOLOv5-ASFF for precision livestock farming. Computers and Electronics in Agriculture. 2023;204:107579.
  24. Jaoukaew A, Suwansantisuk W, Kumhom P. Robust individual pig tracking. International Journal of Electrical and Computer Engineering (IJECE). 2024;14(1):279–293.
  25. Guo Q, Sun Y, Orsini C, Bolhuis JE, de Vlieg J, Bijma P, et al. Enhanced camera-based individual pig detection and tracking for smart pig farms. Computers and Electronics in Agriculture. 2023;211:108009.
  26. Van der Zande LE, Guzhva O, Rodenburg TB. Individual detection and tracking of group housed pigs in their home pen using computer vision. Frontiers in Animal Science. 2021;2:669312.
  27. Alameer A, Kyriazakis I, Bacardit J. Automated recognition of postures and drinking behaviour for the detection of compromised health in pigs. Scientific Reports. 2020;10(1):13665. pmid:32788633
  28. Zhang L, Gray H, Ye X, Collins L, Allinson N. Automatic individual pig detection and tracking in pig farms. Sensors. 2019;19(5):1188. pmid:30857169
  29. Li J, Chen D, Qi X, Li Z, Huang Y, Morris D, Tan X. Label-efficient learning in agriculture: A comprehensive review. Computers and Electronics in Agriculture. 2023;215:108412.
  30. Xie Q, Luong MT, Hovy E, Le QV. Self-training with noisy student improves imagenet classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 10687–10698.
  31. Keh SS. Semi-supervised noisy student pre-training on efficientnet architectures for plant pathology classification. arXiv preprint arXiv:2012.00332. 2020;.
  32. Duong LT, Tran TB, Le NH, Ngo VM, Nguyen PT. Automatic detection of weeds: synergy between EfficientNet and transfer learning to enhance the prediction accuracy. Soft Computing. 2021; p. 5029–5044.
  33. Liu Y, Ma C, He Z, Kuo C, Chen K, Zhang P, et al. Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480. 2021;.
  34. Wutke M, Heinrich F, Das PP, Lange A, Gentz M, Traulsen I, et al. Detecting animal contacts—A deep learning-based pig detection and tracking approach for the quantification of social contacts. Sensors. 2021;21(22):7512. pmid:34833588
  35. Wutke M, Schmitt AO, Traulsen I, Gültas M. Investigation of pig activity based on video data and semi-supervised neural networks. AgriEngineering. 2020;2(4):581–595.
  36. Wada K. Labelme: Image polygonal annotation with python. Cambridge. 2016.
  37. Wutke M. SowOrientation: An Image Dataset for the Automated Detection of Sow Body Parts and Newborn Piglets. https://doi.org/10.57892/100-70
  38. Scudder H. Probability of error of some adaptive pattern-recognition machines. IEEE Transactions on Information Theory. 1965;11(3):363–371.
  39. Chen Z, Li Z, Wang S, Fu D, Zhao F. Learning from Noisy Data for Semi-Supervised 3D Object Detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2023. p. 6929–6939.
  40. Xu H, Liu F, Zhou Q, Hao J, Cao Z, Feng Z, et al. Semi-supervised 3d object detection via adaptive pseudo-labeling. In: 2021 IEEE International Conference on Image Processing (ICIP). IEEE; 2021. p. 3183–3187.
  41. Sohn K, Zhang Z, Li CL, Zhang H, Lee CY, Pfister T. A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:200504757. 2020;.
  42. Park HJ, Zhu P, Moreno IL, Subrahmanya N. Noisy student-teacher training for robust keyword spotting. arXiv preprint arXiv:210601604. 2021;.
  43. Park DS, Zhang Y, Jia Y, Han W, Chiu CC, Li B, et al. Improved noisy student training for automatic speech recognition. arXiv preprint arXiv:200509629. 2020;.
  44. Wang Z, Giri R, Isik U, Valin JM, Krishnaswamy A. Semi-supervised singing voice separation with noisy self-training. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2021. p. 31–35.
  45. Mehmood H, Dobrowolska A, Saravanan K, Ozay M. FedNST: Federated Noisy Student Training for Automatic Speech Recognition. arXiv preprint arXiv:220602797. 2022;.
  46. Chen Y, Ding W, Lai J. Improving Noisy Student Training on Non-Target Domain Data for Automatic Speech Recognition. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2023. p. 1–5.
  47. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009. p. 248–255.
  48. Ouali Y, Hudelot C, Tami M. An overview of deep semi-supervised learning. arXiv preprint arXiv:200605278. 2020;.
  49. Liu Y, Lim H, Xie L. Exploration of chemical space with partial labeled noisy student self-training and self-supervised graph embedding. BMC Bioinformatics. 2022;23(3):1–21. pmid:35501680
  50. Beck N, Killamsetty K, Kothawade S, Iyer R. Beyond active learning: Leveraging the full potential of human interaction via auto-labeling, human correction, and human verification. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; 2024. p. 2881–2889.
  51. Weber V, Piovano E, Bradford M. It is better to Verify: Semi-Supervised Learning with a human in the loop for large-scale NLU models. In: Proceedings of the Second Workshop on Data Science with Human in the Loop: Language Advances; 2021. p. 8–15.
  52. Jakubik J, Weber D, Hemmer P, Vössing M, Satzger G. Improving the Efficiency of Human-in-the-Loop Systems: Adding Artificial to Human Experts. arXiv preprint arXiv:230703003. 2023;.
  53. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research. 2014;15(1):1929–1958.
  54. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:171009412. 2017;.
  55. Simard P, LeCun Y, Denker J, Victorri B. Transformation invariance in pattern recognition—tangent distance and tangent propagation. In: Neural Networks: Tricks of the Trade. Springer; 2002. p. 239–274.
  56. Dingeto H, Kim J. Universal Adversarial Training Using Auxiliary Conditional Generative Model-Based Adversarial Attack Generation. Applied Sciences. 2023;13(15):8830.
  57. Zhang Z, He T, Zhang H, Zhang Z, Xie J, Li M. Bag of freebies for training object detection neural networks. arXiv preprint arXiv:1902.04103. 2019;.
  58. Jocher G, Chaurasia A, Qiu J. YOLO by Ultralytics. URL: https://github.com/ultralytics/ultralytics. 2023;.
  59. Wang G, Chen Y, An P, Hong H, Hu J, Huang T. UAV-YOLOv8: a small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors. 2023;23(16):7190. pmid:37631727
  60. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 779–788.
  61. Terven J, Cordova-Esparza D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. arXiv preprint arXiv:230400501. 2023;.
  62. Xiao B, Nguyen M, Yan WQ. Fruit ripeness identification using YOLOv8 model. Multimedia Tools and Applications. 2023; p. 1–18.
  63. Zhang M, Wang Z, Song W, Zhao D, Zhao H. Efficient Small-Object Detection in Underwater Images Using the Enhanced YOLOv8 Network. Applied Sciences. 2024;14(3):1095.
  64. Zhai X, Huang Z, Li T, Liu H, Wang S. YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection. Electronics. 2023;12(17):3664.
  65. Wang Z, Liu Y, Duan S, Pan H. An efficient detection of non-standard miner behavior using improved YOLOv8. Computers and Electrical Engineering. 2023;112:109021.
  66. Le HB, Kim TD, Ha MH, Tran ALQ, Nguyen DT, Dinh XM. Robust Surgical Tool Detection in Laparoscopic Surgery using YOLOv8 Model. In: 2023 International Conference on System Science and Engineering (ICSSE). IEEE; 2023. p. 537–542.
  67. Liu R, Liu T, Dan T, Yang S, Li Y, Luo B, et al. AIDMAN: An AI-based object detection system for malaria diagnosis from smartphone thin-blood-smear images. Patterns. 2023;4(9). pmid:37720337
  68. Ros D, Dai R. A Flexible Fall Detection Framework Based on Object Detection and Motion Analysis. In: 2023 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE; 2023. p. 063–068.
  69. Inoue T, Maki S, Furuya T, Mikami Y, Mizutani M, Takada I, et al. Automated fracture screening using an object detection algorithm on whole-body trauma computed tomography. Scientific Reports. 2022;12(1):16549. pmid:36192521
  70. Naing KM, Boonsang S, Chuwongin S, Kittichai V, Tongloy T, Prommongkol S, et al. Automatic recognition of parasitic products in stool examination using object detection approach. PeerJ Computer Science. 2022;8:e1065. pmid:36092001
  71. Powers DM. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:201016061. 2020;.
  72. Doornweerd J, Veerkamp R, de Klerk B, van der Sluis M, Bouwman A, Ellen E, et al. Tracking individual broilers on video in terms of time and distance. Poultry Science. 2024;103(1):103185. pmid:37980741
  73. Cai J, Cai H. Robust hybrid approach of vision-based tracking and radio-based identification and localization for 3D tracking of multiple construction workers. Journal of Computing in Civil Engineering. 2020;34(4):04020021.
  74. Sharma N, Baral S, Paing MP, Chawuthai R. Parking time violation tracking using yolov8 and tracking algorithms. Sensors. 2023;23(13):5843. pmid:37447693
  75. Huang E, Mao A, Gan H, Ceballos MC, Parsons TD, Xue Y, et al. Center clustering network improves piglet counting under occlusion. Computers and Electronics in Agriculture. 2021;189:106417.
  76. Oczak M, Maschat K, Berckmans D, Vranken E, Baumgartner J. Automatic estimation of number of piglets in a pen during farrowing, using image analysis. Biosystems Engineering. 2016;151:81–89.
  77. Kumar A, Varanasi S, Mital U, Patra D, Gajera A. Empirical Study of the Impact of Image Quality, Object Size, and Occlusion to Object Detection. EasyChair; 2023.
  78. Kang K, Li H, Xiao T, Ouyang W, Yan J, Liu X, et al. Object detection in videos with tubelet proposal networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 727–735.
  79. Wang A, Sun Y, Kortylewski A, Yuille AL. Robust object detection under occlusion with context-aware compositionalnets. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. p. 12645–12654.
  80. Tseng G, Sinkovics K, Watsham T, Rolnick D, Walters T. Semi-Supervised Object Detection for Agriculture. 2nd AAAI Workshop on AI for Agriculture and Food Systems. 2023;.
  81. Zhang L, Ma K. Structured knowledge distillation for accurate and efficient object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019;.
  82. Zhu P, Wen L, Du D, Bian X, Ling H, Hu Q, et al. Visdrone-det2018: The vision meets drone object detection in image challenge results. Proceedings of the European Conference on Computer Vision (ECCV) Workshops. 2018;.
  83. Zhu Y, Zhou Q, Liu N, Xu Z, Ou Z, Mou X, et al. ScaleKD: Distilling scale-aware knowledge in small object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023. p. 19723–19733.
  84. TierSchNutztV. Verordnung zum Schutz landwirtschaftlicher Nutztiere und anderer zur Erzeugung tierischer Produkte gehaltener Tiere bei ihrer Haltung, §30 (2b). online available: https://www.gesetze-im-internet.de/tierschnutztv/BJNR275800001.html (accessed 25th July 2024).