Abstract
Background
In assisted reproductive technology, evaluating the quality of the embryo is crucial when selecting the most viable embryo for transfer to a woman. Assessment also plays an important role in determining the optimal transfer time, either at the cleavage stage or at the blastocyst stage. Several AI-based tools exist to automate the assessment process. However, none of the existing tools predicts upcoming video frames to assist embryologists in the early assessment of embryos. In this paper, we propose an AI system to forecast the dynamics of embryo morphology over a time period in the future.
Methods
The AI system is designed to analyze embryo development over the past two hours and predict the morphological changes of the embryo for the next two hours. It utilizes a novel predictive model incorporating Convolutional LSTM layers for recursive forecasting, analyzing prior changes in the video sequence to predict embryo development up to 23 hours ahead.
Results
The results demonstrated that the AI system could accurately forecast embryo development at the cleavage stage on day 2 and the blastocyst stage on day 4. The system provided valuable information on the cell division processes on day 2 and the start of the blastocyst stage on day 4. The system focused on specific developmental features effective across both categories of embryos: those transferred to the female and those discarded. However, in the ‘transfer’ category, the forecast had a clearer cell membrane and less distortion compared to the ‘avoid’ category.
Conclusion
This study assists the embryo evaluation process by providing early insights into embryo quality for both the transfer and avoid categories of videos. Embryologists recognized the ability of the forecasts to depict the morphological changes of the embryo. Additionally, enhancement of image quality has the potential to make this approach relevant in clinical settings.
Citation: Sharma A, Dorobantiu A, Ali S, Iliceto M, Stensen MH, Delbarre E, et al. (2025) Deep learning methods to forecasting human embryo development in time-lapse videos. PLoS One 20(9): e0330924. https://doi.org/10.1371/journal.pone.0330924
Editor: Sanaz Alaeejahromi, Shiraz University of Medical Sciences, IRAN, ISLAMIC REPUBLIC OF
Received: April 30, 2024; Accepted: August 7, 2025; Published: September 2, 2025
Copyright: © 2025 Sharma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data are from Norwegian patients and contain potentially identifying or sensitive information. Restrictions have been imposed by: https://rekportalen.no/#hjem/home and https://sikt.no/en/find-data. Institutional body to which data requests may be sent: Trine B. Haugen (trine.b.haugen@oslomet.no).
Funding: Norwegian Research Council (FRIMEDBIO): 288727. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The advancement of assisted reproductive technology (ART), including procedures like in-vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI), has provided new possibilities for treating infertility and enhancing the likelihood of conceiving a child with medical treatment [1,2]. In ART, eggs are retrieved from a woman, fertilized outside the body and the embryos are cultured in incubators. An ‘embryo’ represents the initial stage of human development [3] and is characterized by a sequence of cell divisions referred to as embryo cell stages. Fig 1 shows the development of an embryo spanning across different embryo cell stages.
After fertilization, embryo development spans across different cell stages, morula and blastocyst. The development is grouped into days. The blastocyst stage is magnified to depict the inner cell mass.
Each division occurs over a time span of 12 to 24 hours, and this progression is measured in hours post insemination (hpi). Initially, the embryo undergoes its first division, resulting in two cells, which then further divide into more cells. As the embryo develops, the cells gradually start to compact together, forming a compact mass known as a morula. Later, these cells begin to differentiate and evolve into a more specific structure called a blastocyst, characterized by distinctive features such as inner cell mass and trophectoderm.
Embryos are typically cultured in an incubator until the fifth day of embryo development. Then, an embryo is either transferred to the uterus for implantation, cryopreserved for subsequent transfer, or discarded (avoid). This process is referred to as ‘embryo evaluation’ and is conducted by embryologists after examining time-lapse videos capturing the progression of embryo development during the incubation period [4]. The embryologists base their assessment on several factors, including the dynamics of change in embryo morphology [5,6], the time interval between adjacent cell stages [7,8] and the start of the blastocyst stage [9,10]. These parameters indicate the implantation potential of an embryo.
Time-lapse incubators facilitate the assessment process by providing embryologists with nearly continuous monitoring of embryo development through frequent image acquisition [11]. These images can be collated into a time-lapse video depicting the entire development process. Traditionally, embryologists manually analyze these videos, a method that is often time-intensive and subjective [12]. To enhance efficiency, various artificial intelligence (AI) algorithms and tools are used within fertility clinics [13]. These AI systems assist embryologists by automating various tasks such as the annotation of cell stages [14,15], scoring and grading morphology stages [16], embryo selection [4], and the prediction of implantation potential and live-birth outcomes [17–19].
Most published AI-assisted approaches for embryo evaluation and selection focus on specific time points to assess implantation potential. Studies using classification-based networks to predict blastocyst formation have relied on static images at 70 hpi [20], 113 hpi [21], or 116 hpi [22]. An LSTM-based approach has also been used for predicting blastocyst development based on cytoplasm movement at 42 hpi [23]. While these studies focused on single time points in embryo development, another study assessed implantation potential using six morphokinetic parameters: t2, t3, t4, t5 (times in hpi) and the time intervals t2–t3 and t3–t4 [24]. Here, ‘t2’ marks the start of the 2-cell stage, ‘t3’ the 3-cell stage, ‘t4’ the 4-cell stage and ‘t5’ the 5-cell stage. These AI-assisted approaches analyze either specific time points or parts of time-lapse videos to identify morphological patterns linked to embryo implantation potential. They can assess embryo quality at specific time points, but none can predict future biomarkers by analyzing the current progression of embryo morphology. A recently introduced AI-based software system, referred to as Cultivating Human Life through Optimal Embryos (Chloe), has claimed to predict embryo development. It combines multiple AI models, including CNNs for embryo morphology and patient data analysis, dynamic programming, temporal action segmentation, and Gardner scoring, all within an LSTM framework. Together, these models predict future cell division timings, blastocyst grading and implantation scoring. Predictions update automatically as embryo development progresses, addressing time discrepancies. However, clinical adoption requires software licenses, making it cost-intensive. Additionally, Chloe predicts cell division timings, such as ‘t2’, rather than generating future frames to visually depict embryo development.
Our framework aims to bridge this gap by producing upcoming frames in time-lapse videos, capturing developmental changes over subsequent hours.
Our approach forecasts embryo development over the following 12 to 23 hours, which allows for an earlier transfer of the embryo to the female. This allows for diminished epigenetic risks and a reduced workload for the embryologist. Specific studies have linked epigenetic risks with prolonged periods of embryo incubation [25,26], and an early assessment can reduce the impact of such modifications. Hence, in this study, we propose an AI model that predicts upcoming frames of time-lapse videos, covering developmental cell stages of the embryo from 31 to 43 hpi (day 2 of embryo development) and from 90 to 113 hpi (day 4 of embryo development). We focused the training and evaluation of our AI model on day 2 and day 4 of embryo development since existing research highlights the significance of cleavage stage transfer (day 2 to 3) and blastocyst stage transfer (day 4 to 6) [3,27]. By concentrating on these specific time intervals, we aligned our proposed system with embryo development stages that are relevant for transfer decisions in clinical settings. Furthermore, previous studies on similar AI models observed high-quality results around day 2 and day 4 [28]. Consequently, we chose embryo development activities on these particular days as the starting point for our study.
Our proposed AI system employs a forecasting strategy operating on an input video sequence of seven frames to forecast the subsequent seven frames, depicting potential changes in embryo morphology. Thus, the system uses the morphology dynamics of the past two hours to forecast two hours into the future. Upon forecasting the seventh frame, the input sequence shifts by one, incorporating a new frame, and continues to forecast the embryo’s subsequent development. In this way, the AI system provides a detailed and progressive analysis of the embryo’s development over a specific time period, starting from only the initial seven frames. The development of embryos differed significantly between those selected for transfer and those discarded, and the AI system effectively adjusted to these variations across different developmental cell stages of embryo morphology in both categories. To the best of our knowledge, this is the first study to use ConvLSTM for recursive video frame forecasting of future embryo morphology, moving beyond static timepoint analysis to a time-based prediction of developmental changes. The main contribution of this study lies in providing embryologists with a visual tool to observe the varying progression of embryo morphology in transfer and avoid videos over subsequent hours. This visualization facilitates the identification of key biomarkers important for evaluating embryo quality at an early stage of development.
Materials and methods
Ethics statement
The study adhered to the Regional Committee for Medical and Health Research Ethics – South East Norway (REK) and the General Data Protection Regulation, utilizing retrospective data gathered from the Volvat Spiren fertility clinic in Oslo between August 2013 and June 2019. Patient information was already in the clinic’s internal record system, and participants were informed about the study; however, an exemption from REK regarding active participant consent was obtained. The data, including embryo videos and patient information, were fully anonymized, de-identified, and used post-REK approval for training and evaluation purposes. The data were accessed for research purposes from January 2023 to November 2023.
Time lapse incubator
Time-lapse technology in human embryo incubation allows continuous monitoring of embryo development without affecting conditions like temperature, pH, and gas concentration [11,29]. A time-lapse incubator combines an IVF incubator with an integrated microscope and camera to capture pictures of embryo development every 5 to 20 minutes at several focal planes [30]. These images are then compiled into a video, which is used for assessing embryo viability and annotating parameters, such as the start of the blastocyst stage. In this study, we used a time-lapse system called Embryoscope™ by Vitrolife. The system is fitted with a plate referred to as an “embryoslide”, which hosts several wells for culturing embryos individually. When a well enters the microscope’s field of view, an image is acquired by the inbuilt camera. The system used a camera with an LED light source (under 635 nm) passing through Hoffman’s contrast modulation optics. For each embryo, the system captured 8-bit images with a resolution of 250x250 pixels on different focal planes (usually 3 or 5) at intervals of 7, 15, or 20 minutes. However, we considered images from the central focal plane only. Embryologists annotated each video for the embryo cell stage timings.
Data
The dataset used in this study consisted of 365 time-lapse videos. All 365 videos had a low rate of fragmentation, i.e., up to 15%; videos with a rate of fragmentation above 15% were excluded. Fragmentation refers to the cytoplasm content that remains outside the daughter cells during cell division, and the rate of fragmentation quantifies the percentage of such material within an embryo. We divided the 365 videos into two datasets. The first dataset comprised 220 videos corresponding to day 2 of embryo development, between 31 hpi and 43 hpi (12 hours), and is referred to as the ‘cells stage study’. The second dataset had the remaining 145 videos, representing day 4 of embryo development between 90 hpi and 113 hpi (23 hours), referred to as the ‘blastocyst study’.
Data preprocessing
We preprocessed the datasets in the cells stage study and the blastocyst study. As a first step of preprocessing, each video frame was resized from 250x250 to 128x128, normalized and additional image channels were stacked onto the frame. Fig 2 shows the data preprocessing workflow and stacking of the additional channels to the video frames.
A workflow diagram outlining the stacking of additional channels to the input frames of an embryo video. The channels represent development time (in hours post insemination) and cell count. Part a): Cells stage study: preprocessing a frame by stacking additional channels: cell count and time. Part b): Blastocyst study: preprocessing a frame by stacking only the time channel.
In the cells stage study, two additional channels represented the cell count and the embryo development time, reported in hpi. For the blastocyst study, the additional channel represented only the embryo development time; we omitted the cell count channel because cell compaction makes counting cells difficult. The information for these additional channels was derived from the embryologist annotations accompanying the time-lapse videos. We normalized the channel information before stacking it with the image channels.
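As a sketch of this preprocessing step, the channel stacking can be implemented as follows. The function name and the normalization constants (maximum hpi and cell count) are illustrative assumptions, not values taken from the study:

```python
import numpy as np

def stack_channels(frame, hpi, cell_count=None, max_hpi=113.0, max_cells=9.0):
    """Stack normalized time (and optionally cell count) channels onto a frame.

    frame: float array of shape (128, 128, 3), values in [0, 1].
    hpi: development time in hours post insemination.
    cell_count: annotated number of cells, or None (blastocyst study).
    The normalization constants are assumed maxima, not the study's values.
    """
    h, w, _ = frame.shape
    extra = [np.full((h, w, 1), hpi / max_hpi, dtype=frame.dtype)]
    if cell_count is not None:
        extra.append(np.full((h, w, 1), cell_count / max_cells, dtype=frame.dtype))
    return np.concatenate([frame] + extra, axis=-1)

frame = np.random.rand(128, 128, 3).astype(np.float32)
cells = stack_channels(frame, hpi=35.0, cell_count=4)  # cells stage study: 5 channels
blast = stack_channels(frame, hpi=95.0)                # blastocyst study: 4 channels
```

This yields the 128x128x5 input used for the cells stage study and the 128x128x4 input used for the blastocyst study.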
Problem definition
Forecasting involves predicting the future frame sequence of a time-lapse embryo video. For sequence prediction, the initial step is a single-frame prediction task, which aims to deduce the next video frame using the morphology from the previous frames. Let $X = (x_{t-F}, \ldots, x_{t-1})$ represent a video sequence from time $t-F$ to time $t-1$, where each term $x_i \in \mathbb{R}^{C \times H \times W}$ corresponds to a video frame with channels $C$, height $H$ and width $W$. The frame prediction model is responsible for predicting the next frame $x_t$. Let $\hat{x}_t$ represent our prediction of $x_t$, and let $\hat{X} = (x_{t-F+1}, \ldots, x_{t-1}, \hat{x}_t)$ denote the sequence shifted one time step with the prediction appended. Let $f_\theta$ define a parametric prediction model, providing a mapping function $f_\theta: X \mapsto \hat{x}_t$. The mapping function is optimized by minimizing the difference $\mathcal{L}(x_t, \hat{x}_t)$ between the observed and the predicted frame, where $\mathcal{L}$ is the binary cross entropy loss function.
In this study, we built a video frame prediction model using Convolutional LSTM [31] layers, which uniquely enables recursive generation of upcoming video frames of embryo development instead of relying on static timepoint predictions. The model architecture is explained in Section Video frame prediction model.
Convolutional LSTM
Recurrent neural networks, explained in S1 Appendix, introduced the concept of a hidden state for holding the network’s previous computations. The idea is refined by the LSTM, explained in S2 Appendix, which introduces cell states ct and information processing using gates. Convolutional LSTM (ConvLSTM) is an extension of the LSTM used for spatio-temporal prediction. Like the LSTM, the ConvLSTM passes information from the previous hidden state to the next frame of the video sequence, but the matrix multiplications defining the state-to-state transitions are replaced by local convolution operations defined on the spatial axes. The mathematical equations and technical details are provided in S3 Appendix. Let Xt be defined as a video frame; the ConvLSTM then considers the pixels of Xt as cells on a spatial grid and captures changes in pixel intensity across this grid. The ConvLSTM determines the future of a grid cell based on the current inputs and the past LSTM states (hidden state, cell state) in its local neighborhood. Assume Xt is the third frame of a video sequence, showing the embryo in the 4-cell stage, while the embryo was in the 2-cell stage in the first frame and the 3-cell stage in the second. The ConvLSTM then attempts to predict the next frame of the video (a 2D image matrix) based on the time-step progression of the image pixels’ convolutional feature vectors in frames one, two and three. The inner structure of the ConvLSTM capturing the temporal dependencies in a video sequence is shown in S1 Fig. We used the ConvLSTM to build the video frame prediction model because it has been widely applied for predicting upcoming frames from previous frames [32–35].
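To make the gate mechanics concrete, the following is a minimal NumPy sketch of a single ConvLSTM time step in the standard formulation (peephole connections are omitted for brevity, and all names and sizes are illustrative, not the FramePredictor’s actual parameters):

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded 2D cross-correlation (as in DL frameworks).
    x: (H, W, Cin), w: (k, k, Cin, Cout); returns (H, W, Cout)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(k):
        for j in range(k):
            # shifted window times kernel tap, summed over input channels
            out += xp[i:i + H, j:j + W, :] @ w[i, j]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One ConvLSTM time step: gates are computed by convolutions over the
    input frame and the previous hidden state on the spatial grid.

    x_t: (H, W, Cin) input frame; h_prev, c_prev: (H, W, Ch) states.
    W_x: (k, k, Cin, 4*Ch), W_h: (k, k, Ch, 4*Ch), b: (4*Ch,) for i, f, o, g.
    """
    ch = h_prev.shape[-1]
    z = conv2d_same(x_t, W_x) + conv2d_same(h_prev, W_h) + b
    i = sigmoid(z[..., 0 * ch:1 * ch])  # input gate
    f = sigmoid(z[..., 1 * ch:2 * ch])  # forget gate
    o = sigmoid(z[..., 2 * ch:3 * ch])  # output gate
    g = np.tanh(z[..., 3 * ch:4 * ch])  # candidate cell state
    c_t = f * c_prev + i * g            # new cell state on the spatial grid
    h_t = o * np.tanh(c_t)              # new hidden state
    return h_t, c_t

# Run a toy 3-frame sequence through the cell
rng = np.random.default_rng(0)
H = W = 8; cin, chid, k = 1, 4, 3
W_x = rng.normal(0, 0.1, (k, k, cin, 4 * chid))
W_h = rng.normal(0, 0.1, (k, k, chid, 4 * chid))
b = np.zeros(4 * chid)
h = c = np.zeros((H, W, chid))
for _ in range(3):
    h, c = convlstm_step(rng.random((H, W, cin)), h, c, W_x, W_h, b)
```

Because every gate is a convolution, each grid cell’s next state depends only on its local spatial neighborhood, which is what lets the ConvLSTM track the movement of cell membranes across frames.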
Video frame prediction model
We refer to our video prediction model as the ‘FramePredictor’ in the subsequent text. Fig 3 shows the architecture of the FramePredictor using a block diagram representation.
A sequential model with ConvLSTM layers followed by a batch normalization layer or a dropout layer. The input consists of a video sequence and the output comprises two images. The first output’s image dimension varies depending upon the study type (here it is cell study), but the dimension of the second output is constant across both studies.
The network has four ConvLSTM layers with ReLU activation. The layers have 32, 64, 128 and 128 filters, with kernel sizes of 7, 5, 3 and 1, respectively. The first three layers are followed by batch normalization and have ‘return sequence’ set to true. The last layer is followed by a dropout layer (rate = 0.25) and has ‘return sequence’ set to false. The ‘return sequence’ parameter in ConvLSTM layers determines the length of the output (number of video frames): if the parameter is true, the input and output sequences have the same length. As the FramePredictor predicts a single frame, ‘return sequence’ is set to false in the last ConvLSTM layer. The dropout layer’s output splits between two convolution layers, referred to as convHead1 and convHead2, as shown in Fig 3. Both convHead1 and convHead2 have a kernel size of 3 and employ sigmoid activation functions for their predictions.
For both the cells stage and blastocyst studies, we set F = 7 and the dimensions W = H = 128. However, the channel depth was set to C = 5 for the cells stage study and C = 4 for the blastocyst study. As a result, convHead1 predicts a frame with dimensions of 128x128x5 for the cells stage study and 128x128x4 for the blastocyst study, whereas convHead2 predicts a frame with dimensions of 128x128x3 in both studies. We used the Keras library for constructing the layers of the FramePredictor, and the architecture was inspired by the ConvLSTM model in the Keras code examples [36].
Training of the FramePredictor
For training of the FramePredictor, the frames of a video are rearranged in a specific manner. Consider a video consisting of 12 frames. As FramePredictor uses an input sequence of seven frames length, the video is reorganized into several subsequences with overlapping frames. The first subsequence consists of frames x1 to x7, with the x8 frame as the ground truth. The second subsequence comprises frames x2 to x8, with the x9 frame as the ground truth. This pattern continues, with each subsequent sequence shifting by one frame until the last frame is used as the ground truth for training the FramePredictor. In this example, the last frame is x12.
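The rearrangement above can be sketched as a sliding window over the frame axis (function name and toy shapes are illustrative):

```python
import numpy as np

def make_subsequences(video, seq_len=7):
    """Rearrange a video into overlapping (input, ground truth) training pairs.

    video: array of shape (T, H, W, C). Returns (inputs, targets) where
    inputs[k] holds frames x_{k+1}..x_{k+7} and targets[k] is frame x_{k+8}.
    """
    inputs, targets = [], []
    for start in range(len(video) - seq_len):
        inputs.append(video[start:start + seq_len])
        targets.append(video[start + seq_len])
    return np.stack(inputs), np.stack(targets)

# The 12-frame example from the text yields five pairs: (x1..x7 -> x8)
# through (x5..x11 -> x12).
video = np.zeros((12, 128, 128, 5))
X, y = make_subsequences(video)
```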
The FramePredictor predicted the embryo morphology in time-lapse videos for two specific periods: the cells stage study and the blastocyst study, as explained in Section Data and Section Data preprocessing. The datasets comprise two categories of videos: “transfer videos” of embryos transferred to females and “avoid videos” of embryos discarded by embryologists. We trained the FramePredictor separately on transfer and avoid videos, resulting in four distinct models for the prediction tasks: cells stage study with avoid videos (studyCA), cells stage study with transfer videos (studyCT), blastocyst study with avoid videos (studyBA), and blastocyst study with transfer videos (studyBT). Each study’s dataset was further divided, with detailed descriptions provided in the following sections.
Dataset: Cells stage study.
Within this set, there were 110 transfer videos and 110 avoid videos. Among these 220 videos, 94 had a frame rate of 3 frames per hour, while 126 had a frame rate of 4 frames per hour. In all 220 videos, the first video frame always had the embryo in the 2-cell stage. We describe the exact cell stage data distribution within the dataset in S2 Fig.
We split the 110 avoid videos into two subsets: 100 videos for training studyCA, referred to as the trainCA set (train cells avoid), and the remaining 10 videos, referred to as the evalCA set (evaluation cells avoid), used as an independent dataset for assessing studyCA. Similarly, the transfer videos were divided into two groups: 100 for training studyCT, referred to as the trainCT set (train cells transfer), and the remaining 10, referred to as the evalCT set (evaluation cells transfer), used for the independent evaluation of studyCT. Here, ‘independent dataset’ refers to videos that were not used in the model’s training or validation.
Dataset: Blastocyst study.
This dataset contained 71 transfer videos and 74 avoid videos, each with a frame rate of 8 frames per hour. The avoid videos started with embryos in the 9+ cell stage, whereas the transfer videos started with embryos at the start of cell compaction. We detail the distribution of embryo development stages across this dataset in S2 Fig.
Of the 74 avoid videos, 66 videos, referred to as the trainBA set (train blastocyst avoid), were used for training studyBA. The remaining 8 videos, referred to as the evalBA set (evaluation blastocyst avoid), were used as an independent dataset for evaluating studyBA. The 71 transfer videos were divided into two sets: 64 videos for training studyBT, referred to as the trainBT set (train blastocyst transfer), and 7 videos for evaluating studyBT, referred to as the evalBT set (evaluation blastocyst transfer).
Training hyperparameters.
The datasets trainCA, trainCT, trainBA and trainBT were divided into a ratio of 80:20 for training and validation testing of studyCA, studyCT, studyBA and studyBT respectively.
For the cells stage study, we processed input sequences of size 7x128x128x5, producing two output images with sizes 128x128x5 and 128x128x3. The training involved 50 epochs with a batch size of 2, utilizing the Adam optimizer with a learning rate of 10^-4 and binary cross entropy as the loss function. During training, we saved the model based on the highest validation accuracy, specifically for predicting the 128x128x3 images. For the blastocyst study, the input dimensions were 7x128x128x4 and the output consisted of two images with dimensions of 128x128x4 and 128x128x3. The training involved 35 epochs, with the other parameters the same as in the cells stage study.
Embryo cropper
The predicted video frames have an embryoslide well visible in the background. However, because our primary focus is on the embryos (foreground), we removed the embryoslide wells from both the original and predicted video frames during the evaluation of the FramePredictor. Since we made many predictions, we automated the cropping by training a U-Net model, referred to as the ‘embryo cropper’ for generating the masks to separate embryos from the background. Fig 4 shows an example of a video frame after post-processing with the embryo cropper.
Part a): A frame extracted from a time-lapse embryo video. Part b): Embryo cropper locates the embryo region and predicts the segmentation mask. Part c): Embryo cropper mattes the mask (b) to the image (a) for extracting the embryo region.
Dataset: Embryo cropper.
Based on empirical testing, we used 1994 embryo images (video frames), extracted at random time intervals from the 365 videos (Section S1 Data). This dataset was referred to as the ‘trainCrop’ (train embryo cropper) dataset. Within the dataset, the distribution of embryo cell stages was as follows: 250 images each for the 2-cell, 3-cell, 4-cell, 9+ cell, start of compaction, morula, start of blastocyst stages, and 244 images for the full blastocyst stage. We used LabelBox [37] to annotate the dataset for obtaining the segmentation masks used for training the embryo cropper.
Training: Embryo cropper.
We used the standard vanilla U-Net architecture and trained it on the trainCrop dataset. The input image size was set to 128x128x3, and we applied both horizontal and vertical flipping as data augmentation. The dataset was divided into training and validation sets at a ratio of 80:20. The batch size was 32 and the number of epochs 60. We used the Adam optimizer with a learning rate of 10^-3 and sparse categorical cross entropy as the loss function. The trained embryo cropper generates a mask to crop out the embryo region from a video frame. The same mask is matted with the original frame (ground truth) and the predicted frame to extract the embryo region. We used the vision transformer ViTMatte [38,39] for the matting process.
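The masking step itself reduces to element-wise multiplication of the frame with the predicted mask. A minimal sketch with a synthetic circular mask follows; the study uses a trained U-Net and ViTMatte rather than this toy mask:

```python
import numpy as np

def crop_embryo(frame, mask):
    """Apply a binary segmentation mask to a frame, keeping the embryo region.

    frame: (H, W, C) image; mask: (H, W) array with 1 inside the embryo.
    Background pixels (embryoslide well) are zeroed out.
    """
    return frame * mask[..., None]

# Toy example: a circular 'embryo' mask on a 128x128 frame
yy, xx = np.mgrid[:128, :128]
mask = ((yy - 64) ** 2 + (xx - 64) ** 2 < 40 ** 2).astype(np.float32)
frame = np.ones((128, 128, 3), dtype=np.float32)
cropped = crop_embryo(frame, mask)
```

The same mask is applied to both the ground-truth and the predicted frame so that the evaluation metrics compare only the embryo regions.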
Forecasting embryo development
For a video sequence as input, the FramePredictor can predict only the single frame immediately following the last input frame. However, the aim of this study is to forecast embryo development several hours into the future, not just a single frame. Hence, we devised two forecasting strategies based on the predictions from the FramePredictor, referred to as ‘forecasting till the end’ and ‘forecasting the next 7 frames’. Fig 5 shows a schematic representation of the workflow for both forecasting strategies.
Consider an embryo video with 15 frames. Each frame is referenced by its location in the video: frame $x_1$ becomes 1 and a predicted frame becomes $\hat{x}$. ‘Forecasting till the end’ uses the first seven frames to forecast all remaining frames ($\hat{x}_8$ to $\hat{x}_{15}$): the strategy appends $\hat{x}_8$ to the input for forecasting $\hat{x}_9$ and continues in this fashion. ‘Forecasting the next 7 frames’ uses the first seven frames but forecasts only the next seven frames ($\hat{x}_8$ to $\hat{x}_{14}$). Later, when frame 8 is received from the embryo video, the strategy uses frames 2 to 8 to update the forecasts of frames $\hat{x}_9$ to $\hat{x}_{14}$ and, in addition, makes a forecast of frame $\hat{x}_{15}$.
The frame sequences from a video are reorganized into subsequences, following a similar rearrangement scheme as explained in Section Training of the FramePredictor. The rearrangement is conducted irrespective of the chosen forecasting strategy.
In the forecasting till the end strategy, we modify the input sequences by including the predicted frames within the sequence itself. This modified input is then used by the FramePredictor to forecast subsequent video frames. For example, for an input sequence with frames $x_1, x_2, x_3, \ldots, x_6, x_7$, the FramePredictor predicts the next frame $\hat{x}_8$. The forecasting scheme then transforms the input sequence into $x_2, x_3, x_4, \ldots, x_7, \hat{x}_8$ for predicting $\hat{x}_9$. The process continues with the predictions ($\hat{x}_{10}$, $\hat{x}_{11}$, and so on) until the end of both studies is reached.
The forecasting the next 7 frames strategy involves predicting only the subsequent seven frames. The approach is based on the idea that with each new frame obtained from the incubator for a time-lapse video, the system updates its forecast for the next seven frames. Given an initial sequence of frames ($x_1, x_2, x_3, \ldots, x_6, x_7$), the strategy forecasts the upcoming seven frames ($\hat{x}_8, \ldots, \hat{x}_{14}$). After predicting $\hat{x}_{14}$, the prediction is paused until frame $x_8$ is acquired; the pause period depends on the video’s frame rate. The input sequence then changes to $x_2, x_3, x_4, \ldots, x_7, x_8$. Using the modified input sequence, the forecasting process continues by updating the forecasts of $\hat{x}_9$ to $\hat{x}_{14}$ and, in addition, making the forecast $\hat{x}_{15}$. The scheme continues to forecast until the end of the studies is reached.
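Both strategies can be sketched with a stub predictor standing in for the trained FramePredictor; the averaging predictor and the 15-frame toy example are illustrative assumptions:

```python
import numpy as np

def forecast_till_end(frames, predict, total_len):
    """'Forecasting till the end': feed each prediction back as input.

    frames: list with the first 7 observed frames; predict: model mapping a
    7-frame sequence to the next frame; total_len: target video length.
    """
    seq = list(frames)
    while len(seq) < total_len:
        seq.append(predict(seq[-7:]))  # predicted frame re-enters the window
    return seq

def forecast_next_7(observed, predict):
    """'Forecasting the next 7 frames': forecast a rolling 7-frame horizon
    from the most recent observed frames only."""
    window = list(observed)[-7:]
    horizon = []
    for _ in range(7):
        nxt = predict(window[-7:])
        window.append(nxt)
        horizon.append(nxt)
    return horizon

# Stub predictor: averages the current 7-frame window
predict = lambda window: np.mean(window, axis=0)
first7 = [np.full((4, 4), float(i)) for i in range(7)]
full = forecast_till_end(first7, predict, total_len=15)
horizon = forecast_next_7(first7, predict)
```

When frame $x_8$ arrives from the incubator, the second strategy simply repeats the call on frames 2 to 8, refreshing the horizon.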
Performance metrics
To evaluate the proposed system, we use state-of-the-art video quality evaluation metrics. The metrics included Peak Signal to Noise Ratio (PSNR), structural similarity index measure (SSIM) [40] and Fréchet Video Distance (FVD) [41]. We use SSIM and PSNR because of their effectiveness in measuring the deviation of the predicted sequence from the baseline ground truth sequence given a specific scenario [41]. In our case, the scenario is related to differences in embryo morphology. Essentially, these metrics penalize predictions that deviate from the ground truth [42]. Thus, we could effectively quantify the degree of divergence in the predicted embryo morphology. We used scikit-image [43] for calculating SSIM and PSNR.
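For reference, PSNR reduces to a simple function of the mean squared error between the two frames. A minimal NumPy sketch follows; the study itself used the scikit-image implementations:

```python
import numpy as np

def psnr(gt, pred, data_range=1.0):
    """Peak Signal to Noise Ratio between a ground-truth and predicted frame.

    Higher is better; identical images give infinity. data_range is the
    maximum possible pixel value (1.0 for normalized images).
    """
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10((data_range ** 2) / mse)

gt = np.zeros((128, 128, 3))
noisy = gt + 0.1  # uniform error of 0.1 -> MSE = 0.01 -> PSNR = 20 dB
score = psnr(gt, noisy)
```

SSIM additionally compares local luminance, contrast and structure statistics, which is why the two metrics are reported together.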
The FVD score extends the underlying concept of the Fréchet Inception Distance [44]. The FVD metric takes into account the temporal consistency between two videos (ground truth and predicted) in addition to measuring video quality [41]; a lower FVD score is indicative of higher video quality. By calculating FVD scores, we investigated the maximum duration for which the dynamics of the predicted sequence align with the ground truth.
Results
We first evaluated the performance of the FramePredictor on single-frame prediction and then evaluated it combined with the two forecasting strategies. Below, we present the evaluation results for both studies: the cells stage study and the blastocyst study.
Evaluation of FramePredictor
We evaluated the performance on studyCA, studyCT, studyBA, and studyBT using both validation testing sets (from trainCA, trainCT, trainBA and trainBT) and independent sets (evalCA, evalCT, evalBA and evalBT). We computed PSNR and SSIM metrics on a per-frame basis and averaged them, but the FVD score was computed for sequences of frames. We based metric calculations on the predictions with a dimension of 128x128x3. For evaluation, we reorganized video sequences into sub-sequences using the sliding window technique outlined in Section Training of the FramePredictor.
The per-frame evaluation results are reported in Table 1, which includes outcomes before and after applying the embryo cropper post-processing. We observed consistent PSNR and SSIM values across both the validation and independent sets for the transfer and avoid videos, except in the avoid category of the blastocyst study, where performance dropped from the validation to the independent set, indicating the FramePredictor’s limited generalization for studyBA. For transfer videos, both PSNR and SSIM reported slightly higher values in comparison to avoid videos. The metric values increased further after post-processing with the embryo cropper, suggesting that the FramePredictor focuses on relevant embryo features to predict morphological changes. The highest performance was observed in the transfer videos of the independent set following embryo cropper post-processing.
We assessed development using the FVD score at three different time intervals for the cells stage study: 2–3 hours with 10 frames, 4–5 hours with 14 frames, and 7–10 hours with 28 frames. In the blastocyst study, we analyzed videos trimmed to lengths of 60 frames (8 hours), 120 frames (15 hours), and 180 frames (22 hours). The durations were derived from the videos’ frame rate. We calculated the FVD score only on the independent sets, because the validation set consists of frame sequences that cover partial segments of different videos; these sequences do not provide a full video context and are therefore unsuitable for this evaluation.
The FVD score at different video lengths is reported in Table 2. The table also includes the results after the predictions were post-processed with the embryo cropper.
We observed that FVD scores decreased as the video sequences became longer, indicating an improved prediction quality with a higher coverage of embryo development. Thus, the FramePredictor accurately predicts the embryo’s development over longer periods. The highest performance was observed in the 180-frame sequences of the blastocyst study. For the cells stage study, the best FVD score was in the 28-frame sequences for transfer videos and in the 14-frame sequences for avoid videos. Post-processing with the embryo cropper led to a significant decrease in FVD scores. This suggested that the background artifacts might have contributed to the context mismatch between predictions and ground truth.
Fig 6 shows a video sequence predicted by FramePredictor in the transfer category for both the cells stage and blastocyst studies.
a) A transfer video from the independent set (evalCT) in the cells stage study b) A transfer video from the independent set (evalBT) in the blastocyst stage study. The actual (ground truth) frames are on the left, while the predicted frames are on the right, with time points highlighted in blue and reported in hours post insemination.
Upon assessing the image quality of the predictions, we observed that the visual quality of the predicted frames was lower than that of the ground truth frames across both studies. By “quality,” we refer to embryo images with clear cell membranes and an absence of noise or artifacts. For the cells stage study, predicted frames sometimes contained artifacts such as gray-scale color distortions in the background or around the zona pellucida, the circular ring surrounding the embryo region. For the blastocyst study, the initial predictions contained gray-scale color distortions in the background. Blurred cell membranes were another issue. For the cells stage study, the cell membranes were always visible, but the blurriness and artifacts reduced the overall visual appeal of the predictions. These image quality issues were more pronounced for predictions in the avoid category. We present further examples of the FramePredictor’s predictions for the cells stage study in S3 Fig and for the blastocyst study in S4 Fig.
Regardless of the video category, the image quality at the beginning was much lower when compared to other segments of the video. The quality improved significantly towards the end segment of the video.
The quantitative evaluation metrics for the FramePredictor can be directly linked to its clinical utility in embryo assessment. Higher PSNR (greater than 24) and SSIM (greater than 0.83) values in the transfer category suggest its effectiveness in feature detection during embryo selection. For example, these metrics enable embryologists to assess cell symmetry through visualization of cell boundaries (PSNR) while ensuring the preservation of the embryo’s morphological structure (SSIM), helping to identify anomalies in embryo developmental patterns. Additionally, a lower FVD score (below 800 for the transfer category) signifies more accurate predictions of cell division timing in the forecasted sequence, a key parameter in embryo quality assessment.
Assessment by the embryologists.
The predicted frames for the cells stage study and the blastocyst study were assessed by four embryologists. According to their assessment, predictions for transfer videos provided enhanced visibility of embryo morphology compared to avoid videos. A few predicted frames in the transfer category contained structures similar to cell nuclei. However, validating these predictions for clinical use is hindered by the gray-scale distortions around the zona pellucida and the difficulty in distinguishing individual cells due to blurred cell membranes. Nevertheless, the predictions were effective in signaling the beginning of the blastocyst stage.
Clinical utility of the results.
An embryologist selects the highest-quality embryo from a local cohort based on embryos’ developmental profiles. If an AI model can replicate this process accurately, it holds significant clinical value. FramePredictor’s performance assessment on transfer videos showed enhanced visibility of embryo morphological structures, with cell nuclei visible in several cases. This clinical relevance extends beyond morphology prediction:
- – Timing predictions for important events: FramePredictor forecasted critical developmental points on day 2 and the onset of blastulation on day 4, aiding embryologists in selecting embryos with higher implantation potential [3,27].
- – Optimized Transfer Timing: By predicting embryo development 12-23 hours ahead, embryologists can make earlier transfer decisions.
- – Improved Embryo Selection: FramePredictor differentiates between transfer and avoid embryos, with transfer-category forecasts showing clearer cell membranes, aligning with established quality markers [5,9].
- – Early Identification of Poor-Quality Embryos: By detecting poor developmental trajectories, the model optimizes fertility clinic resources.
Evaluation of the forecasting strategies
We re-evaluated the FramePredictor on independent sets using the exact settings as described in Section Evaluation of FramePredictor, but the predictions were recursively utilized by the forecasting strategies to further predict the embryo’s development. Below, we present evaluation results for the FramePredictor with both forecasting strategies.
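The recursive reuse of predictions underlying both strategies can be sketched as follows. Here `model` stands in for the trained FramePredictor (any callable mapping seven frames to the next seven); the function name and array shapes are illustrative, not our exact implementation:

```python
import numpy as np

def forecast(model, seed_frames, total_frames, chunk=7):
    """Recursive forecasting: the model maps 7 frames -> next 7 frames,
    and its own predictions are fed back in as input.

    - 'Forecasting till the end': call once, with total_frames set to
      the remaining video length.
    - 'Forecasting the next 7 frames': call with total_frames=chunk each
      time a new window of real frames becomes available.
    """
    window = list(seed_frames)            # most recent frames seen
    out = []
    while len(out) < total_frames:
        pred = model(np.stack(window[-chunk:]))
        out.extend(pred)                  # keep the predictions...
        window.extend(pred)               # ...and recurse on them
    return np.stack(out[:total_frames])
```

The difference between the strategies is therefore only in how far the recursion runs before fresh ground-truth frames replace the predicted ones, which is why errors accumulate more in the 'Forecasting till the end' setting.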
Forecasting till the end.
The evaluation of the ‘Forecasting till the end’ strategy is reported in Table 3, which includes outcomes before and after applying the embryo cropper post-processing. The cells stage study reported higher PSNR and SSIM values, whereas the blastocyst study reported lower FVD scores. The strategy performed better for transfer videos than for avoid videos. Post-processing the predictions with the embryo cropper improved the metric values across all datasets.
Forecasting the next 7 frames.
The evaluation results of the ‘Forecasting the next 7 frames’ strategy are reported in Table 4. The table also includes the results after the predictions were post-processed with the embryo cropper. Metric analysis revealed that the ‘Forecasting the next 7 frames’ strategy outperformed the ‘Forecasting till the end’ approach, with higher PSNR and SSIM values and a lower FVD score. This is as expected, since forecasting to the end of the video is a more challenging problem than forecasting only the next seven frames.
The performance of the ‘Forecasting the next 7 frames’ strategy improved significantly after applying embryo cropper post-processing, with the highest metric scores recorded post-processing in both the cells stage and blastocyst studies. This enhancement was consistent across both transfer and avoid videos. Additionally, the forecasting strategy performed better for transfer videos.
Fig 7 shows the ‘Forecasting the next 7 frames’ strategy’s predicted sequence for the transfer category in the cells stage and blastocyst study.
a) A transfer video from the independent set (evalCT) in the cells stage study b) A transfer video from the independent set (evalBT) in the blastocyst stage study. The actual (ground truth) frames are on the left, while the predicted frames are on the right, with time points highlighted in blue and reported in hours post insemination.
Since the forecasting approach used predictions from the FramePredictor as input, there were noticeable grayscale distortions in the forecast, and the blurriness of cell membranes was more pronounced. Furthermore, we compared the forecasted embryo development against the embryologists’ annotations of embryo development in the ground truth videos. The forecasting strategy typically had an average delay of 1 to 1.5 hours in predicting transitions between cell stages.
The evaluation metrics for our forecasting strategies demonstrate different clinical applications in ART. The consistently higher metrics for transfer versus avoid embryos across both strategies align with the clinical goal of selecting embryos with higher implantation potential. For the ‘Forecasting the next 7 frames’ strategy, a PSNR greater than 17 and an SSIM greater than 0.75 were sufficient for reliable identification of cell division patterns. In addition, an FVD score below 800 represents accurate detection of the blastocoel, an important parameter for predicting the start of the blastocyst stage.
Assessment by the embryologists.
The forecast from the ‘Forecasting the next 7 frames’ strategy was also evaluated by the same embryologists. Noise such as artifacts, distortions, and blurred cell membranes made it challenging for them to confidently determine the beginning of cell stages in the forecasts, although transfer videos were clearer than avoid videos. Embryologists could observe changes within the embryo in the forecasted sequences, but noise-free predictions are necessary for clinical validation. In the cells stage study, forecasts between 36 and 38 hpi and between 40 and 41 hpi were exceptionally clear. In the blastocyst study, the start of the blastocyst could always be accurately identified, but the transition to a full blastocyst with an inner cell mass was less clear. For avoid videos, noise and blurriness prematurely suggested cell or blastocyst degeneration, and cells were sometimes mistaken for large fragments, but correct differentiation was possible upon reviewing a longer portion of the forecast.
Clinical utility of the results.
The evaluation of our forecasting strategies reveals a technical advantage that can be directly translated into clinical benefits. The analysis demonstrated that the ‘Forecasting the next 7 frames’ strategy provided a clearer visualization of critical morphological features, improving embryo selection. It produces clearer cell membrane definition and fewer artifacts, enabling more accurate identification of embryos following optimal developmental patterns. The 2-hour advance prediction window streamlines workflows by allowing early transfer preparation, minimizing handling and exposure to suboptimal conditions. Additionally, our temporal analysis revealed that forecasts maintained an accuracy of approximately 1–1.5 hours for developmental transitions, providing a metric for assessing an embryo’s developmental pace.
The collaboration with embryologists highlighted key timeframes (36-38 hpi, 40-41 hpi) where forecasts were most clinically useful. The system’s ability to distinguish transfer from avoid embryos based on clear cell membrane and start of blastulation aligns with morphological grading criteria, offering an objective support for subjective assessments.
Clinical scenarios for selecting a forecasting strategy
The ‘Forecasting till the end’ strategy enables embryologists to visualize the whole embryo development, tracking key events such as cleavage progression, compaction and blastulation. Clinics with limited incubator capacity could use this approach to make earlier embryo selection decisions. In cases where early transfer is preferred (e.g., patients with previous failed ART treatments), this strategy can help predict which embryos are likely to maintain normal development.
The ‘Forecasting the next 7 frames’ strategy offers higher prediction accuracy, making it important for time-sensitive decisions, such as forecasting the next two hours. It can serve as a quality control mechanism by comparing short-term predictions with actual development. As new frames become available, forecasts are updated, allowing embryologists to track the developmental trajectory and detect deviations. This strategy aligns well with an embryologist’s routine workflow, where embryo development is monitored at regular intervals.
For clinics practicing Day 3 transfers, ‘Forecasting till the end’ provides a long-term view, while ‘Forecasting the next 7 frames’ is more suitable for continuous monitoring in extended culture. The latter approach is particularly beneficial for clinics with 24/7 monitoring capabilities, offering a lower risk profile. In contrast, the ‘Forecasting till the end’ approach may be useful in settings with limited staffing but carries greater uncertainty, reflecting a different risk-benefit tradeoff.
Discussion
The application of ConvLSTM for forecasting embryo morphology represents a novel advancement over existing tools. Unlike current AI systems that predict discrete outcomes or assess static timepoints, our approach predicts the progression of embryo development over time, enabling embryologists to anticipate future developmental trajectories and making more informed decisions on embryo selection and transfer timing.
Our proposed system demonstrated a strong understanding of embryo morphology dynamics. The system’s FramePredictor predicted video frames that accurately captured the developmental context of embryos in both the cells stage and blastocyst studies, across the ‘transfer’ and ‘avoid’ video categories. The FramePredictor was particularly effective in the transfer category. Furthermore, post-processing the predicted frames with the embryo cropper to eliminate the background led to significant improvements in video quality metrics (PSNR, SSIM, and FVD scores). This indicates that the predicted output focused on the relevant regions of the embryo rather than merely replicating the background across frame sequences.
Current limitations and proposed solutions
Impact of dataset size on model generalization.
A limitation of the FramePredictor’s performance in the avoid category of the blastocyst study was its limited generalization, reflected in the performance drop on the independent dataset. This is likely due to overfitting to the training data. To address it, we experimented with various model architectures, including different layer configurations, numbers of filters, dropout rates and regularization techniques. However, these adjustments had minimal impact on the training loss. At present, we are constrained by the availability of additional videos that meet the criterion of a fragmentation rate below 15%, limiting further training on a more diverse dataset. We acknowledge this as a shortcoming of our study and recognize the need to improve generalization in future work.
Proposed solutions
- – Implementing data augmentation techniques for introducing variations in brightness and contrast to enhance the robustness of FramePredictor training.
- – Using transfer learning to fine-tune models pretrained on high-resolution embryo images, improving the feature representation in FramePredictor’s predictions.
- – Expanding data diversity through training and evaluating FramePredictor on datasets from multiple clinics with varying laboratory conditions and imaging protocols.
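The first bullet above could, for instance, look like the following sketch, which applies one random brightness shift and contrast scale uniformly across a frame sequence so temporal consistency is preserved. The parameter ranges and function name are illustrative assumptions, not values used in this work:

```python
import numpy as np

def augment_brightness_contrast(frames, rng, max_delta=0.1,
                                contrast_range=(0.9, 1.1)):
    """Apply one random brightness shift and contrast scale to a whole
    frame sequence (the same transform for every frame, so the temporal
    dynamics are unchanged). Frames are floats in [0, 1].
    """
    delta = rng.uniform(-max_delta, max_delta)
    scale = rng.uniform(*contrast_range)
    mean = frames.mean()
    # contrast: scale deviations around the mean; brightness: add delta
    out = (frames - mean) * scale + mean + delta
    return np.clip(out, 0.0, 1.0)
```

Sampling one transform per sequence, rather than per frame, matters for video models: per-frame jitter would inject artificial temporal variation that the ConvLSTM could mistake for morphological change.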
Assessment of image quality in forecasted frames.
The quality of the predicted frames did not meet the standard of the ground truth. At times, the predicted morphology was blurry and had grayscale color distortions. Consequently, this lowered the chance of clinically validating the predictions. These distortions were primarily due to adding extra channels to the input frame sequences. The additional channels provided time period and embryo cell count information enabling the FramePredictor to better understand embryo morphology changes. However, this additional information affected the image feature distribution, leading to image distortions.
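Concatenating such conditioning information as extra image channels can be sketched as below; the normalization constants and function name are assumptions for illustration, not our exact encoding:

```python
import numpy as np

def add_condition_channels(frames, hpi, cell_count,
                           max_hpi=120.0, max_cells=8):
    """Append per-frame conditioning planes to an RGB frame sequence.

    frames: (T, H, W, 3); hpi and cell_count: length-T arrays.
    Each scalar is normalized and broadcast to a constant H x W plane,
    giving (T, H, W, 5): time period and cell count become two extra
    channels alongside the image data.
    """
    t, h, w, _ = frames.shape
    time_plane = np.broadcast_to(
        (np.asarray(hpi) / max_hpi)[:, None, None, None], (t, h, w, 1))
    cell_plane = np.broadcast_to(
        (np.asarray(cell_count) / max_cells)[:, None, None, None],
        (t, h, w, 1))
    return np.concatenate([frames, time_plane, cell_plane], axis=-1)
```

Because these planes are constant across each frame, they shift the input feature distribution away from pure image statistics, which is consistent with the distortion mechanism described above.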
Proposed solutions
- – Embedding the additional channels in the latent space and evaluating the impact on image quality.
- – Using a pre-trained 3D CNN as a feature extractor for frame sequences, and combining the extracted features with the additional channels before inputting them into the FramePredictor to improve feature representation.
- – Integrating an attention module immediately after the first ConvLSTM layer with ReLU activation and 32 filters. This module will incorporate both channel-wise and spatial attention mechanisms, enabling the model to dynamically focus on key features from cell boundaries (cleavage stages) to inner cell mass (blastocyst formation) during development.
- – Including perceptual loss functions, such as SSIM, instead of relying solely on mean squared error, as used in the current FramePredictor training process.
- – To address the performance gap between the transfer and avoid categories, implementing a contrastive learning approach to help the FramePredictor better differentiate their developmental patterns.
- – Implementing a secondary refinement network that takes the initial ConvLSTM predictions and improves image quality through super-resolution specifically trained on embryo images.
- – Using post-processing strategies such as contrast-enhancement techniques and edge-preserving filters to improve the clarity of critical structures like the zona pellucida, inner cell mass, and cell membranes.
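The perceptual-loss suggestion above could, for instance, combine MSE with an SSIM term. A minimal sketch follows, using a simplified single-window SSIM (practical implementations use a sliding Gaussian window, e.g. `tf.image.ssim`); the weighting `alpha` is an illustrative assumption:

```python
import numpy as np

def ssim_global(x, y, max_val=1.0, k1=0.01, k2=0.03):
    """Simplified single-window SSIM over a whole image (no sliding
    window); sufficient to illustrate a perceptual loss term."""
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2))

def combined_loss(y_true, y_pred, alpha=0.5):
    """alpha * MSE + (1 - alpha) * (1 - SSIM): mixing a pixel-wise and
    a perceptual term, in contrast to the MSE-only loss used in the
    current FramePredictor training."""
    mse = np.mean((y_true - y_pred) ** 2)
    return alpha * mse + (1 - alpha) * (1 - ssim_global(y_true, y_pred))
```

The SSIM term penalizes structural disagreement (membranes, boundaries) that MSE alone tends to average away into blur, which is the failure mode reported above.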
Clinical validation and assessment.
The results showed that distortions primarily impacted the background, while embryologists were still able to identify key developmental markers, maintaining the system’s clinical relevance. For instance, predicting blastocyst formation up to 23 hours in advance enables earlier assessment of developmental potential compared to standard protocols. This can support more timely and informed decisions on embryo selection, especially in cases of asynchronous development. However, ensuring high image quality in the predicted video sequence remains important, as the forecasts have clear clinical implications.
Proposed solutions
- – Establish an assessment protocol in collaboration with embryologists for interpreting lower-quality predictions by identifying features that remain consistently visible despite distortions, preserving clinical utility while image quality improvements are pursued.
Performance analysis and future perspectives
The AI system implemented two forecasting strategies for predicting embryo morphology changes, ‘Forecasting till the end’ and ‘Forecasting the next 7 frames’, across 12 hours in the cells stage study and 23 hours in the blastocyst study. The ‘Forecasting the next 7 frames’ strategy was more effective than the ‘Forecasting till the end’ strategy for both transfer and avoid videos. In the cells stage study, the ‘Forecasting the next 7 frames’ strategy provided valuable insights into the embryo’s future development on day 2, including whether an embryo divides by the end of the study, the synchronization of cell division, and the timing between adjacent cell stages. For the blastocyst study, the ‘Forecasting the next 7 frames’ strategy successfully determined whether the embryo would start to blastulate, which is crucial for assessing the embryo’s viability on day 4. However, the forecasts were blurred, a limitation that could be improved with further training of the FramePredictor.
Future research should focus on advancing embryo forecasting in several key areas. Integrating patient-specific clinical parameters such as maternal age, hormone levels, previous ART outcomes, and genetic markers, could enable predictions tailored to individual patient profiles in ART. Beyond our ConvLSTM approach, future research should investigate alternative deep learning models. Approaches such as vision transformers tailored for temporal embryo analysis, hybrid spatio-temporal models and multimodal architectures integrating imaging with clinical data may better capture the complexity of embryonic development. Clinical validation is also essential. Long-term studies tracking the predicted embryo development through to pregnancy outcomes are needed to assess the true predictive value and clinical utility of our proposed solution. Moreover, future efforts should focus on real-time deployment of such forecasting systems with existing lab infrastructure and clinical protocols. This would provide continuously updated predictions as new embryo data becomes available throughout the culture period for optimizing laboratory workflows.
Conclusion
The AI system accurately forecasted embryo morphology dynamics on day 2 (cells stage study) and day 4 (blastocyst study) of development. Using a sequence of seven video frames, it predicted the next seven frames, projecting two hours into the future of embryo development. The system effectively forecasted the progression of embryo morphology in both ‘transfer’ and ‘avoid’ categories. By separating the embryo from the background in the forecast and evaluating the predicted morphology, we demonstrated the AI system’s focus on crucial embryo features. In the ‘transfer’ category videos, the AI system accurately predicted the start of the blastocyst stage, with forecasts showing clearer cell membranes, fewer image gray-scale distortions and artifacts than in ‘avoid’ videos. Despite initial issues with blurriness and distortions, the morphological clarity in forecasts improved significantly as the sequence progressed.
Upon evaluating the forecasts, embryologists concluded that changes in embryo morphology were detectable through the forecast sequence, but image quality required enhancement for clinical validation. Therefore, the proposed AI system demonstrated potential in assisting embryologists by providing insights into the future dynamics of embryo development.
The current study focused solely on the capabilities of a discriminative AI model in forecasting embryo development. As a next step in research, we plan to explore the use of generative AI models [45] for forecasting the embryo development.
Supporting information
S1 Fig. Inner structure of Convolutional LSTM (ConvLSTM)
ConvLSTM processes an input matrix (Xt) by modeling spatial distribution with temporal dependencies. It achieves this through element-wise multiplication (Hadamard product) of the feature vector of Xt with the LSTM states at time t. Xt corresponds to a video frame at time t.
https://doi.org/10.1371/journal.pone.0330924.s001
(PDF)
S2 Fig. Embryo cell stages distribution in the datasets.
The videos used for training and evaluating the AI system contained distributions of embryo development stages as: Part a): Cell stages in the transfer videos for the cells stage study. Part b): Cell stages in the avoid videos for the cells stage study. Part c): Embryo development stages in the transfer videos for the blastocyst study. Part d): Embryo development stages in the avoid videos for the blastocyst study.
https://doi.org/10.1371/journal.pone.0330924.s002
(PDF)
S3 Fig. Cells stage study: FramePredictor’s predicted embryo development at various time points.
a) A transfer video from the independent set (evalCT). b) and c) Avoid videos from the independent set (evalCA). The actual (ground truth) frames are on the left, while the predicted frames are on the right, with time points highlighted in blue and reported in hours post insemination.
https://doi.org/10.1371/journal.pone.0330924.s003
(PDF)
S4 Fig. Blastocyst study: FramePredictor’s predicted embryo development at various time points.
a) A transfer video from the independent set (evalBT). b) and c) Avoid videos from the independent set (evalBA). The actual (ground truth) frames are on the left, while the predicted frames are on the right, with time points highlighted in blue and reported in hours post insemination.
https://doi.org/10.1371/journal.pone.0330924.s004
(PDF)
S5 Fig. Cells stage study: forecasted embryo development at multiple time points with the ‘Forecasting the Next 7 Frames’ strategy.
a) A transfer video from the independent set (evalCT). b) and c) Avoid videos from the independent set (evalCA). The actual (ground truth) frames are on the left, while the predicted frames are on the right, with time points highlighted in blue and reported in hours post insemination.
https://doi.org/10.1371/journal.pone.0330924.s005
(PDF)
S6 Fig. Blastocyst study: forecasted embryo development at multiple time points with the ‘Forecasting the Next 7 Frames’ strategy.
a) A transfer video from the independent set (evalBT). b) and c) Avoid videos from the independent set (evalBA). The actual (ground truth) frames are on the left, while the predicted frames are on the right, with time points highlighted in blue and reported in hours post insemination.
https://doi.org/10.1371/journal.pone.0330924.s006
(PDF)
S1 Appendix. Recurrent neural network: information processing, mathematical definitions and formulas.
https://doi.org/10.1371/journal.pone.0330924.s007
(PDF)
S2 Appendix. Long short-term memory: Information processing, mathematical definitions and formulas.
https://doi.org/10.1371/journal.pone.0330924.s008
(PDF)
S3 Appendix. Convolutional long short-term memory: Information processing, mathematical definitions and formulas.
https://doi.org/10.1371/journal.pone.0330924.s009
(PDF)
Acknowledgments
Claude AI (Claude), family of large language models developed by Anthropic, was used to suggest restructuring of the Discussion section; however, all suggestions and the final structure were thoroughly assessed and revised by the authors.
References
- 1. de Ziegler D, Pirtea P, Carbonnel M, Poulain M, Ayoubi JM. Assisted reproductive technology strategies in uterus transplantation. Fertil Steril. 2019;112(1):19–23. pmid:31277762
- 2. Cui W. Mother or nothing: the agony of infertility. Bull World Health Organ. 2010;88(12):881–2. pmid:21124709
- 3. Glujovsky D, Quinteiro Retamar AM, Alvarez Sedo CR, Ciapponi A, Cornelisse S, Blake D. Cleavage-stage versus blastocyst-stage embryo transfer in assisted reproductive technology. Cochrane Database Syst Rev. 2022;5(5):CD002118. pmid:35588094
- 4. Kragh MF, Rimestad J, Lassen JT, Berntsen J, Karstoft H. Predicting embryo viability based on self-supervised alignment of time-lapse videos. IEEE Trans Med Imaging. 2022;41(2):465–75. pmid:34596537
- 5. Elder K, Dale B, Ménézo Y, Harper J, Huntriss J. In-vitro fertilization. 3rd ed. Cambridge University Press; 2010.
- 6. Van Royen E, Mangelschots K, De Neubourg D, Valkenburg M, Van de Meerssche M, Ryckaert G, et al. Characterization of a top quality embryo, a step towards single-embryo transfer. Hum Reprod. 1999;14(9):2345–9. pmid:10469708
- 7. Milewski R, Ajduk A. Time-lapse imaging of cleavage divisions in embryo quality assessment. Reproduction. 2017;154(2):R37–53. pmid:28408705
- 8. Dal Canto M, Coticchio G, Mignini Renzini M, De Ponti E, Novara PV, Brambillasca F, et al. Cleavage kinetics analysis of human embryos predicts development to blastocyst and implantation. Reprod Biomed Online. 2012;25(5):474–80. pmid:22995750
- 9. Rehman KS, Bukulmez O, Langley M, Carr BR, Nackley AC, Doody KM, et al. Late stages of embryo progression are a much better predictor of clinical pregnancy than early cleavage in intracytoplasmic sperm injection and in vitro fertilization cycles with blastocyst-stage transfer. Fertil Steril. 2007;87(5):1041–52. pmid:17336973
- 10. ALPHA Scientists In Reproductive Medicine, ESHRE Special Interest Group Embryology. Istanbul consensus workshop on embryo assessment: proceedings of an expert meeting. Reprod Biomed Online. 2011;22(6):632–46. pmid:21481639
- 11. Ciray HN, Campbell A, Agerholm IE, Aguilar J, Chamayou S, Esbert M, et al. Proposed guidelines on the nomenclature and annotation of dynamic human embryo monitoring by a time-lapse user group. Hum Reprod. 2014;29(12):2650–60. pmid:25344070
- 12. Raudonis V, Paulauskaite-Taraseviciene A, Sutiene K, Jonaitis D. Towards the automation of early-stage human embryo development detection. Biomed Eng Online. 2019;18(1):120. pmid:31830988
- 13. Simopoulou M, Sfakianoudis K, Maziotis E, Antoniou N, Rapani A, Anifandis G, et al. Are computational applications the “crystal ball” in the IVF laboratory? The evolution from mathematics to artificial intelligence. J Assist Reprod Genet. 2018;35(9):1545–57. pmid:30054845
- 14. Khan A, Gould S, Salzmann M. Automated monitoring of human embryonic cells up to the 5-cell stage in time-lapse microscopy images. In: 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI); 2015. p. 389–93.
- 15. Dirvanauskas D, Maskeliunas R, Raudonis V, Damasevicius R. Embryo development stage prediction algorithm for automated time lapse incubators. Comput Methods Programs Biomed. 2019;177:161–74. pmid:31319944
- 16. Kragh MF, Rimestad J, Berntsen J, Karstoft H. Automatic grading of human blastocysts from time-lapse imaging. Comput Biol Med. 2019;115:103494. pmid:31630027
- 17. Chavez-Badiola A, Flores-Saiffe-Farías A, Mendizabal-Ruiz G, Drakeley AJ, Cohen J. Embryo Ranking Intelligent Classification Algorithm (ERICA): artificial intelligence clinical assistant predicting embryo ploidy and implantation. Reprod Biomed Online. 2020;41(4):585–93. pmid:32843306
- 18. VerMilyea M, Hall JMM, Diakiw SM, Johnston A, Nguyen T, Perugini D, et al. Development of an artificial intelligence-based assessment model for prediction of embryo viability using static images captured by optical light microscopy during IVF. Hum Reprod. 2020;35(4):770–84. pmid:32240301
- 19. Kan-Tor Y, Zabari N, Erlich I, Szeskin A, Amitai T, Richter D, et al. Automated evaluation of human embryo blastulation and implantation potential using deep-learning. Advanced Intelligent Systems. 2020;2(10).
- 20. Thirumalaraju P, Hsu JY, Bormann CL, Kanakasabapathy M, Souter I, Dimitriadis I, et al. Deep learning-enabled blastocyst prediction system for cleavage stage embryo selection. Fertility and Sterility. 2019;111(4):e29.
- 21. Bormann CL, Kanakasabapathy MK, Thirumalaraju P, Gupta R, Pooniwala R, Kandula H, et al. Performance of a deep learning based neural network in the selection of human blastocysts for implantation. Elife. 2020;9:e55301. pmid:32930094
- 22. Wang S, Zhou C, Zhang D, Chen L, Sun H. A deep learning framework design for automatic blastocyst evaluation with multifocal images. IEEE Access. 2021;9:18927–34.
- 23. Coticchio G, Fiorentino G, Nicora G, Sciajno R, Cavalera F, Bellazzi R, et al. Cytoplasmic movements of the early human embryo: imaging and artificial intelligence to predict blastocyst development. Reprod Biomed Online. 2021;42(3):521–8. pmid:33558172
- 24. Milewski R, Kuczyńska A, Stankiewicz B, Kuczyński W. How much information about embryo implantation potential is included in morphokinetic data? A prediction model based on artificial neural networks and principal component analysis. Adv Med Sci. 2017;62(1):202–6. pmid:28384614
- 25. Maher ER, Afnan M, Barratt CL. Epigenetic risks related to assisted reproductive technologies: epigenetics, imprinting, ART and icebergs?. Hum Reprod. 2003;18(12):2508–11. pmid:14645164
- 26. Mani S, Ghosh J, Coutifaris C, Sapienza C, Mainigi M. Epigenetic changes and assisted reproductive technologies. Epigenetics. 2020;15(1–2):12–25. pmid:31328632
- 27. Martins WP, Nastri CO, Rienzi L, van der Poel SZ, Gracia C, Racowsky C. Blastocyst vs cleavage-stage embryo transfer: systematic review and meta-analysis of reproductive outcomes. Ultrasound Obstet Gynecol. 2017;49(5):583–91. pmid:27731533
- 28. Sharma A, Stensen M, Delbarre E, Haugen T, Hammer H. Explainable artificial intelligence for human embryo cell cleavage stages analysis. 2022. p. 1–8.
- 29. Márquez-Hinojosa S, Noriega-Hoces L, Guzmán L. Time-lapse embryo culture: a better understanding of embryo development and clinical application. JBRA Assist Reprod. 2022;26(3):432–43. pmid:35001523
- 30. Esco Medical. IVF time-lapse technology for human embryo culture. 2020. https://www.esco-medical.com/news/ivf-time-lapse-technology-for-human-embryo-culture
- 31. Shi X, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1. NIPS’15. Cambridge, MA, USA: MIT Press; 2015. p. 802–10.
- 32. Lu C, Hirsch M, Scholkopf B. Flexible spatio-temporal networks for video prediction. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017. p. 2137–45.
- 33. Patraucean V, Handa A, Cipolla R. Spatio-temporal video autoencoder with differentiable memory. CoRR. 2015. https://arxiv.org/abs/1511.06309
- 34. Eide SS, Riegler MA, Hammer HL, Bremnes JB. Deep tower networks for efficient temperature forecasting from multiple data sources. Sensors (Basel). 2022;22(7):2802. pmid:35408416
- 35. Sun F, Li S, Wang S, Liu Q, Zhou L. CostNet: a concise overpass spatiotemporal network for predictive learning. IJGI. 2020;9(4):209.
- 36. Joshi A. Next-frame video prediction with Convolutional LSTMs. 2023. https://keras.io/examples/vision/conv_lstm/
- 37. LabelBox. https://labelbox.com/
- 38. Yao J, Wang X, Yang S, Wang B. ViTMatte: boosting image matting with pretrained plain vision transformers. 2023.
- 39. Rogge N. Transformers tutorials. 2020. https://github.com/NielsRogge/Transformers-Tutorials
- 40. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13(4):600–12. pmid:15376593
- 41. Unterthiner T, van Steenkiste S, Kurach K, Marinier R, Michalski M, Gelly S. Towards accurate generative models of video: a new metric & challenges. 2019.
- 42. Oprea S, Martinez-Gonzalez P, Garcia-Garcia A, Castro-Vargas JA, Orts-Escolano S, Garcia-Rodriguez J, et al. A review on deep learning techniques for video prediction. IEEE Trans Pattern Anal Mach Intell. 2022;44(6):2806–26. pmid:33320810
- 43. van der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. scikit-image: image processing in Python. PeerJ. 2014;2:e453. pmid:25024921
- 44. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. 2018.
- 45. Blattmann A, Rombach R, Ling H, Dockhorn T, Kim SW, Fidler S. Align your latents: high-resolution video synthesis with latent diffusion models. 2023.