Hum. Reprod. Advance Access originally published online on April 10, 2006
Human Reproduction 2006 21(8):2141-2148; doi:10.1093/humrep/del106
Interobserver agreement and intraobserver reproducibility of embryo quality assessments
1 Ferring Pharmaceuticals A/S, Clinical Research and Development, Copenhagen, Denmark 2 The Fertility Clinic, Copenhagen University Hospital, Copenhagen, Denmark 3 Reproductive Medicine, Sahlgrenska University Hospital, Gothenburg, Sweden and 4 Centre for Reproductive Medicine, Academisch Ziekenhuis Vrije Universiteit Brussel, Brussels, Belgium
5 To whom correspondence should be addressed at: Obstetrics and Gynecology, Clinical Research and Development, Ferring Pharmaceuticals A/S, Kay Fiskers Plads 11, Copenhagen S 2300, Denmark. E-mail: joan-carles.arce{at}ferring.com
Abstract
BACKGROUND: The objective of this investigation was to determine the inter- and intraobserver agreement when assessing embryo quality. METHODS: This investigation included 4002 cleaved embryos from 7535 oocytes retrieved in 688 patients undergoing IVF cycles in a multicentre trial. Embryos were evaluated locally at the inverted microscope at 28, 44 and 68 h (±1 h) post-insemination. Digital images of the embryos were assessed centrally by three blinded embryologists. To assess reproducibility, 215 randomly selected cleaved embryos from 33 patients were re-evaluated by the three central embryologists. RESULTS: The interobserver agreement among the central embryologists (using the same method of evaluation; 2D images) was good for classification of top-quality embryos (kappa 0.71-0.73), excellent for classification of normally developed embryos (kappa 0.83-0.86) and good-to-excellent for classification of transferable embryos (kappa 0.78-0.82). The interobserver agreement between local and consolidated central assessment (different methods of evaluation, inverted microscopy versus 2D images) was good for all three embryo classifications (kappa 0.64-0.79). The intraobserver reproducibility for all three overall embryo classifications was excellent for the consolidated central assessment (kappa 0.80-0.91). CONCLUSION: Embryo quality can be determined with a good degree of interobserver agreement independently of the method of evaluation. Embryologists classify embryos with excellent intraobserver reproducibility.
Key words: agreement/embryo quality/interobserver/intraobserver/reproducibility
Introduction
Embryo quality is considered an important predictor of implantation and pregnancy (Terriou et al., 1995).
There are several obstacles for adequately establishing the impact of an intervention on embryo quality, including the existence of numerous classification systems, many confounding laboratory factors and the subjective component of the evaluation. Cell number at successive days and morphological features of the fertilized oocyte and developing embryo (e.g. pronuclei morphology, fragmentation, blastomere uniformity, etc.) are parameters included in the evaluation of embryo quality (Ziebe et al., 1997; Van Royen et al., 1999; Montag and van der Ven, 2001; De Placido et al., 2002; Gámiz et al., 2003; Gianaroli et al., 2003; Nagy et al., 2003; Hnida et al., 2004; Rienzi et al., 2005). Several classifications and grading systems have been proposed to assess embryo quality based on a composite of cleavage and morphological embryo parameters (Cummins et al., 1986; Puissant et al., 1987; Terriou et al., 1995; Van Royen et al., 1999; Fisch et al., 2001), but there is no consensus on which system is the most valid approach for classification of embryo quality. Confounding factors could limit the interpretation of embryo quality findings, such as timing of assessment (Sakkas et al., 2001), fertilization method (Lundin et al., 2001), type of media or culture conditions (Bavister, 1995; Quinn, 2004) as well as variability in the laboratory processing of oocytes/embryos. Furthermore, the need for conducting trials with a substantial number of patients will require a multicentre approach, and the potential intraobserver and interindividual (and intercentre) variations in embryo scoring may compromise interpretation of embryo quality end-points. Scoring of embryo quality parameters contains a considerable subjective component, and therefore some variation in embryo scoring would be expected among different assessors, even within the same clinic. There is no relevant literature on the very critical issue of interobserver variability and intraobserver reproducibility in embryo assessment, but the value of minimizing variability is recognized and web-based embryo scoring training systems have recently been introduced (e.g., QAP-online, Quality Assurance Program for the Reproductive Sciences).
In a routine laboratory setting, embryo assessments are done in a few seconds at the inverted microscope in order to not compromise embryo culture conditions and embryo development. However, new technologies are becoming available allowing assessment of the same embryo by multiple observers without compromising the viability of the embryo. Imaging systems which capture one still image or a series of still images of the embryo at determined time points permit detailed evaluation of the individual embryos (Hnida and Ziebe, 2004; Hnida et al., 2004, 2005). There is limited knowledge regarding potential biases associated with an evaluation of embryos, with these new technologies providing a 2D image compared to a real-life routine laboratory situation in which a 3D exploration is possible but time limited. The present investigation evaluated the reproducibility of scoring embryo quality as well as the interobserver agreement when embryo quality parameters were evaluated by similar (2D images) or by different (2D images versus 3D live microscopy) methods. In the present study, the impact of new ways of evaluating and documenting embryo quality compared to routine procedures is addressed.
Materials and methods
Subjects
This investigation is based on the embryo data collected in the Menotrophin versus rFSH in vitro fertilisation trial (MERIT), which was a randomized, open-label, assessor-blind, multicentre and multinational trial comparing highly purified menotrophin (MENOPUR, Ferring Pharmaceuticals, Copenhagen, Denmark) and rFSH (GONAL-F, Serono, Geneva, Switzerland). The study was conducted at 37 clinics in 10 countries during 2004, following approval by the local independent ethics committees. All subjects provided written informed consent before they were included in the study. The study included 731 subjects undergoing controlled ovarian stimulation for IVF using the long protocol with GnRH agonist down-regulation. Subjects underwent similar pre-randomization and post-randomization procedures across centres. The population included in this study had a mean age of 30.8 years (ranging from 21 to 37 years), with 15% being 35-37 years. Their primary reason for infertility was mainly unexplained infertility (43%) and tubal disease (35%) and, to a lesser extent, mild male factor (12%), endometriosis grade I/II (8%) or other factors (2%). Of the 731 patients who initiated controlled ovarian stimulation in the study, 688 patients had oocyte retrieval. From these patients, 7535 oocytes were obtained and inseminated via IVF procedures.
Local evaluation of embryo quality and capture of 2D digital images
Local evaluation of the embryos was done by the embryologists at the participating clinics at 28, 44 and 68 h after insemination using the inverted microscope. The assessments were restricted to a narrow span of ±1 h outside the specified time point. The embryo quality parameters evaluated were cell number, degree of fragmentation, localization of fragments, blastomere uniformity, multinucleation and cytoplasmic appearance (see the section Embryo quality parameters and embryo classification below for detailed description of the parameters). An atlas with representative embryo pictures was prepared as a visual aid for all morphological parameters and distributed to all local embryologists before start of the clinical trial. Furthermore, a common training session with scoring of embryos and production of digital images was held, with the responsible embryologist from each of the clinics participating in the trial. For the majority of clinics, the number of local embryologists involved in embryo scoring throughout the trial was limited to one or two. The responsible local embryologist ensured training of any additional embryologist at the clinic.
The local embryologists took a representative 2D photograph of each embryo at each assessment time point using a digital camera attached to the inverted microscope and computer equipment provided to each centre participating in the study. A custom-made edition of the commercially available software system FertiGRAB (IHMedical, Copenhagen, Denmark) keeping track of the images for each oocyte and time point was used for processing and storage of the images. A representative from the software manufacturer installed the camera and the computer at each clinic, ensuring optimal settings and further technical training of the local embryologists. For each participating clinic, a CD-ROM with embryo pictures from the first patient was immediately sent for review to ensure adequate picture quality at the clinic and, if necessary, feedback on technical improvements was given.
Central evaluation of embryos
A panel of three central embryologists was established for the trial. The central embryologists were blinded to the evaluation made by each other and by the local embryologist. The central embryologists were also unaware of the identity of the clinic from which they were reviewing the specific embryo pictures. Pictures were presented on a patient-by-patient basis, and pictures were shown for each individual embryo in a chronological order before proceeding to the next embryo (i.e. 28, 44 and 68 h for embryo 1, followed by 28, 44, and 68 h for embryo 2 etc.). Pictures were shown simultaneously on three monitors (one for each embryologist), allowing each central embryologist to make an independent evaluation. The central embryologists were not allowed to interact during the evaluations. The embryo quality parameters evaluated by the panel of central embryologists were the same as those assessed by the local embryologists (see description below) and were based on the same definitions and visual aid (i.e. atlas) used by the local embryologists.
Embryo quality parameters and embryo classification
The embryo quality parameters evaluated at the three time points were cell number (0, 1, 2, 3... and compaction) and the following morphology aspects: degree of fragmentation [0, ≤10, 11-20, 21-50 and >50% fragmentation or totally fragmented (no blastomeres recognized)], localization of fragments (locally or dispersed), blastomere uniformity [equally sized or unequally sized (largest cell >25% larger in diameter compared to the smallest cell)], visual sign of multinucleation (yes or no) and cytoplasmic appearance (homogeneous or dark, granulated and vacuolated). If the cell number was 1, none of the embryo morphology parameters were assessed.
The overall classification of embryos was established based on the individual embryo scoring parameters at different time points according to pre-established definitions. Three classifications (not mutually exclusive) were specified a priori: (i) top-quality embryos were defined as embryos with 4-5 cells at 44 h, ≥7 cells at 68 h, equally sized blastomeres and ≤20% fragmentation at 68 h and no multinucleation at any time point, (ii) normally developed embryos were defined as embryos with ≥6 cells at 68 h and ≤20% fragmentation at 68 h and (iii) transferable embryos were defined as embryos with ≥4 cells at 68 h, no cleavage arrest (i.e. cleavage must have occurred within the last 24 h) and ≤20% fragmentation at 68 h.
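The three pre-specified classifications can be expressed as simple rules. The following is a minimal sketch, not the trial's actual software: it assumes the thresholds read as "at least 7, 6 or 4 cells" and "at most 20% fragmentation", it approximates "no cleavage arrest" as a higher cell number at 68 h than at 44 h, and it takes blastomere uniformity from the 68 h assessment. The class and field names are hypothetical, and compaction as a cell-number category is not handled.

```python
from dataclasses import dataclass


@dataclass
class EmbryoScores:
    """Hypothetical container for the scores used by the three classifications."""
    cells_44h: int             # cell number at 44 h
    cells_68h: int             # cell number at 68 h
    fragmentation_68h: float   # upper bound of the fragmentation category at 68 h, in percent
    equal_blastomeres: bool    # assumption: uniformity taken from the 68 h assessment
    multinucleated_any: bool   # multinucleation seen at any time point


def classify(e: EmbryoScores) -> dict:
    """Return the three (not mutually exclusive) overall classifications."""
    low_fragmentation = e.fragmentation_68h <= 20
    top_quality = (
        e.cells_44h in (4, 5)
        and e.cells_68h >= 7
        and e.equal_blastomeres
        and low_fragmentation
        and not e.multinucleated_any
    )
    normally_developed = e.cells_68h >= 6 and low_fragmentation
    # Assumption: "no cleavage arrest" is approximated as more cells at 68 h than at 44 h.
    no_arrest = e.cells_68h > e.cells_44h
    transferable = e.cells_68h >= 4 and no_arrest and low_fragmentation
    return {
        "top_quality": top_quality,
        "normally_developed": normally_developed,
        "transferable": transferable,
    }
```

Because the classifications are not mutually exclusive, a single embryo can satisfy all three definitions at once, which is why the function returns three independent flags rather than one label.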
The consolidated central score for overall embryo classification and each embryo quality parameter was defined as the majority decision (or the median in case the three central embryologists assessed differently). All embryos were evaluated by all the three central embryologists. In the rare event that one of the central embryologists considered a parameter for a specific embryo to be non-assessable and the other two embryologists disagreed, the consolidated central score for that parameter was based on the worst-case scenario of the available assessments.
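The consolidation rule described above can be sketched as follows. This is an illustrative reading only: scores are assumed to be coded as ordinal integers, `None` stands for "non-assessable", and which end of the scale counts as the worst case (here the highest code by default) is an assumption that would differ by parameter. The function name `consolidate` is hypothetical, and the handling of a single available score is not covered by the text.

```python
from collections import Counter
from statistics import median_low


def consolidate(scores, worst=max):
    """Combine three central scores into one consolidated score.

    scores: three ordinal integer codes; None marks a non-assessable parameter.
    worst:  function picking the worst case among available scores (assumption).
    """
    available = [s for s in scores if s is not None]
    if not available:
        return None
    value, count = Counter(available).most_common(1)[0]
    if count >= 2:
        return value                   # majority decision
    if len(available) == 3:
        return median_low(available)   # all three differ: take the median
    # One assessor non-assessable and the other two disagree: worst case.
    # (A single available score also ends up here; that case is an assumption.)
    return worst(available)
```

For a yes/no parameter the majority and the median coincide, so the median branch only matters for parameters with more than two categories, such as cell number or degree of fragmentation.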
Basis of analysis
Each embryo contributed with data at 28 h (±1 h), 44 h (±1 h) and 68 h (±1 h) after insemination, or until the local embryologist regarded the oocyte/embryo as out-of-study, in case of laboratory issues, degeneration or otherwise considered non-viable. Oocytes with three or more pronuclei at 20 h after insemination were considered as having a fertilization problem, and data on embryos derived from such oocytes were disregarded from this analysis. Also, this analysis was restricted to embryos that according to the local observations had cleaved during the 3 days of evaluation.
Of the 7535 oocytes retrieved, 5736 embryos were by the local embryologists considered to not have fertilization problems, and of these 4012 cleaved during the evaluation period. Pictures from at least one of the three assessments time points that were of adequate quality for central assessment were available for 99.8% of the cleaved embryos (n = 4002). The number of centrally evaluated images was 3970 at day 1 (28 h post-insemination), 3970 at day 2 (44 h post-insemination) and 3881 at day 3 (68 h post-insemination). The investigation of interobserver agreement among the central embryologists and between the local and central consolidated assessment is based on these images.
To assess the level of reproducibility, embryos from a random sample of 5% of the patients with embryo images were at a later time point re-evaluated by the three central embryologists blinded to their previous assessment and following the same procedures as for the initial assessments. This covered 33 patients with 286 embryos, of which 215 were cleaved embryos.
Statistical methods
All data were analysed pairwise, providing the possibility for using similar approach for comparison between local and central consolidated assessments and between the central assessments.
The level of agreement in evaluation of the individual embryo quality parameters at each time point was evaluated using kappa statistics. The strength of agreement was interpreted as follows: excellent (kappa ≥0.80), good (0.60-0.79), moderate (0.40-0.59), poor (0.20-0.39) and very poor (<0.20). The kappa score indicates the agreement exceeding chance. Test for equal kappa statistics at different time points was performed using a chi-square test. Owing to the high frequency of 1-cells on day 1 (66% of the embryos assessed by the local embryologists), the level of agreement was only calculated for cell number and not for any of the morphology parameters on day 1.
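To make the pairwise kappa statistic and its interpretation bands concrete, the sketch below computes an unweighted Cohen's kappa for two assessors and maps it to the strength labels listed above. The text does not state which kappa variant was used, so the unweighted form is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long sequences of category labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of embryos")
    n = len(rater_a)
    # Observed proportion of embryos on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


def strength(kappa):
    """Map a kappa value to the interpretation bands used in this article."""
    if kappa >= 0.80:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "moderate"
    if kappa >= 0.20:
        return "poor"
    return "very poor"
```

The chance-expected term explains why rare findings can give low kappa values even when raw agreement is high: if both raters almost always score "no", the expected agreement is already close to 1.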
For the overall embryo classifications, kappa statistics was also used as the primary analysis method. However, three additional measurements of agreement were also used: proportion of positive agreement, proportion of overall agreement and correlation (i.e. tetrachoric correlation coefficient which is the same as the latent trait model approach for multiple assessors).
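The two proportion-based indices can be illustrated for a yes/no classification such as "top quality". The text does not give formulas, so this sketch assumes the commonly used definitions: overall agreement is the share of embryos classified identically, and positive agreement is 2a / (2a + b + c), where a counts embryos both assessors classified as positive and b + c counts the discordant embryos. The function name is hypothetical, and the tetrachoric correlation is not covered.

```python
def agreement_proportions(rater_a, rater_b):
    """Return (overall agreement, positive agreement) for two binary classifications."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must classify the same, non-empty set of embryos")
    n = len(rater_a)
    both_yes = sum(1 for a, b in zip(rater_a, rater_b) if a and b)
    both_no = sum(1 for a, b in zip(rater_a, rater_b) if not a and not b)
    discordant = n - both_yes - both_no
    overall = (both_yes + both_no) / n
    denominator = 2 * both_yes + discordant
    # Positive agreement is undefined when neither rater ever classified an embryo as positive.
    positive = 2 * both_yes / denominator if denominator else float("nan")
    return overall, positive
```

Unlike kappa, neither index corrects for chance agreement, which is consistent with these measures being described later in the article as less conservative.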
Results
Description of embryo quality findings
Table I describes the main development characteristics of the embryos evaluated based on the local and the consolidated central assessment. The data should be interpreted according to the biological/clinical relevance of the relative difference between observers, as even small differences were statistically significant because of the many observations. At all time points, a slightly lower number of cells was noted for the consolidated central assessment (2D) compared to the local embryologists (3D). Embryos with no fragmentation were found more frequently in the consolidated central assessment. The finding of equally sized blastomeres was more frequent in the consolidated central assessment compared to the local embryologists on day 2 but was reported at a similar frequency on day 3. The incidence of multinucleation was very low, especially for the consolidated central assessment. The local embryologists reported more embryos with localized fragments and fewer with homogeneous cytoplasm, compared to the central consolidated assessment. For all three overall embryo classifications, that is, top-quality embryos, normally developed embryos and transferable embryos, the local embryologists had slightly higher frequencies compared to the consolidated central assessment.
Interobserver agreement
Table II displays the kappa values reflecting the level of agreement for each of the parameters evaluated and for each day among the three central embryologists (2D versus 2D) and between the local embryologist and the consolidated central assessment (3D versus 2D). For cell number, there were high kappa values at all days among the central embryologists representing from excellent agreement on day 1 (0.93-0.94) and day 2 (0.80-0.81) to good agreement on day 3 (0.65). The kappa scores for cell number for local versus consolidated central assessment indicated excellent agreement on day 1 (0.88), good agreement on day 2 (0.70) and moderate agreement on day 3 (0.46) (Table II, Figure 1). There was a statistically significant (P < 0.001) decrease in kappa scores for cell number over time, as shown in Figure 1. The decrease was more marked from day 2 to day 3 for the kappa values comparing the local and the central embryologists than among the central embryologists (pairwise). The difference in mean cell number between the local and consolidated central assessment became larger as embryo development progressed, increasing from 0.05 on day 1 to 0.18 on day 2 and 0.56 on day 3. On day 2, the local and consolidated central assessments agreed on cell number for 78% of the embryos, while the local embryologist reported one cell more in 10% of the embryos and >1 cell more in 5%. On day 3, they agreed on cell number for 53% of the embryos, and the local embryologist reported the cell number to be one higher for 23% and >1 higher for 15%.
The kappa values for degree of fragmentation were in the moderate range (0.53-0.59) among the central embryologists as well as between the local and consolidated central assessment (0.41-0.44) (Table II). All scores evaluating agreement for degree of fragmentation between local and consolidated central assessment were generally lower than those among the central embryologists. The agreement on degree of fragmentation between assessors was in the same range on days 2 and 3, suggesting that the increase in cell number over time did not affect the agreement on degree of fragmentation. For localization of fragments, the kappa scores were in the poor-to-moderate range among the central embryologists (0.35-0.53) and also between the local and central consolidated assessment (0.27-0.40) (Table II). In relation to blastomere uniformity, the kappa values among the central embryologists for days 2 and 3 were above 0.60 (0.61-0.71), indicating good agreement. All scores evaluating agreement for this parameter between local and consolidated central assessment (0.43-0.53) were lower than those among the central embryologists. The agreement between the local and consolidated central assessment concerning localization of fragments and blastomere uniformity was better on day 2 than on day 3.
Kappa scores for cytoplasmic appearance indicated moderate agreement (0.43-0.52) among the central embryologists; however, comparisons of the local versus the consolidated central assessment suggested poor or very poor agreement (0.14-0.21) (Table II). The agreement on cytoplasmic appearance between local and consolidated central assessment improved from day 2 to day 3.
The kappa values for multinucleation indicated good interobserver agreement among the central embryologists on day 2 (0.64-0.70) and moderate agreement on day 3 (0.40-0.46) (Table II). A decrease in agreement for multinucleation from day 2 to day 3 was also observed for the kappa scores between the local and consolidated central assessment, where the concordance level was lower with moderate agreement on day 2 (0.44) and poor agreement on day 3 (0.25).
The classification of embryos by the local embryologists led to 736 top-quality embryos. Central embryologists 1, 2 and 3 classified 716, 573 and 596 embryos as top-quality embryos, respectively, with the consolidated central assessment resulting in 625 top-quality embryos. The proportion of overall agreement between the central embryologists (i.e. 1 and 2, 1 and 3 and 2 and 3) on top-quality embryo classification was 92-93%, and all three embryologists agreed on 89%. The proportion of overall agreement between the local and consolidated central assessment on top-quality embryo classification was 90%. Table III displays the different indices of level of interobserver agreement for the three overall embryo classifications (unless otherwise specified, the numbers in the text refer to kappa scores). A good interobserver agreement was found for classification of top-quality embryos among the central embryologists (0.71-0.73) and between the local and consolidated central assessment (0.64). The best agreement was noticed for classifying embryos as normally developed embryos, with excellent interobserver agreement (0.83-0.86) among the central embryologists and good agreement (0.79) between the local and consolidated central assessment. The agreement for classifying embryos as transferable was at a level between that for classifying embryos as top-quality embryos and normally developed embryos, with good-to-excellent interobserver agreement among the central embryologists (0.78-0.82) and good agreement between the local and consolidated central assessment (0.71). The other indices of level of agreement supported the high interobserver agreement in overall embryo classification between the local and consolidated central assessment and among the central panel of embryologists. The correlation between the local and consolidated central assessment was in the range of 0.88-0.95 for the overall embryo classifications, and among the central embryologists the correlation was above 0.90.
An analysis of the subset of those embryos that were actually transferred in the study found that the agreement between local and consolidated central assessment was very similar to that for the overall data set. The proportion of positive agreement for classifying an embryo as transferable was 94% for the transferred embryos.
Intraobserver agreement (reproducibility)
The kappa scores related to the reproducibility of individual embryo quality parameter assessments done by the central embryologists are displayed in Table IV. The kappa values for each of the three central embryologists indicated a good-to-excellent reproducibility at all time points when re-assessing cell number. With increasing day of embryo development, the kappa scores for reproducibility of cell number decreased significantly (P < 0.001). For degree of fragmentation, the kappa values for each of the three central embryologists indicated a good intraobserver agreement (0.64-0.77). The reproducibility of assessing localization of fragments ranged from moderate to excellent (0.49-0.90), depending on central embryologist and assessment time point. A good-to-excellent range of intraobserver agreement (0.63-0.81) was found for blastomere uniformity. The reproducibility of cytoplasmic appearance was found to be good (0.61-0.79) for days 2 and 3, except for day 2 for central embryologist 3 (0.24). Regarding multinucleation, the level of reproducibility ranged from moderate to excellent on day 2 (0.53-0.88) and moderate to good on day 3 (0.48-0.66).
Table V displays different indices of level of intraobserver agreement for the overall embryo classifications (unless otherwise specified, the numbers in the text refer to kappa scores). The level of reproducibility for classifying top-quality embryos was excellent for two of the three central embryologists (0.86 and 0.81) and good for the third embryologist (0.79). The kappa value indicated excellent reproducibility of the top-quality embryo classification for the consolidated central assessment (0.80). An excellent level of reproducibility was seen for all three central embryologists (0.84-0.89) and the consolidated central assessment (0.90) with respect to classification of normally developed embryos. When classifying embryos as transferable, the kappa scores showed excellent reproducibility for the consolidated central evaluation (0.91) and for two of the three embryologists (0.82-0.89) and good reproducibility for the third central embryologist (0.79). The estimated correlation for reproducibility of the overall embryo classifications was above 0.95 for the central embryologists, supporting the high level of intraobserver agreement indicated by the kappa statistics.
Discussion
This study has provided a unique set-up to evaluate the inter- and intraobserver agreement among embryologists when scoring selected embryo quality parameters. The kappa scores of the evaluations made by the central embryologists provide information about the degree of interobserver agreement when assessments are based on identical 2D digital pictures and with adequate time to evaluate the images. The kappa scores between the local and consolidated central assessment provide information about the degree of interobserver agreement when assessments are based on different conditions, that is the central conditions as described above, and the local conditions based on a 3D exploration at the inverted microscope and with limited evaluation time in order not to compromise embryo quality. These major methodological questions have not been addressed so far in the literature, and, to our knowledge, this is also the first study assessing the reproducibility of embryo quality scoring based on a large dataset and with trained embryologists using a standardized scoring system. It should be noted that this investigation included only embryos that had no fertilization problems and that cleaved during the evaluation period, and that the kappa values for cell number and the overall embryo classifications would have been higher if the analysis had included all evaluated embryos, irrespective of cleavage, as many observations of 1-cells will increase agreement. In addition to the kappa scores, the level of interobserver agreement and intraobserver reproducibility on overall embryo classifications was also analysed by less conservative indices of agreement, that all supported the conclusion of good-to-excellent agreement in overall embryo classification.
This investigation showed that the interobserver agreement of embryo quality parameters when based on the same method of evaluation, 2D digital images, was high. The interobserver agreement on cell number, blastomere uniformity, degree of fragmentation and presence of multinucleation ranged from moderate to excellent among the central embryologists. As these four embryo quality parameters were included in the classification of top-quality embryos, it is not surprising that the interobserver agreement among the central embryologists on classifying top-quality embryos was found to be good. The central embryologists only disagreed on top-quality/not top-quality classification in 7-8% of the embryos evaluated. This represents an extremely high level of concordance among observers when using a similar technique of evaluation.
Comparison of local assessment versus consolidated central assessment, representing different evaluation conditions (3D versus 2D), also resulted in a moderate-to-excellent level of interobserver agreement for cell number, degree of fragmentation and blastomere uniformity. Some parameters, however, had a relatively low level of agreement when evaluated by different methods, and it is interesting to speculate on the potential reasons for the discrepancy. For example, lower kappa values in the poor-to-moderate range were noted for localization of fragments. The atlas provided for this investigation included definitions of local and dispersed fragments based on whether the fragments were confined to a well-defined local area of the embryo or distributed in a scattered pattern possibly originating from several blastomeres. However, despite this attempt to describe local and dispersed fragments, localization of fragments remained a less strictly defined parameter. It is also speculated that the evaluation of this parameter is dependent upon conditions of the images, such as focusing and light. Likewise, the assessment of multinucleation is also thought to be influenced by the observer setting, with a higher frequency of multinucleation observed by the local embryologists. Overall, multinucleation was a very infrequent observation, and therefore the agreement expected just by chance is high, impacting the kappa values. The low level of agreement associated with multinucleation did not have a major impact on the top-quality embryo rates, as there were very few observations of multinucleation for both local and consolidated central assessment. The poorest concordance rate across observers was found for cytoplasmic appearance. The poor-to-very poor level of agreement between the local and consolidated central assessment, but not among the central embryologists, in rating the cytoplasmic appearance suggests a technical limitation of evaluating this parameter by 2D images. 
Cytoplasmic appearance was not included in any of the overall embryo classifications, and the low agreement between local and consolidated central assessment on cytoplasmic appearance did therefore not influence the classifications.
A concern could be that the kappa scores in this investigation are not representative of the true level of agreement associated with the evaluation methods, but that the level of agreement would vary according to the quality of the embryos being studied. A subanalysis of embryos classified as not normally developed by both the local and consolidated central assessments showed that the kappa values for the individual embryo quality parameters (data not shown) were in line with those found for the overall set of embryos, indicating that assessment of poor quality embryos is not associated with a poorer level of agreement compared to that of good-quality embryos.
Despite the differences in evaluation conditions between the local and central embryologists, the classification of top-quality embryos was in good agreement between local and consolidated central assessment, and the disagreement on classifying embryos as top quality or not top quality was limited to 10% of the embryos evaluated. It is assumed that the initial training of participating embryologists and the atlas serving as reference guidance throughout the evaluation period have resulted in this high level of agreement. A further analysis of the individual parameters for those embryos where the local and central embryologists disagreed in overall classification (top quality or not top quality) revealed that the main differences were that the central embryologists reported a lower frequency of ≥7 cells on day 3 and a higher frequency of unequally sized blastomeres on day 3. Concerning the assessment of blastomere uniformity, it should be noted that the panel of central embryologists had rulers to measure the diameter of the blastomeres directly on the computer monitor facilitating measurable determination of cell diameter and thus blastomere uniformity. In contrast, the local embryologists evaluated blastomere uniformity based on the more subjective observation at the inverted microscope.
As would be expected with the increasing differentiation of embryos, the kappa scores for most of the parameters tended to decrease from day 2 to day 3. This was particularly noticeable for cell number, for which the concordance rates at day 3 dropped further between the local and consolidated central assessment compared to among the central embryologists. The different conditions of the observers may be a reasonable explanation for this differential pattern between local and central scores. The impact of evaluating embryos in 2D images compared to the actual in vivo laboratory situation had never been established. The cell number could be expected to be lower in the consolidated central assessment compared to the local assessment as the local embryologist has the possibility to evaluate all potential superimposed cells, being able to focus in several layers. In this investigation, a systematic difference between the local and consolidated central assessment was noted for cell number, with the difference increasing each day. On day 2, the consolidated central assessment score was on average 0.2 cells less than that reported by the local embryologist. By day 3, the difference had increased to around 0.5 cells. These mean differences represent a consistent pattern in evaluation of cell number between the local and consolidated central assessments. The difference in cell number is very small, and therefore it should not be considered a major limitation of an evaluation of cell number based on 2D pictures. Furthermore, implementation of an assessment system based on 2D images in large trials comparing interventions would not be affected by this minor cell difference as the effect would be similar across intervention groups. Thus, the validity of a digital imaging system for comparing cell number across interventions in the same study would remain unaffected. 
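The argument that a systematic undercount does not distort comparisons between interventions can be illustrated with a short sketch. It assumes, as the text does, that the central assessment differs from the local assessment by the same average offset in every group; the cell counts, the group names and the `central_view` helper are hypothetical, and only the 0.5-cell day 3 offset is taken from the text.

```python
from statistics import mean

# Hypothetical day 3 cell counts from local assessment in two intervention groups.
local_group_a = [8, 7, 8, 6, 9, 8]
local_group_b = [7, 6, 8, 6, 7, 7]

# Assumption: the central 2D assessment reports 0.5 cells fewer on average,
# and this offset is identical in both groups.
CENTRAL_OFFSET = 0.5


def central_view(local_counts, offset=CENTRAL_OFFSET):
    """Apply a constant undercount to mimic the central assessment (illustrative only)."""
    return [count - offset for count in local_counts]


diff_local = mean(local_group_a) - mean(local_group_b)
diff_central = mean(central_view(local_group_a)) - mean(central_view(local_group_b))

# The constant offset cancels, so both modes of assessment give the same group difference.
print(round(diff_local, 3), round(diff_central, 3))
```

The absolute means shift by the offset, but the between-group difference is unchanged. If the undercount depended on the intervention (for instance, through embryos with more superimposed cells), this cancellation would no longer hold.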
Another hypothesis for the lower number of cells in the central assessment could be that what had been considered as cells in the local evaluation may have been considered as large fragments when evaluated in the 2D images by the central embryologists. However, the lower number of cells in the consolidated central assessment compared to the local assessment was not accompanied by an increased degree of fragmentation, suggesting that the lower mean cell number was due not to cells being reported as fragments, but rather to the difficulty of seeing superimposed cells in a 2D picture.
The definition of top-quality embryos comprised the evaluation of seven parameters (cell number on day 2, cell number on day 3, blastomere uniformity on day 3, fragmentation on day 3, multinucleation on days 1, 2 and 3). The definitions of normally developed embryos and transferable embryos were simpler, as they contained only two and three parameters, respectively. It is therefore not surprising that the agreements on classification of normally developed embryos and transferable embryos were higher than that seen with top-quality embryos. The highest kappa values, both for interobserver agreement and for intraobserver agreement, were observed for normally developed embryos. The agreement for transferable embryos was slightly lower, probably because cell number on both day 2 and day 3 was considered for this classification.
Regarding the evaluation of intraobserver agreement, the findings suggest that most of the embryo quality assessments based on digital images have a good–excellent reproducibility. In clinical practice, the decision on which embryo(s) to transfer is based on an overall evaluation of embryo quality, rather than the individual parameters. It is therefore interesting that the identification of top-quality embryos, normally developed embryos and transferable embryos based on scoring of selected embryo quality parameters can be reproduced to an excellent level when implementing the evaluation conditions of the centralized assessment.
For protocols that affect embryo quality either positively or negatively, the direction of the findings would be expected to be consistent between local and consolidated central assessment. Understanding the advantages and limitations of the mode of assessment for each embryo quality parameter should facilitate the interpretation of the findings. When the goal is to obtain an actual incidence of a cell number associated with a particular intervention, local assessment may be optimal. It appears from this investigation that the centralized embryo assessment could lead to a lower mean cell number on day 3 and number of top-quality embryos, but this should not be a limitation when the intent is to compare the effect of two interventions. Further, using new techniques like multilevel analysis (Hnida and Ziebe, 2004; Hnida et al., 2004, 2005) may overcome this limitation. The centralized evaluation provides an easy assessment with a high interobserver agreement of blastomere uniformity and degree of fragmentation, both of which are sufficiently evaluated in the 2D setting. Considerations should therefore be given to an integration of the evaluations best made in the in vivo 3D setting and the parameters which gain from the detailed and time-consuming assessments which can be made based on 2D images. Thus, a combined approach of local assessment of cell number, central assessment of degree of fragmentation and blastomere uniformity, and both local and central assessment of multinucleation would provide a strong estimate for determining the quality of each embryo.
In conclusion, a satisfactory level of inter- and intraobserver agreement has been documented for the main embryo quality morphological parameters, and it has been shown that embryologists can determine overall embryo classifications with a good–excellent interobserver agreement and with excellent intraobserver reproducibility. Training sessions and reference guidelines on standardized embryo scoring systems are recommended for increasing observer agreement.
| Acknowledgements |
|---|
The authors thank Vibeke Breinholt, RN and Louise Koefoed Steen, MSc, Ferring Pharmaceuticals, for contribution to the conduct of this investigation.
| Conflicts of interests |
|---|
Joan-Carles Arce, Lisbeth Helmgaard and Per Sørensen are employees of Ferring Pharmaceuticals. Søren Ziebe, Kersti Lundin and Ronny Janssens have conducted clinical research sponsored by Ferring Pharmaceuticals.
| References |
|---|
Arce JC, Nyboe Andersen A, Collins J. (2005) Resolving methodological and clinical issues in the design of efficacy trials in assisted reproductive technologies: a mini-review. Hum Reprod 20:1757–1771.
Bavister BD. (1995) Culture of preimplantation embryos: Facts and artifacts. Hum Reprod Update 1:91–148.
Cummins JM, Breen TM, Harrison KL, Shaw JM, Wilson LM, Hennesey JF. (1986) A formula for scoring human embryo growth rates in in vitro fertilization: Its value in predicting pregnancy and in comparison with visual estimates of embryo quality. J In Vitro Fert Embryo Trans 3:284–295.
De Placido G, Wilding M, Strina I, Alviggi E, Alviggi C, Mollo A, Varicchio MT, Tolino A, Schiattarella C, Dale B. (2002) High outcome predictability after IVF using a combined score for zygote and embryo morphology and growth rate. Hum Reprod 17:2402–2409.
Fisch JD, Rodriguez H, Ross R, Overby G, Sher G. (2001) The Graduated Embryo Score (GES) predicts blastocyst formation and pregnancy rate from cleavage-stage embryos. Hum Reprod 16:1970–1975.
Gámiz P, Rubio C, de los Santos MJ, Mercader A, Simón C, Remohí J, Pellicer A. (2003) The effect of pronuclear morphology on early development and chromosomal abnormalities in cleavage-stage embryos. Hum Reprod 18:2413–2419.
Gianaroli L, Magli MC, Ferraretti AP, Fortini D, Grieco N. (2003) Pronuclear morphology and chromosomal abnormalities as scoring criteria for embryo selection. Fertil Steril 80:341–349.
Hnida C and Ziebe S. (2004) Total cytoplasmic volume as biomarker of fragmentation in human embryos. J Assist Reprod Genet 21:335–340.
Hnida C, Agerholm I, Ziebe S. (2005) Traditional detection versus computer-controlled multilevel analysis of nuclear structures from donated human embryos. Hum Reprod 20:665–671.
Hnida C, Engenheiro E, Ziebe S. (2004) Computer-controlled, multilevel, morphometric analysis of blastomere size as biomarker of fragmentation and multinuclearity in human embryos. Hum Reprod 19:288–293.
Lundin K, Bergh C, Hardarson T. (2001) Early embryo cleavage is a strong indicator of embryo quality in human IVF. Hum Reprod 16:2652–2657.
Montag M and van der Ven H. (2001) Evaluation of pronuclear morphology as the only selection criterion for further embryo culture and transfer: Results of a prospective multicentre study. Hum Reprod 16:2384–2389.
Nagy ZP, Dozortsev D, Diamond M, Rienzi L, Ubaldi F, Abdelmassih R, Greco E. (2003) Pronuclear morphology evaluation with subsequent evaluation of embryo morphology significantly increases implantation rates. Fertil Steril 80:67–74.
Puissant F, Van Rysselberge M, Barlow P, Deweze J, Leroy F. (1987) Embryo scoring as a prognostic tool in IVF treatment. Hum Reprod 2:705–708.
Quinn P. (2004) The development and impact of culture media for assisted reproductive technologies. Fertil Steril 81:27–29.
Rienzi L, Ubaldi F, Iacobelli M, Romano S, Minasi MG, Ferrero S, Sapienza F, Baroni E, Greco E. (2005) Significance of morphological attributes of the early embryo. RBM Online 10:669–681.
Sakkas D, Percival G, D'Arcy Y, Sharif K, Afnan M. (2001) Assessment of early cleaving in vitro fertilized human embryos at the 2-cell stage before transfer improves embryo selection. Fertil Steril 76:1150–1156.
Terriou P, Giorgetti C, Hans E, Spach JL, Salzmann J, Billé V, Roulier R. (1995) Intracytoplasmic sperm injection and embryo quality: Comparison with conventional IVF. Contracept Fertil Sex 23:471–473.
Thurin A, Hardarson T, Hausken J, Jablonowska B, Lundin K, Pinborg A, Bergh C. (2005) Predictors of ongoing implantation in IVF in a good prognosis group of patients. Hum Reprod 20:1876–1880.
Van Royen E, Mangelschots K, De Neubourg D, Valkenburg M, Van de Meerssche M, Ryckaert G, Eestermans W, Gerris J. (1999) Characterization of a top quality embryo, a step towards single-embryo transfer. Hum Reprod 14:2345–2349.
Ziebe S, Petersen K, Lindenberg S, Andersen AG, Gabrielsen A, Nyboe Andersen A. (1997) Embryo morphology or cleavage stage: How to select the best embryos for transfer after in-vitro fertilization. Hum Reprod 12:1545–1549.
Submitted on November 28, 2005; resubmitted on February 22, 2006; accepted on March 14, 2006.