Hum. Reprod. Advance Access originally published online on December 20, 2006
Human Reproduction 2007 22(4):1156-1160; doi:10.1093/humrep/del460
Prospective cross-validation of three methods of predicting failing pregnancies of unknown location
1 Early Pregnancy, Gynaecological Ultrasound and MAS Unit, St. George's University of London, UK 2 Department of Electrical Engineering (ESAT), K.U. Leuven, Belgium 3 Department of Obstetrics and Gynaecology, University Hospital Gasthuisberg, K.U. Leuven, Belgium 4 Early Pregnancy Unit, Nepean Centre for Perinatal Care and Research, Nepean Clinical School, University of Sydney, Nepean Hospital, Sydney, Australia
5 To whom correspondence should be addressed at: Tel: +61 423 789 127; E-mail: gcondous{at}hotmail.com
| Abstract |
|---|
|
|
|---|
BACKGROUND: We compared the performance of three tests for predicting pregnancy failure in the pregnancy of unknown location (PUL) population.
METHODS: In a prospective observational study, we compared the performance of three models for the prediction of pregnancy failure in women with a PUL: (i) a logistic-regression model incorporating vaginal bleeding, endometrial thickness (ET), initial serum progesterone and hCG levels; (ii) serum progesterone at 0 h; and (iii) the hCG ratio.
RESULTS: A total of 5942 consecutive pregnant women attending the Early Pregnancy Unit were scanned and 439 (7.4%) were classified as PULs. Of these women, 420 had complete data for serum hCG at 0 and 48 h, the hCG ratio, serum progesterone at 0 h, vaginal bleeding and ET. The final outcomes were 219 (52.1%) failing PULs, 167 (39.8%) intrauterine pregnancies and 34 (8.1%) ectopic pregnancies. For the prediction of pregnancy failure in the PUL population, the area under the receiver operating characteristic (ROC) curve (AUC) for the logistic-regression model was 0.907 (standard error (SE) 0.015), the AUC for serum progesterone was 0.952 (SE 0.010) and the AUC for the hCG ratio was 0.980 (SE 0.004). This improved performance of the hCG ratio was significant when compared with that of initial serum progesterone (P = 0.0076) and the logistic-regression model (P < 0.0001). Using the hCG ratio cut-offs of <0.87 and 0.79, the serum progesterone cut-off of <20 nmol l⁻¹ and the logistic-regression cut-off of >70% for the prediction of failing PUL, sensitivities were 86.3 and 82.2% (S), 87.2% (NS) and 78.1% (S), specificities were 97.0 and 98.0% (NS), 89.6% (S) and 88.6% (S), positive-likelihood ratios were 28.91, 41.30, 8.34 and 6.82 and negative-likelihood ratios were 0.14, 0.18, 0.14 and 0.25, respectively.
CONCLUSION: The hCG ratio seems to be an optimal test for the prediction of pregnancy failure in a PUL population. The hCG ratio cut-off of 0.79 is recommended on the basis of minimizing risk to those PULs discharged at 48 h. Most importantly, when the hCG ratio is below a particular cut-off, these women can be discharged at 48 h without intervention or the need for further follow-up.
Key words: failing pregnancy of unknown location/hCG ratio/logistic regression/progesterone
| Introduction |
|---|
|
|
|---|
Failing pregnancies of unknown location (PULs), or trophoblast in regression, account for 44–69% of the PUL population and are never visualized using transvaginal ultrasonography (TVS) (Hajenius et al., 1995
More recently, the hCG ratio, defined as the hCG at 48 h/hCG at 0 h, outperformed single values of serum hCG taken at 0 and 48 h for the prediction of failing PULs (Condous et al., 2006). When using an hCG ratio cut-off of <0.87, the sensitivity and specificity for the prediction of failing PULs were 92.7 and 96.7%, respectively. These results were obtained when the hCG ratio was tested in a different population to the one in this study. Barnhart et al. (2004) previously described a rate of decline in serum hCG >21% to define spontaneous resolution of PULs; this equates to an hCG ratio cut-off of 0.79.
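The correspondence between a percentage decline in hCG and the hCG ratio can be checked with a short sketch. The function names and the example hCG values below are illustrative assumptions, not part of the study.

```python
def hcg_ratio(hcg_0h: float, hcg_48h: float) -> float:
    """hCG ratio as defined in the text: hCG at 48 h divided by hCG at 0 h."""
    if hcg_0h <= 0:
        raise ValueError("hCG at 0 h must be positive")
    return hcg_48h / hcg_0h


def ratio_from_percent_decline(decline_percent: float) -> float:
    """Convert a percentage fall in hCG over 48 h into the equivalent hCG ratio."""
    return 1.0 - decline_percent / 100.0


# A 21% decline over 48 h corresponds to a ratio of 0.79.
print(round(ratio_from_percent_decline(21), 2))

# Hypothetical values: a fall from 400 to 300 U/l gives a ratio of 0.75.
print(hcg_ratio(400, 300))
```

A decline greater than 21% therefore corresponds to a ratio below 0.79, which is why the two descriptions of the cut-off are interchangeable.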
The aim of this study was to compare the diagnostic performance of the hCG ratio, serum progesterone and the previously developed logistic-regression model with regard to predicting failing PULs (Banerjee et al., 2001; Barnhart et al., 2004; Condous et al., 2006).
| Materials and methods |
|---|
|
|
|---|
Data collection
We undertook a non-interventional prospective study of 5942 consecutive first trimester pregnant women attending the Early Pregnancy Unit (EPU) at St. George's Hospital, London between the 18th of July 2003 and the 9th of October 2004 inclusive. All women underwent a TVS by midwife trained ultrasonographers who had at least four years' scanning experience in the EPU or Clinical Fellows (after hours) who had at least two years' experience in gynaecology and were proficient at scanning. The diagnosis of a PUL was made at the initial visit and was defined on the basis of a TVS as there being no signs of either an intra- or extrauterine pregnancy or retained products of conception in a woman with a positive pregnancy test.
If any of the following was present on TVS, these women were excluded from the PUL population:
- Visualisation of any evidence of an intrauterine sac at the initial scan or
- Identification of an adnexal mass thought to be an EP at the initial scan or
- The presence of heterogeneous, irregular tissues within the uterus thought to be an incomplete miscarriage at the initial scan or
- Women who were clinically unstable, had an acute abdomen or had blood in the pouch of Douglas (POD) according to the scan images at the time of the initial scan.
Indications for sonography included lower abdominal pain, with or without vaginal bleeding, poor obstetric history or the need to determine gestational age. Each woman with a PUL had a thorough history taken (including the presence or absence of lower abdominal pain with or without vaginal bleeding), and examination was performed using a 5-MHz transvaginal probe (Aloka SSD 900, 2000 or 4000, Keymed Ltd, Southend, UK and Aloka Co. Ltd., Tokyo, Japan). The absence or presence of vaginal bleeding was expressed as a bleeding score of 0 or 1. Demographic data including the woman's age and gestational age (estimated on the basis of last menstrual period) at presentation were recorded. Ultrasonographic features including endometrial thickness (ET), fluid in the POD and the character of the midline endometrial echo were also recorded. The ET was measured on ultrasound in the sagittal plane at the point of maximal thickness. Any fluid noted in the POD was measured in two planes. These measurements were recorded in millimetres (mm). The character of the midline endometrial echo was assessed at the first scan and described as being either intact or disrupted.
All women classified with a PUL had peripheral blood taken for serum hCG (World Health Organization, Third International Reference 75/537) and progesterone measurements (Roche Elecsys 2010 Progesterone II test, Roche Diagnostics, Lewes, UK) using automated electrochemiluminescence immunoassays. The first blood sample was drawn on the day the woman presented and the levels were measured again 48 h later, according to the unit protocol. If the woman presented before 17:00 h, the phlebotomy service took the first blood sample and the second was arranged 48 h thereafter. Any women who presented after 17:00 h had the first blood sample taken by the on-call Clinical Fellow, who was on duty to perform ultrasound scans and take blood samples until 22:00 h, 7 days a week. No scans were performed between 22:00 h and 08:00 h and therefore the second blood sample could be taken at or close to 48 h. This system ensured that variations in the timing of the 48-h blood sample were kept to a minimum. Women with complete demographic, ultrasonographic and biochemical data were included in the final analysis.
Women were followed up until a final diagnosis was established: a failing PUL, an IUP, EP or persisting PUL. The persisting PUL group is defined as those PULs whose serum hCG levels fail to decline, tend to be low (<500 U l⁻¹) and have reached a plateau. The location of the persisting PUL group was never ascertained and a proportion of these represent either IUPs or EPs. In order to give the reader the worst-case scenario, these were incorporated into the EP group.
The diagnosis of a failing PUL according to the unit's protocol was defined as a serum progesterone at presentation <20 nmol l⁻¹ with a subsequent fall in serum hCG levels to <5 U l⁻¹, and the location of these pregnancies remained unknown. Note that none of the other diagnostic models for failing PUL, including the hCG ratio and logistic regression, was used in the clinical work-up of PUL women. The diagnosis of an IUP was made on TVS when a gestational sac was visualized within the endometrial cavity (Warren et al., 1989). EPs were diagnosed using TVS and/or laparoscopy with confirmatory histology of chorionic villi. In our unit, the use of ultrasound as the primary way to diagnose EPs is accurate in 93.2% of cases (Condous et al., 2005c). The diagnosis of EP was based on the positive visualization at TVS examination of an adnexal mass. Ultrasonographic diagnosis of an EP was made if one of the following grey-scale appearances was present in the adnexal region: (i) an inhomogeneous mass adjacent to the ovary and moving separate to this, known as the blob sign (Condous et al., 2005c) or (ii) a mass with a hyper-echoic ring around the gestational sac, referred to as the bagel sign (Condous et al., 2005c) or (iii) a gestational sac with a fetal pole with or without cardiac activity (Condous et al., 2005c).
Statistical analysis
Three diagnostic models were then applied retrospectively to the same data set in order to determine the best test for the prediction of pregnancy failure in the PUL population.
- Logistic-regression model where the probability to predict failing PUL $= 1/(1 + e^{-z})$, where $z = 2.20 - 0.15 \times \text{progesterone} + 3.36 \times \text{bleeding score} - 0.0013 \times \text{serum hCG} + 0.45 \times \text{endometrial thickness}$, with progesterone in nmol l⁻¹, serum hCG in U l⁻¹ and endometrial thickness in mm. A probability score of 70% was used as the cut-off level for the positive prediction of a spontaneously resolving pregnancy (Banerjee et al., 1999).
- Serum progesterone at presentation. A level <20 nmol l⁻¹ was used as the cut-off to predict failing PUL (Banerjee et al., 2001).
- hCG ratio at 48 h, including a comparison of the performance of hCG ratio cut-offs of 0.79 versus <0.87 to predict failing PUL (Barnhart et al., 2004; Condous et al., 2006).
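The logistic-regression model and the progesterone rule in the list above can be sketched as follows. This is an illustration only: it assumes the standard logistic form 1/(1 + e^(-z)), a positive intercept and negative coefficients for progesterone and hCG, and these signs should be checked against Banerjee et al. (1999) before any use. The function and parameter names are hypothetical.

```python
import math


def failing_pul_probability(progesterone_nmol_l: float,
                            bleeding_score: int,
                            hcg_u_l: float,
                            et_mm: float) -> float:
    """Predicted probability of a failing PUL.

    ASSUMPTION: coefficient signs and the e^(-z) form are inferred, not confirmed.
    bleeding_score is 0 (no vaginal bleeding) or 1 (vaginal bleeding).
    """
    z = (2.20
         - 0.15 * progesterone_nmol_l
         + 3.36 * bleeding_score
         - 0.0013 * hcg_u_l
         + 0.45 * et_mm)
    return 1.0 / (1.0 + math.exp(-z))


def model_predicts_failing(probability: float, cutoff: float = 0.70) -> bool:
    """Positive prediction when the probability exceeds the 70% cut-off."""
    return probability > cutoff


def progesterone_predicts_failing(progesterone_nmol_l: float,
                                  cutoff: float = 20.0) -> bool:
    """Positive prediction when serum progesterone at 0 h is below 20 nmol/l."""
    return progesterone_nmol_l < cutoff


# Hypothetical patient: low progesterone, bleeding, low hCG, ET of 8 mm.
p = failing_pul_probability(10.0, 1, 200.0, 8.0)
print(model_predicts_failing(p), progesterone_predicts_failing(10.0))
```

Under these assumed signs the predicted probability rises with vaginal bleeding and falls as progesterone or hCG rises, which matches the direction of the single-marker progesterone rule.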
The performance of each of the different cut-offs, hCG ratio <0.87, hCG ratio 0.79, serum progesterone <20 nmol l⁻¹ at 0 h and logistic-regression model cut-off of >70%, was also evaluated in terms of sensitivity, specificity, positive-likelihood ratio (LR(+)) and negative-likelihood ratio (LR(−)). Percentages are followed by their 95% confidence intervals (CI) that were computed using the continuity-corrected efficient score method as described by Newcombe (1998). McNemar's test was used to investigate whether the differences in sensitivity and specificity for the different cut-offs were significant. P-values of <0.05 were considered to indicate statistical significance.
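The quantities named in this paragraph can be illustrated with a minimal sketch. The study used SAS; the Python functions below are hypothetical stand-ins, and the McNemar variant shown (chi-square with continuity correction) is an assumption because the text does not say which form was applied.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR(+), LR(-)) from sensitivity and specificity given as fractions."""
    lr_pos = math.inf if specificity >= 1.0 else sensitivity / (1.0 - specificity)
    lr_neg = math.inf if specificity <= 0.0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def score_interval_cc(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Continuity-corrected score interval for a single proportion (95% by default)."""
    p = successes / n
    q = 1.0 - p
    denom = 2.0 * (n + z * z)
    if successes == 0:
        lower = 0.0
    else:
        root = math.sqrt(z * z - 2.0 - 1.0 / n + 4.0 * p * (n * q + 1.0))
        lower = max(0.0, (2.0 * n * p + z * z - 1.0 - z * root) / denom)
    if successes == n:
        upper = 1.0
    else:
        root = math.sqrt(z * z + 2.0 - 1.0 / n + 4.0 * p * (n * q - 1.0))
        upper = min(1.0, (2.0 * n * p + z * z + 1.0 + z * root) / denom)
    return lower, upper


def mcnemar_p_value(b: int, c: int) -> float:
    """Two-sided McNemar test on the discordant pair counts b and c.

    ASSUMPTION: chi-square approximation (1 df) with continuity correction.
    """
    if b + c == 0:
        return 1.0
    chi2 = max(0.0, abs(b - c) - 1.0) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


lr_pos, lr_neg = likelihood_ratios(0.863, 0.970)
print(round(lr_pos, 1), round(lr_neg, 2))

# 189 of 219 is a count consistent with the reported 86.3%; it is not taken from the study tables.
print(score_interval_cc(189, 219))
```

With the rounded percentages reported for the hCG ratio cut-off of <0.87, the sketch gives an LR(+) of about 28.8 and an LR(−) of about 0.14; the small difference from the published 28.91 is what rounding of the inputs would produce.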
Statistical analyses were conducted with Statistical Analysis System (SAS) Version 9.1, Cary, North Carolina, USA.
| Results |
|---|
|
|
|---|
A total of 5942 consecutive pregnant women were scanned during the study period and 439 (7.4%) consecutive women were classified as PULs. Of the 439 eligible women, 420 had complete data for serum hCG at 0 and 48 h, the hCG ratio, serum progesterone at 0 h, vaginal bleeding and ET. These were therefore used in the final analysis. The final outcomes were 219 (52.1%) failing PULs, 167 (39.8%) intrauterine pregnancies and 34 (8.1%) EPs.
Table I shows the descriptive statistics for all 420 PULs and for each of the three outcome groups: failing PULs, IUPs and EPs.
For the prediction of failing PULs, the AUC for the logistic-regression model was 0.907, the AUC for serum progesterone was 0.952 and the AUC for the hCG ratio was 0.980 (Figure 1). According to the AUCs, the hCG ratio performed significantly better than both serum progesterone and the logistic-regression model, with P-values of 0.0076 and <0.0001, respectively.
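As an illustration of what an AUC measures, the sketch below computes it as the proportion of (failing, non-failing) pairs that a score ranks correctly, and adds the Hanley and McNeil (1982) approximation for its standard error. The data are hypothetical, the function names are assumptions, and the text does not state which SE estimator the study used, so the approximation need not match the reported SEs.

```python
import math


def auc_pairwise(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc * auc)
                + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(variance)


# Hypothetical hCG ratios. A lower ratio suggests a failing PUL, so the
# score is the negated ratio (higher score = more likely failing).
failing_ratios = [0.45, 0.60, 0.72, 0.90]
other_ratios = [0.85, 1.40, 1.90, 2.10]
auc = auc_pairwise([-r for r in failing_ratios], [-r for r in other_ratios])
print(auc, hanley_mcneil_se(auc, len(failing_ratios), len(other_ratios)))
```

Negating the ratio is a design choice that keeps the convention "higher score means positive" so that the same function can be reused for the logistic-regression probability without change.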
Table II demonstrates the sensitivities, specificities, LR(+) and LR(−) for each of the diagnostic model cut-offs for the prediction of failing PUL. Table III shows the statistical significance of the differences in sensitivity and specificity between the two hCG ratio cut-offs, <0.87 and 0.79, and those of the serum progesterone cut-off of <20 nmol l⁻¹ and the logistic-regression model cut-off of >70%.
| Discussion |
|---|
|
|
|---|
On the basis of this comparative study, the hCG ratio seems to be the best of the three diagnostic tests for predicting failing PULs. The hCG ratio and serum progesterone significantly outperformed the previously developed logistic-regression model. We believe that clinicians can rely on biochemical markers alone and in particular the hCG ratio to predict spontaneously resolving PULs which do not need intervention. Clinical information in the form of vaginal bleeding and ultrasonographic measurements including the ET are not necessary in the work-up of these women.
This is the first study to compare biochemical markers of pregnancy failure in the PUL population, i.e. the hCG ratio versus initial serum progesterone. Both the hCG ratio and serum progesterone are simple diagnostic tests to apply to a PUL population; however, the higher laboratory expense is another factor against the routine use of serum progesterone.
There was a large overlap in the CI for the LR(+) and LR(−) for the different hCG ratio cut-offs (0.87 and 0.79). EPUs can adopt either hCG ratio cut-off (0.87 versus 0.79) depending upon what is clinically important to that individual unit, i.e. to predict failing PULs or not to miss non-failing PULs. The LR which is clinically most relevant will determine the hCG ratio cut-off. If a unit believes that clinically the most important PUL outcome to predict is the failing PUL group, then they choose the hCG cut-off with the highest LR(+), i.e. 0.79. Conversely, if it is more important to identify those pregnancies that are not failing, then they choose the hCG ratio cut-off with the lowest LR(−), i.e. 0.87. We believe that a diagnostic test which allows women with a PUL to be discharged at 48 h should ideally have a high LR(+) and high specificity. This is to ensure that we do not discharge a woman with a false-positive test and therefore potentially cause harm. Thus, the hCG ratio cut-off of 0.79 is recommended.
Individual hCG ratio cut-offs, either 0.87 or 0.79, can be used for clinical decision-making in the PUL population. We believe that women whose hCG ratio is below a particular cut-off can be discharged at 48 h without intervention (Condous et al., 2006). Conversely, those women who have an hCG ratio above a particular cut-off, either 0.87 or 0.79, need follow-up in the form of a repeat ultrasound scan in 7 days. If this scan does not locate the pregnancy, then repeating the hCG ratio would be appropriate in order to plan further management. The relative simplicity and easy application of the hCG ratio when dealing with a PUL population mean that there are few limitations to its use by the majority of health professionals.
The hCG ratio performs very well when applied to the PUL population at St. George's Hospital EPU (Condous et al., 2006). The authors believe that EPUs which use the World Health Organization, Third International Standard (75/537), for the quantification of serum hCG levels can adopt this rule. However, prospective studies in other institutions are needed to validate its general application. In conclusion, the hCG ratio is the optimal diagnostic test for the prediction of failing PULs. The hCG ratio cut-off of 0.79 is recommended on the basis of minimizing risk to those PULs discharged at 48 h. The hCG ratio can be the basis for clinical decision-making in the management of PULs. Most importantly, when the hCG ratio is below a particular cut-off, these women can be discharged at 48 h without intervention or the need for further follow-up.
| References |
|---|
|
|
|---|
Banerjee S, Aslam N, Zosmer N, Woelfer B, Jurkovic D. (1999) The expectant management of women with pregnancies of unknown location. Ultrasound Obstet Gynecol 14:231–236.
Banerjee S, Aslam N, Woelfer B, Lawrence A, Elson J, Jurkovic D. (2001) Expectant management of early pregnancies of unknown location: a prospective evaluation of methods to predict spontaneous resolution of pregnancy. Br J Obstet Gynaecol 108:158–163.
Barnhart KT, Sammel MD, Chung K, Zhou L, Hummel AC, Guo W. (2004) Decline of serum human chorionic gonadotropin and spontaneous complete abortion: defining the normal curve. Obstet Gynecol 104:975–981.
Condous G, Okaro E, Bourne T. (2005a) Pregnancies of unknown location: diagnosis dilemmas and management. Curr Opin Obstet Gynecol 17:568–573.
Condous G, Okaro E, Khalid A, Lu C, Van Huffel S, Timmerman D, Bourne T. (2005b) A prospective evaluation of a single-visit strategy to manage pregnancies of unknown location. Hum Reprod 20:1398–1403.
Condous G, Okaro E, Khalid A, Lu C, Van Huffel S, Timmerman D, Bourne T. (2005c) The accuracy of transvaginal ultrasonography for the diagnosis of ectopic pregnancy prior to surgery. Hum Reprod 20:1404–1409.
Condous G, Kirk E, Van Calster B, Van Huffel S, Timmerman D, Bourne T. (2006) Failing pregnancies of unknown location: a prospective evaluation of the human chorionic gonadotrophin ratio. Br J Obstet Gynaecol 113:521–527.
DeLong ER, DeLong DM, Clarke-Pearson DL. (1988) Comparing the areas under two or more correlated receiver operating characteristics curves: a nonparametric approach. Biometrics 44:837–845.
Hajenius PJ, Mol BW, Ankum WM, van der Veen F, Bossuyt PM, Lammes FB. (1995) Suspected ectopic pregnancy: expectant management in patients with negative sonographic findings and low serum hCG concentrations. Early Pregnancy 1:258–262.
Hanley JA and McNeil BJ. (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36.
Newcombe RG. (1998) Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 17:857–872.
Warren WB, Timor-Tritsch IE, Peisner DB, Raju S, Rosen MG. (1989) Dating the early pregnancy by sequential appearance of embryonic structures. Am J Obstet Gynecol 161:747–753.
Submitted on May 20, 2006; resubmitted on August 10, 2006; resubmitted on October 18, 2006; accepted on October 25, 2006.