Accuracy and reproducibility of automated estradiol-17β and progesterone assays using native serum samples: results obtained in the Belgian external assessment scheme
1 Department of Clinical Biology, Scientific Institute of Public Health, Juliette Wytsmanstraat 14, 1050 Brussels, Belgium
2 Laboratory of Clinical Chemistry and Radioimmunology, UZ Brussel, 1090 Brussels, Belgium
3 UZA Department of Hormonology, Therapeutic Drug Monitoring and Clinical Toxicology, University Hospital Antwerp, 2650 Edegem, Belgium
4 Correspondence address. E-mail: wim.coucke{at}iph.fgov.be
| Abstract |
|---|
BACKGROUND: In 2005, a special survey in the Belgian External Quality Assessment focused on the performance of the six automated immunoassay analysers most frequently used in Belgium for estradiol-17β (E2) and progesterone. Results obtained were compared with values determined by the reference method, isotope dilution–gas chromatography/mass spectrometry (ID–GC/MS).
METHODS: Five fresh frozen serum samples, without additives, from single donors and three pools from pregnant women were distributed to all registered Belgian laboratories. Total variation, bias, linear relationship within the reported range and linear regression were investigated.
RESULTS: Inter-laboratory coefficients of variation ranged from 4 to 49% for E2 and from 6 to 45% for progesterone. Bias ranged from –26 to 239% for E2 and from –23 to 81% for progesterone. Several systems showed an upward bias for one particular sample of at least 25%. Weighted linear regression showed overall bias ranging from –8% to 32% for E2 and from 7% to 41% for progesterone.
CONCLUSIONS: Few automated methods succeed in having an excellent reproducibility for E2 and progesterone. Given the high bias values it is suggested that, for performance testing, results be compared whenever possible with a reference method. The linear relationship as assessed by comparing results with those obtained by ID–GC/MS using samples from different donors was not assured for most methods.
Key words: immunoassay/estradiol/progesterone/external quality assessment/automation
| Introduction |
|---|
Measuring serum estradiol-17β (E2) and progesterone concentrations is of particular interest in controlling female reproduction, as such concentrations characterize ovarian follicle growth, corpus luteum function and placental function. Automated random access immunoassay analysers are widely used to diagnose reproductive disorders and monitor ovulation induction and ovarian hyperstimulation treatments. Performance characteristics for E2 and progesterone have been described previously; however, external quality assessment (EQA) studies using fresh samples are scarce. Sample material is rarely serum from a single patient and target value setting is mostly based on consensus values (Wheeler, 2001).
In 2005, a special survey in the Belgian EQA focused on the performance of the six most widely used automated immunoassay analysers using samples from different donors. For each sample, a target value of E2 and progesterone was determined with isotope dilution–gas chromatography/mass spectrometry (ID–GC/MS) as reference method, and compared with the results obtained in the different laboratories. By using fresh frozen samples without additives or preservatives, mainly originating from single donors, matrix effects commonly present in processed EQA sample material were excluded. Samples are assumed to mimic genuine patient material.
| Materials and Methods |
|---|
The samples under investigation were sent to the participants on dry ice during a special Belgian external quality survey. The samples were "off the clot serum" of commercial origin (Scipac, Sittingbourne, Kent, UK and Bio-Dev, Sempione, Milano, Italy). Five samples were obtained from normal cycling women and three (those having the highest values of E2 and progesterone) were from a pool of donations by at least 10 different pregnant women—all women had given their informed consent. The sera from the normal cycling women were prepared with samplings from one single donor; they were rapidly isolated to avoid hemolysis and kept in sterile conditions. The sera were delivered in bulk without any additives or preservatives to the EQA organizer. Participating laboratories were unaware of E2 and progesterone concentrations in the samples.
The lipemia of the samples was evaluated by measuring the triglycerides on a Vitros® system (Ortho-Clinical Diagnostics, Raritan, NJ, USA). The presence of other steroids was also analysed: Estrone (Biosource Europe, Nivelles, Belgium), Estriol (Gamma, Angleur, Belgium) and 17α-hydroxyprogesterone (Biosource Europe).
In order to control for possible bias caused by human anti-mouse antibodies (HAMA) in the two assays with the highest bias, two samples (E2 concentrations of 1841 and 2026.4 pmol/l) were each measured in triplicate on a Vitros and a Vidas® (Biomerieux, Marcy l'Etoile and Paris, France) system before and after incubation with Heterophilic Blocking Tubes from Scantibodies Laboratory Inc. (Santee, CA, USA).
Samples were stored at –70°C until the control samples were prepared for the trial. Sterile aliquots of 500 µl were prepared and stored at –70°C until they were distributed to the participating laboratories on dry ice.
Determination of E2 and progesterone reference values was performed by previously described ID–GC/MS methods (Thienpont et al., 1994, 1995, 1996; Tai and Welch, 2005).
The expanded uncertainty of the reference method value (RMV) of E2, which represents the maximum deviation at a 95% confidence level and is based on the mean of six measurements by ID–GC/MS, is estimated to be <1% for concentrations >220 pmol/l and of the order of 1.2% for concentrations <220 pmol/l. For extremely low concentrations (<18 pmol/l), the expanded uncertainty is of the order of 4%. For progesterone, the expanded uncertainty of the RMV is estimated at <2%.
The immunoassay systems taken into account were those most frequently used in Belgium: Advia Centaur® (Bayer, Tarrytown, NY, USA)—using assay ACS Centaur E2 6, Immulite® (DPC, Los Angeles, CA, USA), Elecsys® (Roche, Basel, Switzerland), Access® (Beckman-Coulter, Fullerton, CA, USA), Vitros and Vidas.
Based on the concentration of E2 and progesterone, two samples were categorized as early follicular; three other samples as periovulatory or luteal, and the last three as belonging to a pregnancy. An overview of E2 and progesterone concentrations with their clinical interpretation is given in Table I.
Results were reported via the on-line reporting system of the Department of Clinical Biology (Institute of Public Health, Brussels, Belgium) (Libeer et al., 1994).
| Statistical analysis |
|---|
Total inter-laboratory variance and accuracy were calculated using coefficient of variation (CV) and bias after data cleaning. For this purpose, strongly deviating values were omitted from the analysis: one value deviating >10 000% from its true value was removed, and the results of one laboratory for which more than half the results deviated by >50% were disregarded. As in a clinical setting, where results are interpreted according to the patient's physiological status at the time of sampling, CV and bias are discussed per physiological condition with the emphasis on the highest CV or bias. It should be remembered that the variability considered in the study is a reproducibility measure, since it is based on values from different laboratories that use the same analyser. The total variation can be expected to be larger than reported intra-lab variation, owing to sources of variability that are not always present in intra-lab studies, such as the use of different calibrations, possibly different reagent lots or different environmental conditions in the laboratories.
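As an illustration of the two summary measures described above, the following minimal Python sketch computes an inter-laboratory CV and a relative bias for one sample measured on one analyser. The function names, the example values and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration; the study itself was analysed in R and does not state these details.

```python
from statistics import mean, stdev


def drop_gross_outliers(values, reference, limit_percent=10_000.0):
    """Discard values deviating from the reference by more than limit_percent."""
    return [v for v in values
            if abs(v - reference) / reference * 100.0 <= limit_percent]


def cv_percent(values):
    """Inter-laboratory CV (%): sample standard deviation over the mean."""
    return 100.0 * stdev(values) / mean(values)


def bias_percent(values, reference):
    """Relative bias (%) of the mean reported value against the reference."""
    return 100.0 * (mean(values) - reference) / reference


# Hypothetical results (pmol/l) from five laboratories using one analyser.
reported = [210.0, 190.0, 205.0, 195.0, 200.0]
reference_value = 180.0  # hypothetical reference method value (pmol/l)

cleaned = drop_gross_outliers(reported, reference_value)
print(f"CV   = {cv_percent(cleaned):.1f}%")
print(f"bias = {bias_percent(cleaned, reference_value):.1f}%")
```

Only the single-value cleaning rule (>10 000% deviation) is sketched here; the laboratory-level exclusion rule would need the full results table.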
In addition, a weighted linear regression analysis and a lack-of-fit test were performed for each method separately to assess a linear relationship and the relation between concentration and bias (Sen and Srivastava, 1997). The null distribution of the lack-of-fit test was developed by a 1000-fold bootstrap (Efron and Tibshirani, 1993), which showed significance only if the means deviated >10% from a linear relationship. In this sense, the test would not show significant results if the deviation from a linear relation was <10%, whereas a regular lack-of-fit test might have done so. Calculations were performed in R for Windows version 2.2.1.
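The idea of tolerating deviations from linearity of up to 10% can be sketched with a simple bootstrap. The Python code below is not the authors' test: it is a simplified stand-in that resamples laboratory results within each sample, refits an unweighted straight line to the resampled means, and reports how often the largest relative deviation from that line exceeds the tolerance. All names, the unweighted fit and the example data are assumptions.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def max_relative_deviation(reference, sample_means):
    """Largest relative distance of a sample mean from the fitted line."""
    intercept, slope = fit_line(reference, sample_means)
    fitted = [intercept + slope * r for r in reference]
    return max(abs(m - f) / abs(f) for m, f in zip(sample_means, fitted))


def bootstrap_exceedance(reference, results_per_sample,
                         tolerance=0.10, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples whose deviation exceeds the tolerance."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_boot):
        # Resample laboratories with replacement within each sample.
        means = [mean(rng.choices(results, k=len(results)))
                 for results in results_per_sample]
        if max_relative_deviation(reference, means) > tolerance:
            exceed += 1
    return exceed / n_boot


# Hypothetical reference values and laboratory results (pmol/l).
reference = [100.0, 200.0, 400.0, 800.0]
results = [[108.0, 112.0, 110.0], [215.0, 225.0, 220.0],
           [430.0, 450.0, 440.0], [870.0, 890.0, 880.0]]
print(bootstrap_exceedance(reference, results))
```

A fraction near 0 suggests the sample means stay within 10% of a straight line; a fraction near 1 suggests at least one sample departs from it by more than the tolerance.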
In order to achieve satisfactory power (80%) only results from those immunoassay methods that had at least seven users were considered in the study. As a result, 140 laboratories were considered for E2 and 155 for progesterone.
For the eight samples in the study, none of the laboratories reported data below the detection or quantification limit. Samples from different donors of which the concentration was defined by an RMV allow a regression model to be used with the reference value as the independent variable and the results reported by the laboratories as the dependent variable. Three parameters are considered from the regression analysis: (i) P-value of a liberal lack-of-fit test as a test for linear relationship: samples were omitted until a linear relationship was found; (ii) intercept of the weighted linear regression method with the hypothesis test whether there is significant difference from 0 and (iii) slope of the weighted linear regression method with the hypothesis test whether there is significant difference from 1.
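Parameters (ii) and (iii) above can be sketched as a weighted least-squares fit followed by t-statistics against the null values 0 (intercept) and 1 (slope). The Python sketch below uses standard weighted least-squares formulas; the weights (1/x², i.e. an assumed constant CV), the example data and all names are hypothetical, since the article does not give the exact weighting scheme.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class WlsFit:
    intercept: float
    slope: float
    se_intercept: float
    se_slope: float


def weighted_linear_fit(x, y, w):
    """Weighted least-squares line y = intercept + slope * x."""
    n = len(x)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual variance with n - 2 degrees of freedom.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, y))
    s2 = rss / (n - 2)
    se_slope = sqrt(s2 / sxx)
    se_intercept = sqrt(s2 * (1.0 / sw + x_bar ** 2 / sxx))
    return WlsFit(intercept, slope, se_intercept, se_slope)


def t_statistic(estimate, null_value, standard_error):
    """t-statistic for H0: parameter == null_value."""
    return (estimate - null_value) / standard_error


# Hypothetical reference values (x) and mean reported values (y), pmol/l.
x = [50.0, 200.0, 800.0, 2000.0, 3400.0]
y = [70.0, 250.0, 1000.0, 2450.0, 4100.0]
w = [1.0 / xi ** 2 for xi in x]  # assumed weights: constant CV

fit = weighted_linear_fit(x, y, w)
print("intercept vs 0: t =", t_statistic(fit.intercept, 0.0, fit.se_intercept))
print("slope vs 1:     t =", t_statistic(fit.slope, 1.0, fit.se_slope))
```

The t-statistics would be compared with a t distribution on n - 2 degrees of freedom; the P-value lookup is left out because it is not available in the Python standard library.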
| Results |
|---|
Inter-laboratory CV, bias and regression
Coefficient of variation
Considering the lack of precision for E2 measurements (Table II) of women in the early follicular phase, Elecsys, Vidas and Immulite had the lowest variability (with maximum values at, respectively, 11, 16 and 21%). Access had the highest variability, with CV up to 49%. CVs for Advia Centaur and Vitros were intermediate (24%). For progesterone, CV values and bias for concentrations <1 nmol/l are considered of no clinical importance and will not be discussed.
E2 measurements of samples from women in the periovulatory or luteal phase demonstrated a similar trend. Elecsys, Immulite and Vidas had the lowest variability (respectively, 8, 12 and 12%), while Advia Centaur was the least precise (22%). Access and Vitros were intermediate with, respectively, 18 and 16%.
Progesterone immunoassays (Table III) demonstrate maximum CVs <10% for Elecsys and Vitros (respectively, 7 and 9%). Access peaks at 33%, while the other systems show intermediate CVs (Advia Centaur: 16%; Immulite: 11%; Vidas: 12%).
For the samples that are mixtures of single blood draws from pregnant women, Elecsys and Immulite have the lowest variability for E2 (5 and 12%), while Vitros, Access and Vidas have, respectively, 15, 18 and 22%. Here, Advia Centaur had the highest variability, i.e. 29%.
For this category, CVs of progesterone are below or near 10% for Elecsys, Vitros and Vidas (respectively, 11, 10 and 10%). Notice that Access also has the highest CV (45%); Advia Centaur and Immulite are intermediate (16 and 14%).
Bias
The relative bias for E2 for one sample (E2 concentration 2026.4 pmol/l) was substantially higher than that seen for any other sample. For this sample, four methods showed a bias of 20% or higher: Access (20%), Immulite (25%), Vitros (96%) and Vidas (239%). Applying a theoretical correction for possible interfering substances (triglycerides, 2 mmol/l; estrone, 7013 pmol/l; estriol, 18 pmol/l) did not reduce the bias of the measurements to within acceptable limits (bias after correction by multiplying the cross reactivity coefficient with the concentration of interfering substances: Immulite: 25%, Access: 16%; Vitros: 78%; Vidas: 87%). Advia Centaur also shows high bias for progesterone (145%). Results of this sample were not included in the discussion of the bias.
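The theoretical correction described above (cross-reactivity coefficient multiplied by the concentration of the interfering substance) can be written out as a short calculation. In the Python sketch below, the reference value and the estrone and estriol concentrations are taken from the text, whereas the measured value and the cross-reactivity coefficients are hypothetical placeholders, because the manufacturer-specific coefficients are not listed in the article.

```python
def corrected_bias_percent(measured, reference, interferents):
    """Relative bias (%) after subtracting the apparent E2 contributed by
    each interferent, given as (cross_reactivity_fraction, concentration)."""
    apparent = sum(cross * conc for cross, conc in interferents)
    return 100.0 * (measured - apparent - reference) / reference


reference_e2 = 2026.4   # pmol/l, reference method value from the text
measured_e2 = 3000.0    # pmol/l, hypothetical mean immunoassay result

# (hypothetical cross-reactivity, concentration in pmol/l from the text)
interferents = [
    (0.02, 7013.0),  # estrone
    (0.01, 18.0),    # estriol
]

uncorrected = corrected_bias_percent(measured_e2, reference_e2, [])
corrected = corrected_bias_percent(measured_e2, reference_e2, interferents)
print(f"bias before correction: {uncorrected:.1f}%")
print(f"bias after correction:  {corrected:.1f}%")
```

With small cross-reactivity coefficients the correction removes only part of the bias, which is the pattern reported for this sample.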
For E2 (Table II), Immulite has negative bias for all other samples, while Elecsys and Vidas have overall positive bias. Concerning magnitude, Immulite has distinctively lower bias for the samples from women in early follicular phase (–5%). Advia Centaur, Elecsys and Vitros have slightly higher bias (respectively, up to –12, 15 and 18%). Access and Vidas have up to 30 and 20%, respectively. For the samples from women in the periovulatory or luteal phase, Vidas performs best (bias <10%). Bias values for Advia Centaur and Immulite are slightly higher (respectively, 14 and –17%), while other systems have bias values >20% (Elecsys: 22%; Vitros: –26%; Access: 36%).
The picture for samples from pregnant women is different: here, Vidas performs worst (43%). All other systems have intermediate bias (between 10 and 20%), and there is no clear difference between methods.
Bias values for progesterone are generally positive (Table III). Advia Centaur, Access and Vidas have overall positive bias. For all other methods there is only one sample with negative bias.
All systems show bias values for all phases of at least 15%. Values peak at 81% (Access), 64% (Advia Centaur) or 52% (Vidas).
Regression
For E2, only Advia Centaur and Elecsys had a linear relationship between reference and reported values over the whole range (Fig. 1). For the other systems, the previously mentioned sample (concentration 2026.4 pmol/l) had significant bias and had to be omitted from the regression analysis to obtain a linear relationship.
Considering intercept and slope (Table IV), Immulite, Elecsys, Access and Vidas had an intercept that differed significantly from 0, indicating a concentration-independent bias that is especially important at low concentrations. The slope differed significantly from 1 for all systems except the Advia Centaur. Slopes ranged from 0.92 (Immulite and Vitros) to 1.31 (Vidas). Here too, it should be noted that the non-significant difference for Advia Centaur may have been caused by the high uncertainty of the method.
For progesterone (Fig. 2), one system did not seem to suffer from a lack of linear relationship between RMV and reported values (Access). For the other systems, one (Advia Centaur, Elecsys, Vidas) or two (Immulite, Vitros) samples had to be omitted to obtain a linear relationship. Intercepts were all significantly higher than 0, ranging from 0.19 (Elecsys) to 3.56 (Access). Slopes were all significantly higher than 1, yielding deviations from 7% (Immulite) to 41% (Vidas).
| Discussion |
|---|
The results show that performance was poor for some methods. The use of RMV in order to compare methods sheds new light on bias. Sending samples of different donors to different laboratories reveals information about variability, in this case, inter-laboratory variability.
Mean bias values of >20% occurred in 19% of E2 and in 90% of progesterone measurements. European Union Directive 98/79/EC (European Parliament and Council, 1998) states that calibrators and control material should be traceable to a standard of higher order. A more suitable requirement may be to strive to compare every method with the highest possible standard (in this case ID–GC/MS as reference method).
The data from this multicenter study required a more thorough statistical analysis than regularly used methods provide. Since the data included eccentric values along the x-axis and since response values did not have equal variance, the data were not bivariately normally distributed. For these reasons, weights were constructed to control for eccentricity of the x-values and unequal variance in the y-values, and correlation coefficients were not calculated. Linear regression lines did not fit all the data.
Variation differed considerably between methods; it also differed considerably from reported intra-lab variance. Only for Elecsys were the reported intra-lab CVs (Bieglmayer et al., 2004; Yang et al., 2004) for both E2 and progesterone comparable to the inter-lab CVs in this study. This indicates that, for this system, the inter-laboratory contribution of variance to the total was very small. For other systems, the total inter-laboratory spread was clearly wider than the reported intra-lab uncertainties (Rodriguez-Espinosa et al., 1998; Wilson et al., 1998; Hendriks et al., 2000; Tello and Hernandez, 2000; Anckaert et al., 2002; Taieb et al., 2003; Yang et al., 2004; Massart et al., 2006), which points to a significant difference between results for the same sample obtained in different laboratories.
The bias of several methods for particular samples jeopardizes a linear relationship between RMV and routine methods for E2 and progesterone. For one E2 sample, Vitros systems reported almost twice, and Vidas more than three times, the reference concentration. Cross-reactivities reported by manufacturers were too low, however, to explain the behavior of the sample by interferences from another substance. Neither could lipemia or hemolysis be the reason, because the samples showed no tendency to behave oddly. A test in which possible HAMA (Check et al., 1995; Kricka, 1999) were excluded before analysis did not help: Vidas still reported bias >200% and Vitros still showed >90% bias. To date we can offer no explanation for the abnormal bias seen for this sample with some methods. However, the sample consisted of a mixture of serum samples.
Apart from the results recorded for that particular sample, it should be noted that no single method had bias values <10% for all samples.
Considering progesterone, the results generally showed higher CVs and bias values. All methods had overall positive bias values, ranging to >40%.
In conclusion, we can say that for E2 and progesterone measurements, a linear relationship between RMV and reported values was not assured for most methods in a range for E2 from 198.13 to 3417.3 pmol/l and for progesterone from 0.56 to 117.85 nmol/l.
Overall precision for progesterone was better than for E2 for all automated analysers except for Access.
Because of the considerable bias differences between methods, a clinical follow-up of a patient should always be performed using the same, fit-for-purpose and well-validated assay. Considering the large bias of some methods, it is recommended to use method-specific reference intervals for the different physiopathological conditions.
| Acknowledgements |
|---|
The authors are grateful to Professor L. Thienpont (Ghent University, Belgium) for analysing the samples by ID–GC/MS for E2 and progesterone.
| References |
|---|
Anckaert E, Mees M, Schiettecatte J, Smitz J. Clinical validation of a fully automated 17β-estradiol and progesterone assay (VIDAS) for use in monitoring assisted reproduction treatment. Clin Chem Lab Med (2002) 40:824–831.
Bieglmayer C, Chan DW, Sokoll L, Imdahl R, Kobayashi M, Yamada E, Lilje D, Luthe H, Messner J, Messeri G, et al. Multicentre performance evaluation of the E170 module for MODULAR ANALYTICS. Clin Chem Lab Med (2004) 42:1186–1202.
Check JH, Ubelacker L, Lauer CC. Falsely elevated steroidal assay levels related to heterophile antibodies against various animal species. Gynecol Obstet Investig (1995) 40:139–140.
Efron B, Tibshirani R. An Introduction to the Bootstrap. (1993) Boca Raton: CRC Press LLC.
European Parliament and Council. Directive 98/79/EC of the European Parliament and of the Council of 27 October 1998 on in vitro diagnostic medical devices. (1998) Office for Official Publications of the European Communities, Consleg: 1998L0079-20/11/2003.
Hendriks HA, Kortlandt W, Verweij WM. Analytical performance comparison of five new generation immunoassay analyzers. Ned Tijdschr Klin Chem (2000) 25:170–177.
Kricka L. Human anti-animal antibody interferences in immunological assay. Clin Chem (1999) 45:942–956.
Libeer JC, De Moor GJE, Albert A. Open electronic data interchange and improvement of external quality assessment programmes. 13e Medisch informatica congres 1993. Proceedings: communication and integration in healthcare informatics. De Moor GJE, ed. (1994) 51–59.
Massart C, Gibassier J, Laurent MC, Le Lannou D. Analytical performance of a new two-step ADVIA Centaur® estradiol immunoassay during ovarian stimulation. Clin Chem Lab Med (2006) 44:105–109.
Rodriguez-Espinosa J, Otal-Entraigas C, Gascon-Roche N, Mora-Brugues J, Urgell-Rull E, Bordas-Serrat JR, Viscasillas-Molins P. Analytical and clinical performance of an automated immunoassay system (Immulite) for estradiol in serum. Clin Chem Lab Med (1998) 36:969–974.
Sen A, Srivastava M. Regression Analysis: Theory, Methods, and Applications. (1997) New York: Springer-Verlag.
Tai SS, Welch MJ. Development and evaluation of a reference measurement procedure for the determination of estradiol-17β in human serum using isotope-dilution liquid chromatography-tandem mass spectrometry. Anal Chem (2005) 77:6359–6363.
Taieb J, Benattar C, Birr AS, Poüs C. From ACS-180 to Advia-Centaur (Bayer diagnostics): assessment of estradiol, progesterone, LH and FSH assays. Ann Biol Clin (2003) 61:223–228.
Tello FL, Hernandez DM. Performance evaluation of nine hormone assays on the Immulite 2000 immunoassay system. Clin Chem Lab Med (2000) 38:1039–1042.
Thienpont LM, De Brabandere VL, Stöckl D, De Leenheer AP. Use of cyclodextrins for prepurification of progesterone and testosterone from human serum prior to determination with isotope dilution-gas chromatography/mass spectrometry. Anal Chem (1994) 66:4116–4119.
Thienpont LM, Franzini C, Kratochvila J, Middle J, Ricós C, Siekmann L, Stöckl D. Analytical quality specifications for reference methods and operating specifications for networks of reference laboratories. Recommendations of the European EQA-Organizers Working Group B. Eur J Clin Chem Clin Biochem (1995) 33:949–957.
Thienpont LM, Van Nieuwenhove B, Stöckl D, Reinauer H, De Leenheer AP. Determination of RMV by isotope dilution-gas chromatography/mass spectrometry: a five years experience of two European reference laboratories. Eur J Clin Chem Clin Biochem (1996) 34:853–860.
Wheeler MJ. Automated immunoassay analysers. Ann Clin Biochem (2001) 38:217–219.
Wilson D, Groskoff W, Hsu S, Caplan D, Langner MB, DeManno D, Williams G, Payette D, Dagel C, Lynch D, et al. Rapid, automated assay for progesterone on the Abbott AxSYM™ analyzer. Clin Chem (1998) 44:86–91.
Yang D, Owen W, Ramsay C, Xie H, Roberts W. Performance characteristics of eight estradiol immunoassays. Am J Clin Pathol (2004) 122:332–337.
Submitted on May 2, 2007; resubmitted on September 4, 2007; accepted on September 14, 2007.