Diagnostic accuracy of FibroScan and comparison to liver fibrosis biomarkers in chronic viral hepatitis: A multicenter prospective study (the FIBROSTIC study)
Journal of Hepatology (December 2010)
Francoise Degos1, Paul Perez2, Bruno Roche3, Amel Mahmoudi4, Julien Asselineau2, Helene Voitot5, Pierre Bedossa6, for the FIBROSTIC study group
Received 15 February 2010; received in revised form 5 May 2010; accepted 22 May 2010; published online 23 August 2010.
"The overall accuracy of FibroScan® was as good as or better than that of other non-invasive methods......The diagnostic accuracy of non-invasive tests was high for cirrhosis, but poor for significant fibrosis. A clinically relevant gain in the likelihood of diagnosis was achieved in a low proportion of patients. Although the diagnosis of cirrhosis may rely on non-invasive tests, liver biopsy is warranted to diagnose intermediate stages of fibrosis.......The overall accuracy of FibroScan® was high (AUROC 0.89 and 0.90, respectively) and significantly higher than that of biomarkers in predicting cirrhosis (AUROC 0.77-0.86). All non-invasive methods had a moderate accuracy in predicting significant fibrosis (AUROC 0.72-0.78).........According to current practice guidelines, significant fibrosis (F2) is a frequent selection criterion for antiviral treatment of HCV and HBV chronic hepatitis, but the use of non-invasive tests to stage liver fibrosis remains highly controversial. In summary, our study has established that non-invasive tests, especially FibroScan®, may be useful in the prediction of cirrhosis. However, it supports guideline conclusions that non-invasive tests should not replace liver biopsy in routine clinical practice for the detection of significant fibrosis that may warrant treatment , ."
Background & Aims
The diagnostic accuracy of non-invasive liver fibrosis tests that may replace liver biopsy in patients with chronic hepatitis remains controversial.
We assessed and compared the accuracy of FibroScan® and that of the main biomarkers used for predicting cirrhosis and significant fibrosis (METAVIR ≥F2) in patients with chronic viral hepatitis.
A multicenter prospective cross-sectional diagnostic accuracy study was conducted in the Hepatology departments of 23 French university hospitals. Index tests and reference standard (METAVIR fibrosis score on liver biopsy) were measured on the same day and interpreted blindly. Consecutive patients with chronic viral hepatitis (hepatitis B or C virus, including possible Human Immunodeficiency Virus co-infection) requiring liver biopsy were recruited in the study.
The analysis was first conducted on the total population (1839 patients) and then, after excluding 532 protocol deviations (mostly non-compliant FibroScan® examinations), on 1307 patients. The overall accuracy of FibroScan® was high (AUROC 0.89 and 0.90, respectively) and significantly higher than that of biomarkers in predicting cirrhosis (AUROC 0.77-0.86). All non-invasive methods had a moderate accuracy in predicting significant fibrosis (AUROC 0.72-0.78). Based on multilevel likelihood ratios, non-invasive tests provided a relevant gain in the likelihood of diagnosis in 0-60% of patients (cirrhosis) and 9-30% of patients (significant fibrosis).
The diagnostic accuracy of non-invasive tests was high for cirrhosis, but poor for significant fibrosis. A clinically relevant gain in the likelihood of diagnosis was achieved in a low proportion of patients. Although the diagnosis of cirrhosis may rely on non-invasive tests, liver biopsy is warranted to diagnose intermediate stages of fibrosis.
Diagnosis and treatment of patients with chronic hepatitis mostly rely on the staging of liver fibrosis. Antiviral therapy is proposed if moderate to severe (METAVIR stages F2 and F3) fibrosis is present. If cirrhosis is present, specific surveillance is initiated, in particular for the early detection of hepatocellular carcinoma. Despite its limitations, liver biopsy is the usual procedure for staging fibrosis and is recommended by the international guidelines.
Non-invasive procedures such as transient elastography (FibroScan®, Echosens, Paris, France) and serum biomarkers (particularly Fibrometre®, Fibrotest®, Hepascore and APRI) have been developed in order to avoid biopsy. Transient elastography is a new imaging technique measuring liver stiffness, i.e. its elasticity. As highlighted by recent meta-analyses, its diagnostic accuracy is difficult to judge because of the small sample sizes of most published studies and because of the inter-study variability of accuracy estimates and elasticity thresholds, which vary in the literature from 8.4 to 18.2 kPa for the diagnosis of cirrhosis and from 5.0 to 11.8 kPa for the diagnosis of significant fibrosis (METAVIR fibrosis score ≥F2). In addition, there are few direct comparisons of transient elastography and serum biomarkers, and the reported accuracy parameters do not always allow clinicians to estimate the probability of a condition in a given patient.
We evaluated the diagnostic accuracy and clinical usefulness of FibroScan® in predicting two conditions, significant histological liver fibrosis (METAVIR ≥F2) and cirrhosis (F4), in patients with chronic viral hepatitis, in a multicenter prospective study funded by the French Ministry of Health (the FIBROSTIC study). The secondary objectives were: (i) to assess variations in the diagnostic accuracy of FibroScan® by cause of infection and by patient selection criteria for liver biopsy; (ii) to compare the diagnostic accuracy of the most common liver fibrosis serum biomarkers with that of transient elastography; (iii) to assess the gain in likelihood of target conditions provided by non-invasive tests.
Patients and methods
Design overview, setting, and participants
This was a multicenter prospective cross-sectional diagnostic accuracy study. From June 15, 2006, to July 15, 2008, the hepatology departments of 23 French university hospitals included all consecutive patients with chronic viral hepatitis due to hepatitis C virus (HCV) or hepatitis B virus (HBV) (with or without Human Immunodeficiency Virus, HIV, co-infection) in whom liver fibrosis assessment was indicated. Chronic hepatitis C was defined as the presence of anti-HCV antibodies for more than 6 months and positivity (except in documented successfully treated patients) of HCV RNA (Amplicor HCV Monitor®). Chronic hepatitis B was defined as the presence of serum HBsAg (commercially available enzyme immunoassays) for more than 6 months. Each hospital applied its usual criteria (clinical criteria or discordant results in prior non-invasive tests) to determine whether liver biopsy was indicated.
Patients underwent liver biopsy and non-invasive tests on the same day or within a maximum of 30 days. Raters of each test were blind to the results of the other tests. The diagnostic accuracy of tests was estimated versus liver biopsy as the reference standard. In the accuracy comparisons, patients with HCV and HBV co-infection were included in the HCV infection subgroup, and those with HIV co-infection in the HIV subgroup. The protocol was approved by the Institutional Review Board (CCPPRB, Hôpital Pitié-Salpêtrière). Participants gave their informed consent.
Liver biopsy procedures were performed by experienced hepatologists or radiologists according to current French practice, by either the transjugular or the intercostal approach under ultrasound guidance, using disposable 1.8 mm diameter Menghini needles. The liver tissue was processed routinely and read on site by expert liver pathologists. The length of each liver specimen and the numbers of fragments and portal tracts were recorded. Fibrosis stage and necro-inflammatory activity were evaluated using METAVIR scores.
Transient elastography was performed by physicians or trained technicians with experience of at least 50 transient elastography procedures, as previously recommended (Appendix 2). At least 10 valid measurements, a success rate of at least 60%, an interquartile range of less than 30% of the median elasticity, and a Body Mass Index (BMI) <30 kg/m2 were required for eligibility. Elasticity thresholds were chosen from the study data to achieve 90% sensitivity in the prediction of significant fibrosis and 90% specificity in the prediction of cirrhosis. One of the previously published and commonly proposed thresholds was also used.
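The four eligibility criteria above can be read as a simple check applied to each examination. The following is a minimal sketch, not the study's own software: the function and parameter names are hypothetical, and the success rate is assumed to be the ratio of valid measurements to total attempts, which the text does not define explicitly.

```python
def is_eligible_exam(valid_shots, total_shots, median_kpa, iqr_kpa, bmi):
    """Return True when an examination meets all four criteria quoted in the text.

    Hypothetical helper: names and the success-rate definition are assumptions.
    """
    if total_shots <= 0 or median_kpa <= 0:
        return False
    success_rate = valid_shots / total_shots  # assumed: valid shots / attempts
    return (
        valid_shots >= 10                     # at least 10 valid measurements
        and success_rate >= 0.60              # success rate of at least 60%
        and iqr_kpa / median_kpa < 0.30       # IQR below 30% of the median
        and bmi < 30                          # BMI below 30 kg/m2
    )


# Toy examinations (invented values, not study data)
print(is_eligible_exam(10, 12, 8.0, 1.6, 24.0))  # all criteria met
print(is_eligible_exam(10, 12, 8.0, 3.2, 24.0))  # IQR is 40% of the median
```

Keeping the four conditions in one expression makes it easy to see which criterion an excluded examination fails when the checks are logged separately.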
Serum biomarker assays
The usual liver function tests were performed on site. Frozen sera (-80°C) were collected for the other assays included in serum biomarker scores (Appendix 3). The following recommended cut-off levels were used to define a positive test result for significant fibrosis and cirrhosis, respectively: Fibrometre® (0.411 and 0.442), Fibrotest® (0.48 and 0.74), APRI (0.5 and 2), and Hepascore (0.50 and 0.84). Hepascore and APRI were calculated from biological results; Fibrotest® and Fibrometre® are licensed (cost 50 euros per test) and were calculated courtesy of the manufacturers (Biopredictive, BioLiveScale). Biochemical scores could not be calculated in patients with at least one biological value missing, and scores considered non-interpretable by the manufacturer were not included in the analysis.
The diagnostic accuracy parameters of the non-invasive tests were estimated by comparison with liver biopsy used as the reference standard. Multilevel likelihood ratios (LR) and post-test probabilities of significant fibrosis and cirrhosis were calculated for the deciles of elasticity and biomarker scores. Published reference values of LR were used to establish test result categories. Confidence intervals (95% CI) were calculated by the exact binomial method for proportions, and by the asymptotic Gaussian distribution method for AUCs. Variations in diagnostic accuracy according to patient subgroup (viral etiology, patient selection for liver biopsy) were investigated by comparing AUCs, specificities for the diagnosis of significant fibrosis, and sensitivities for the diagnosis of cirrhosis (Chi-square tests). If there was no statistically significant variation of accuracy according to viral etiology, subsequent analyses were not stratified according to this characteristic. Differences in the diagnostic accuracy of transient elastography and biomarkers were studied by comparing AUCs (Delong et al.'s method for correlated data). Agreement between METAVIR scores, as measured by each center and by the central pathology laboratory, was assessed using kappa coefficients (Appendix 4). Type I error was fixed at 0.01 for comparing subgroups or non-invasive methods and at 0.05 for other comparisons. Analyses were conducted with SAS v. 9.1.3 and Stata v. 9.2 software.
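Of the four scores, APRI is the simplest to compute by hand: it is the AST level expressed as a multiple of its upper limit of normal, multiplied by 100 and divided by the platelet count in 10^9/L. The sketch below applies the two APRI cut-offs quoted above. The function names are hypothetical, and treating a score strictly above the cut-off as positive is an assumption, since the text does not say whether the boundary value counts as positive.

```python
def apri(ast_iu_l, ast_uln_iu_l, platelets_10e9_l):
    """AST-to-platelet ratio index: (AST / ULN) * 100 / platelets (10^9/L)."""
    return (ast_iu_l / ast_uln_iu_l) * 100 / platelets_10e9_l


def apri_positive(score, condition):
    """Apply the cut-offs quoted in the text (0.5 and 2).

    Assumption: a result is positive when strictly above the cut-off.
    """
    cutoffs = {"significant_fibrosis": 0.5, "cirrhosis": 2.0}
    return score > cutoffs[condition]


# Invented laboratory values, for illustration only
score = apri(ast_iu_l=90, ast_uln_iu_l=40, platelets_10e9_l=90)
print(score, apri_positive(score, "significant_fibrosis"), apri_positive(score, "cirrhosis"))
```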
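Two of the quantities named above can be sketched with the Python standard library alone: the AUC as the probability that a diseased patient scores higher than a non-diseased one, and the exact binomial (Clopper-Pearson) confidence interval for a proportion such as a sensitivity. This is an illustrative sketch, not the SAS or Stata code used in the study; the function names are hypothetical, and the asymptotic confidence interval for the AUC and the Delong comparison are deliberately left out.

```python
from math import lgamma, log, exp


def auc(scores_diseased, scores_non_diseased):
    """AUC as P(diseased score > non-diseased score); ties count for one half."""
    wins = 0.0
    for d in scores_diseased:
        for n in scores_non_diseased:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_non_diseased))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return total


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n, found by bisection."""
    def solve(cdf_at, target):
        # cdf_at(p) decreases as p increases, so move right while it is too large
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Toy scores and counts (invented, not study data)
print(auc([9.0, 14.0, 20.0], [4.0, 5.0, 10.0, 12.0]))
print(exact_binomial_ci(7, 10))
```

The pairwise AUC loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formula gives the same value more efficiently on large samples.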
Flow diagram and patients' characteristics
Among the 1839 potentially eligible patients, 532 (29%) were not eligible, mostly because their transient elastography examinations did not comply with the recommendations for high measurement reproducibility (Fig. 1). The most frequent reason for exclusion was an interquartile range of elasticity >30% of the median (269 patients, 14%). We conducted the main analysis on the group of patients complying with the adequate technique, but results on the whole group were analyzed in a sensitivity analysis.
The clinical and histopathological characteristics of the 1307 eligible patients are given in Table 1 according to their viral etiology; they were found to have similar characteristics to those of the whole group: median age 47.2 vs. 47.6, male 69.2% vs. 67.2%, median BMI 23.8 vs. 24.3, mean ALT 2.4±3.1 vs. 2.4±3.1, significant fibrosis 57.1% vs. 55.3%, cirrhosis 13.8% vs. 13.3%.
The median size of liver biopsies was 24mm (1st-3rd quartiles: 18-30) with 71% of the biopsies measuring at least 20mm. The median number of portal tracts was 16 (10-22). The reproducibility of the METAVIR fibrosis score is reported in Appendix 4.
Diagnostic accuracy of transient elastography
The results of stiffness according to the five stages of fibrosis in the total group and in the group of eligible patients are given in Fig. 2A and B.
The median number of measurements was 10 (10-10) and the IQR/stiffness ratio was 15% (11-20). Median stiffness was in the same range for F0, F1, and F2 patients and slightly higher in F3 patients (Fig. 2). Patients with cirrhosis had markedly higher stiffness values. The distribution of stiffness values differed according to METAVIR stage (p<0.0001).
Results for the diagnostic accuracy of TE for eligible patients are shown in Fig. 3A-D and Table 2a, Table 2b. The AUC in the prediction of cirrhosis was 0.90. The desired specificity level of 90% was achieved for a 12.9 kPa threshold, and the sensitivity was 70.2%. AUCs and sensitivity were slightly higher in the HIV subgroup but the difference between subgroups was not statistically significant (p=0.04 for both) (Fig. 3A). The AUC in the prediction of significant fibrosis was 0.76. The desired sensitivity level of 90% was achieved for a 5.2 kPa threshold, and the specificity was 34.0%. AUCs and specificity were similar in the viral infection subgroups (AUC p=0.08, specificity p=0.45) (Fig. 3B).
Fourteen centers selected patients for liver biopsy on the basis of prior discordant non-invasive test results (37% of patients) and eight centers used customary clinical criteria (63% of patients). The general characteristics of patients were similar and the overall diagnostic accuracy of TE was the same, but the METAVIR stage distribution differed (F0: 5% vs. 11%, F1: 34% vs. 34%, F2: 28% vs. 28%, F3: 20% vs. 12%, F4: 13% vs. 14%, p<0.0001) (Table 2a, Table 2b, and Appendix 5).
When previously published elasticity cut-offs were applied, the accuracy of TE in predicting cirrhosis was similar to that obtained in our work (Table 2a). For the diagnosis of significant fibrosis, overall accuracy was the same, but sensitivity was lower and specificity was higher with the published cut-offs (Table 2b).
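The way a threshold is chosen to reach a target specificity (cirrhosis) or a target sensitivity (significant fibrosis) can be sketched as a scan over candidate cut-offs. This is a minimal illustration on invented stiffness values, not a reproduction of the study's procedure; the function names are hypothetical, and a test is assumed to be positive when the value is at or above the threshold.

```python
def sens_spec(values, diseased, threshold):
    """Sensitivity and specificity when 'value >= threshold' is a positive test."""
    tp = sum(1 for v, d in zip(values, diseased) if d and v >= threshold)
    fn = sum(1 for v, d in zip(values, diseased) if d and v < threshold)
    tn = sum(1 for v, d in zip(values, diseased) if not d and v < threshold)
    fp = sum(1 for v, d in zip(values, diseased) if not d and v >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def lowest_threshold_for_specificity(values, diseased, target=0.90):
    """Lowest observed value reaching the target specificity (keeps sensitivity highest)."""
    for t in sorted(set(values)):
        sens, spec = sens_spec(values, diseased, t)
        if spec >= target:
            return t, sens, spec
    return None


def highest_threshold_for_sensitivity(values, diseased, target=0.90):
    """Highest observed value reaching the target sensitivity (keeps specificity highest)."""
    for t in sorted(set(values), reverse=True):
        sens, spec = sens_spec(values, diseased, t)
        if sens >= target:
            return t, sens, spec
    return None


# Invented stiffness values in kPa: ten patients without and four with the condition
values = [3, 4, 5, 5.5, 6, 7, 8, 9, 10, 13, 11, 15, 20, 30]
diseased = [False] * 10 + [True] * 4
print(lowest_threshold_for_specificity(values, diseased))
print(highest_threshold_for_sensitivity(values, diseased))
```

Because specificity can only rise as the threshold rises, the first cut-off that reaches the target is also the one that sacrifices the least sensitivity, which mirrors the trade-off reported above (90% specificity with 70.2% sensitivity for cirrhosis).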
TE accuracy was slightly higher in patients eligible for the primary analysis (AUC 0.90 and 0.76 for cirrhosis and significant fibrosis, respectively) than in those who were excluded (AUC 0.87 and 0.73) (Appendix 5).
Comparative diagnostic accuracy of biomarkers and transient elastography
The comparison of the accuracy of biomarkers and TE is given in Table 2a, Table 2b and Fig. 3C and D. Results for all biomarkers were not available for all patients because of missing biological values and because 44 additional Fibrotest® scores were considered non-interpretable by the manufacturer. Therefore, the comparison with FibroScan® was based on the results of 1197 (Fibrotest®), 1204 (Fibrometre®), 1272 (APRI), and 1238 patients (Hepascore). The diagnostic accuracy of TE was significantly higher than that of any biomarker except Fibrometre® in diagnosing cirrhosis (p=0.007 to p<0.0001). However, there were no differences in the detection of significant fibrosis. The diagnostic accuracy of biomarkers in non-eligible patients was similar to that in patients included in these paired comparisons with FibroScan® (data not shown).
Gain in likelihood of significant fibrosis and cirrhosis
Because better LRs (10.0 and 0.1) were rarely achieved, we used threshold values of 5.0 and 0.2 for defining three test result categories of each test that would indicate a relevant gain in the likelihood of presence ("high" result value) and absence ("low" result value) of the target conditions - or no relevant gain (LR between 0.2 and 5.0, "intermediate" result value).
The post-test probability of significant fibrosis and cirrhosis (See details in Appendix 7) is summarized for each of these three test result categories (Table 3). For example, when the elasticity was >17.1kPa the probability of cirrhosis was high: 72% of these patients had cirrhosis. The gain in likelihood of cirrhosis was deemed relevant, as the probability of cirrhosis was 13.8% before the test. However, only 10% of patients fell into this class of results. Overall, a high percentage of patients had intermediate values, ranging from 40% to 100% for the diagnosis of cirrhosis and from 70% to 91% for the diagnosis of significant fibrosis (Table 3).
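The link between a likelihood ratio and the pre-test and post-test probabilities quoted above goes through odds: post-test odds equal pre-test odds multiplied by the LR. The sketch below applies that standard relation to the figures in the text (13.8% before the test, 72% above 17.1 kPa). The function names are hypothetical, the implied LR is derived here by arithmetic rather than reported by the study, and treating the boundary values 0.2 and 5.0 as relevant is an assumption.

```python
def post_test_probability(pretest, lr):
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    post_odds = pretest / (1 - pretest) * lr
    return post_odds / (1 + post_odds)


def implied_lr(pretest, posttest):
    """Likelihood ratio implied by a pre-test and an observed post-test probability."""
    return (posttest / (1 - posttest)) / (pretest / (1 - pretest))


def lr_category(lr, low=0.2, high=5.0):
    """Three result categories using the 0.2 and 5.0 thresholds from the text."""
    if lr >= high:
        return "high"          # relevant gain in likelihood of presence
    if lr <= low:
        return "low"           # relevant gain in likelihood of absence
    return "intermediate"      # no relevant gain


lr = implied_lr(0.138, 0.72)   # figures quoted in the text for elasticity >17.1 kPa
print(round(lr, 2), lr_category(lr))
print(round(post_test_probability(0.138, 5.0), 3))  # effect of the minimum "high" LR
```

The last line shows why 5.0 is a modest bar at this prevalence: an LR of exactly 5 raises a 13.8% pre-test probability to a little under one in two.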
The FIBROSTIC study has compared the performance of transient elastography (FibroScan®) with that of the main non-invasive biological methods of liver fibrosis assessment in a large representative sample of patients suffering from chronic viral hepatitis and consecutively selected for liver biopsy in routine clinical practice. The performance of FibroScan® in predicting cirrhosis was high and higher than that of biomarkers. However, the performance of all the non-invasive methods in predicting significant fibrosis was moderate to poor. There was no difference in the overall accuracy of FibroScan®: (i) when studying the whole population recruited, or only the eligible patients who complied with the prerequisites for a valid procedure, (ii) when thresholds were chosen to obtain a targeted specificity (cirrhosis) or sensitivity (significant fibrosis) and when using previously published thresholds, (iii) according to the cause of the viral disease, (iv) according to whether patients had, or had not, been selected for biopsy on the basis of prior abnormal non-invasive test results. Post-test probability estimates indicated that the non-invasive methods are able to rule in or rule out cirrhosis and significant fibrosis in a low proportion of patients.
Liver biopsy has a distinct advantage in that commonly associated liver lesions, such as steatohepatitis and iron overload, which can affect fibrosis progression and treatment response, can be diagnosed and investigated. However, it also has its limitations in staging fibrosis because of the heterogeneous distribution of fibrosis in the liver and the moderate reproducibility of readings. This can constitute a bias when liver biopsy is used as the reference standard in the evaluation of diagnostic tests. This problem is common in diagnostic studies and has attracted much, but not very successful, statistical research. The performance can, however, be improved by using an appropriate specimen length and number of portal tracts, as we have done in this study. According to a central review of a random sample of biopsies, our study's METAVIR score reproducibility was acceptable (Appendix 4). Moreover, whether biopsies were read by local pathologists or by a single pathologist (PB, at the central laboratory, which avoids inter-observer variation), the diagnostic accuracy of transient elastography did not differ markedly (Appendix 5). Finally, as liver biopsy remains the basis of patient management when it is available, it is wise to estimate the accuracy of markers in predicting its results.
The design of our study, which is the first to compare major non-invasive tests for liver fibrosis in a large population, has several strong points and was conducted according to current guidelines: (i) all patients underwent the reference standard test; (ii) the persons performing transient elastography or interpreting biomarkers were blinded to the results of liver biopsy, and the pathologists were blinded to the results of the non-invasive tests; (iii) to ensure the validity of diagnostic accuracy estimates of non-invasive tests, the study recruited a large number of consecutive patients in whom liver fibrosis assessment by biopsy was indicated for clinical purposes, and who were thus potential candidates for non-invasive methods in routine practice. All centers applied their usual decision rules for performing a liver biopsy and thus for including patients. In some centers, the decision was based on prior discordant results between non-invasive markers. This led to a better selection of patients with avoidable biopsies (4% vs. 11% patients with F0) but did not affect the diagnostic accuracy of the tests. Moreover, the comparison between tests could not be biased by the distribution of fibrosis stages as it was done in the same population.
The representativeness of the included population is an important feature, as the study population characteristics may influence diagnostic accuracy measures. The fact that, in a wide prospective evaluation of the technique, 29% of patients were not eligible needs particular attention. However, most of these exclusions were due to non-compliance with the technical specifications of transient elastography, making the validity of measures uncertain. Complying with these specifications is needed for a correct interpretation of elasticity results by physicians. Therefore, the population that we analyzed is representative of the real-life population in whom the FibroScan® technique will be considered interpretable.
Physicians or technicians performing the technique were previously trained (more than 50 procedures), and the technical and clinical prerequisites were mentioned in the protocol. Pitfalls of liver stiffness measurement were recently reported from a single center with a large experience of over 13,000 examinations, showing that unreliable measures due to non-compliance with the technical specifications of FibroScan® (fewer than 10 valid shots, success rate <60%, or IQR/median stiffness >30%) were reported in 15.8% of the procedures. These figures are comparable to ours, where transient elastography was performed in 23 different centers, by less trained operators. Our results, like those of Castera, should be used very carefully when recommending a large use of this technique, since even in a very specialized center, pitfalls were frequently observed. Moreover, one must also keep in mind the cost of a non-valid technique and the possible consequences of misinterpretation of the histological liver status. FibroScan® users should comply with these conditions of use in order to exclude undesirable variability and guarantee the excellent reproducibility reported for the technique when used in those conditions.
Our underlying objective was to compare the accuracy of the non-invasive tests in informing clinicians about the probability of disease in individual patients. As thresholds for FibroScan® were variable in previous studies, elasticity thresholds were based on study data in the present work and chosen in order (i) to achieve high sensitivity when looking for significant fibrosis, so as not to miss patients needing treatment, and (ii) to achieve high specificity when looking for cirrhosis, so as not to submit patients to unnecessary detection of complications. Despite these choices, there were 30% false-negatives for significant fibrosis and 47% false-positives for cirrhosis. This is partly explained by the prevalence of these conditions in patients undergoing fibrosis assessment. Differences with other reported accuracy estimates may be due to the use of different thresholds, different patient spectrums, or the small number of patients in earlier studies. Variation of accuracy related to the fed state is unlikely to have occurred, as 87% of patients were fasting or had a light meal only. We obtained the same estimates for AUC as the largest previously reported study, which enrolled 955 patients with chronic hepatitis C. The overall accuracy of FibroScan® was as good as or better than that of other non-invasive methods. The choice of different elasticity thresholds for FibroScan® did not bias this comparison, as it was based on the area under the ROC curve (AUC), which takes into account every possible threshold. The comparisons of diagnostic accuracy between FibroScan® and biomarkers were based on results available and interpretable at the same time for both non-invasive techniques. Paired comparisons are more valid than comparisons between different groups of subjects, and we checked that the diagnostic accuracy of biomarkers was not different when assessed in the non-eligible group.
However, biomarkers will provide results in more patients than with FibroScan®, as they usually do not generate non-interpretable results (with the exception of Fibrotest®: 3.5% of the eligible group) if the results of biological assays are available.
Multilevel diagnostic LRs inform on different classes of measurements and not just on positive and negative results according to a single cut-off level. Their calculation allowed us to define three classes of test results providing a rough guide of the values that determine whether the result rules in or rules out significant fibrosis and cirrhosis. These probabilities enable clinicians to directly interpret the results of patients tested in similar settings. However, non-invasive tests are strongly predictive of the target conditions in a limited proportion of patients only (cirrhosis: 0-60% of patients; significant fibrosis: 9-30% of patients, depending on the test). Results will thus not prove useful for ruling in or ruling out the target conditions in most patients.
According to current practice guidelines, significant fibrosis (≥F2) is a frequent selection criterion for antiviral treatment of HCV and HBV chronic hepatitis, but the use of non-invasive tests to stage liver fibrosis remains highly controversial. In summary, our study has established that non-invasive tests, especially FibroScan®, may be useful in the prediction of cirrhosis. However, it supports guideline conclusions that non-invasive tests should not replace liver biopsy in routine clinical practice for the detection of significant fibrosis that may warrant treatment.
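A multilevel LR is computed per class of results rather than per cut-off: for each class, it is the share of diseased patients falling in that class divided by the share of non-diseased patients falling in it. The sketch below illustrates this on invented counts for three classes; the function name and the counts are hypothetical, and both groups are assumed to contain at least one patient.

```python
def multilevel_lrs(strata):
    """Likelihood ratio of each result class.

    strata: list of (n_diseased, n_non_diseased) counts, one pair per class.
    Assumes both column totals are greater than zero.
    """
    total_diseased = sum(d for d, _ in strata)
    total_non_diseased = sum(n for _, n in strata)
    lrs = []
    for d, n in strata:
        if n == 0:
            # No non-diseased patient in this class: the ratio is unbounded
            lrs.append(float("inf") if d > 0 else float("nan"))
        else:
            lrs.append((d / total_diseased) / (n / total_non_diseased))
    return lrs


# Invented counts for a low, an intermediate, and a high result class
counts = [(2, 50), (8, 40), (30, 10)]
for (d, n), lr in zip(counts, multilevel_lrs(counts)):
    share = (d + n) / sum(a + b for a, b in counts)
    print(f"LR={lr:.2f}, share of patients={share:.0%}")
```

Printing the share of patients next to each LR reflects the point made above: a class can carry a strong LR and still be of limited use if few patients fall into it.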
French health authorities (National STIC grant).
Conflict of interests
The authors do not have a relationship with the manufacturers of the drugs involved either in the past or present and did not receive funding from the manufacturers to carry out their research.