Blood Tests to Diagnose Fibrosis or Cirrhosis in Patients
With Chronic Hepatitis C Virus Infection: A Systematic Review
Ann Intern Med. June 4 2013
Roger Chou, MD; and Ngoc Wasson, MPH
"Our results suggest that blood tests can help to identify HCV-infected patients with clinically significant fibrosis, with somewhat greater accuracy for identifying cirrhosis than less advanced fibrosis.
......Studies that evaluate the virologic and clinical outcomes of antiviral treatment in HCV-infected patients who have not had liver biopsy are needed (214) to further define optimum work-up strategies.
.....Although liver biopsy is still regarded as the most accurate method for assessing the histologic stage of HCV infection, it has limitations and is an invasive test with some risk for serious harms (201-202). This has spurred interest in noninvasive tests as a potential alternative to biopsy. We found many blood tests associated with an AUROC of 0.70 or greater (range, 0.70 to 0.86) for fibrosis (generally classified as fair to good [48-49]) and 0.80 or greater (range, 0.80 to 0.91) for cirrhosis (generally classified as good to excellent) when compared with liver biopsy (the strength-of-evidence ratings are summarized in Appendix Table 3). Among tests meeting these AUROC thresholds, those that were associated with positive likelihood ratios of 5 to 10 (generally classified as moderately useful) at commonly used cutoffs were platelet counts, age-platelet index, APRI, FibroIndex, FibroTest, and the Forns index for fibrosis and platelet counts, age-platelet index, APRI, and Hepascore for cirrhosis. For diagnosing cirrhosis, GUCI and the Lok index had positive likelihood ratios just below the threshold. Only FibroIndex and FibroTest were also associated with negative likelihood ratios for fibrosis in the moderately useful range (0.10 to 0.20) at commonly used cutoffs, suggesting that blood tests may be somewhat more useful for ruling in than ruling out fibrosis."
This article has been corrected. The original version (PDF) is appended to this article as a supplement.
Background: Many blood tests have been proposed as alternatives to liver biopsy for identifying fibrosis or cirrhosis.
Purpose: To evaluate the diagnostic accuracy of blood tests to identify fibrosis or cirrhosis in patients with hepatitis C virus (HCV) infection.
Data Sources: MEDLINE (1947 to January 2013), the Cochrane Library, and reference lists.
Study Selection: Studies that compared the diagnostic accuracy of blood tests with that of liver biopsy.
Data Extraction: Investigators abstracted and checked study details and quality by using predefined criteria.
Data Synthesis: 172 studies evaluated diagnostic accuracy. For identifying clinically significant fibrosis, the platelet count, age-platelet index, aspartate aminotransferase-platelet ratio index (APRI), FibroIndex, FibroTest, and Forns index had median positive likelihood ratios of 5 to 10 at commonly used cutoffs and areas under the receiver-operating characteristic curve (AUROCs) of 0.70 or greater (range, 0.71 to 0.86). For identifying cirrhosis, the platelet count, age-platelet index, APRI, and Hepascore had median positive likelihood ratios of 5 to 10 and AUROCs of 0.80 or greater (range, 0.80 to 0.91). The Göteborg University Cirrhosis Index and the Lok index had slightly lower positive likelihood ratios (4.8 and 4.4, respectively). In direct comparisons, the APRI was associated with a slightly lower AUROC than the FibroTest for identifying fibrosis and a substantially higher AUROC than the aspartate aminotransferase-alanine aminotransferase ratio for identifying fibrosis or cirrhosis.
Limitation: Only English-language articles were included, and most studies had methodological limitations, including failure to describe blinded interpretation of liver biopsy specimens and inadequate description of enrollment methods.
Conclusion: Many blood tests are moderately useful for identifying clinically significant fibrosis or cirrhosis in HCV-infected patients.
Primary Funding Source: Agency for Healthcare Research and Quality.
The prevalence of anti-hepatitis C virus (HCV) antibody in the United States is about 1.6% (1). Approximately three quarters of persons with anti-HCV antibody have viremia, indicating chronic infection. Hepatitis C virus-related liver disease is the most common reason for liver transplantation among American adults and a leading cause of hepatocellular carcinoma, and it is associated with about 15 000 deaths annually (2-5).
The natural course of HCV infection varies. The best predictor of disease progression is the degree of liver fibrosis. In patients with minimal or no fibrosis and inflammation, the risk for progression to severe fibrosis or cirrhosis over the next 10 to 20 years is low (6). Those with bridging fibrosis are at high risk for progression to cirrhosis. Most major complications of chronic HCV infection occur in patients with cirrhosis (7).
The goal of antiviral therapy is to eradicate viremia and prevent the long-term complications associated with chronic HCV infection. Previously, liver biopsy was recommended before antiviral therapy because treatment was primarily targeted to patients at higher risk for disease progression (8). Although biopsy remains the reference standard for assessing liver histology, it is subject to sampling error; variability in interpretation; and such complications as bleeding, severe pain, and infection (9-10). In addition, the increased effectiveness of antiviral treatments has resulted in broadening of treatment indications to encompass patients at lower risk for disease progression, calling into question the need to obtain detailed pretreatment prognostic information with an invasive test. Therefore, biopsy is no longer recommended in all HCV-infected patients before antiviral treatment (11). However, given the adverse effects and costs associated with current antiviral therapies, knowing the degree of liver fibrosis can still provide important information and allow for more informed treatment decisions. Ideally, methods for assessing liver fibrosis would be accurate without exposing patients to the potential harms and discomfort of biopsy. Many blood tests have been proposed as alternatives to liver biopsy, ranging from single tests to more complicated indices based on multiple tests (Table 1).
We developed a review protocol with the following key question: What is the accuracy of blood tests for diagnosing fibrosis or cirrhosis in patients with chronic HCV infection? Detailed methods and data for the review are available in the full report (42). The protocol was developed using a standardized process with input from experts and the public.
Data Sources and Searches
We searched Ovid MEDLINE (1947 to January 2013), EMBASE, the Cochrane Library, Scopus, and PsycINFO. The MEDLINE search strategy for blood tests is shown in Appendix Table 1. We supplemented electronic searches by reviewing reference lists of retrieved articles.
Study Selection
At least 2 reviewers independently evaluated each study to determine inclusion eligibility. We selected studies of HCV-infected patients that compared the accuracy of blood tests with that of liver biopsy for diagnosing fibrosis or cirrhosis. We restricted inclusion to English-language articles and excluded studies published only as abstracts. We also excluded studies of posttransplant patients, patients co-infected with HIV or hepatitis B virus, patients receiving hemodialysis, and children.
Data Abstraction and Quality Rating
One investigator abstracted details about study design, patient population, setting, interventions, analysis, follow-up, and results, and a second investigator reviewed data for accuracy. Two investigators independently applied predefined criteria (43-45) to assess the quality of each study as good, fair, or poor. Discrepancies were resolved through consensus. We rated the quality of each diagnostic accuracy study on the basis of whether it evaluated a representative spectrum of patients, enrolled a random or consecutive sample of patients meeting predefined criteria, used a credible reference standard, applied the same reference standard to all patients, reported the proportion of patients with uninterpretable or unobtainable reference standard tests, interpreted the reference standard independently from the test under evaluation, and predefined test cutoff thresholds (44-45). For studies of blood indices, we also recorded whether results were from the original sample and analysis used to develop the index. Such results can overestimate diagnostic accuracy because the model and test cutoffs are fitted to the observed data. If such studies then applied the index to a separate validation sample, we abstracted results for the development and validation samples separately.
For studies on diagnostic accuracy, we created 2 x 2 tables based on the sample size, prevalence of fibrosis or cirrhosis, sensitivity, and specificity and compared calculated measures of diagnostic accuracy from the tables with reported results. We focused on results for clinically significant fibrosis (defined as a score of 3 to 6 on the Ishak scale or F2 to F4 on the Meta-analysis of Histologic Data in Viral Hepatitis [METAVIR], Knodell, Hytiroglou, Batts-Ludwig, Scheuer, or Desmet scale, as determined from biopsy specimen) and cirrhosis (defined as a score of 5 or 6 on the Ishak scale or F4 on the METAVIR or similar scale) (see Appendix Table 2 for further descriptions of METAVIR and Ishak stages) (46-47). We also abstracted the reported area under the receiver-operating characteristic curve (AUROC) (48-49), which is based on sensitivities and specificities across a range of test results and is a measure of discrimination, or the ability of a test to distinguish persons with a condition from those without it. An AUROC of 1.0 indicates perfect discrimination, and an AUROC of 0.5 indicates complete lack of discrimination. Interpretation of values between 0.5 and 1.0 is somewhat arbitrary, but a value of 0.90 to less than 1.0 has been classified as excellent, 0.80 to less than 0.90 as good, 0.70 to less than 0.80 as fair, and less than 0.70 as poor (48-49).
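The consistency check described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the reviewers' actual tooling: the names `reconstruct_table`, `is_discordant`, and `classify_auroc`, the 0.01 tolerance, and the example inputs are assumptions introduced here. It rebuilds the 2 x 2 cell counts from sample size, prevalence, sensitivity, and specificity, compares a calculated measure with a reported one, and labels an AUROC using the bands quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of a diagnostic 2 x 2 table (fractional when reconstructed)."""
    tp: float  # blood test positive, biopsy positive
    fp: float  # blood test positive, biopsy negative
    fn: float  # blood test negative, biopsy positive
    tn: float  # blood test negative, biopsy negative

    @property
    def ppv(self) -> float:
        """Positive predictive value calculated from the table."""
        return self.tp / (self.tp + self.fp)


def reconstruct_table(n: int, prevalence: float,
                      sensitivity: float, specificity: float) -> TwoByTwo:
    """Rebuild the 2 x 2 table from the four quantities named in the text."""
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    diseased = n * prevalence
    non_diseased = n - diseased
    tp = sensitivity * diseased
    tn = specificity * non_diseased
    return TwoByTwo(tp=tp, fp=non_diseased - tn, fn=diseased - tp, tn=tn)


def is_discordant(reported: float, calculated: float,
                  tolerance: float = 0.01) -> bool:
    """Flag a reported measure that differs from the calculated one.

    The tolerance is a hypothetical choice, not a value from the review.
    """
    return abs(reported - calculated) > tolerance


def classify_auroc(auroc: float) -> str:
    """Label an AUROC with the bands cited in the text (48-49)."""
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie in [0, 1]")
    if auroc == 1.0:
        return "perfect"
    if auroc >= 0.90:
        return "excellent"
    if auroc >= 0.80:
        return "good"
    if auroc >= 0.70:
        return "fair"
    return "poor"


# Hypothetical study: 200 patients, 25% with fibrosis on biopsy.
table = reconstruct_table(n=200, prevalence=0.25,
                          sensitivity=0.80, specificity=0.90)
```

A study would be flagged here if, for example, its reported positive predictive value differed from `table.ppv` by more than the chosen tolerance.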
We did not pool results because of differences across studies in populations evaluated, differences in how fibrosis and cirrhosis were defined, and methodological limitations in the studies. Instead, we created descriptive statistics with the median sensitivity and specificity at specific cutoffs and reported AUROCs and their associated ranges. The total range rather than the interquartile range was chosen to highlight the greater variability and uncertainty in the estimates, some of which were based on few studies. We calculated likelihood ratios based on the median sensitivities and specificities and reported the range of likelihood ratios from individual studies (50). The positive likelihood ratio [sensitivity/(1 - specificity)] is the odds of fibrosis or cirrhosis among patients with a positive test result (51). The negative likelihood ratio [(1 - sensitivity)/specificity] is the odds among patients with a negative result. We separately summarized the difference in AUROCs from the subgroup of studies that directly compared 2 or more blood tests in the same population.
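The likelihood ratio formulas given above translate directly into code. The sketch below is illustrative: the function names, the sensitivity of 0.90 and specificity of 0.50, and the 30% pretest probability are hypothetical values chosen for the example, not estimates from the review. It also shows the standard way a likelihood ratio is applied, namely by multiplying the pretest odds to obtain the posttest odds.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must lie in (0, 1]")
    if specificity == 1.0:
        # No false positives: LR+ is unbounded (undefined if sensitivity is 0).
        lr_pos = math.inf if sensitivity > 0 else math.nan
    else:
        lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    if math.isinf(likelihood_ratio):
        return 1.0
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical cutoff with sensitivity 0.90 and specificity 0.50.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.50)
# Probability of disease after a negative result, assuming a 30% pretest probability.
after_negative = posttest_probability(0.30, lr_neg)
```

A specificity of exactly 1.0 yields an unbounded positive likelihood ratio, which is why single-study estimates can be infinite.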
To avoid double counting of data when calculating medians, we excluded duplicative results from the same population reported in different publications.
When the degree of overlap was partial or unclear, we included both sets of data but performed sensitivity analyses that excluded studies with potential overlap. We also performed sensitivity analyses that excluded poor-quality studies, results based on the original sample and analysis used to develop an index, studies with discrepancies between calculated and reported measures of diagnostic accuracy, studies of patients with normal aminotransferase levels, and studies that did not restrict analysis to adequate biopsy specimens (length >15 mm and >5 portal tracts in the absence of cirrhosis).
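The descriptive synthesis and its sensitivity analyses amount to recomputing a median and total range after dropping a subset of studies. A minimal sketch follows, assuming a hypothetical record layout (`auroc`, `quality`, `development_sample`) and invented values. It is not the review's data or code.

```python
from statistics import median


def summarize(studies, exclude=lambda study: False):
    """Median and total range of AUROC over studies not matched by `exclude`."""
    kept = [s for s in studies if not exclude(s)]
    if not kept:
        raise ValueError("no studies left after exclusion")
    values = [s["auroc"] for s in kept]
    return {"n": len(kept),
            "median": median(values),
            "range": (min(values), max(values))}


# Hypothetical study-level results for one blood test.
studies = [
    {"auroc": 0.72, "quality": "fair", "development_sample": False},
    {"auroc": 0.80, "quality": "good", "development_sample": False},
    {"auroc": 0.86, "quality": "fair", "development_sample": True},
    {"auroc": 0.91, "quality": "poor", "development_sample": False},
]

primary = summarize(studies)
# Sensitivity analysis 1: drop poor-quality studies.
without_poor = summarize(studies, exclude=lambda s: s["quality"] == "poor")
# Sensitivity analysis 2: drop results from the sample used to develop the index.
validation_only = summarize(studies, exclude=lambda s: s["development_sample"])
```

Comparing `primary` with each restricted summary shows whether an exclusion materially shifts the median or narrows the range.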
We synthesized the overall quality of each body of evidence on the basis of the type and quality of studies (good, fair, or poor); the precision of the estimate of diagnostic accuracy or the estimate of effect, based on the number and size of studies and the CI (high, moderate, or low); the consistency of results among studies (high, moderate, or low); and the directness of the evidence linking the intervention and health outcomes (direct or indirect). We rated the strength of evidence for each blood test with 1 of 4 grades (high, moderate, low, or insufficient), in accordance with the AHRQ Methods Guide for Effectiveness and Comparative Effectiveness Reviews (52).
Role of the Funding Source
This research was funded by AHRQ's Effective Health Care Program. Investigators worked with AHRQ staff to develop and refine the review protocol. AHRQ staff had no role in conducting the review, and the investigators are solely responsible for the content of the manuscript and the decision to submit it for publication.
Results
Detailed results of the search and study selection process through May 2012 are shown in the full report (42). We reviewed a total of 8736 citations related to HCV screening in nonpregnant persons (including an update search done in January 2013) and included 172 studies on the accuracy of blood tests versus liver biopsy for diagnosing fibrosis or cirrhosis (Table 1 of the Supplement) (12-16, 18-40, 53-196). Diagnostic accuracy was also reported in 4 subsequent reports (197-200) from 3 of these studies (29, 132, 151). The studies varied with respect to inclusion criteria, such as presence of elevated aminotransferase levels, exposure to antiviral therapy, and alcohol use. They were primarily done in referral populations in the United States, Europe, Asia, and northern Africa. Fifteen were rated as good quality, 5 poor quality, and the remainder fair quality (Table 2 of the Supplement). Seventy-three studies did not describe interpretation of liver biopsy specimens by investigators blinded to test results, 93 did not clearly describe enrollment of a consecutive or random sample, and 105 did not evaluate clearly predefined test cutoffs. Only 20 studies reported the proportion of eligible patients excluded because of uninterpretable or unobtainable biopsy specimens (median, 5.0%; range, 0.7% to 26%). Forty-two studies reported results from the original sample and analysis used to develop an index. Seventeen studies reported results for diagnostic accuracy that were discordant with constructed 2 x 2 tables (Table 3 of the Supplement), and 2 studies reported different AUROCs at different cutoffs for the same test and diagnosis (23, 169).
Results for fibrosis and cirrhosis are summarized in Tables 2 and 3, respectively. Sensitivity and specificity varied on the basis of the cutoff evaluated. A platelet count less than 140 to less than 163 x 10^9 cells/L, an age-platelet index score of 6.0 or greater, an aspartate aminotransferase-platelet ratio index (APRI) score greater than 1.5, a FibroTest score greater than 0.70 or greater than 0.80, and a Forns index score greater than 6.9 were associated with median specificities greater than 0.90, positive likelihood ratios that ranged from 5.1 to 10, and negative likelihood ratios that ranged from 0.48 to 0.81. Positive likelihood ratios for the FibroIndex were somewhat higher, but estimates were available from only 3 studies (likelihood ratios were 10, 12, and ∞). A FibroTest score greater than 0.10 to greater than 0.22 and a FibroIndex score greater than 1.25 were associated with median sensitivities greater than 0.90; negative likelihood ratios of 0.21 and 0.15, respectively; and positive likelihood ratios of 1.5 and 1.6, respectively. An enhanced liver fibrosis (ELF) index score greater than 8.75 to greater than 9.78 and a Forns index score greater than 4.2 to greater than 4.57 were associated with slightly lower sensitivity (0.85 and 0.88, respectively) but similar negative likelihood ratios (0.21 and 0.22, respectively).
The median AUROC for fibrosis (METAVIR score of F2 to F4, Ishak score of 3 to 6, or equivalent) was 0.80 or greater (range, 0.81 to 0.86) for the ELF index, Fibrometer, and FIBROSpect II. The median AUROC was 0.70 to less than 0.80 for platelet counts, hyaluronic acid, age-platelet index, APRI, the FIB-4 index, FibroIndex, FibroTest, the Forns index, and Hepascore.
For cirrhosis, an APRI score greater than 2.0 was associated with a specificity of 0.94 (range, 0.65 to 0.99) (18 studies), and platelet counts less than 140 to less than 155 x 10^9 cells/L, an age-platelet index score of 6.0 or greater, and a Hepascore greater than 0.801 to 0.84 or greater were each associated with median specificities of 0.86 to 0.88 (Table 3). Associated positive likelihood ratios ranged from 5.1 to 8.0, and negative likelihood ratios ranged from 0.25 to 0.55. A Göteborg University Cirrhosis Index (GUCI) score greater than 1.0, 1.11, or 1.5 and a Lok index score of at least 0.5 or greater than 0.6 were associated with similar specificities and slightly lower positive likelihood ratios (4.8 and 4.4, respectively). A Lok index score of 0.20 or greater was associated with a median sensitivity of 0.90 for diagnosing cirrhosis, for a negative likelihood ratio of 0.21 (range, 0 to 0.94) (6 studies) and a positive likelihood ratio of 1.8 (range, 1.0 to 4.8).
The median AUROC for cirrhosis (METAVIR score of F4, Ishak score of 5 or 6, or equivalent) was 0.80 or greater (range, 0.80 to 0.91) for platelet counts, hyaluronic acid, age-platelet index, APRI, the ELF index, the FIB-4 index, FibroIndex, Fibrometer, FibroTest, the Forns index, GUCI, Hepascore, and the Lok index.
Excluding poor-quality studies, studies that reported discrepant results, results from the original sample and analysis used to develop an index, studies restricted to patients with normal aminotransferase levels (135, 153, 171), and results from similar or overlapping population samples had little effect on summary estimates. Studies found no consistent association between shorter biopsy specimen length (36, 91, 134, 155, 169) or presence of elevated aminotransferase levels (19, 170-171) and measures of diagnostic accuracy.
For other blood tests and indices, a median AUROC less than 0.70 for fibrosis and less than 0.80 for cirrhosis was reported (alanine aminotransferase [ALT], the aspartate aminotransferase [AST]-ALT ratio, the cirrhosis discriminant score, and the Pohl index) or the AUROC was evaluated in too few studies to reliably estimate.
Sixty-eight studies directly compared the AUROC for 2 or more blood indices in the same population (Tables 4 and 5). The most frequently evaluated indices in head-to-head studies were the APRI and FibroTest. The APRI was associated with a slightly lower AUROC than FibroTest for fibrosis (18 studies; median difference, -0.03; range, -0.10 to 0.07), but there was no difference for cirrhosis (7 studies; median difference, 0.0; range, -0.04 to 0.06). The APRI was associated with a substantially higher AUROC than the AST-ALT ratio for fibrosis (13 studies; median difference, 0.17; range, -0.06 to 0.23) and cirrhosis (11 studies; median difference, 0.19; range, -0.18 to 0.23). For fibrosis, the APRI was also associated with a higher AUROC than the cirrhosis discriminant score (4 studies; median difference, 0.08; range, 0.07 to 0.09) and platelet count (8 studies; median difference, 0.08; range, -0.06 to 0.53) and a lower AUROC than Fibrometer (8 studies; median difference, -0.06; range, -0.07 to 0.02), although differences were smaller. The FibroTest was associated with a higher AUROC than FibroIndex for diagnosing fibrosis (median difference, 0.08; range, 0.02 to 0.10), but results were based on only 3 studies. For cirrhosis, differences between the APRI or FibroTest and other blood tests were small; median differences ranged from 0 to 0.05.
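The head-to-head summaries above are medians of within-study differences rather than differences between separately computed medians. The sketch below illustrates that calculation. The function name and the paired AUROC values are hypothetical and do not reproduce any study in the review.

```python
from statistics import median


def paired_auroc_difference(pairs):
    """Median and range of (index A minus index B) AUROC differences.

    Each element of `pairs` holds the AUROCs of two indices measured in the
    same study population, so the difference is taken within each study.
    """
    if not pairs:
        raise ValueError("at least one head-to-head study is required")
    differences = [auroc_a - auroc_b for auroc_a, auroc_b in pairs]
    return median(differences), (min(differences), max(differences))


# Hypothetical (index A, index B) AUROC pairs from three head-to-head studies.
head_to_head = [(0.78, 0.80), (0.75, 0.85), (0.82, 0.75)]
median_difference, difference_range = paired_auroc_difference(head_to_head)
```

Because both indices are applied to the same patients in each study, differences in the spectrum of fibrosis between study populations affect both AUROCs alike and largely cancel in the within-study difference.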
Combinations of Indices
Nine studies evaluated combinations of indices (31, 71, 77, 82, 90, 123, 169, 173, 179). The Sequential Algorithm for Fibrosis Evaluation, which incorporates the APRI and FibroTest, was evaluated in 4 studies (77, 82, 169, 173). For fibrosis, it was associated with an AUROC of 0.90 and 0.94 in 2 studies (82, 169). Median sensitivity was 1.0 (range, 1.0 to 1.0) and median specificity was 0.82 (range, 0.77 to 0.88) in 4 studies (77, 82, 169, 173). For cirrhosis, the algorithm was associated with a median AUROC of 0.87 (range, 0.87 to 0.92) in 3 studies (82, 169, 173). Median sensitivity was 0.84 (range, 0.62 to 0.90) and median specificity was 0.92 (range, 0.90 to 0.93) in 4 studies (77, 82, 169, 173). In single studies, the Leroy and Fibropaca algorithms and various combinations of the APRI, FIBROSpect II, FibroTest, the FIB-4 index, and Fibrometer were also associated with diagnostic accuracy somewhat higher than that observed for single indices (90, 173, 179).
Discussion
Although liver biopsy is still regarded as the most accurate method for assessing the histologic stage of HCV infection, it has limitations and is an invasive test with some risk for serious harms (201-202). This has spurred interest in noninvasive tests as a potential alternative to biopsy. We found many blood tests associated with an AUROC of 0.70 or greater (range, 0.70 to 0.86) for fibrosis (generally classified as fair to good [48-49]) and 0.80 or greater (range, 0.80 to 0.91) for cirrhosis (generally classified as good to excellent) when compared with liver biopsy (the strength-of-evidence ratings are summarized in Appendix Table 3). Among tests meeting these AUROC thresholds, those that were associated with positive likelihood ratios of 5 to 10 (generally classified as moderately useful) at commonly used cutoffs were platelet counts, age-platelet index, APRI, FibroIndex, FibroTest, and the Forns index for fibrosis and platelet counts, age-platelet index, APRI, and Hepascore for cirrhosis. For diagnosing cirrhosis, GUCI and the Lok index had positive likelihood ratios just below the threshold. Only FibroIndex and FibroTest were also associated with negative likelihood ratios for fibrosis in the moderately useful range (0.10 to 0.20) at commonly used cutoffs, suggesting that blood tests may be somewhat more useful for ruling in than ruling out fibrosis.
In direct comparisons based on the AUROC, the APRI performed only slightly worse than FibroTest for diagnosing fibrosis and the tests did not differ for cirrhosis. The APRI performed substantially better than the AST-ALT ratio for diagnosing fibrosis or cirrhosis and moderately better than platelet count for diagnosing fibrosis. Differences between the APRI or FibroTest and other blood tests were relatively small, particularly for cirrhosis. This suggests that simple indices based on a small number of commonly available blood tests and straightforward calculations—such as the age-platelet index (based on age and platelet count) and the APRI (based on AST level and platelet count)—may perform similarly to measures based on more blood tests, including indices requiring tests not routinely obtained or involving proprietary formulas or panels of tests. Some evidence suggests that using multiple indices in combination or in an algorithmic approach is associated with somewhat higher diagnostic accuracy than using a single index.
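To show how simple such an index is to compute, the sketch below calculates the APRI. The review itself does not restate the formula. The code assumes the commonly published definition, (AST divided by the upper limit of normal for AST) x 100, divided by the platelet count in 10^9 cells/L. The function names and the example laboratory values are hypothetical. The two cutoffs are the ones evaluated in the Results (greater than 1.5 for fibrosis and greater than 2.0 for cirrhosis).

```python
def apri(ast_u_per_l: float, ast_upper_limit_normal: float,
         platelets_10e9_per_l: float) -> float:
    """AST-platelet ratio index, using the commonly published formula (assumed here).

    APRI = (AST / upper limit of normal AST) * 100 / platelet count (10^9 cells/L)
    """
    if min(ast_u_per_l, ast_upper_limit_normal, platelets_10e9_per_l) <= 0:
        raise ValueError("all inputs must be positive")
    return (ast_u_per_l / ast_upper_limit_normal) * 100.0 / platelets_10e9_per_l


def exceeds_reviewed_cutoffs(score: float) -> dict:
    """Compare an APRI score with the cutoffs evaluated in the review."""
    return {
        "fibrosis_cutoff_gt_1.5": score > 1.5,
        "cirrhosis_cutoff_gt_2.0": score > 2.0,
    }


# Hypothetical patient: AST 80 U/L, laboratory upper limit 40 U/L,
# platelet count 100 x 10^9 cells/L.
score = apri(ast_u_per_l=80, ast_upper_limit_normal=40, platelets_10e9_per_l=100)
flags = exceeds_reviewed_cutoffs(score)
```

The upper limit of normal for AST varies by laboratory, so it is passed as an argument rather than fixed in the function.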
Our study has limitations. We excluded non-English-language articles, which could have resulted in language bias, although some studies have found that restricting systematic reviews of noncomplementary medicine interventions to English-language studies has little effect on the conclusions (203-204). We did not attempt to pool the studies because of methodological limitations and variability in populations and how fibrosis and cirrhosis were defined. Many of the blood tests were evaluated in few studies, thus precluding reliable conclusions about diagnostic accuracy. Liver biopsy is subject to sampling error, inadequate specimens, and interobserver variability in interpretation, which could result in underestimates of diagnostic accuracy due to misclassification (9, 205-206). Our results may not apply to specific populations of HCV-infected patients that were excluded from our review, such as patients co-infected with hepatitis B virus or HIV (who may be at higher risk for progression to cirrhosis) and those receiving hemodialysis. We also did not include results for imaging tests, such as those used to assess liver stiffness, that are addressed in the full report (42).
Results of our study should also be interpreted in the context of the analytic methods used. Estimates of diagnostic accuracy were based on a binary reference standard diagnosis (absence or presence of clinically significant fibrosis). However, fibrosis grading systems are multilevel, with higher grades associated with progressively worse prognosis. Measures that incorporate the accuracy of tests at each fibrosis stage would therefore be more informative than estimates based on dichotomized classifications. Techniques for calculating an AUROC based on a multilevel reference standard, such as the Obuchowski method (207) (which also weights the degree of discordance between predicted and observed findings), are available. However, only 2 studies reported the Obuchowski measure (115, 196), and other studies did not provide data to calculate it. We were also unable to determine the diagnostic accuracy of blood tests for less severe stages of fibrosis independent from the diagnostic accuracy for cirrhosis because almost all studies grouped less severe fibrosis (for example, METAVIR stage F2 or F3) with cirrhosis. In addition, estimates of diagnostic accuracy could have been affected by variability in the distribution and severity of fibrosis in different study populations. Methods for calculating "adjusted" AUROCs based on a standardized distribution of fibrosis stages have been proposed to enhance the comparability of diagnostic estimates across studies (208-209). We did not use such methods, which are based on assumptions about the underlying prevalence of each fibrosis stage and the effects of stage on diagnostic accuracy and require further statistical validation. Rather, we separately analyzed head-to-head studies on diagnostic accuracy—thus, in principle, reducing spectrum effects because comparative estimates from each study are based on the application of different blood tests in the same population.
Our study has other strengths. Unlike other reviews, our analysis included all blood tests rather than 1 or several tests (209-212). We restricted our analysis to HCV-infected patients, potentially resulting in a more homogeneous population. Finally, our findings were robust in sensitivity analyses related to study quality, study methods, and population differences.
Our results suggest that blood tests can help to identify HCV-infected patients with clinically significant fibrosis, with somewhat greater accuracy for identifying cirrhosis than less advanced fibrosis. In addition to the cross-sectional studies included in our review, longitudinal studies support the usefulness of blood tests in providing prognostic information, although data are more limited (213). Factors that may affect use or selection of blood tests include availability and cost, given the variability in component blood tests, the number of tests required, and proprietary status. Studies that evaluate the virologic and clinical outcomes of antiviral treatment in HCV-infected patients who have not had liver biopsy are needed (214) to further define optimum work-up strategies.