Computer models that predict response to HIV treatment published in the international journal AIDS
London, UK; 14th September 2011. Details of computer models that can predict the chances of a patient responding to their HIV drugs with 80% accuracy are published online today in the journal AIDS. The models were developed by the HIV Resistance Response Database Initiative (RDI) using almost half a million pieces of data from approximately 6,000 clinical cases in hundreds of clinics around the world. The models are now available online as part of an experimental treatment support tool, HIV-TRePS.
The random forest models were trained to predict the probability of any combination of HIV drugs reducing the virus in the patient's blood to an undetectably low level (<50 copies/ml). They use the genetic code of the virus, the patient's immune status, their treatment history and a measure of the level of HIV in the blood, to make their predictions.
"The publication of these results is an important milestone in the development of expert computer systems to aid clinical practice", commented Professor Julio Montaner, Past President of the International AIDS Society and Director of the BC Centre for Excellence in HIV & AIDS, based in Vancouver, Canada. "The models harness the experience of hundreds of physicians treating thousands of patients and put this distilled expertise in the hands of the individual physician via the click of a mouse."
Currently, when a patient's treatment fails and the levels of the virus increase, physicians usually run a genotype test, which detects mutations in the genetic code of the virus that can make it resistant to certain drugs. The physician then selects a combination of drugs that the test indicates will still be effective against the mutated virus. When predictions of response based on this test were compared with those of the RDI models, the genotype-based predictions proved significantly less accurate.
The RDI is an independent, not-for-profit international research collaboration set up in 2002 with the mission to improve the clinical management of HIV infection through the application of bioinformatics to HIV drug resistance and treatment outcome data. Over the nine years since its inception, the RDI has worked with many of the leading clinicians and scientists in the world to develop the world's largest database of HIV drug resistance and treatment outcome data, containing information from approximately 85,000 patients in more than 20 countries.
The journal AIDS publishes the very latest ground-breaking research on HIV and AIDS. Read by all the top clinicians and researchers, AIDS has the highest impact of all AIDS-related journals.
Note: HIV-TRePS is an experimental system intended for research use only. The predictions of the system are not intended to replace professional medical care and attention by a qualified medical practitioner and consequently the RDI does not accept any responsibility for the selection of drugs, the patient's response to treatment or differences between the predictions and patients' responses.
Reference: Revell AD, Wang D, Boyd MA, et al. The development of an expert system to predict virological response to HIV therapy as part of an online treatment support tool. AIDS 2011; 25(15): 1855-1863.
More information can be found at: www.hivrdi.org.
For further information contact:
Andrew Revell (Executive Director, RDI) on +44 207 226 7314, +44 7967 126498 (mobile) or firstname.lastname@example.org
24 September 2011
The development of an expert system to predict virological response to HIV therapy as part of an online treatment support tool - pdf attached
Revell, Andrew D.a; Wang, Dechaoa; Boyd, Mark A.b,c; Emery, Seanb; Pozniak, Anton L.d; De Wolf, Franke; Harrigan, Richardf; Montaner, Julio S.G.f; Lane, Cliffordg; Larder, Brendan A.a; on behalf of the RDI Study Group. aRDI, London, UK; bThe Kirby Institute, University of New South Wales; cSt. Vincent's Hospital, Sydney, New South Wales, Australia; dChelsea and Westminster Hospital, London, UK; eNetherlands HIV Monitoring Foundation, Amsterdam, The Netherlands; fBC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada; gNational Institute of Allergy and Infectious Diseases, Bethesda, Maryland, USA.
Objective: The optimum selection and sequencing of combination antiretroviral therapy to maintain viral suppression can be challenging. The HIV Resistance Response Database Initiative has pioneered the development of computational models that predict the virological response to drug combinations. Here we describe the development and testing of random forest models to power an online treatment selection tool.
Methods: Five thousand, seven hundred and fifty-two treatment change episodes were selected to train a committee of 10 models to predict the probability of virological response to a new regimen. The input variables were antiretroviral treatment history, baseline CD4 cell count, viral load and genotype, drugs in the new regimen, time from treatment change to follow-up and follow-up viral load values. The models were assessed during cross-validation and with an independent set of 50 treatment change episodes by plotting receiver operating characteristic curves, and their performance was compared with genotypic sensitivity scores from rules-based genotype interpretation systems.
Results: The models achieved an area under the curve during cross-validation of 0.77-0.87 (mean = 0.82), accuracy of 72-81% (mean = 77%), sensitivity of 62-80% (mean = 67%) and specificity of 75-89% (mean = 81%). When tested with the 50 test cases, the area under the curve was 0.70-0.88, accuracy 64-82%, sensitivity 62-80% and specificity 68-95%. The genotypic sensitivity scores achieved an area under the curve of 0.51-0.52, overall accuracy of 54-56%, sensitivity of 43-64% and specificity of 41-73%.
Conclusion: The models achieved a consistent, high level of accuracy in predicting treatment responses, which was markedly superior to that of genotypic sensitivity scores. The models are being used to power an experimental system now available via the Internet.
Since the advent of HAART, long-term suppression of HIV and concomitant prevention of HIV disease progression have become readily achievable for the majority of patients in well resourced healthcare settings. Nevertheless, despite the availability of approximately 25 antiretroviral drugs from six classes, viral breakthrough remains a significant clinical challenge. The latter is often associated with the emergence of drug-resistant virus, necessitating a change in therapy [1,2]. Sustained re-suppression of drug-resistant virus requires optimal selection of the next regimen. The complexities of HIV drug resistance interpretation and the number of potential drug combinations available make successful individualized sequencing of antiretroviral therapy highly challenging. For physicians with limited experience or resources, antiretroviral treatment decision-making can become even more problematic.
The standard of care in well resourced settings is to monitor the patient's viral load regularly, with detection of viral breakthrough triggering a re-evaluation of the efficacy of the antiretroviral drug regimen [1,2]. Once the viral breakthrough is confirmed with repeated viral load testing, a genotypic resistance test is usually performed to identify any selected viral mutations that may confer drug resistance. The interpretation of this genotype is often complex and is usually performed using rules-based interpretation software that relates point mutations to the susceptibility of the virus to single drugs. However, there is no gold standard interpretation system: different systems provide different interpretations with varying degrees of agreement [5-9]. Moreover, it is difficult to relate genotypic changes and the related predicted susceptibility to individual drugs to the likely relative responses to potential drug combinations. Indeed, raw genotypic sensitivity scores have been shown to be relatively weak predictors of virological response [10-13].
Bioinformatics has most commonly been used to predict phenotype from genotype and then to relate a cut-off in predicted phenotype to a categorical response [14,15]. Again, it is difficult to relate this categorical prediction for an individual drug to the relative responses that may be achieved with different candidate combinations.
Models that provide a quantitative prediction of virological response to combination therapy, rather than to individual drugs, directly from the genotype and other clinical information may offer a potential clinical advantage. However, this can be challenging given that a very large dataset is required to accommodate a range of prognostic variables, including multiple possible drug-genotype permutations and their respective drug response data. The HIV Resistance Response Database Initiative (RDI) was established in 2002 explicitly to take on this challenge and be the global repository for data, collected from clinical practice around the world, required to develop such models.
Currently, we have collected data from approximately 84,000 patients, predominantly from western Europe and North America, but also including some from Africa, Australia and Japan. We have previously trained computational models, including artificial neural networks, random forests and support vector machines, using subsets of these data to predict virological response to treatment from genotype, viral load, CD4 cell count and treatment history. When tested with independent retrospective data, the models have proved accurate, with correlations between the predicted and actual changes in viral load in excess of 0.8 (r2 ≥ 0.65), which compares favorably with the correlations typically achieved by common rules-based genotype interpretation systems [13,19]. In addition, the models are able to identify combinations of antiretroviral drugs that are predicted to be effective for a substantial proportion of cases of virological failure in the clinic following a genotype-guided change in therapy [20,21].
In order to assess the clinical utility of the RDI tool, a Web-based user interface was developed that provided clinical investigators access to predictions of virological response to alternative antiretroviral regimens. Two multinational clinical pilot studies were initiated in which 23 participating physicians entered baseline data for 114 cases of treatment failure via the interface and then registered their treatment intention based on all the laboratory and clinical information available to them. The baseline information was automatically input to the RDI models, which made predictions of response to their intended regimen plus more than 200 potential alternative combinations of antiretroviral drugs. The physician received an automated report listing the five alternative regimens that the models predicted would be most effective, plus their own treatment selection, ranked in order of predicted virological response. Having reviewed the report, the physicians entered their final treatment decision.
Overall 33% of treatment decisions were changed following review of the report. The final treatment decisions and the best of the RDI alternatives were predicted to produce significantly greater virological responses and involve fewer drugs than the physicians' original selections. The system was found to be easy to use and positively rated as a useful aid to clinical practice. Participating physicians also submitted their suggestions for maximizing the utility of the system for current clinical practice.
An alternative system for predicting short-term treatment responses specifically at 8 weeks after a change in antiretroviral treatment, using a combination of three different computational models trained with information from a European dataset, has recently been evaluated and shown to be comparable or superior to estimates of response provided by physicians.
Encouraged by our results, we set out to develop a new set of models to power a version of the online system that would incorporate the suggestions made by the physicians and be made available over the Internet. Here, we describe the development and evaluation of computational models trained to predict the probability of a regimen reducing the viral load to below 50 copies/ml and the use of these models to power the RDI's online treatment selection aid that was launched in October 2010.
Characteristics of the training and test datasets
After the application of the three alternative treatment change episode (TCE) selection filters relating to the inclusion or exclusion of TCEs with suboptimal treatments, training sets of 3692, 5334 and 6136 TCEs were obtained. The single random forest models developed using these datasets were tested with the independent, randomly selected test set of 200 TCEs, producing receiver operating characteristic (ROC) curves with area under the curve (AUC) values of 0.74, 0.78 and 0.79, respectively. The overall accuracy was 71, 74 and 71%, respectively. Sensitivity (percentage of responses correctly predicted) was 73, 63 and 73% and specificity (percentage of failures correctly predicted) was 70, 79 and 70%. Performance of the models with 25 etravirine-containing TCEs gave AUC values of 0.74, 0.80 and 0.76, overall accuracy of 72, 80 and 72%, sensitivity of 89, 88 and 89% and specificity of 63, 81 and 63%, respectively.
On the basis of the figures for overall accuracy, AUC and specificity with the entire test set and the etravirine TCEs, the second filter (suboptimal TCEs permitted in the treatment archive and baseline positions) was selected and used for the selection of TCEs for the main round of modeling. This resulted in 5752 TCEs, of which 553 included etravirine-based regimens.
The selected data came from 24 sources: five cohorts, nine individual clinics and 10 clinical trials, with data from more than 15 countries. The characteristics of the two sets of test TCEs are summarized in Table 1.
Results of the modeling
The performance characteristics from the ROC curves of the 10 individual models during cross-validation and testing with the independent test set of 50 TCEs from Sydney, Australia, are summarized in Table 2. The 10 models achieved an AUC during cross-validation ranging from 0.77 to 0.87, with a mean of 0.82. The overall accuracy ranged from 72 to 81% (mean = 77%), the sensitivity from 62 to 80% (mean = 67%) and the specificity from 75 to 89% (mean = 81%). The ROC curve for the best performing model during cross-validation is presented in Fig. 2.
When tested with the 50 independent test TCEs, the 10 models achieved an AUC ranging from 0.70 to 0.88, with a mean of 0.79 and a committee average performance (CAP) value of 0.83. The overall accuracy ranged from 64 to 82% (mean = 71%, CAP = 76%), the sensitivity from 57 to 71% (mean = 66%, CAP = 71%) and the specificity from 68 to 95% (mean = 79%, CAP = 82%).
The performance of the 10 models with etravirine-containing regimens during cross-validation gave an AUC of 0.84, overall accuracy of 80%, sensitivity of 71% and specificity of 91%.
When the GSSs for the 50 test TCEs from the basic ANRS, HIVDB and REGA genotype interpretation systems were used as predictors of response, predictions were close to chance. The AUC values for the three systems were 0.53, 0.56 and 0.55, respectively (Table 2). Overall accuracy was 46, 52 and 50%; sensitivity 50, 50 and 46%; and specificity 41, 55 and 55%, respectively. The expanded versions of HIVDB and REGA gave AUC values of 0.57 and 0.55, with overall accuracy of 58 and 52%, sensitivity of 71 and 50% and specificity of 41 and 55%, respectively.
These results demonstrate that the random forest models achieved a consistent, high level of accuracy in predicting virological responses to combination antiretroviral treatment, which was markedly superior to that of GSSs.
The results of the initial round of modeling, using test data selected using alternative filters for suboptimal regimens, suggested that the inclusion of TCEs with monotherapy or dual therapy in the treatment history was associated with better performing models. In selecting this filter for the main round of modeling, more emphasis was given to specificity (percentage of failures correctly predicted) rather than sensitivity, as clinically, it is more critical to predict treatment failure reliably than success.
The 10 random forest models that were subsequently developed achieved consistently accurate predictions of responses to treatment, whichever measure was considered. Specificity was consistently higher than sensitivity during cross-validation and with the test set, averaging approximately 80 vs. 67%. The random forest committee as a whole (CAP) performed better than all but one or two individual models.
The performance of all the models was markedly superior to that of GSSs from rules-based genotype interpretation systems in common use. This finding is consistent with previous studies. This may in part reflect the inherently superior accuracy of a system developed to predict virological response to combination therapy compared with one that makes categorical predictions of sensitivity or resistance to individual drugs. The GSSs performed unusually poorly here: historically, GSSs from these systems have predicted treatment response with accuracy typically in the region of 60-65%. However, it should be pointed out that the test set was too small to be considered adequate to test these systems, the outputs from which are not truly continuous variables.
On the basis of the overall accuracy of the 10 random forest models during cross-validation and independent testing, it was decided that these models could be used to power an experimental treatment support system and made available for open testing via the Internet. HIV-TRePS was launched in October 2010 (available at www.hivrdi.org). Long-term evaluation of this Internet-based system is currently underway. The main shortcoming of the current models is that they do not include some of the newest drugs (maraviroc, raltegravir and tipranavir). New models are under development to include these drugs and to replace the current models in 2011.
The unit of data used to train computational models is the treatment change episode (TCE), as first described by the RDI in 2003. This comprises the following on-treatment data collected immediately prior to and then following a change in antiretroviral therapy guided by a genotype (as illustrated in Fig. 1):
1. Plasma viral load from a sample taken no more than 8 weeks prior to the change in treatment.
2. CD4 cell count and genotype from samples taken no more than 12 weeks prior to the change in treatment.
3. The drugs in the baseline regimen.
4. Antiretroviral treatment archive.
5. The drugs in the new regimen.
6. The time to follow-up.
7. A follow-up plasma viral load taken between 4 and 48 weeks following introduction of the new regimen.
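The seven elements above amount to a simple per-episode record. As an illustration only, a TCE could be represented as follows; the field names are hypothetical and are not the RDI's actual schema:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of a treatment change episode (TCE) record.
# Field names and types are illustrative assumptions, not RDI code.
@dataclass
class TCE:
    baseline_viral_load: float    # log10 copies/ml, sampled <=8 weeks pre-change
    baseline_cd4: int             # CD4 cell count, sampled <=12 weeks pre-change
    baseline_genotype: List[str]  # resistance mutations, e.g. ["M184V", "K103N"]
    baseline_regimen: List[str]   # drugs in the failing (baseline) regimen
    treatment_archive: List[str]  # all antiretrovirals previously received
    new_regimen: List[str]        # drugs in the new regimen
    days_to_followup: int         # time from treatment change to follow-up
    followup_viral_load: float    # log10 copies/ml, 4-48 weeks post-change
```

A record like this bundles everything the models need for one training example: the outcome (follow-up viral load) plus all predictor variables.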
Secondary treatment change episode selection rules
TCEs with all the above data were extracted and then edited according to the following additional rules:
1. No more than three TCEs from the same change in therapy (using multiple follow-up viral loads) were extracted for use in any modeling. All TCEs from the same treatment change must have follow-up viral load determinations more than 4 weeks (>28 days) apart.
2. TCEs involving the following drugs that are no longer in current use in clinical practice, in either the failing regimen or the new regimen, were excluded: zalcitabine, delavirdine, loviride, emivirine, capravirine, atevirdine and adefovir. These drugs were, however, permitted in the treatment archive position.
3. Any TCEs involving the following drugs, which were not adequately represented in the RDI database, were excluded: tipranavir, raltegravir and maraviroc.
4. Any TCEs that included a protease inhibitor (other than nelfinavir) without ritonavir as a booster, in the failing or new regimen positions, were excluded. Any TCEs that had ritonavir as the only protease inhibitor in the failing or new regimen were also excluded.
5. Any TCEs without any resistance mutations were excluded from modeling.
6. TCEs with viral load values of the form '<X', where X was greater than 50 copies/ml (1.7 log10 copies), e.g. '<400' copies, were excluded, as the true values were not known.
7. Three alternative filters were initially applied, relating to the inclusion or exclusion of TCEs involving treatment with fewer than three full-dose drugs that was not part of a deliberate treatment simplification strategy ('suboptimal treatment'): such treatment permitted in the treatment archive position only; permitted in the archive and baseline positions but not in the new regimen; and permitted in any position. Single models were developed using each of these filters, and the filter associated with the best model performance was then taken forward for the main round of modeling.
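The three alternative filters in rule 7 can be sketched as predicates over a TCE. This is a didactic illustration of the filtering logic as described above, assuming hypothetical field names; the definition of 'suboptimal' here is simplified to a drug count:

```python
# Simplified stand-in for the paper's definition: fewer than three
# full-dose drugs (ignoring deliberate treatment simplification).
def is_suboptimal(regimen):
    return len(regimen) < 3

def passes_filter(tce, policy):
    """Apply one of the three suboptimal-treatment filters.

    policy is one of:
      'archive_only'         - suboptimal permitted in the treatment archive only
      'archive_and_baseline' - also permitted in the baseline (failing) regimen
      'any_position'         - permitted anywhere, including the new regimen
    The archive position is never checked: all three filters permit it there.
    """
    if policy == "any_position":
        return True
    if is_suboptimal(tce["new_regimen"]):
        return False  # no filter except 'any_position' allows a suboptimal new regimen
    if policy == "archive_only" and is_suboptimal(tce["baseline_regimen"]):
        return False  # the strictest filter also requires an optimal baseline regimen
    return True
```

Applying each predicate to the candidate pool yields the three training sets (3692, 5334 and 6136 TCEs) compared in the initial round of modeling.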
Computational model development
Random forest models were developed to predict the probability of the follow-up viral load being less than 50 copies/ml, using the TCEs that met all the above criteria. A random forest model (described in more detail below) is a predictor consisting of a collection of decision trees; each decision tree is a decision support tool that uses a tree-like graph of decisions and their possible consequences. The inputs to the trees are the values of the input variables used to train the random forest model. The 85 input variables used to train the models in this study were selected on the basis of previous modeling studies and were:
1. the baseline viral load (log10 copies HIV RNA/ml),
2. the baseline CD4 cell count (cells/μl),
3. the treatment history up to the point of treatment change (five variables determined by previous research to have a significant impact on the accuracy of models, coded as 1 = exposure, 0 = no exposure): zidovudine; lamivudine/emtricitabine; any non-nucleoside reverse transcriptase inhibitors (NNRTIs); any protease inhibitors; and enfuvirtide,
4. the following 59 baseline mutations in the HIV RNA regions encoding reverse transcriptase and protease, coded as binary variables (present = 1, absent = 0): reverse transcriptase (n = 32; M41L, E44D, A62V, K65R, D67N, 69 insert, T69D/N, K70R, L74V, V75I, F77L, V90I, A98G, L100I, L101I/E/P, K103N, V106A/I, V108I, Y115F, F116Y, V118I, Q151M, V179D/F, Y181C/I/V, M184V, Y188C/L/H, G190S/A, L210W, T215Y, T215F, K219Q/E, P236L); protease (n = 27; L10F/I/R/V, V11I, K20M/R, L24I, D30N, V32I, L33F, M36I, M46I/L, I47V, G48V, I50V, I50L, F53L, I54V/L/M, L63P, A71V/T, G73S/A, T74P, L76V, V77I, V82A/F/S, V82T, I84V/A/C, N88D/S, L89V, L90M),
5. the following 18 antiretroviral drugs in the new regimen (present = 1, not present = 0): zidovudine, didanosine, stavudine, abacavir, lamivudine/emtricitabine, tenofovir DF, efavirenz, nevirapine, etravirine, indinavir, nelfinavir, saquinavir, (fos)amprenavir, lopinavir, atazanavir, darunavir, ritonavir (as a protease inhibitor booster), enfuvirtide,
6. time from the change of treatment to the follow-up viral load (number of days).
The output from the trees was the follow-up viral load coded as a binary variable such that an undetectable viral load (value ≤1.7 log or 50 copies/ml) is coded as 1 and a detectable viral load (any value above 1.7 log or 50 copies/ml) as 0. The models were trained to produce an estimate of the probability of the follow-up viral load being less than 50 copies/ml.
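The flattening of a TCE into the 85-variable input vector (2 continuous variables + 5 history flags + 59 mutation flags + 18 drug flags + 1 time variable) can be sketched as below. The lists here are deliberately truncated to a handful of illustrative entries; the full sets of 59 mutations and 18 drugs are given in the text above, and the dictionary keys are hypothetical:

```python
# Truncated, illustrative variable lists (the study used 5, 59 and 18 entries).
HISTORY_DRUGS = ["zidovudine", "lamivudine/emtricitabine", "NNRTI", "PI", "enfuvirtide"]
MUTATIONS = ["M41L", "E44D", "K103N", "M184V", "L90M"]
NEW_REGIMEN_DRUGS = ["zidovudine", "abacavir", "efavirenz", "lopinavir", "darunavir"]

def encode_tce(tce):
    """Flatten one TCE into the model input vector:
    [viral load, CD4, history flags..., mutation flags..., drug flags..., days]."""
    vec = [tce["baseline_vl_log10"], tce["baseline_cd4"]]
    vec += [1 if d in tce["history"] else 0 for d in HISTORY_DRUGS]      # exposure flags
    vec += [1 if m in tce["mutations"] else 0 for m in MUTATIONS]        # present/absent
    vec += [1 if d in tce["new_regimen"] else 0 for d in NEW_REGIMEN_DRUGS]
    vec.append(tce["days_to_followup"])
    return vec
```

With the full lists, the resulting vector has exactly 2 + 5 + 59 + 18 + 1 = 85 entries, matching the variable count stated above.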
Each random forest model was trained by building individual trees using bootstrap samples drawn from the training set; each tree was then assessed against its out-of-bag (OOB) samples, those not present in its bootstrap sample. A randomly selected subset of input variables (covariates) was used to build an optimized tree, with each node splitting the data into finer branches, resulting in a classification of patients into a number of clusters. As there are many covariates and treatment change outcomes, it is computationally too intensive to find the single optimal tree model. Therefore, a very large number (200-300) of trees were built using random selections of subsets of covariates at the nodes, and the predictions from these decision trees (a random forest) were averaged across the forest.
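The bagging procedure just described can be illustrated in miniature. The sketch below is deliberately simplified (each "tree" is a one-level stump split at the bootstrap mean of one randomly chosen covariate, rather than a full optimized tree) and is in no way the RDI's implementation, but it shows the two essential ingredients: bootstrap resampling per tree and averaging of tree votes across the forest:

```python
import random

def fit_stump(X, y, feature):
    """Fit a one-level 'tree': split at the feature's mean, vote per side."""
    thresh = sum(x[feature] for x in X) / len(X)
    left = [yi for x, yi in zip(X, y) if x[feature] <= thresh]
    right = [yi for x, yi in zip(X, y) if x[feature] > thresh]
    vote = lambda ys: (sum(ys) / len(ys) >= 0.5) if ys else 0.5  # majority class
    return feature, thresh, vote(left), vote(right)

def fit_forest(X, y, n_trees=250, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample of (X, y)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feature = rng.randrange(len(X[0]))                    # random covariate choice
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_proba(forest, x):
    """Average of tree votes = estimated probability of response."""
    votes = [(left if x[f] <= t else right) for f, t, left, right in forest]
    return sum(votes) / len(votes)
```

Averaging many weak, decorrelated trees is what turns individually noisy splits into a stable probability estimate; the real models do the same over 200-300 full decision trees.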
Initially, three random forest models were trained using the TCEs obtained by applying each of the filters described in point 7 of the section 'Secondary TCE selection rules' above relating to suboptimal treatments. A common independent set of 200 TCEs was randomly selected for testing these random forest models, with the constraints that no patient could have TCEs in both the training and test sets and only one TCE per patient was used in the test set. The results of this modeling were then used to select the filter used for the development of a committee of 10 random forest models.
The performance of the models as predictors of virological response was evaluated by plotting receiver operating characteristic (ROC) curves and assessing the area under the ROC curve (AUC), the overall accuracy, the sensitivity and the specificity.
The random forest models were also validated using external data. The three initial random forest models were tested using the independent set of 200 TCEs from patients partitioned at random from the initial set of available TCEs. Following internal cross-validation, the final committee of 10 random forest models was tested using an independent set of 50 TCEs from two clinics in Sydney, Australia (Immunology B Ambulatory Care Service at St Vincent's Hospital and Taylors Square Private Clinic). A smaller test set was used for this purpose in order to maximize the TCEs available for training and because the accuracy of the 10 random forest models had been established during cross-validation.
In addition to the performance of the 10 individual random forest models, the committee average performance (CAP) was evaluated using the mean of the predictions of the 10 models for each of the 50 test TCEs. It has been shown that the average prediction across the forests, known as the committee vote, is usually more accurate than the prediction from a particular forest when the system is used to make predictions for external data.
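The committee vote is simply the mean of the 10 models' probability estimates for a given TCE, thresholded to give a classification. A minimal sketch:

```python
def committee_predict(model_probs):
    """Mean predicted probability of response across a committee of models.

    model_probs: the per-model probability estimates for one test TCE,
    e.g. the outputs of the 10 random forest models."""
    return sum(model_probs) / len(model_probs)

def committee_classify(model_probs, threshold=0.5):
    """Binary committee prediction: 1 = response (<50 copies/ml), 0 = failure."""
    return 1 if committee_predict(model_probs) >= threshold else 0
```

Averaging before thresholding lets the committee smooth out the idiosyncrasies of any single model, which is why the CAP figures in Table 2 sit near the top of the individual models' ranges.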
Performance of the models for regimens including etravirine, the newest drug to be included in the RDI's modeling, was evaluated separately, because of the relatively small number of TCEs available with this drug, in order to check for acceptable performance.
The random forest models were compared with genotypic sensitivity scores (GSSs) derived using three interpretation systems in common use (Stanford HIVDB 6.0.10, REGA V8.0.2 and ANRS V2010.07, accessed via the Stanford Web site on 3 February 2011) in terms of the accuracy of their predictions for the 50 test TCEs. The full list of mutations derived from population-based sequencing, rather than the subset used for the development of the computational models, was used to obtain these scores. In each case, the GSS for each regimen was derived by adding the score for each constituent drug and using the total score for the regimen as a predictor of response. The basic version of each system (which classified the virus as being sensitive, intermediate or resistant) was used, with sensitive scored as 1, intermediate as 0.5 and resistant as 0. In addition, the expanded versions of HIVDB with five categories and REGA with six were used, with HIVDB scoring from 0 to 1 in 0.25 intervals and REGA categories from 0 to 1.5.
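The summation of per-drug scores into a regimen-level GSS described above is straightforward to sketch. The per-drug calls in the example are invented for illustration; they are not output from HIVDB, REGA or ANRS:

```python
# Basic three-category scoring, as described in the text.
BASIC_SCORE = {"sensitive": 1.0, "intermediate": 0.5, "resistant": 0.0}

def regimen_gss(per_drug_calls):
    """Sum the per-drug susceptibility scores to give the regimen-level GSS.

    per_drug_calls: mapping of drug name -> interpretation-system call."""
    return sum(BASIC_SCORE[call] for call in per_drug_calls.values())

# Hypothetical example: a three-drug regimen with mixed calls.
calls = {"tenofovir DF": "sensitive",
         "lamivudine/emtricitabine": "intermediate",
         "efavirenz": "resistant"}
```

For this invented regimen the GSS would be 1.0 + 0.5 + 0.0 = 1.5 out of a possible 3.0, and it is this total, not the per-drug calls, that was used as the predictor of response.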