Key challenges for delivering clinical impact with artificial intelligence
Abstract
Background Artificial intelligence (AI) research in healthcare is accelerating rapidly with potential applications being demonstrated across many different domains of medicine. However, there are currently limited examples of such techniques being successfully deployed into clinical practice. This article explores the main challenges and limitations of AI in healthcare, and considers steps required to translate these potentially transformative technologies from research to clinical practice.
Main body Key challenges for the translation of AI systems in healthcare include those intrinsic to the science of machine learning, logistical difficulties in implementation, and consideration of barriers to adoption, as well as the necessary sociocultural or pathway changes. Robust peer-reviewed clinical evaluation as part of randomised controlled trials should be viewed as the gold standard for evidence generation, but conducting these in practice may not always be appropriate or feasible. Performance metrics should aim to capture real clinical applicability and be understandable to intended users. Regulation that balances the pace of innovation with the potential for harm, alongside thoughtful postmarket surveillance, is required to ensure that patients are not exposed to dangerous interventions nor deprived of access to beneficial innovations. Mechanisms to enable direct comparisons of AI systems must be developed, including the use of independent, local and representative test sets. Developers of AI algorithms must be vigilant to potential dangers, including dataset shift, accidental fitting of confounders, unintended discriminatory bias, the challenges of generalisation to new populations, and unintended negative consequences of new algorithms on health outcomes.
Conclusions The safe and timely translation of AI research into clinically validated tools that can benefit everyone is challenging. Further work is required to continue developing robust clinical evaluation and regulatory frameworks using metrics that are intuitive to clinicians, identifying themes of algorithmic bias and unfairness while developing mitigations to address these, reducing brittleness and improving generalisability, and developing methods for improved interpretability of machine learning models. If these goals can be achieved, the benefits for patients are likely to be transformational.