Oral Presentation ANZBMS-MEPSA-ANZORS 2022

Artificial Intelligence Based Diagnostic Software for Atypical Femur Fractures (#88)

Hanh H Nguyen 1, Duy T Le 1, Cat Shore-Lorenti 1, Hengcan Shi 2, Roger Zebaze 1, Frances Milat 3, Shoshana Sztal-Mazer 4, Vivian Grill 5, Roderick Clifton-Bligh 6, Jianfei Cai 2, Peter R Ebeling 1
  1. Department of Medicine, Monash University, Clayton, Victoria, Australia
  2. Department of Information Technology, Monash University, Clayton, Victoria, Australia
  3. Department of Endocrinology, Monash Health, Clayton, Victoria, Australia
  4. Department of Endocrinology and Diabetes, Alfred Health, Melbourne, Victoria, Australia
  5. Department of Endocrinology and Diabetes, Western Health, Footscray, Victoria, Australia
  6. Department of Endocrinology, Royal North Shore Hospital, Sydney, New South Wales, Australia

Background

Despite well-defined criteria for the radiographic diagnosis of atypical femur fractures (AFFs) [1], misdiagnosis is common. AFF diagnostic software could enable timely AFF detection, improving management and helping to prevent progression of incomplete or contralateral AFFs.

 

Objective

To develop a semi-supervised artificial intelligence (AI)-based application that uses deep learning models (DLMs) to diagnose AFFs from femur X-rays.

 

Methods

Pre-operative complete AFF (cAFF), incomplete AFF (iAFF), typical femoral shaft fracture (TFF), and non-fractured femoral (NFF) X-ray images in anterior-posterior view were used. AFFs were defined as per the 2014 ASBMR case definition [1]. Fractures were labelled using bounding boxes in Conda. All images were used to train and test the model using a 5-fold cross-validation approach. Convolutional neural networks (CNNs) were trained to identify AFF diagnostic features. The DLMs were built using a ResNet backbone pretrained on the ImageNet dataset, combined with the proposed Box Attention Guide (BAG) module. The model’s attention beta was visualised. Precision (the proportion of predictions for a category that were correct), recall (the proportion of true cases in a category that were detected), and F1 score (the harmonic mean of precision and recall, summarising overall prediction performance) were measured.
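The three metrics named above can be computed per category from paired true and predicted labels. The following is a minimal sketch in plain Python, assuming a one-vs-rest count for each of the four categories; the function name `per_class_metrics` and the toy labels are hypothetical and are not taken from the study or its software.

```python
from collections import Counter

CLASSES = ["cAFF", "iAFF", "TFF", "NFF"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} from one-vs-rest counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    metrics = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


# Toy labels invented purely for illustration (not study data).
y_true = ["cAFF", "cAFF", "iAFF", "iAFF", "TFF", "TFF", "NFF", "NFF"]
y_pred = ["cAFF", "cAFF", "iAFF", "TFF", "TFF", "TFF", "NFF", "NFF"]

results = per_class_metrics(y_true, y_pred)
for name, (p, r, f1) in results.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy example one iAFF image is misread as a TFF, which lowers recall for iAFF and precision for TFF while leaving the other two categories unaffected.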

 

Results

The dataset included 2015 radiographs from 1014 patients. The numbers of cAFF, iAFF, TFF and NFF radiograph labels were 213, 49, 394 and 1359, respectively. The model achieved high precision, recall and F1-score for classifying cAFF X-rays (96%, 94%, and 95%, respectively), while iAFFs were detected with 86% precision, 82% recall and an F1-score of 83%. High precision, recall and F1-scores were also achieved for classifying TFFs (96%, 97%, 97%, respectively) and NFFs (99%, 99%, 99%, respectively).
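To give a sense of what these class counts imply for 5-fold cross-validation, the sketch below assigns images to folds class by class. It assumes stratified folds, which the abstract does not state, and it ignores any grouping of images by patient; the function name `stratified_folds` and the random seed are hypothetical.

```python
import random

# Radiograph label counts as reported in the Results.
COUNTS = {"cAFF": 213, "iAFF": 49, "TFF": 394, "NFF": 1359}
K = 5


def stratified_folds(counts, k=K, seed=0):
    """Assign each (label, index) pair to one of k folds, class by class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label, n in counts.items():
        items = [(label, i) for i in range(n)]
        rng.shuffle(items)
        # Deal shuffled items round-robin so each class is spread evenly.
        for position, item in enumerate(items):
            folds[position % k].append(item)
    return folds


folds = stratified_folds(COUNTS)
for fold_id, fold in enumerate(folds):
    per_class = {c: sum(1 for label, _ in fold if label == c) for c in COUNTS}
    print(f"fold {fold_id}: n={len(fold)} {per_class}")
```

Under this assumption each held-out fold contains only 9 or 10 iAFF images, so per-fold estimates for that category rest on few examples.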

 

Conclusion 

A DLM trained on femoral X-rays was able to classify cAFF, TFF, and NFF X-rays with high precision and recall. Accurate AI-based AFF diagnostic software has the potential to improve AFF diagnosis, reduce radiologist error, and allow urgent intervention, thus improving patient outcomes. Further research to validate this model in a larger, well-phenotyped dataset is underway.

  1. Shane E, Burr D, et al. Atypical subtrochanteric and diaphyseal femoral fractures: second report of a task force of the American Society for Bone and Mineral Research. J Bone Miner Res. 2014;29(1):1-23.