Reliability of the AOSpine Classification System in Children
Grant Recipient: Daniel Hedequist, MD
- Boston Children's Hospital
- Presentations & Publications:
- Further Funding:
- Additional Information:
- Methods: Twenty-seven patients with operatively or non-operatively managed subaxial cervical spine fractures, defined as a fracture involving one or more vertebrae between C3 and C7, were identified from 2000 to February 2020. Inclusion criteria were age under 18 years and availability of computed tomography (CT) and magnetic resonance imaging (MRI) at the time of injury. Patients with significant co-morbidities were excluded. Each case was reviewed by a single senior surgeon to determine eligibility. Educational videos, schematics describing the AO Subaxial Cervical Spine Injury Classification, and the imaging from the 27 cases were sent to 10 experienced orthopedic surgeons. The surgeons first classified each case based on morphology (Type A: Compression Injuries, Type B: Tension Band Injuries, Type C: Translation Injuries, and Type F: Facet Injuries). Subcategories allow for further classification within Types A and B. Inter- and intra-observer reproducibility were assessed across the 10 raters using Fleiss’s kappa coefficient (kF) with 95% confidence intervals (CI) for primary classifications and Krippendorff’s alpha (αK) for sub-classifications. Concordance of each rater with the senior surgeon’s gold standard ratings was assessed using the S statistic with 95% CI. The S statistic is not affected by the paradoxes of kappa. Agreement statistics were interpreted according to Landis and Koch (1977): 0-0.2, slight; 0.2-0.4, fair; 0.4-0.6, moderate; 0.6-0.8, substantial; and >0.8, almost perfect agreement.
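As a rough illustration of the agreement statistics named in the Methods, the sketch below computes Fleiss’s kappa for several raters and an S statistic for one rater against a reference. It assumes the S statistic takes the Bennett form S = (q·Po − 1)/(q − 1) for q categories, which the summary does not spell out. The function names and the toy ratings are hypothetical and are not study data.

```python
from collections import Counter

# Primary morphology types named in the Methods.
CATEGORIES = ("A", "B", "C", "F")


def fleiss_kappa(ratings):
    """Fleiss's kappa for a list of cases, each a list with one label per rater."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number of ratings")

    totals = Counter()          # how often each category was used overall
    per_case_agreement = []     # share of agreeing rater pairs within each case
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_case_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    p_observed = sum(per_case_agreement) / n_cases
    p_expected = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


def s_statistic(rater, reference, n_categories=len(CATEGORIES)):
    """Assumed Bennett-style S: chance agreement is fixed at 1 / n_categories."""
    p_observed = sum(a == b for a, b in zip(rater, reference)) / len(reference)
    return (n_categories * p_observed - 1) / (n_categories - 1)


# Toy data only: 4 cases rated by 3 hypothetical raters.
toy_ratings = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["A", "A", "B"],
    ["C", "C", "C"],
]
toy_kappa = fleiss_kappa(toy_ratings)

# Toy data only: one rater compared with a reference on 4 cases.
toy_s = s_statistic(["A", "B", "C", "F"], ["A", "B", "C", "A"])
```

Because the assumed S statistic fixes chance agreement at 1/q rather than estimating it from how often each category was used, it does not swing with skewed category prevalence the way kappa can.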
Results: Inter-rater agreement was moderate (kF, 0.41 to 0.50) for primary classifications (A, B, C, and F) across all raters and only fair (αK, 0.23 to 0.25) for sub-classifications. Intra-rater kappa coefficients for primary classifications ranged from 0.49 to 0.83, with 0.63 over all raters. Krippendorff’s alpha for intra-rater reliability over all sub-classifications ranged from 0.31 to 0.64, with 0.48 over all raters. When each rater was compared with the senior surgeon’s gold standard ratings, concordance was mostly moderate for primary classifications (0.38 to 0.61) and mostly substantial for sub-classifications (0.52 to 0.79).
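The Landis and Koch (1977) bands quoted in the Methods can be applied mechanically to the reported coefficients. The sketch below is one way to do that. Because the quoted ranges share their endpoints, it assumes each upper bound belongs to the lower band (so 0.40 reads as fair and 0.60 as moderate). The function name is hypothetical, and the "poor" label for negative values comes from the original Landis and Koch scale rather than from this summary.

```python
def landis_koch(value):
    """Map an agreement coefficient to a Landis and Koch (1977) label."""
    if value < 0:
        return "poor"            # from the original scale, not quoted in the summary
    if value <= 0.2:
        return "slight"
    if value <= 0.4:
        return "fair"
    if value <= 0.6:
        return "moderate"
    if value <= 0.8:
        return "substantial"
    return "almost perfect"


# Coefficients reported in the Results.
reported = {
    "inter-rater, primary (low)": 0.41,
    "inter-rater, primary (high)": 0.50,
    "inter-rater, sub-classifications (low)": 0.23,
    "inter-rater, sub-classifications (high)": 0.25,
    "intra-rater, primary (overall)": 0.63,
    "intra-rater, sub-classifications (overall)": 0.48,
}
labels = {name: landis_koch(value) for name, value in reported.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```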