Markus Reuber MD PhD, Academic Neurology Unit, University of Sheffield, Royal
Hallamshire Hospital, Glossop Road, Sheffield, S10 2JF
My Editor’s Choice from the current volume of Seizure is a retrospective observational cohort study by Colin B. Josephson et al. exploring clinical and socio-economic features that affect the mortality risk of patients with late onset epilepsy (1). Based on the records of over 1,000 older adults (>65 years) thought to have epilepsy and over 10,000 controls, identified from over 1 million primary care patients, the study describes ten feature clusters associated with different outcomes. Cases with epilepsy were selected from the primary care database using a method initially developed for the Secure Anonymised Information Linkage (SAIL) databank (Wales, UK), which had previously been shown to identify people with epilepsy with a sensitivity of 88% and a specificity of 98% (2). The study’s findings were derived from the linkage of primary care data and electronic hospital episode statistics (HES). Additional information about cause of death was obtained from linked data held by the United Kingdom’s Office for National Statistics (ONS). An unsupervised machine learning approach was used to characterise the clusters. While the hazard ratio (HR) of premature death was elevated to 1.7 (95% CI 1.5-2.0) across all individuals with late onset epilepsy, the risk was found to be much higher in some of the clusters, including those named ‘dementia and anxiety’ (HR 5.4; 95% CI 3.3-8.7), ‘brain tumour’ (HR 5.0; 95% CI 2.9-8.6), ‘intracranial haemorrhage (ICH) and alcohol misuse’ (HR 2.9; 95% CI 1.8-4.8), and ‘ischaemic stroke’ (HR 2.83; 95% CI 1.8-4.0). Seizure-related causes of death were uncommon and restricted to the ICH, ‘ICH and alcohol misuse’, and ‘healthy female’ clusters.
This study is a good example of a research approach which has been used increasingly over recent years, facilitated by linkable electronic databases and advances in machine learning. This ‘big data’ approach has given rise to a debate reflected in two other publications in the current volume of Seizure: an editorial by Randi von Wrede et al. (3) and a response by Julie W. Dreier et al. (4). Von Wrede et al. make the point that ‘big data’ studies often draw broad conclusions without taking sufficient account of intra-individual variability. They state that some ‘big data’ studies in the field of epilepsy fail to differentiate between epilepsy as a cause, consequence or association of comorbidities or other relevant pathological findings. Furthermore, they highlight the risk of simplistic headlines and secondary reports promoting misunderstandings of epilepsy, even if the limitations of the original work had been discussed by the authors in the initial publication (3). In contrast, Dreier et al. point to the great potential and achievements of ‘big data’ studies. They refer to the ready availability of data, which ensures cost-effective use of limited research funds, as well as the large size of the cohorts, which allows for the inclusion of millions of subjects and the exploration of rare exposures or outcomes. As further strengths of the ‘big data’ approach, they list the reduction of selection bias, the availability of longitudinal data enabling researchers to investigate long-term effects, and the elimination of recall bias through prospective data collection. They remind readers that the confirmation of the teratogenic effects of valproate was an important discovery based on ‘big data’ studies (4).
My Editor’s Choice gives readers an opportunity to make up their own minds: Are the feature clusters identifying some patients as being at particularly high risk of early death clinically useful? Could they encourage clinicians to focus their attention on patient groups at particular risk, such as ‘healthy females’? Or are the ‘big data’ diagnoses so often incorrect and the features so vague that the analysis by Josephson et al. is likely to promote pointless anxiety and a waste of resources?
References
(1) Josephson CB, Gonzalez-Izquierdo A, Engbers JDT, Denaxas S, Delgado-Garcia G, Sajobi TT, Wang M, Keezer MR, Wiebe S. Association of comorbid-socioeconomic clusters with mortality in late onset epilepsy derived through unsupervised machine learning. Seizure 2023;111:58-67.
(2) Fonferko-Shadrach B, Lacey AS, White CP, Powell HWR, Sawhney IMS, Lyons RA, et al. Validating epilepsy diagnoses in routinely collected data. Seizure 2017;52:195–8.
(3) von Wrede R, Witt JA, Helmstaedter C. Big Data - Big Trouble: The two faces of publishing results from big data studies based on cohorts with poor clinical definition. Seizure 2023;111:21-22.
(4) Dreier JW, Bjørk M-H, Alvestad S, Gissler M, Igland J, Leinonen MK, Sun Y, Zoega H, Cohen JM, Furu K, Tomson T, Christensen J. Why Big Data Carries Big Potential Rather Than Big Trouble. Seizure 2023;111:106-108.