# Circadian Rhythms and Logistic Regression

Karoly et al. suggest that we can improve the prediction of epileptic seizures by incorporating the personalized cyclic pattern of seizures across an individual’s day. While this circadian pattern of seizure likelihood varies from patient to patient, an individual’s own pattern remains stable over many years. Karoly et al. therefore argue that patient-specific forecasting, which reports a continuous probability of seizure vulnerability, could improve on the traditionally binary view of prediction (i.e., a prediction is either correct or incorrect). Just as a probability of rain can be more helpful than a flat, possibly incorrect statement about the weather, the probability of a seizure on a given day can help an individual prepare for or prevent seizures. While logistic regression is not a new technique, including circadian profiles in the prediction mechanism preventatively helped the ten subjects in the study halve their monthly seizure average.

Figure 1. Line graph of the estimated probability density of seizures over the day after training on data from 200 days, showing the circadian pattern of seizures.

Previously, one reason for the poor generalization and accuracy of predictions was the short duration of the available datasets. The authors worked with a larger set of data than Cook et al., who had previously attempted to demonstrate a similar conclusion. Karoly et al.’s subjects had an average of 38 ‘lead seizures’ in the 100-day training phase and an average of 116 seizures in the remaining evaluation period. So despite there being only 10 subjects, enough data existed to estimate a probability of seizure for each subject.

The goal of logistic regression is to find the best biologically reasonable model to describe the relationship between a dependent variable and a set of independent explanatory variables that meet a threshold of significance. It does this by estimating probabilities using a logistic function and assuming a standard logistic distribution of errors. Traditionally, the dependent variable follows a Bernoulli distribution, and that holds here as well: given some past evidence, the algorithm either does or does not predict a seizure for a patient. The output of logistic regression, however, is the probability of each outcome rather than only a yes-or-no answer.

Figure 2. Logistic function (Wikipedia, by Krishnavedala), presented as an example of a probability function. Note the nonlinearity of this function, which was used to model a subject’s probability of seizure.

In this study, to represent the continuous nature of seizure probability, the authors generated a probability density function describing the circadian profile of each subject. To generate it, the researchers began with a uniform distribution (a straight line with the same probability at every time step) and updated the distribution with every new seizure occurrence so that it described the specific subject’s circadian profile more accurately. Given this continually updated probability density function, logistic regression was then able to provide the probability of seizure.

Logistic regression is inherently binary in the sense that there are only two outcomes: seizure or no seizure. Including extra information when generating the logistic regression probability is undoubtedly an improvement, but the real reason logistic regression works here is that we are trying to model the probability of one of two outcomes, and a nonlinear logistic function, whose output always lies between 0 and 1, describes that relationship best. A linear regression model would instead describe a range of potential outcomes, which could be useful since there are varying levels of seizure to be differentiated, but a linear function does not describe the probability relationship nearly as well. I am personally curious to see the data in a linear regression model, however, since this would have helped show the degree of improvement that logistic regression makes.

Figure 3. Line graph showing how the various forecasting models compare to the actual probability for each subject.
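To make the two steps described above more concrete (updating a circadian profile from a uniform start, then passing it through a logistic function), here is a minimal Python sketch. It is not the authors’ implementation: the hourly histogram with one pseudo-count per bin, the log-density feature, and the `intercept` and `weight` values are all assumptions chosen for illustration, and the seizure hours are made up.

```python
import math

HOURS = 24  # assumed resolution: one bin per hour of the day


def update_profile(counts, seizure_hour):
    """Return a copy of the hourly counts with one more seizure recorded."""
    updated = list(counts)
    updated[seizure_hour % HOURS] += 1.0
    return updated


def profile_density(counts):
    """Normalize the counts so they sum to 1 (a discrete density over the day)."""
    total = sum(counts)
    return [c / total for c in counts]


def logistic(z):
    """Standard logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def seizure_probability(density, hour, intercept, weight):
    """Hypothetical model: the feature is the log of the circadian density
    relative to a uniform profile, so a flat profile contributes zero."""
    feature = math.log(density[hour] * HOURS)
    return logistic(intercept + weight * feature)


# Start from a uniform profile: one pseudo-count in every hourly bin.
counts = [1.0] * HOURS

# Made-up seizure times (hour of day), clustered around 3 a.m.
for hour in [2, 3, 3, 4, 3, 22]:
    counts = update_profile(counts, hour)

density = profile_density(counts)

# Illustrative coefficients only; in practice they would be fitted to data.
p_peak = seizure_probability(density, 3, intercept=-3.0, weight=1.0)
p_quiet = seizure_probability(density, 12, intercept=-3.0, weight=1.0)

print(f"P(seizure) near the 3 a.m. peak: {p_peak:.3f}")
print(f"P(seizure) at a quiet hour:     {p_quiet:.3f}")
```

With these made-up inputs, the hour where seizures cluster receives a higher probability than an hour with no recorded seizures, which is the behavior the circadian profile is meant to add to the forecast.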

In this particular study, Karoly et al. used three different prediction models in order to show the improvement that circadian-informed logistic regression made over logistic regression alone or circadian prediction alone. To compare the probabilistic forecasts, Karoly et al. suggest using the Brier score. The Brier score measures the difference between a long sequence of consecutive forecasts and the observed occurrence of seizures, without adding in “tunable parameters” such as the true and false positive rates of the predictions. Such tunable parameters would make the evaluation vulnerable to overfitting to a given data set. Thus, the authors explain, the Brier score can evaluate the accuracy of a prediction without relying on binary true/false positive ratings. The lower the Brier score is for a set of predictions, the better the predictions are calibrated; however, the Brier score is traditionally applicable to tasks in which predictions assign probabilities to mutually exclusive outcomes.
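As a rough illustration of the comparison described above, the following Python sketch computes the Brier score in its standard form: the mean squared difference between each forecast probability and the observed outcome (1 for a seizure, 0 for none). The exact variant used in the paper may differ, and the forecasts and outcomes here are invented for the example.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Invented data: 1 means a seizure occurred in that period, 0 means none.
outcomes = [0, 0, 1, 0, 1]

# A forecaster whose probabilities track the outcomes fairly well...
informed = [0.1, 0.2, 0.8, 0.1, 0.7]
# ...and one that always reports the overall seizure rate.
constant = [0.4, 0.4, 0.4, 0.4, 0.4]

informed_score = brier_score(informed, outcomes)
constant_score = brier_score(constant, outcomes)

print(f"informed forecaster: {informed_score:.3f}")
print(f"constant forecaster: {constant_score:.3f}")
```

The informed forecaster earns the lower (better) score, and no decision threshold had to be chosen to compare the two, which is the property the authors rely on.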

In Figure 3, we see that the combined circadian logistic regression, shown in the key as the blue line, consistently scored better than the other two prediction mechanisms. Although circadian logistic regression still fails to provide consistently accurate predictions, comparing the models by Brier score shows that it produces the more accurate forecasts of the three. I also appreciated that the paper went on to specify the various factors that might have influenced subject anomalies. Given the small number of subjects, such anomalies play an even larger part in creating confounding factors.

Thus, Karoly et al. show that circadian-informed logistic regression consistently achieved a lower Brier score and more accurate results. According to the paper, the only performance benchmark thus far for similarly long-term seizure prediction comes from a 2016 Kaggle* competition that made use of canine and human seizure data. While the results of this study are inconclusive, circadian patterns are clearly shown to lift results above those of previous prediction mechanisms, into the same range as “state-of-the-art” machine learning algorithms today.

*Kaggle is an online platform that hosts data science and machine learning competitions.