What it is:
More people use multi-sensor health tracking devices than ever before. In combination with mobile-based apps, these wearables and “nearables” (used near the body but not worn, such as under-bed sleep trackers) measure various physiologic parameters and give us insight into how well we perform in different arenas. Devices that were once simple pedometers now paint a picture of sleep architecture, track menstrual cycles, and even present a daily readiness or resilience score that many people consider as they approach their days.
Daily data can help people meet personalized fitness goals and adhere to healthier lifestyles. However, trackers don’t benefit everyone, and it’s essential to consider their limitations and check what fits your particular needs, especially when it comes to tracking sleep.
The purported claims:
Top-rated devices claim to capture time spent asleep versus awake and time spent in different sleep stages, and they often assign a numerical sleep score to the night.
Devices use movement, heart rate, and body temperature data to build a picture of how we rest across the night, which is extrapolated to estimate various sleep stages.
Modern devices are good at tracking sleep generally and somewhat effective at capturing sleep stages. However, accuracy rapidly declines during a poor night of sleep.
What the science says:
Early Beginnings
Long before the sleek and aesthetically pleasing devices that we see on the market today, there were analog movement transducers designed to be worn on the waist or hip by early research participants and patients. The first documented use of these devices was in 1972 by Yale graduate student David Kupfer, who had eight patients wear these rudimentary trackers while he simultaneously recorded brain activity using electroencephalography (EEG). He found that the movement traces aligned well with the EEG scores of sleep and wake episodes, which opened the door to a new way of measuring sleep from the wrist without the cumbersome setup of a traditional sleep study.
The mechanics of these early devices were very straightforward. They consisted of a small ball held within a metal tube. When the participant moved, the ball moved within the tube, changing the device's voltage output.
The next advancement came in the 1970s, when the MediLog system used an off-center nut soldered to one end of an EEG pen wire, with the other end attached to a piezoelectric element. "Piezoelectricity" refers to the electric charge that builds up in some solid materials, such as crystals and ceramics, in response to applied pressure. When mechanical stress is applied to one of these elements (either directly or as acceleration from movement), an electrode can measure the resulting change in voltage and provide an output that can be quantified.
The Algorithms
All of the devices on the market that are associated with a proprietary mobile app use an algorithm to predict sleep episodes. Many of these are black-box algorithms that aren’t revealed to consumers or even researchers who use these devices in scientific studies. These tend to be some variation of the Cole-Kripke algorithm, refined in 1992. This algorithm predicts whether a given time period of about 30 seconds (typically referred to as an “epoch” by sleep researchers) should be characterized as sleep or wake by looking at what happened in the four epochs preceding and two epochs succeeding the epoch in question. If most of the epochs surrounding the epoch in question are sleep, then there is a greater likelihood that the epoch in question is sleep. The same logic applies to wake. Because of this, devices are very good at predicting that the time between the start and end of a night of sleep is sleep, but they are not particularly good at catching small awakenings because of the inherent bias toward predicting sleep.
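To make the windowing idea concrete, here is a minimal sketch in Python. It is not the published Cole-Kripke algorithm or any manufacturer's method: the weights, the threshold, and the function name score_epochs are all hypothetical and chosen only for illustration. Each epoch's label depends on a weighted sum of movement in the four epochs before it, the epoch itself, and the two epochs after it.

```python
from typing import List

# Hypothetical weights for illustration only: four epochs before, the current
# epoch, and two epochs after. These are NOT the published Cole-Kripke values.
WEIGHTS = [0.05, 0.10, 0.15, 0.20, 0.30, 0.12, 0.08]
BEFORE, AFTER = 4, 2
THRESHOLD = 1.0  # hypothetical cut-off: weighted movement below this is scored as sleep


def score_epochs(activity: List[float]) -> List[str]:
    """Label each epoch 'sleep' or 'wake' from per-epoch movement counts."""
    labels = []
    n = len(activity)
    for i in range(n):
        total = 0.0
        for offset, weight in zip(range(-BEFORE, AFTER + 1), WEIGHTS):
            j = i + offset
            # Epochs outside the recording are treated as zero movement.
            if 0 <= j < n:
                total += weight * activity[j]
        labels.append("sleep" if total < THRESHOLD else "wake")
    return labels


# One brief burst of movement surrounded by stillness.
brief_awakening = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0]
print(score_epochs(brief_awakening))
```

With these made-up numbers, the single burst of movement is still labeled as sleep because the quiet epochs around it pull the weighted sum below the threshold. That is the bias toward sleep described above, and it is why short awakenings are easy to miss.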
This is where having multiple sensors can help to capture micro awakenings, such as those experienced by individuals with obstructive sleep apnea. During a respiratory event, when breathing is either very shallow or temporarily stops altogether, oxygen saturation drops, and heart rate increases. Using sensors that capture this data can offer additional input to correctly build a picture of sleep-wake behavior across the night. However, it is important to note that photoplethysmography (PPG), which captures heart rate by measuring blood flow in capillaries at the wrist or finger, is less effective on darker skin tones, on tattooed skin, and in individuals with obesity.
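As a rough illustration of how extra sensor data could add information, the sketch below flags epochs where heart rate rises and oxygen saturation falls compared with the previous epoch. The function name flag_possible_arousals and both thresholds are hypothetical, chosen only for this example. They are not clinical criteria and are not taken from any device.

```python
from typing import List


def flag_possible_arousals(
    heart_rate: List[float],
    spo2: List[float],
    hr_rise: float = 10.0,   # hypothetical threshold, beats per minute
    spo2_drop: float = 3.0,  # hypothetical threshold, percentage points
) -> List[int]:
    """Return indices of epochs where heart rate rose and SpO2 fell
    relative to the previous epoch by at least the given thresholds."""
    if len(heart_rate) != len(spo2):
        raise ValueError("heart_rate and spo2 must cover the same epochs")
    flagged = []
    for i in range(1, len(heart_rate)):
        hr_change = heart_rate[i] - heart_rate[i - 1]
        spo2_change = spo2[i - 1] - spo2[i]
        if hr_change >= hr_rise and spo2_change >= spo2_drop:
            flagged.append(i)
    return flagged


# Made-up values: the third epoch shows a heart rate jump and an oxygen dip.
print(flag_possible_arousals([60, 60, 75, 62], [97, 97, 92, 96]))
```

A movement-only algorithm would have no way to see an event like this if the sleeper stayed still, which is the advantage of combining signals.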
Evaluating Performance
Scientific studies have compared commercially available sleep trackers to the gold standard, lab-based polysomnography (PSG), rating them in terms of sensitivity, specificity, and accuracy.
Sensitivity is the percentage of true sleep epochs correctly scored as sleep by the device.
Specificity is the percentage of true wake epochs correctly scored as awake by the device.
Accuracy is the percentage of epochs where the device score matches the output from the PSG recording.
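The three definitions above can be sketched as a short Python function that compares a device's epoch labels with PSG labels. The function name agreement_metrics and the example labels are made up for illustration and do not come from any study.

```python
from typing import Dict, List


def agreement_metrics(device: List[str], psg: List[str]) -> Dict[str, float]:
    """Compare per-epoch 'sleep'/'wake' labels from a device against PSG.

    Returns sensitivity, specificity, and accuracy as percentages.
    """
    if len(device) != len(psg):
        raise ValueError("device and psg must cover the same epochs")

    pairs = list(zip(device, psg))
    psg_sleep = sum(1 for _, p in pairs if p == "sleep")
    psg_wake = sum(1 for _, p in pairs if p == "wake")
    sleep_hits = sum(1 for d, p in pairs if p == "sleep" and d == "sleep")
    wake_hits = sum(1 for d, p in pairs if p == "wake" and d == "wake")
    matches = sum(1 for d, p in pairs if d == p)

    def pct(numerator: int, denominator: int) -> float:
        # Undefined when PSG scored no epochs of that kind.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": pct(sleep_hits, psg_sleep),
        "specificity": pct(wake_hits, psg_wake),
        "accuracy": pct(matches, len(pairs)),
    }


# Made-up night: PSG scores 8 sleep epochs and 2 wake epochs,
# and the device misses one of the two wake epochs.
psg_labels = ["sleep"] * 8 + ["wake"] * 2
device_labels = ["sleep"] * 9 + ["wake"] * 1
print(agreement_metrics(device_labels, psg_labels))
```

In this invented example the device scores 100% sensitivity and 90% accuracy but only 50% specificity, which shows how a high accuracy figure can hide missed awakenings when most of the night is sleep.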
Sensitivity is often high due to the algorithms favoring sleep predictions within a sleep episode. Specificity tends to be the lowest score because sleep trackers can struggle to capture brief awakenings sandwiched between sleep. It's important to note that these algorithms, while designed to enhance accuracy, can sometimes lead to overestimating sleep and missing awakenings.
The accuracy score summarizes how closely a device agrees with PSG, which suggests how well it can measure sleep outside of the lab. However, even the in-lab gold-standard testing method (PSG) has limitations. This matters for insomnia in particular: the best way to diagnose insomnia is not, in fact, via PSG but via self-reports.
The biggest challenge for wearable devices is accurately scoring sleep stages (sometimes called “sleep architecture”). This is because the brain produces the primary signal for where we are in a sleep cycle (i.e., REM sleep or slow wave sleep), and that signal is captured using EEG. Unless the wearable device comes with electrodes that measure brain activity, it is important to bear in mind that everything else (movement, heart rate, body temperature, etc.) serves as a proxy for what might be happening in the brain during the night.
In a recent study comparing six commercially available devices to in-lab PSG, correct sleep architecture estimates hovered between 50 and 65%. In another study from 2021, it was shown that overall, the ability of devices to correctly capture awakenings during the night was poor, and their ability to correctly stage sleep was worse when the night of sleep was generally poor or disrupted. In another recent study, four popular devices were found to perform poorly at tracking daytime sleep episodes, such as naps. This may be problematic for people who routinely take naps, shift workers who often sleep during the day, or in cultures where afternoon siestas are common.
It's important to note that studies assessing sleep tracker performance are primarily conducted on young, healthy adults, so device performance is best understood among individuals who already have good sleep patterns. These devices, rightfully so, do not claim to diagnose sleep disorders. A 2021 study in the journal Sleep, led by Dr. Evan Chinoy, found that these devices tend to perform less accurately when sleep is disrupted. This implies that the data they generate may only be beneficial for those without any pre-existing sleep issues.
Therefore, it is best to speak to a health professional if you become concerned about your sleep score or whether you're getting enough deep sleep. The devices are great for giving you a general idea, but you must implement healthy sleep practices first and foremost.
The Bottom Line
Sleep is vital for all aspects of health, but it cannot be “optimized” or perfected. Sleep is a biological response that occurs after an extended period of wake and happens as a result of intrinsic cellular processes. In other words, our bodies know how to do it. We can prioritize sleep by giving ourselves enough time in bed, having an evening wind-down routine, and keeping lights dim at night, but there is no evidence to suggest that we can somehow perfect our sleep cycles. Our efforts are best placed by creating the circumstances in which sleep can occur and trusting our bodies to do what they need to during that time.
Mindset is incredibly powerful. It is for this reason that placebo pills are used in clinical trials alongside experimental drugs. Remember that how you feel is generally an excellent indicator of how well you will perform that day, and of the quality of your sleep the night before. Sleep or daily readiness scores generated by consumer tracker devices and their associated apps may match how you feel on a given day, but they may not. Trust in how you feel, and consider using devices mainly to capture information about general trends, such as how well you sleep when you travel or how sensitive you are to things like caffeine, meal timing, and alcohol intake.
Our take:
In many studies that use commercial sleep trackers to measure sleep during an intervention, feedback from the tracker itself can end up independently improving sleep. Particularly for those who previously gave little thought to their time in bed each night, using an app that provides daily reminders, personalized insights using real data, and resources such as simple tips and infographics can be a powerful way to improve sleep health. For some people, however, the feedback can be a stressor. Particularly if there is any existing apprehension about sleep quality to begin with, sleep and readiness scores can be a source of anxiety. This is referred to as “orthosomnia”.
Devices improve yearly, incorporating more sensors, better algorithms, and more aesthetically appealing hardware. For those striving towards specific health goals, trackers can be useful ways of complying with routines and schedules. They can also serve as motivators that support people on their journeys to better health. However, they should not be given more power than is appropriate.
Sleep changes over the lifespan. Deep sleep is abundant in childhood and decreases steadily across middle and older life. It is thought that all stages of sleep are essential and that the body sleeps as it needs to. Signs of impaired sleep can be seen in waking life: cognitive problems, frequent illness, inflammation, and mood symptoms, to name a few. How you feel about your sleep is generally accurate and worth trusting in.
Will this benefit me?
Wearable devices come in many flavors, and there are lots of factors to consider when choosing one:
Battery life - Do you want a device you can comfortably forget about charging for days, or are you happy to top up with a daily boost?
Performance - Has the device been studied in a rigorous research setting? How does it perform against others on the market?
Native apps/software - What do you want from your app? Are you purely data-driven, or do you also enjoy other features such as articles and mindfulness exercises?
Practicality - Rings must generally be removed for high-impact activities such as weight lifting, but many users enjoy that rings do not have a clock face or screen that emits light at night.
Cost - Do you prefer a one-time purchase of the device and app, or is a subscription service more suitable for you?
Still curious to try it? If you do, here’s what to keep an eye on:
Remember that clever marketing is used by manufacturers of all of the popular wearable trackers on the market today. Often, claims are overstated and give the impression that devices perform better than they actually do. The newest devices truly are leagues better than the earliest accelerometers because they incorporate multiple biometrics, but they still have limitations. This is particularly true regarding sleep, especially among those who sleep poorly. If you are considering a new tracker, consider what you need and leave the rest behind. If you have any concerns about your sleep, speak to your healthcare provider.
References & Additional Materials:
Past, present, future of multisensor trackers - https://pubmed.ncbi.nlm.nih.gov/34713186/
Six wearable devices vs PSG - https://pubmed.ncbi.nlm.nih.gov/36016077/
Seven wearable devices vs PSG - https://pubmed.ncbi.nlm.nih.gov/33378539/
Eleven wearables and nearables vs PSG - https://pubmed.ncbi.nlm.nih.gov/37917155/
Are devices good at tracking daytime sleep? - https://pubmed.ncbi.nlm.nih.gov/37032817/
Is it just really good marketing? - https://pubmed.ncbi.nlm.nih.gov/35445837/
Oura Gen 3 vs PSG - https://pubmed.ncbi.nlm.nih.gov/38382312/
A study of wearable trackers in healthy people - https://pubmed.ncbi.nlm.nih.gov/32043961/
Fitbit vs sleep diaries in cancer patients - https://pubmed.ncbi.nlm.nih.gov/37886444/
A review of sensor performance - https://pubmed.ncbi.nlm.nih.gov/34372308/
Fitbit Versa vs EEG - https://pubmed.ncbi.nlm.nih.gov/31499232/
PPG limitation - https://pubmed.ncbi.nlm.nih.gov/35003845/
Image created with BioRender.com