Heart Rate Accuracy During Daily Activity
Since we launched Labfront, the most frequently asked question has probably been “Is the quality of signals from the watch any good?” It is usually followed by “Under which circumstances can we reliably trust the signals?” We wanted answers to these questions too, so we ran a number of experiments to find them. In this blog post, we share our results in the hope of advancing your understanding of the strengths and limitations of wearable devices and of making Labfront more useful in your quest for solid scientific research.
Heart rate (HR) and heart rate variability (HRV) are related to mental and physical health. Wearable smart bands are an extremely convenient way to collect HR/HRV data and are thus becoming increasingly popular for measuring stress, gauging exercise intensity, detecting arrhythmia, and so on. Watches derive heart rate by measuring the changes in vascular blood flow during the cardiac cycle using a photoplethysmography (PPG) sensor. However, recent studies have suggested that HR measured by a PPG-based sensor is susceptible to confounding factors such as physical movement and upper arm muscle contractions. To investigate this further, we started a series of experiments to validate the accuracy of heart rate data measured by the smart band.
Two young, healthy male participants wore the Garmin Vivosmart 4 and Polar H10 for 24 hours on a regular workday. They manually recorded the time and duration of daily activities, including sleeping, (computer) typing, chatting, and walking.
We tested the accuracy of heart rate (HR) data obtained from the Garmin Vivosmart 4 wristband by comparing it with data from the Polar H10 HR strap (Polar Electro Oy). The Polar H10 is a chest-strap device that is widely considered the most accurate method for obtaining heart rate since it uses electrocardiographic (EKG) signals rather than PPG-based signals. In other words, HR is determined by directly measuring the electrical activation of the heart and not indirectly through blood flow changes in the wrist.
To test the differences in the HR data from the Vivosmart 4 and the Polar H10 during different daily activities, we extracted three separate, continuous 10-minute segments for each activity type from each participant.
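As a rough illustration of this step, the sketch below cuts a 10-minute window out of a timestamped HR series and pairs the two devices' readings. It is a minimal sketch rather than our actual pipeline: the function names, the `(timestamp_s, hr_bpm)` tuple layout, and the assumption that both series have already been resampled onto identical timestamps are all hypothetical.

```python
SEGMENT_SECONDS = 10 * 60  # one 10-minute segment


def extract_segment(samples, start_s, duration_s=SEGMENT_SECONDS):
    """Return the (timestamp_s, hr_bpm) samples inside [start_s, start_s + duration_s).

    `samples` is assumed to be a list of (timestamp_s, hr_bpm) tuples.
    """
    end_s = start_s + duration_s
    return [(t, hr) for t, hr in samples if start_s <= t < end_s]


def pair_by_timestamp(test_samples, reference_samples):
    """Pair readings that share a timestamp; unmatched samples are dropped.

    Assumes both series were resampled onto the same time grid beforehand.
    Returns a list of (test_hr, reference_hr) tuples.
    """
    reference_by_time = dict(reference_samples)
    return [
        (hr, reference_by_time[t])
        for t, hr in test_samples
        if t in reference_by_time
    ]
```

With paired readings in hand, each segment can be fed to whichever agreement metric is of interest.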
Figure 1 illustrates an individual sample of HR time series from the Vivosmart 4 and the Polar H10 during unrestricted, real-life scenarios. In general, the HR from the Vivosmart 4 and the Polar H10 matched well across all four activities, with most HR differences below 5 bpm. In particular, the two time series are nearly identical during sleep. However, when typing, chatting, or walking, the HR series from the Vivosmart 4 appears noisier than that from the Polar H10.
There are several established ways to objectively quantify the accuracy of a device against a “gold standard”. In this case, the device being evaluated is the PPG-based Garmin Vivosmart 4, and the “gold standard” is the Polar H10. The Bland-Altman plot is one method for evaluating the level of agreement between two devices. Suppose, for instance, that one morning your Vivosmart 4 shows a HR of 102 bpm while the Polar H10 shows a HR of 106 bpm. This HR difference of 4 bpm might be considered acceptable when the HR is in the 100s but not so much when the HR is in the 40s. For this reason, the HR difference is plotted against the average HR of the two measures, which in this case is 104 bpm.
Figures 2a-d show the Bland-Altman plots for the HR data shown in Figures 1a-d, respectively. The y-axis is the difference in HR obtained simultaneously by the two devices; the x-axis is the average of the HR values obtained from the Vivosmart 4 and the Polar H10. In each subplot, the solid line represents the mean of the HR differences, and the two dashed lines depict the mean plus and minus two standard deviations (SD). Assuming the differences are normally distributed, about 95% of the data would fall within the range demarcated by the two dashed lines. In other words, if your Garmin watch shows a HR of 81 bpm while talking (Figure 2c), you could be ~95% confident that the true HR (as determined by the Polar H10) is between 71 and 91 bpm, since 2 × SD was approximately 10 bpm (assuming the mean difference is close to zero).
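The quantities behind a Bland-Altman plot can be computed in a few lines. The following is a minimal sketch using only the Python standard library, not the code that produced Figure 2. The function name, the dictionary keys, and the sign convention (wrist device minus chest strap) are our assumptions for illustration.

```python
from statistics import mean, stdev


def bland_altman(test_hr, reference_hr, k=2.0):
    """Compute Bland-Altman points, the bias, and the limits of agreement.

    test_hr / reference_hr: equal-length sequences of simultaneous HR
    readings in bpm (e.g. wrist PPG vs. chest-strap EKG).
    k: number of standard deviations for the limits (2 in this post).
    """
    if len(test_hr) != len(reference_hr):
        raise ValueError("both series must have the same length")
    if len(test_hr) < 2:
        raise ValueError("need at least two paired readings")

    # Assumed sign convention: test device minus reference device.
    diffs = [t - r for t, r in zip(test_hr, reference_hr)]
    # x-axis of the plot: average of the two simultaneous readings.
    means = [(t + r) / 2 for t, r in zip(test_hr, reference_hr)]

    bias = mean(diffs)  # solid line in the plot
    sd = stdev(diffs)   # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - k * sd,  # lower dashed line
        "upper": bias + k * sd,  # upper dashed line
    }
```

For the example above (102 bpm on the wrist against 106 bpm on the chest strap), the corresponding point sits at x = 104 bpm and y = -4 bpm under this sign convention. Plotting `diffs` against `means` with horizontal lines at `bias`, `lower`, and `upper` reproduces the layout described for Figure 2.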
If you are interested purely in the size of the difference (regardless of which device measured the higher HR), you can calculate the mean absolute error (MAE). As shown in Figure 2a-d, the MAE was < 5 bpm in all 4 activities. But as noted previously, an MAE of 2 bpm matters less at an average HR of 140 bpm than at a lower HR of 35 bpm. To account for this, the absolute error can be “corrected” by dividing it by the actual HR measured by the Polar H10 and expressing the average as a percentage, which gives the mean absolute percentage error (MAPE). A MAPE below 10% is generally the criterion by which data are considered of sufficient quality [1-3], so by this criterion the Vivosmart 4 is sufficiently reliable for determining HR for general use.
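Both metrics are straightforward to compute from paired readings. The sketch below uses the standard per-sample definition of MAPE (each absolute error divided by the matching reference reading, then averaged); the function names are ours, and we are assuming this definition rather than quoting the exact code behind Table 1.

```python
def mae(test_hr, reference_hr):
    """Mean absolute error in bpm between paired HR readings."""
    if len(test_hr) != len(reference_hr) or not test_hr:
        raise ValueError("need two non-empty series of equal length")
    total = sum(abs(t - r) for t, r in zip(test_hr, reference_hr))
    return total / len(test_hr)


def mape(test_hr, reference_hr):
    """Mean absolute percentage error (%), relative to the reference HR."""
    if len(test_hr) != len(reference_hr) or not test_hr:
        raise ValueError("need two non-empty series of equal length")
    if any(r <= 0 for r in reference_hr):
        raise ValueError("reference HR values must be positive")
    total = sum(abs(t - r) / r for t, r in zip(test_hr, reference_hr))
    return 100.0 * total / len(test_hr)


# Same 2 bpm error at a high and at a low heart rate.
reference = [100.0, 40.0]
test = [102.0, 38.0]
print(f"MAE  = {mae(test, reference):.1f} bpm")
print(f"MAPE = {mape(test, reference):.1f} %")
```

In this toy example both readings are off by 2 bpm, so the MAE is 2 bpm, but the error is 2% of the reference at 100 bpm and 5% at 40 bpm, which is why the percentage error is the fairer comparison across activities with very different heart rates.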
The results are summarized in Table 1. For each type of activity, there are 6 sessions (3 sessions × 2 participants) of 10-minute HR data.
During sleep, both the MAE and MAPE values are quite low, suggesting that the HR from the Vivosmart 4 wristband is accurate during sleep. The MAE and MAPE values are greater during the other 3 activities, indicating that the Vivosmart 4 may not be as accurate during physical activity. Nevertheless, the MAPE values are less than 10% for all four activities, so the HR from the Vivosmart 4 may be considered reliable during these daily activities.
As a general rule, based on these data, we would be comfortable using a Garmin wristband to measure heart rate over the course of the day, although the PPG sensor does tend to give noisier HR during more physically active periods.
5. Future Validations
In the current study, we tested HR accuracy during four types of daily activity. The influence of other factors, such as watch snugness or motion artifacts (for example, from several activities happening at once or from sudden movements), still needs to be explored.
What other validation studies would you like to see? Tell us what you want to know at firstname.lastname@example.org
[1] B. W. Nelson and N. B. Allen, “Accuracy of Consumer Wearable Heart Rate Measurement During an Ecologically Valid 24-Hour Period: Intraindividual Validation Study,” JMIR mHealth uHealth, vol. 7, no. 3, p. e10828, Mar. 2019.
[2] B. D. Boudreaux et al., “Validity of wearable activity monitors during cycling and resistance exercise,” Medicine Sci. Sports Exercise, vol. 50, no. 3, pp. 624–633, 2018.
[3] H. W. Chow and C. C. Yang, “Accuracy of optical heart rate sensing technology in wearable fitness trackers for young and older adults: Validation and comparison study,” JMIR mHealth uHealth, vol. 8, no. 4, 2020.
Han-Ping is the senior research lead (and chief plant caretaker) at Labfront, specializing in physiological data analysis. An alumnus of the BIDMC/Harvard's Center for Dynamical Biomarkers, Han-Ping uses his PhD in electrophysics to help Labfront customers convert raw physiological data into health insights. He does his best Python coding while powered by arm massages from his spiky-tongued cat, Pi.
Francis is a research lead at Labfront, responsible for data validation and analysis. He is interested in applying physics and math to medical research.