How Raw Multimodal Signals Beat the Black-Box Sleep Score

For wearable sleep monitoring, one of the most consequential design decisions is rarely visible on a spec sheet: where the analysis actually runs. The same raw photoplethysmography (PPG) and accelerometry that a ring or watch collects overnight can be scored on the device itself, on a paired phone, or shipped to a server for heavier modeling. That single architectural choice, on-device versus off-device, shapes accuracy, transparency, battery life, privacy, and how quickly you can act on a signal. Here is how the tradeoffs break down, and how to decide which approach fits a given study or product.

Two pipelines, one stream of data

Every consumer sleep wearable starts the same way: optical and inertial sensors capture raw time-series signals, typically PPG, tri-axial accelerometry, beat-to-beat intervals (BBI), and sometimes skin temperature. What happens next splits into two patterns. In an on-device pipeline, a model runs locally and only derived outputs leave the device, such as sleep stages, a sleep score, or summary metrics. In an off-device pipeline, raw or near-raw signals are transmitted to a server where more powerful models do the scoring.

This distinction matters because, by default, the major consumer platforms keep the heavy lifting local and transmit only aggregated results. Garmin makes the split explicit: its Health API exposes aggregated daily metrics, while the Companion SDK streams raw time-series accelerometer, BBI, and respiration data.10 Oura and Apple follow the same broad pattern, surfacing derived stages and scores rather than the underlying waveforms.

On-device analysis: fast, private, constrained

Where it wins. Keeping computation on the device is privacy-preserving by architecture: raw biometric data never leaves the user’s hardware, which maps cleanly onto GDPR privacy-by-design expectations and supports a defensible HIPAA posture.13 It also enables real-time feedback with minimal latency, which is exactly what just-in-time adaptive interventions (JITAIs) and ecological momentary assessment (EMA) triggers depend on, and it keeps working when connectivity does not.14 And because the device transmits only compact outputs instead of continuous waveforms, it keeps radio power, the single largest drain in many designs, in check.8

Where it struggles. Compact wearables have limited compute, memory, and battery, which caps the complexity of any model that can run locally and in real time.1 The deeper problem for research is that vendor on-device algorithms are proprietary and frequently validated in small samples, so their measurement validity is often unclear.4 You inherit the vendor’s scoring choices with no ability to audit, reproduce, or improve them.7

Off-device analysis: powerful, transparent, hungry

Where it wins. Server-side computing removes the resource ceiling. Convolutional and U-Net-style networks, and self-supervised models trained on large cohorts, can map raw PPG and accelerometry to four-class hypnograms that approach the level of agreement between human scorers.3,6 Just as important for rigorous work, an off-device model is one you control: you can validate it epoch-by-epoch against polysomnography (PSG) using sensitivity, specificity, and Cohen’s kappa, publish it, and let others reproduce it.2 Open, data-driven models have outperformed hand-crafted feature baselines, and they can be retrained as cohorts and methods grow.4
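To make the validation step concrete, here is a minimal sketch of epoch-by-epoch agreement between a PSG reference and a model hypnogram. The stage labels ("W", "L", "D", "R"), the function names, and the toy sequences are assumptions for illustration, not taken from any cited study. Sensitivity and specificity are computed on the collapsed sleep-versus-wake problem, which is one common convention among several.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa for two equal-length sequences of epoch labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement from the marginal label frequencies of each rater.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both sequences use one identical label
    return (observed - expected) / (1.0 - expected)


def sleep_wake_metrics(reference, predicted, wake_label="W"):
    """Sensitivity (sleep scored as sleep) and specificity (wake scored as wake)."""
    tp = tn = fp = fn = 0
    for r, p in zip(reference, predicted):
        ref_sleep = r != wake_label
        pred_sleep = p != wake_label
        if ref_sleep and pred_sleep:
            tp += 1
        elif ref_sleep and not pred_sleep:
            fn += 1
        elif not ref_sleep and pred_sleep:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy 30-second epochs (hypothetical): W = wake, L = light, D = deep, R = REM.
psg = ["W", "W", "L", "L", "D", "D", "R", "R"]
model = ["W", "L", "L", "L", "D", "L", "R", "R"]

kappa = cohen_kappa(psg, model)
sensitivity, specificity = sleep_wake_metrics(psg, model)
```

On this toy night the model never misses sleep but scores one of two wake epochs as sleep, so sensitivity is high and specificity is low, a pattern worth checking for whenever a wearable is compared against PSG.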

Where it struggles. Streaming raw, high-frequency signals over Bluetooth Low Energy is among the most power-hungry things a wearable can do, and continuous raw transmission drains the battery quickly.8,9 It also makes the pipeline dependent on connectivity, adds latency, and widens the privacy surface: data in transit can be intercepted, and centralized stores of health data are high-value breach targets.12
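A rough back-of-the-envelope calculation shows why the radio bill differs so much between the two patterns. Every figure below (sampling rates, bytes per sample, night length, one byte per stage label) is an assumed, illustrative value rather than a vendor specification, and the sketch counts payload bytes only; actual energy cost also depends on the radio, connection interval, and protocol overhead.

```python
# Illustrative assumptions only; none of these figures come from a vendor spec.
PPG_HZ = 25            # assumed PPG sampling rate
ACCEL_HZ = 25          # assumed accelerometer sampling rate
ACCEL_AXES = 3         # tri-axial accelerometry
BYTES_PER_SAMPLE = 2   # assumed 16-bit samples
NIGHT_SECONDS = 8 * 3600
EPOCH_SECONDS = 30     # conventional sleep-scoring epoch length


def raw_bytes_per_night(seconds=NIGHT_SECONDS):
    """Payload size if every PPG and accelerometer sample is transmitted."""
    samples_per_second = PPG_HZ + ACCEL_HZ * ACCEL_AXES
    return samples_per_second * BYTES_PER_SAMPLE * seconds


def summary_bytes_per_night(seconds=NIGHT_SECONDS, bytes_per_epoch=1):
    """Payload size if only one stage label per epoch is transmitted."""
    return (seconds // EPOCH_SECONDS) * bytes_per_epoch


raw = raw_bytes_per_night()          # 5,760,000 bytes under these assumptions
summary = summary_bytes_per_night()  # 960 bytes under these assumptions
ratio = raw / summary                # 6000.0
```

Under these assumptions the raw stream is several thousand times larger than the staged summary, which is the gap that scheduled uploads and on-device feature extraction try to narrow.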

The real advantage is raw signal access, not the cloud

It is tempting to frame this as on-device versus the cloud, but that misses the point. The reason an off-device approach can beat a proprietary one is access to raw multimodal signals. A proprietary on-device pipeline collapses a rich overnight waveform into a handful of derived numbers, discarding information and baking in opaque assumptions that downstream models then have to fight.7 Very few independent validation studies of consumer sleep technology exist, precisely because raw data is rarely made accessible for external use.5

When you can pull raw BBI and accelerometry, the calculus changes. You can build transparent models, validate them against PSG, and reproduce the result. This is where Garmin earns a place as a first-class research platform rather than a footnote. The Companion SDK provides raw, time-series accelerometer, BBI, and respiration data at research-grade resolution,10 and the HIPAA-compliant Standard SDK lets you aggregate and archive that data in your own systems.11 With no mandatory subscription gating the data and accessible entry-level hardware, raw-signal research becomes feasible across a full cohort rather than a benchtop demo.
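As a small illustration of what raw BBI access makes possible, the sketch below turns beat-to-beat intervals into per-epoch heart rate and RMSSD, a standard time-domain heart rate variability measure. The input format (a list of timestamp and interval pairs), the function names, and the 30-second epoch are assumptions for this example and do not describe the Companion SDK's actual output format.

```python
import math


def rmssd(bbi_ms):
    """Root mean square of successive differences between intervals (ms)."""
    if len(bbi_ms) < 2:
        raise ValueError("need at least two intervals")
    diffs = [b - a for a, b in zip(bbi_ms, bbi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def epoch_features(beats, epoch_s=30.0):
    """Group (timestamp_s, bbi_ms) pairs into epochs and summarize each one.

    The input layout is hypothetical; adapt it to however your export
    delivers beat-to-beat intervals.
    """
    epochs = {}
    for timestamp_s, bbi in beats:
        epochs.setdefault(int(timestamp_s // epoch_s), []).append(bbi)

    features = {}
    for index, intervals in sorted(epochs.items()):
        mean_bbi = sum(intervals) / len(intervals)
        features[index] = {
            "n_beats": len(intervals),
            "mean_hr_bpm": 60000.0 / mean_bbi,
            # RMSSD needs at least two beats; otherwise leave it undefined.
            "rmssd_ms": rmssd(intervals) if len(intervals) >= 2 else None,
        }
    return features


# Toy data: two beats in the first epoch, two in the second.
beats = [(0.8, 800), (1.6, 800), (31.0, 1000), (32.0, 1000)]
features = epoch_features(beats)
```

Features like these can feed a transparent staging model directly, and because they are derived from raw intervals rather than a vendor score, anyone with the same data can recompute and check them.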

Why vendors default to on-device anyway

None of this means the platform vendors made the wrong call. Radios consume several milliwatts while transmitting, and a poorly optimized design that continuously streams raw data will exhaust a battery quickly.8,9 Computing on the device and shipping only compact summaries is what buys a consumer multiple days between charges. For someone who wants a week of wear and a morning readiness score, that is the right tradeoff. For a researcher who needs raw signals and auditable models, it is simply a constraint to engineer around, for example by scheduling uploads or making them opportunistic, or by pairing on-device feature extraction with periodic raw transfers.1
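One way to picture an opportunistic upload policy is as a small decision function that the device firmware or companion app evaluates periodically. This is a sketch under stated assumptions: the DeviceState fields, the thresholds, and the function name are all hypothetical and would need tuning against real battery and storage measurements.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of the wearable at decision time."""
    battery_pct: float    # 0 to 100
    charging: bool
    link_available: bool  # a phone or gateway is reachable
    buffer_fill: float    # fraction of raw-data storage in use, 0 to 1


def should_upload_raw(state, min_battery_pct=30.0, urgent_fill=0.9):
    """Decide whether to transfer buffered raw data now.

    Thresholds are illustrative assumptions, not recommendations.
    """
    if not state.link_available:
        return False  # nothing to send to; keep buffering
    if state.charging:
        return True   # energy is effectively free, so upload opportunistically
    if state.buffer_fill >= urgent_fill and state.battery_pct >= min_battery_pct:
        return True   # storage is nearly full; spend battery to avoid data loss
    return False      # otherwise wait for a cheaper moment


overnight = DeviceState(battery_pct=60.0, charging=False,
                        link_available=True, buffer_fill=0.4)
on_charger = DeviceState(battery_pct=20.0, charging=True,
                         link_available=True, buffer_fill=0.4)
```

With these defaults the overnight state keeps buffering while the on-charger state uploads, so the raw transfer happens when it costs the wearer nothing.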

When to do what

A simple way to choose:

  • On-device when the priority is real-time intervention (JITAIs, EMA triggers), strict privacy, unreliable connectivity, or maximum battery life.
  • Off-device when the priority is accuracy, transparent and auditable models, external validation, retraining on pooled cohorts, or retrospective re-analysis as methods improve.
  • Hybrid when you want both, which in practice is most of the time.

The pragmatic standard is a hybrid edge-cloud design: lightweight on-device inference for instant feedback, with raw or feature-level uploads to a server for the heavy, transparent modeling and validation that research demands.14,1
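The hybrid pattern can be sketched as a single object that does both jobs: it returns an immediate label for each epoch and keeps the raw samples buffered for later server-side modeling. The class name, the movement threshold, and the buffer size are assumptions, and the threshold rule is a deliberately crude stand-in for on-device inference, not a validated sleep/wake algorithm.

```python
import math
from collections import deque


class HybridPipeline:
    """Toy edge-side half of a hybrid design (all parameters hypothetical)."""

    def __init__(self, wake_threshold_g=0.05, max_buffered_epochs=2880):
        self.wake_threshold_g = wake_threshold_g
        # Bounded buffer: the oldest epochs are dropped if uploads never happen.
        self.buffer = deque(maxlen=max_buffered_epochs)

    def process_epoch(self, accel_samples):
        """Label one epoch of (x, y, z) samples in g and buffer the raw data."""
        magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel_samples]
        mean = sum(magnitudes) / len(magnitudes)
        # Mean absolute deviation of acceleration magnitude as a movement proxy.
        movement = sum(abs(m - mean) for m in magnitudes) / len(magnitudes)
        label = "wake" if movement > self.wake_threshold_g else "sleep"
        self.buffer.append(
            {"label": label, "movement_g": movement, "raw": list(accel_samples)}
        )
        return label

    def drain_for_upload(self):
        """Hand buffered epochs to the uploader and clear local storage."""
        batch = list(self.buffer)
        self.buffer.clear()
        return batch


pipeline = HybridPipeline()
still_label = pipeline.process_epoch([(0.0, 0.0, 1.0)] * 10)
moving_label = pipeline.process_epoch([(0.0, 0.0, 1.0), (0.0, 0.0, 1.5)] * 5)
batch = pipeline.drain_for_upload()
```

The on-device label is available immediately for feedback or intervention triggers, while the drained batch carries the raw samples a server-side model needs for transparent scoring and validation against PSG.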

The throughline is that the architecture should follow the research question, not the marketing. If you need defensible sleep staging that you can validate, reproduce, and improve, raw multimodal signal access plus off-device modeling is the stronger foundation, and the battery cost is a known, solvable engineering problem rather than a reason to settle for a black box. That is the bet Centralive’s platform is built on: transparent, validated analysis grounded in raw signals, engineered to respect the realities of battery and bandwidth.

References

  1. Beyond the Sleep Lab: A Narrative Review of Wearable Sleep Monitoring. Bioengineering, 2025. https://doi.org/10.3390/bioengineering12111191
  2. Walch O, et al. Sleep stage prediction with raw acceleration and photoplethysmography heart rate data derived from a consumer wearable device. SLEEP, 2019. https://doi.org/10.1093/sleep/zsz180
  3. Sleep staging classification from wearable signals using deep learning. Google Research. https://research.google/pubs/sleep-staging-classification-from-wearable-signals-using-deep-learning/
  4. Self-supervised learning of accelerometer data provides new insights for sleep and its association with mortality. npj Digital Medicine, 2024. https://doi.org/10.1038/s41746-024-01065-0
  5. A Flexible Deep Learning Architecture for Temporal Sleep Stage Classification Using Accelerometry and Photoplethysmography. IEEE Transactions on Biomedical Engineering, 2022. https://ieeexplore.ieee.org/document/9813567
  6. Kotzen K, et al. SleepPPG-Net: a deep learning algorithm for robust sleep staging from continuous photoplethysmography. 2022. https://doi.org/10.48550/arXiv.2202.05735
  7. A Multi-Level Classification Approach for Sleep Stage Prediction With Processed Data Derived From Consumer Wearable Activity Trackers. Frontiers in Digital Health, 2021. https://doi.org/10.3389/fdgth.2021.665946
  8. BLE in Wearable Medical Devices. Ignitec. https://www.ignitec.com/insights/ble-in-wearable-medical-devices
  9. How can designers decrease power and increase functions in wearables. Design World. https://www.designworldonline.com/how-can-designers-decrease-power-and-increase-functions-in-wearables-part-1/
  10. Enable Digital Phenotyping with Garmin Devices in Your Longitudinal Research Studies. Center for Technology and Behavioral Health, Dartmouth. https://www.c4tbh.org/research-tools/enable-digital-phenotyping-with-garmin-devices-in-your-longitudinal-research-studies/
  11. Health SDKs Overview. Garmin Developers. https://developer.garmin.com/health-sdk/
  12. Privacy is All You Need: Revolutionizing Wearable Health Data with Advanced PETs. 2025. https://doi.org/10.48550/arXiv.2503.03428
  13. Synheart Emotion: Privacy-Preserving On-Device Emotion Recognition from Biosignals. 2025. https://doi.org/10.48550/arXiv.2511.06231
  14. Edge AI vs. Cloud AI. IBM. https://www.ibm.com/think/topics/edge-vs-cloud-ai
