Towards General Intelligence for Wearable Health Data: Trillion-Minute Pretraining and LLM Agent Proxy

While wearable devices can capture vast amounts of physiological and behavioral signals, translating them into personalized health insights remains challenging due to large individual variability and the scarcity of high-quality labeled data. To address this, we introduce a foundation model for wearable health, pre-trained on over one trillion minutes of unlabeled sensor data from five million participants. Our studies show that joint scaling of model capacity and pre-training data yields systematic performance gains across 35 tasks spanning cardiovascular, metabolic, sleep, and mental health domains. The model unlocks label-efficient few-shot learning and generative capabilities, and we deploy large language model agents to automatically search for optimal downstream prediction heads, further boosting performance. The resulting Personal Health Agent, evaluated by 1,860 clinicians, demonstrates superior relevance, contextual awareness, and safety.

Background and Context

The proliferation of wearable sensor technology has changed the landscape of digital health by making the continuous capture of behavioral and physiological information more accessible than ever before. These devices generate vast streams of raw data, yet translating these low-level signals into high-value, personalized health insights remains a formidable challenge. The core difficulty lies in the extreme heterogeneity of human phenotypes. Individuals vary significantly in their baseline health status, underlying physiological mechanisms, and lifestyle habits, which makes the mapping from raw sensor inputs to high-level health state representations complex. Because of this variability, a model trained on one population often fails to generalize to another without significant adaptation.

Compounding the issue of individual variability is the severe scarcity of high-quality labeled data. Obtaining ground-truth health outcomes that are accurately aligned with wearable sensor data is both costly and time-intensive. In retrospective studies, manual annotation is often impractical because of the sheer volume of data and the need for specialized medical expertise. Consequently, the field faces a bottleneck in which the potential of wearable data is limited by a lack of supervised learning targets. This imbalance between abundant unlabeled data and scarce labels has historically forced researchers to rely on small, curated datasets that may not reflect real-world diversity, limiting the robustness and generalizability of predictive models.

To address these systemic challenges, this research introduces a foundation model designed specifically for wearable health data. The primary innovation is the shift from supervised learning on small datasets to unsupervised pre-training on unlabeled data at massive scale. By constructing a universal representation space that captures the spatiotemporal patterns of complex physiological signals, the model aims to decouple feature learning from task-specific annotation. This strategy lays a data and model foundation for subsequent health prediction and personalized intervention, removing the traditional dependency on expensive, high-quality labels during the initial training phase.
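
The source does not state which pre-training objective was used. As a purely illustrative sketch of how features can be learned without labels, the snippet below builds a masked-reconstruction task, one common choice for unlabeled time series. The function names, the 25% mask ratio, the synthetic heart-rate window, and the mean-fill "model" are all assumptions for illustration, not details from the study.

```python
import random

def mask_window(window, mask_ratio, rng):
    """Hide a random subset of time steps; return the masked copy and hidden indices."""
    n_masked = max(1, int(len(window) * mask_ratio))
    hidden = sorted(rng.sample(range(len(window)), n_masked))
    masked = [None if i in hidden else v for i, v in enumerate(window)]
    return masked, hidden

def mean_fill_baseline(masked):
    """Trivial stand-in for a model: predict the mean of visible values at hidden steps."""
    visible = [v for v in masked if v is not None]
    fill = sum(visible) / len(visible)
    return [fill if v is None else v for v in masked]

def masked_mse(window, reconstruction, hidden):
    """Reconstruction loss scored only on the hidden time steps."""
    return sum((window[i] - reconstruction[i]) ** 2 for i in hidden) / len(hidden)

rng = random.Random(0)
# Hypothetical one-hour window of minute-level heart rate (synthetic values).
window = [60 + 5 * rng.random() for _ in range(60)]
masked, hidden = mask_window(window, mask_ratio=0.25, rng=rng)
loss = masked_mse(window, mean_fill_baseline(masked), hidden)
print(f"masked {len(hidden)} of {len(window)} steps, baseline loss = {loss:.3f}")
```

The supervision signal here comes from the signal itself: the hidden values act as targets, so no health label is needed. A real encoder would replace `mean_fill_baseline` and be trained to lower this loss.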

Deep Analysis

The technical architecture of this foundation model rests on pre-training data of very large scale. The model was trained on a cohort of five million participants, covering a cumulative total of over one trillion minutes of unlabeled sensor data, which works out to at least 200,000 minutes (roughly 139 days of continuous recording) per participant on average. This dataset lets the model learn the intrinsic patterns of physiological signals without human-provided supervision. The volume of data helps the model capture subtle, long-term trends and rare events that smaller datasets would miss, producing a rich, high-dimensional embedding space that covers diverse health states.

A critical finding of this study is the demonstration of scaling behavior in the domain of wearable health. The research shows that joint scaling of model capacity and pre-training data volume yields systematic performance gains across a wide array of tasks. This indicates that, as with large language models, increasing the size of both the neural network and the training corpus leads to predictable improvements in representation quality. The reported gains are substantial rather than marginal, which suggests that the current scale of data and compute is still within a regime where further expansion would continue to improve performance.
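
One common way to quantify this kind of scaling behavior is to fit a power law of the form loss = a * size^(-b) by linear regression in log-log space. The sketch below assumes that functional form, which is borrowed from the language-model scaling literature and is not stated in the source. The data points are synthetic, and `fit_power_law` is a hypothetical helper. It also fits a single variable, whereas the study scales model capacity and data jointly.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b) in log-log space; returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic points generated from a known power law (not results from the study).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [2.0 * s ** -0.1 for s in sizes]

a, b = fit_power_law(sizes, losses)
extrapolated = a * 1e10 ** (-b)  # predicted loss one decade beyond the fitted range
print(f"a = {a:.3f}, b = {b:.3f}, extrapolated loss = {extrapolated:.4f}")
```

A fit like this is only as trustworthy as the range it was fitted on; extrapolating far beyond the measured scales is a modeling assumption, not a guarantee.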

To fully exploit these pre-trained representations, the research team moved beyond traditional supervised fine-tuning. They deployed large language model (LLM) agents, acting as a "classroom", for automated architecture search. These agents were given autonomy to search for and optimize the structure of downstream prediction heads that operate on the model's embedding space. This approach reduces the burden of manual model design and allows a much broader model space to be explored. The results indicate that LLM agents can discover network structures that outperform human-designed architectures, with gains growing as the capacity of the LLM agents themselves increases.

The framework was evaluated on 35 diverse health prediction tasks, spanning cardiovascular health, metabolic indicators, sleep quality, mental health, and lifestyle-related demographic factors. The experiments showed that the representations extracted from the foundation model are highly label-efficient in few-shot learning scenarios: the model can produce robust estimates of daily metrics from very few labeled examples, a crucial capability for rare conditions or new health metrics where data is scarce. Ablation studies further confirmed that the scale and quality of pre-training data are decisive for final performance, while the agent-based search strategy consistently identified better prediction heads than manual design.
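
The idea of searching over prediction heads on top of frozen embeddings, using only a handful of labels, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's method: the embeddings are synthetic two-class Gaussians, the candidate heads are simple nearest-centroid and k-nearest-neighbour classifiers, and `propose_candidates` is a hypothetical stand-in that returns a fixed list where the real system would query an LLM agent.

```python
import random

def make_split(n_per_class, dim, rng, shift=3.0):
    """Synthetic stand-in for frozen embeddings: two Gaussian classes."""
    rows = []
    for label in (0, 1):
        for _ in range(n_per_class):
            rows.append(([rng.gauss(label * shift, 1.0) for _ in range(dim)], label))
    return rows

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def fit_centroid_head(train):
    """Nearest-centroid head: predict the class whose mean embedding is closest."""
    centroids = {}
    for label in {y for _, y in train}:
        members = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))

def fit_knn_head(train, k):
    """k-nearest-neighbour head with majority vote."""
    def predict(x):
        nearest = sorted(train, key=lambda row: sq_dist(x, row[0]))[:k]
        votes = [y for _, y in nearest]
        return max(set(votes), key=votes.count)
    return predict

def accuracy(head, rows):
    return sum(head(x) == y for x, y in rows) / len(rows)

def propose_candidates():
    """Hypothetical stand-in for the agent: a fixed list instead of LLM proposals."""
    return {
        "centroid": fit_centroid_head,
        "knn_k1": lambda train: fit_knn_head(train, 1),
        "knn_k3": lambda train: fit_knn_head(train, 3),
    }

rng = random.Random(0)
few_shot_train = make_split(5, 8, rng)   # only 5 labeled examples per class
validation = make_split(50, 8, rng)      # held-out data used to score each head

scores = {name: accuracy(fit(few_shot_train), validation)
          for name, fit in propose_candidates().items()}
best_name = max(scores, key=scores.get)
print(best_name, scores[best_name])
```

The encoder is never updated here; only the small head is fitted, which is why a few labeled examples can suffice when the embeddings already separate the classes. In an agent-driven version, the validation scores would be fed back to the agent so it can propose new candidates in the next round.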

Industry Impact

The implications of this research extend beyond academic metrics to offer a new paradigm for the commercialization of wearable health data. By establishing a general-purpose health foundation model, developers can significantly lower the barrier to entry for creating specialized health applications. Instead of training a separate model for each health metric, which is resource-intensive and data-hungry, developers can adapt the pre-trained foundation model to specific tasks with minimal additional data. This accelerates innovation in digital health, allowing rapid deployment of applications that monitor everything from cardiac arrhythmias to metabolic shifts.

A key component of this impact is the integration of downstream predictors into "Personal Health Agents." These agents are not static dashboards but dynamic systems that can provide relevant, context-aware, and safe health advice. Using LLM agents to optimize the underlying predictors helps tailor the advice to the individual's physiological baseline and current context. This shift from passive monitoring to active, intelligent assistance is a significant value proposition for consumers and healthcare providers alike, potentially improving adherence to health recommendations and enabling earlier intervention.

The Personal Health Agent was evaluated by 1,860 clinicians, whose feedback indicated superior relevance, contextual awareness, and safety compared to existing approaches. This professional assessment matters for the adoption of AI in medical settings, where hallucination and inappropriate advice are central concerns. The system's ability to provide clinically sound insights suggests that it can serve as a valuable tool for healthcare professionals, helping to triage patients, monitor chronic conditions remotely, and reduce the burden on overstrained medical resources.

Furthermore, the research highlights the potential of the model in generative tasks. The ability to generate simulated physiological signals that remain consistent with an individual's physiology offers significant advantages for data augmentation and privacy-preserving model training. Synthetic data can be used to train other models with less exposure of sensitive patient information, which addresses a major regulatory hurdle in health tech. This generative capability also opens the door to personalized simulation, letting users explore potential health outcomes under different lifestyle choices and supporting proactive health management.

Outlook

This study marks a pivotal transition in wearable health analysis, moving from single-metric monitoring systems to general-purpose intelligent health agents. The successful application of trillion-minute pre-training and LLM-driven optimization demonstrates that the challenges of data scarcity and individual variability can be overcome through scale and automation. As the field moves forward, the focus will likely shift towards refining these agents for real-time deployment and integrating them into broader healthcare ecosystems. The ability to provide continuous, personalized insights has the potential to transform preventive care, shifting the paradigm from reactive treatment to proactive health maintenance.

Looking ahead, the integration of multimodal data sources represents the next logical step. While this study focused on sensor data, combining physiological signals with electronic health records, genetic information, and environmental data could further enhance the accuracy and depth of health predictions. The foundation model architecture is well-suited to accommodate these additional modalities, allowing for a more holistic view of an individual's health. This multi-faceted approach could unlock new insights into the complex interplay between biology, behavior, and environment.

Additionally, the scalability of the LLM agent framework suggests that automated model design could become a standard practice in health AI. As agents become more sophisticated, they may be able to not only optimize prediction heads but also identify novel biomarkers or health indicators that were previously undetectable. This could lead to the discovery of new early warning signs for diseases, further enhancing the preventive capabilities of wearable technology. The collaboration between AI researchers and clinicians will be essential to ensure that these advancements are translated into safe, effective, and equitable health solutions.

Finally, the ethical and privacy implications of such powerful models must be addressed. The ability to generate realistic physiological data and provide personalized advice raises questions about data ownership, consent, and algorithmic bias. Robust governance frameworks will be needed to ensure that these technologies are used responsibly. However, the potential benefits are immense. By democratizing access to high-quality health insights and enabling early detection of health issues, wearable health foundation models have the potential to improve global health outcomes and reduce healthcare costs significantly. The journey from raw data to intelligent action is just beginning, and this research provides a clear roadmap for the future of digital health.