How to Export Apple Health Data to InfluxDB for Grafana Dashboards
Problem
I wanted to visualize my Apple Health data in custom dashboards beyond what the Health app provides. Apple’s Health app has basic charts, but I needed:
- Cross-metric correlation (sleep vs. activity, HRV vs. stress)
- Long-term trend analysis with advanced queries
- ML-based predictions for sleep quality
- A single dashboard combining all health metrics
I tried exporting data to InfluxDB and building Grafana dashboards. Then I hit a critical issue that made my queries return wrong values.
The Critical Gotcha
My step count queries were returning “5 steps today” instead of thousands. I spent hours debugging before discovering the problem:
Apple Health exports step counts as per-minute granules, not daily totals.
    # What I expected:
    2024-01-15, steps: 8500

    # What Apple Health actually exports:
    2024-01-15 09:00, steps: 15
    2024-01-15 09:01, steps: 23
    2024-01-15 09:02, steps: 8
    ... (1440 rows per day)
When I used mean() aggregation, I got the average of per-minute values, not the total. The fix:
    // WRONG: Returns ~5 steps (average per minute)
    from(bucket: "apple_health")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r._measurement == "steps")
      |> mean()  // This averages per-minute granules

    // CORRECT: Returns ~8500 steps (sum of all granules)
    from(bucket: "apple_health")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r._measurement == "steps")
      |> sum()  // Sum all per-minute values
Use sum() for cumulative metrics (steps, distance, calories). Use mean() for point-in-time metrics (heart rate, blood oxygen).
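The same rule can be checked outside of Flux. The following is a minimal Python sketch, not part of the original pipeline: the sample values and the `aggregate` helper are made up for illustration, and it only shows why summing and averaging per-minute rows give such different numbers.

```python
from statistics import mean

# Hypothetical per-minute step samples (illustrative values, not real data)
per_minute_steps = [15, 23, 8, 0, 0, 12, 30, 2]

# Metrics that accumulate over time; everything else is treated as point-in-time
CUMULATIVE = {"steps", "distance", "active_energy"}

def aggregate(metric: str, values: list[float]) -> float:
    """Sum cumulative metrics; average point-in-time metrics."""
    if not values:
        return 0.0
    if metric in CUMULATIVE:
        return float(sum(values))
    return float(mean(values))

daily_total = aggregate("steps", per_minute_steps)   # what the dashboard should show
wrong_average = mean(per_minute_steps)               # what mean() reports instead
print(daily_total, wrong_average)
```

With these eight samples the total is 90 steps while the average is only 11.25, which is the same kind of gap as "5 steps today" versus thousands.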
Solution Architecture
I set up this pipeline:
    iPhone Health App
            |
            v
    Local Webhook Server (Python/Flask)
            |
            v
    InfluxDB (Time-series database)
            |
            v
    Grafana Dashboards + ML Pipeline
Setting Up the Webhook Server
I created a local webhook server to receive Apple Health data:
    from flask import Flask, request, jsonify
    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    app = Flask(__name__)

    # InfluxDB configuration
    client = InfluxDBClient(
        url="http://localhost:8086",
        token="your-token",
        org="health"
    )
    write_api = client.write_api(write_options=SYNCHRONOUS)

    @app.route('/health-sync', methods=['POST'])
    def health_sync():
        """Receive Apple Health data from webhook"""
        # Fall back to an empty payload if the body is missing or not valid JSON
        data = request.get_json(silent=True) or {}

        points = []
        for record in data.get('data', []):
            # Determine aggregation type based on metric
            metric_type = get_metric_type(record['type'])

            # Cast to float so a field never switches between int and float
            point = Point(record['type']) \
                .tag("source", record.get('source', 'iphone')) \
                .tag("metric_type", metric_type) \
                .field("value", float(record['value'])) \
                .time(record['timestamp'])

            points.append(point)

        # Batch write to InfluxDB
        write_api.write(bucket="apple_health", record=points)

        return jsonify({"status": "success", "count": len(points)})

    def get_metric_type(metric_name: str) -> str:
        """Determine if metric is cumulative or instantaneous"""
        cumulative = ['steps', 'distance', 'active_energy', 'flights_climbed']
        return 'cumulative' if metric_name in cumulative else 'instantaneous'

    if __name__ == '__main__':
        # ssl_context='adhoc' generates a self-signed certificate
        # and requires the cryptography package
        app.run(host='0.0.0.0', port=5000, ssl_context='adhoc')
InfluxDB Bucket Setup
I created an InfluxDB bucket with appropriate retention:
    # Create bucket with 2-year retention (17520h = 730 days)
    influx bucket create \
      --name apple_health \
      --org health \
      --retention 17520h

    # Create token with write access
    # (--write-bucket expects the bucket ID, which `influx bucket list` shows)
    influx auth create \
      --org health \
      --write-bucket <apple_health-bucket-id> \
      --description "Apple Health sync token"
Flux Queries for Grafana
Here are the queries I use in my Grafana dashboards:
Daily Steps Query
    from(bucket: "apple_health")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r._measurement == "steps")
      |> filter(fn: (r) => r._field == "value")
      |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
      |> yield(name: "daily_steps")
Heart Rate (use mean, not sum)
    from(bucket: "apple_health")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r._measurement == "heart_rate")
      |> filter(fn: (r) => r._field == "value")
      |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
      |> yield(name: "heart_rate")
Sleep Analysis
    from(bucket: "apple_health")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r._measurement == "sleep_analysis")
      |> filter(fn: (r) => r._field == "value")
      |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
      |> yield(name: "sleep_hours")
HRV Trend
    from(bucket: "apple_health")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r._measurement == "heart_rate_variability")
      |> filter(fn: (r) => r._field == "value")
      |> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
      |> movingAverage(n: 7)  // 7-day moving average
      |> yield(name: "hrv_trend")
Grafana Dashboard Panels
I created 6 dashboards:
- Sleep Dashboard - Sleep stages, duration, quality score
- HRV Dashboard - Heart rate variability trends, stress indicators
- Heart Rate Dashboard - Resting HR, exercise HR, zones
- VO2 Max Dashboard - Cardio fitness trends
- Activity Dashboard - Steps, distance, active energy, stand hours
- SpO2 Dashboard - Blood oxygen levels
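These dashboards mix cumulative metrics (steps, distance, active energy) with point-in-time ones (heart rate, SpO2), so every panel needs the matching aggregation. One way to avoid picking the wrong one by hand is to generate the Flux query from the metric name. This is a rough sketch under assumptions: `build_flux_query` is a hypothetical helper, and the measurement names are the ones used in the webhook server and queries in this post.

```python
# Measurements that should be summed per window. The first four come from the
# webhook's cumulative list; sleep_analysis is added because the sleep query
# in this post sums it as well.
SUMMED = {"steps", "distance", "active_energy", "flights_climbed", "sleep_analysis"}

def build_flux_query(measurement: str, every: str = "1d",
                     bucket: str = "apple_health") -> str:
    """Build a Grafana Flux query with sum for cumulative metrics, mean otherwise."""
    fn = "sum" if measurement in SUMMED else "mean"
    return "\n".join([
        f'from(bucket: "{bucket}")',
        "  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")',
        '  |> filter(fn: (r) => r._field == "value")',
        f"  |> aggregateWindow(every: {every}, fn: {fn}, createEmpty: false)",
    ])

print(build_flux_query("steps"))
print(build_flux_query("heart_rate", every="5m"))
```

The generated strings can be pasted into a panel's query editor. Keeping the sum/mean decision in one set means a new metric only has to be classified once.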
Here’s a sample panel configuration:
    {
      "title": "Daily Steps",
      "type": "stat",
      "targets": [
        {
          "query": "from(bucket: \"apple_health\")\n |> range(start: v.timeRangeStart, stop: v.timeRangeStop)\n |> filter(fn: (r) => r._measurement == \"steps\")\n |> aggregateWindow(every: 1d, fn: sum)",
          "refId": "A"
        }
      ],
      "options": {
        "graphMode": "area",
        "colorMode": "value"
      },
      "fieldConfig": {
        "defaults": {
          "unit": "short",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"color": "red", "value": 0},
              {"color": "yellow", "value": 5000},
              {"color": "green", "value": 10000}
            ]
          }
        }
      }
    }
ML Pipeline for Predictions
I added a RandomForest model for sleep quality prediction:
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from influxdb_client import InfluxDBClient
    import joblib

    class SleepPredictor:
        # Daily aggregation per measurement: sum cumulative metrics,
        # average heart rate. sleep_analysis is needed for the label.
        DAILY_AGG = {
            "steps": "sum",
            "active_energy": "sum",
            "sleep_analysis": "sum",
            "heart_rate": "mean",
        }

        def __init__(self, influx_url: str, token: str, org: str):
            self.client = InfluxDBClient(url=influx_url, token=token, org=org)
            self.model = None

        def fetch_training_data(self, days: int = 90) -> pd.DataFrame:
            """Fetch daily aggregates for the last `days` days (default 90)"""
            query_api = self.client.query_api()
            frames = []
            for measurement, fn in self.DAILY_AGG.items():
                query = f'''
                from(bucket: "apple_health")
                  |> range(start: -{days}d)
                  |> filter(fn: (r) => r._measurement == "{measurement}")
                  |> filter(fn: (r) => r._field == "value")
                  |> aggregateWindow(every: 1d, fn: {fn}, createEmpty: false)
                '''
                result = query_api.query_data_frame(query)
                # query_data_frame returns a list when the result has several tables
                frames.extend(result if isinstance(result, list) else [result])
            return pd.concat(frames, ignore_index=True)

        def train_model(self):
            """Train sleep quality predictor"""
            df = self.fetch_training_data()

            # Feature engineering: one row per day, one column per measurement
            features = df.pivot_table(
                index='_time',
                columns='_measurement',
                values='_value'
            )
            # Drop days without a sleep record instead of labelling them as bad sleep
            features = features.dropna(subset=['sleep_analysis']).fillna(0)

            # Binary label: good sleep >= 7 hours
            features['sleep_quality'] = (features['sleep_analysis'] >= 7).astype(int)

            X = features[['steps', 'heart_rate', 'active_energy']]
            y = features['sleep_quality']

            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42
            )

            self.model = RandomForestClassifier(n_estimators=100)
            self.model.fit(X_train, y_train)

            # Save model
            joblib.dump(self.model, 'sleep_model.joblib')

            return self.model.score(X_test, y_test)

    # Cron job: Retrain every Sunday at 3 AM
    # 0 3 * * 0 /usr/bin/python3 /path/to/sleep-predictor.py --train
Automation with Cron
I set up automated tasks:
    # Sync Apple Health data every 5 minutes
    */5 * * * * /usr/bin/python3 /opt/health/sync.py

    # Retrain ML model weekly (Sunday 3 AM)
    0 3 * * 0 /usr/bin/python3 /opt/health/train_model.py

    # Daily backup to S3
    0 2 * * * /usr/bin/influx backup /tmp/backup && aws s3 sync /tmp/backup s3://my-bucket/health-backup/
Common Mistakes
Mistake 1: Using mean() for Cumulative Metrics
    // WRONG: Returns average per-minute value (~5)
    |> aggregateWindow(every: 1d, fn: mean)

    // CORRECT: Returns total daily value (~8500)
    |> aggregateWindow(every: 1d, fn: sum)
Mistake 2: Not Handling Missing Data
    // WRONG: Empty windows come back as nulls and create gaps in the visualization
    |> aggregateWindow(every: 1d, fn: mean)

    // CORRECT: Keep the empty windows and carry the previous value forward
    // (createEmpty: false would drop them, leaving nothing for fill() to fill)
    |> aggregateWindow(every: 1d, fn: mean, createEmpty: true)
    |> fill(usePrevious: true)
Mistake 3: Wrong Timezone
    // WRONG: Daily windows are cut at UTC midnight, so data lands on the wrong day
    |> aggregateWindow(every: 1d, fn: sum)

    // CORRECT: Use local timezone (import and option go at the top of the query)
    import "timezone"
    option location = timezone.location(name: "America/New_York")
    |> aggregateWindow(every: 1d, fn: sum)
Why This Matters
Building this pipeline gave me:
- Holistic Health View - Correlations between sleep, activity, HRV on one screen
- Proactive Health Management - Anomaly detection before issues become problems
- ML Predictions - Predict sleep quality from daily activity patterns
- Privacy - All data stays on my local infrastructure
The most valuable insight was discovering that my HRV drops significantly 2 days before I get sick, giving me early warning to rest.
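A drop like this can be flagged automatically. The sketch below is one simple way to do it and is not the method used in this pipeline: `hrv_drop_days`, the 7-day window, and the 2-standard-deviation threshold are all assumptions, and the sample values are invented for illustration.

```python
from statistics import mean, stdev

def hrv_drop_days(daily_hrv: list[float], window: int = 7,
                  threshold: float = 2.0) -> list[int]:
    """Return indexes of days whose HRV is more than `threshold` standard
    deviations below the mean of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_hrv)):
        baseline = daily_hrv[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines, where the standard deviation is zero
        if sigma > 0 and daily_hrv[i] < mu - threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical daily HRV averages in ms (illustrative, not real data)
sample = [52, 55, 50, 53, 54, 51, 52, 53, 38, 51]
print(hrv_drop_days(sample))
```

For this sample only the day at index 8 (HRV 38) is flagged. The input would be the daily means produced by the HRV trend query, and the threshold would need tuning against your own history.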
Summary
In this post, I showed how to export Apple Health data to InfluxDB for custom Grafana dashboards. The critical gotcha is using sum() aggregation for cumulative metrics like steps, distance, and calories, not mean(). Apple Health exports per-minute granules, so you must aggregate correctly or your queries will return wrong values.
The complete pipeline includes: a local webhook server for data sync, InfluxDB for time-series storage, Grafana for visualization, and an ML model for predictions. With proper aggregation, you get accurate health dashboards that reveal patterns invisible in the Health app.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 Reddit: I gave my home a brain. Here's what 50 days of self-hosted AI looks like
- 👨‍💻 InfluxDB Documentation
- 👨‍💻 Grafana Documentation
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!