Integrating remote sensor data with traditional clinical data sources is an exciting new area of biomedical research. Increasingly, clinical trial protocols incorporate wearable biosensors to remotely measure activity, heart function, blood glucose, and oxygen saturation. These devices let us observe participants in their natural environments instead of in a laboratory setting. The tradeoff is that the devices need to be consumer friendly and usable without a research technician.

In this post we discuss data and software issues that we encountered in the course of conducting two clinical trials using commercially available health devices. In a future post, we will tackle issues related to data analysis.

This post is organized around a few themes:

  • Data access - How do you get data from the device? What data elements can you get? Is the data reliable?
  • Protocol adherence - What factors influence adherence?
  • Data aggregators - Does every device require writing different code?
  • Recommendations for device vendors - How to make life easier for researchers

Data Access

The most common mechanism for accessing data is an Application Programming Interface (API). An API allows a developer to request data from the device manufacturer on behalf of trial participants (with their permission). Some early-stage devices that do not yet have an API let users download their data as a CSV or JSON file instead.

In a typical architecture, the wearable device syncs with a mobile application via Bluetooth, the mobile application syncs its data to the vendor’s servers, and the researcher writes code that sends requests to the vendor’s servers and stores the results in a database owned by the researcher.
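
The last hop of that architecture (researcher code pulling from the vendor and writing to a local database) can be sketched roughly as follows. This is a minimal illustration, not any vendor's actual API: `fake_vendor_fetch`, the record fields, and the `readings` table are all hypothetical stand-ins, and in a real study the fetcher would wrap an authenticated HTTPS request.

```python
import sqlite3
from typing import Callable, Dict, List

# A fetcher takes a participant ID and returns that participant's readings.
# Hypothetical: in practice this wraps a request to the vendor's servers.
Fetcher = Callable[[str], List[Dict]]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the researcher-owned table that holds pulled readings."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS readings (
               participant_id TEXT NOT NULL,
               recorded_at    TEXT NOT NULL,
               metric         TEXT NOT NULL,
               value          REAL NOT NULL,
               PRIMARY KEY (participant_id, recorded_at, metric)
           )"""
    )


def sync_participant(conn: sqlite3.Connection, participant_id: str, fetch: Fetcher) -> int:
    """Pull readings for one participant and store them; return rows fetched."""
    rows = [
        (participant_id, r["recorded_at"], r["metric"], float(r["value"]))
        for r in fetch(participant_id)
    ]
    # INSERT OR REPLACE keeps repeated syncs from creating duplicate rows.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def fake_vendor_fetch(participant_id: str) -> List[Dict]:
    """Stand-in for a vendor API call; the payload shape is made up."""
    return [
        {"recorded_at": "2016-01-01T08:00:00", "metric": "steps", "value": 120},
        {"recorded_at": "2016-01-01T08:05:00", "metric": "steps", "value": 80},
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    sync_participant(conn, "P001", fake_vendor_fetch)
    sync_participant(conn, "P001", fake_vendor_fetch)  # re-sync adds no duplicates
    print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])
```

Keying rows on participant, timestamp, and metric means a scheduled job can re-pull overlapping time windows without double counting.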

API issues to consider:

  • APIs change - Devices in early stages of development (read: almost all devices) have APIs that change rapidly. Code you write at the beginning of a study may not work by the end.
  • Tokens expire - Once a user has authorized you to collect data through the device API, you are responsible for storing and maintaining their authentication token so that you keep access to their data. We experienced an issue with one device that required some subjects to re-authenticate multiple times in the course of a 7-day study. Had we not been able to contact those participants in a timely manner, we would have risked losing data for that portion of the study.
  • Data privacy - Device vendors have access to all of the data generated by their device that is collected in the course of your trial. This may be important for the institutional review board (IRB) process.
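
For the token issue in particular, it helps to check token freshness before every pull and to surface a clear signal when a participant has to be contacted. The sketch below assumes an OAuth2-style access/refresh token pair; the `Token` fields, the `refresh` callable, and the five-minute margin are our own illustrative choices, not any specific vendor's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp at which the access token expires


class ReauthorizationRequired(Exception):
    """The participant must log in and grant access again."""


def ensure_fresh(
    token: Token,
    refresh: Callable[[Token], Token],  # hypothetical: calls the vendor's token endpoint
    now: Optional[float] = None,
    margin_s: float = 300.0,
) -> Token:
    """Return a usable token, refreshing it if it expires within `margin_s` seconds."""
    now = time.time() if now is None else now
    if token.expires_at - now > margin_s:
        return token
    try:
        return refresh(token)
    except Exception as exc:
        # Refresh failed (for example, access was revoked), so study staff
        # need to reach the participant before more data is lost.
        raise ReauthorizationRequired("contact participant to re-authorize") from exc
```

Catching `ReauthorizationRequired` in the nightly sync job and turning it into an alert for study staff is one way to shorten the window in which data can go missing.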

What data elements are available?

The data that are available through the device’s app are not always available via the API. For example, the Withings Aura measures luminosity, noise and temperature in the room as shown in the screenshot below.

Screenshot of Withings iOS sleep app showing luminosity, noise and temperature

However, those three measures are not available through the Withings API. Before committing to a device, confirm that every data element your protocol needs is accessible through the API, not just visible in the app. UPDATE: These parameters are available through an alternate OAuth2-based API that is not easily discoverable but is available here.
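
One cheap way to catch this early is to pull a single sample record during device selection and compare it against the list of elements the protocol requires. The helper below is a minimal sketch; the field names in the example are hypothetical and will differ by vendor.

```python
from typing import Dict, Iterable, List


def missing_fields(sample_record: Dict, required: Iterable[str]) -> List[str]:
    """Return the required field names that are absent (or null) in one API record."""
    return [name for name in required if sample_record.get(name) is None]


# Hypothetical record returned by a sleep-tracking API.
sample = {"sleep_state": "light", "heart_rate": 58}
required = ["sleep_state", "luminosity", "noise", "temperature"]

print(missing_fields(sample, required))
# ['luminosity', 'noise', 'temperature']
```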

Some data are available only through a special “Partners” API. For example, the Fitbit API for heart rate returns a daily aggregated heart rate: how many minutes you spent in different “heart rate zones”, like fat burn or cardio. If you want minute-by-minute heart rate data, you must apply to the Fitbit Partner program, which states that “Applications must demonstrate necessity to create a great user experience”. Even more hoops to jump through.

Raw data vs. derived data

Wearable devices consist of sensors and accompanying software that processes the sensor data into a value that a consumer might actually care about. For example, a Fitbit tracker has a 3-axis accelerometer that tracks your motion. The raw data looks something like the figure below:

Raw 3-axis accelerometer data for 4 activities from a cell phone (Kwapisz et al.) [1]

Fitbit applies proprietary algorithms to the raw accelerometer data to infer a metric of interest, such as step count (below). The data available from most device APIs is the derived metric.

Step counts in five-minute increments from Fitbit

It is often difficult, if not impossible, to obtain the raw data feeds from the device. This could be problematic if you wanted to use the 3-axis, wrist-worn accelerometer data to compute a different metric, such as “hand tremors”. Devices often do not store or transmit the raw sensor data, in order to save storage space and power. Of the four devices used in our sleep study, only the Hexoskin offers access to its raw sensor data.
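
To make the raw-versus-derived distinction concrete, here is a deliberately naive step counter that turns raw 3-axis samples into a single derived number. It is not Fitbit's algorithm (which is proprietary); the threshold-crossing rule and the 1.2 g threshold are assumptions chosen only for illustration.

```python
import math
from typing import Iterable, Tuple


def count_steps(samples: Iterable[Tuple[float, float, float]], threshold: float = 1.2) -> int:
    """Count upward crossings of `threshold` in acceleration magnitude (in g)."""
    steps = 0
    above = False
    for x, y, z in samples:
        magnitude = math.sqrt(x * x + y * y + z * z)
        # A "step" is counted each time the magnitude rises above the threshold.
        if magnitude > threshold and not above:
            steps += 1
        above = magnitude > threshold
    return steps


# Synthetic signal: gravity plus a 2 Hz bounce, sampled at 20 Hz for 5 seconds.
signal = [(0.0, 0.0, 1.0 + 0.5 * math.sin(2 * math.pi * 2 * i / 20)) for i in range(100)]
print(count_steps(signal))
```

Once only the step count is stored, the waveform is gone: there is no way to go back and compute a tremor metric, or to re-run the count with a different threshold.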

Algorithm modifications

Remember how we talked about raw data vs. derived data? Vendors routinely modify their algorithms to improve accuracy and performance. These changes are not generally reported.

Without knowing when algorithm changes take place, longitudinal data analysis is complicated: was an observed change the result of a real change in the study participant, or of a change in the way the algorithm processed that participant’s data?

While this might sound like a theoretical concern, it is not. In the course of a single study, two of the four device manufacturers modified an algorithm used to compute a data element that we were using: in our case, sleep staging and heart rate variability (HRV). It was only through ongoing communication with the engineering teams at both companies that we became aware of the changes.

Battery life and protocol compliance

It is important to consider the features of a wearable device that could influence protocol adherence. In our own trials, we’ve found aesthetics and comfort to be issues for some participants. The feature that mattered most for compliance in our preliminary trials was battery life: devices with shorter battery life tended to have worse compliance.

Devices with shorter battery life had worse compliance in a small study (n=7). Note that the Withings Aura is plugged in, and therefore does not need to be charged.
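
If you want to track compliance as a number, one simple definition (an assumption on our part, and only one of several reasonable choices) is the fraction of study days on which the device reported any data:

```python
from datetime import date
from typing import Iterable


def adherence(days_with_data: Iterable[date], start: date, end: date) -> float:
    """Fraction of study days (inclusive of both ends) with at least one reading."""
    study_days = (end - start).days + 1
    if study_days <= 0:
        raise ValueError("end must not be before start")
    # Deduplicate and ignore any days outside the study window.
    observed = {d for d in days_with_data if start <= d <= end}
    return len(observed) / study_days


# Hypothetical 7-day study in which the device reported data on 5 days.
start, end = date(2016, 1, 1), date(2016, 1, 7)
worn = [date(2016, 1, d) for d in (1, 2, 3, 5, 6)]
print(round(adherence(worn, start, end), 2))
# 0.71
```

Computing this per device and per participant makes it easy to see whether gaps line up with charging cycles.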

Third party aggregators

I recently got an e-mail from a colleague “interested in getting data from patient’s wearables (that they already own) and were curious as to your suggestions on how best to do this.”

In a BYOD world, this is a totally reasonable idea: lots of people already have Jawbones or Fitbits or Apple Watches or Snapface, so can’t we just pull the data from the devices they already use?

The answer is “Yes you can”. You basically have two options:

  1. Sort the devices by popularity (Fitbit, Apple, Xiaomi, Samsung, Garmin) [2], then write a data ingestion pipeline against each vendor’s API
  2. Use a third party data aggregator, such as Validic or HumanAPI, and write your data ingestion pipeline using a single API

While accessing the device APIs directly is generally free, managing authentication and tokens and staying on top of changes to each vendor’s API is not. Aggregator services take care of all that and give you a nice unified, well-documented, and stable API. As you increase the number of devices and participants you want to support, the value of this service increases.
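
To give a feel for what option 1 involves, here is a sketch of the per-vendor normalization layer you end up maintaining yourself. The vendor names and payload shapes are invented for illustration; real APIs differ, and they change over time, which is exactly the maintenance cost an aggregator absorbs.

```python
from typing import Callable, Dict

Normalizer = Callable[[Dict], Dict]


def normalize_vendor_a(raw: Dict) -> Dict:
    # Hypothetical flat payload: {"date": "...", "steps": "1234"}
    return {"date": raw["date"], "steps": int(raw["steps"])}


def normalize_vendor_b(raw: Dict) -> Dict:
    # Hypothetical nested payload: {"day": "...", "activity": {"step_count": 1234}}
    return {"date": raw["day"], "steps": int(raw["activity"]["step_count"])}


# One entry per supported device vendor; every new device adds another adapter.
NORMALIZERS: Dict[str, Normalizer] = {
    "vendor_a": normalize_vendor_a,
    "vendor_b": normalize_vendor_b,
}


def normalize(vendor: str, raw: Dict) -> Dict:
    """Convert a vendor-specific record into the study's unified schema."""
    try:
        normalizer = NORMALIZERS[vendor]
    except KeyError:
        raise ValueError(f"no normalizer registered for {vendor!r}") from None
    return normalizer(raw)
```

With an aggregator, this table collapses to a single adapter for the aggregator's own schema.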

There are also platform-specific solutions, such as Apple’s HealthKit and Google Fit. By design, HealthKit stores data locally on the user’s mobile device. If you want to access data stored in HealthKit, you have to write an iOS app that reads the local data and sends it to your server. This is another area where third-party services can help, for a price.

Validation data

(or, “is this device just a random number generator?”)

This is a hard one. How do you know if the device can do what it says it can do? How sensitive is your study to the accuracy of the device? Does the device have systematic problems with some population of users?
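
If you can collect a reference measurement alongside the device for even a handful of participants, one common starting point (our suggestion, not something the vendors provide) is a Bland-Altman style summary: the mean difference between device and reference, plus approximate 95% limits of agreement. The numbers below are made up for illustration.

```python
import statistics
from typing import Sequence, Tuple


def agreement(device: Sequence[float], reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired device/reference readings."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)   # systematic over- or under-estimation
    spread = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical paired heart rate readings (beats per minute).
device_hr = [62, 71, 80, 90]
reference_hr = [60, 70, 80, 88]
bias, lower, upper = agreement(device_hr, reference_hr)
print(f"bias={bias:.2f} bpm, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Running the same summary separately for subgroups of participants is one way to look for systematic problems in a particular population.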

Recommendations to device makers

  1. Version your algorithms and APIs.
  2. Eat your own dog food - use your developer APIs to power your own applications whenever possible. This ensures that (a) we always have access to all of our data, (b) there are no discrepancies between the values displayed in the app and the values returned by the API, and (c) your API is more robust, because it has had a user (yourself) from the start.
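
To illustrate why the first recommendation matters: if every record carried a version tag, finding mid-study algorithm changes would take a few lines of code instead of a series of e-mails to engineering teams. The sketch below assumes a hypothetical `algorithm_version` field and ISO 8601 timestamps; we are not aware of a vendor API that exposes this today.

```python
from typing import Dict, Iterable, List, Tuple


def version_changes(records: Iterable[Dict]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_version, new_version) for each change, in time order."""
    changes = []
    previous = None
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    for rec in sorted(records, key=lambda r: r["recorded_at"]):
        current = rec["algorithm_version"]
        if previous is not None and current != previous:
            changes.append((rec["recorded_at"], previous, current))
        previous = current
    return changes


# Hypothetical nightly sleep records for one participant.
nights = [
    {"recorded_at": "2016-01-01T07:00:00", "algorithm_version": "1.0"},
    {"recorded_at": "2016-01-02T07:00:00", "algorithm_version": "1.0"},
    {"recorded_at": "2016-01-03T07:00:00", "algorithm_version": "1.1"},
]
print(version_changes(nights))
# [('2016-01-03T07:00:00', '1.0', '1.1')]
```

With change points in hand, a longitudinal analysis can include the algorithm version as a covariate or analyze each version's data separately.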