Patient-generated health data (PGHD) covers everything in a patient’s daily routine: sleep, physical activity, medicine intake, blood pressure and glucose measurement, nutrition, hydration and more. It is a relatively new source of patient information, and it keeps expanding as new meters and wearable devices appear.
Although PGHD has great potential for healthcare data analytics, it is not easy to tame this flow of information and mine it for valuable insights. We will consider the challenges ahead using diabetes-related data as an example.
Diabetes patients have to keep their blood glucose under control at all times, because high glucose levels can lead to severe complications (e.g., retinopathy, neuropathy), while low levels might cause coma and even death. Thus, patients should measure glucose a few times a day for the rest of their lives.
By analyzing patients’ glucose measurements along with other PGHD, caregivers are able to help individuals stay within their normal range and avoid serious health deterioration.
Interrupted measurements
A patient with diabetes may skip blood glucose measurements for several days or even weeks. As we can’t actually force patients to behave, only technology can help out in this situation. The analytics algorithm should handle interrupted measurements without producing errors or false reports.
For example, if a patient has gaps in glucose tracking in the middle of the month, linear regression over the entire month can still build a credible trend. However, if measurements only start in the middle of the month, a trend reported for the whole month will be faulty, because it extrapolates over days with no data at all.
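One way to picture this rule is a trend function that fits a least-squares line but refuses to report a monthly trend when the readings do not reach both ends of the month. This is only a sketch: the function name `monthly_trend`, the 30-day month and the 7-day `edge_margin` are illustrative assumptions, not values taken from any real product.

```python
from typing import Optional, Sequence, Tuple


def monthly_trend(
    days: Sequence[float],
    values: Sequence[float],
    month_length: int = 30,   # assumed month length
    edge_margin: int = 7,     # hypothetical tolerance at each end of the month
) -> Optional[Tuple[float, float]]:
    """Return (slope, intercept) of a least-squares line, or None when the
    readings do not cover both the start and the end of the month."""
    if len(days) != len(values) or len(days) < 2:
        return None
    # Gaps in the middle are tolerated; a late start or early stop is not.
    if min(days) > edge_margin or max(days) < month_length - edge_margin + 1:
        return None
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        return None  # all readings taken on the same day
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Gap in the middle of the month: a trend is still reported.
gap_days = list(range(1, 11)) + list(range(21, 31))
gap_values = [100 + 0.5 * d for d in gap_days]
trend_with_gap = monthly_trend(gap_days, gap_values)

# Readings start on day 16: no monthly trend is reported.
late_days = list(range(16, 31))
late_values = [100 + 0.5 * d for d in late_days]
trend_late_start = monthly_trend(late_days, late_values)
```

Returning `None` rather than a number keeps a report from showing a trend the data cannot support.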
Incomplete data
In the sweet dreams of every healthcare data scientist there are butterflies, unicorns and complete data on everything a patient does. In that world, every entry would include both cause (medication, meal, water intake, physical activity) and effect (blood glucose, pressure, temperature).
Sadly, real life doesn’t allow us to play with the unicorns. A few really motivated and self-disciplined patients will record all of their data daily, but even they can skip some measurements from time to time. Others have mood swings, and some days they don’t want their caregivers to know about a guilty doughnut with their morning coffee. Who can blame them? And some patients don’t want to report anything but their blood glucose.
The challenge is that analytics algorithms should still extract valuable information in the worst case and shine when the data flow is rich.
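As a rough illustration of working with whatever a patient chose to record, the sketch below summarizes diary entries in which any field may be missing. The entry layout and the field names `glucose` and `carbs_g` are hypothetical and chosen only for this example.

```python
from statistics import mean
from typing import Any, Dict, List, Optional


def summarize(entries: List[Dict[str, Optional[float]]]) -> Dict[str, Any]:
    """Use every glucose reading available, and build cause/effect pairs
    only from entries where both the meal and the glucose were recorded."""
    glucose = [e["glucose"] for e in entries if e.get("glucose") is not None]
    pairs = [
        (e["carbs_g"], e["glucose"])
        for e in entries
        if e.get("carbs_g") is not None and e.get("glucose") is not None
    ]
    return {
        "mean_glucose": mean(glucose) if glucose else None,
        "glucose_readings": len(glucose),
        "meal_glucose_pairs": pairs,
    }


diary = [
    {"glucose": 110, "carbs_g": 45},  # complete entry
    {"glucose": 150},                 # glucose only
    {"carbs_g": 30},                  # meal only, no measurement
]
summary = summarize(diary)
```

The glucose-only patient still gets a mean, while richer entries additionally feed the cause-and-effect pairs.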
Diverse frequency
Patients measure their blood glucose at arbitrary times, and the number of readings differs from day to day. Some individuals wear continuous glucose monitors, which collect readings every 5 minutes automatically. Others have only fingersticks and use them 7 times a day. Some patients even limit their measurements to 3 times a week. Algorithms have to understand what is going on in each of these cases and provide relevant outputs, which is not a piece of cake.
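A possible first step is to bring all three patterns to a common daily view and to label how densely a patient measures, so that later analysis can pick a suitable method. The sketch below is one way to do that; the function names and the thresholds of 48 and 1 readings per day are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[datetime, float]  # (timestamp, glucose in mg/dL)


def daily_summary(readings: Iterable[Reading]) -> Dict[date, Tuple[int, float]]:
    """Map each calendar day to (number of readings, mean glucose)."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for ts, value in readings:
        by_day[ts.date()].append(value)
    return {day: (len(vals), mean(vals)) for day, vals in sorted(by_day.items())}


def sampling_regime(readings: Iterable[Reading]) -> str:
    """Label the measuring pattern by average readings per day over the
    observed span, counting days without readings as well."""
    summary = daily_summary(readings)
    if not summary:
        return "none"
    span_days = (max(summary) - min(summary)).days + 1
    per_day = sum(count for count, _ in summary.values()) / span_days
    if per_day >= 48:   # hypothetical threshold for monitor-like density
        return "continuous"
    if per_day >= 1:    # hypothetical threshold for regular fingersticks
        return "daily"
    return "sparse"


start = datetime(2024, 1, 1)
# Continuous monitor: one reading every 5 minutes for a day.
cgm = [(start + timedelta(minutes=5 * i), 110.0) for i in range(288)]
# Fingersticks: 7 readings a day for 3 days.
sticks = [(start + timedelta(days=d, hours=2 * h + 7), 120.0)
          for d in range(3) for h in range(7)]
# About 3 readings a week for two weeks.
rare = [(start + timedelta(days=d, hours=8), 130.0) for d in (0, 2, 4, 7, 9, 11)]
```

With these samples, `sampling_regime` labels the three patients differently, and `daily_summary` gives each of them one comparable row per day.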
Systematic errors
Consumer devices simply can’t reach the level of lab equipment. And as PGHD is created with the former type of technology, there will be systematic and unknown errors.
For example, one patient’s glucometer can overstate the readings by 3%, while another’s understates them by 8%. And what should we do with that? Well, algorithms need to work with inaccuracies by taking the possible errors into account and using a range instead of the exact number provided by the meter.
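The idea of a range can be sketched as follows: each reading becomes an interval, and a reading is only flagged when the whole interval falls outside the target. The 10% `tolerance` and the 70 to 180 mg/dL target used here are illustrative assumptions, not clinical recommendations or the specification of any particular meter.

```python
from typing import Tuple


def true_value_range(reading: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Interval that should contain the true value if the meter is off
    by at most `tolerance` (an assumed relative error bound)."""
    return reading * (1 - tolerance), reading * (1 + tolerance)


def classify(
    reading: float,
    low: float = 70.0,       # assumed lower target, mg/dL
    high: float = 180.0,     # assumed upper target, mg/dL
    tolerance: float = 0.10,
) -> str:
    """Classify a reading using its whole interval instead of one number."""
    lo, hi = true_value_range(reading, tolerance)
    if hi < low:
        return "low"
    if lo > high:
        return "high"
    if lo >= low and hi <= high:
        return "in_range"
    return "uncertain"  # the interval straddles a target boundary


labels = [classify(r) for r in (50, 120, 175, 250)]
```

Readings near a boundary come back as "uncertain" rather than raising or suppressing an alert on the strength of a number the meter cannot guarantee.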
What’s next?
As you can see, patient-generated health data is naturally messy and foggy, as it depends on the patient’s personality and the devices he or she uses. So quality issues are inevitable. Dealing with them is the ultimate challenge, as valuable information will reveal itself only after applying robust algorithms, or even sets of algorithms in which different algorithms process data of different quality.
Handling each of these algorithms is up to savvy data scientists, and we are eager to get your opinion. What do you think about the challenges presented and how they could be tackled?
By Natallia Babrovich, Business Analyst at ScienceSoft