David Gershgorn reports:

Some of Google’s top AI researchers are trying to predict your medical outcome as soon as you’re admitted to the hospital.

A new research paper, published Jan. 24 with 34 co-authors and not peer-reviewed, claims better accuracy than existing software at predicting outcomes like whether a patient will die in the hospital, be discharged and readmitted, and their final diagnosis. To conduct the study, Google obtained de-identified data of 216,221 adults, with more than 46 billion data points between them. The data span 11 combined years at two hospitals, University of California San Francisco Medical Center (from 2012-2016) and University of Chicago Medicine (2009-2016).

Read more on Quartz.

OK, now if this is accurate, it sounds really promising, right? But I wondered how they got so much de-identified medical data on so many people. So I took a look at the paper’s methods section and here’s what it says:

We included EHR data from the University of California, San Francisco (UCSF) from 2012-2016, and the University of Chicago Medicine (UCM) from 2009-2016. We refer to each health system as Hospital A and Hospital B. All electronic health records were de-identified, except that dates of service were maintained in the UCM dataset. Both datasets contained patient demographics, provider orders, diagnoses, procedures, medications, laboratory values, vital signs, and flowsheet data, which represents all other structured data elements (e.g. nursing flowsheets), from all inpatient and outpatient encounters. The UCM dataset (but not UCSF) additionally contained de-identified, free-text medical notes. Each dataset was kept in an encrypted, access-controlled, and audited sandbox.

Ethics review and institutional review boards approved the study with waiver of informed consent or exemption at each institution.

So if you went to either of these hospitals, the hospital might have subsequently waived your informed consent and just turned over data on you that everyone believes is de-identified. Now it’s great that it was kept encrypted, access-controlled, and in an audited sandbox, but here’s the thing: are you okay with a hospital waiving your informed consent? How difficult might it be to re-identify the data?

I know a lot of people feel that it’s okay for entities to do this (waive consent) because it’s in the best interests of public health and progress, but of course, I focus on the individual’s rights. So think about it… is this okay and if it’s not, how does that affect your use of a particular hospital? Would you say or do anything different?


  2 Responses to “Google is using 46 billion data points to predict the medical outcomes of hospital patients”

  1. Maybe it’s time to go back to old-fashioned paper files?

    Why is it always necessary to keep up with digital technology, especially in the medical field? The most secure place is a locked filing cabinet.

    • Amen. The govt required covered entities to go digital but the field wasn’t ready. And now we have massive breaches, but even more worryingly, the systems aren’t interoperable. I can get only part of my records from my doctor, because the system they used beforehand won’t release the records in a way that they can be imported into the new/different system, etc. Bah….

