Automated De-Identification For Personal Health Data Privacy
Adrian Bridgwater, Senior Contributor
BIRMINGHAM, ENGLAND - JANUARY 22: Editor's Note: This image may have been digitally manipulated for confidentiality to remove any patient identity data. An embryologist fertilises embryos in the fertility laboratory at Birmingham Women's Hospital fertility clinic on January 22, 2015 in Birmingham, England. Birmingham Women's Hospital provides a range of health services to women and their families using the latest scientific procedures and care. Last year the maternity unit delivered over 8,000 babies, cared for 50,000 patients and performed over 3,000 procedures in its state-of-the-art theatres. The hospital is also home to world-renowned research scientists, a fertility clinic and the national sperm bank. (Photo by Christopher Furlong/Getty Images)
People create data. Every interaction we humans make with our apps, machines, devices, services and computing platforms initiates computing ‘events’, which in turn create log files and ultimately form some part of the planet’s ever-growing data mountain.
As we increasingly digitize our lives, more and more of that ‘people data’ is specific to us as individuals and therefore sensitive from a privacy and security perspective. These days we call that kind of data Personally Identifiable Information (PII), and managing and working with it requires kid gloves.
Personally Identifiable Information (PII)
Although all PII is arguably fairly important, we really start to care about it when we consider its use in highly-regulated industries like healthcare and finance. Specialist enterprise technology vendors now exist to help manage, secure and cleanse data used in this space to protect individuals’ rights, freedoms and privacy.
This is data that is governed by a wide range of regulations, such as HIPAA for healthcare in the USA, GDPR in the EU and CCPA in California. In the USA, more privacy laws are also emerging at the state level; regulation and compliance are now a core part of the data fabric that we are all exposed to every day.
To take a working example, imagine analyzing data on a wearable device in combination with a person’s medical prescription data. This could provide a holistic picture of a person’s health and habits and that information could yield better preventive care. By doing this at scale across populations, companies in healthcare could uncover new insights, identify patterns and potentially improve healthcare outcomes for thousands of people, maybe millions.
The problem is that mashing together data types that include sensitive healthcare data raises significant privacy concerns, which must be addressed with the utmost care in order to protect people and comply with regulations. This is the caveat highlighted by Shubh Sinha, co-founder of Integral, an organization known for its Certifications-as-a-Service technology, which provides privacy automation services to create de-identified data that companies can use to discover insights and make business decisions.
Helping health & healthcare
“Combining healthcare data that contains medical records, health histories and genetic information with non-healthcare data such as social media activity, purchasing history and location data, allows researchers to gain a deeper understanding of a person's health and lifestyle, which can be invaluable for healthcare research. However, this combination also poses a significant threat to people's privacy, as it exposes a wealth of intimate information about individuals,” explained Sinha.
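To make that risk concrete, the short Python sketch below measures how many records share the same combination of indirect identifiers, a notion usually called k-anonymity. It is an illustration only and not a description of Integral's technology; the records, the field names (age_band, zip3, gym_visits) and the smallest_group helper are all made up for this example.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for the given quasi-identifier fields (the k in k-anonymity)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical, made-up records; no real patient data.
records = [
    {"age_band": "30-39", "zip3": "941", "gym_visits": 3, "condition": "asthma"},
    {"age_band": "30-39", "zip3": "941", "gym_visits": 0, "condition": "diabetes"},
    {"age_band": "30-39", "zip3": "941", "gym_visits": 3, "condition": "asthma"},
    {"age_band": "40-49", "zip3": "100", "gym_visits": 5, "condition": "none"},
    {"age_band": "40-49", "zip3": "100", "gym_visits": 5, "condition": "hypertension"},
]

# Healthcare-style fields alone: every person hides in a group of at least two.
k_health_only = smallest_group(records, ["age_band", "zip3"])

# Add one lifestyle field (assumed to come from a wearable or app) and
# at least one person becomes unique in the combined dataset.
k_combined = smallest_group(records, ["age_band", "zip3", "gym_visits"])

print(k_health_only, k_combined)
```

In this toy dataset the smallest group shrinks from two records to one once the lifestyle field is joined in, which means at least one person could be singled out even though no name or ID number appears anywhere.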
So, how does this get resolved?
Today, in order to ensure the privacy of data, companies contract consultants to analyze data and produce a list of privacy risks. The consultants and companies then work together to remediate datasets so that they are safe and actionable. However, this process can typically take a month (perhaps two) because the work is largely manual (and expensive).
To attempt to address this problem, Sinha co-founded Integral to create technology that automates the process of identifying and classifying sensitive data in order for it to meet certified data compliance standards that uphold privacy regulations. Essentially, the company is delivering automation software that helps preserve privacy for working with sensitive data.
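As a rough idea of what automatically identifying and classifying sensitive data can look like at its very simplest, here is a minimal Python sketch based on pattern matching. It does not represent Integral's software, which the article does not describe at code level; the three patterns, the label names and the classify and redact helpers are assumptions chosen for illustration, and a production system would need far broader coverage plus statistical checks.

```python
import re

# Hypothetical patterns for illustration only; real detectors cover many
# more identifier types and formats.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text):
    """Return the sorted list of sensitive-data labels found in the text."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))


def redact(text):
    """Replace every match with a placeholder that names its label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


# Made-up free-text note containing two direct identifiers.
note = "Patient reachable at jane.doe@example.com or 555-867-5309."
labels = classify(note)
clean = redact(note)

print(labels)
print(clean)
```

Classifying first and redacting second keeps a record of which kinds of identifier were present, which is the sort of evidence a compliance review would typically ask for.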
Information assimilation
“The assimilation of data has the potential to revolutionize healthcare research and innovation. However, this must be done with great care and sensitivity to protect people's privacy. The use of automated privacy technology is critical to achieving this goal, ensuring that personal information is protected and that people's privacy rights are respected. By prioritizing privacy and with automation, the full potential of data has the potential to improve healthcare outcomes for all,” said Sinha.
By providing faster, compliant access to data and enabling it to be shared safely, Integral claims to be opening new possibilities for healthcare, i.e. better patient care, experiences and treatments, plus faster drug development. The company says it does this while helping to maintain the privacy of personal data and making available data that can make a difference in people’s lives.
A technology tailwind
But why wasn’t this done before? Sinha says he is well aware of the question.
He says that a key reason is the current ‘tailwind’ for privacy-compliant data analytics: data privacy is now in the spotlight, so companies care more about it and are putting more effort into being privacy-first. This means they wait until everything is safe before they go ahead with analysis, but that wait time needs to be as short as possible, hence the need for automation.
"User data has exploded across all verticals (healthcare, wearable, geographic, demographic) and this has been further fuelled in recent years (and during Covid-19), because of the now quite widespread use of digital health devices that people have adopted. There are massive repositories of this data, but stitching it together for analysis was not always done due to privacy concerns," concluded Sinha.
Automation was the answer in this case, but automation in the form of software robot ‘bots’, accelerators and all forms of Artificial Intelligence (AI) and Machine Learning (ML) is providing the answers (plural) in almost all burgeoning areas of enterprise technology this decade.