An industry and academic data-sharing project went live on Tuesday, nearly a year after the platform was expected to launch. Visitors to the site – no credentials needed – can register and access de-identified clinical data from the comparator arms of nine oncology trials, with more to come.
Project DataSphere, the newest data transparency and collaboration project with buy-in from several top pharmas and academic medical institutions, is now open for research. The launch delay was attributed to a need to expand the analytical tools available on the website.
The project’s goal is not to reassess clinical data on currently available treatments or failed compounds. Instead, participating organizations are deepening the pool of de-identified cancer patient information, which they hope will lead to more efficient clinical trial protocols, epidemiological studies and ultimately the emergence of new cancer therapies.
Data sets available on the Project DataSphere website are limited to comparator arms of late-stage oncology trials – there are nine data sets currently available – with an additional 25 data sets coming soon, according to the project’s leaders. Initial contributors include Sanofi, Celgene, AstraZeneca, Bayer, Pfizer, Johnson & Johnson and Memorial Sloan Kettering. Others are encouraged to participate, and organizations like Amgen, the National Cancer Institute’s Clinical Trials in Oncology, and Quintiles may soon join the list. SAS, the North Carolina-based software and analytics firm, is hosting the project’s website and providing analytical tools for researchers, pro bono.
Comparator arm data
Although active arm trial data – the pass/fail measure of any new therapeutic compound – is not available in Project DataSphere, data contributors say cancer control groups, particularly when pooled together, represent an important (and previously unavailable) resource for research. “Understanding the expected outcome for a comparator arm is a very important step in designing new clinical trials,” says Ronit Simantov, medical affairs lead, Pfizer oncology. “It will help with statistical assumptions, the sizing of a trial, and other design logistics.”
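To make the point about trial sizing concrete, here is a minimal sketch of how an assumed comparator-arm response rate feeds a sample-size estimate. It uses the standard normal-approximation formula for comparing two proportions; the function name `sample_size_per_arm` and all response rates below are hypothetical illustrations, not figures from Project DataSphere or from any contributor.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to compare two response rates.

    Normal-approximation formula for a two-sided test of two proportions.
    p_control is the assumed comparator-arm response rate, the kind of
    number that pooled historical control data could help pin down.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: same 15-point improvement, two control-rate guesses.
n_if_control_30 = sample_size_per_arm(0.30, 0.45)
n_if_control_20 = sample_size_per_arm(0.20, 0.35)
print(n_if_control_30, n_if_control_20)
```

Under these made-up inputs, the required enrollment shifts by roughly two dozen patients per arm depending only on the assumed control rate, which is why a better-grounded estimate of comparator outcomes matters at the design stage.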
It’s also easier to pry comparator arm data out of pharma company archives, since worries about giving away IP-related proprietary data are diminished. “The comparator arm is low-risk, at least in comparison to the EMA demanding that companies put all data out into the public domain,” says Joel Beetsch, VP, patient advocacy, at Celgene. “We’ve contributed a prostate cancer data set and we’re putting in breast and lung cancer data sets in the next month or so, with some hematology data to follow.”
Peter Doshi, an assistant professor at the University of Maryland’s School of Pharmacy, and an associate editor at BMJ, agrees that comparator arm data is valuable for cancer R&D going forward, but says Project DataSphere doesn’t address “the historical problem of lack of access to data to understand drugs that are in use today…in that sense it’s not addressing the aims of petitions like AllTrials, and the call to release all trial data.”
Will drug developers be able to harness aggregated comparator data from Project DataSphere to build “in silico” comparator arms for new active trials, and then use that data as part of a new drug submission? That’s an open question, but Beetsch and Simantov both spoke optimistically about the possibility.
To protect patient privacy, and to avoid violating HIPAA in the US and other privacy laws globally, data contributed to Project DataSphere must be rigorously de-identified. Organizations are responsible for de-identifying their own data prior to sharing it; variance across data sets is inevitable, but enforcing a single de-identification standard would potentially strip out useful information, says Beetsch. “We want to make sure we don’t reduce the data to its least common denominator.”
By now, most multinational pharmaceutical companies have at least some experience with de-identifying internal clinical data for various research purposes, but it’s not always a cut-and-dried process. Bradley Malin, a biomedical informatics and computer science professor at Vanderbilt University and an independent de-identification consultant for pharma companies working with Project DataSphere, authored an informative de-identification strategy paper for organizations grappling with the legal implications of de-identification. “It would be nice to have a single standard to facilitate integration [across data sets], but there’s a lot of diversity in the types of data being collected, and the context in which it’s being collected,” says Malin. “Does that prevent the ability to combine data sets? Not necessarily, but it may end up implying that the level at which data gets integrated is done at a higher level of granularity than what you might have [with a single data set] in its most specific form.”
Models for data-sharing
Charles Hugh-Jones, Sanofi’s chief medical officer for the US and one of the most outspoken advocates of the project, says there are three basic models for sharing data:
- Black box model: Researchers submit a research question, and get – or don’t get – an answer. This is the most common and the most restrictive model.
- Gatekeeper: Espoused by the PhRMA/EFPIA principles for sharing data. Less restrictive than the black box model, but still requires research credentials and an acceptable research proposal for access to data.
- Broad access: This is the Project DataSphere model. Open access to almost everyone, but data is often limited as a result (comparator arm data only in Project DataSphere).
All of these models are useful in different ways, since “not all data is the same, and not all sharing is the same,” says Hugh-Jones. “We have to help the general public understand that there are many flavors of data sharing…an ecosystem of data sharing rather than one place where all data is shared.”
Just as there are thousands of different cancer types, ranging from quite benign to extremely malignant, and different ways to treat them, there are also different sharing projects, with different aims, says Hugh-Jones. “There will be all sorts of models that hopefully will synergize to allow people to do different things.” The second phase of Project DataSphere will include research challenges, the first being a prostate cancer challenge in collaboration with the Prostate Cancer Foundation, says Hugh-Jones. There is also a social media component in the works that would allow researchers to discuss their projects on the Project DataSphere website. “My belief is that [Project DataSphere] will be maximally successful once we start to build in the social overlay, because then everyone – payers, researchers, companies, academia and others – can start talking to each other,” says Hugh-Jones.
Other data-sharing projects
In addition to the workshops the Institute of Medicine has convened on data-sharing, and the PhRMA/EFPIA data-sharing principles referenced above, many institutional and patient projects have surfaced in recent years. These include but certainly aren’t limited to Harvard’s Personal Genome Project, AHRQ’s Healthcare Cost and Utilization Project, NIH’s Big Data to Knowledge (BD2K) project, PatientsLikeMe, TransCelerate, the Clinical Data Interchange Standards Consortium (CDISC), the Immune Tolerance Network, ClinicalStudyDataRequest.com, and the Yale Open Data Access (YODA) Project.
Last January, Janssen Research and Development (a J&J subsidiary) partnered with YODA [insert Star Wars joke here] in what has been described as a “pioneering partnership model for sharing clinical data.” The project is distinct in that it “gives the decision-making power over access to data to the Yale group, rather than it being held by J&J,” says Doshi. “J&J is going quite a bit further, it seems, in its pledges to give independent investigators access to both arms of data for trials it has sponsored.”
Doshi says what’s often overlooked in the charge on pharma’s data vaults is the data that’s housed at FDA. “They singularly hold the most data across companies, across therapeutic areas, across time. EMA is releasing data, but why is FDA not similarly opening up its archives to the public?” asks Doshi. “I think the status quo for them is a more comfortable place to be than a revolution in access to data.”