In this interview, we speak to Dr. Maria Chatzou Dunford, CEO of Lifebit, about their technology, how it helps to empower people to make better use of their biomedical data, and what the future of technology in life sciences may look like.
Please can you introduce yourself and tell us about your role at Lifebit?
My name is Dr. Maria Chatzou Dunford, and I am the CEO of Lifebit. My background includes bioinformatics, medical informatics, and high-performance computing. Early in my career, I discovered that the bottleneck in discovering new drugs, therapies, and treatments was the translation of biomedical research. That personal experience led me to pursue a Ph.D. in biomedicine at Pompeu Fabra University.
Prior to Lifebit, I worked on solving the problem of scaling reproducible data analysis over any system at the Centre for Genomic Regulation (CRG) in Barcelona. It was there that we founded the widely used open-source workflow language, Nextflow, which now powers the majority of organizations performing production-scale genomics analysis.
At Lifebit, my role is to continue to pursue big data analysis of genomic markers and to break down silos of genomic data.
At Lifebit, you help to empower people to make better use of their biomedical data. Can you tell us more about Lifebit and what some of your aims and missions are?
At Lifebit, we have created a patented, federated technology that brings analysis and computation to where data resides. The generation of large amounts of biomedical data has become relatively straightforward; the challenge is how organizations can access and utilize biomedical data across thousands of disconnected locations.
With our technology, researchers can run analyses on multiple, distributed datasets in situ, avoiding the risky movement of highly sensitive data. We empower therapeutic leaders and pioneers in precision medicine to access and analyze siloed biomedical data that otherwise would never have been available to them.
Allowing research to be done while data remains under the control of the organizations that gathered it addresses legitimate concerns about patient data confidentiality and security. This method also allows researchers to gain faster data insights by linking to other data sources virtually.
At the end of the day, our mission at Lifebit is to create a world where access to biomedical data will never again be an obstacle for curing diseases.
Image Credit: Dusan Petkovic/Shutterstock.com
Statistics show that for every person in the world, 270GB of data will be created. What are some of the biggest challenges currently faced by scientists when trying to access and analyze this data? Why is it therefore so important to have tools available that make this process easier?
The biggest challenge that scientists currently face when trying to access and analyze data is handling the volume of data itself. Current technology used for data accessibility is no longer sustainable given that datasets have exponentially grown in size and complexity.
Recent studies estimate that by 2025, more than 500 million human genomes will be sequenced in a clinical environment. So, what does this mean? Traditional methods of moving sensitive data into a centralized environment, similar to a “data lake,” are not keeping up with demand. Not to mention, the traditional methods take a lot of time and money.
Another challenge that organizations face is the strict regulatory and data privacy frameworks that differ from country to country, which often means data cannot leave the country where it was collected. Consequently, data becomes siloed and goes unused. In the last couple of years, however, genomics has entered the 2.0 era, and with that, innovative new technologies for data accessibility are being put into practice.
Federated data access allows users to run analyses over multiple sets of disparate data as if they were coming from one location. This avoids the issue of cross-country data movement and ensures patient data security, as data never leaves the organization that obtained it.
At Lifebit, you have your Lifebit CloudOS platform. Can you tell us more about how this platform works and what types of data it is compatible with?
At Lifebit, our CloudOS platform provides an end-to-end solution for large-scale population genomic programs. Our platform is compatible with a wide range of health data - genomic, multi-omic, clinical variables, EHR data, real-world data, and more. In the beginning, the clinico-genomic data is cleaned up because the quality of the data being accessed is very important. From here we set up the infrastructure, which enables research and clinical insights. We guide organizations through the entire process.
We have two key user groups - Data Custodians, those who generate and aggregate the data (i.e. national population genomic initiatives, healthcare providers, and biobanks), and Data Consumers, those who analyze the data for Clinical Diagnosis and R&D purposes (i.e. pharmaceutical companies and public/private research organizations).
We guide Data Custodians to achieve ultimate usability for their data by deploying our Trusted Research Environment within their infrastructure (e.g. cloud, hybrid, or on-premise). It is there that we are able to integrate with hospitals, sequencing centers, and other institutions to abstract, standardize and obtain clinico-genomic linkage for the data. Lifebit utilizes industry-standard genomic and clinical data models such as FHIR, OMOP, and HL7. However, Lifebit CloudOS is data agnostic, and the platform benefits from a fully scalable database able to query tens of thousands of clinical phenotypic variables, thousands of annotations, billions of genotypes and tens of millions of individuals in milliseconds.
Image Credit: Lifebit
What are some of the advantages of using the Lifebit CloudOS platform compared to other cloud-based systems available?
One of the biggest advantages of using Lifebit CloudOS compared to other cloud-based systems is that there is no transfer of sensitive biomedical data out of the organization that holds the data. Such transfers are common with some other cloud-based systems; Lifebit’s technology avoids them by orchestrating federated data analysis and management directly in the user’s cloud, HPC, or hybrid infrastructure.
Once data is transferred out of the initial environment, the organization no longer has control over its data and lacks appropriate auditing of its data and analyses. With CloudOS, on the other hand, customers are in complete control of their data and how it is handled, and they are provided with an end-to-end auditing report of what happens to their data.
Other cloud-based systems do not provide companies and research organizations with the flexibility needed in terms of workflows and tools to effectively analyze data, as customers are restricted to whichever pipelines are available in the specific platforms. CloudOS allows customers to easily and seamlessly transfer, integrate and utilize their own or the community’s analytical pipelines in one platform. Doing so accelerates workflows and allows them to protect their valuable IP while accessing all open-source tools and pipelines.
Other cloud-based systems provide a proprietary cloud platform as a service through a pay-as-you-go model, charging for computation and/or data storage. Lifebit’s CloudOS does not constrain users to a rigid cloud platform. Instead, it gives users the flexibility to have a native operating system that can power any computer, cluster, or cloud system to accelerate genomic data analysis.
Making data more accessible allows scientists more time to focus on the science itself. Why is this so important for new discoveries?
Increased availability of biomedical data has a tremendous impact on the research being conducted into new treatments, therapies, and diagnoses. As previously mentioned, accessing data the traditional way can take an enormous amount of time and money. Researchers need secure, accessible, and collaborative platforms that can keep pace with new technological advances, comply with regulatory requirements, and manage large amounts of data successfully and securely. Giving scientists access to more robust clinical data sets of cohorts increases data diversity, which can power new discoveries.
A lack of data diversity can also limit researchers’ ability to accurately interpret genetic data. Replicating genetic associations in different cohorts and populations is an important step in validating that the associations are not false positives. Differences in underlying genetic architecture across population groups mean that making estimates on genetic risk for disease based on studies of European populations could result in entirely inaccurate estimates in non-European populations.
The ongoing COVID-19 pandemic has also highlighted the importance of data within research. What involvement did Lifebit have within the pandemic?
Early on in the pandemic, we were selected by Genomics England (GEL) to set up a dedicated cloud-based environment for researchers who were working on COVID-19 vaccines and treatments. Researchers were able to analyze and collaborate over large sets of genomic and medical data within a matter of seconds to answer the most burning questions about the virus. The enhanced technology and automated tools sped up researchers’ understanding of the underlying genetic factors that would explain why some individuals were more susceptible to the virus or had severe and life-threatening symptoms.
The pandemic was a key moment in history, where the research and findings on a new virus impacted the greater human population. Successful milestones were met thanks to increased access to biomedical data during the pandemic. In fact, later in 2020, work done by GenOMICC and Genomics England delivered a major breakthrough for COVID-19, identifying five biomarkers that could be key to new COVID-19 treatments.
Image Credit: 3DJustincase/Shutterstock.com
Many companies and organizations will require different information from their datasets. How can you tailor your solutions at Lifebit to the customer’s needs and requirements?
The beauty of Lifebit’s CloudOS technology is that we are able to tailor it to the customer’s needs and requirements. For example, we enable end-users to bring any public or private dataset into the platform within a matter of seconds. Users can also bring their own analytical tools and libraries onto the Lifebit CloudOS platform for joint data analysis. In the case of Genomics England, Lifebit helped them to curate and import over 80 proprietary workflows and pipelines into their platform, and enabled their industry partners to link their own tools from public and private repositories without exposing IP.
You have recently announced a partnership with Boehringer Ingelheim. Can you tell us more about this partnership and what you are trying to achieve together?
We are very pleased to collaborate with Boehringer Ingelheim. Using Lifebit’s CloudOS technology as the architecture, Boehringer Ingelheim will build a scalable data, analytics, and infrastructure platform that will make it possible to capture translational disease insights from large external healthcare biobanks. Translational insights derived through Lifebit’s CloudOS allow researchers to “translate” scientific findings into actual therapeutic interventions for the purposes of improving care.
Maximizing the value and diversity of data among different cohorts and biobanks on a global level will transform R&D entirely. The collaboration will have a powerful and significant impact on the clinical development pipeline across many disease areas.
As technology within science continues to evolve at a rapid pace, what do you see the future of technology within science looking like? Are there any particular trends you foresee?
As technology advances, I believe that diagnosis will accelerate because of AI and improved shared access to patient data. What is interesting is that it can take up to several years to diagnose a rare disease. When you combine improved access to genomics data with AI, records can be read faster and the time to diagnosis will be shortened. Rather than waiting four years to be diagnosed, just imagine if patients could be diagnosed in a matter of days or weeks.
This acceleration is becoming increasingly relevant, as demonstrated in a recent study from Stanford, where researchers were able to diagnose a young patient in 5 hours and 2 minutes. That is a feat normally considered unachievable for a genetic diagnosis, which could typically take years. Combining rapid genome sequencing with powerful federated access to distributed datasets promises to be a game-changer for patient care and precision medicine.
What is next for Lifebit?
We are extremely excited to expand our partnerships with other global governments, biobanks, and organizations that will greatly impact the life science community for the better. Federation and AI will support us in our mission by allowing researchers to apply their algorithms and analytics to distributed data sets.
Lifebit’s transformational strategy is uniting the global genomics community in a shared vision for the future that will deliver wide-ranging benefits at a population scale, ensuring patients receive the best possible predictive, preventive, and personalized care by harnessing the potential of large-scale linked clinical and genomic data.
Where can readers find more information?
About Dr. Maria Chatzou Dunford
Dr. Maria Chatzou Dunford holds a Ph.D. in Biomedicine, MSc in Bioinformatics, and BSc in Computer Science and Biomedical Informatics. She is a biotech innovator and a proud geek, with unique expertise in bioinformatics, medical informatics, High-Performance Computing, Machine Learning, and Artificial Intelligence. A passionate entrepreneur, Dr. Chatzou Dunford has founded two companies: Innovation Forum Barcelona and Lifebit, creators of Lifebit CloudOS and AI Engine — intelligent cloud-based genomics technology that enables integration and analysis of diverse data to accelerate insights in a safe, FAIR, reproducible, standardized, and cost-effective way.
As a researcher at the Centre for Genomic Regulation in Barcelona, Dr. Chatzou Dunford designed tools and methods that facilitate the analysis of Big Biomedical Data, enabling biological discoveries and promoting personalized medicine. She was part of the development team of Nextflow, the programming framework revolutionizing the computational analysis of genomic data.