In our previous blog post – ‘The Imperative for Bioinformatics-as-a-Service’ – we addressed the profusion of choice in computational solutions in the field of bioinformatics research. Traditionally, there has been a systemic, acute, and documented dearth of off-the-shelf technological solutions designed specifically for the scientific research community. In bioinformatics and omics research, this has translated into the necessity for users to invent their own system configurations, data pipelines, and workflows that best suit their research objectives.
This years-long DIY movement has generated a rich corpus of specialised bioinformatics tools and databases that the next generation of bioinformaticians can now broker, adapt, and chain into a sequence of point solutions.
On the one hand, next-generation high throughput sequencing technologies are churning out genomics data more quickly, accurately, and cost-effectively than ever before. On the other, the pronounced lack of next-generation high throughput sequence analysis technologies still requires researchers to build or broker their own computational solutions that are capable of coping with the volume and complexity of digital age genomics big data. As a result, bioinformatics workflows are becoming longer, toolchains have grown more complex, and the number of software tools, programming interfaces, and libraries that have to be integrated has multiplied.
Even as cloud-based models like SaaS become the default for software delivery across every industry, bioinformatics and omics research remain stranded in this DIY status. The industry urgently needs to shift to a cloud-based as-a-service paradigm that lets researchers apply their talents more productively to data-driven omics innovation and insights, instead of grappling with improvisation and implementation.
Even as the cloud has evolved into the de-facto platform for advanced analytics, the long-running theme of enabling self-service analytics for non-technical users and citizen data scientists has undergone a radical reinterpretation. For instance, predefined dashboards that support intuitive data manipulation and exploration have become a key differentiating factor for solutions in the marketplace.
However, according to Gartner’s top ten data and analytics technology trends for 2021, dashboards will have to be supplemented with more intelligent capabilities in order to extend analytical power, which thus far has been available only to specialist data scientists and analysts, to non-technical augmented consumers. These augmented analytics solutions enable AI/ML-powered automation across the entire data science process, from data preparation to insight generation. They also feature natural language interfaces, built on NLP/NLG technologies, that simplify how augmented consumers query and consume their insights, and they democratize the development, management, and deployment of AI/ML models.
Specialized bioinformatics-as-a-service platforms need to adopt a similar development trajectory. The focus has to be on completely eliminating the tedium of wrangling with disparate technologies, tools, and interfaces, and empowering a new generation of augmented bioinformaticians to focus on their core research.
A single human genome sequence contains about 200 gigabytes of data. As genome sequencing becomes more affordable, data from the human genome alone is expected to add up to over 40 exabytes by 2025. This is not a scale that a motley assortment of technologies and tools can accommodate.
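To put these figures in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the two numbers quoted above at face value and assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the variable names are purely illustrative.

```python
# Back-of-the-envelope scale check using the figures quoted in the text.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
GB = 10**9
EB = 10**18

bytes_per_genome = 200 * GB    # "about 200 gigabytes" per genome sequence
projected_total = 40 * EB      # "over 40 exabytes by 2025"

# How many 200 GB genomes would it take to reach 40 EB?
genomes_equivalent = projected_total // bytes_per_genome

print(f"{genomes_equivalent:,} genome-equivalents")
```

In other words, the projected volume corresponds to roughly 200 million genome sequences at the quoted size.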
In contrast, bioinformatics-as-a-service platforms are designed with these data volumes in mind. A robust and scalable SaaS platform is built to handle the normalization, storage, analysis, cross-comparison, and presentation of petabytes of genomics data. For instance, our Retrieve & Relate (R&R) platform utilises a container-based architecture to auto-scale seamlessly to handle over 200 petabytes of data with zero on-ramping issues.
And scalability is not just about capacity. SaaS platforms also offer high vertical scalability in terms of services and features that researchers need to access. All R&R users have a simple “Google-style” search bar access to 350 million sequences spanning 11 of the most popular publicly available databases, as well as to in-built tools for sequence analysis, multiple sequence alignment, and protein domain analysis. R&R also comes in three different flavours, each with additional features such as a one-click integration with proprietary data (R&R+) and with existing research infrastructure (R&R2).
Over and above all this, SaaS solutions no longer restrict research to the lab environment. Researchers can now access powerful and comprehensive bioinformatics-as-a-service via laptops – or even their smartphones if mobile-first turns out to be the next big SaaS trend – in the comfort of their own homes or their favourite coffee shop.
Bioinformatics has typically involved a trade-off between speed and accuracy. In some cases, methodologies make reductive assumptions about the data to deliver quicker results, while in others the error rate may increase in proportion to the complexity of a query. In multi-tool research environments, the end result is simply the sum of the outputs received from each module in the sequence. This means that errors generated in one process are neither flagged nor addressed in subsequent stages, leading to an accumulation of errors in the final analysis.
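The accumulation of errors described above can be illustrated with a minimal sketch. It assumes, for simplicity, that each stage of a pipeline introduces errors independently and that no later stage corrects them; the stage count and error rates below are hypothetical and not measurements of any real tool.

```python
from math import prod


def end_to_end_accuracy(stage_error_rates):
    """Probability that a result passes every stage without error.

    Assumes stage errors are independent and never corrected downstream,
    so the per-stage success probabilities simply multiply.
    """
    return prod(1.0 - rate for rate in stage_error_rates)


# Hypothetical per-stage error rates for a four-tool chain.
stage_error_rates = [0.01, 0.02, 0.05, 0.02]

accuracy = end_to_end_accuracy(stage_error_rates)
worst_single_stage = max(stage_error_rates)

print(f"Worst single-stage error rate: {worst_single_stage:.1%}")
print(f"End-to-end error rate:         {1.0 - accuracy:.1%}")
```

Under these assumptions, the end-to-end error rate ends up well above that of any individual stage, which is why unflagged errors in a chained toolset matter more as workflows grow longer.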
A truly integrated multi-level solution consolidates the disparate stages of conventional bioinformatics and omics data analysis into one seamlessly integrated platform that facilitates in-depth data exploration, maximizes researchers’ view of their data, and accelerates time-to-insight without compromising on accuracy.
With a SaaS solution, end-users no longer need to worry about updates, patch management, and upgrades.
With vertical SaaS solutions, such as bioinformatics-as-a-service, continuous innovation becomes a priority to sustain vertical growth in a narrow market. For users, this translates into more frequent rollouts of new features and capabilities based on user feedback to address real pain points in the industry.
For instance, in the few months since the official launch of R&R, we have added new capabilities for SDK/API-based integrations with proprietary data and infrastructure, and expanded our tools and expertise to assay design, drug development, gene therapy, crop protection products, and biomarkers. We are also building out an AI platform with state-of-the-art graph-based data mining to discover and synthesise knowledge from a multitude of information sources.
SaaS is currently the largest segment in the public cloud services market – and yet the segment’s footprint in bioinformatics is virtually non-existent. Today, there are a few cloud-based technologies targeted at genomic applications that focus on specific workflows like sequence alignment, short read mapping, SNP identification, etc. However, what the industry really needs is a cloud-based end-to-end bioinformatics-as-a-service solution that abstracts all the technological complexity to deliver simple yet powerful tools for bioinformaticians and omics researchers.