Omics data related to the genome, proteome and transcriptome is the very lifeblood of Life Science research. And indeed, since all genomes are capable of expression as pure data, analytics and data management form the beating heart of the practice of genomics – the breakthroughs, discoveries and innovations in the Biopharma, Life Science and healthcare industries.
However, at present, omics data analysis is massively complex and punishingly laborious, involving techniques that demand enormous amounts of computing power and data storage capacity, and often require the skills of specialist practitioners to perform. There are myriad challenges involved with currently available methods for omics data analysis – data silos, fragmented data types, lack of adequate data integration to enable unified big picture insights, and a disparate tool environment filled with solutions that, frankly, have significant limitations in terms of speed, accuracy, and scalability.
The need of the hour, therefore, is for a bleeding-edge solution capable of coping with the dynamics of integrative analysis of omics big data.
The Retrieve & Relate (R&R) research platform developed by Belgian startup BioStrand is breaking new ground with its revolutionary approach to omics database access, integration and data analysis.
BioStrand has developed a technique for indexing cellular blueprints and building blocks, resulting in an entirely new and more efficient way of handling the large data sets involved in Life Science. BioStrand’s Retrieve & Relate (R&R) platform reduces the complexity of analysis and improves ease of use by decreasing the number of steps required, enabling the fast and accurate detection and study of similar sequences, multiple sequence alignment, and annotation through a simple, one-source interface.
The platform is available to subscribers in three flavours – R&R, R&R+, and R&R² – each of which imparts outstanding benefits to the user, in terms of convenience, scalability, and integration.
Life Sciences research imposes a unique set of obstacles to data entry and extraction, as it incorporates data and techniques from thousands of projects going back decades. A large number of databases exist, holding a heterogeneous mix of information from different origins and varying significantly in size and quality. Traditionally, dataset comparison is a disjointed process that can require significant extra work.
R&R comes pre-loaded with 350 million sequences spanning 11 of the most popular publicly available databases. The platform is continuously updated and, with sequence analysis conducted through a single “Google style” search bar, you can search simultaneously through the public databases and enjoy a quick and convenient way of searching, relating, and annotating sequences. In addition, you can search through the sequences incorporated in patent databases, and establish at the start of your research whether you are working with a sequence that is already patented for a certain application.
Scaling is possible for millions of sequences through the R&R platform, thanks to a biological discovery called HYFT™ patterns – signature sequences in DNA, RNA and AA that serve as biological fingerprints. At BioStrand we have detected, precomputed and indexed more than half a billion such patterns. Using indexing principles rather than letter-by-letter sequence matching, you can look for alignment, similarities and differences in sequences, and retrieve results almost instantaneously.
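To illustrate the general idea of index-based lookup as opposed to letter-by-letter comparison, here is a minimal Python sketch. It is not BioStrand's method: HYFT™ patterns are proprietary, so fixed-length substrings (k-mers) stand in for them as an assumption, and the function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def build_pattern_index(sequences, k=4):
    """Map every length-k substring to the ids of the sequences containing it.

    Assumption: fixed-length k-mers stand in for signature patterns here.
    """
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def rank_by_shared_patterns(index, query, k=4):
    """Rank stored sequences by how many distinct query k-mers they contain."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    hits = defaultdict(int)
    for kmer in query_kmers:
        # One dictionary lookup per pattern; no stored sequence is rescanned.
        for seq_id in index.get(kmer, ()):
            hits[seq_id] += 1
    # Most shared patterns first, ties broken by id for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration only.
sequences = {
    "seq_a": "ATGGCGTACGTTAG",
    "seq_b": "TTGACCGGTAAACC",
    "seq_c": "ATGGCGTTCGTTAG",
}
index = build_pattern_index(sequences)
ranking = rank_by_shared_patterns(index, "GGCGTACG")
print(ranking)
```

The index is built once, so the cost of each query depends on the number of patterns in the query rather than on the total volume of stored sequence data, which is the property that makes precomputed indexing attractive at scale.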
One of the keys to unlocking the vast potential of genetic information lies in the integrated analysis of DNA, RNA and proteins. R&R makes this possible via HYFTs™, which serve as structural anchor points that carry a multitude of information layers.
In constructing R&R+, BioStrand has overcome the scalability issues typically associated with existing omics big data analysis, with a container-based architecture that can scale automatically as required, improving business efficiency.
To handle the vast data volumes generated by genomics, BioStrand has constructed a SaaS platform that will eventually be capable of indexing hundreds of petabytes of data, handling its normalisation, storage, analysis, cross-comparison, and presentation. To this end, BioStrand has already normalised and indexed publicly available data sources.
By matching against 660 million HYFT™ patterns, BioStrand can index a million sequences with an average length of 320 characters, in about three minutes.
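As a rough back-of-the-envelope check, the figures quoted above can be converted into a throughput rate. The short Python calculation below assumes that "about three minutes" means exactly 180 seconds, so the results are approximations only.

```python
# Figures taken from the text; the exact elapsed time is an assumption.
sequences_indexed = 1_000_000
average_length = 320        # characters per sequence
elapsed_seconds = 3 * 60    # "about three minutes", taken as 180 s

total_characters = sequences_indexed * average_length
sequences_per_second = sequences_indexed / elapsed_seconds
characters_per_second = total_characters / elapsed_seconds

print(f"Total characters indexed: {total_characters:,}")
print(f"Sequences per second:     {sequences_per_second:,.0f}")
print(f"Characters per second:    {characters_per_second:,.0f}")
```

Under that assumption the run covers 320 million characters, or roughly 5,600 sequences and 1.8 million characters per second.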
For R&R+ users, this means that you can conveniently match, expand, and search all of your omics data. R&R+ integrates your own proprietary information by default, so you can search across your own data and the pre-indexed public knowledge bases together, and use that combination to enrich your work.
R&R+ expands the knowledge base with which you can compare new or unknown sequences. And all your research data can be made actionable and insightful without you having to worry about scalability and exponentially increasing compute costs. You pay only for the storage of your data – and on the R&R+ platform, secure storage is guaranteed.
Normalisation to a common format prior to indexing is a major part of the early work in omics data formatting, and BioStrand is developing what could become a new universal schema for storing omics big data and metadata for use across multiple platforms, techniques, and tools.
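To give a sense of what normalisation to a common format can look like in practice, the Python sketch below converts FASTA text into uniform records. It is purely illustrative: the `SequenceRecord` fields, the `guess_molecule` heuristic, and the function names are hypothetical and do not represent BioStrand's schema.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """Hypothetical common record; not BioStrand's actual schema."""
    record_id: str
    source: str
    molecule: str          # "DNA", "RNA" or "AA"
    sequence: str
    metadata: dict = field(default_factory=dict)


def guess_molecule(sequence):
    """Crude alphabet check, assumed here for illustration only."""
    letters = set(sequence)
    if letters <= set("ACGTN"):
        return "DNA"
    if letters <= set("ACGUN"):
        return "RNA"
    return "AA"


def normalise_fasta(text, source):
    """Turn FASTA text into a list of SequenceRecord objects."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: first token is the id, the rest is a description.
            record_id, _, description = line[1:].partition(" ")
            records.append(
                SequenceRecord(record_id, source, "", "",
                               {"description": description})
            )
        elif records:
            # Sequence line: append to the most recent record, upper-cased.
            records[-1].sequence += line.upper()
    for record in records:
        record.molecule = guess_molecule(record.sequence)
    return records


# Toy input, invented for illustration only.
fasta_text = ">x1 example entry\nacgt\nTTGA\n>p1 toy peptide\nMKVL\n"
records = normalise_fasta(fasta_text, source="toy_db")
for record in records:
    print(record)
```

Once every input has been mapped to the same record shape, whatever its origin, downstream indexing and comparison steps only need to handle that one format.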
If you’ve already developed a robust pipeline or are operating in an enterprise ecosystem, R&R² offers hyper-scalability and can handle massive datasets or batch processes. The platform is tailored for implementation directly within your existing ecosystem, integrating fully with your infrastructure through Application Programming Interface (API) or Software Development Kit (SDK) connections.
As part of its self-service SaaS platform, BioStrand is constructing a set of services that will eventually incorporate its own and customer machine learning and AI tools, and let R&R² customers work on their own and third-party datasets in a secure environment.
And, again, as an R&R² customer you enjoy BioStrand’s unprecedented processing times without having to worry about exponentially increasing compute costs as data size increases. R&R² handles all the normalisation, analysis, presentation, and cross-comparison.
The volume, complexity, diversity and siloed nature of omics has historically been seen as a reality that has to be accommodated rather than addressed. But as we have demonstrated, it is possible, with a little bit of technology, ingenuity and innovation, to make omics data research less about the data and more about the research. At BioStrand, we focus on making all omics data computable out-of-the-box so that you can focus on research, analysis, insights and breakthroughs.