Imagine a scenario wherein the oil & gas industry has honed the science of oil prospecting and exploration – perhaps the most challenging and speculative phases of the fossil fuel value chain – to such a degree of sophistication as to map commercially viable fuel fields almost at will and with 100% accuracy. Now, can you imagine the oil industry deploying drilling and refining technologies from the 70s and 80s to realise value from this revolutionary new development?
What’s hard to imagine for the oil & gas industry is practically the reality when it comes to genomics.
If data is the new oil, then genomics is currently facing a gusher.
Innovations in upstream genomics processes have substantially reduced the time and cost of genomic sequencing. The project to sequence the human genome was completed in 2003, well ahead of schedule and below cost estimates. Since then, researchers have been expanding this data trove with complete genome sequences of hundreds of animal and plant species, with the list expanding almost daily.
However, the value of all this data lies almost entirely in the sophistication and capabilities of the processes that follow.
Take comparative genomics as an example – a key research process that focuses on identifying and understanding the similarities and differences between different genomic sequences. This seemingly simple task of aligning sequences – to study structure-function relationships between DNA, RNA and amino acids, identify conserved domains in proteins of interest, build evolutionary trees, etc. – is currently a huge computational challenge in omics data analysis.
Even as the volume of genomics data explodes, this foundational process of mapping new genome sequences and subsequences – comprising thousands upon thousands of characters – against existing public and enterprise databases still relies on methodologies, such as dynamic programming and heuristic algorithms, developed in the 70s and 80s. As effective, productive and efficient as these approaches have been in the past, they are simply not capable of coping with the demands of the Big Data era in terms of speed, scalability or accuracy.
Then there is the complexity and resource intensity of these methods, which require specialised bioinformaticians to grapple with a multiplicity of tool environments and build complex pipelines for the diverse data types held in each siloed database.
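To make the scaling problem concrete, here is a minimal sketch of the kind of dynamic programming alignment that dates from that era, in the style of the Needleman-Wunsch global alignment (1970). The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not the parameters of any particular tool.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the best global alignment score of sequences a and b.

    Scoring values are illustrative assumptions. Only two rows of the
    dynamic programming table are kept in memory at any time.
    """
    # Row 0: aligning an empty prefix of a against b costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or open a gap in one sequence.
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap
            gap_in_a = curr[j - 1] + gap
            curr[j] = max(diagonal, gap_in_b, gap_in_a)
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7: seven matches
print(global_alignment_score("GATTACA", "GATACA"))   # 4: six matches, one gap
```

Every cell of the table is visited once, so the work grows with the product of the two sequence lengths. That is manageable for one pair of sequences, but it has to be repeated for every database entry a query is compared against, which is why heuristic shortcuts were introduced and why the approach strains as databases grow.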
Today, the rate of generation of genomic sequence data far outstrips the rate at which researchers can analyse and extract value from the data.
Surely there has to be a better way to analyse genomic data in the Big Data age.
BioStrand is the first company since the 1980s to come up with a completely new, out-of-the-box solution for genetic data analysis. We have designed a Google for genomics that provides researchers with a simple and familiar search interface to search – using either sequences or text – through all available omics sequence data, including public and proprietary data and the sequences in patent databases. With just one click, our Retrieve & Relate (R&R) omics research platform returns all relevant matches organised by DNA, RNA and amino acids. Delivered as a SaaS, the R&R platform is embedded with sophisticated functionalities and powerful visual applications that enable users to define and follow pathways that are most relevant to their research.
We’ve built all that technological innovation on a proprietary and pathbreaking biological discovery called HYFT™. The result is a much better, faster, more accurate, reliable and scalable way to search and analyse multi-omics data.
HYFT™ patterns are signature sequences in DNA, RNA and amino acids (AA) that serve as biological fingerprints. Each HYFT™ comprises several layers of information, relating to function, structure, position, etc., that together create a multilevel information network to significantly enhance and refine sequence analysis.
We first precomputed and indexed all the information stored in each of these layers and across millions of HYFT™ patterns into a proprietary knowledge database, with over 350 million sequences, that is the data engine powering all the capabilities of our R&R platform. All publicly available omics-related sequence databases and sequences in patent databases are also organised as per HYFT™ principles and are automatically reviewed and updated as they evolve. We then augmented the searchability of all this data with advanced indexing and exact matching technologies à la Google to make search simple, scalable, accurate, and fast.
Every time a sequence is input into the R&R search bar, our technology quickly indexes HYFT™ patterns in the search sequences, compares these patterns to those in the databases and retrieves only those results that are an exact match to the input sequence. It currently takes around 3 seconds to extract perfect sequence and subsequence matches, sorted by DNA, RNA and AA, from a database of 350 million sequences.
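The general idea of precomputing an index and then retrieving only exact matches can be illustrated with a small sketch. HYFT™ patterns are proprietary and are not described in enough detail here to reproduce, so this example swaps in plain fixed-length subsequences (k-mers) as a stand-in. The names `build_index` and `exact_search`, the pattern length `K` and the toy database are all hypothetical and do not represent the R&R implementation.

```python
from collections import defaultdict

K = 4  # hypothetical pattern length; a stand-in for proprietary HYFT patterns


def build_index(database: dict[str, str], k: int = K) -> dict[str, set[str]]:
    """Precompute, once, which sequences contain each length-k pattern."""
    index: dict[str, set[str]] = defaultdict(set)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def exact_search(query: str, database: dict[str, str],
                 index: dict[str, set[str]], k: int = K) -> list[str]:
    """Return ids of sequences that contain the query as an exact subsequence."""
    if len(query) < k:
        # Too short to hold a pattern: fall back to a direct scan.
        return sorted(sid for sid, seq in database.items() if query in seq)
    candidates = None
    for i in range(len(query) - k + 1):
        ids = index.get(query[i:i + k], set())
        # A true match must contain every pattern found in the query.
        candidates = set(ids) if candidates is None else candidates & ids
        if not candidates:
            return []
    # Shared patterns are necessary but not sufficient, so verify each candidate.
    return sorted(sid for sid in candidates if query in database[sid])


# Toy database for illustration only.
db = {"s1": "ACGTACGTGA", "s2": "TTGACGTAAC", "s3": "GGGGCCCC"}
idx = build_index(db)
print(exact_search("ACGTA", db, idx))   # ['s1', 's2']
print(exact_search("ACGTGA", db, idx))  # ['s1']
```

The design choice worth noting is that the expensive pass over the database happens once, at indexing time. Each query then touches only the index entries for its own patterns and a handful of candidates, rather than being aligned against every stored sequence.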
For a long time now, technology has been the primary determinant of progress in omics research. If you are a skilled bioinformatician, then you can probably take the most cumbersome set of technologies and eventually plot a pathway to new insight, knowledge and value. For researchers without comparable technical skills, however, the scope of their research is for all intents and purposes determined by the limitations of the technology they have at their disposal.
At BioStrand, we fully believe that researchers – be they expert bioinformaticians or enthusiastic geneticists – should be in complete control of the ambition and scope of their research progress. Technology should remain in the background – yet empower researchers to investigate unconventional hypotheses, obscure pathways and/or personal intuitions.
Our mission is to provide you with intuitive, powerful, versatile and multidimensional tools that make your analysis and research more productive, effective and consequential. You start with a search query of your choice – text or sequence – and then use R&R’s comprehensive set of tools and features to sort, group, filter, exclude/include parameters and drill down to the details that are most appropriate for your research.
Want to start with identifying matching sequences for a protein annotation and then drill down to results based on gene ontology definitions? You can.
Want to locate aligned sequences for an RNA query and discover novel functional relationships in associated amino acids? No problem.
Rather start with a broad, text-based search and determine the progression based on the resultant biological sequences? You got it.
Just want to understand patent activities related to a particular disease or species in order to identify opportunities for new research? Just type and search.
Combining the power of HYFT™, indexing, exact matching and our proprietary algorithms, Retrieve & Relate is a truly user-centric, multilevel omics research platform that enables you to home in on the knowledge and insights that are most relevant to your ambitions. We believe that we have built a revolutionary platform with the potential to disrupt genomics and yet be as simple to use as Google search.
We have also created a detailed walkthrough of the power, flexibility and versatility of the BioStrand R&R solution. And you can take the Google of genetics for a spin here.