Sequencing and spectrometry techniques are advancing at a rapid pace, providing more and more insight into genomes, transcriptomes and proteomes, but not all problems can be solved by looking at a single layer of biomolecules. Many of the remaining challenges in life sciences demand insight into complex disease etiologies that exist at all levels of a given biological system (gene, transcript, protein and metabolite), a daunting example of which is found in cancer. To solve this problem, vertical integration of multiple layers of omics data is essential, as it can lead to the prediction of new biomarkers, the identification of new disease subtypes, and a generally deeper understanding of disease biology.
The advent of omics technologies presented a leap forward in life sciences. First, genomics, powered by breakthroughs in sequencing technology, allowed scientists to look at the genetic code in its entirety, instead of focusing on single loci. Transcriptomics applied the same next-generation sequencing technologies to RNA and added a new layer of information on gene activity.
Finally, proteomics, metabolomics and interactomics relate to pharmaceutically relevant molecules and interactions. As sequencing and mass spectrometry techniques become cheaper and faster, the real challenge shifts from the generation of data in specific omics fields to integrating those data into results that are clinically or pharmaceutically relevant.
While many uses for single-omics techniques can be envisioned, their main use has been as a hypothesis-building tool:
By having access to the genome, transcriptome or proteome as a whole, scientists are able to take an unbiased approach. They can let the data do the talking, which eliminates confirmation bias and regularly opens up unexpected avenues of research.
For example, scientists using transgenic mice as an animal disease model may find that previously unknown processes are involved when taking a proteomics approach. Such discoveries would have been impossible with a classical biochemical approach that focusses on a few predetermined readout targets, such as qPCR measurements, immunostaining or functional assays.
The last word has hardly been spoken on single omics techniques, and advances are still being made every day in each separate omics field, but a new challenge has emerged simply from having all these different types of data available. That challenge is to make data from all different omics techniques, which are very different in their nature but still describe a single biological system, come together.
When operating next to each other, without interacting, these techniques lack a vital dimension that holds a wealth of untapped resources, targets and, perhaps, solutions to many long-standing problems. Integrating data from separate omics techniques performed on the same cohort of samples is often referred to as vertical data integration, and it is not an easy task.
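A first practical step in vertical integration is making sure that every layer describes the same samples. The following Python sketch illustrates the idea under simple assumptions: all sample IDs, layer names, feature names and values are invented for illustration, and real pipelines would additionally need normalisation and batch correction.

```python
# Hypothetical per-layer measurements: layer -> {sample_id: {feature: value}}
layers = {
    "transcriptomics": {
        "S1": {"TP53": 8.1},
        "S2": {"TP53": 5.4},
        "S3": {"TP53": 7.7},
    },
    "proteomics": {
        "S1": {"TP53": 1.9},
        "S3": {"TP53": 1.6},
        "S4": {"TP53": 0.8},
    },
}


def shared_samples(layers):
    """Return the sorted sample IDs that were measured in every omics layer."""
    sample_sets = [set(samples) for samples in layers.values()]
    return sorted(set.intersection(*sample_sets))


def stack_layers(layers):
    """Build one record per shared sample, prefixing each feature with its layer."""
    integrated = {}
    for sample in shared_samples(layers):
        record = {}
        for layer_name, samples in layers.items():
            for feature, value in samples[sample].items():
                record[f"{layer_name}:{feature}"] = value
        integrated[sample] = record
    return integrated


integrated = stack_layers(layers)
```

Only samples S1 and S3 survive here, because S2 lacks proteomics data and S4 lacks transcriptomics data. Prefixing feature names with the layer keeps a transcript and its protein product distinguishable after stacking.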
Multi-omics experiments face a challenge inherent in their experimental design: how to integrate data of different natures. DNA may be transcribed into RNA, which is translated into proteins, which then influence metabolism, but the relationships are a far cry from the simple “central dogma of molecular biology” formulated by Francis Crick in the 1950s.
Genes regularly code for more than one transcript, and proteins can undergo a range of post-translational modifications before they become active. The challenge becomes exponential for metabolites, which are often the result of long chains of biochemical modifications. Adding feedback layers such as transcriptional activators or repressors, which are themselves proteins interacting with other proteins, does not make things easier.
These complex networks, however, are essential in uncovering genotype-to-phenotype relationships and understanding a biological system as a whole.
The advantage of multi-domain analysis may not be immediately obvious, as single-layer techniques have been adequate in leading to breakthroughs in many fields in life sciences. One might say, however, somewhat provocatively, that these breakthroughs were low-hanging fruit, and truly complex questions, many of them in research areas dealing with multifaceted pathologies such as cancer, remain unanswered. Meanwhile, the need for effective therapies and biomarkers for these diseases is ever growing.
The inability of single-layer omics studies to deal with the most challenging questions in, for example, cancer research, is caused precisely by the flat structure of the data: many causative changes in the genome (e.g. copy number variations, single nucleotide variations or methylation status) may only become identifiable when additional layers with information on gene binding activity, transcription and translation are overlaid in a vertically integrated structure.
For example, overlaying RNA-Seq and ChIP-Seq data can link gene-binding activity to transcriptional changes, adding a layer of functional insight to otherwise exploratory data. Similarly, integrating genomics, transcriptomics, and proteomics can link copy number variations or single nucleotide variations in the genome directly to functional targets that can be explored for therapies.
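The RNA-Seq and ChIP-Seq overlay can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a recommended analysis pipeline: the gene names, fold changes, adjusted p-values and thresholds are all hypothetical, and the ChIP-Seq layer is assumed to have already been reduced to a set of genes with a binding peak near their promoter.

```python
# Hypothetical RNA-Seq results: gene -> (log2 fold change, adjusted p-value)
rna_seq = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.8, 0.004),
    "GENE_C": (0.2, 0.700),
    "GENE_D": (1.5, 0.020),
}

# Hypothetical ChIP-Seq result: genes with a binding peak near their promoter
chip_seq_bound = {"GENE_A", "GENE_C", "GENE_D"}


def overlay(rna_seq, bound_genes, min_abs_lfc=1.0, max_padj=0.05):
    """Return genes that are both bound and significantly differentially expressed.

    The thresholds are illustrative defaults, not established cut-offs.
    """
    candidates = {}
    for gene, (lfc, padj) in rna_seq.items():
        is_changed = abs(lfc) >= min_abs_lfc and padj <= max_padj
        if gene in bound_genes and is_changed:
            candidates[gene] = "up" if lfc > 0 else "down"
    return candidates


direct_targets = overlay(rna_seq, chip_seq_bound)
```

In this toy example, GENE_B changes expression without being bound, and GENE_C is bound without changing expression, so neither layer alone would single out GENE_A and GENE_D as candidate direct targets.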
Multi-domain analysis can serve three ultimate goals: identifying disease subtypes, understanding disease etiology, and predicting biomarkers.
While tools for analysing multi-omics datasets are emerging at an encouraging pace, data analysis still requires extensive manual optimization and costly time from research experts trained specifically for that task. For a comprehensive and up-to-date list of common multi-omics tools, visit Bioconductor, where searching for “multi-omics” currently returns 7401 results.
For a selected list of multi-omics data integration tools, grouped by approach and biological purpose, see the review article by Subramanian et al. at the end of this article. Very few of these tools, however, provide a true all-in-one package and, in fact, the best results are often achieved with a tailored approach carried out by an experienced bioinformatician.
Strategies for vertical data integration can be classified in a number of ways, for example, by biological application (subtyping, disease etiology, or biomarker prediction), analysis method (supervised, unsupervised, or semi-supervised), integration scheme (parallel integration versus hierarchical integration) or variable selection method (penalization versus Bayesian variable selection). For a useful overview of data integration strategies categorised by variable selection method, see the review article by Wu et al. below.
Multi-omics strategies and tools can only become useful when there are standardised methods for collecting and sharing data. One of the most commonly used data repositories for multi-omics data is TCGA (The Cancer Genome Atlas, https://cancergenome.nih.gov/), which curates RNA-Seq, DNA-Seq, miRNA-Seq, SNV, CNV, DNA methylation, and RPPA (reverse phase protein assay) datasets. One multi-omics repository that also curates non-cancer data is Omics Discovery Index (https://omicsdi.org), where genomics, transcriptomics, proteomics, and metabolomics data are available.
Even with the right tools in hand, any multi-omics project runs the risk of being carried away by its own data-richness. There is no doubt that an integrated dataset of cancer cells containing RNA-Seq, proteomics, and mutation information can lead to impressive heatmaps, stunning p-values, or complicated-looking graphs. However, data do not have to look impressive; they have to lead to meaningful results.
Without a consistent workflow and predefined hypotheses (which can be generated from single-omics experiments), finding relevance in the result of an integrative analysis can be difficult or even impossible. The original goal of an unbiased, agnostic approach to data analysis can only be carried out when robust standard practices are available, which is difficult when many different tools and methods compete for attention.
Especially for non-experts or first-time multi-omics analysts, the challenge of finding the right tools for the job is great, and a sub-optimal decision may only reveal itself a long way down the line. The need for an innovative all-in-one multi-omics tool that is usable by non-experts is therefore high.
A technological revolution does not occur when new technology is invented, but when it becomes accessible.
The field of high-content omics studies has come a long way in the past 30 years, from the start of the human genome project in 1990 to the advent of low-cost high-throughput sequencing technologies, to the challenge of vertically integrating multi-level omics data. This challenge must be overcome, and many tools are already available for analysing multi-domain data. A true revolution in the world of multi-domain analysis in life sciences, however, will only come about when those analyses can be performed in a rapid, effective and accessible manner.