The BioStrand Interviews: Meet Sébastien Lemal

BioStrand

01.19.2022

Advancing our understanding of biology is the only way we will boost developments in biotechnology and discover the drivers for diseases we aim to treat, strive to prevent, and hope, one day, to cure.

Here, at BioStrand, we are on a mission to create a truly effective, powerful, and convenient omics data analysis solution that will empower life sciences researchers to revolutionise genetic research, ramp up the speed and effectiveness of R&D lifecycles, and bring personalised treatments and precision medicine to the next level.

We are blessed with a truly talented and innovative team of researchers, developers, engineers, and data scientists who have not only made this solution a reality but also work tirelessly every day to continuously enhance our technology, improve its speed, accuracy, and ease of use, and develop valuable new functionalities for you, our users.

In this new series, we’re excited to introduce you to each member of our esteemed team so you can learn more about what projects they’re working on, and what tools, technologies, and techniques they use to bring the BioStrand platform to life.

We start with Sébastien Lemal, PhD. Sébastien joined us as a Data Scientist in June 2020, continuing a career that began with research positions at the University of Liège. He is responsible for the whole development process, from whiteboard to production-ready models.

Interview with Sébastien

What are you currently working on?

I am currently working on integrating BioStrand's HYFT™ technology into our state-of-the-art workflow for omics analysis. The HYFT™ technology is currently implemented within the BioStrand Omics Parser, which extracts HYFT™ patterns from biological sequences. For people familiar with Natural Language Processing (NLP), this can be seen as an elaborate way to tokenize biological sequences.

Just as raw text can be tokenized into information-bearing units, HYFT™ patterns are the tokens: small snippets of sequence that carry information. These patterns can serve as the basis for multiple downstream tasks, such as indexing, clustering, and mapping.
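The tokenization analogy can be made concrete with a small sketch. HYFT™ patterns are proprietary and are not reproduced here; as a generic stand-in, the example assumes simple overlapping k-mers as tokens and builds an inverted index from tokens to sequence IDs, one of the downstream tasks mentioned above. The function names, k = 3, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers.

    Assumption: k-mers are only a generic stand-in for HYFT patterns,
    which are proprietary and not reproduced here.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(sequences: dict[str, str], k: int = 3) -> dict[str, set[str]]:
    """Inverted index: map each token to the IDs of sequences containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for seq_id, seq in sequences.items():
        for token in kmer_tokens(seq, k):
            index[token].add(seq_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str, k: int = 3) -> dict[str, int]:
    """Count how many distinct query tokens each indexed sequence shares."""
    hits: dict[str, int] = defaultdict(int)
    for token in set(kmer_tokens(query, k)):
        for seq_id in index.get(token, ()):
            hits[seq_id] += 1
    return dict(hits)


# Hypothetical toy sequences (amino-acid letters).
sequences = {"seqA": "MKTAYIAKQR", "seqB": "MKTLLVAGQR", "seqC": "GGSGGSGGS"}
index = build_index(sequences)
print(kmer_tokens("MKTAY"))                  # ['MKT', 'KTA', 'TAY']
print(sorted(search(index, "MKTAY").items()))  # [('seqA', 3), ('seqB', 1)]
```

A query is matched by counting shared tokens, much as a text search engine matches documents by shared words; clustering or mapping would build on the same token sets.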

I am also working on integrating textual data into the BioStrand platform. More specifically, I am developing a tool, an NLP solution, to enrich differential gene expression data with information derived from the scientific literature. The objective is to significantly speed up research on the biological pathways involved in certain diseases and treatments, and to help researchers formulate their hypotheses.
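One common way to frame literature-based enrichment is an over-representation test: are genes linked to a literature-derived term (for example, a disease frequently co-mentioned with them in abstracts) over-represented among the differentially expressed genes? The sketch below is not BioStrand's method. It assumes the gene-term links have already been extracted by an NLP step, uses a one-sided hypergeometric test, and omits multiple-testing correction. All gene and term names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def literature_enrichment(de_genes, term_to_genes, universe):
    """Rank literature-derived terms by over-representation among DE genes.

    Assumption: term_to_genes is a hypothetical output of an upstream
    NLP step linking each term to the genes it is mentioned with.
    """
    de = set(de_genes) & universe
    results = []
    for term, genes in term_to_genes.items():
        linked = set(genes) & universe
        overlap = len(de & linked)
        p = hypergeom_sf(overlap, len(universe), len(linked), len(de))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])  # smallest p-value first


# Hypothetical data: 20 measured genes, 4 differentially expressed.
universe = {f"G{i}" for i in range(1, 21)}
de_genes = ["G1", "G2", "G3", "G10"]
term_to_genes = {
    "inflammation": ["G1", "G2", "G3", "G4", "G5"],
    "apoptosis": ["G10", "G11", "G12"],
}
results = literature_enrichment(de_genes, term_to_genes, universe)
for term, overlap, p in results:
    print(f"{term}: overlap={overlap}, p={p:.4f}")
```

With many terms, the p-values would normally be adjusted (for example with a false discovery rate procedure) before being used to suggest hypotheses.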


What project are you most proud of and why?

Our NLP solution started as a simple use case for jointly exploiting BioStrand's text analytics platform and omics platform. In the early stages of the project, it was difficult to assess how it would fit in the market.

But as I discussed the matter with clients and collaborators, it turned out that there is not only a need for an integrative environment for textual data and omics data of any kind, but also many potential applications, such as enrichment, automatic annotation, cross-mapping between different datasets, and more.

The scope of this project alone makes it very exciting, as does the opportunity to make an impact on the market.

Are you reading any interesting technical papers we should know about?

Given the nature of BioStrand's technology, I closely follow the progress made in the fields of bioinformatics and NLP. Recently, NLP has seen a breakthrough with the advent of deep learning transformer architectures¹, such as Google's BERT², which significantly improved performance on NLP tasks including Named Entity Recognition, Relation Extraction, and Question Answering.

More spectacularly, in my opinion, transformers and their attention mechanisms are at the heart of a major breakthrough in molecular biology: DeepMind's AlphaFold2 model, which can predict the 3D structure of a protein from its amino-acid sequence alone³. Finally, given my work at BioStrand, I found this review⁴ of the intersection of NLP, deep learning-based methods, and protein sequences particularly interesting.

It's a great read for anybody who wants a high-level understanding of the progress made and the challenges encountered in the field in recent years.
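For readers new to transformers, the core operation introduced by Vaswani et al.¹ is scaled dot-product attention, softmax(QKᵀ/√d_k)V: each query is scored against every key, the scores are normalized into weights, and the output is the weighted average of the values. The pure-Python sketch below shows a single head without the learned projection matrices, multiple heads, or masking used in real models; it is not how BERT or AlphaFold2 implement attention, and the toy vectors are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over lists of vectors (no projections, no mask).

    Returns (outputs, weights): one output vector and one weight row per query.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output = attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights


# Hypothetical 2-d embeddings for three tokens; self-attention uses Q = K = V.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(X, X, X)
for row in weights:
    print([round(w, 3) for w in row])
```

Each weight row sums to 1, so every output is a mixture of the input vectors, weighted toward the tokens whose keys best match the query.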

What software or tools do you use every day?

As a Data Scientist, my go-to tool for processing data is the Python programming language and its wide variety of libraries (pandas, Spark, etc.). For visualisation, I particularly enjoy Plotly and its sub-libraries for interactive plots, as well as Seaborn for static ones.

To juggle different projects, I like to code in notebooks within the JupyterLab environment. It is rudimentary and lacks the sophistication of PyCharm, but I prefer its sobriety. I also use Gephi or Cytoscape to visualise network-based data, typically after some processing, since such data tends to be stored in relational tables (as in Microsoft Excel).
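As an illustration of the kind of processing that turns tabular data into a network, the sketch below converts hypothetical (entity, document) mention rows into weighted co-mention edges and writes them as a CSV edge table with Source, Target, and Weight columns, a layout that Gephi's spreadsheet importer recognizes and that Cytoscape can import by mapping columns. The rows and function names are assumptions for illustration, and the sketch uses only the standard library rather than pandas.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def co_mention_edges(rows):
    """Turn (entity, document_id) rows into weighted co-mention edges.

    Two entities are linked once for every document that mentions both.
    """
    by_doc: dict[str, set[str]] = {}
    for entity, doc_id in rows:
        by_doc.setdefault(doc_id, set()).add(entity)
    edges: Counter = Counter()
    for entities in by_doc.values():
        # Sort so each undirected pair is always stored in the same order.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def to_edge_csv(edges) -> str:
    """Serialize edges as a Source,Target,Weight CSV table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


# Hypothetical relational rows: which gene is mentioned in which document.
rows = [("TP53", "doc1"), ("MDM2", "doc1"),
        ("TP53", "doc2"), ("MDM2", "doc2"), ("BRCA1", "doc2")]
edges = co_mention_edges(rows)
print(to_edge_csv(edges))
```

Writing the result to a `.csv` file instead of a string would give a file that can be loaded directly as an edge table for visualisation.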


What skills have you developed throughout your career that you think apply beyond work?

In my opinion, patience is a soft skill that is hard to acquire yet invaluable, both in work and in life. When I start working on a project, I never know whether it is going to lead to something good, or how much time it will take to come to fruition. Some people find this frustrating and prefer working on shorter-term projects.

However, if I want to work toward a greater objective, I have to accept both the risks and the time it takes to achieve it. That is why a lack of patience so often leads to disappointment.

Photo credit: Georgios Triantopoulos

¹ Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).

² Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

³ Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.

⁴ Ofer, D., Brandes, N., & Linial, M. (2021). The language of proteins: NLP, machine learning & protein sequences. Computational and Structural Biotechnology Journal.

Tags: BioStrand Company, NLP, Company culture, Employee spotlight
