Bioinformatics Group

The Bioinformatics Group is a multidisciplinary team of computational biologists, mathematicians, software developers and data scientists. The group has developed a number of bioinformatics tools and web applications to facilitate genomic research.

We run and administer an in-house high-performance computing (HPC) facility for genome data analysis, including de novo assembly, RNA-Seq, functional annotation and genotyping. Our research spans a number of applications, from developing genome assemblies for plant crops and for fungal and bacterial pathogens, to machine learning and data science for personalised nutrition. Our external collaborators include LSHTM, Rothamsted Research, Unilever, La Fe Hospital, Illumina and AstraZeneca.

Our research

Software tools and algorithms

We develop and maintain several software tools and data analysis solutions for plant science, infectious disease and food science, as well as biomedical research, especially where genomic and genetic data are involved.

Learn more about our software and tools below.

A standalone software tool for visualising and interacting with optical mapping data, which can be used for in-depth exploration of hybrid scaffolding alignments.

Above: Figure 1 - software for visualising and processing optical mapping and sequencing data.

A command-line tool for very fast querying of a large number of variant calling files.
An R package for disease-associated variant discovery and annotation.
A highly customisable genome browser supporting multiple visualisation instances.

Above: Figure 2

Genome informatics

Together with our colleagues in the Environment and Agrifood Theme, we are applying our expertise in bioinformatics to help solve a range of biological challenges funded by SWAG合集RI, EU and through direct industry funding. The group is leading all sequencing informatics at SWAG合集 and has been involved in a number of Research Council and commercially-funded projects in this research area.

Learn more about our recent work below.

Chilbix

As part of this BBSRC-funded project, we developed high-quality reference genomes for two highly heterozygous, drought- and salinity-tolerant tomato species, Solanum chilense and Solanum sitiens.

Research team:
Andrew Thompson
Fady Mohareb
Tomasz Kurowski
Corentin Molitor

Adroots

We aim to understand the genetic control of adventitious rooting (AR) and to identify molecular markers and beneficial alleles that will allow AR to be controlled to the benefit of breeding in horticultural crops. AR formation from the hypocotyl is important for rootstock vigour in grafted annual crops like tomato, and AR formation is essential for propagation in all perennial crops.

Research team:
Andrew Thompson
Fady Mohareb
Zoltan Kevei
Tomasz Kurowski

Controlling dormancy and sprouting in potato and onion

We aim to provide a molecular understanding of dormancy break in potato by identifying the key genes and transcription factors responsible. As part of this project, we developed a high-quality transcriptome assembly for onion and potato tubers during dormancy break and early sprouting.

Read more about our potato transcriptome

Tea transcriptome - industrial partner: Unilever (Colworth)

Tea is one of the most popular beverages in the world. Together with Unilever and the Postharvest Group at Cranfield, we developed the most comprehensive tea transcriptome assembly using a commercial tea variety. We also performed global transcriptome profiling of plants subjected to different withering processes, providing a differential gene expression analysis for compounds related to tea flavour and aroma, such as flavonoids and caffeine.

Research team:
Professor Leon Terry
Dr Fady Mohareb
Dr M. Carmen Alamar
Dr Emma Collings

Minimally invasive biomarkers for a personalised treatment of neonatal hypoxic-ischemic encephalopathy (in collaboration with La Fe Hospital, Valencia, Spain)

The main objective of this work is to establish whether metabolic biomarkers of neuronal damage and microRNAs (miRNAs) correlate with the neurodevelopmental outcome of newborns with hypoxic-ischemic encephalopathy (HIE) at 24 months of age, as determined by standardised scales. This is a clinical, observational study involving samples collected and stored at the Neonatal Research Unit (Health Research Institute, La Fe) as part of a multicentre, randomised, blinded, placebo-controlled clinical trial with competitive funding (the HYPOTOP study). The study proposes dynamic monitoring of the effect of moderate, whole-body hypothermia treatment combined with the administration of topiramate versus placebo in newborns with HIE.

Research team:
Corentin Molitor
Fady Mohareb

Machine learning and data science

The group has 15 years' experience in predictive modelling with machine learning, especially when used in tandem with rapid and non-invasive techniques for profiling food quality, safety and authenticity. The advances in this field achieved through Dr Fady Mohareb's research have been internationally recognised through key publications and keynote talks at a number of high-profile international food conferences. Furthermore, the quality, safety and authentication profiling protocols are currently being adopted in a number of commercial food production settings as a cheaper and more accurate alternative to conventional microbiological techniques.

Symbiosis-EU (Horizon 2020)

Development of an interactive research framework for food quality.

This EU-funded project brought together 14 partners from Europe to study meat safety and quality. The overall aim is to identify and quantitatively evaluate practical and easy-to-use chemical, biochemical and molecular indices and establish their applicability as quality monitors for inspection of meat safety and quality.

We developed SorfML, which allows users to perform classification and/or regression analysis. For a dataset already uploaded to the SorfML database, users can choose 'regression analysis' and then select the analytical platforms, bacterial growth media and machine learning methods to include in their analysis.
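As a rough illustration of the kind of workflow described above, the sketch below filters records by analytical platform and growth medium, then fits a simple regression. It is not the platform's own code: the record layout, the platform and medium labels, the data values and the helper names (`select`, `fit_line`, `rmse`) are all hypothetical, and ordinary least squares stands in for whichever machine learning method a user might choose.

```python
from statistics import mean

# Hypothetical records, invented for illustration only:
# (analytical platform, growth medium, sensor reading, log10 CFU/g)
RECORDS = [
    ("FTIR", "plate count agar", 1.0, 3.1),
    ("FTIR", "plate count agar", 2.0, 4.9),
    ("FTIR", "plate count agar", 3.0, 7.0),
    ("FTIR", "MRS agar", 1.0, 2.0),
    ("e-nose", "plate count agar", 1.5, 3.5),
]


def select(records, platform, medium):
    """Keep (reading, count) pairs for one platform and one growth medium."""
    return [(x, y) for p, m, x, y in records if p == platform and m == medium]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sensor readings")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return slope, y_bar - slope * x_bar


def rmse(points, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    return mean((y - (slope * x + intercept)) ** 2 for x, y in points) ** 0.5


if __name__ == "__main__":
    points = select(RECORDS, "FTIR", "plate count agar")
    slope, intercept = fit_line(points)
    print(f"slope={slope:.3f} intercept={intercept:.3f} "
          f"rmse={rmse(points, slope, intercept):.3f}")
```

In practice the error would be estimated on held-out samples rather than on the training points, and a multivariate method would replace the single-feature fit.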

Nutrishield (Horizon 2020)

The project's goal is a mobile and interactive platform that guides EU citizens towards personalised nutritional plans and so helps reduce diet-related health disorders. The project has a duration of 48 months (1 November 2018 to 31 October 2022) and a total budget of 8.5 million euros. It brings together 16 leading European research and academic institutions, including Cranfield, as well as industries and SMEs from the nutritional, medical, biological, IT and instrumentation domains.

Through Nutrishield, we aim to develop an innovative framework to support personalised nutrition based on a comprehensive set of genetic and environmental factors. The approach will be validated through three clinical studies: the first on personalised nutrition for young individuals with obesity and/or type 2 diabetes; the second focused on prematurely born infants and their lactating mothers, aiming to augment the nutritional value of human milk; and the third exploring the relationship between nutrition and cognitive decline in young individuals, in order to bring analytical capabilities to a larger number of practising physicians.

In this project, the Bioinformatics Group, led by Dr Mohareb, is responsible for developing a machine learning-based personalised nutrition algorithm. Using clinical, biochemical and dietary data from children, a set of predictive mathematical models will be developed to uncover hidden patterns between the genotypic fingerprint and the observed metabolic profile. The validated models will then be used as a decision support system to guide the development of an optimum personalised diet for each study group.