Free multi-omics data, anyone?

Something that most scientists agree on is that data is good, and more data is better. Over the last few years, it has become apparent that one of the biggest barriers to progress in genomic research has been the inaccessibility of previously generated data. Many journals now insist that for a scientific manuscript to be published, the data it describes should be made publicly available. This is an important cultural shift from the journals that recognises the importance of making data available. However, it is often side-stepped by researchers who either make the data ‘available on request’ (good luck with that one) or release the data under ‘managed access’. This essentially means that to access the data, you must submit a lengthy application to a data access committee, who decide on a case-by-case basis whether or not to grant access. This creates a lot of work for the researchers who need to fill out these applications, but also for those who have to review them – and adds huge delays to research.

The importance of open data in genomics was recognised most notably by the Human Genome Project and the scientists involved who sequenced the first human genome, completed back in 2003. The project was an enormous undertaking, taking 13 years to complete, and costing nearly £2 billion ($2.7 billion) in total. Once finished, a decision was made which really kick-started genetic research, and changed the landscape of science forever – they made the data completely free and openly available to anyone who wanted to use it. This provided a blueprint for all genetic research that has happened since the project, and massively accelerated the field, making it what it is today. It is perhaps surprising that since then, researchers have been somewhat reluctant to make genetic data openly available.

One project which is trying to encourage a change in the culture of science towards open data is the Personal Genome Project (PGP). The principle of the project is to invite willing volunteers to make their genomic data publicly available – following completion of a test which ensures that they understand the potential risks of being part of such a project. There are several branches of the PGP running globally, and I am delighted to be part of the PGP-UK team, founded by Stephan Beck in 2013.

A few months ago, we released a paper describing the pilot phase of the Personal Genome Project UK, which summarises the project, the recruitment of our pilot participants (our resident ‘citizen scientists’), and the analyses of the multi-omics data generated from these individuals. This included the first genetic and epigenetic reports which were given to participants, and the development of our app, GenoME, which allows you to explore the genomes and epigenomes of four of our participant ambassadors (the app is freely available on the iPad app store now). Finally, the paper also covers our first Genome Donations – we were the first project in the world to develop and use genome donation, allowing individuals who had their genome privately sequenced elsewhere to turn that closed-access data into open-access data for the benefit of research.

This week, we have released another paper on bioRxiv which gives a more in-depth interrogation of the multi-omics dataset we have produced. The paper describes the dataset of genetic (whole genome sequencing), epigenetic (whole genome bisulphite sequencing and methylation array) and transcriptomic (RNA-seq) data for this pilot group. One of the issues we identified when releasing these data was that there is no single platform on which you can release open-access multi-omics datasets. To solve this problem, as well as releasing the data on the standard separate platforms (ENA, EVA and ArrayExpress), we also established collaborations with the cloud platform providers SevenBridges Genomics and Lifebit, who now host all the multi-omics data in one place. As data analysis can also be performed on these platforms, it removes the need to download all the data to perform analyses locally (which would take over 10 days with a typical connection!).

The paper also describes the extensive quality checks that the team performed on the data to ensure it is of top quality. An important part of this was verifying that the datasets all definitely do correspond to the right participants – something which surprisingly isn’t performed very often in multi-omics projects! To do this, we extracted information about a set of SNPs from each ‘omics’ dataset (whole genome bisulphite sequencing, array data and RNA-seq) and correlated these back to the SNP loci extracted from the whole genome sequencing. This allowed us to verify across all data types that the samples were annotated correctly, and also to ensure that the data quality was consistently high across the dataset.
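To give a flavour of how this kind of sample-identity check can work, here is a minimal sketch in Python. It is not the pipeline used in the paper: the sample names, SNP identifiers and genotype calls are invented, and a simple fraction of matching genotypes stands in for the correlation described above. Each dataset is assumed to have already been reduced to genotype calls (0, 1 or 2 copies of the alternative allele) at a shared set of SNPs.

```python
from typing import Dict

# SNP id -> number of alternative alleles (0, 1 or 2). Hypothetical encoding.
Genotypes = Dict[str, int]


def concordance(a: Genotypes, b: Genotypes) -> float:
    """Fraction of SNPs called in both datasets that have the same genotype."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[snp] == b[snp] for snp in shared) / len(shared)


def best_match(query: Genotypes, reference: Dict[str, Genotypes]) -> str:
    """Name of the reference sample whose genotypes agree best with the query."""
    return max(reference, key=lambda name: concordance(query, reference[name]))


def find_mismatches(assay: Dict[str, Genotypes],
                    reference: Dict[str, Genotypes]) -> Dict[str, str]:
    """Map each assay label to its best reference match when the two disagree."""
    mismatches = {}
    for label, calls in assay.items():
        match = best_match(calls, reference)
        if match != label:
            mismatches[label] = match
    return mismatches


# Invented genotypes from whole genome sequencing (treated as the reference).
wgs = {
    "P1": {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 0, "rs5": 1},
    "P2": {"rs1": 2, "rs2": 2, "rs3": 0, "rs4": 1, "rs5": 0},
    "P3": {"rs1": 1, "rs2": 0, "rs3": 1, "rs4": 2, "rs5": 2},
}

# Invented genotypes inferred from another assay; P2 and P3 are swapped on purpose.
rnaseq = {
    "P1": {"rs1": 0, "rs2": 1, "rs3": 2, "rs5": 1},
    "P2": {"rs1": 1, "rs2": 0, "rs3": 1, "rs4": 2},
    "P3": {"rs1": 2, "rs2": 2, "rs4": 1, "rs5": 0},
}

mismatches = find_mismatches(rnaseq, wgs)
print(mismatches)
```

With real data the agreement would never be perfect (genotype calls from RNA-seq or bisulphite sequencing are noisier than those from whole genome sequencing), so in practice you would look for one clearly best match per sample rather than an exact one.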

If you would like to learn more about the project and the multi-omics dataset we generated, here are the links to the two manuscripts:

Personal Genome Project UK (PGP-UK): a research and citizen science hybrid project in support of personalised medicine (BMC Medical Genomics, November 2018):

The Personal Genome Project-UK: an open access resource of human multi-omics data (BioRxiv, March 2019):

How the UK is paving the way for personalised medicine

The personalised medicine revolution is well under way, and the UK has found itself at the forefront. Projects like the 100,000 Genomes Project are pushing boundaries in personalised medicine by sequencing entire genomes on an enormous scale, and with unprecedented momentum. While their ultimate goal is to sequence 100,000 genomes from around 70,000 individuals with rare diseases, they have already sequenced 9,892. This has already translated to patient benefit, powerfully shown by the case of Jessica Wright.

While this project is an obvious example of the pioneering nature of personalised medicine in the UK, the story runs much deeper than that. The UK is also leading biomarker discovery research, which identifies the parts of the genome that can be used to predict which therapies an individual is likely to respond to. When implemented in the clinic, this allows a patient’s treatment regime to be better tailored to their needs and improves their overall outlook.

World-class research into biomarker discovery ultimately comes down to patient samples. Herein lies the secret of success for the UK’s personalised medicine research: the NHS. The NHS is a unique resource, in that it has universal coverage across the UK. No matter what class or creed you are, you have access to the NHS, and will receive the same world-class treatment for free. As well as being voted the best healthcare system in the world in 2014, this setup also provides an unparalleled environment for medical research. Patients from across the country, and from all walks of life, can be recruited into research studies focusing on understanding diseases, or finding better ways to treat them. Partnerships between the NHS and researchers (both commercial and academic) have allowed extraordinary innovation and are no doubt at the core of our success in personalised medicine research.

Ultimately, the NHS is a crucial and unrivalled resource for personalised medicine research which should be nurtured and protected, for our benefit but, more importantly, for generations to come.


The inspiring story of Jessica

All too often the news is filled with stories that would test anyone’s faith in humanity, but every so often, you hear a story that inspires you, and it really sticks. Recently at a meeting, I attended a talk about the 100,000 Genomes Project, and the speaker gave a short case study about Jessica. Jessica is a four-year-old girl who had an undiagnosed developmental disorder which made her epileptic and reduced her brain function. By sequencing her entire genome, researchers identified one tiny change to her DNA, in a gene called SLC2A1. This change affected a protein in her brain, stopping her brain from transporting glucose properly. Simply by putting Jessica on a high-protein, high-fat diet that is low in glucose (called a ketogenic diet), they expect her brain function to improve and her epilepsy to be more easily controlled. Stories like this make me proud to be a scientist!


The use of open access data in the future of personalised medicine


I currently work on the UK branch of the Personal Genome Project (PGP-UK), which has stirred up quite a bit of controversy and was featured in an article by the Guardian this week. The principle of this project (which was originally set up by George Church at Harvard over ten years ago) is that people volunteer to have their DNA sequenced, which, combined with their medical and trait data, can contribute to scientific research, improving understanding of the role of the genome in human development and disease. The unique thing about the PGP, however, is the open access informed consent policy. This means that when you sign up, you agree to make your DNA sequence and medical records available to the general public.

Such open access data resources have two main implications for the future of personalised medicine. Firstly, they provide a unique resource for researchers to test scientific hypotheses without the constraint of funding and data generation, which will ultimately, along with health and prescription records, allow advancement of personalised medicine research. Secondly, they empower individuals to make more informed decisions about their lifestyle and healthcare. Participants are able to assess their risks for certain diseases and adjust their behaviour accordingly.

While the benefits of open access data are enormous, some people have concerns about the open access policy. In an era where people are obsessed with data privacy and are constantly worried about their personal data being made public, it may seem somewhat counterintuitive to make all this information freely available to anyone who cares to seek it. During the (very lengthy) sign-up process, the PGP are absolutely clear that, while you can withhold your name from the process, there is no guarantee that people will not be able to link the data back to you. DNA is itself, after all, the ultimate personal identifier. Further, there is a potent argument that any idea of anonymity in data is an illusion anyway, and that most “anonymised” data can be traced back to the person it relates to. This was very eloquently shown in 1997 by Latanya Sweeney, now of the Data Privacy Lab at Harvard University, when she cross-referenced the ZIP codes, dates of birth and gender from anonymised medical records with details from the electoral register. Using this strategy, she was able to identify the medical records of the governor of Massachusetts at the time, William Weld, who had himself supported the release of the medical records.
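The cross-referencing idea is simple enough to sketch in a few lines of Python. Everything below is invented for illustration (the records, the names and the field names `zip_code`, `birth_date` and `sex` are all assumptions), and it is not a reconstruction of the original analysis. It just joins an “anonymised” table to a named one on the fields they share, and keeps only the matches that point to exactly one person.

```python
# Fields present in both datasets; field names are hypothetical.
QUASI_IDENTIFIERS = ("zip_code", "birth_date", "sex")


def quasi_key(record):
    """Tuple of the quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(medical_records, register):
    """Return (name, diagnosis) pairs where exactly one named person matches."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])

    matches = []
    for record in medical_records:
        names = names_by_key.get(quasi_key(record), [])
        if len(names) == 1:  # unambiguous link back to a single person
            matches.append((names[0], record["diagnosis"]))
    return matches


# "Anonymised" medical records: no names, but quasi-identifiers are kept.
medical_records = [
    {"zip_code": "00001", "birth_date": "1950-01-01", "sex": "M",
     "diagnosis": "condition X"},
    {"zip_code": "00002", "birth_date": "1980-06-15", "sex": "F",
     "diagnosis": "condition Y"},
]

# Public register: names alongside the same quasi-identifiers.
register = [
    {"name": "Person A", "zip_code": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Person B", "zip_code": "00002", "birth_date": "1980-06-15", "sex": "F"},
    {"name": "Person C", "zip_code": "00002", "birth_date": "1980-06-15", "sex": "F"},
]

reidentified = reidentify(medical_records, register)
print(reidentified)
```

In this toy example only the first record is re-identified, because two people on the register share the second combination. The more fields the two datasets have in common, the fewer people share any one combination, and the more records can be pinned to a single name.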

The debate about whether or not open access data is a good thing ultimately boils down to whether you consider the benefits to outweigh the risks. In my opinion, open access data has the potential to revolutionise research, and empowers people to take control of their own destiny. After all, knowledge is power.


Predicting treatment response in rheumatoid arthritis

Recently, I published some work from my PhD in which I identified a small section of DNA which could potentially be used to predict which patients with rheumatoid arthritis are likely to respond to etanercept, a biologic drug.

First of all, for those who are new to the world of epigenetics, I will explain a bit about this. Epigenetics is the study of modifications of the DNA which do not change the DNA sequence, but which can change how the DNA is used (which genes are switched ‘on’ or ‘off’). While there are a lot of different types of epigenetic changes, the most studied modification is called DNA methylation. Your DNA methylation profile changes as you get older, can be influenced by your environment (for example, smoking), and can be changed in disease. DNA methylation is also different in different tissues of your body, such as blood, skin or brain. This is how you can have one genome throughout your body, but lots of different methylation profiles, resulting in lots of different types of cells.

In the paper, we measured the levels of DNA methylation at around half a million methylation sites (called CpGs) in 72 patients with rheumatoid arthritis. All these patients were treated with a biologic therapy called etanercept, and were split into two groups who either responded very well to etanercept (36 patients) or didn’t respond at all (36 patients). By comparing the methylation of the two groups, we found a small segment of a gene (called LRPAP1) which has a different methylation pattern in people with rheumatoid arthritis who respond to etanercept, compared to people who don’t. This is a relatively small study, and needs to be repeated in a much larger number of patients. However, it is a very exciting finding which indicates that DNA methylation could be used in the future to predict which therapies a patient with rheumatoid arthritis is most likely to respond to. This would vastly improve the quality of life of patients with rheumatoid arthritis, as it would decrease the time taken to identify the right drug for a particular patient when they are diagnosed, in a ‘stratified medicine’ approach to treatment.
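For readers who like to see the idea in code, here is a toy sketch in Python of what “comparing the methylation of the two groups” can look like. It is an assumption-laden illustration rather than the analysis from the paper: the site names and methylation values are invented, and a plain Welch’s t-statistic per site is used as a stand-in for whatever statistical model a real study would choose.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups of methylation values.

    Assumes each group has at least two values and that not both groups
    have zero variance (otherwise the denominator would be zero).
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_sites(responders, non_responders):
    """Order CpG sites from the largest to the smallest absolute t-statistic."""
    scores = {
        site: welch_t(responders[site], non_responders[site])
        for site in responders
    }
    return sorted(scores, key=lambda site: abs(scores[site]), reverse=True)


# Invented methylation levels (proportion methylated, 0 to 1) at two
# hypothetical sites, for four patients per group.
responders = {
    "cpg_A": [0.80, 0.82, 0.79, 0.81],
    "cpg_B": [0.40, 0.55, 0.45, 0.50],
}
non_responders = {
    "cpg_A": [0.60, 0.62, 0.58, 0.61],
    "cpg_B": [0.42, 0.52, 0.47, 0.49],
}

ranked = rank_sites(responders, non_responders)
print(ranked)
```

Here `cpg_A` comes out on top because its methylation differs consistently between the groups, while `cpg_B` has the same average in both. With around half a million sites tested at once, some will look different purely by chance, so a real analysis also has to correct for multiple testing before calling anything a finding.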

Stratified medicine is an approach to treatment in which patients are split into smaller subgroups based on their likelihood of responding to particular therapies. This means that instead of a ‘one drug fits all’ model, each patient would be assessed to identify which treatment strategy is most likely to be effective for them. Different characteristics can be used to split patients into these groups, such as genetic or epigenetic tests or other biological markers. While the UK is at the forefront of personalised medicine research, there is still a long way to go before it can be used in everyday medicine.

If you are interested in reading the original article, you can access it here:

If you are interested in reading more about biologic therapies, there is some great information on the Arthritis Research UK website:

If you are interested in reading more about the basics of epigenetics (after all, who isn’t?!), there was a great article in the Guardian, and for more detail try:


I am a scientist with a keen interest in epigenetics and its role in human disease. I am currently a post-doctoral researcher at the University College London Cancer Institute, investigating the role of DNA methylation in graft-versus-host disease following haematopoietic stem cell transplants.