This is the first in a series of two articles discussing how data has transformed biomedical research and what this implies for the field more generally. This first article introduces the topic and the breakthroughs brought about by large datasets.

Research at the frontier always involves a great deal of exploration into the unknown. Among the stories of scientific discovery we have all been told are the discovery of penicillin, Charles Darwin’s observations and many more. These scientists were fortunate to catch the small observations that ultimately grew into hypotheses or theories that revolutionised their fields. The good news is that, nowadays, there are a variety of publicly available datasets and scientific programs that contain patient medical histories and biological sample data. Scientists employ many methods to visualize and interpret these data, allowing researchers to ask meaningful questions that previously could not be asked.

Single cell analyses to study heterogeneity

A good example is the study of the complex interactions within a tumour microenvironment. The tumour microenvironment contains multiple components, including cancer cells, immune cells, fibroblasts, blood vessels and the extracellular matrix. However, even within a single cellular component, individual cells can have slightly different genetic profiles, morphologies, phenotypes and behaviours, a phenomenon known as heterogeneity. Understanding heterogeneity in the tumour microenvironment could teach us why certain therapies work better for some tumours than for others. For example, key proteins or antigens such as PD-L1 may be more highly expressed in certain cancer cell populations, which could make those populations more responsive to the anti-PD-L1 antibodies used in immune checkpoint blockade therapy (Kashima et al., 2021). Therefore, identifying the heterogeneous populations in a tumour microenvironment and profiling their omics could provide insights into how a particular patient would respond to therapy.

Researchers can employ various single-cell molecular profiling techniques. To understand the state of cells at a given time point, one of the most common approaches is to examine the RNA profile of each cell using a technique called single-cell RNA sequencing (scRNA-seq). Messenger RNA can be thought of as a set of instructions that tells the cell which proteins to produce. Thus, by measuring how much of each RNA transcript a cell contains, we can estimate which proteins the cell is likely to be producing and in roughly what amounts. In addition, scRNA-seq is often coupled with other types of measurements, such as the cell’s epigenetic state and the abundance of cell surface proteins.
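To make “measuring the RNA profile of a cell” more concrete: scRNA-seq output is commonly handled as a matrix of counts, with one row per cell and one column per gene. The short Python sketch below (assuming NumPy is available) shows one widely used first processing step, in which each cell’s counts are scaled by that cell’s total count and log-transformed, so that cells sequenced at different depths become comparable. The gene names and counts are made up for illustration; CD274 is the gene that encodes PD-L1, but its values here are hypothetical, and `log_normalize` is a hypothetical helper rather than a function from any particular package.

```python
import numpy as np


def log_normalize(counts, scale: float = 10_000.0) -> np.ndarray:
    """Normalise a cells x genes count matrix.

    Each cell's counts are divided by that cell's total count (to correct for
    differences in sequencing depth), multiplied by `scale`, and then
    transformed with log(1 + x).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total transcripts detected per cell
    totals[totals == 0] = 1.0                   # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * scale)


# Hypothetical toy data: 3 cells (rows) x 4 genes (columns).
genes = ["GENE_A", "GENE_B", "GENE_C", "CD274"]  # CD274 encodes PD-L1; counts are made up
counts = np.array([
    [10, 0, 5, 5],
    [20, 0, 10, 10],  # same relative profile as cell 0, sequenced twice as deeply
    [0, 8, 0, 12],
])

normalised = log_normalize(counts)
print(np.round(normalised, 2))
```

In this toy example, cells 0 and 1 have the same relative profile, so they end up with identical normalised rows even though cell 1 has twice as many counts.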

Using computational methods, scientists can integrate these different types of data, group cells with similar profiles into clusters and match each cluster to its corresponding phenotype. Dimensionality-reduction techniques such as t-distributed Stochastic Neighbour Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP) then project these high-dimensional profiles onto two dimensions so that the clusters can be visualized (Fig 1). The ability to obtain high-dimensional data enables scientists to understand more about heterogeneity, which not only provides insights for predicting the efficacy of a treatment but can also spark philosophical discussions about questions such as “what is a cell type?”.

(For those who are interested in the more technical details: Kashima et al., 2021, Stuart and Satija, 2019)

(Figure 1. From Kashima et al., 2021. How clusters of different cell types can be visualized using tSNE)
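As a rough illustration of the cluster-then-visualize workflow described above, the sketch below groups simulated cells with k-means and projects them onto two dimensions with principal component analysis (PCA). Both are deliberate simplifications: PCA is a linear method standing in for the nonlinear tSNE and UMAP, and real single-cell pipelines typically cluster with graph-based community detection (such as the Louvain or Leiden algorithms) rather than k-means. The data, gene layout and helper names (`kmeans`, `pca_2d`) are hypothetical, and NumPy is assumed.

```python
import numpy as np


def pca_2d(x: np.ndarray) -> np.ndarray:
    """Project cells onto their first two principal components.

    Used here as a simple linear stand-in for tSNE/UMAP.
    """
    centred = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain k-means clustering: return a label in 0..k-1 for each cell (row)."""
    # Farthest-point initialisation: start from the first cell, then repeatedly
    # add the cell farthest from all centres chosen so far.
    centres = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(x[d.argmax()])
    centres = np.array(centres)

    for _ in range(n_iter):
        # Assign each cell to its nearest centre (squared Euclidean distance).
        dists = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its cells (keep it if the cluster is empty).
        new_centres = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


# Hypothetical data: 20 cells x 5 genes drawn from two populations;
# population B expresses the last two genes much more highly.
rng = np.random.default_rng(42)
pop_a = rng.normal(loc=[5, 5, 5, 1, 1], scale=0.3, size=(10, 5))
pop_b = rng.normal(loc=[5, 5, 5, 6, 6], scale=0.3, size=(10, 5))
cells = np.vstack([pop_a, pop_b])

labels = kmeans(cells, k=2)
embedding = pca_2d(cells)  # 20 x 2 coordinates for a scatter plot
print(labels)
```

Colouring the 2D `embedding` by `labels` in a scatter plot would give a picture similar in spirit to Fig 1, with each simulated population forming its own cloud of points.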

What is a “multi-dimensional world” anyway?

It is often said that science is humanity’s ultimate pursuit of truth. Before our age of “data revolution”, attempts to describe interactions between multiple cell types were mostly limited to a common practice in classic developmental biology: drawing what was observed under a microscope, extracting different cells from the embryo, comparing their gene expression in an attempt to identify a biomarker gene for each phenotype and, finally, using gene knockout models to observe that gene’s impact. This approach of trying to find one gene expressed by one cell type is what I would describe as a “one-dimensional” approach.
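The “one-dimensional” search for a biomarker gene can be sketched as a simple comparison: for a group of cells of interest, find the gene whose average expression differs most from that of all other cells. This is a minimal illustration assuming NumPy and a log-normalised cells × genes matrix; the gene names, values and the `top_marker_gene` helper are hypothetical. Real differential expression analyses add statistical tests (for example, the Wilcoxon rank-sum test) and correction for multiple testing.

```python
import numpy as np


def top_marker_gene(expr: np.ndarray, in_group: np.ndarray,
                    gene_names: list[str]) -> tuple[str, float]:
    """Return the gene with the largest mean expression difference between a
    group of cells and all other cells, together with that difference.

    `expr` is a cells x genes matrix of log-normalised expression and
    `in_group` is a boolean mask selecting the cells of interest.
    """
    diff = expr[in_group].mean(axis=0) - expr[~in_group].mean(axis=0)
    best = int(diff.argmax())
    return gene_names[best], float(diff[best])


# Hypothetical example: 6 cells x 3 genes; the first 3 cells are the cell type of interest.
gene_names = ["GENE_X", "GENE_Y", "GENE_Z"]
expr = np.array([
    [1.0, 4.0, 0.5],
    [1.2, 3.8, 0.4],
    [0.9, 4.1, 0.6],
    [1.1, 0.2, 0.5],
    [1.0, 0.1, 0.7],
    [0.8, 0.3, 0.4],
])
in_group = np.array([True, True, True, False, False, False])

gene, score = top_marker_gene(expr, in_group, gene_names)
print(gene, round(score, 2))
```

Here GENE_Y is returned because it is high in the first three cells and low in the rest; the approach works when one gene cleanly separates one cell type, which is exactly the assumption the next paragraph questions.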

However, whilst these approaches seemed to work well for uncovering master regulator genes in the early stages of embryonic development, they are less applicable in contexts that involve communication between various cell types in a local microenvironment, or even between different systems at a whole-body level. By adopting a “multi-dimensional” approach, we are now able to describe the phenotype of a cell from multiple perspectives at high resolution and observe the landscape of an entire tissue at single-cell resolution. This is perhaps the closest we have come to the “true state” of tissue biology.

The idea of “multi-dimensionality” does not have to mean “a 4D creature looking at us just as we 3D beings watch cartoons”. Instead, the elegance of “multi-dimensionality” lies in appreciating that, just as a cell’s identity is influenced by many other agents in the tissue microenvironment, each person and each event could relate to millions of others in a giant network. It is this network of people and events that makes life intricate and perplexing: the multi-dimensional world.


Kashima, Y., Togashi, Y., Fukuoka, S., Kamada, T., Irie, T., Suzuki, A., Nakamura, Y., Shitara, K., Minamide, T., Yoshida, T., Taoka, N., Kawase, T., Wada, T., Inaki, K., Chihara, M., Ebisuno, Y., Tsukamoto, S., Fujii, R., Ohashi, A. and Suzuki, Y. (2021). Potentiality of multiple modalities for single-cell analyses to evaluate the tumor microenvironment in clinical specimens. Scientific Reports, 11(1). doi:https://doi.org/10.1038/s41598-020-79385-w.

Stuart, T. and Satija, R. (2019). Integrative single-cell analysis. Nature Reviews Genetics, 20(5), pp.257–272. doi:https://doi.org/10.1038/s41576-019-0093-7.