CROP-Seq training datasets for AI-based foundation models of human cell biology

A foundation model of the human cell is a digital representation of human cell biology. Publicly available training data for such models remain very limited, and Myllia's CROP-Seq perturbation datasets provide a powerful source of information for AI/ML-based engines that accelerate drug target discovery.

Figure 1: Experimental workflow of CROP-Seq for AI training datasets at scale

Myllia’s CROP-Seq perturbation datasets can be used at unprecedented scale to train AI models. Foundation models trained on such data are expected to help create digital avatars of human cells that predict the phenotypic changes caused by CRISPR perturbations and by “matched” drug perturbations. Contact us to discuss custom projects on CROP-Seq training data generation and potential in-licensing collaborations to take your data-driven AI models to the next level. For example, a comparative CROP-Seq screen across multiple cell lines is available off the shelf.

Artificial intelligence has started to impact all areas of biological research, but unbiased training and test data are still scarce. To help close this gap, a CRISPR screen was conducted across six different human cell lines (Figure 1), including THP-1, Jurkat, K562, A549 and U2OS cells, with the same set of 218 genes knocked out in each line. In addition, THP-1 cells were differentiated into macrophage-like cells using PMA (M0) and then further differentiated with LPS (M1). Following perturbation, a transcriptomic snapshot of each cell was recorded using unbiased single-cell RNA sequencing.
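In a CROP-Seq readout, each cell's sgRNA identity is recovered from its sequenced transcripts, and this assignment has to happen before any knockout phenotype can be analyzed. The sketch below shows one common heuristic for that step. The function name `assign_guides`, the input layout (cell barcode mapped to sgRNA UMI counts) and the thresholds `min_umis` and `min_fraction` are hypothetical illustrations, not Myllia's actual pipeline.

```python
def assign_guides(guide_umis, min_umis=3, min_fraction=0.8):
    """Assign each cell barcode to a single sgRNA, or to a fallback label.

    guide_umis maps cell barcode -> {sgRNA name: UMI count}.
    The default thresholds are illustrative assumptions, not values
    used in the screen described above.
    """
    assignments = {}
    for cell, counts in guide_umis.items():
        total = sum(counts.values())
        if not counts or total < min_umis:
            # Too few guide UMIs to call a perturbation confidently.
            assignments[cell] = "unassigned"
            continue
        guide, top = max(counts.items(), key=lambda kv: kv[1])
        if top / total >= min_fraction:
            # One guide clearly dominates: treat the cell as carrying it.
            assignments[cell] = guide
        else:
            # Several guides at similar levels: likely a doublet or a
            # multiply transduced cell, so it is set aside.
            assignments[cell] = "multiple"
    return assignments


cells = {
    "AAACCTG": {"sgGENE_A_1": 12},
    "AAACGGG": {"sgGENE_A_1": 5, "sgGENE_B_2": 6},
    "AAAGATG": {"sgGENE_B_2": 1},
}
print(assign_guides(cells))
# {'AAACCTG': 'sgGENE_A_1', 'AAACGGG': 'multiple', 'AAAGATG': 'unassigned'}
```

Cells labeled `unassigned` or `multiple` would typically be excluded before comparing knockouts, so the thresholds trade the number of usable cells against the purity of each knockout group.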

Analysis of the single-cell RNA sequencing data revealed that cells clustered according to cell line identity (Figure 2). Notably, M0 and M1 macrophages clustered near their THP-1 monocyte “parents”.
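The observation that M0 and M1 macrophages sit near their THP-1 parents can be expressed quantitatively by comparing condition centroids in the embedding. This is a minimal sketch, assuming per-cell embedding coordinates and condition labels are already available; the function names are hypothetical, and centroid distance is only one simple criterion among several.

```python
import math
from collections import defaultdict


def condition_centroids(embedding, labels):
    """Average the embedding coordinates of all cells in each condition."""
    groups = defaultdict(list)
    for point, label in zip(embedding, labels):
        groups[label].append(point)
    # zip(*points) transposes the points so each tuple holds one dimension.
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in groups.items()
    }


def nearest_condition(centroids, query):
    """Return the other condition whose centroid is closest to `query`'s."""
    others = (label for label in centroids if label != query)
    return min(others, key=lambda label: math.dist(centroids[query], centroids[label]))


# Toy coordinates for illustration only, not data from the screen.
emb = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 4.8), (0.5, 0.4)]
lab = ["A", "A", "B", "B", "C"]
print(nearest_condition(condition_centroids(emb, lab), "C"))  # A
```

Distances between well-separated clusters in a UMAP are only roughly interpretable, so the same check is often run on PCA coordinates instead; the functions above accept points of any dimension.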


Figure 2: UMAP plot of all experimental conditions

A detailed analysis of one condition (M1) revealed that cells carrying individual CRISPR knockouts clustered in distinct regions of the UMAP plot (Figure 3), suggesting that these gene knockouts had a pronounced impact on the transcriptome of these cells.
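The distinct placement of individual knockouts can be summarized as a per-knockout effect size. This is a minimal sketch, assuming the library contains non-targeting control guides (not stated in the text) and that expression values are already normalized per cell. The `knockout_effects` name, the log1p-mean pseudobulk profile and the Euclidean distance metric are illustrative choices, not the analysis behind Figure 3.

```python
import math


def mean_profile(cells):
    """Per-gene mean of log1p-transformed expression across a group of cells."""
    n = len(cells)
    # zip(*cells) yields one tuple of values per gene across all cells.
    return [sum(math.log1p(v) for v in gene) / n for gene in zip(*cells)]


def knockout_effects(expr_by_knockout, control="non-targeting"):
    """Distance of each knockout's mean profile from the control profile.

    expr_by_knockout maps a knockout label -> list of cells, each cell being
    a list of normalized expression values in a fixed gene order.
    The control label is a hypothetical placeholder for non-targeting guides.
    Returns knockouts sorted from strongest to weakest effect.
    """
    ref = mean_profile(expr_by_knockout[control])
    effects = {
        ko: math.dist(mean_profile(cells), ref)
        for ko, cells in expr_by_knockout.items()
        if ko != control
    }
    return dict(sorted(effects.items(), key=lambda kv: kv[1], reverse=True))


toy = {
    "non-targeting": [[0.0, 1.0], [0.0, 1.0]],
    "KO_A": [[0.0, 1.0]],
    "KO_B": [[math.e - 1, 1.0]],
}
print(list(knockout_effects(toy)))  # ['KO_B', 'KO_A']
```

In practice such a distance is often computed on a selected gene set or in PCA space, and its significance assessed with a permutation test against control cells.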


Figure 3: Transcriptomic phenotypes of distinct gene knockouts

Download the introductory slide deck on the comparative CROP-Seq screen conducted across eight conditions (six cancer cell lines plus THP-1-derived M0 and M1 macrophages), all performed in parallel using a single sgRNA library targeting the very same set of 218 genes: