PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling

Xinyu Yuan1,2,*, Xixian Liu1,2,*, Yashi Zhang1,2,*, Zuobai Zhang1,2, Hongyu Guo3,4, Jian Tang1,5,6,+

1Mila – Québec AI Institute   2Université de Montréal   3University of Ottawa   4National Research Council of Canada   5HEC Montréal   6CIFAR AI Chair

*Equal contribution; the first author led the project, and the others are ordered alphabetically. +Corresponding author.


Abstract

Building virtual cells that simulate cellular responses to perturbations is a central challenge in systems biology. Because single-cell sequencing is destructive, control and perturbed cells are unpaired, requiring distribution-level modeling. Existing methods assume a single fixed response distribution under each condition. However, unobserved latent factors such as microenvironment and batch effects induce a manifold of plausible response distributions. PerturbDiff models perturbation responses as distribution-valued random variables and defines a diffusion process directly in a reproducing kernel Hilbert space. Across signaling, drug, and genetic benchmarks, PerturbDiff achieves state-of-the-art performance and improved generalization.

Motivation

Distribution variability
fig_distribution_variability.png
Distributional variability in single-cell perturbation data. (a) Traditional methods operate on unpaired control and perturbed cells, learning to map a control cell distribution to a perturbed one. (b) However, variations in cell distributions arise from unobserved latent factors, inducing a family of distinct cell distributions and shifting the objective to learning a distribution over cell distributions.
  • Single-cell sequencing is destructive, so no cell-to-cell correspondence exists between control and perturbed cells.
  • Existing methods often assume a single perturbed distribution Pc,τ when conditioned on the observed cell type c and perturbation type τ.
  • In reality, unobservable latent biological and technical factors induce a distribution over distributions.
  • We propose to model this variability at the distribution level.

Method

Framework
fig_framework.png
Overview of the PerturbDiff framework. (a) Distribution-valued random variables Dc,τ and Dc in cell space are mapped to Hilbert-space elements μc,τ and μc ∈ Hk via kernel mean embedding. (b) Diffusion is defined on perturbed embeddings μ0 := μc,τ, with a denoising network predicting the target μθ. (c) Each MM-DiT block performs joint attention over control and perturbed token streams.

Distribution as Random Variable

We treat perturbed populations as distribution-valued random variables Dc,τ.

Kernel Mean Embedding

Each distribution P is mapped to an element μP of a reproducing kernel Hilbert space (RKHS) via kernel mean embedding.
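As a rough illustration of the embedding step, the sketch below evaluates an empirical kernel mean embedding, (1/n) Σ k(x_i, q), from a finite sample of cells. The Gaussian RBF kernel, the `bandwidth` value, the function names, and the synthetic data are all assumptions made for this example; the source does not state which kernel or estimator PerturbDiff uses.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def empirical_mean_embedding(cells, query, bandwidth=1.0):
    """Evaluate mu_hat(q) = (1/n) * sum_i k(x_i, q) at each query point q."""
    return rbf_kernel(cells, query, bandwidth).mean(axis=0)


# Hypothetical data: 200 cells with 5 features each (not real expression data).
rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 5))
query = np.zeros((1, 5))  # point at which the embedded function is evaluated
mu_at_query = empirical_mean_embedding(cells, query)
```

The embedding is a function on cell space, so a finite sample only gives its values at chosen query points. With a bounded kernel such as the RBF, those values stay in (0, 1].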

Diffusion in Function Space

We define a DDPM-style diffusion directly over μ in Hilbert space.
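For orientation, the standard DDPM forward process, written over embeddings instead of vectors, would look like the sketch below. This is a generic DDPM form under stated assumptions, not necessarily the exact construction in the paper: the noise schedule β_s, the Gaussian random element ξ with covariance operator C on Hk, and the conditioning of the denoiser on the control embedding μc are introduced here for illustration.

```latex
% Forward noising of the perturbed embedding (assumed standard DDPM form)
\mu_t = \sqrt{\bar{\alpha}_t}\,\mu_0 + \sqrt{1-\bar{\alpha}_t}\,\xi,
\qquad \mu_0 := \mu_{c,\tau},
\qquad \xi \sim \mathcal{N}(0, C),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)

% Denoising objective measured in the RKHS norm (assumed x_0-prediction form)
\mathcal{L}(\theta) = \mathbb{E}_{t,\,\mu_0,\,\xi}
\left\| \mu_\theta(\mu_t, t, \mu_c) - \mu_0 \right\|_{\mathcal{H}_k}^{2}
```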

MMD Objective

The denoising objective derived from the Hilbert-space diffusion is an RKHS distance, which equals the maximum mean discrepancy (MMD), yielding a principled distribution-aware loss.
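The link between RKHS distance and MMD can be checked numerically. The squared RKHS distance between two mean embeddings expands into kernel averages: E[k(x, x')] + E[k(y, y')] − 2 E[k(x, y)]. The sketch below computes the biased (V-statistic) estimate of this quantity from two samples. The RBF kernel, the `bandwidth`, the function names, and the synthetic populations are assumptions for illustration and are not taken from the PerturbDiff implementation.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of MMD^2 = ||mu_P - mu_Q||^2 in the RKHS."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical populations: one matching the prediction, one shifted away.
rng = np.random.default_rng(0)
predicted = rng.normal(loc=0.0, size=(300, 5))
target_same = rng.normal(loc=0.0, size=(300, 5))
target_shifted = rng.normal(loc=2.0, size=(300, 5))

mmd_same = mmd_squared(predicted, target_same)        # small: same distribution
mmd_shifted = mmd_squared(predicted, target_shifted)  # larger: shifted distribution
```

Because the loss compares whole populations through kernel averages, it needs no pairing between individual predicted and observed cells, which suits the unpaired setting described in the Motivation.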

Results

Perturbation modeling

Radar plot
fig_radar.png
  • State-of-the-art on PBMC and Tahoe100M, competitive on Replogle.
  • Strong differential expression recovery.
  • Robust cross-dataset generalization.
Scatter
fig_scatter.png
  • Consistent performance across perturbation types.
DE recovery
fig_de_recovery.png
  • Accurate recovery of perturbation-driven differential expression (DE) patterns, compared to ground truth and the best baseline.

Pretraining

  • To improve data efficiency, PerturbDiff introduces marginal pretraining by leveraging 61M cells from scRNA-seq datasets in CellxGene.
Zero-shot
fig_zero_shot.png
  • This stage enables non-trivial zero-shot performance.
Low-data
fig_lowdata.png
  • This stage also improves low-data adaptation.

Impact

PerturbDiff provides a principled framework for virtual cell modeling, accelerating perturbation prediction in functional genomics and drug discovery. Because predictions depend on training data, the model should be used as a decision-support tool alongside experimental validation.

Citation

@article{perturbdiff,
  title   = {PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling},
  author  = {Xinyu Yuan and Xixian Liu and Yashi Zhang and Zuobai Zhang and Hongyu Guo and Jian Tang},
  year    = {2025},
}