Science

Build the model the right way

Our first technical milestone is a proprietary foundation model for histopathology, balanced across the three constraints that shape it (training data, context length, and compute) and designed to generalize across hospitals.

HistFM: the histology foundation model

Whole-slide images are enormous. Capturing both micro-scale cellular detail and macro-scale tissue organization requires long-context modeling.
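
To give a sense of scale, the arithmetic below counts the tiles in a single slide. It is a minimal sketch: the slide dimensions and the 256-pixel tile size are illustrative assumptions rather than figures from our pipeline, and `tile_grid` is a hypothetical helper.

```python
import math
from typing import Tuple


def tile_grid(width_px: int, height_px: int, tile_px: int) -> Tuple[int, int, int]:
    """Return (columns, rows, total tiles) for non-overlapping square tiles."""
    cols = math.ceil(width_px / tile_px)   # partial tiles at the edge still count
    rows = math.ceil(height_px / tile_px)
    return cols, rows, cols * rows


# Assumed example: a 100,000 x 80,000 pixel slide cut into 256 x 256 tiles.
cols, rows, total = tile_grid(100_000, 80_000, 256)
print(cols, rows, total)  # 391 313 122383
```

Under these assumptions a single slide yields more than a hundred thousand tiles, each of which the slide-level model may need to relate to the others.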

Training substrate

We start with public datasets that cover diverse tissue types and cancer cohorts, then expand with partner data:

  • TCGA for tumor diversity and paired outcomes.
  • GTEx for normal tissue context.
  • CPTAC for proteogenomics (critical for multimodal alignment).

Architecture philosophy

We separate the problem into (1) local tile representation and (2) slide-level aggregation with long-range context. This modularity lets us iterate faster and evaluate each component independently.

  • Strong tile encoders for cellular texture and morphology.
  • Long-context slide encoder for spatial organization and microenvironment structure.
  • Multimodal objectives to embed what matters biologically.
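
The two-stage split can be sketched as follows. This is a toy illustration, not the HistFM architecture: `encode_tile` is a stand-in that summarizes a tile with two statistics, and `aggregate_slide` uses simple attention pooling in place of a real long-context slide encoder. All names and values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def encode_tile(tile_pixels: Sequence[float]) -> Vector:
    """Stage 1 (stand-in): map one tile to a small embedding (mean, std)."""
    n = len(tile_pixels)
    mean = sum(tile_pixels) / n
    variance = sum((p - mean) ** 2 for p in tile_pixels) / n
    return [mean, math.sqrt(variance)]


def aggregate_slide(tile_embeddings: Sequence[Vector], query: Vector) -> Vector:
    """Stage 2 (stand-in): attention-pool tile embeddings into one slide vector."""
    # Score each tile by its dot product with the query vector.
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in tile_embeddings]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted average of the tile embeddings, one dimension at a time.
    dim = len(tile_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, tile_embeddings))
        for d in range(dim)
    ]


# Three tiny made-up "tiles"; real tiles would be image patches.
tiles = [[0.1, 0.2, 0.3], [0.8, 0.9, 1.0], [0.4, 0.4, 0.4]]
embeddings = [encode_tile(t) for t in tiles]
slide_embedding = aggregate_slide(embeddings, query=[1.0, 0.0])
print(len(slide_embedding))  # 2
```

Because the two stages only share the embedding format, either one can be swapped or evaluated without touching the other.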

Licensing discipline: We respect open-source and research licenses. Research-only weights are never used in deployed products: we either train our own or obtain the appropriate commercial permissions.

Proteomics-aware learning

Vision-only models can be powerful, but therapy response is mediated by molecular function. Aligning histology with proteomics ties the learned representation to that molecular function, which makes it more useful for mechanistic questions.

Multi-task objective

Alongside self-supervised objectives, we add supervised regression heads to predict proteomic abundance on CPTAC subsets. The goal is not a single perfect predictor; it’s a richer embedding space.
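
One way to combine the two kinds of objective is a weighted sum, sketched below. The weight of 0.5, the mean-squared-error regression loss, and the function names are assumptions for illustration. Slides without proteomic measurements contribute only the self-supervised term.

```python
from typing import Optional, Sequence


def mse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean squared error between predicted and measured abundances."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)


def multitask_loss(
    ssl_loss: float,
    predicted: Sequence[float],
    measured: Optional[Sequence[float]],
    weight: float = 0.5,  # hypothetical trade-off between the two terms
) -> float:
    """Self-supervised loss plus a weighted proteomic regression term."""
    if measured is None:  # no proteomics available for this slide
        return ssl_loss
    return ssl_loss + weight * mse(predicted, measured)


print(multitask_loss(1.0, [1.0, 2.0], [1.0, 4.0]))  # 2.0
print(multitask_loss(1.0, [1.0, 2.0], None))        # 1.0
```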

External validation

We hold out entire cohorts and institutions so that no patient or site appears in both training and test data. CPTAC is particularly useful as a high-quality external test bed when splits are done correctly.
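
A minimal sketch of an institution-level hold-out follows. The record fields (`slide_id`, `patient_id`, `institution`) and the hospital names are hypothetical, and the patient-overlap check is one possible safeguard rather than a description of our pipeline.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def split_by_institution(
    records: List[Record], held_out: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Send every slide from a held-out institution to the test split."""
    train = [r for r in records if r["institution"] not in held_out]
    test = [r for r in records if r["institution"] in held_out]
    # A patient seen at two sites could still leak across the split.
    shared = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
    if shared:
        raise ValueError(f"patients in both splits: {sorted(shared)}")
    return train, test


records = [
    {"slide_id": "s1", "patient_id": "p1", "institution": "hospital_a"},
    {"slide_id": "s2", "patient_id": "p1", "institution": "hospital_a"},
    {"slide_id": "s3", "patient_id": "p2", "institution": "hospital_b"},
    {"slide_id": "s4", "patient_id": "p3", "institution": "hospital_c"},
]
train, test = split_by_institution(records, held_out={"hospital_c"})
print(len(train), len(test))  # 3 1
```

Splitting by slide instead would place s1 and s2, which come from the same patient, on opposite sides of the split some of the time, which is the leakage this guards against.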

Path to causality

The model suggests resistance pathways; the lab tests key hypotheses. This is how the platform earns trust — and generates proprietary data.

Benchmarks & evaluation

We evaluate on tasks that test robustness, generalization, and molecular inference — not just in-distribution accuracy.

Representative benchmarks

  • PANDA — prostate cancer grading.
  • UBC-OCEAN — ovarian cancer subtyping.
  • TCGA-NSCLC — lung cancer subtyping.
  • Camelyon17-WILDS — distribution shift across hospitals.
  • MHIST — colorectal polyp classification.

The exact benchmark set may evolve as we align with the most informative public evaluations.

Molecular prediction

For multimodal grounding, we use gene-expression and molecular-prediction suites (e.g., HEST) and evaluate whether multimodal training improves out-of-distribution performance. Our evaluation protocol includes:

  • Strict train/test splits to avoid cohort leakage.
  • Confidence calibration and uncertainty reporting.
  • Population-level bias checks when metadata allows it.
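
Calibration can be summarized with expected calibration error (ECE), sketched below under assumptions: ten equal-width confidence bins and a binary correct/incorrect label per prediction. ECE is one common metric, shown here as an example and not necessarily the one we report.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between confidence and accuracy across bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Made-up predictions: two confident and right, two less confident and half right.
ece = expected_calibration_error([0.95, 0.95, 0.65, 0.65], [True, True, True, False])
print(round(ece, 3))  # 0.1
```

A value near zero means stated confidence tracks observed accuracy. Computing the same metric per subgroup is one way to run the bias checks when metadata allows it.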

Safety & regulatory posture

We begin as a research and decision-support platform, not a diagnostic device. Clinical claims require prospective validation and the right regulatory pathway.

Human in the loop

Outputs are designed to support clinicians with evidence and uncertainty — not replace judgement.

Auditability

Every prediction should be traceable: data provenance, model versioning, and evaluation context.
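
A traceable prediction could be stored as an immutable record like the one below. This is a sketch under assumptions: the field names, identifiers, and version string are hypothetical and do not describe an existing schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: a record cannot be altered after creation
class PredictionRecord:
    slide_id: str
    data_source: str          # data provenance
    model_version: str        # model versioning
    evaluation_context: str   # how and where the model was evaluated
    prediction: float
    uncertainty: float

    def to_json(self) -> str:
        """Serialize with sorted keys so identical records serialize identically."""
        return json.dumps(asdict(self), sort_keys=True)


record = PredictionRecord(
    slide_id="slide-0001",
    data_source="example-cohort",
    model_version="histfm-0.0.1-example",
    evaluation_context="held-out institution, retrospective",
    prediction=0.72,
    uncertainty=0.08,
)
print(record.to_json())
```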

Clinical validation

Prospective studies and lab validation are the bridge between retrospective signal and real-world care.

Note: Nothing on this site is medical advice. Enso’s technology is under development and not cleared or approved for clinical use.