Full Citation
Kokosar J, Turkay C, Ausec L, Stajdohar M, Zupan B. Visual analytics framework for survival analysis and biomarker discovery from gene expression data. PLoS One. 2026;21:e0325399.
Background and Question
Mining public gene-expression repositories can yield many candidate biomarkers, but exploratory survival analysis is prone to overfitting, arbitrary cutpoints, hidden cohort bias, and selective reporting. Visual analytics can make the reasoning chain easier to inspect.
Research question
Can an interactive visual analytics workflow help researchers discover and critique survival-associated expression biomarkers more transparently than static scripts alone?
Methods and Evidence Chain
Input
Gene-expression data paired with survival outcomes.
Analysis layer
Survival modeling and biomarker-discovery operations exposed through visual analytics.
User logic
Supports inspection of gene signals, patient groupings, survival separation, and candidate prioritization.
Reproducibility aim
Makes thresholding and exploratory choices more visible for later validation.
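The chain above (expression input, patient grouping, survival separation) can be sketched in a few lines. This is a minimal illustration, not the framework's own code: the function names `kaplan_meier` and `split_by_median`, the median split rule, and the toy cohort are all assumptions introduced here.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve

def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the cohort median."""
    ordered = sorted(expression)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return ["high" if x > median else "low" for x in expression]

# Hypothetical toy cohort (invented values, for illustration only)
expression = [2.1, 8.4, 3.3, 7.9, 1.8, 9.2, 2.7, 8.8]
times = [30, 6, 24, 9, 36, 4, 28, 12]
events = [0, 1, 1, 1, 0, 1, 0, 1]

groups = split_by_median(expression)
curves = {}
for label in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
```

Keeping the split rule in its own function makes the thresholding choice explicit, which is the kind of visibility the reproducibility aim describes.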
Key Results
The framework emphasizes iterative discovery with visual feedback rather than one-shot black-box biomarker ranking.
Visual survival outputs help users see whether a candidate is driven by broad cohort structure or narrow outliers.
The method is relevant to TCGA/GEO-style mining where clinical labels and expression matrices are available.
Computational prioritization still requires independent cohorts and biological assays.
Mechanism Interpretation
The mechanism is epistemic rather than biological: by linking expression distributions, survival splits, and cohort metadata, the analyst can detect unstable cutpoints, subgroup artifacts, and candidates that need external validation before mechanistic interpretation.
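One way to make an unstable cutpoint visible is to recompute the group comparison at every admissible threshold and look at how much the statistic moves. The sketch below uses a standard two-group log-rank chi-square; the function names, the `min_group` parameter, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def logrank_chi2(times, events, in_group1):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                              # at risk, all
        n1 = sum(1 for u, g in zip(times, in_group1) if u >= t and g)    # at risk, group 1
        d = sum(1 for u, e in zip(times, events) if u == t and e)        # events at t
        d1 = sum(1 for u, e, g in zip(times, events, in_group1)
                 if u == t and e and g)                                  # events in group 1
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0

def scan_cutpoints(expression, times, events, min_group=2):
    """Log-rank statistic at every cutpoint leaving >= min_group patients per arm."""
    results = []
    for cut in sorted(set(expression)):
        high = [x > cut for x in expression]
        n_high = sum(high)
        if min(n_high, len(high) - n_high) >= min_group:
            results.append((cut, logrank_chi2(times, events, high)))
    return results

# Hypothetical toy cohort (invented values, for illustration only)
expression = [2.1, 8.4, 3.3, 7.9, 1.8, 9.2, 2.7, 8.8]
times = [30, 6, 24, 9, 36, 4, 28, 12]
events = [0, 1, 1, 1, 0, 1, 0, 1]

results = scan_cutpoints(expression, times, events)
for cut, chi2 in results:
    print(f"cut > {cut}: chi2 = {chi2:.2f}")
```

A candidate whose statistic is large at one threshold and collapses at its neighbours is a reasonable flag for external validation rather than mechanistic interpretation. Reporting only the best cutpoint from such a scan would itself be a form of selective reporting.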
Mechanism / workflow schematic
The workflow is given as Mermaid flowchart source:
flowchart TD
    A[Public expression cohort] --> B[Preprocessing and metadata audit]
    B --> C[Visual survival exploration]
    C --> D[Candidate biomarker]
    D --> E[Locked external validation]
    E --> F[Mechanistic experiment]
    F --> G[Clinical utility model]
    C --> H[Reject unstable cutpoints or cohort artifacts]
Clinical and Translational Relevance
Clinical relevance
For medical research groups, this type of tool can reduce weak biomarker claims and improve preclinical prioritization. It is particularly useful before spending resources on qPCR validation, tissue microarrays, or mechanistic experiments.
Translational value
The framework can be adapted into a lab pipeline: public-dataset screen, preregistered cutpoint policy, independent validation, wet-lab perturbation, and clinical model comparison.
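A preregistered cutpoint policy is only useful if later changes to it are detectable. One lightweight option, sketched here under assumed field names and placeholder values, is to hash a canonical encoding of the policy and record the digest before any validation cohort is opened.

```python
import hashlib
import json

def freeze_record(record):
    """Return a SHA-256 digest of a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical policy; field names and values are illustrative only
policy = {
    "discovery_cohort": "example-cohort-A",
    "cutpoint_rule": "median",
    "candidates": ["GENE_A", "GENE_B"],
    "endpoint": "overall_survival",
}
digest = freeze_record(policy)

# Any later change to the rule yields a different digest
changed = dict(policy, cutpoint_rule="optimal")
changed_digest = freeze_record(changed)
print(digest != changed_digest)
```

Sorting keys before hashing means the digest depends on the content of the policy and not on the order in which the fields were written.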
Limitations and Critique
Survival association does not imply disease-driving biology.
Public cohorts may have batch effects, incomplete treatment data, and inconsistent clinical annotation.
Interactive exploration can amplify multiple-testing risk unless controlled by workflow rules.
A statistically significant gene rarely becomes a deployable clinical biomarker without calibration and decision-curve testing.
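The multiple-testing concern above can be made concrete with a false-discovery-rate adjustment. The sketch below applies the Benjamini-Hochberg procedure to a list of p-values. Whether the framework itself applies this correction is not stated in the summary, so it is shown only as one common workflow rule, and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for five genes inspected in one session
p_values = [0.001, 0.04, 0.03, 0.20, 0.012]
q_values = benjamini_hochberg(p_values)
```

The adjustment is only meaningful if every gene and cutpoint that was looked at is included in the list, which is why interactive sessions need to log inspected candidates and not just the ones that are reported.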
Reviewer-style critique
The paper is most useful as a guardrail against superficial database mining. The danger is that attractive visual separation can still seduce users into overclaiming; every visually discovered signal needs locked validation and transparent reporting.
Practical Next Research Actions
Action 1
Create a local public-database mining checklist covering dataset provenance, preprocessing, cutpoint policy, multiplicity, and validation.
Action 2
Use visual analytics to screen scar, wound, and skin-cancer datasets, then freeze candidates before external testing.
Action 3
Pair survival mining with pathway enrichment and cell-type deconvolution to avoid isolated single-gene claims.
Action 4
Report negative validation results to prevent repeated weak biomarker narratives.
Evidence-quality judgment
Moderate methods evidence: useful for reproducible exploration, but clinical claims depend entirely on downstream validation.