The widespread adoption of genomic sequencing has fundamentally transformed modern medical diagnostics. Advances in next-generation sequencing technologies have made the identification of genetic variation routine, scalable, and increasingly affordable. As a result, genomic data generation is no longer the primary bottleneck in precision medicine. Instead, the central challenge lies in interpretation: translating raw sequence variation into meaningful biological and clinical insight.
Despite remarkable technical progress, the clinical impact of genomic sequencing remains constrained by this unresolved gap. Current computational tools are highly effective at determining whether a genetic variant is potentially pathogenic. However, pathogenicity alone is insufficient for clinical decision-making. Clinicians must understand how a mutation manifests clinically, which disease pathways it affects, which organ systems are involved, and how it aligns with a patient’s phenotype. The lack of phenotypic specificity in variant interpretation continues to limit the practical utility of genomic medicine.
Most existing variant interpretation pipelines rely on population frequency, evolutionary conservation, protein structural disruption, and curated pathogenicity databases. These approaches generate scores indicating whether a variant is likely harmful, but they do not specify which disease the variant causes. As a result, sequencing analyses often yield long lists of candidate variants that require extensive manual prioritization.
This limitation is especially problematic in rare and complex disorders, where genetic heterogeneity and overlapping clinical features complicate diagnosis. In such cases, clinicians are forced to infer phenotype relevance indirectly through labor-intensive genotype–phenotype correlation, slowing diagnosis and reducing diagnostic yield.
To address this challenge, researchers at the Icahn School of Medicine at Mount Sinai developed V2P (Variant to Phenotype), an artificial intelligence–driven framework designed to directly associate genetic variants with disease categories. Rather than focusing solely on functional impact, V2P predicts the disease class most likely associated with a given mutation.
Reported in Nature Communications on December 15, 2025, V2P represents a conceptual shift in genomic interpretation from a variant-centric view to a phenotype-aware model that aligns more closely with clinical reasoning.
Rather than asking solely whether a variant is pathogenic, V2P evaluates the likelihood that a variant contributes to a specific disease class.
The underlying hypothesis of V2P is that pathogenic variants associated with similar disease phenotypes share common molecular and biological features. By learning these features from large-scale datasets, machine learning models can infer phenotype associations even for previously uncharacterized variants.
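One simple way to picture this hypothesis is a nearest-centroid classifier: if variants linked to the same disease category cluster together in feature space, an uncharacterized variant can be assigned to the closest cluster. The sketch below is illustrative only. The two-feature vectors, category labels, and function names are hypothetical, and this is not the model V2P actually uses.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, disease category)


def fit_centroids(examples: List[Example]) -> Dict[str, List[float]]:
    """Average the feature vectors of known variants within each category."""
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for features, category in examples:
        grouped[category].append(features)
    return {
        category: [sum(column) / len(rows) for column in zip(*rows)]
        for category, rows in grouped.items()
    }


def nearest_category(centroids: Dict[str, List[float]],
                     features: Sequence[float]) -> str:
    """Assign an unseen variant to the category with the closest centroid."""
    return min(centroids, key=lambda category: dist(centroids[category], features))


# Toy training data: hypothetical (conservation, structural-impact) scores.
known_variants: List[Example] = [
    ([0.9, 0.1], "neurological"),
    ([0.8, 0.2], "neurological"),
    ([0.1, 0.9], "cancer"),
    ([0.2, 0.7], "cancer"),
]

centroids = fit_centroids(known_variants)
predicted = nearest_category(centroids, [0.7, 0.3])  # an uncharacterized variant
print(predicted)
```

The point of the toy example is only that shared features within a category make inference on new variants possible; a real system would use far richer features and a more expressive model.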
The V2P framework integrates multiple layers of genomic and clinical information, including:
Pathogenic and benign variants curated from established human genetic databases
Disease annotations linking genes and variants to clinical phenotypes
Variant-level features such as predicted protein impact, conservation metrics, and functional annotations
Importantly, V2P employs phenotype-specific machine learning models, rather than a single generalized pathogenicity predictor. Each model is trained to recognize patterns associated with a particular disease category, such as neurological disorders, cancer, or immune-mediated disease.
This design allows V2P to generate probabilistic predictions reflecting the likelihood that a variant contributes to a given disease phenotype, thereby aligning computational output with clinical reasoning.
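The idea of one model per disease category, each returning its own probability, can be sketched as follows. This is a minimal illustration under stated assumptions: the three features, the logistic form, and every weight below are hypothetical placeholders, not values or architecture taken from V2P.

```python
import math
from typing import Dict, List

# Hypothetical feature order: [conservation, predicted protein impact, allele rarity].
# Hypothetical per-category parameters, bias first; NOT derived from V2P.
CATEGORY_WEIGHTS: Dict[str, List[float]] = {
    "neurological": [-2.0, 2.5, 1.0, 0.5],
    "cancer": [-2.5, 0.5, 3.0, 0.5],
    "immune": [-3.0, 1.0, 1.0, 2.0],
}


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_phenotypes(features: List[float]) -> Dict[str, float]:
    """Score one variant with each category-specific model independently."""
    scores: Dict[str, float] = {}
    for category, (bias, *weights) in CATEGORY_WEIGHTS.items():
        z = bias + sum(w * x for w, x in zip(weights, features))
        scores[category] = sigmoid(z)
    return scores


variant_features = [0.9, 0.2, 0.8]  # toy values for a single variant
probabilities = predict_phenotypes(variant_features)
top_category = max(probabilities, key=probabilities.get)
print(top_category, round(probabilities[top_category], 2))
```

Because each category is scored by its own model, the probabilities need not sum to one, so a variant can plausibly score highly in more than one category.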
The performance of V2P was evaluated using de-identified clinical genomic datasets, reflecting real-world diagnostic complexity. Across multiple test cases, the framework demonstrated a consistent ability to prioritize the true disease-causing variant among the highest-ranked candidates, frequently within the top ten.
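A ranking result of this kind is commonly summarized as a top-k hit rate: the fraction of cases in which the known causal variant appears among the k highest-ranked candidates. The sketch below shows that calculation on made-up data; the case identifiers, variant identifiers, and function name are hypothetical, and the numbers are not results from the study.

```python
from typing import Dict, List


def top_k_hit_rate(rankings: Dict[str, List[str]],
                   causal: Dict[str, str],
                   k: int = 10) -> float:
    """Fraction of cases whose causal variant is among the top k candidates."""
    if not rankings:
        raise ValueError("at least one case is required")
    hits = sum(causal[case_id] in ranked[:k] for case_id, ranked in rankings.items())
    return hits / len(rankings)


# Toy data: candidates per case, ordered from highest to lowest score.
rankings = {
    "case_1": ["v7", "v2", "v9"],
    "case_2": ["v4", "v1", "v3"],
    "case_3": ["v5", "v8", "v6"],
}
causal = {"case_1": "v2", "case_2": "v3", "case_3": "v5"}

rate_top2 = top_k_hit_rate(rankings, causal, k=2)
rate_top3 = top_k_hit_rate(rankings, causal, k=3)
print(rate_top2, rate_top3)
```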
Compared with conventional pathogenicity-only tools, V2P showed improved relevance to patient phenotype, significantly reducing the number of variants requiring downstream clinical evaluation.
At its current stage, V2P predicts associations with broad disease categories, including:
Neurological and neurodevelopmental disorders
Oncological conditions
Immune and inflammatory diseases
Even at this level of granularity, the framework offers substantial gains in diagnostic efficiency.
From a clinical perspective, V2P enhances variant interpretation by aligning genomic findings with disease biology.
For rare disease medicine, in particular, V2P has the potential to reduce diagnostic delay and improve diagnostic yield.
Beyond diagnostics, V2P provides a scalable approach for identifying disease-relevant genes and pathways by systematically associating variants with phenotypic outcomes.
Because genetic support for a therapeutic target is a strong predictor of therapeutic success, phenotype-specific variant interpretation may substantially improve the efficiency of translational research.
Despite its conceptual and practical advances, V2P has several important limitations that must be considered before widespread clinical adoption.
First, phenotypic resolution is currently limited. V2P classifies variants into broad disease categories, such as neurological disorders, cancer, or immune-mediated disease, rather than predicting specific clinical diagnoses. While this level of categorization is valuable for variant prioritization, it does not yet replace detailed clinical-genetic interpretation required for definitive diagnosis. Many diseases share overlapping molecular pathways, and coarse phenotype grouping may obscure clinically meaningful distinctions.
Second, dependence on existing curated datasets introduces inherent bias. The model is trained on known pathogenic and benign variants with available disease annotations. Diseases and genes that are well-studied are therefore overrepresented, while rare disorders, underrepresented populations, and novel disease mechanisms may be insufficiently captured. This limitation reflects a broader challenge in genomic AI: model performance is constrained by the scope and diversity of available data.
Third, population diversity remains a concern. Like many genomic tools, V2P is susceptible to reduced accuracy in populations that are underrepresented in genetic databases. Variants common in non-European ancestries may be misclassified or deprioritized, raising concerns about equity and generalizability in global clinical practice.
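One way to make this concern measurable is to stratify a ranking metric by ancestry group and compare the results. The sketch below does this for a top-k hit rate on invented data. The group labels, ranks, and function name are hypothetical, and the gap it reports says nothing about V2P's actual performance.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hit_rate_by_group(cases: List[Tuple[str, int]], k: int = 10) -> Dict[str, float]:
    """Top-k hit rate per group, given (group, rank of causal variant) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for group, causal_rank in cases:
        totals[group] += 1
        if causal_rank <= k:  # ranks are 1-based
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Toy data: rank at which the causal variant appeared in each case.
cases = [
    ("group_a", 1), ("group_a", 4), ("group_a", 12), ("group_a", 2),
    ("group_b", 3), ("group_b", 25), ("group_b", 18), ("group_b", 40),
]

rates = hit_rate_by_group(cases, k=10)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A large gap between groups would be a signal to examine training-data coverage before relying on the rankings for the lower-scoring group.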
Fourth, contextual clinical factors are not fully integrated. V2P focuses primarily on variant-level and disease-level associations and does not yet incorporate detailed patient-specific clinical features such as age of onset, disease progression, environmental modifiers, or comorbidities. As a result, predictions must still be interpreted within the broader clinical context by experienced professionals.
Finally, interpretability and regulatory readiness remain evolving challenges. Although V2P improves biological relevance, AI-driven predictions still require transparent explanation to support clinical trust, regulatory approval, and medico-legal accountability. The framework is not intended to function autonomously and must be embedded within expert-guided diagnostic workflows.
The development of V2P opens several promising avenues for future research and clinical application.
A key priority is increasing phenotypic granularity. Future iterations aim to move beyond broad disease classes toward prediction of more specific disease subtypes and clinical syndromes. Achieving this will require larger, more deeply annotated genotype–phenotype datasets and refined modeling strategies.
Another critical direction is integration of multimodal data. Incorporating transcriptomic, proteomic, metabolomic, and epigenetic data, alongside structured clinical phenotypes, could substantially improve prediction accuracy and biological interpretability. Such integration would allow V2P to model disease mechanisms more holistically rather than relying primarily on static DNA variation.
Expansion into therapeutic relevance is also anticipated. By linking variants not only to disease categories but also to druggable pathways and treatment response, V2P could support genetically informed therapy selection and drug development. This would align the framework more closely with precision pharmacology and personalized medicine initiatives.
Further, prospective clinical validation will be essential. Large-scale, real-world studies across diverse populations and medical specialties are needed to establish clinical utility, robustness, and reproducibility. These studies will also inform regulatory pathways and clinical implementation standards.
Finally, ongoing work will need to address ethical and equity considerations, ensuring fair performance across populations and transparent communication of uncertainty. Continuous model updating, bias auditing, and clinician education will be critical for responsible deployment.
V2P represents a significant methodological advance in genomic medicine by addressing a long-standing limitation in variant interpretation: the inability to connect genetic mutations directly to disease phenotypes. By extending beyond pathogenicity prediction to phenotype-specific inference, the framework provides a more clinically aligned and biologically meaningful approach to genomic analysis.
While current limitations, particularly in phenotypic resolution, data diversity, and clinical integration, necessitate cautious interpretation, the foundational concept of phenotype-aware variant prediction is both robust and forward-looking. V2P should be viewed not as a replacement for clinical expertise, but as a powerful decision-support tool that enhances diagnostic efficiency, accelerates discovery, and strengthens the translational bridge between genomics and patient care.