Enhanced Prompt Evaluation Template

A comprehensive framework for evaluating prompts across cognitive, epistemic, and contextual dimensions.

Template Overview

The Enhanced Prompt Evaluation Template provides a structured approach to evaluating prompts across multiple dimensions, drawing on cognitive architecture principles, epistemic integrity mechanisms, and context engineering techniques. It supports systematic assessment of prompt quality, effectiveness, and alignment with intended goals.

Key Benefits

  • Comprehensive multi-dimensional evaluation
  • Integration of cognitive and epistemic metrics
  • Standardized scoring system
  • Cross-module compatibility
  • Recursive self-improvement

Use Cases

  • Prompt quality assessment
  • Comparative prompt evaluation
  • Iterative prompt refinement
  • Cross-domain prompt adaptation
  • Prompt failure analysis

Cognitive Dimensions

Mental model and cognitive aspects

Evaluation metrics focused on mental model preservation, cognitive transparency, and reconstructability.

Epistemic Dimensions

Knowledge integrity and verification

Metrics for assessing semantic integrity, drift resistance, and knowledge boundary enforcement.

Contextual Dimensions

Context definition and execution

Evaluation of context definition clarity, boundary enforcement, and execution verification.

Functional Dimensions

Task performance and effectiveness

Assessment of goal alignment, task completion, output quality, and efficiency.

Ethical Dimensions

Value alignment and pluralism

Evaluation of ethical considerations, value alignment, and pluriversal awareness.

Meta-Recursive Dimensions

Self-improvement and adaptation

Assessment of self-improvement mechanisms, adaptability, and recursive enhancement.

Cognitive Dimensions

Mental Model Preservation

Assessment of how well the prompt preserves the underlying mental models:

Evaluation Criteria
  • Model Documentation: Explicit documentation of mental models (0-5)
  • Concept Stability: Consistency of key concepts (0-5)
  • Relationship Preservation: Maintenance of concept relationships (0-5)
  • Assumption Documentation: Explicit documentation of assumptions (0-5)
Mental Model Preservation Index (MMPI)

Composite score calculated from individual criteria (0-1 scale)

Example MMPI: 0.85
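
The template does not fix an exact formula for these composite indices. A common convention, assumed here, is the unweighted mean of the 0-5 criteria normalized to a 0-1 scale; a sketch in Python (function and criterion names are illustrative, not part of the template):

```python
def composite_score(criteria):
    """Normalize a dict of 0-5 criterion scores to a 0-1 composite.

    Assumption: the composite is the unweighted mean of the criteria
    divided by the 5-point maximum; per-criterion weights could be
    added if a domain calls for them.
    """
    if not criteria:
        raise ValueError("at least one criterion score is required")
    if any(not 0 <= s <= 5 for s in criteria.values()):
        raise ValueError("criterion scores must lie in the 0-5 range")
    return sum(criteria.values()) / (5 * len(criteria))

# The four Mental Model Preservation criteria, scored 0-5:
mmpi = composite_score({
    "model_documentation": 5,
    "concept_stability": 4,
    "relationship_preservation": 4,
    "assumption_documentation": 4,
})
# (5 + 4 + 4 + 4) / 20 = 0.85, matching the example MMPI above
```

Under this assumption the same helper applies to every composite index in this template (RR, CTI, EQI, and so on), since each aggregates 0-5 criteria onto a 0-1 scale.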

Cognitive Reconstructability

Assessment of how easily original intent and reasoning can be reconstructed:

Evaluation Criteria
  • Intent Clarity: Explicitness of intended purpose (0-5)
  • Reasoning Transparency: Visibility of reasoning structure (0-5)
  • Artifact Preservation: Presence of cognitive artifacts (0-5)
  • Symbolic Scarring: Documentation of past failures and resolutions (0-5)
Reconstructability Rating (RR)

Composite score calculated from individual criteria (0-1 scale)

Example RR: 0.80

Cognitive Transparency

Assessment of how explicitly reasoning and assumptions are presented:

Evaluation Criteria
  • Reasoning Explicitness: Clarity of reasoning structure (0-5)
  • Assumption Visibility: Explicitness of assumptions (0-5)
  • Transparency Layering: Implementation of transparency layers (0-5)
  • Metacognitive Signaling: Explicit metacognitive indicators (0-5)
Cognitive Transparency Index (CTI)

Composite score calculated from individual criteria (0-1 scale)

Example CTI: 0.90

Epistemic Dimensions

Semantic Integrity

Assessment of how well the prompt maintains consistent meaning:

Evaluation Criteria
  • Semantic Pinning: Explicit definition of key terms (0-5)
  • Concept Boundary Definition: Clarity of concept boundaries (0-5)
  • Relationship Formalization: Explicit definition of relationships (0-5)
  • Semantic Monitoring: Mechanisms for tracking meaning shifts (0-5)
Semantic Drift Coefficient (SDC)

Measure of meaning shift between iterations (0-1 scale, lower is better)

Example SDC: 0.04
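
The template does not prescribe how the SDC is computed. One minimal, purely lexical sketch measures drift as one minus the average token overlap of each key term's definition across iterations; embedding-based similarity is a common alternative in practice, and all names here are illustrative:

```python
def semantic_drift(defs_before, defs_after):
    """Rough Semantic Drift Coefficient between two prompt iterations.

    Illustrative assumption: drift is one minus the mean Jaccard
    overlap of each key term's definition tokens. A missing or
    rewritten definition pushes the coefficient toward 1.0.
    """
    terms = set(defs_before) | set(defs_after)
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        a = set(defs_before.get(term, "").lower().split())
        b = set(defs_after.get(term, "").lower().split())
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return 1.0 - total / len(terms)
```

Identical definitions give an SDC of 0.0 and completely rewritten ones approach 1.0, so a constraint such as SDC below 0.05 demands near-verbatim semantic pinning between iterations.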

Epistemic Sovereignty

Assessment of control over meaning and knowledge boundaries:

Evaluation Criteria
  • Boundary Enforcement: Mechanisms for enforcing knowledge boundaries (0-5)
  • Evolution Control: Governance of semantic changes (0-5)
  • Authority Mechanisms: Clear authority for semantic decisions (0-5)
  • Sovereignty-Drift Balance: Appropriate balance for domain (0-5)
Sovereignty-Drift Balance

Assessment of balance between stability and evolution

Example balance: Sovereignty 0.75, Evolution 0.25

Epistemic Status

Assessment of knowledge quality and verification:

Evaluation Criteria
  • Source Quality: Quality and reliability of knowledge sources (0-5)
  • Verification Mechanisms: Methods for verifying knowledge (0-5)
  • Uncertainty Handling: Explicit handling of uncertainty (0-5)
  • Epistemic Humility: Recognition of knowledge limitations (0-5)
Epistemic Quality Index (EQI)

Composite score calculated from individual criteria (0-1 scale)

Example EQI: 0.88

Contextual Dimensions

Context Definition

Assessment of how clearly context is defined:

Evaluation Criteria
  • Bundle Formalization: Formal definition of context bundle (0-5)
  • Source Specification: Clear specification of context sources (0-5)
  • Boundary Definition: Explicit definition of context boundaries (0-5)
  • Constraint Specification: Clear definition of constraints (0-5)
Context Definition Quality (CDQ)

Composite score calculated from individual criteria (0-1 scale)

Example CDQ: 0.95

Context-to-Execution Pipeline

Assessment of how well context is linked to execution:

Evaluation Criteria
  • Context Binding: Mechanisms for binding context to execution (0-5)
  • PRP Implementation: Use of Product-Requirements Prompt structure (0-5)
  • Pipeline Completeness: Implementation of all pipeline stages (0-5)
  • Traceability: Ability to trace execution back to context (0-5)
Pipeline Implementation Quality (PIQ)

Composite score calculated from individual criteria (0-1 scale)

Example PIQ: 0.90

Execution Verification

Assessment of verification mechanisms:

Evaluation Criteria
  • Pre-execution Validation: Validation before execution (0-5)
  • Runtime Monitoring: Monitoring during execution (0-5)
  • Post-execution Verification: Verification after execution (0-5)
  • Audit Trail: Comprehensive execution record (0-5)
Verification Quality Index (VQI)

Composite score calculated from individual criteria (0-1 scale)

Example VQI: 0.85

Functional Dimensions

Goal Alignment

Assessment of alignment with intended goals:

Evaluation Criteria
  • Goal Clarity: Clear definition of goals (0-5)
  • Alignment Mechanisms: Methods for ensuring alignment (0-5)
  • Goal Traceability: Ability to trace execution to goals (0-5)
  • Goal Verification: Methods for verifying goal achievement (0-5)
Goal Alignment Score (GAS)

Composite score calculated from individual criteria (0-1 scale)

Example GAS: 0.95

Task Completion

Assessment of task completion effectiveness:

Evaluation Criteria
  • Completeness: Full completion of required tasks (0-5)
  • Accuracy: Correctness of task execution (0-5)
  • Consistency: Consistent performance across instances (0-5)
  • Robustness: Performance under varying conditions (0-5)
Task Completion Quality (TCQ)

Composite score calculated from individual criteria (0-1 scale)

Example TCQ: 0.92

Efficiency and Performance

Assessment of resource usage and performance:

Evaluation Criteria
  • Token Efficiency: Efficient use of tokens (0-5)
  • Processing Time: Time required for execution (0-5)
  • Resource Usage: Efficient use of computational resources (0-5)
  • Scalability: Performance at scale (0-5)
Efficiency Index (EI)

Composite score calculated from individual criteria (0-1 scale)

Example EI: 0.88

Ethical Dimensions

Value Alignment

Assessment of alignment with ethical values:

Evaluation Criteria
  • Value Explicitness: Clear articulation of values (0-5)
  • Alignment Mechanisms: Methods for ensuring value alignment (0-5)
  • Value Conflicts: Handling of value conflicts (0-5)
  • Value Verification: Methods for verifying value alignment (0-5)
Value Alignment Index (VAI)

Composite score calculated from individual criteria (0-1 scale)

Example VAI: 0.90

Pluriversal Awareness

Assessment of awareness and respect for multiple worldviews:

Evaluation Criteria
  • Perspective Diversity: Inclusion of diverse perspectives (0-5)
  • Cultural Sensitivity: Awareness of cultural differences (0-5)
  • Epistemic Pluralism: Recognition of multiple ways of knowing (0-5)
  • Contextual Adaptation: Adaptation to different contexts (0-5)
Pluriversal Awareness Score (PAS)

Composite score calculated from individual criteria (0-1 scale)

Example PAS: 0.85

Harm Prevention

Assessment of mechanisms to prevent harm:

Evaluation Criteria
  • Risk Assessment: Identification of potential harms (0-5)
  • Mitigation Mechanisms: Methods for preventing harm (0-5)
  • Monitoring Systems: Ongoing monitoring for harm (0-5)
  • Response Protocols: Procedures for addressing harm (0-5)
Harm Prevention Index (HPI)

Composite score calculated from individual criteria (0-1 scale)

Example HPI: 0.95

Meta-Recursive Dimensions

Self-Improvement Mechanisms

Assessment of mechanisms for self-improvement:

Evaluation Criteria
  • Feedback Integration: Mechanisms for incorporating feedback (0-5)
  • Learning Systems: Systems for learning from experience (0-5)
  • Adaptation Mechanisms: Methods for adapting to new contexts (0-5)
  • Recursive Enhancement: Ability to improve itself (0-5)
Self-Improvement Index (SII)

Composite score calculated from individual criteria (0-1 scale)

Example SII: 0.88

Cross-Component Integration

Assessment of integration with other framework components:

Evaluation Criteria
  • Interface Standardization: Standardized component interfaces (0-5)
  • Data Sharing: Effective data sharing between components (0-5)
  • Service Orchestration: Coordination of component services (0-5)
  • Cross-Validation: Validation across components (0-5)
Integration Quality Index (IQI)

Composite score calculated from individual criteria (0-1 scale)

Example IQI: 0.90

Emergent Intelligence

Assessment of potential for emergent capabilities:

Evaluation Criteria
  • Combinatorial Potential: Potential for novel combinations (0-5)
  • Adaptive Capacity: Ability to adapt to new situations (0-5)
  • Generative Capability: Ability to generate new approaches (0-5)
  • Emergence Support: Support for emergent properties (0-5)
Emergent Intelligence Potential (EIP)

Composite score calculated from individual criteria (0-1 scale)

Example EIP: 0.85

Evaluation Process

  1. Preparation: define evaluation context and criteria
  2. Assessment: evaluate across all dimensions
  3. Scoring: calculate composite scores
  4. Analysis: identify strengths and weaknesses
  5. Improvement: develop enhancement plan

Evaluation Workflow

  1. Preparation Phase
    • Define evaluation context and purpose
    • Select relevant evaluation dimensions
    • Customize evaluation criteria for specific domain
    • Establish scoring thresholds and benchmarks
  2. Assessment Phase
    • Evaluate prompt across all selected dimensions
    • Assign scores for each evaluation criterion
    • Document evidence supporting each score
    • Identify areas of strength and weakness
  3. Scoring Phase
    • Calculate composite scores for each dimension
    • Normalize scores to 0-1 scale
    • Apply domain-specific weightings if needed
    • Calculate overall evaluation score
  4. Analysis Phase
    • Identify patterns across dimensions
    • Compare scores to benchmarks and thresholds
    • Analyze trade-offs between dimensions
    • Prioritize areas for improvement
  5. Improvement Phase
    • Develop specific enhancement recommendations
    • Create improvement plan with concrete actions
    • Establish metrics for measuring improvement
    • Schedule follow-up evaluation
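
One way to make the Assessment and Scoring phases concrete is a small record structure that keeps each criterion's score next to its supporting evidence, so the Analysis phase can trace every number back to its rationale. A sketch (class and field names are assumptions, not part of the template):

```python
from dataclasses import dataclass, field

@dataclass
class CriterionResult:
    """One scored criterion plus the evidence behind the score."""
    name: str
    score: int          # 0-5 rubric value from the Assessment phase
    evidence: str = ""  # documentation supporting the score

@dataclass
class DimensionResult:
    """All criteria for one evaluation dimension, e.g. 'Cognitive'."""
    dimension: str
    criteria: list[CriterionResult] = field(default_factory=list)

    def composite(self) -> float:
        """Scoring phase: normalize the 0-5 criteria to a 0-1 composite."""
        if not self.criteria:
            return 0.0
        return sum(c.score for c in self.criteria) / (5 * len(self.criteria))
```

Storing evidence alongside scores also simplifies the calibration and rationale-documentation practices recommended later in this template.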

Scoring System

Individual Criteria Scoring

  • 0: Absent/Not implemented
  • 1: Minimal/Basic implementation
  • 2: Partial implementation
  • 3: Standard implementation
  • 4: Advanced implementation
  • 5: Exceptional implementation

Composite Score Interpretation

  • 0.90-1.00: Exceptional
  • 0.80-0.89: Advanced
  • 0.70-0.79: Proficient
  • 0.60-0.69: Competent
  • 0.50-0.59: Basic
  • 0.00-0.49: Needs improvement

Overall Evaluation Score

Calculated as weighted average of dimension scores, with weights customized for specific domains and use cases.

Example Overall Score Calculation:
  • Cognitive Dimensions: 0.85 × 0.20 = 0.170
  • Epistemic Dimensions: 0.90 × 0.20 = 0.180
  • Contextual Dimensions: 0.88 × 0.20 = 0.176
  • Functional Dimensions: 0.92 × 0.20 = 0.184
  • Ethical Dimensions: 0.85 × 0.10 = 0.085
  • Meta-Recursive Dimensions: 0.87 × 0.10 = 0.087
  • Overall Score: 0.882 (Advanced)

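
The worked calculation above can be reproduced in a few lines. The weights and interpretation bands come from this template; the function names are illustrative:

```python
# Dimension composites and weights from the example calculation above.
DIMENSION_SCORES = {
    "Cognitive": 0.85, "Epistemic": 0.90, "Contextual": 0.88,
    "Functional": 0.92, "Ethical": 0.85, "Meta-Recursive": 0.87,
}
DIMENSION_WEIGHTS = {
    "Cognitive": 0.20, "Epistemic": 0.20, "Contextual": 0.20,
    "Functional": 0.20, "Ethical": 0.10, "Meta-Recursive": 0.10,
}

def overall_score(scores, weights):
    """Weighted average of dimension composites; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("dimension weights must sum to 1")
    return sum(scores[d] * weights[d] for d in scores)

def interpret(score):
    """Map a 0-1 composite onto the interpretation bands above."""
    bands = [(0.90, "Exceptional"), (0.80, "Advanced"), (0.70, "Proficient"),
             (0.60, "Competent"), (0.50, "Basic")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "Needs improvement"
```

With the example inputs this reproduces the 0.882 (Advanced) result shown above; swapping in domain-specific weights changes only the `DIMENSION_WEIGHTS` table.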
Implementation Guide

Getting Started

  1. Define evaluation purpose and scope
  2. Select relevant dimensions and criteria
  3. Customize scoring weights for domain
  4. Prepare evaluation templates
  5. Train evaluators on framework

Integration with Workflow

  1. Identify evaluation points in workflow
  2. Establish evaluation triggers
  3. Define evaluation frequency
  4. Create feedback loops
  5. Document evaluation results

Automation Opportunities

  1. Automate metric calculation
  2. Implement monitoring systems
  3. Create evaluation dashboards
  4. Develop recommendation engines
  5. Build continuous improvement systems

Common Challenges

  1. Balancing evaluation depth with efficiency
  2. Ensuring consistent evaluation across teams
  3. Managing evaluation overhead
  4. Adapting to evolving requirements
  5. Maintaining evaluation quality

Best Practices

  1. Regularly review and update criteria
  2. Calibrate evaluators periodically
  3. Document evaluation rationales
  4. Track improvements over time
  5. Share evaluation insights across teams

Advanced Implementation

  1. Implement cross-component evaluation
  2. Develop recursive self-evaluation
  3. Create adaptive evaluation systems
  4. Build evaluation knowledge base
  5. Implement continuous improvement cycles

Integration with Framework

Cross-Component Integration

The Enhanced Prompt Evaluation Template is designed to integrate seamlessly with other components of the Enhanced Prompt Engineering Framework:

Minimalism Challenge

Provides evaluation metrics for assessing minimal prompts while maintaining cognitive integrity and context anchoring.

Grammar Assistant

Incorporates evaluation criteria for assessing grammatical structures that enhance cognitive transparency and semantic integrity.

Integrated Framework

Provides standardized evaluation metrics that enable cross-component communication and validation.

Case Study: Financial Reporting Prompt Evaluation

Original Prompt

"Extract financial metrics from the quarterly report"

Evaluation Results
  • Cognitive Dimensions: 0.45 (Needs improvement)
  • Epistemic Dimensions: 0.40 (Needs improvement)
  • Contextual Dimensions: 0.35 (Needs improvement)
  • Functional Dimensions: 0.55 (Basic)
  • Ethical Dimensions: 0.50 (Basic)
  • Meta-Recursive Dimensions: 0.30 (Needs improvement)
  • Overall Score: 0.43 (Needs improvement)

Enhanced Prompt

DocumentAnalysis[
  Context: 'Q2_2025_Financial_Report', 
  Goal: 'ExtractKeyMetrics_Comprehensive', 
  Constraint: 'FactualAccuracy_SDCLessThan0.05'
]
Evaluation Results
  • Cognitive Dimensions: 0.85 (Advanced)
  • Epistemic Dimensions: 0.90 (Exceptional)
  • Contextual Dimensions: 0.88 (Advanced)
  • Functional Dimensions: 0.92 (Exceptional)
  • Ethical Dimensions: 0.85 (Advanced)
  • Meta-Recursive Dimensions: 0.87 (Advanced)
  • Overall Score: 0.88 (Advanced)

Key Improvements

  • Cognitive: Added explicit context definition and constraints
  • Epistemic: Implemented semantic drift constraint
  • Contextual: Used formal PRP structure for context binding
  • Functional: Specified comprehensive goal
  • Ethical: Added factual accuracy constraint
  • Meta-Recursive: Used standardized format for cross-component integration
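
The enhanced prompt in this case study uses a bracketed Action[Key: 'Value', ...] notation. The framework does not publish a formal grammar for it, but a tolerant parser for the shape shown above can be sketched as follows (the regex and function name are assumptions):

```python
import re

def parse_structured_prompt(text):
    """Parse an Action[Key: 'Value', ...] prompt into its parts.

    Sketch only: recovers the action name and the quoted Key: 'Value'
    pairs; it does not validate keys against any schema.
    """
    match = re.match(r"\s*(\w+)\s*\[(.*)\]\s*$", text, re.DOTALL)
    if not match:
        raise ValueError("prompt does not match the Action[...] pattern")
    action, body = match.groups()
    fields = dict(re.findall(r"(\w+)\s*:\s*'([^']*)'", body))
    return action, fields

action, fields = parse_structured_prompt(
    "DocumentAnalysis[Context: 'Q2_2025_Financial_Report', "
    "Goal: 'ExtractKeyMetrics_Comprehensive', "
    "Constraint: 'FactualAccuracy_SDCLessThan0.05']"
)
# action is "DocumentAnalysis"; fields maps Context, Goal, and
# Constraint to their quoted values
```

A parser like this is what makes the Context-to-Execution traceability criteria checkable mechanically rather than by inspection.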