SkillArena
SKILL ARENA: Project Manager Skill
Performance Evaluation Design v1.0
| Metric | Evaluation Target & Methodology |
|---|---|
| COHERENCE | Analyzes output coherence and narrative consistency. Utilizes a secondary "Judge LLM" to detect semantic drift, contradictory statements, or non-sequitur logic loops in long-form generation. |
| RUBRICS | An LLM-driven evaluation layer that checks output against grading rubrics for PM performance written by subject-matter experts (SMEs). Penalizes the score by the count of missed performance criteria or violated negative constraints (e.g., "Do not discuss medical advice"). |
| CITATIONS | Measures grounding accuracy by counting the assertions or discussions that are backed by validated citations. |
| ENTITY MATCH | Named Entity Recognition check. Validates all proper nouns, dates, and technical specifications against the input data. The score drops for incorrect or hallucinated entities (e.g., "15,000" vs "150,00"). |
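
The RUBRICS and ENTITY MATCH rows describe count-based penalties, which can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not part of the design: the function names, the linear `penalty` weight, and the regular expression, which stands in for a real Named Entity Recognition model and only covers numbers and ISO-style dates.

```python
import re

# Hypothetical stand-in for an NER model: ISO dates (YYYY-MM-DD) and
# numbers with optional comma groups and decimals. Proper nouns are not covered.
ENTITY_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}|\d+(?:,\d+)*(?:\.\d+)?")


def rubric_score(missed_criteria: int, violated_constraints: int,
                 penalty: float = 0.1) -> float:
    """Start from 1.0 and subtract a fixed penalty per rubric failure.

    The linear penalty and its default weight are assumptions; the design
    only states that the score is penalized by the count of failures.
    """
    failures = missed_criteria + violated_constraints
    return max(0.0, 1.0 - penalty * failures)


def extract_entities(text: str) -> set[str]:
    """Return the set of date and number strings found in the text."""
    return set(ENTITY_PATTERN.findall(text))


def entity_match_score(source: str, output: str) -> float:
    """Fraction of entities in the output that appear verbatim in the source.

    An output with no detectable entities scores 1.0 (nothing to get wrong).
    """
    output_entities = extract_entities(output)
    if not output_entities:
        return 1.0
    source_entities = extract_entities(source)
    matched = output_entities & source_entities
    return len(matched) / len(output_entities)


if __name__ == "__main__":
    source = "Budget: 15,000; deadline 2025-03-01"
    output = "The budget is 150,00 by 2025-03-01"
    # One of the two output entities ("150,00") is not in the source.
    print(entity_match_score(source, output))  # 0.5
```

Exact string matching is the simplest possible comparison; a production check would likely normalize formats (for example "15,000" and "15000") before comparing, which this sketch deliberately leaves out.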