File skill-llm-eval.test.ts

test/skill-llm-eval.test.ts

Source from the content-addressed store, hash-verified

/**
 * LLM-as-a-Judge evals for generated SKILL.md quality.
 *
 * Uses the Anthropic API directly (not Agent SDK) to evaluate whether

nothing calls this directly

detectBaseBranch (Function) · 0.90

getChangedFiles (Function) · 0.90

selectTests (Function) · 0.90

judge (Function) · 0.90

extractGrepLines (Function) · 0.85

runWorkflowJudge (Function) · 0.85

addTest (Method) · 0.80

create (Method) · 0.80

finalize (Method) · 0.80

describeIfSelected (Function) · 0.70

testIfSelected (Function) · 0.70

push (Method) · 0.45

no test coverage detected
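
The header comment describes LLM-as-a-judge evals for generated SKILL.md files, but the excerpt stops before any implementation. The following is a minimal sketch of that general pattern, not the code in this file. Every name in it (`JudgeVerdict`, `ModelCall`, `buildJudgePrompt`, `parseJudgeResponse`, `judgeSkillDoc`) is hypothetical, as are the 1 to 5 score scale and the JSON reply format. The model call is injected as a plain function, so the sketch makes no assumptions about any SDK's API.

```typescript
// Hypothetical verdict shape; the real file's types are not shown in the excerpt.
interface JudgeVerdict {
  score: number; // assumed scale: 1 (poor) to 5 (excellent)
  reasoning: string;
}

// Any function that sends a prompt to a model and resolves with its text reply.
type ModelCall = (prompt: string) => Promise<string>;

// Build a grading prompt that asks for a strict JSON verdict.
function buildJudgePrompt(skillDoc: string, criterion: string): string {
  return [
    "You are grading a generated SKILL.md document.",
    `Criterion: ${criterion}`,
    'Reply with JSON only: {"score": <integer 1-5>, "reasoning": "<one sentence>"}',
    "--- DOCUMENT ---",
    skillDoc,
  ].join("\n");
}

// Extract and validate the verdict from a raw model reply.
function parseJudgeResponse(raw: string): JudgeVerdict {
  // Replies may wrap the JSON in prose, so take the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("judge reply contained no JSON object");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("judge reply was not a JSON object");
  }
  const { score, reasoning } = parsed as Record<string, unknown>;
  if (typeof score !== "number" || !Number.isInteger(score) || score < 1 || score > 5) {
    throw new Error("judge score must be an integer from 1 to 5");
  }
  if (typeof reasoning !== "string") {
    throw new Error("judge reasoning must be a string");
  }
  return { score, reasoning };
}

// Run one judgment: prompt the model, then parse its verdict.
async function judgeSkillDoc(
  callModel: ModelCall,
  skillDoc: string,
  criterion: string,
): Promise<JudgeVerdict> {
  const reply = await callModel(buildJudgePrompt(skillDoc, criterion));
  return parseJudgeResponse(reply);
}
```

In a test, `callModel` could be a thin wrapper around a real API client or a stub that returns a canned reply. Keeping prompt construction and reply parsing as pure functions lets both be checked without network access.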