Basic Agent Evaluation Runner
Log in with Hugging Face.
With TEST_MODE=1, the runner evaluates one random question only.
Set TEST_MODE=0 to run the full evaluation and submit the answers.
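The TEST_MODE switch described above could be implemented roughly as follows. This is a minimal sketch, not the runner's actual code: it assumes TEST_MODE is read from an environment variable, that it defaults to test mode when unset, and that questions are dictionaries with "task_id" and "question" keys. The helper names select_questions and read_test_mode are hypothetical.

```python
import os
import random


def select_questions(questions, test_mode, rng=None):
    """Return one random question in test mode, otherwise all of them."""
    if not questions:
        return []
    if test_mode:
        rng = rng or random  # allow a seeded random.Random for reproducibility
        return [rng.choice(questions)]
    return list(questions)


def read_test_mode(environ=None):
    """Assumption: test mode is on unless TEST_MODE is explicitly "0"."""
    environ = os.environ if environ is None else environ
    return environ.get("TEST_MODE", "1") != "0"


if __name__ == "__main__":
    # Hypothetical sample data; the real questions come from the evaluation service.
    sample = [
        {"task_id": "a", "question": "Q1"},
        {"task_id": "b", "question": "Q2"},
    ]
    for item in select_questions(sample, read_test_mode()):
        print(item["task_id"], item["question"])
```

Defaulting to test mode when the variable is unset is a deliberate choice in this sketch, so that a full run and submission only happens when TEST_MODE=0 is set explicitly.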
Run Evaluation & Submit All Answers
Run Status / Submission Result
Questions and Agent Answers