Basic Agent Evaluation Runner

  1. Log in with Hugging Face.
  2. With TEST_MODE=1, the runner evaluates a single random question only.
  3. Set TEST_MODE=0 to run the full evaluation and submit the results.
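
The TEST_MODE switch described in the steps above can be sketched as follows. This is a minimal illustration, not the runner's actual code: it assumes TEST_MODE is read from an environment variable and that test mode is the default when the variable is unset. The helper names `read_test_mode` and `select_questions` are hypothetical.

```python
import os
import random


def read_test_mode(env=None):
    """Return True when TEST_MODE is "1".

    Assumption: TEST_MODE is an environment variable and defaults to "1"
    (test mode) when unset. The real runner may configure it differently.
    """
    env = os.environ if env is None else env
    return env.get("TEST_MODE", "1") == "1"


def select_questions(questions, test_mode, rng=None):
    """Pick one random question in test mode, otherwise return all of them."""
    if not questions:
        return []
    if test_mode:
        rng = rng or random
        return [rng.choice(questions)]
    return list(questions)


if __name__ == "__main__":
    # Hypothetical question list, used only for illustration.
    sample = ["question A", "question B", "question C"]
    chosen = select_questions(sample, read_test_mode())
    print(f"Running {len(chosen)} of {len(sample)} questions")
```

Defaulting to test mode keeps an accidental run from submitting a full evaluation; whether the actual runner does the same is not stated here.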

Questions and Agent Answers
