Basic Agent Evaluation Runner

Special Considerations: Due to limitations, this code depends on a local search engine and a local speech-to-text model. Both run in Docker; see the README file. Similar results can be achieved with the Google Search API and the OpenAI Whisper API.

Instructions:

Clone this space, then modify the code to define your agent's logic, tools, required packages, and so on.
Log in to your Hugging Face account using the button below. This uses your HF username for submission.
Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.

Disclaimers: After you click the submit button, it can take quite some time, because the agent has to work through all the questions. This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to address the delay after clicking the submit button, you could cache the answers and submit them in a separate action, or answer the questions asynchronously.
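
The caching and asynchronous ideas mentioned in the disclaimer could look roughly like the sketch below. It is not this space's actual code. The function names (load_cache, answer_all, build_submission), the cache file, and the field names "task_id", "question" and "submitted_answer" are all assumptions made for illustration; adapt them to whatever your question source and submission endpoint really use. The agent is assumed to be a plain blocking callable that takes a question string and returns an answer string.

```python
import asyncio
import json
from pathlib import Path
from typing import Callable


def load_cache(path: Path) -> dict[str, str]:
    """Return previously saved answers, or an empty dict if none exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


async def answer_all(
    questions: list[dict],
    agent: Callable[[str], str],
    cache_path: Path,
    max_concurrency: int = 4,
) -> dict[str, str]:
    """Answer uncached questions concurrently, then save all answers to disk."""
    cache = load_cache(cache_path)
    semaphore = asyncio.Semaphore(max_concurrency)

    async def answer_one(item: dict) -> None:
        task_id = item["task_id"]  # assumed field name
        if task_id in cache:
            return  # already answered in an earlier run
        async with semaphore:
            # The agent is blocking, so run it in a worker thread.
            cache[task_id] = await asyncio.to_thread(agent, item["question"])

    await asyncio.gather(*(answer_one(q) for q in questions))
    cache_path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache


def build_submission(cache: dict[str, str]) -> list[dict]:
    """Turn cached answers into a payload for a separate submit step."""
    return [{"task_id": tid, "submitted_answer": ans} for tid, ans in cache.items()]
```

With this split, one button can call asyncio.run(answer_all(...)) to fill the cache, and a second button can send build_submission(load_cache(...)) without re-running the agent. The semaphore caps how many agent calls run at once, which matters if the agent depends on a local search engine or speech-to-text container with limited capacity.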

Questions and Agent Answers