Job Description
Duration:- 3 months (likely extension)
Role Overview:-
We are looking for a Research Assistant with 2+ years of experience in prototyping and testing AI agents or large language models (LLMs). You will design test prompts, experiment with prompt engineering, and debug AI agent tool calls within a Python/PHP software stack. You’ll also help create internal benchmarks to evaluate AI agent performance.
Key Responsibilities:-
* Create and refine test prompts to guide AI agents toward desired behavior.
* Implement and troubleshoot AI agent tool calls in a Python/PHP environment.
* Develop high-quality prompts to build internal evaluation benchmarks for AI agents.
* Test AI agents to assess their ability to perform tasks such as ordering, scheduling, or cancelling meetings.
* Analyze test outcomes, identify issues, and communicate findings for continuous improvement.
* Navigate and understand the Python codebase to correlate test results with underlying code.
* Improve and test existing AI agents.
* Identify where agents perform well and where they fail.
* Test and tune prompts to optimize agent responses.
* Flag test results as pass/fail based on expected behavior.
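To illustrate the kind of work involved, here is a minimal sketch of a pass/fail prompt test like those described above. All names are hypothetical, and `run_agent` is a stub standing in for whatever function invokes the real AI agent:

```python
# Hypothetical sketch of a pass/fail prompt test harness.

def run_agent(prompt: str) -> str:
    """Stub agent: a real implementation would call the LLM / tool-call stack."""
    if "cancel" in prompt.lower():
        return "Meeting cancelled."
    return "Sorry, I can't help with that."

def flag_result(prompt: str, expected_substring: str) -> str:
    """Flag a single test case as pass/fail based on expected behavior."""
    response = run_agent(prompt)
    return "pass" if expected_substring.lower() in response.lower() else "fail"

# Example test cases pairing a prompt with an expected response fragment.
cases = [
    ("Please cancel my 3pm meeting", "cancelled"),
    ("Schedule a meeting with Dana", "scheduled"),
]
for prompt, expected in cases:
    print(f"{flag_result(prompt, expected)}: {prompt}")
```

In practice, the expected-behavior check would be richer (e.g. verifying which tool the agent called and with what arguments), but the flag-and-report loop is the same.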
Technical Skills ...