Coding Agent
Autonomous test-driven development agent that writes, tests, and iteratively fixes Python code until all tests pass.
Quick Start
agnt5 create --template python/coding_agent_agnt5
export GROQ_API_KEY=gsk_... E2B_API_KEY=...
agnt5 dev up
What You Can Build
- Algorithm Implementations: Solve LeetCode problems, implement data structures, or build utility functions
- API Clients: Generate complete Python modules with tests for third-party API integrations
- Data Processing Scripts: Create ETL pipelines, parsers, or validators with full test coverage
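Each of these can be expressed as a plain task_description string passed to the workflow. The strings below are illustrative examples, not part of the template:
# Illustrative task_description strings for the three categories above (hypothetical).
algorithm_task = "Implement an LRU cache class with O(1) get and put operations."
api_client_task = "Write a typed Python client for a JSON REST API, with retries and unit tests."
etl_task = "Build a CSV-to-JSON converter that validates each row against a schema and reports bad rows."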
Installation
Prerequisites
- Python 3.12+
- AGNT5 SDK
- Groq API key (for LLM)
- E2B API key (for code execution sandbox)
Setup
# Clone or create from template
agnt5 create --template python/coding_agent_agnt5
cd coding_agent_agnt5
# Install dependencies
uv sync
# Configure environment variables
export GROQ_API_KEY=gsk_your_groq_api_key
export E2B_API_KEY=your_e2b_api_key
# Start the worker
agnt5 dev up
Get API keys: Groq at https://console.groq.com and E2B at https://e2b.dev/dashboard.
Usage
Via Workflow Client
Call the workflow programmatically:
import asyncio
from coding_agent_agnt5.workflows import coding_agent_workflow
from agnt5.entity import with_entity_context
@with_entity_context
async def main():
    task = """
    Create a function that validates whether a string is a valid number.
    Support integers, decimals, and scientific notation.
    """
    result = await coding_agent_workflow(
        task_description=task,
        max_retries=15
    )
    if result.success:
        print(f"Code:\n{result.code}")
        print(f"Tests:\n{result.tests}")
        print(f"Iterations: {result.iterations}")
        print(f"Documentation:\n{result.documentation}")
    else:
        print(f"Failed: {result.error}")

asyncio.run(main())
Example Output
{
    "success": True,
    "task": "Create a function that validates...",
    "iterations": 3,
    "code": "def is_valid_number(s: str) -> bool:\n    ...",
    "tests": "import pytest\n\ndef test_valid_numbers():\n    ...",
    "sandbox_id": "e2b-sandbox-xyz123",
    "documentation": "# Valid Number Validator\n\n## Overview...",
    "error": None
}
The workflow automatically:
- Generates Python code from task description
- Creates comprehensive pytest test suite
- Runs tests in isolated E2B sandbox
- Analyzes failures and fixes code iteratively
- Produces final documentation in markdown
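Because the result exposes the generated artifacts directly (as in the example output above), you can persist them to disk once the workflow finishes. A minimal sketch, assuming the field names shown above:
from pathlib import Path

# Save the generated artifacts from the workflow result; field names follow the
# example output above (result may be a dict or an object depending on SDK version).
if result.success:
    Path("solution.py").write_text(result.code)
    Path("test_solution.py").write_text(result.tests)
    Path("final_response.md").write_text(result.documentation)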
Configuration
Environment Variables
| Variable | Description | Required |
|---|---|---|
| GROQ_API_KEY | Groq API key for LLM access | Yes |
| E2B_API_KEY | E2B API key for code sandbox | Yes |
Workflow Parameters
coding_agent_workflow(
task_description: str, # Coding task description
max_retries: int = 15 # Max iterations for fixing code
)
Models Used
- Planner/Analyzer: llama-4-scout-17b-16e-instruct (planning and error analysis)
- Code Generator: llama-4-maverick-17b-128e-instruct (code generation and fixes)
Architecture
Multi-Function Workflow
The workflow orchestrates seven specialized function nodes:
1. Planner Node
- Input: Task description
- Output: Development plan + test plan
- Model: llama-4-scout-17b-16e-instruct
- Role: Creates structured plans for implementation and testing
2. Code Generator Node
- Input: Task, dev plan, (optional) error analysis
- Output: Python code
- Model: llama-4-maverick-17b-128e-instruct
- Role: Generates or fixes code based on plan and failures
3. Test Generator Node
- Input: Task, test plan
- Output: Pytest test suite
- Model: llama-4-maverick-17b-128e-instruct
- Role: Creates comprehensive test coverage
4. Code Sync Node
- Input: Main code, test code, sandbox ID
- Output: Sync status, sandbox ID
- Role: Uploads code files to E2B sandbox
5. Code Executor Node
- Input: Sandbox ID
- Output: Test results, error logs, next action
- Tools: E2B sandbox (run_command, read_file)
- Role: Runs pytest and analyzes results
6. Error Analyzer Node
- Input: Task, code, tests, error logs
- Output: Error analysis with root causes and suggestions
- Model: llama-4-scout-17b-16e-instruct
- Role: Deep analysis of test failures to guide fixes
7. Final Response Node
- Input: Task, generated code
- Output: Markdown documentation
- Role: Generates comprehensive documentation
Workflow Steps
1. Planning
└─> Analyze task description
└─> Create development plan
└─> Create test plan
2. Generation (Iteration 1)
└─> [Parallel] Generate code + tests
└─> Sync to E2B sandbox
└─> Execute tests
3. Fix Loop (Iterations 2-15)
└─> Analyze test failures
└─> Fix code based on analysis
└─> Sync updated code
└─> Execute tests
└─> Repeat until success or max retries
4. Documentation
└─> Generate markdown docs
└─> Save to final_response.md
└─> Return results
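In plain Python terms, these steps reduce to a generate-test-fix loop. The sketch below is an illustrative simplification: the seven node functions are passed in as async callables, the names and signatures are hypothetical, and the real orchestration lives in src/coding_agent_agnt5/workflows.py.
import asyncio

# Illustrative control-flow sketch of the steps above (hypothetical helper callables).
async def run_coding_loop(task: str, *, plan, generate_code, generate_tests,
                          sync_to_sandbox, run_tests, analyze_errors, write_docs,
                          max_retries: int = 15) -> dict:
    # 1. Planning: derive a development plan and a test plan from the task.
    dev_plan, test_plan = await plan(task)

    # 2. Generation (iteration 1): produce code and tests in parallel, then execute.
    code, tests = await asyncio.gather(generate_code(task, dev_plan),
                                       generate_tests(task, test_plan))
    sandbox_id = await sync_to_sandbox(code, tests)
    passed, error_logs = await run_tests(sandbox_id)
    iterations = 1

    # 3. Fix loop (iterations 2..max_retries): analyze failures, fix the code, re-run tests.
    while not passed and iterations < max_retries:
        analysis = await analyze_errors(task, code, tests, error_logs)
        code = await generate_code(task, dev_plan, error_analysis=analysis)
        await sync_to_sandbox(code, tests, sandbox_id=sandbox_id)
        passed, error_logs = await run_tests(sandbox_id)
        iterations += 1

    # 4. Documentation: markdown docs for the final code, plus the result payload.
    documentation = await write_docs(task, code)
    return {"success": passed, "iterations": iterations, "code": code,
            "tests": tests, "documentation": documentation}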
Iterative Refinement
- First iteration generates code and tests in parallel
- Subsequent iterations analyze failures and fix code
- Each iteration includes:
- Error analysis (identifies root causes)
- Code fixing (targets specific issues)
- Test execution (validates fixes)
- Loop terminates on success or after 15 iterations
E2B Sandbox Isolation
All code execution happens in E2B sandboxes:
- Isolated Python 3.12 environment
- Pre-installed pytest
- File system access for code/test uploads
- Command execution for running tests
- Prevents host system contamination
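For reference, the kind of interaction the sandbox nodes perform looks roughly like the snippet below, assuming the e2b Python SDK's Sandbox, files, and commands interfaces. The template wraps equivalent calls behind E2BSandboxTools, so treat this as a standalone illustration rather than the template's actual code:
from e2b import Sandbox  # e2b Python SDK

# Standalone illustration of isolated test execution.
sandbox = Sandbox()  # fresh, isolated environment
try:
    sandbox.files.write("/home/user/solution.py", "def add(a, b):\n    return a + b\n")
    sandbox.files.write(
        "/home/user/test_solution.py",
        "from solution import add\n\ndef test_add():\n    assert add(1, 2) == 3\n",
    )
    sandbox.commands.run("pip install -q pytest")  # the template's sandbox already ships pytest
    result = sandbox.commands.run("cd /home/user && pytest -q test_solution.py")
    print(result.exit_code, result.stdout)
finally:
    sandbox.kill()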
State Management
Workflow state tracks:
- task_description: Original task
- dev_plan, test_plan: Planning outputs
- generated_code, generated_tests: Latest code versions
- execution_status: Test execution state
- error_logs: Failure details
- error_analysis: Analysis results
- sandbox_id: E2B sandbox identifier
- retries: Current iteration count
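A dataclass capturing that state might look like the sketch below (illustrative only; the template's actual state container in the workflow module may be structured differently):
from dataclasses import dataclass

# Illustrative shape of the state fields listed above.
@dataclass
class CodingAgentState:
    task_description: str                # original task
    dev_plan: str = ""                   # planning outputs
    test_plan: str = ""
    generated_code: str = ""             # latest generated code
    generated_tests: str = ""            # latest test suite
    execution_status: str = "pending"    # test execution state
    error_logs: str = ""                 # failure details
    error_analysis: str = ""             # analysis results
    sandbox_id: str | None = None        # E2B sandbox identifier
    retries: int = 0                     # current iteration count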
Troubleshooting
Missing API keys
ValueError: Missing required environment variables: GROQ_API_KEY, E2B_API_KEY
Solution: Export both GROQ_API_KEY and E2B_API_KEY before running.
E2B sandbox creation failed
Error: Failed to create E2B sandbox
Solution: Verify E2B API key is valid and your account has available quota at https://e2b.dev/dashboard.
Max retries reached
Workflow failed: Maximum retries (15) exhausted
Solution: The agent couldn't fix all test failures within 15 iterations. Simplify the task, increase max_retries, or inspect logs to see what's failing.
Groq rate limits
Error: Rate limit exceeded
Solution: Wait and retry, or upgrade your Groq plan for higher rate limits.
Import errors in generated code
Check that the E2B sandbox has required dependencies. Modify code_sync_node in src/coding_agent_agnt5/functions.py to install packages via pip before running tests.
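For example, a pip install step added to code_sync_node could look like the sketch below. It assumes the node already holds a connected sandbox object exposing the same commands interface used elsewhere in tools.py:
def install_extra_packages(sandbox, packages: list[str]) -> None:
    """Install additional dependencies in the sandbox before tests run.

    `sandbox` is assumed to be a connected E2B sandbox object exposing
    `commands.run`, matching the calls already used in tools.py.
    """
    for package in packages:
        sandbox.commands.run(f"pip install -q {package}")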
Customization
Change LLM Models
Modify model selection in src/coding_agent_agnt5/functions.py:
# For planning and error analysis (planner_node, error_analyzer_node)
model="groq/meta-llama/llama-4-scout-17b-16e-instruct"
# For code generation (code_generator_node, test_generator_node)
model="groq/meta-llama/llama-4-maverick-17b-128e-instruct"
You can switch to other Groq models by updating the model parameter in each function node.
Adjust Retry Logic
Change max iterations in workflow call:
result = await coding_agent_workflow(
    task_description=task,
    max_retries=25  # Increase from default 15
)
Add Custom Tools
Extend E2B tools in src/coding_agent_agnt5/tools.py:
@tool(auto_schema=True)
async def install_package(ctx: Context, sandbox_id: str, package: str) -> dict:
    """Install a Python package in the sandbox."""
    # Sketch of one possible implementation, assuming the e2b AsyncSandbox API;
    # adapt to however E2BSandboxTools connects to sandboxes in tools.py.
    sandbox = await AsyncSandbox.connect(sandbox_id)
    result = await sandbox.commands.run(f"pip install -q {package}")
    return {"exit_code": result.exit_code, "stdout": result.stdout, "stderr": result.stderr}
Register in app.py:
worker = Worker(
    tools=[
        E2BSandboxTools.create_sandbox,
        E2BSandboxTools.write_file,
        E2BSandboxTools.run_command,
        E2BSandboxTools.read_file,
        install_package,  # Add custom tool
    ],
    ...
)
Customize Test Framework
Modify test generation prompts in src/coding_agent_agnt5/prompts/coding_agent_prompts.py to use unittest, doctest, or other frameworks instead of pytest.
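For instance, the test-generation instruction could be rephrased along these lines (hypothetical prompt text; the actual constant names and wording in coding_agent_prompts.py may differ):
# Hypothetical replacement prompt for the test generator node.
TEST_GENERATOR_PROMPT = """
You are a test engineer. Write a unittest.TestCase-based test suite for the code below.
Use only the standard library `unittest` module (no pytest), cover edge cases,
and make the file runnable with `python -m unittest`.
"""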
Related Templates
- code_reviewer: AI-powered code review with security and quality analysis
- text-to-sql: Multi-step reasoning workflows with validation
- weather-agent: Tool-based agentic workflows
License
MIT License - see LICENSE for details