Coding Agent

Autonomous test-driven development agent with E2B sandbox


Autonomous test-driven development agent that writes, tests, and iteratively fixes Python code until all tests pass.

Quick Start

agnt5 create --template python/coding_agent_agnt5
export GROQ_API_KEY=gsk_... E2B_API_KEY=...
agnt5 dev up

What You Can Build

  • Algorithm Implementations: Solve LeetCode problems, implement data structures, or build utility functions
  • API Clients: Generate complete Python modules with tests for third-party API integrations
  • Data Processing Scripts: Create ETL pipelines, parsers, or validators with full test coverage

Installation

Prerequisites

  • Python 3.12+
  • AGNT5 SDK
  • Groq API key (for LLM)
  • E2B API key (for code execution sandbox)

Setup

# Clone or create from template
agnt5 create --template python/coding_agent_agnt5
cd coding_agent_agnt5

# Install dependencies
uv sync

# Configure environment variables
export GROQ_API_KEY=gsk_your_groq_api_key
export E2B_API_KEY=your_e2b_api_key

# Start the worker
agnt5 dev up

Get API keys: Groq from the Groq console (https://console.groq.com) and E2B from the E2B dashboard (https://e2b.dev/dashboard).

Usage

Via Workflow Client

Call the workflow programmatically:

import asyncio
from coding_agent_agnt5.workflows import coding_agent_workflow
from agnt5.entity import with_entity_context

@with_entity_context
async def main():
    task = """
    Create a function that validates whether a string is a valid number.
    Support integers, decimals, and scientific notation.
    """

    result = await coding_agent_workflow(
        task_description=task,
        max_retries=15
    )

    if result.success:
        print(f"Code:\n{result.code}")
        print(f"Tests:\n{result.tests}")
        print(f"Iterations: {result.iterations}")
        print(f"Documentation:\n{result.documentation}")
    else:
        print(f"Failed: {result.error}")

asyncio.run(main())

Example Output

{
    "success": True,
    "task": "Create a function that validates...",
    "iterations": 3,
    "code": "def is_valid_number(s: str) -> bool:\n    ...",
    "tests": "import pytest\n\ndef test_valid_numbers():\n    ...",
    "sandbox_id": "e2b-sandbox-xyz123",
    "documentation": "# Valid Number Validator\n\n## Overview...",
    "error": None
}

The workflow automatically:

  • Generates Python code from task description
  • Creates comprehensive pytest test suite
  • Runs tests in isolated E2B sandbox
  • Analyzes failures and fixes code iteratively
  • Produces final documentation in markdown
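At a high level, the loop those bullets describe can be sketched in plain Python. The `generate`, `run_tests`, and `fix` callables below are stand-ins for the template's function nodes, not real AGNT5 or E2B APIs:

```python
def run_coding_agent(task, generate, run_tests, fix, max_retries=15):
    """Generate code and tests, then iterate until all tests pass.

    generate(task)            -> (code, tests)   # parallel in the real workflow
    run_tests(code, tests)    -> (passed, errors)  # runs in the E2B sandbox
    fix(task, code, tests, e) -> code            # guided by error analysis
    """
    code, tests = generate(task)
    for iteration in range(1, max_retries + 1):
        passed, errors = run_tests(code, tests)
        if passed:
            return {"success": True, "iterations": iteration, "code": code}
        # Analyze failures and produce a fixed version of the code
        code = fix(task, code, tests, errors)
    return {"success": False, "error": f"Maximum retries ({max_retries}) exhausted"}
```

The real workflow adds planning before generation and documentation after success, but the control flow is this loop.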

Configuration

Environment Variables

Variable        Description                     Required
GROQ_API_KEY    Groq API key for LLM access     Yes
E2B_API_KEY     E2B API key for code sandbox    Yes

Workflow Parameters

coding_agent_workflow(
    task_description: str,  # Coding task description
    max_retries: int = 15   # Max iterations for fixing code
)

Models Used

  • Planner/Analyzer: llama-4-scout-17b-16e-instruct (planning and error analysis)
  • Code Generator: llama-4-maverick-17b-128e-instruct (code generation and fixes)

Architecture

Multi-Function Workflow

The workflow orchestrates seven specialized function nodes:

1. Planner Node

  • Input: Task description
  • Output: Development plan + test plan
  • Model: llama-4-scout-17b-16e-instruct
  • Role: Creates structured plans for implementation and testing

2. Code Generator Node

  • Input: Task, dev plan, (optional) error analysis
  • Output: Python code
  • Model: llama-4-maverick-17b-128e-instruct
  • Role: Generates or fixes code based on plan and failures

3. Test Generator Node

  • Input: Task, test plan
  • Output: Pytest test suite
  • Model: llama-4-maverick-17b-128e-instruct
  • Role: Creates comprehensive test coverage

4. Code Sync Node

  • Input: Main code, test code, sandbox ID
  • Output: Sync status, sandbox ID
  • Role: Uploads code files to E2B sandbox

5. Code Executor Node

  • Input: Sandbox ID
  • Output: Test results, error logs, next action
  • Tools: E2B sandbox (run_command, read_file)
  • Role: Runs pytest and analyzes results
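The executor's choice of next action boils down to interpreting the pytest run. A sketch of that decision step (the real node shells out to pytest via E2B's run_command; the parsing here is an illustrative assumption):

```python
import re

def decide_next_action(pytest_output: str, exit_code: int) -> tuple[str, int]:
    """Map a pytest run to the workflow's next action.

    Returns ("success", 0) when every test passed, otherwise
    ("fix", failure_count). pytest exits 0 only when all collected
    tests pass, so the exit code drives the decision; the summary
    line (e.g. "== 2 failed, 3 passed in 0.12s ==") is parsed to
    report how many tests still fail.
    """
    if exit_code == 0:
        return "success", 0
    m = re.search(r"(\d+) failed", pytest_output)
    return "fix", int(m.group(1)) if m else 0
```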

6. Error Analyzer Node

  • Input: Task, code, tests, error logs
  • Output: Error analysis with root causes and suggestions
  • Model: llama-4-scout-17b-16e-instruct
  • Role: Deep analysis of test failures to guide fixes

7. Final Response Node

  • Input: Task, generated code
  • Output: Markdown documentation
  • Role: Generates comprehensive documentation

Workflow Steps

1. Planning
   └─> Analyze task description
   └─> Create development plan
   └─> Create test plan

2. Generation (Iteration 1)
   └─> [Parallel] Generate code + tests
   └─> Sync to E2B sandbox
   └─> Execute tests

3. Fix Loop (Iterations 2-15)
   └─> Analyze test failures
   └─> Fix code based on analysis
   └─> Sync updated code
   └─> Execute tests
   └─> Repeat until success or max retries

4. Documentation
   └─> Generate markdown docs
   └─> Save to final_response.md
   └─> Return results

Iterative Refinement

  • First iteration generates code and tests in parallel
  • Subsequent iterations analyze failures and fix code
  • Each iteration includes:
    1. Error analysis (identifies root causes)
    2. Code fixing (targets specific issues)
    3. Test execution (validates fixes)
  • Loop terminates on success or after 15 iterations

E2B Sandbox Isolation

All code execution happens in E2B sandboxes:

  • Isolated Python 3.12 environment
  • Pre-installed pytest
  • File system access for code/test uploads
  • Command execution for running tests
  • Prevents host system contamination

State Management

Workflow state tracks:

  • task_description: Original task
  • dev_plan, test_plan: Planning outputs
  • generated_code, generated_tests: Latest code versions
  • execution_status: Test execution state
  • error_logs: Failure details
  • error_analysis: Analysis results
  • sandbox_id: E2B sandbox identifier
  • retries: Current iteration count

Troubleshooting

Missing API keys

ValueError: Missing required environment variables: GROQ_API_KEY, E2B_API_KEY

Solution: Export both GROQ_API_KEY and E2B_API_KEY before running.

E2B sandbox creation failed

Error: Failed to create E2B sandbox

Solution: Verify E2B API key is valid and your account has available quota at https://e2b.dev/dashboard.

Max retries reached

Workflow failed: Maximum retries (15) exhausted

Solution: The agent couldn't fix all test failures within 15 iterations. Simplify the task, increase max_retries, or inspect logs to see what's failing.

Groq rate limits

Error: Rate limit exceeded

Solution: Wait and retry, or upgrade your Groq plan for higher rate limits.

Import errors in generated code

Check that the E2B sandbox has required dependencies. Modify code_sync_node in src/coding_agent_agnt5/functions.py to install packages via pip before running tests.

Customization

Change LLM Models

Modify model selection in src/coding_agent_agnt5/functions.py:

# For planning and error analysis (planner_node, error_analyzer_node)
model="groq/meta-llama/llama-4-scout-17b-16e-instruct"

# For code generation (code_generator_node, test_generator_node)
model="groq/meta-llama/llama-4-maverick-17b-128e-instruct"

You can switch to other Groq models by updating the model parameter in each function node.

Adjust Retry Logic

Change max iterations in workflow call:

result = await coding_agent_workflow(
    task_description=task,
    max_retries=25  # Increase from default 15
)

Add Custom Tools

Extend E2B tools in src/coding_agent_agnt5/tools.py:

@tool(auto_schema=True)
async def install_package(ctx: Context, sandbox_id: str, package: str) -> dict:
    """Install a Python package in the sandbox."""
    # Implementation
    pass

Register in app.py:

worker = Worker(
    tools=[
        E2BSandboxTools.create_sandbox,
        E2BSandboxTools.write_file,
        E2BSandboxTools.run_command,
        E2BSandboxTools.read_file,
        install_package,  # Add custom tool
    ],
    ...
)

Customize Test Framework

Modify test generation prompts in src/coding_agent_agnt5/prompts/coding_agent_prompts.py to use unittest, doctest, or other frameworks instead of pytest.

Related Templates

  • code_reviewer: AI-powered code review with security and quality analysis
  • text-to-sql: Multi-step reasoning workflows with validation
  • weather-agent: Tool-based agentic workflows

License

MIT License - see LICENSE for details