You’ve built an AI agent. You need to get it to the cloud in 10 minutes.

Why Lambda for AI Agents

AI agents calling external LLM APIs (OpenAI, Bedrock, Anthropic) share a key trait: they’re computationally lightweight. The LLM provider does the heavy lifting. Your agent just orchestrates.

This makes Lambda ideal:

  • Zero idle cost — pay only for actual requests
  • Auto-scaling — handles spikes without capacity planning
  • No operations — no servers to maintain

The Problem

A deployment artifact comes from three inputs:

Artifact = Build(SourceCode, Dependencies, Environment)

Input          Description
SourceCode     Your Python files
Dependencies   Packages from lockfile
Environment    OS where pip install runs

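When you run pip install on macOS, pip downloads wheels built for macOS. Packages with compiled extensions (pydantic-core, numpy, orjson) ship platform-specific binaries.

Copy these to Lambda’s Amazon Linux and the function fails at import time, typically with an error such as "No module named 'pydantic_core._pydantic_core'".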
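The mismatch is visible in the file names of compiled extension modules, which usually carry a platform tag. The sketch below is a heuristic check, not part of the original build: the function name find_foreign_binaries and the marker lists are assumptions chosen for illustration. It scans a directory of installed packages for binaries that look like macOS or Windows builds.

```python
from pathlib import Path

# Substrings that typically appear in non-Linux extension file names (assumed list).
FOREIGN_MARKERS = ("darwin", "win32", "win_amd64")
# Suffixes that only non-Linux platforms use for compiled modules.
FOREIGN_SUFFIXES = (".pyd", ".dylib")


def find_foreign_binaries(package_dir: str) -> list[str]:
    """Return relative paths of compiled files that look like non-Linux builds."""
    root = Path(package_dir)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        tagged_so = path.suffix == ".so" and any(m in name for m in FOREIGN_MARKERS)
        if path.suffix in FOREIGN_SUFFIXES or tagged_so:
            hits.append(path.relative_to(root).as_posix())
    return hits
```

An empty result does not prove the package is Lambda-compatible (some binaries carry no platform tag at all), but a non-empty one is a clear sign the install ran on the wrong OS.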

Root cause: Environment is treated as implicit, assumed identical between dev and prod.

The Solution: Clean Room Pattern

Make Environment a constant by building inside Lambda’s environment:

Artifact = Build(SourceCode, Dependencies, Lambda_Linux)

AWS publishes official Lambda images. Build inside them:

public.ecr.aws/lambda/python:3.12

If it builds in this container for the same CPU architecture as your function (x86_64 by default, or arm64), the compiled dependencies match what Lambda runs.
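One way to read the Build formula is that an artifact is only reproducible when all three inputs are pinned. As a rough illustration, the sketch below hashes the three inputs into a single fingerprint; artifact_fingerprint and its parameters are hypothetical names, not something the build script uses.

```python
import hashlib


def artifact_fingerprint(source: dict[str, bytes], lockfile: bytes, environment: str) -> str:
    """Hash the three build inputs; changing any one of them changes the result."""
    digest = hashlib.sha256()
    for name in sorted(source):  # sort so file order does not affect the hash
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(source[name])
        digest.update(b"\0")
    digest.update(lockfile)
    digest.update(b"\0")
    # The environment is an explicit input, e.g. the build image name.
    digest.update(environment.encode())
    return digest.hexdigest()
```

With identical source and lockfile, a build labelled "macos-arm64" and one labelled "public.ecr.aws/lambda/python:3.12" produce different fingerprints, which is the point: the environment is part of the artifact's identity.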

Architecture

┌─────────────────────────────────────┐
│            AWS Lambda               │
├─────────────────────────────────────┤
│     Mangum (ASGI → Lambda)          │
├─────────────────────────────────────┤
│     FastAPI (HTTP Interface)        │
├─────────────────────────────────────┤
│     Agent + OpenAI SDK              │
└─────────────────────────────────────┘

Component    Role
FastAPI      HTTP interface, request validation, OpenAPI docs
Mangum       Adapts ASGI to Lambda event format
OpenAI SDK   Unified LLM interface (supports OpenAI + Bedrock)
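To make the adapter layer concrete, here is a simplified sketch of the kind of translation it performs: turning a Lambda Function URL event (payload format 2.0) into an ASGI-style HTTP scope plus a request body. This is an illustration only, not Mangum's actual implementation, and event_to_scope is a hypothetical name.

```python
import base64


def event_to_scope(event: dict) -> tuple[dict, bytes]:
    """Translate a Function URL event (payload v2) into an ASGI-like scope and body."""
    http = event["requestContext"]["http"]
    scope = {
        "type": "http",
        "http_version": "1.1",
        "method": http["method"],
        "path": event.get("rawPath", "/"),
        "query_string": event.get("rawQueryString", "").encode(),
        # ASGI expects lower-cased header names as byte pairs.
        "headers": [
            (key.lower().encode(), value.encode())
            for key, value in event.get("headers", {}).items()
        ],
    }
    body = event.get("body") or ""
    # Lambda base64-encodes binary bodies and flags them with isBase64Encoded.
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode()
    return scope, raw
```

FastAPI only ever sees the ASGI side, which is why the same app runs unchanged under a local ASGI server and on Lambda.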

Project Structure

my-agent/
├── app/
│   ├── __init__.py
│   ├── main.py        # FastAPI + Lambda handler
│   ├── agent.py       # Agent logic
│   └── config.py      # Configuration
├── pyproject.toml
├── uv.lock
├── build.sh
└── terraform/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── terraform.tfvars   # Secrets (git-ignored)

Implementation

Configuration

# app/config.py
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    openai_api_key: str
    openai_base_url: str | None = None  # Set for Bedrock
    model_name: str = "gpt-4o-mini"


settings = Settings()

Agent

Minimal RAG pattern. The retrieve function below ignores the query and simply returns the first top_k entries; replace the in-memory list with a vector DB in production.

# app/agent.py
from openai import OpenAI
from .config import settings

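client = OpenAI(
    api_key=settings.openai_api_key,
    # Terraform sets OPENAI_BASE_URL="" by default; fall back to the OpenAI endpoint explicitly
    base_url=settings.openai_base_url or "https://api.openai.com/v1",
)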

KNOWLEDGE_BASE = [
    "Founded in 2020.",
    "Pricing: Free, Pro ($29/mo), Enterprise.",
    "Support: 9am-5pm EST, Mon-Fri.",
]


def retrieve(query: str, top_k: int = 2) -> list[str]:
    return KNOWLEDGE_BASE[:top_k]


def answer(question: str) -> str:
    context = "\n".join(f"- {c}" for c in retrieve(question))
    response = client.chat.completions.create(
        model=settings.model_name,
        messages=[
            {"role": "system", "content": f"Answer based on:\n{context}"},
            {"role": "user", "content": question},
        ],
        max_tokens=500,
    )
    # content can be None (e.g. a refusal), so keep the declared str return type
    return response.choices[0].message.content or ""
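Before reaching for a vector DB, a retrieval step that actually uses the query can be sketched with plain keyword overlap. This is a stand-in technique rather than embedding search, and retrieve_by_overlap and tokenize are hypothetical names; the knowledge base is copied from the agent above so the block runs on its own.

```python
import re

KNOWLEDGE_BASE = [
    "Founded in 2020.",
    "Pricing: Free, Pro ($29/mo), Enterprise.",
    "Support: 9am-5pm EST, Mon-Fri.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_by_overlap(query: str, top_k: int = 2) -> list[str]:
    """Rank chunks by how many terms they share with the query; drop non-matches."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(chunk)), chunk) for chunk in KNOWLEDGE_BASE]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:top_k] if score > 0]
```

For example, retrieve_by_overlap("What is the pricing?") returns only the pricing entry, while a query with no shared terms returns an empty list, so the system prompt would carry no context.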

API

# app/main.py
from fastapi import FastAPI
from mangum import Mangum
from pydantic import BaseModel
from .agent import answer

app = FastAPI(title="AI Agent")


class Query(BaseModel):
    question: str


class Response(BaseModel):
    answer: str


@app.post("/query", response_model=Response)
def query(q: Query) -> Response:
    return Response(answer=answer(q.question))


@app.get("/health")
def health():
    return {"status": "healthy"}


# Lambda entry point
handler = Mangum(app, lifespan="off")

Build Script

Implements the Clean Room Pattern:

#!/bin/bash
# build.sh
set -e

rm -rf package deployment.zip requirements.txt

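# Export dependencies (third-party packages only, not the project itself)
uv export --frozen --no-dev --no-editable --no-emit-project -o requirements.txt

# Build in Lambda environment (Clean Room)
# --platform must match the function architecture (x86_64 is the Lambda default)
# --entrypoint replaces the image's Lambda entrypoint, which only accepts a handler name
docker run --rm \
    --platform linux/amd64 \
    --entrypoint /bin/sh \
    -v "$(pwd)":/var/task \
    -w /var/task \
    public.ecr.aws/lambda/python:3.12 \
    -c "pip install -r requirements.txt -t package/ -q"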

# Package
cd package && zip -rq ../deployment.zip . && cd ..
zip -rq deployment.zip app/

rm -rf package requirements.txt
echo "Created deployment.zip ($(du -h deployment.zip | cut -f1))"
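After the script finishes, it can be worth checking the archive before deploying. The sketch below is an optional helper, not part of build.sh: inspect_deployment is a hypothetical name, and it assumes the handler lives at app/main.py and that the 250 MB limit applies to the unzipped size.

```python
import zipfile

# Assumed limit: 250 MB of unzipped content for zip-based deployments.
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def inspect_deployment(zip_path: str, handler_file: str = "app/main.py") -> dict:
    """Report whether the handler file is at the expected path and the size fits."""
    with zipfile.ZipFile(zip_path) as archive:
        infos = archive.infolist()
    names = {info.filename for info in infos}
    unzipped = sum(info.file_size for info in infos)
    return {
        "has_handler": handler_file in names,
        "unzipped_bytes": unzipped,
        "within_limit": unzipped <= UNZIPPED_LIMIT_BYTES,
    }
```

A missing handler usually means the archive was zipped from the wrong directory, so that app/ ended up nested one level too deep.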

Terraform

Main configuration:

# terraform/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

provider "aws" {
  region = var.aws_region
}

# IAM Role
resource "aws_iam_role" "lambda" {
  name = "${var.function_name}-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "basic" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Lambda Function
resource "aws_lambda_function" "agent" {
  filename         = "${path.module}/../deployment.zip"
  function_name    = var.function_name
  role             = aws_iam_role.lambda.arn
  handler          = "app.main.handler"
  runtime          = "python3.12"
  timeout          = 30
  memory_size      = 256
  source_code_hash = filebase64sha256("${path.module}/../deployment.zip")

  environment {
    variables = {
      OPENAI_API_KEY  = var.openai_api_key
      OPENAI_BASE_URL = var.openai_base_url
      MODEL_NAME      = var.model_name
    }
  }
}

# Public URL (authorization_type NONE: anyone with the URL can invoke the function)
resource "aws_lambda_function_url" "agent" {
  function_name      = aws_lambda_function.agent.function_name
  authorization_type = "NONE"
}
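The handler value "app.main.handler" ties the Terraform configuration to the Python code: everything before the last dot names a module, and the last part names an attribute in it. The sketch below shows that lookup in simplified form; resolve_handler is a hypothetical name and the real Lambda runtime does more than this.

```python
import importlib


def resolve_handler(handler: str):
    """Split 'package.module.attribute' on the last dot and import the attribute."""
    module_name, _, attribute = handler.rpartition(".")
    if not module_name:
        raise ValueError(f"handler must look like 'module.function', got {handler!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

This is why app/ must sit at the root of deployment.zip: the import of app.main only succeeds if the app package is directly on the function's import path.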

Variables:

# terraform/variables.tf
variable "aws_region"       { default = "us-west-2" }
variable "function_name"    { default = "ai-agent" }
variable "openai_api_key"   { sensitive = true }
variable "openai_base_url"  { default = "" }
variable "model_name"       { default = "gpt-4o-mini" }

Outputs:

# terraform/outputs.tf
output "endpoint" {
  value = aws_lambda_function_url.agent.function_url
}

Secrets (git-ignored):

# terraform/terraform.tfvars
openai_api_key = "sk-..."

Deploy:

cd terraform
terraform init
terraform apply
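Once terraform apply prints the endpoint output, a quick smoke test confirms the whole chain works. The sketch below uses only the standard library; build_query_request and ask are hypothetical helper names, and the endpoint you pass in is whatever URL Terraform printed for you.

```python
import json
import urllib.request


def build_query_request(endpoint: str, question: str) -> urllib.request.Request:
    """Build a POST request for the /query route of the deployed agent."""
    url = endpoint.rstrip("/") + "/query"
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(endpoint: str, question: str, timeout: float = 30.0) -> str:
    """Send the question to the deployed function and return the answer field."""
    request = build_query_request(endpoint, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["answer"]
```

The 30-second client timeout mirrors the function's configured timeout; a slow first call is usually a cold start plus the LLM round trip.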

Switching to Bedrock

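Bedrock exposes an OpenAI-compatible endpoint, so the OpenAI SDK can talk to it once the base URL points there. Check that the model you pick is available through that endpoint in your region, and keep the region in the URL consistent with aws_region.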

Update terraform.tfvars:

openai_api_key  = "your-bedrock-api-key"
openai_base_url = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"
model_name      = "anthropic.claude-3-haiku-20240307-v1:0"

Add Bedrock permissions to main.tf:

resource "aws_iam_role_policy" "bedrock" {
  name = "${var.function_name}-bedrock"
  role = aws_iam_role.lambda.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["bedrock:InvokeModel"]
      Resource = "*"
    }]
  })
}

No code changes required — same agent, different provider.

Zip vs Container

Approach    Size Limit          Cold Start           Use When
Zip         250 MB (unzipped)   Fast (~100-500ms)    API-calling agents (most cases)
Container   10 GB               Slower (~500ms-2s)   Bundled ML models, heavy deps

Typical AI agent deployment: <20 MB. Zip is the right choice.

Summary

Step   Action
1      Structure code: app/ for logic, terraform/ for infra
2      Build in Clean Room: docker run with Lambda image
3      Deploy with Terraform: terraform apply
4      Switch providers: update terraform.tfvars