The Fastest Way to Deploy Your AI Agent to AWS Lambda

You’ve built an AI agent. You need to get it to the cloud in 10 minutes.

Why Lambda for AI Agents

AI agents calling external LLM APIs (OpenAI, Bedrock, Anthropic) share a key trait: they’re computationally lightweight. The LLM provider does the heavy lifting. Your agent just orchestrates.

This makes Lambda ideal:

  • Zero idle cost — pay only for actual requests
  • Auto-scaling — handles spikes without capacity planning
  • No operations — no servers to maintain

The Problem

A deployment artifact comes from three inputs:

Artifact = Build(SourceCode, Dependencies, Environment)

Input          Description
SourceCode     Your Python files
Dependencies   Packages from lockfile
Environment    OS where pip install runs

When you pip install on macOS, pip downloads macOS binaries. Packages with C extensions (pydantic-core, numpy, orjson) contain platform-specific code.

Copy these to Lambda’s Amazon Linux → crash.

Root cause: Environment is treated as implicit, assumed identical between dev and prod.
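
The mismatch is visible from the interpreter itself. A small illustration (it assumes the packaging library is available in your environment):

# show_platform_tag.py (illustrative) -- pip chooses wheels whose tags match
# the machine running it, not the machine that will run the code.
from packaging import tags

print(next(iter(tags.sys_tags())))
# On an Apple-silicon Mac this prints something like:
#   cp312-cp312-macosx_14_0_arm64
# Lambda's python3.12 runtime needs a manylinux wheel instead, e.g.:
#   cp312-cp312-manylinux_2_17_x86_64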

The Solution: Clean Room Pattern

Make Environment a constant by building inside Lambda’s environment:

Artifact = Build(SourceCode, Dependencies, Lambda_Linux)

AWS publishes official Lambda images. Build inside them:

public.ecr.aws/lambda/python:3.12

If it builds in this container, it runs on Lambda. Guaranteed.

Architecture

┌─────────────────────────────────────┐
│            AWS Lambda               │
├─────────────────────────────────────┤
│     Mangum (ASGI → Lambda)          │
├─────────────────────────────────────┤
│     FastAPI (HTTP Interface)        │
├─────────────────────────────────────┤
│     Agent + OpenAI SDK              │
└─────────────────────────────────────┘

Component    Role
FastAPI      HTTP interface, request validation, OpenAPI docs
Mangum       Adapts ASGI to Lambda event format
OpenAI SDK   Unified LLM interface (supports OpenAI + Bedrock)
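
To make Mangum's role concrete, this is roughly the event a Lambda function URL hands to the handler (illustrative and abridged; the real payload carries more metadata):

# Function URLs use the API Gateway HTTP API v2 payload format.
event = {
    "version": "2.0",
    "rawPath": "/query",
    "rawQueryString": "",
    "headers": {"content-type": "application/json"},
    "requestContext": {"http": {"method": "POST", "path": "/query"}},
    "body": '{"question": "What are your support hours?"}',
    "isBase64Encoded": False,
}
# Mangum translates this into an ASGI request for FastAPI and converts the
# ASGI response back into the JSON structure Lambda expects.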

Project Structure

my-agent/
├── app/
│   ├── __init__.py
│   ├── main.py        # FastAPI + Lambda handler
│   ├── agent.py       # Agent logic
│   └── config.py      # Configuration
├── pyproject.toml
├── uv.lock
├── build.sh
└── terraform/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── terraform.tfvars   # Secrets (git-ignored)

Implementation

Configuration

# app/config.py
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    openai_api_key: str
    openai_base_url: str | None = None  # Set for Bedrock
    model_name: str = "gpt-4o-mini"


settings = Settings()
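
pydantic-settings matches environment variables to field names case-insensitively, so the OPENAI_API_KEY, OPENAI_BASE_URL, and MODEL_NAME variables set by Terraform later in this post populate these fields with no mapping code. A quick local check (illustrative):

import os

os.environ.setdefault("OPENAI_API_KEY", "sk-test")  # placeholder so the import succeeds

from app.config import Settings

print(Settings().model_name)  # -> "gpt-4o-mini"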

Agent

Minimal RAG pattern — replace in-memory store with vector DB in production.

# app/agent.py
from openai import OpenAI
from .config import settings

client = OpenAI(
    api_key=settings.openai_api_key,
    base_url=settings.openai_base_url or None,  # "" (unset) falls back to the OpenAI default
)

KNOWLEDGE_BASE = [
    "Founded in 2020.",
    "Pricing: Free, Pro ($29/mo), Enterprise.",
    "Support: 9am-5pm EST, Mon-Fri.",
]


def retrieve(query: str, top_k: int = 2) -> list[str]:
    return KNOWLEDGE_BASE[:top_k]


def answer(question: str) -> str:
    context = "\n".join(f"- {c}" for c in retrieve(question))
    response = client.chat.completions.create(
        model=settings.model_name,
        messages=[
            {"role": "system", "content": f"Answer based on:\n{context}"},
            {"role": "user", "content": question},
        ],
        max_tokens=500,
    )
    return response.choices[0].message.content
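
When the knowledge base outgrows three strings, retrieve() can be swapped for embedding-based lookup without touching the rest of the agent. A sketch of that upgrade, meant to sit in app/agent.py alongside the code above (the embedding model name is an assumption, and a real deployment would keep the vectors in a vector DB rather than in memory):

# Uses the module-level client and KNOWLEDGE_BASE defined above.
import math

_doc_vectors: list[list[float]] | None = None  # cached across warm invocations


def _embed(texts: list[str]) -> list[list[float]]:
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in result.data]


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve_semantic(query: str, top_k: int = 2) -> list[str]:
    global _doc_vectors
    if _doc_vectors is None:
        _doc_vectors = _embed(KNOWLEDGE_BASE)
    query_vector = _embed([query])[0]
    ranked = sorted(
        zip(KNOWLEDGE_BASE, _doc_vectors),
        key=lambda pair: _cosine(query_vector, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:top_k]]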

API

# app/main.py
from fastapi import FastAPI
from mangum import Mangum
from pydantic import BaseModel
from .agent import answer

app = FastAPI(title="AI Agent")


class Query(BaseModel):
    question: str


class Response(BaseModel):
    answer: str


@app.post("/query", response_model=Response)
def query(q: Query) -> Response:
    return Response(answer=answer(q.question))


@app.get("/health")
def health():
    return {"status": "healthy"}


# Lambda entry point
handler = Mangum(app, lifespan="off")
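
The app can be smoke-tested locally before anything is packaged; FastAPI's test client drives the ASGI app directly, so no Lambda or HTTP server is involved (illustrative; it needs httpx installed and a real OPENAI_API_KEY for the /query call):

# local_check.py
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)
print(client.get("/health").json())  # {'status': 'healthy'}
print(client.post("/query", json={"question": "What are your support hours?"}).json())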

Build Script

Implements the Clean Room Pattern:

#!/bin/bash
# build.sh
set -e

rm -rf package deployment.zip requirements.txt

# Export dependencies
uv export --frozen --no-dev --no-editable -o requirements.txt

# Build in Lambda environment (Clean Room)
# The image's default entrypoint expects a handler name, so override it to
# run pip directly.
docker run --rm \
    --entrypoint /bin/bash \
    -v "$(pwd)":/var/task \
    -w /var/task \
    public.ecr.aws/lambda/python:3.12 \
    -c "pip install -r requirements.txt -t package/ -q"

# Package
cd package && zip -rq ../deployment.zip . && cd ..
zip -rq deployment.zip app/

rm -rf package requirements.txt
echo "Created deployment.zip ($(du -h deployment.zip | cut -f1))"

Terraform

Main configuration:

# terraform/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

provider "aws" {
  region = var.aws_region
}

# IAM Role
resource "aws_iam_role" "lambda" {
  name = "${var.function_name}-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "basic" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Lambda Function
resource "aws_lambda_function" "agent" {
  filename         = "${path.module}/../deployment.zip"
  function_name    = var.function_name
  role             = aws_iam_role.lambda.arn
  handler          = "app.main.handler"
  runtime          = "python3.12"
  timeout          = 30
  memory_size      = 256
  source_code_hash = filebase64sha256("${path.module}/../deployment.zip")

  environment {
    variables = {
      OPENAI_API_KEY  = var.openai_api_key
      OPENAI_BASE_URL = var.openai_base_url
      MODEL_NAME      = var.model_name
    }
  }
}

# Public URL
resource "aws_lambda_function_url" "agent" {
  function_name      = aws_lambda_function.agent.function_name
  authorization_type = "NONE"
}

Variables:

# terraform/variables.tf
variable "aws_region"       { default = "us-west-2" }
variable "function_name"    { default = "ai-agent" }
variable "openai_api_key"   { sensitive = true }
variable "openai_base_url"  { default = "" }
variable "model_name"       { default = "gpt-4o-mini" }

Outputs:

# terraform/outputs.tf
output "endpoint" {
  value = aws_lambda_function_url.agent.function_url
}

Secrets (git-ignored):

# terraform/terraform.tfvars
openai_api_key = "sk-..."

Deploy:

./build.sh        # produce deployment.zip first
cd terraform
terraform init
terraform apply
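
Once apply finishes, the endpoint output can be smoke-tested right away (a sketch; the URL is a placeholder for your terraform output endpoint value):

# smoke_test.py (illustrative)
import json
import urllib.request

ENDPOINT = "https://<your-function-url>.lambda-url.us-west-2.on.aws"  # placeholder

request = urllib.request.Request(
    ENDPOINT + "/query",
    data=json.dumps({"question": "What are your support hours?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))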

Switching to Bedrock

Amazon Bedrock exposes an OpenAI-compatible endpoint, so the same OpenAI SDK can call it; only the configuration changes.

Update terraform.tfvars:

openai_api_key  = "your-bedrock-api-key"
openai_base_url = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"
model_name      = "anthropic.claude-3-haiku-20240307-v1:0"

Add Bedrock permissions to main.tf:

resource "aws_iam_role_policy" "bedrock" {
  name = "${var.function_name}-bedrock"
  role = aws_iam_role.lambda.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["bedrock:InvokeModel"]
      Resource = "*"
    }]
  })
}

No code changes required — same agent, different provider.
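
Concretely, after the tfvars change the client constructed in app/agent.py is equivalent to this (illustrative; values copied from the tfvars above):

# Same OpenAI SDK, pointed at Bedrock's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="your-bedrock-api-key",  # a Bedrock API key, not an OpenAI key
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1",
)
# client.chat.completions.create(model="anthropic.claude-3-haiku-20240307-v1:0", ...)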

Zip vs Container

Approach    Size Limit          Cold Start            Use When
Zip         250 MB (unzipped)   Fast (~100-500ms)     API-calling agents (most cases)
Container   10 GB               Slower (~500ms-2s)    Bundled ML models, heavy deps

Typical AI agent deployment: <20 MB. Zip is the right choice.

Summary

Step   Action
1      Structure code: app/ for logic, root for infra
2      Build in Clean Room: docker run with Lambda image
3      Deploy with Terraform: terraform apply
4      Switch providers: update terraform.tfvars