The Fastest Way to Deploy Your AI Agent to AWS Lambda

You’ve built an AI agent. You need to get it to the cloud in 10 minutes.

Why Lambda for AI Agents

AI agents calling external LLM APIs (OpenAI, Bedrock, Anthropic) share a key trait: they’re computationally lightweight. The LLM provider does the heavy lifting. Your agent just orchestrates.

This makes Lambda ideal:

  • Zero idle cost — pay only for actual requests
  • Auto-scaling — handles spikes without capacity planning
  • No operations — no servers to maintain

The Problem

A deployment artifact comes from three inputs:

Artifact = Build(SourceCode, Dependencies, Environment)

Input          Description
SourceCode     Your Python files
Dependencies   Packages from lockfile
Environment    OS where pip install runs

When you pip install on macOS, pip downloads macOS binaries. Packages with C extensions (pydantic-core, numpy, orjson) contain platform-specific code.

Copy these to Lambda’s Amazon Linux → crash.

Root cause: Environment is treated as implicit, assumed identical between dev and prod.
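
The mismatch is visible from the interpreter itself. A small illustration (it assumes the packaging library is available in your environment):

# show_platform_tag.py (illustrative) -- pip chooses wheels whose tags match
# the machine running it, not the machine that will run the code.
from packaging import tags

print(next(iter(tags.sys_tags())))
# On an Apple-silicon Mac this prints something like:
#   cp312-cp312-macosx_14_0_arm64
# Lambda's python3.12 runtime needs a manylinux wheel instead, e.g.:
#   cp312-cp312-manylinux_2_17_x86_64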

The Solution: Clean Room Pattern

Make Environment a constant by building inside Lambda’s environment:

Artifact = Build(SourceCode, Dependencies, Lambda_Linux)

AWS publishes official Lambda images. Build inside them:

public.ecr.aws/lambda/python:3.12

If it builds in this container, it runs on Lambda. Guaranteed.

Architecture

┌─────────────────────────────────────┐
│            AWS Lambda               │
├─────────────────────────────────────┤
│     Mangum (ASGI → Lambda)          │
├─────────────────────────────────────┤
│     FastAPI (HTTP Interface)        │
├─────────────────────────────────────┤
│     Agent + OpenAI SDK              │
└─────────────────────────────────────┘

Component    Role
FastAPI      HTTP interface, request validation, OpenAPI docs
Mangum       Adapts ASGI to Lambda event format
OpenAI SDK   Unified LLM interface (supports OpenAI + Bedrock)
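
To make Mangum's role concrete, this is roughly the event a Lambda function URL hands to the handler (illustrative and abridged; the real payload carries more metadata):

# Function URLs use the API Gateway HTTP API v2 payload format.
event = {
    "version": "2.0",
    "rawPath": "/query",
    "rawQueryString": "",
    "headers": {"content-type": "application/json"},
    "requestContext": {"http": {"method": "POST", "path": "/query"}},
    "body": '{"question": "What are your support hours?"}',
    "isBase64Encoded": False,
}
# Mangum translates this into an ASGI request for FastAPI and converts the
# ASGI response back into the JSON structure Lambda expects.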

Project Structure

my-agent/
├── app/
│   ├── __init__.py
│   ├── main.py        # FastAPI + Lambda handler
│   ├── agent.py       # Agent logic
│   └── config.py      # Configuration
├── pyproject.toml
├── uv.lock
├── build.sh
└── terraform/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── terraform.tfvars   # Secrets (git-ignored)

Implementation

Configuration

# app/config.py
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    openai_api_key: str
    openai_base_url: str | None = None  # Set for Bedrock
    model_name: str = "gpt-4o-mini"


settings = Settings()
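
pydantic-settings matches environment variables to field names case-insensitively, so the OPENAI_API_KEY, OPENAI_BASE_URL, and MODEL_NAME variables set by Terraform later in this post populate these fields with no mapping code. A quick local check (illustrative):

import os

os.environ.setdefault("OPENAI_API_KEY", "sk-test")  # placeholder so the import succeeds

from app.config import Settings

print(Settings().model_name)  # -> "gpt-4o-mini"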

Agent

Minimal RAG pattern — replace in-memory store with vector DB in production.

# app/agent.py
from openai import OpenAI
from .config import settings

client = OpenAI(
    api_key=settings.openai_api_key,
    base_url=settings.openai_base_url or None,  # "" (unset) falls back to the OpenAI default
)

KNOWLEDGE_BASE = [
    "Founded in 2020.",
    "Pricing: Free, Pro ($29/mo), Enterprise.",
    "Support: 9am-5pm EST, Mon-Fri.",
]


def retrieve(query: str, top_k: int = 2) -> list[str]:
    return KNOWLEDGE_BASE[:top_k]


def answer(question: str) -> str:
    context = "\n".join(f"- {c}" for c in retrieve(question))
    response = client.chat.completions.create(
        model=settings.model_name,
        messages=[
            {"role": "system", "content": f"Answer based on:\n{context}"},
            {"role": "user", "content": question},
        ],
        max_tokens=500,
    )
    return response.choices[0].message.content
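
When the knowledge base outgrows three strings, retrieve() can be swapped for embedding-based lookup without touching the rest of the agent. A sketch of that upgrade, meant to sit in app/agent.py alongside the code above (the embedding model name is an assumption, and a real deployment would keep the vectors in a vector DB rather than in memory):

# Uses the module-level client and KNOWLEDGE_BASE defined above.
import math

_doc_vectors: list[list[float]] | None = None  # cached across warm invocations


def _embed(texts: list[str]) -> list[list[float]]:
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in result.data]


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve_semantic(query: str, top_k: int = 2) -> list[str]:
    global _doc_vectors
    if _doc_vectors is None:
        _doc_vectors = _embed(KNOWLEDGE_BASE)
    query_vector = _embed([query])[0]
    ranked = sorted(
        zip(KNOWLEDGE_BASE, _doc_vectors),
        key=lambda pair: _cosine(query_vector, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:top_k]]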

API

# app/main.py
from fastapi import FastAPI
from mangum import Mangum
from pydantic import BaseModel
from .agent import answer

app = FastAPI(title="AI Agent")


class Query(BaseModel):
    question: str


class Response(BaseModel):
    answer: str


@app.post("/query", response_model=Response)
def query(q: Query) -> Response:
    return Response(answer=answer(q.question))


@app.get("/health")
def health():
    return {"status": "healthy"}


# Lambda entry point
handler = Mangum(app, lifespan="off")
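
The app can be smoke-tested locally before anything is packaged; FastAPI's test client drives the ASGI app directly, so no Lambda or HTTP server is involved (illustrative; it needs httpx installed and a real OPENAI_API_KEY for the /query call):

# local_check.py
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)
print(client.get("/health").json())  # {'status': 'healthy'}
print(client.post("/query", json={"question": "What are your support hours?"}).json())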

Build Script

Implements the Clean Room Pattern:

#!/bin/bash
# build.sh
set -e

rm -rf package deployment.zip requirements.txt

# Export dependencies
uv export --frozen --no-dev --no-editable -o requirements.txt

# Build in Lambda environment (Clean Room)
# The image's default entrypoint expects a handler name, so override it to
# run pip directly.
docker run --rm \
    --entrypoint /bin/bash \
    -v "$(pwd)":/var/task \
    -w /var/task \
    public.ecr.aws/lambda/python:3.12 \
    -c "pip install -r requirements.txt -t package/ -q"

# Package
cd package && zip -rq ../deployment.zip . && cd ..
zip -rq deployment.zip app/

rm -rf package requirements.txt
echo "Created deployment.zip ($(du -h deployment.zip | cut -f1))"

Terraform

Main configuration:

# terraform/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

provider "aws" {
  region = var.aws_region
}

# IAM Role
resource "aws_iam_role" "lambda" {
  name = "${var.function_name}-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "basic" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Lambda Function
resource "aws_lambda_function" "agent" {
  filename         = "${path.module}/../deployment.zip"
  function_name    = var.function_name
  role             = aws_iam_role.lambda.arn
  handler          = "app.main.handler"
  runtime          = "python3.12"
  timeout          = 30
  memory_size      = 256
  source_code_hash = filebase64sha256("${path.module}/../deployment.zip")

  environment {
    variables = {
      OPENAI_API_KEY  = var.openai_api_key
      OPENAI_BASE_URL = var.openai_base_url
      MODEL_NAME      = var.model_name
    }
  }
}

# Public URL
resource "aws_lambda_function_url" "agent" {
  function_name      = aws_lambda_function.agent.function_name
  authorization_type = "NONE"
}

Variables:

# terraform/variables.tf
variable "aws_region"       { default = "us-west-2" }
variable "function_name"    { default = "ai-agent" }
variable "openai_api_key"   { sensitive = true }
variable "openai_base_url"  { default = "" }
variable "model_name"       { default = "gpt-4o-mini" }

Outputs:

# terraform/outputs.tf
output "endpoint" {
  value = aws_lambda_function_url.agent.function_url
}

Secrets (git-ignored):

# terraform/terraform.tfvars
openai_api_key = "sk-..."

Deploy:

./build.sh        # produce deployment.zip first
cd terraform
terraform init
terraform apply
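
Once apply finishes, the endpoint output can be smoke-tested right away (a sketch; the URL is a placeholder for your terraform output endpoint value):

# smoke_test.py (illustrative)
import json
import urllib.request

ENDPOINT = "https://<your-function-url>.lambda-url.us-west-2.on.aws"  # placeholder

request = urllib.request.Request(
    ENDPOINT + "/query",
    data=json.dumps({"question": "What are your support hours?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))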

Switching to Bedrock

Amazon Bedrock exposes an OpenAI-compatible endpoint, so the same OpenAI SDK can call it; only the configuration changes.

Update terraform.tfvars:

openai_api_key  = "your-bedrock-api-key"
openai_base_url = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"
model_name      = "anthropic.claude-3-haiku-20240307-v1:0"

Add Bedrock permissions to main.tf:

resource "aws_iam_role_policy" "bedrock" {
  name = "${var.function_name}-bedrock"
  role = aws_iam_role.lambda.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["bedrock:InvokeModel"]
      Resource = "*"
    }]
  })
}

No code changes required — same agent, different provider.
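
Concretely, after the tfvars change the client constructed in app/agent.py is equivalent to this (illustrative; values copied from the tfvars above):

# Same OpenAI SDK, pointed at Bedrock's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="your-bedrock-api-key",  # a Bedrock API key, not an OpenAI key
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1",
)
# client.chat.completions.create(model="anthropic.claude-3-haiku-20240307-v1:0", ...)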

Zip vs Container

Approach    Size Limit          Cold Start            Use When
Zip         250 MB (unzipped)   Fast (~100-500ms)     API-calling agents (most cases)
Container   10 GB               Slower (~500ms-2s)    Bundled ML models, heavy deps

Typical AI agent deployment: <20 MB. Zip is the right choice.

Summary

Step   Action
1      Structure code: app/ for logic, root for infra
2      Build in Clean Room: docker run with Lambda image
3      Deploy with Terraform: terraform apply
4      Switch providers: update terraform.tfvars