AI and Advanced Patterns
Cloudflare Workers are not limited to JSON APIs and CRUD applications. The platform also natively supports high-performance WebSockets and serverless GPU inference via Workers AI.
By the end of this module, you will understand how to run open-source large language models (LLMs) natively inside a Worker, and how to maintain long-lived WebSocket connections.
1. Workers AI
Workers AI provides serverless inference on Cloudflare's global fleet of GPUs. This lets you generate text (via Llama models), generate images (Stable Diffusion), or run sentiment analysis on demand, without paying for expensive GPU instances that sit idle between requests.
Binding AI
First, add the AI binding to your wrangler.toml:
[ai]
binding = "AI"
And update your TypeScript environment:
export interface Env {
  // Recent versions of @cloudflare/workers-types ship an `Ai` type;
  // `any` is acceptable for a quick start.
  AI: any;
}
Running an LLM (Text Generation)
Here is a complete Worker that sends user input to a Llama 3 model:
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const query = url.searchParams.get("prompt") || "Tell me a joke about Cloudflare.";

    const messages = [
      { role: "system", content: "You are a helpful, brief assistant." },
      { role: "user", content: query }
    ];

    // Execute the model on the Edge GPU network
    const response = await env.AI.run(
      "@cf/meta/llama-3-8b-instruct",
      { messages }
    );

    return Response.json(response);
  }
};
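Buffering the whole completion adds latency on long answers. Workers AI also supports streaming: passing stream: true makes env.AI.run resolve to a ReadableStream of server-sent events, which can be returned to the client directly. A minimal sketch (the binding type below is simplified for illustration; the real binding accepts many models):

```typescript
interface StreamingEnv {
  // Simplified signature for illustration only
  AI: { run(model: string, options: unknown): Promise<ReadableStream> };
}

const worker = {
  async fetch(request: Request, env: StreamingEnv): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a helpful, brief assistant." },
      { role: "user", content: "Tell me a joke about Cloudflare." },
    ];

    // stream: true yields tokens as they are generated,
    // instead of one buffered JSON object at the end
    const stream = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
      messages,
      stream: true,
    });

    // Pass the stream straight through; browsers can consume it as SSE
    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
};

export default worker;
```

Returning the stream directly means the Worker never holds the full response in memory, and the first tokens reach the user as soon as the model produces them.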
Image Classification
Run inference on an image to see what it contains:
export default {
  async fetch(request, env) {
    // Expecting an image binary in the POST body
    const imageBlob = await request.arrayBuffer();

    const input = {
      image: [...new Uint8Array(imageBlob)]
    };

    const response = await env.AI.run(
      "@cf/microsoft/resnet-50",
      input
    );

    // Returns an array of labels with confidence scores (e.g. "Cat: 0.99")
    return Response.json(response);
  }
};
2. Vectorize (Vector Database)
Vectorize is Cloudflare's globally distributed vector database, designed to power AI search and retrieval-augmented generation (RAG).
Instead of matching strings (SELECT * WHERE name = "Bob"), vector databases search by mathematical proximity: for example, matching the concept of "Happy" to a stored text block about "Joy".
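The "proximity" here is typically a distance metric over embedding vectors, most commonly cosine similarity. A quick sketch of the idea (Vectorize computes this for you server-side; this is just to build intuition):

```typescript
// Cosine similarity: 1.0 means identical direction (same meaning),
// values near 0 mean the vectors are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

An embedding of "Happy" scores far closer to an embedding of "Joy" than to an unrelated sentence, which is exactly what a topK query exploits.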
Usage
Assuming an index has been created and bound as VECTOR_DB in wrangler.toml, the typical flow inside a Worker is:
- Create an embedding using Workers AI.
- Insert that embedding into Vectorize.
- Query the index to find the nearest stored vectors.
export default {
  async fetch(request, env) {
    // 1. Embed the question text into a vector
    const queryVector = await env.AI.run("@cf/baai/bge-small-en-v1.5", {
      text: "How do I reset my password?"
    });

    // 2. Find the 3 closest stored vectors
    const matches = await env.VECTOR_DB.query(queryVector.data[0], { topK: 3 });

    // Returns the IDs of the closest matching documents
    return Response.json(matches);
  }
};
3. WebSockets
Workers can terminate WebSocket connections natively, enabling real-time, bidirectional communication.
The Worker "upgrades" a standard HTTP connection into a persistent WebSocket using the WebSocketPair API.
export default {
  async fetch(request, env, ctx) {
    const upgradeHeader = request.headers.get('Upgrade');
    if (upgradeHeader !== 'websocket') {
      return new Response('Expected Upgrade: websocket', { status: 426 });
    }

    // Create the two linked ends of the WebSocket
    const webSocketPair = new WebSocketPair();
    const [client, server] = Object.values(webSocketPair);

    // Accept the connection on the server side
    server.accept();

    server.addEventListener('message', event => {
      console.log('Received:', event.data);
      // Echo the message back reversed!
      server.send(String(event.data).split("").reverse().join(""));
    });

    server.addEventListener('close', event => {
      console.log('WebSocket closed');
    });

    // Return the response with status 101 Switching Protocols
    return new Response(null, {
      status: 101,
      webSocket: client, // Pass the client side back to the user's browser
    });
  }
};
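On the client side (a browser page, or any WebSocket-capable runtime), connecting to the deployed Worker uses the standard WebSocket API. A minimal sketch; the URL is a placeholder for your deployed Worker:

```typescript
// Connects to the echo Worker above and logs the reversed replies.
function startEchoClient(url: string): WebSocket {
  const ws = new WebSocket(url);

  ws.addEventListener("open", () => {
    ws.send("hello"); // the Worker will echo this back reversed
  });

  ws.addEventListener("message", (event) => {
    console.log("Echoed:", event.data);
  });

  return ws;
}
```

Example: startEchoClient("wss://my-worker.example.workers.dev/"). Note the wss:// scheme; the same Worker URL that serves HTTPS also accepts the upgrade.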
Standard Workers enforce aggressive idle-connection limits: if a WebSocket is inactive for too long, the connection is terminated. For highly active, persistent WebSockets (such as multiplayer games), use Durable Objects instead.
What's Next
Now that we have covered the breadth of Workers' capabilities, from standard routing to LLM integration, the remaining task is supporting the application when things go wrong in production.
Proceed to the final module: Module 11: Operations and Troubleshooting.