AI and Advanced Patterns
Cloudflare Workers are not limited to JSON APIs and CRUD applications. The platform also natively supports high-performance WebSockets and serverless GPU inference via Workers AI.
By the end of this module, you will understand how to run open-source large language models (LLMs) natively inside a Worker, and how to maintain long-lived WebSocket connections.
1. Workers AI
Workers AI provides serverless inference on Cloudflare's global fleet of GPUs. This lets you generate text (via Llama models), generate images (Stable Diffusion), or run sentiment analysis on demand, without paying for expensive GPU instances that sit idle between requests.
Binding AI
First, add the AI binding to your wrangler.toml:
[ai]
binding = "AI"
And update your TypeScript environment:
export interface Env {
  // Recent versions of @cloudflare/workers-types ship an `Ai` type;
  // `any` is acceptable for a quick start.
  AI: any;
}
Running an LLM (Text Generation)
Here is a complete Worker that sends user input to a Llama 3 model:
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const query = url.searchParams.get("prompt") || "Tell me a joke about Cloudflare.";

    const messages = [
      { role: "system", content: "You are a helpful, brief assistant." },
      { role: "user", content: query }
    ];

    // Execute the model on the Edge GPU network
    const response = await env.AI.run(
      "@cf/meta/llama-3-8b-instruct",
      { messages }
    );

    return Response.json(response);
  }
};
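Buffering the whole completion adds latency on long answers. Workers AI also supports streaming: passing stream: true makes env.AI.run resolve to a ReadableStream of server-sent events, which can be returned to the client directly. A minimal sketch (the binding type below is simplified for illustration; the real binding accepts many models):

```typescript
interface StreamingEnv {
  // Simplified signature for illustration only
  AI: { run(model: string, options: unknown): Promise<ReadableStream> };
}

const worker = {
  async fetch(request: Request, env: StreamingEnv): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a helpful, brief assistant." },
      { role: "user", content: "Tell me a joke about Cloudflare." },
    ];

    // stream: true yields tokens as they are generated,
    // instead of one buffered JSON object at the end
    const stream = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
      messages,
      stream: true,
    });

    // Pass the stream straight through; browsers can consume it as SSE
    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
};

export default worker;
```

Returning the stream directly means the Worker never holds the full response in memory, and the first tokens reach the user as soon as the model produces them.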
Image Classification
Run inference on an image to see what it contains:
export default {
  async fetch(request, env) {
    // Expecting an image binary in the POST body
    const imageBlob = await request.arrayBuffer();

    const input = {
      image: [...new Uint8Array(imageBlob)]
    };

    const response = await env.AI.run(
      "@cf/microsoft/resnet-50",
      input
    );

    // Returns an array of labels with confidence scores (e.g. "Cat: 0.99")
    return Response.json(response);
  }
};
2. Vectorize (Vector Database)
Vectorize is Cloudflare's globally distributed vector database, designed to power AI search and retrieval-augmented generation (RAG).
Instead of matching strings (SELECT * WHERE name = "Bob"), vector databases search by mathematical proximity: for example, matching the concept of "Happy" to a stored text block about "Joy".
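The "proximity" here is typically a distance metric over embedding vectors, most commonly cosine similarity. A quick sketch of the idea (Vectorize computes this for you server-side; this is just to build intuition):

```typescript
// Cosine similarity: 1.0 means identical direction (same meaning),
// values near 0 mean the vectors are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

An embedding of "Happy" scores far closer to an embedding of "Joy" than to an unrelated sentence, which is exactly what a topK query exploits.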
Usage
Assuming an index has been created and bound as VECTOR_DB in wrangler.toml, the typical flow inside a Worker is:
- Create an embedding using Workers AI.
- Insert that embedding into Vectorize.
- Query the index to find the nearest stored vectors.
export default {
  async fetch(request, env) {
    // 1. Embed the question text into a vector
    const queryVector = await env.AI.run("@cf/baai/bge-small-en-v1.5", {
      text: "How do I reset my password?"
    });

    // 2. Find the 3 closest stored vectors
    const matches = await env.VECTOR_DB.query(queryVector.data[0], { topK: 3 });

    // Returns the IDs of the closest matching documents
    return Response.json(matches);
  }
};
3. WebSockets
Workers can terminate WebSocket connections natively, enabling real-time, bidirectional communication.
The Worker "upgrades" a standard HTTP connection into a persistent WebSocket using the WebSocketPair API.
export default {
  async fetch(request, env, ctx) {
    const upgradeHeader = request.headers.get('Upgrade');
    if (upgradeHeader !== 'websocket') {
      return new Response('Expected Upgrade: websocket', { status: 426 });
    }

    // Create the two linked ends of the WebSocket
    const webSocketPair = new WebSocketPair();
    const [client, server] = Object.values(webSocketPair);

    // Accept the connection on the server side
    server.accept();

    server.addEventListener('message', event => {
      console.log('Received:', event.data);
      // Echo the message back reversed!
      server.send(String(event.data).split("").reverse().join(""));
    });

    server.addEventListener('close', event => {
      console.log('WebSocket closed');
    });

    // Return the response with status 101 Switching Protocols
    return new Response(null, {
      status: 101,
      webSocket: client, // Pass the client side back to the user's browser
    });
  }
};
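On the client side (a browser page, or any WebSocket-capable runtime), connecting to the deployed Worker uses the standard WebSocket API. A minimal sketch; the URL is a placeholder for your deployed Worker:

```typescript
// Connects to the echo Worker above and logs the reversed replies.
function startEchoClient(url: string): WebSocket {
  const ws = new WebSocket(url);

  ws.addEventListener("open", () => {
    ws.send("hello"); // the Worker will echo this back reversed
  });

  ws.addEventListener("message", (event) => {
    console.log("Echoed:", event.data);
  });

  return ws;
}
```

Example: startEchoClient("wss://my-worker.example.workers.dev/"). Note the wss:// scheme; the same Worker URL that serves HTTPS also accepts the upgrade.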
Standard Workers enforce aggressive idle-connection limits: if a WebSocket is inactive for too long, the connection is terminated. For highly active, persistent WebSockets (such as multiplayer games), use Durable Objects instead.
What's Next
Now that we have covered the breadth of Workers' capabilities, from standard routing to LLM integration, the remaining task is supporting the application when things go wrong in production.
Proceed to the final module: Module 11: Operations and Troubleshooting.