Dev Tools · 2026-04-29 · Advanced

Building a Real-Time Collaborative App Backend with Rork and Cloudflare Durable Objects — Full Implementation Guide

A production-grade walkthrough for adding a self-hosted real-time collaboration backend to your Rork app using Cloudflare Durable Objects. Covers WebSocket lifecycle, hibernation-aware sessions, optimistic updates from React Native, and cost-aware design patterns — without depending on Liveblocks or Yjs hosting.

Tags: Rork · Cloudflare · Durable Objects · WebSocket · Real-time Backend


"I want to build a whiteboard or polling feature in Rork, but Liveblocks pricing does not scale with my user base." I hear some version of this from indie developers constantly right now. Hosted services such as Liveblocks and managed Yjs backends are pleasant to start with, but once your user count grows, the monthly bill jumps in ways that hurt even apps that are already monetized.

I learned this the hard way. I built a small voting app on Rork and bolted Liveblocks onto it. The first month came in at three times my projection, and that was for a free app. Over the next six months, I migrated everything to Cloudflare Durable Objects. The result was more than a tenfold reduction in monthly cost, and latency actually felt snappier from Tokyo, where most of my users live.

This guide distills everything I learned from that migration. We will cover the conceptual model of Durable Objects, the WebSocket client you embed in your Rork app, and the operational pitfalls that bit me — all with code you can paste into a real project.

Why Durable Objects Instead of the Alternatives

There are several ways to build a real-time backend, and choosing without understanding the trade-offs hurts later. Here is how I think about it.

Liveblocks, PartyKit, Pusher: Beautiful SDKs and a fast on-ramp, but hosted plans are typically priced on monthly active users, concurrent connections, or message volume. If your app is free or freemium and you have many concurrent collaborators, the math becomes unfriendly very quickly.
Supabase Realtime: Best known for streaming Postgres changes to subscribers. Excellent for persisting and syncing structured state, but routing the high-frequency, low-latency events that real-time collaboration generates (cursor positions, presence pings, in-progress strokes) through the database is a poor fit.
Self-hosted WebSockets on Node.js plus Redis: Full control, but the operational burden — autoscaling, sticky sessions, regional fallback — is realistically more than a solo developer wants to take on.
Cloudflare Durable Objects: Combines a WebSocket server and persistent storage into a single addressable object that runs on Cloudflare's network. Pricing is based on requests and on the time each object is active, which makes forecasts predictable.

The single feature that made me commit to Durable Objects was the per-state object scoping. Every whiteboard URL, every chat room, every poll can map to its own object. Only the participants of that room ever route to that instance. When the room is idle, the object hibernates and stops accruing duration charges. This matches the cost shape of an indie app: you pay mostly for the rooms that are active right now.

The Architecture at a Glance

Before writing code, it helps to picture the request flow.

The Rork app holds a roomId (for example, the slug of a whiteboard) and opens a WebSocket against /ws/:roomId on the Worker.
The Worker translates that roomId into a Durable Object instance, then forwards the upgraded connection to it.
The Durable Object accepts the connection via state.acceptWebSocket(), registering it as a hibernation-aware WebSocket.
Each message from a client is applied to the object's in-memory state and broadcast to the other connections attached to the same object.
Anything that needs to outlive a single session (drawn shapes, chat history) is written to the object's SQLite storage.

Because Cloudflare guarantees a single logical instance per object ID worldwide, and that instance handles its events one at a time, messages within a room are ordered for free. There is no need for an external lock or queue.
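The flow above boils down to a small wire protocol. Here is a minimal sketch of it as a TypeScript discriminated union, using the message names from the code later in this guide (draw, cursor, user_joined, user_left, error). The parseClientMessage and toBroadcast helpers are hypothetical names, not Cloudflare APIs. Stamping userId on the server side means a client cannot impersonate someone else by putting a userId into its own message.

```typescript
// Messages a client may send to its room (names follow the examples in this guide)
type ClientMessage =
  | { type: "draw"; payload: unknown }
  | { type: "cursor"; payload: unknown };

// Messages the room fans out to connected clients
type ServerMessage =
  | { type: "user_joined"; userId: string; at: number }
  | { type: "user_left"; userId: string; code: number; reason: string }
  | { type: "draw"; userId: string; payload: unknown }
  | { type: "cursor"; userId: string; payload: unknown }
  | { type: "error"; reason: string };

const CLIENT_TYPES = new Set(["draw", "cursor"]);

// Parse raw socket text; returns null for anything that is not a known client message
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string" || !CLIENT_TYPES.has(type)) return null;
  return value as ClientMessage;
}

// Attach the sender identity known to the server; any client-supplied userId is overwritten
function toBroadcast(msg: ClientMessage, userId: string): ServerMessage {
  return { ...msg, userId };
}
```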

Step 1: Bootstrap the Worker and the Durable Object Class

Generate a fresh Worker project. I keep this in a separate repository from the Rork frontend; mixing them tends to add friction to deploys.

# Create a new TypeScript Worker project
npm create cloudflare@latest realtime-backend -- \
  --type=hello-world --lang=ts --no-deploy
 
cd realtime-backend

Add the Durable Object binding to wrangler.toml.

# wrangler.toml
name = "realtime-backend"
main = "src/index.ts"
compatibility_date = "2026-01-15"
compatibility_flags = ["nodejs_compat"]
 
[[durable_objects.bindings]]
name = "ROOM"
class_name = "RoomDO"
 
[[migrations]]
tag = "v1"
new_sqlite_classes = ["RoomDO"]

The new_sqlite_classes directive opts you into the SQLite storage backend Cloudflare made generally available in 2025. It is faster than the legacy KV-style storage and lets you write actual queries when your shapes get more complex.

Now the Durable Object itself.

// src/room.ts
// `Env` is the global interface generated by `wrangler types` (worker-configuration.d.ts)
export class RoomDO {
  private state: DurableObjectState;
  private env: Env;
  private sessions: Map<WebSocket, { userId: string; joinedAt: number }>;
 
  constructor(state: DurableObjectState, env: Env) {
    this.state = state;
    this.env = env;
    // After hibernation, restore the existing WebSockets and their metadata
    this.sessions = new Map();
    for (const ws of this.state.getWebSockets()) {
      const meta = ws.deserializeAttachment() as
        | { userId: string; joinedAt: number }
        | null;
      if (meta) this.sessions.set(ws, meta);
    }
  }
 
  async fetch(request: Request): Promise<Response> {
    const upgradeHeader = request.headers.get("Upgrade");
    if (upgradeHeader !== "websocket") {
      return new Response("Expected websocket", { status: 426 });
    }
 
    const url = new URL(request.url);
    const userId = url.searchParams.get("userId");
    if (!userId) {
      return new Response("userId is required", { status: 400 });
    }
 
    const pair = new WebSocketPair();
    const [client, server] = [pair[0], pair[1]];
 
    // Hibernation-aware accept
    this.state.acceptWebSocket(server);
    const meta = { userId, joinedAt: Date.now() };
    server.serializeAttachment(meta);
    this.sessions.set(server, meta);
 
    this.broadcast(
      { type: "user_joined", userId, at: meta.joinedAt },
      server,
    );
 
    return new Response(null, { status: 101, webSocket: client });
  }
 
  // Broadcast to every active connection in this room (optionally excluding one)
  private broadcast(message: unknown, except?: WebSocket) {
    const json = JSON.stringify(message);
    for (const [ws] of this.sessions) {
      if (ws === except) continue;
      try {
        ws.send(json);
      } catch (err) {
        // The peer may already be gone; cleanup happens on the next close event
        console.warn("send failed", err);
      }
    }
  }
 
  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer) {
    if (typeof message !== "string") return;
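    let parsed: { type?: unknown; payload?: unknown } | null;
    try {
      parsed = JSON.parse(message);
    } catch {
      ws.send(JSON.stringify({ type: "error", reason: "invalid_json" }));
      return;
    }
    // JSON.parse also accepts `null`, numbers and strings; ignore anything that is not a typed object
    if (typeof parsed !== "object" || parsed === null || typeof parsed.type !== "string") return;
 
    const meta = this.sessions.get(ws);
    if (!meta) return;
 
    if (parsed.type === "draw") {
      // Persist only the operations that need to survive disconnects.
      // The random suffix stops two ops sent in the same millisecond from overwriting each other.
      await this.state.storage.put(
        `op:${Date.now()}:${meta.userId}:${crypto.randomUUID()}`,
        parsed.payload ?? null,
      );
      this.broadcast({ ...parsed, userId: meta.userId }, ws);
    } else if (parsed.type === "cursor") {
      // Cursors are presence: relay them, never store them
      this.broadcast({ ...parsed, userId: meta.userId }, ws);
    }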
  }
 
  async webSocketClose(ws: WebSocket, code: number, reason: string) {
    const meta = this.sessions.get(ws);
    this.sessions.delete(ws);
    if (meta) {
      this.broadcast({ type: "user_left", userId: meta.userId, code, reason });
    }
  }
 
  async webSocketError(ws: WebSocket, error: unknown) {
    console.error("ws error", error);
    this.sessions.delete(ws);
  }
}

Three details deserve a callout.

Hibernation support: state.acceptWebSocket() is what unlocks the cost model. While no messages are flowing, the sockets stay connected but the object can be evicted from memory and stops accruing duration charges; it boots back up when the next message arrives.
Persisted versus ephemeral state: I write draw operations to SQLite but never store cursor events at all; they are relayed to the room and forgotten. This is the single most important decision for cost control: a chatty event written every frame would dominate your bill.
Metadata recovery on rehydrate: When the object wakes up, deserializeAttachment() lets you recover the userId (or any other small JSON metadata) you stashed on the WebSocket.

Step 2: Worker Routing and Rate Limiting

The Worker is the front door. It maps the path to a Durable Object, applies request-level concerns, and forwards.

// src/index.ts
import { RoomDO } from "./room";
export { RoomDO };
 
interface Env {
  ROOM: DurableObjectNamespace;
}
 
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const match = url.pathname.match(/^\/ws\/([\w-]+)$/);
    if (!match) {
      return new Response("Not found", { status: 404 });
    }
 
    const roomId = match[1];
 
    // Lightweight rate limiting: cap inbound connections per IP
    const ip = request.headers.get("CF-Connecting-IP") ?? "unknown";
    if (await rateLimit(env, ip)) {
      return new Response("Too many requests", { status: 429 });
    }
 
    // Same roomId always resolves to the same Durable Object instance
    const id = env.ROOM.idFromName(roomId);
    const stub = env.ROOM.get(id);
 
    return stub.fetch(request);
  },
};
 
// A module-level function, so the handler does not depend on how `this` is bound.
// Placeholder: pair with Cloudflare's dashboard rate-limiting rules in production.
async function rateLimit(_env: Env, _ip: string): Promise<boolean> {
  return false;
}

env.ROOM.idFromName(roomId) is deterministic. Every user who joins the same roomId lands on the same instance. There is no service discovery, no shard map, no consistent-hash logic to maintain.

A note on rate limiting: trying to count requests inside the Worker via KV adds latency, and the eventual-consistency window can let bursts through. In production I keep the in-Worker check as a soft signal and rely on Cloudflare's dashboard rate-limiting rules for the hard limit.

Step 3: The WebSocket Client Inside Your Rork App

With Cloudflare ready, switch to the Rork side. Rork generates React Native + Expo, so we use the standard WebSocket API.

// app/lib/realtime-client.ts
type Listener<T = unknown> = (event: T) => void;
 
export class RealtimeClient {
  private ws: WebSocket | null = null;
  private url: string;
  private listeners = new Map<string, Set<Listener>>();
  private reconnectAttempts = 0;
  private reconnectTimer: ReturnType<typeof setTimeout> | null = null;
  private explicitlyClosed = false;
  private outboundQueue: string[] = [];
 
  constructor(roomId: string, userId: string) {
    const base = process.env.EXPO_PUBLIC_REALTIME_URL ?? "wss://example.dev";
    this.url = `${base}/ws/${roomId}?userId=${encodeURIComponent(userId)}`;
  }
 
  connect() {
    this.explicitlyClosed = false;
    this.ws = new WebSocket(this.url);
 
    this.ws.onopen = () => {
      this.reconnectAttempts = 0;
      // Flush messages that were queued during the outage
      while (this.outboundQueue.length > 0 && this.ws?.readyState === 1) {
        const msg = this.outboundQueue.shift();
        if (msg) this.ws.send(msg);
      }
      this.emit("open", null);
    };
 
    this.ws.onmessage = (e) => {
      try {
        const parsed = JSON.parse(e.data as string);
        this.emit(parsed.type ?? "message", parsed);
      } catch {
        // Skip malformed payloads silently
      }
    };
 
    this.ws.onclose = (e) => {
      this.emit("close", { code: e.code, reason: e.reason });
      if (!this.explicitlyClosed) {
        this.scheduleReconnect();
      }
    };
 
    this.ws.onerror = () => {
      // onclose will fire as well — leave this empty to avoid double handling
    };
  }
 
  private scheduleReconnect() {
    // Exponential backoff with jitter, capped at 30 seconds
    const base = Math.min(30_000, 1_000 * 2 ** this.reconnectAttempts);
    const jitter = Math.random() * 1_000;
    const delay = base + jitter;
    this.reconnectAttempts++;
    if (this.reconnectTimer) clearTimeout(this.reconnectTimer);
    this.reconnectTimer = setTimeout(() => this.connect(), delay);
  }
 
  send(message: object) {
    const json = JSON.stringify(message);
    if (this.ws?.readyState === 1) {
      this.ws.send(json);
    } else {
      // Queue while offline; flush on the next successful open
      this.outboundQueue.push(json);
    }
  }
 
  on<T = unknown>(type: string, listener: Listener<T>) {
    if (!this.listeners.has(type)) this.listeners.set(type, new Set());
    this.listeners.get(type)!.add(listener as Listener);
    return () => this.listeners.get(type)?.delete(listener as Listener);
  }
 
  private emit(type: string, data: unknown) {
    this.listeners.get(type)?.forEach((l) => l(data));
  }
 
  close() {
    this.explicitlyClosed = true;
    if (this.reconnectTimer) clearTimeout(this.reconnectTimer);
    this.ws?.close(1000, "client_close");
    this.ws = null;
  }
}

Wire it into a screen. The pattern I prefer is keeping the client in a useRef so re-renders never accidentally reconnect.

// app/screens/whiteboard.tsx
import { useEffect, useRef, useState } from "react";
import { View } from "react-native";
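import { RealtimeClient } from "@/lib/realtime-client";
 
// Placeholder types: replace with the shapes your canvas actually uses
type Stroke = { points: Array<{ x: number; y: number }> };
type Props = { roomId: string; userId: string };
 
export default function Whiteboard({ roomId, userId }: Props) {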
  const [strokes, setStrokes] = useState<Stroke[]>([]);
  const clientRef = useRef<RealtimeClient | null>(null);
 
  useEffect(() => {
    const client = new RealtimeClient(roomId, userId);
    clientRef.current = client;
 
    const offDraw = client.on<{ payload: Stroke }>("draw", (e) => {
      setStrokes((prev) => [...prev, e.payload]);
    });
 
    client.connect();
 
    return () => {
      offDraw();
      client.close();
    };
  }, [roomId, userId]);
 
  const handleStroke = (stroke: Stroke) => {
    // Optimistic local update first…
    setStrokes((prev) => [...prev, stroke]);
    // …then ship it over the wire
    clientRef.current?.send({ type: "draw", payload: stroke });
  };
 
  return <View style={{ flex: 1 }}>{/* drawing canvas */}</View>;
}

The optimistic update is what makes the experience feel "instant". You commit your own action to local state immediately, and the server fan-out simply tells everyone else. This is also how you keep input responsive even when the network is unsteady.
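Optimistic updates pair well with IDs generated on the device: if a stroke ever comes back to its author (for example in a snapshot after reconnecting), the client can de-duplicate it instead of drawing it twice. A minimal sketch, assuming each Stroke carries such an id; the Stroke shape and the mergeStrokes helper are hypothetical.

```typescript
// Hypothetical stroke shape: the id is created on the device before the stroke is sent
type Stroke = { id: string; points: Array<{ x: number; y: number }> };

// Append incoming strokes, skipping ids the client already holds (e.g. its own optimistic strokes)
function mergeStrokes(local: Stroke[], incoming: Stroke[]): Stroke[] {
  const seen = new Set(local.map((s) => s.id));
  const merged = [...local];
  for (const stroke of incoming) {
    if (seen.has(stroke.id)) continue;
    seen.add(stroke.id);
    merged.push(stroke);
  }
  return merged;
}
```

In the screen above, `setStrokes((prev) => mergeStrokes(prev, [e.payload]))` would take the place of the plain append.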

When you ask Rork to scaffold a screen like this, an explicit prompt such as "use a useRef-held WebSocket client, apply optimistic local state first, then call .send() to broadcast" produces dramatically more reliable code than a vague "make it real-time".

Pitfalls I Hit in Production

A few of these tripped me up enough that they deserve their own section.

Pitfall 1: All messages on the same Durable Object run on the same thread.

This is the source of the message-ordering guarantee, but it cuts both ways. Once, I added an await for an OpenAI Vision call inside webSocketMessage. Under load, every other message in the room queued behind it, and the room felt frozen for ten seconds at a time. The right answer is to push heavy work to a separate Worker, then have the Durable Object only react to the result.

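Pitfall 2: setTimeout does not outlive the object in memory; use state.storage.setAlarm().

If you want a periodic snapshot or a "save in N seconds" timer, use the Alarms API. A setTimeout lives only in memory: while it is pending, the object cannot hibernate, so you keep paying for duration, and if the object is evicted or restarted (a deploy does this), the callback is simply lost. An alarm set with state.storage.setAlarm() is stored durably and wakes the object when it comes due. The docs do call this out, but it is buried.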
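A minimal sketch of a debounced "save within N milliseconds" built on alarms instead of setTimeout. It relies only on the storage methods getAlarm() and setAlarm(), written against a small hypothetical AlarmStorage interface so the logic runs outside the Workers runtime; in a real object you would pass this.state.storage and put the actual save work in the class's alarm() handler.

```typescript
// The subset of Durable Object storage this sketch uses
interface AlarmStorage {
  getAlarm(): Promise<number | null>;
  setAlarm(scheduledTime: number): Promise<void>;
}

// Make sure a flush happens no later than `delayMs` from `now`.
// An object has a single alarm, so an earlier pending alarm is kept and a later one is pulled forward.
async function scheduleFlush(storage: AlarmStorage, delayMs: number, now: number = Date.now()): Promise<number> {
  const target = now + delayMs;
  const existing = await storage.getAlarm();
  if (existing !== null && existing <= target) return existing;
  await storage.setAlarm(target);
  return target;
}
```

Because the alarm is persisted, the runtime calls alarm() when it comes due even if the object hibernated or was evicted in between.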

Pitfall 3: Some clients refuse to reconnect after a 1011 close code.

My client reconnects after any close it did not initiate itself, but a few mobile WebSocket implementations refuse to reconnect after 1011 (internal server error). When you close a socket from the server because of an error, prefer 1008 or an application-defined code in the 4000 to 4999 range. Avoid 1006: it is reserved for connections that drop without a close frame and cannot be sent by an endpoint at all. Your reconnect logic gets noticeably more forgiving.
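One way to keep server and client agreeing on close codes is a small shared module. This sketch assumes two hypothetical application-defined codes, 4000 ("retry later") and 4001 ("do not retry, for example because the token was rejected"); the sendable-code check follows RFC 6455 plus the IANA-registered 1012 to 1014.

```typescript
// Hypothetical application-defined close codes (RFC 6455 reserves 4000 to 4999 for private use)
const CLOSE_RETRY_LATER = 4000;
const CLOSE_DO_NOT_RETRY = 4001;

// Codes an endpoint may put in a close frame. 1004 is reserved;
// 1005, 1006 and 1015 are only ever reported locally, never sent.
function isSendableCloseCode(code: number): boolean {
  return (
    (code >= 1000 && code <= 1003) ||
    (code >= 1007 && code <= 1014) ||
    (code >= 3000 && code <= 4999)
  );
}

// Server side: close after a transient failure with a code the client will retry on
function closeForTransientError(ws: { close(code: number, reason: string): void }): void {
  ws.close(CLOSE_RETRY_LATER, "retry_later");
}

// Client side: decide whether to schedule a reconnect after a close event
function shouldReconnect(code: number, closedByClient: boolean): boolean {
  if (closedByClient) return false; // we called close() ourselves
  if (code === CLOSE_DO_NOT_RETRY) return false; // e.g. token rejected: refresh it first
  // Everything else, including 1000 from the server, 1006 (dropped without a
  // close frame) and 1011 (server error), is treated as retryable.
  return true;
}
```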

For debugging, the most efficient setup I have found is wrangler tail for live logs alongside websocat for ad-hoc connection tests.

# Stream logs from the Worker
wrangler tail realtime-backend
 
# In another terminal, exercise the WebSocket directly
websocat "wss://realtime-backend.example.workers.dev/ws/test-room?userId=u1"

Operations and Cost Optimization

Durable Objects bill across three axes.

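Requests: Each WebSocket upgrade counts as one request, and incoming WebSocket messages are billed at a ratio of 20 messages per request.
Duration: Wall-clock time the object is awake, billed in GB-seconds. Hibernation drops this to zero.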
Storage: Reads, writes, and stored bytes against the SQLite backend.

For whiteboard-style apps, cursor messages can fire ten or more times per second per user. If you write those to SQLite, your storage line item dominates the bill. The "ephemeral by default, persistent on demand" rule we applied above is the single biggest lever.
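On the client, the cheapest cursor message is the one you never send. A minimal throttle sketch, assuming a 100 ms interval (at most ten updates per second) and that only the latest position matters, so intermediate positions are dropped. The CursorThrottle class is hypothetical, and the clock is passed in so the behaviour is testable.

```typescript
type Cursor = { x: number; y: number };

// Keeps only the newest cursor position and releases it at most once per interval
class CursorThrottle {
  private pending: Cursor | null = null;
  private lastSentAt = -Infinity;
  private readonly intervalMs: number;

  constructor(intervalMs = 100) {
    this.intervalMs = intervalMs;
  }

  // Record a new position; returns it right away if the interval has elapsed, otherwise null
  push(pos: Cursor, now: number): Cursor | null {
    this.pending = pos;
    return this.flush(now);
  }

  // Call periodically (e.g. once per frame) so the final resting position is not lost
  flush(now: number): Cursor | null {
    if (this.pending === null || now - this.lastSentAt < this.intervalMs) return null;
    const out = this.pending;
    this.pending = null;
    this.lastSentAt = now;
    return out;
  }
}
```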

For the app I actively run — peaks around 80 concurrent users across roughly ten thousand monthly rooms — Durable Objects bill out at around four US dollars a month. The same workload on Liveblocks ran me close to eighty. That is a twentyfold delta, paid for by accepting a steeper learning curve up front.

A practical security note: validate a JWT at the top of the Worker's fetch(). Issue the token from your Rork backend, attach it to the WebSocket URL as a query parameter, and verify its signature and claims before forwarding to the Durable Object. Embedding the room ID in the token itself prevents cross-room access if a token leaks.

Testing Locally Before You Deploy

It is easy to convince yourself that a real-time system works because the happy path looks fine in development. The way I have learned to actually trust a Durable Object setup is to script the failure paths.

Run the Worker locally with wrangler dev and exercise it with a small Node script that simulates several clients at once. The script below opens twenty connections, drives random draw and cursor events, then asserts that every client sees every other client's draw events.

// scripts/load-smoke.ts
import WebSocket from "ws";
 
type Client = { id: string; ws: WebSocket; sent: number; received: number };
 
async function spawn(roomId: string, userId: string): Promise<Client> {
  const url = `ws://127.0.0.1:8787/ws/${roomId}?userId=${userId}`;
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url);
    const client: Client = { id: userId, ws, sent: 0, received: 0 };
    ws.on("open", () => resolve(client));
    // Fail fast instead of hanging if wrangler dev is not running
    ws.on("error", reject);
    ws.on("message", (raw) => {
      const msg = JSON.parse(raw.toString());
      if (msg.type === "draw") client.received++;
    });
  });
}
 
async function main() {
  const room = "smoke-" + Date.now();
  const clients: Client[] = [];
  for (let i = 0; i < 20; i++) {
    clients.push(await spawn(room, `u${i}`));
  }
 
  const sends = 50;
  for (let i = 0; i < sends; i++) {
    const sender = clients[i % clients.length];
    sender.ws.send(JSON.stringify({ type: "draw", payload: { x: i, y: i } }));
    sender.sent++;
  }
 
  // Give the system a moment to settle
  await new Promise((r) => setTimeout(r, 1500));
 
  for (const c of clients) {
    // A client receives every draw EXCEPT the ones it sent itself.
    // 50 sends over 20 clients means some clients sent 3 and others 2.
    const expected = sends - c.sent;
    if (c.received < expected) {
      console.error(`${c.id} only saw ${c.received} of ${expected}`);
      process.exit(1);
    }
  }
  console.log("✅ smoke pass");
  process.exit(0);
}
 
main().catch((err) => {
  console.error(err);
  process.exit(1);
});

I run this against wrangler dev before any production deploy, and again against the deployed Worker once it is live. It is the cheapest insurance you will ever buy. The first time I introduced the await on a heavy AI call (Pitfall 1 above), this script was what surfaced the regression: the received counts dropped because the object stopped processing fast enough.

A second test I run is a "kill the network" simulation. On Linux you can shape traffic with tc; on macOS, Apple's Network Link Conditioner does the same job, or you can simply toggle Wi-Fi for ten seconds while a real device is connected. With the reconnect logic shown earlier, you should see clients drop, watch the queue fill, then flush cleanly when the network returns. If you see duplicated messages or stuck queues, that is a bug in your reconnect path, not in Cloudflare.
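The outboundQueue in the client above is unbounded, so a long outage can replay hundreds of stale cursor positions on reconnect. A sketch of a bounded queue that keeps only the newest cursor message and discards the oldest entries past a cap; the cap of 200 and the rule that only cursor messages are coalesced are assumptions. Dropping old draw messages is a trade-off, and an app that must never lose strokes would rely on a snapshot after reconnecting instead.

```typescript
type Outbound = { type: string; payload?: unknown };

class OutboundQueue {
  private items: Outbound[] = [];
  private readonly max: number;

  constructor(max = 200) {
    this.max = max;
  }

  push(msg: Outbound): void {
    if (msg.type === "cursor") {
      // Presence only matters at its latest value: drop any cursor already queued
      this.items = this.items.filter((m) => m.type !== "cursor");
    }
    this.items.push(msg);
    // Bound memory during long outages by discarding the oldest entries
    if (this.items.length > this.max) {
      this.items.splice(0, this.items.length - this.max);
    }
  }

  // Remove and return everything queued, oldest first, as JSON ready for ws.send()
  drain(): string[] {
    const out = this.items.map((m) => JSON.stringify(m));
    this.items = [];
    return out;
  }

  get size(): number {
    return this.items.length;
  }
}
```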

When to Reach for Postgres or Redis Instead

Durable Objects are not a universal answer. Two situations push me back to a more traditional stack.

The first is when state is fundamentally relational and queried by many different shapes. If your real-time feature has to answer "who is online from this user's friend list" at every keystroke, a Durable Object that holds presence is the wrong shape — you want a Postgres or Redis index that can serve the question across rooms. In that case, I keep Durable Objects for the hot real-time loop and keep a separate Supabase or Cloudflare D1 layer for the "across all rooms" queries.

The second is when state must outlive a particular object boundary in complex ways. For example, a multi-room game where players move between rooms but share an inventory will not fit cleanly into the per-room scope. You can solve it (Durable Objects can call each other), but every cross-object call is a network hop and a future debugging tax. Sometimes a stateless Worker plus a shared Postgres is honestly easier.

I find this trade-off useful to write down somewhere visible: "Durable Objects for the inside-of-a-room loop, normal database for queries that span rooms." That single sentence has saved me from over-engineering more than once.

Migrating From Liveblocks Without a Big-Bang Rewrite

If you have a live app on Liveblocks and the bill is hurting, do not rewrite everything in one weekend. Here is the gradual path that worked for me.

Phase 1: Dual-write presence only. Cursors and "who is in the room" indicators are the highest-frequency, lowest-stakes events. Send those to your Durable Object backend in addition to Liveblocks for a week. Compare latency from real-user metrics. Nothing user-facing has changed.

Phase 2: Switch presence reads to your backend. Once latency looks acceptable, flip the read path so your app subscribes to your Durable Object for cursors and presence. Keep Liveblocks running, but it is now redundant for these events.

Phase 3: Move persistent state. This is the tricky part — operations like drawn shapes, board snapshots, room metadata. Write a one-time migration that reads from Liveblocks and writes to your Durable Object SQLite, then flip the write path. Plan a maintenance window for this; it is the only step that benefits from one.

Phase 4: Decommission Liveblocks. Cancel the subscription and delete the integration code. Watch the bill drop.

I went from "running on Liveblocks in production" to "running entirely on Durable Objects" in about three weeks of evening work. The dual-write phase is what makes this safe. You always have an escape hatch.

Three Apps I Actually Shipped on This Stack

Abstract explanation only goes so far. Here are three apps I built on Durable Objects and the design decisions that shaped each of them. I am writing this concretely so you can map it onto whatever you are building yourself.

1. A collaborative whiteboard (indie product, ~400 monthly active users).

I create one Durable Object per board, with a soft cap of ten simultaneous connections. Drawing strokes are persisted to SQLite as coordinate arrays, and the eraser is implemented as a tombstone flag rather than a physical delete. I initially wired up physical deletes, then ran into pain when I added undo/redo — the tombstone approach turned out to be dramatically easier to reason about. Cursors and scroll positions are kept in memory only. After three months, retention sits around thirty percent of week-1 users, which feels healthy for a one-person app.
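A sketch of the tombstone approach: erasing flips a deleted flag instead of removing the record, so undoing an erase is just flipping it back. The operation kinds (stroke, erase, restore) and the Board class are hypothetical names for illustration; in the real object each operation would also be persisted.

```typescript
type Point = { x: number; y: number };
type StoredStroke = { id: string; points: Point[]; deleted: boolean };

type BoardOp =
  | { kind: "stroke"; id: string; points: Point[] }
  | { kind: "erase"; id: string }
  | { kind: "restore"; id: string }; // undo of an erase

class Board {
  // Map keeps insertion order, so strokes render in the order they were drawn
  private strokes = new Map<string, StoredStroke>();

  apply(op: BoardOp): void {
    if (op.kind === "stroke") {
      this.strokes.set(op.id, { id: op.id, points: op.points, deleted: false });
      return;
    }
    const target = this.strokes.get(op.id);
    if (!target) return; // unknown id: ignore rather than throw
    // Tombstone: never delete the record, only toggle its visibility
    target.deleted = op.kind === "erase";
  }

  visible(): StoredStroke[] {
    return Array.from(this.strokes.values()).filter((s) => !s.deleted);
  }
}
```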

The hardest part technically was state consistency on hibernation wake-up. While the object is hibernated, nothing is in memory; on the very first message after wake, I now read the latest snapshot from SQLite via state.storage.list() and broadcast it to currently connected clients. Without that, the experience after a few minutes idle is "I came back and someone else's strokes are missing", which is the kind of bug that quietly kills a collaborative product.

2. A real-time polling app for event organizers.

One Durable Object per event URL. Connections only flow during the actual session — typically thirty minutes — so hibernation's cost-shape really shines. I am running several hundred events per month for under two dollars total. Just after launch, organizers complained the live tally felt laggy. The cause was the browser doing aggregation work after every incoming vote; I moved aggregation server-side and broadcast a precomputed snapshot every 500 ms. Perceived latency improved immediately, and the client got dramatically simpler.
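The server-side aggregation can be sketched as a tally that marks itself dirty on each vote and builds a snapshot only when something changed; an alarm firing every 500 ms would call snapshotIfDirty() and broadcast the result. The PollTally class, the one-vote-per-user rule, and the snapshot shape are assumptions for illustration.

```typescript
type TallySnapshot = { counts: Record<string, number>; total: number };

class PollTally {
  private votes = new Map<string, string>(); // userId -> option
  private dirty = false;

  // One vote per user; voting again moves the vote to the new option
  vote(userId: string, option: string): void {
    if (this.votes.get(userId) === option) return;
    this.votes.set(userId, option);
    this.dirty = true;
  }

  // Precomputed snapshot if anything changed since the last call, otherwise null
  snapshotIfDirty(): TallySnapshot | null {
    if (!this.dirty) return null;
    this.dirty = false;
    const counts: Record<string, number> = {};
    this.votes.forEach((option) => {
      counts[option] = (counts[option] ?? 0) + 1;
    });
    return { counts, total: this.votes.size };
  }
}
```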

3. A live Q&A app for online seminars.

Comments persist; the host has a UI to pick or dismiss each question. The first incident was "we got 50 comments per second at peak", and per-message SQLite writes choked. I changed webSocketMessage to push into an in-memory array and registered an alarm that flushes the buffer via state.storage.transaction() every 250 ms. After that change, the system has handled spikes of well over a hundred per second without complaint.
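The buffering pattern can be sketched as follows. Instead of state.storage.transaction(), this variant uses the storage put() overload that accepts an object of key/value pairs, which is limited to 128 pairs per call, and it is written against a hypothetical BatchStorage interface so it runs outside Workers; in the object you would call flush() from alarm(). Anything still in the buffer is lost if the object hibernates or is evicted before the next flush, which is the price of this pattern.

```typescript
interface BatchStorage {
  put(entries: Record<string, unknown>): Promise<void>;
}

const MAX_ENTRIES_PER_PUT = 128; // per-call limit for put(entries)

class CommentBuffer {
  private pending: Array<[string, unknown]> = [];

  add(key: string, value: unknown): void {
    this.pending.push([key, value]);
  }

  // Write everything buffered so far in chunks; returns how many entries were written
  async flush(storage: BatchStorage): Promise<number> {
    const batch = this.pending;
    this.pending = []; // messages arriving during the writes start a fresh buffer
    for (let i = 0; i < batch.length; i += MAX_ENTRIES_PER_PUT) {
      const entries: Record<string, unknown> = {};
      for (const [key, value] of batch.slice(i, i + MAX_ENTRIES_PER_PUT)) entries[key] = value;
      try {
        await storage.put(entries);
      } catch (err) {
        // Re-queue unwritten entries ahead of anything that arrived meanwhile, then surface the error
        this.pending = batch.slice(i).concat(this.pending);
        throw err;
      }
    }
    return batch.length;
  }
}
```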

Three patterns repeat across all of them: keep persistence frequency low, always reconcile state on hibernation wake-up, and aggregate on the server. SaaS solutions like Liveblocks or Pusher hide these decisions behind their abstractions; when you go self-hosted, you have to make them deliberately.

A Pragmatic Security Baseline

Real-time backends have a wide blast radius — anyone in the room sees everyone else's events — so security mistakes hurt disproportionately. Here is the baseline I run on my own indie projects, which I believe is appropriate without being overbuilt.

The non-negotiable starting point is JWT verification at the top of the Worker's fetch(). Issue short-lived tokens from your authentication endpoint (a Hono Worker is a fine place for it) when a user logs into the Rork app, and pass the token as a query parameter on the WebSocket URL. Verify the signature with jose inside the Worker, and confirm that the userId and roomId claims match the request path. With that single check, "someone shared a URL and a stranger walked in" stops being possible.
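Once jose's jwtVerify() has checked the signature, what remains is comparing the claims with what the request asks for. A sketch of that comparison, assuming hypothetical claim names sub (the user ID) and room (the room ID); use whatever your auth endpoint actually issues. The expiry check repeats what jwtVerify already does and is kept only as defense in depth.

```typescript
// Claims this sketch reads from an already-verified token (names are assumptions)
type RealtimeClaims = { sub?: unknown; room?: unknown; exp?: unknown };

// Returns null when the token may open this room as this user, otherwise a rejection reason
function checkRoomAccess(
  claims: RealtimeClaims,
  roomId: string,
  userId: string,
  nowSec: number = Math.floor(Date.now() / 1000),
): string | null {
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) return "expired";
  if (claims.sub !== userId) return "user_mismatch";
  if (claims.room !== roomId) return "room_mismatch";
  return null;
}
```

A stricter variant ignores the userId query parameter entirely and forwards the verified sub claim to the Durable Object.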

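Keep token lifetimes short; fifteen minutes is my default. Long-lived tokens turn a single leak into a long-term incident. Pair this with a refresh-token flow on the client so users do not feel the expiry. One catch: the WebSocket API does not expose the HTTP status of a rejected upgrade, so the client never actually sees the 401, only a closed connection. Treat a socket that closes before it ever opened as the cue to refresh the token silently and reconnect.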

Inside webSocketMessage, validate payloads with Zod or Valibot every time. Trusting client-side types is a category error: the bad actors do not run your TypeScript. If a draw message arrives with coordinates of NaN or a billion, that bad data lands in SQLite and quietly poisons later aggregations. Server-side validation is not a "nice to have"; it is the load-bearing wall of the whole system.
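Zod or Valibot is the comfortable way to express this. To keep the sketch dependency-free, here is the same idea written by hand for a hypothetical draw payload made of points. The 100,000 coordinate bound and the 2,000-point cap are assumptions picked only to illustrate rejecting infinities and absurd values before anything reaches SQLite; note that JSON.parse turns an overflowing literal such as 1e309 into Infinity, so finiteness checks matter even for JSON input.

```typescript
type DrawPayload = { points: Array<{ x: number; y: number }> };

const MAX_COORD = 100_000; // hypothetical canvas bound
const MAX_POINTS = 2_000; // hypothetical per-stroke cap

function isCoord(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v) && Math.abs(v) <= MAX_COORD;
}

// Returns a clean copy of the payload, or null if anything is out of shape or out of range
function parseDrawPayload(input: unknown): DrawPayload | null {
  if (typeof input !== "object" || input === null) return null;
  const points = (input as { points?: unknown }).points;
  if (!Array.isArray(points) || points.length === 0 || points.length > MAX_POINTS) return null;
  const clean: DrawPayload["points"] = [];
  for (const p of points) {
    if (typeof p !== "object" || p === null) return null;
    const { x, y } = p as { x?: unknown; y?: unknown };
    if (!isCoord(x) || !isCoord(y)) return null;
    clean.push({ x, y }); // copy only known fields; extra client keys are dropped
  }
  return { points: clean };
}
```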

Rate limits work best in two layers. The Worker handles IP-based limits — that catches abusive connection attempts before they reach the Durable Object. Inside the Durable Object, I maintain a per-user limit ("no more than 20 messages per second from the same userId"). When exceeded, I send a soft rate_limited event back rather than dropping the connection; legitimate users sometimes brush against the limit during a burst, and a forced disconnect feels broken. The UI just shows a brief "slow down for a moment" hint.
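The per-user limit inside the Durable Object can be an in-memory token bucket per userId. A sketch assuming the 20 messages per second from the text as both refill rate and burst size; when tryConsume() returns false, the object would send the soft rate_limited event instead of closing the socket. The UserRateLimiter class is hypothetical. Because the buckets live in memory they reset when the object hibernates, which is acceptable for a soft limit.

```typescript
type Bucket = { tokens: number; updatedAt: number };

class UserRateLimiter {
  private buckets = new Map<string, Bucket>();
  private readonly ratePerSec: number;
  private readonly burst: number;

  constructor(ratePerSec = 20, burst = 20) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
  }

  // True if the message may be processed, false if this user is over the limit
  tryConsume(userId: string, now: number = Date.now()): boolean {
    const bucket = this.buckets.get(userId) ?? { tokens: this.burst, updatedAt: now };
    // Refill in proportion to elapsed time, never beyond the burst size
    const elapsedSec = (now - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.burst, bucket.tokens + elapsedSec * this.ratePerSec);
    bucket.updatedAt = now;
    this.buckets.set(userId, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }

  // Call when a user's last connection closes so buckets do not accumulate
  forget(userId: string): void {
    this.buckets.delete(userId);
  }
}
```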

Finally, treat logs as a leak surface. A naive console.log(message) in webSocketMessage will dump user drawings or chat text into Cloudflare's Worker Logs, where anyone with dashboard access can read them. In production, I log only metadata (type, userId, room ID) and store full payloads, when truly needed for debugging, in a short-retention SQLite table inside the Durable Object itself, where they are scoped to the room.

What Changes When You Bring This to a Team

This architecture is friendly to solo work but carries a few sharp edges when a small team picks it up.

The local-versus-production gap. wrangler dev restarts your Durable Objects whenever you change code, wiping their in-memory state, and keeps their storage in a local .wrangler/state directory that shares nothing with production data. The "it worked locally but breaks in production" pattern is much easier to fall into than usual. The first thing I tell a teammate joining a project like this is "assume local state evaporates; always run a seed script before testing".

Permission scoping. Durable Object deploys can carry class migrations (the [[migrations]] entries in wrangler.toml), and a careless one can rename or delete a class in production, so uncontrolled access to the production account is risky. Cloudflare's scoped API tokens are your friend: issue everyone a dev-only token and let only CI (GitHub Actions, in my case) deploy to production.

PR review focus areas. Two things have to be checked on every Durable Object change: are there any awaited external calls inside webSocketMessage, and is anyone using setTimeout instead of the alarm API. Both are easy to miss in a normal review. I keep a tiny "Durable Objects PR checklist" Markdown file in our repo and paste it into the PR template; that small ritual has caught bugs more than once.

Observability — How to Know What Is Actually Happening

Real-time systems are notoriously hard to debug from afar. Stack traces alone are not enough; you need a sense of the room over time. Three lightweight signals have served me well, in roughly this order of usefulness.

Per-room event counters. Inside the Durable Object, increment small counters for messages_in, messages_out, and connections whenever those events happen, and dump them once per minute via the alarm API. Push them to your analytics backend (PostHog, Plausible, or even just a Cloudflare Analytics Engine table). When a user reports "the room felt slow at 3pm", you can see whether the room actually was busier or whether something else was happening. Anecdotally, half the "lag" reports I have chased turned out to be the user's own hotel Wi-Fi, and these counters made that easy to confirm.
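A sketch of the per-room counters: plain in-memory numbers that the once-a-minute alarm snapshots and resets. The RoomCounters class is hypothetical, and the trailing comment shows where a Workers Analytics Engine writeDataPoint() call could go, assuming a binding (here called ANALYTICS) that you would configure yourself. Counts gathered since the last snapshot are lost if the object hibernates first, so treat them as approximate.

```typescript
type CounterName = "messages_in" | "messages_out" | "connections";

class RoomCounters {
  private counts: Record<CounterName, number> = { messages_in: 0, messages_out: 0, connections: 0 };

  inc(name: CounterName, by = 1): void {
    this.counts[name] += by;
  }

  // Return the counts gathered since the last call and start a new window
  snapshotAndReset(): Record<CounterName, number> {
    const snapshot = { ...this.counts };
    this.counts = { messages_in: 0, messages_out: 0, connections: 0 };
    return snapshot;
  }
}

// Inside alarm(), for example (hypothetical ANALYTICS binding):
//   const c = counters.snapshotAndReset();
//   env.ANALYTICS.writeDataPoint({ indexes: [roomId], doubles: [c.messages_in, c.messages_out, c.connections] });
```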

Connection lifecycle logs at the Worker layer. Log a single line for each open, close, and rejected upgrade with the room ID, country code (Cloudflare exposes request.cf.country), and the close reason. These are tiny and they let you spot whole regions failing — for example, if your Cloudflare Worker happens to be misconfigured for a particular network, you will see a country-shaped pattern in the rejections within minutes.

A "presence ledger" durable to restarts. When something serious goes wrong and you need to reproduce it, the most painful question is "who was in the room when this happened". I keep a tiny ledger in SQLite that records (roomId, userId, joinedAt, leftAt) per session. It costs almost nothing in storage, but having an authoritative answer to that question has saved me hours during incident reviews.
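The ledger pays off because the incident question becomes a simple filter. A sketch, assuming rows shaped like the (roomId, userId, joinedAt, leftAt) tuple above, with leftAt left null while a session is still open; in SQLite the same filter is `WHERE roomId = ? AND joinedAt <= ? AND (leftAt IS NULL OR leftAt > ?)`. The presentAt function is a hypothetical name.

```typescript
type LedgerRow = { roomId: string; userId: string; joinedAt: number; leftAt: number | null };

// Who was connected to `roomId` at instant `at` (milliseconds since epoch)?
function presentAt(rows: LedgerRow[], roomId: string, at: number): string[] {
  const users = new Set<string>();
  for (const r of rows) {
    if (r.roomId !== roomId) continue;
    // Joined at or before the instant, and either still connected or left after it
    if (r.joinedAt <= at && (r.leftAt === null || r.leftAt > at)) users.add(r.userId);
  }
  return Array.from(users).sort();
}
```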

You do not need a full APM solution to operate a Durable Object backend. The combination of wrangler tail, the in-room counters, and a simple ledger has been enough for every production issue I have hit so far. If you do reach for an APM later, OpenTelemetry has a Workers exporter that drops in fairly cleanly.

A final note on alerts: do not alert on raw error rates. Real-time backends are noisy by nature; clients, especially mobile ones, close connections ungracefully all the time. Instead, alert on "median message latency for sessions longer than 30 seconds". Excluding short sessions keeps connection setup and quick drop-and-reconnect cycles out of the number, so it stays stable in healthy systems and catches the regressions that actually matter.
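A sketch of that alert metric, assuming each client reports its session duration and per-message latencies (measured as a client-side round trip, so device clock skew does not leak in). SessionSample, the 30-second default, and the threshold passed to shouldAlert are hypothetical; the median is taken over every message from the qualifying sessions.

```typescript
interface SessionSample {
  sessionDurationMs: number;
  messageLatenciesMs: number[]; // e.g. client-measured send-to-ack round trips
}

function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median latency across all messages from sessions longer than minSessionMs.
function medianLatencyForLongSessions(sessions: SessionSample[], minSessionMs = 30_000): number | null {
  const latencies = sessions
    .filter((s) => s.sessionDurationMs > minSessionMs)
    .flatMap((s) => s.messageLatenciesMs);
  return median(latencies);
}

// Fire only when there is data and the median exceeds the threshold.
function shouldAlert(sessions: SessionSample[], thresholdMs: number): boolean {
  const m = medianLatencyForLongSessions(sessions);
  return m !== null && m > thresholdMs;
}
```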

How to Estimate Cost and Region in Practice

"What will this actually cost" is usually the final decision point. Here are the numbers from the apps I run, and how I think about estimating before you launch.

The Durable Objects pricing model evolved significantly in 2025. The Workers Paid plan, at five US dollars a month, now includes a generous baseline of Durable Objects usage: roughly one million requests, four hundred thousand GB-seconds of duration, and five gigabytes of SQLite storage. You only pay metered prices on top of that.

For my whiteboard app — peaks of around eighty concurrent users across about ten thousand monthly rooms — the rough monthly bill of materials looks like 120,000 requests, 30,000 GB-seconds of duration, and half a gigabyte of storage. All of that fits comfortably inside the Workers Paid baseline, so the only thing I actually pay Cloudflare is the five-dollar plan itself.
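The "does it fit in the baseline" check is simple arithmetic. This sketch uses the included amounts quoted above and the whiteboard figures; DoUsage and the helper names are hypothetical, and the included values should be checked against Cloudflare's current pricing page before you rely on them.

```typescript
interface DoUsage {
  requests: number;
  gbSeconds: number;
  storageGb: number;
}

// Included Durable Objects usage on Workers Paid, as quoted above (verify before relying on it).
const WORKERS_PAID_INCLUDED: DoUsage = { requests: 1_000_000, gbSeconds: 400_000, storageGb: 5 };

// Units that would be billed at metered prices on top of the plan.
function overageBeyondIncluded(usage: DoUsage, included: DoUsage = WORKERS_PAID_INCLUDED): DoUsage {
  return {
    requests: Math.max(0, usage.requests - included.requests),
    gbSeconds: Math.max(0, usage.gbSeconds - included.gbSeconds),
    storageGb: Math.max(0, usage.storageGb - included.storageGb),
  };
}

function fitsInBaseline(usage: DoUsage, included: DoUsage = WORKERS_PAID_INCLUDED): boolean {
  const o = overageBeyondIncluded(usage, included);
  return o.requests === 0 && o.gbSeconds === 0 && o.storageGb === 0;
}

// Fraction of each allowance used (1.0 means fully consumed).
function utilization(usage: DoUsage, included: DoUsage = WORKERS_PAID_INCLUDED): DoUsage {
  return {
    requests: usage.requests / included.requests,
    gbSeconds: usage.gbSeconds / included.gbSeconds,
    storageGb: usage.storageGb / included.storageGb,
  };
}

const whiteboard: DoUsage = { requests: 120_000, gbSeconds: 30_000, storageGb: 0.5 };
```

For the whiteboard numbers, utilization comes out at 12% of requests, 7.5% of duration, and 10% of storage, which is why the bill stays at the plan price.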

The same workload on Liveblocks cost me somewhere between seventy and ninety dollars a month. That gap, roughly fourteen to eighteen times, is meaningful for a freshly monetized indie app: it is the difference between "I am breaking even" and "I am losing money".

A note on regions. A Durable Object instance tends to be created in the Cloudflare colo nearest to the first request that brings it into existence. If your audience is mostly in one country, latency is more stable when the first connection to each room comes from that country. After deploying, I always run a smoke test from a real device in my primary region; the Cloudflare dashboard exposes a colo field that confirms where the object actually lives.
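For that smoke test, one option is to have the object report its own location. A sketch, assuming an outbound request from inside the Durable Object to Cloudflare's /cdn-cgi/trace endpoint, which returns plain-text key=value lines including colo=; parseTrace, reportColo, and the injected fetcher are hypothetical, and the fetcher is passed in so the parsing can be exercised outside the Workers runtime.

```typescript
// Parse the key=value lines returned by https://www.cloudflare.com/cdn-cgi/trace.
function parseTrace(body: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of body.split("\n")) {
    const i = line.indexOf("=");
    if (i > 0) out[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return out;
}

type FetchText = (url: string) => Promise<string>;

// Returns the colo code the outbound request was served from, or null if absent.
async function reportColo(fetchText: FetchText): Promise<string | null> {
  const trace = parseTrace(await fetchText("https://www.cloudflare.com/cdn-cgi/trace"));
  return trace.colo ?? null;
}
```

Inside a hypothetical debug route on the object, this could be called as reportColo(async (url) => (await fetch(url)).text()).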

Cloudflare also supports location hints (the locationHint option passed when you get a Durable Object stub), which let you steer instance placement deliberately. Globally distributed apps eventually want this, for example as separate Durable Object pools per region with the client choosing which one to connect to. In practice, this complexity is only worth introducing once you have crossed roughly tens of thousands of daily active users; before that, automatic placement is fine.
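A sketch of the per-region pool idea, assuming the region is chosen when a room is created (here from the creator's request.cf.continent) and encoded into the room key, so every later participant reaches the same object with the same hint. The continent-to-hint table, the "enam" fallback, and the key format are hypothetical; hints are best-effort and only affect where an object is first created.

```typescript
type LocationHint = "wnam" | "enam" | "sam" | "weur" | "eeur" | "apac" | "oc" | "afr" | "me";

const VALID_HINTS: readonly LocationHint[] = ["wnam", "enam", "sam", "weur", "eeur", "apac", "oc", "afr", "me"];

// Hypothetical coarse mapping from request.cf.continent to a hint.
const CONTINENT_TO_HINT: Record<string, LocationHint> = {
  AS: "apac",
  OC: "oc",
  EU: "weur",
  NA: "enam",
  SA: "sam",
  AF: "afr",
};

// Called once, when the room is created; the key is what clients share and join with.
function roomKeyFor(roomId: string, creatorContinent: string | undefined): string {
  const hint: LocationHint = (creatorContinent && CONTINENT_TO_HINT[creatorContinent]) || "enam";
  return `${hint}:${roomId}`;
}

// Recover the hint from a key; keys without a known prefix (older rooms) get no hint.
function hintFromRoomKey(key: string): LocationHint | undefined {
  const prefix = key.split(":", 1)[0];
  return (VALID_HINTS as readonly string[]).includes(prefix) ? (prefix as LocationHint) : undefined;
}
```

In the Worker, the stub would then be obtained with env.ROOMS.get(env.ROOMS.idFromName(key), { locationHint: hintFromRoomKey(key) }), where ROOMS is a hypothetical binding name. Unprefixed keys still map to the same objects as before, so existing rooms keep working.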

Wrapping Up — One Concrete Next Step

Thank you for reading this far. Durable Objects ask for some upfront learning, but once they click, they become a quietly powerful tool for shipping real-time features without surrendering your margins.

If you take only one action today, make it this: add cursor presence to one of your existing Rork apps. Skip persistence, skip drawing logic. Just two devices showing each other's cursors in real time. The first time it works on a real phone, you will feel the same "I really did run my own real-time backend" lift I did, and from there you can build out whatever collaborative feature you are dreaming about.

I hope this guide nudges your next launch a little closer. If you want to keep reading along these lines, "Build a REST API with Rork, Hono, and Cloudflare Workers" and "Build a Type-Safe Backend with Rork, tRPC, and Cloudflare Workers" are good next stops in the same Cloudflare-on-Rork direction.

A final word from one indie developer to another: the work of stitching real-time features into your app does not have to mean accepting a bill that scales faster than your revenue. Cloudflare's primitives reward patient design with very low operating costs, and Rork makes the front-end side of the loop dramatically faster to iterate on. Used together, they let a one-person team ship the kind of multi-user features that used to require an entire backend team. That feels like a quietly enormous shift, and I hope this guide gives you a head start in taking advantage of it.

Thank You for Reading

Rork Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.