RORK LAB
Dev Tools / 2026-04-28 / Intermediate

Building AI Backends for Rork Mobile Apps: When to Use Cloudflare Containers vs Workers AI

A practical guide to choosing between Cloudflare Containers and Workers AI when building AI features for Rork-generated mobile apps, with cost numbers and code examples.

Tags: Cloudflare Containers, Workers AI, Rork, AI Backend, Mobile App, GPU Inference, Edge AI

When you build something with Rork and decide you want a real AI feature (image recognition, custom inference, on-the-fly summarization), you quickly have to decide where the inference actually runs. Workers AI is convenient but limited to a curated catalog. Spinning up a GPU instance on AWS or GCP is overkill for a solo developer. For a long time, the middle ground was missing. Cloudflare Containers, which became publicly available in 2025, finally fills it.

This article walks through how I decide between Cloudflare Containers and Workers AI when shipping an AI backend behind a Rork-generated React Native app. I'll share the actual architecture and cost numbers from one of my production apps.

Always Try Workers AI First

Before reaching for Containers, ask whether Workers AI can do the job. If a model in the Workers AI catalog covers your use case, it's almost always cheaper, faster to ship, and easier to operate.

The catalog covers the common ground:

  • Conversational and summarization models (Llama, Mistral, Qwen families)
  • Image generation (Flux, Stable Diffusion XL Lightning)
  • Embeddings (BGE, E5)
  • Speech-to-text (small Whisper variants)

For example, if your app summarizes a user's diary entry into three lines, or classifies the sentiment of a product review, there's no reason to involve Containers. Just hit Workers AI:

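// src/summarize.ts (Cloudflare Worker that the Rork app calls; the AI binding is declared in wrangler.toml)
// Minimal local type for the binding; projects using @cloudflare/workers-types can use `Ai` instead.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ response?: string }> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") return new Response("Method Not Allowed", { status: 405 });
    const { text } = (await request.json()) as { text: string };

    // Bindings arrive on the handler's `env` argument, not on the Request object.
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "Summarize the following text in three lines." },
        { role: "user", content: text },
      ],
    });

    return Response.json({ summary: result.response });
  },
};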

This is a single fetch from the mobile side. Cold starts are negligible, the free tier is generous, and for most Rork-driven apps it's all you need.
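The sentiment case mentioned above follows the same pattern. The sketch below assumes the same model and `AI` binding; `AiBinding`, `parseSentiment`, and `classifySentiment` are hypothetical names, not part of any Cloudflare SDK. Because a chat model returns free text, the output is normalized onto a fixed label set before it reaches the app.

```typescript
// Minimal local type for the Workers AI binding (assumption: only the
// chat-style call used in this article is modeled here).
interface AiBinding {
  run(
    model: string,
    input: { messages: { role: string; content: string }[] }
  ): Promise<{ response?: string }>;
}

type Sentiment = "positive" | "negative" | "neutral";

// Hypothetical helper: map free-form model output onto a fixed label set.
function parseSentiment(raw: string | undefined): Sentiment {
  const text = (raw ?? "").trim().toLowerCase();
  if (text.startsWith("positive")) return "positive";
  if (text.startsWith("negative")) return "negative";
  return "neutral"; // anything unexpected falls back to neutral
}

async function classifySentiment(ai: AiBinding, review: string): Promise<Sentiment> {
  const result = await ai.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content:
          "Classify the sentiment of the review. Answer with one word: positive, negative, or neutral.",
      },
      { role: "user", content: review },
    ],
  });
  return parseSentiment(result.response);
}
```

Falling back to "neutral" keeps the UI predictable when the model adds extra words or returns nothing.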

So when does Containers actually earn its place?

Three Situations Where Containers Wins

Across my own builds, I've reached for Containers in exactly three situations.

The first is custom or fine-tuned models. If you've fine-tuned a YOLOv8 for product detection, or converted a CoreML model to ONNX for cross-platform use, Workers AI simply can't host it. Containers is the only option without leaving the Cloudflare stack.

The second is heavy pre- or post-processing. Image resizing and normalization, table extraction from PDFs, and multi-image compositing can exceed Workers' CPU time limits (10 ms per request on the free plan; 30 seconds by default on paid plans, configurable up to 5 minutes) under realistic loads. Containers gives you a full Linux environment where a single request is not bound by those CPU-time caps.

The third is long-running jobs. Transcribing a 10-minute video, parsing a 100-page PDF, batch-loading vectors into a database — anything in the tens-of-seconds-to-minutes range belongs in Containers, ideally fronted by Cloudflare Queues for retry behavior.

If your workload doesn't match one of these, skip Containers. Adopting it "because it sounds powerful" leaves you with cold starts and minimum spend that don't earn their keep.
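The three criteria above can be written down as a small checklist function. This is only a sketch of the reasoning: `Workload`, `chooseBackend`, and both thresholds are illustrative assumptions, and real cutoffs depend on your plan and traffic.

```typescript
type Backend = "workers-ai" | "containers";

interface Workload {
  modelInCatalog: boolean;      // does a Workers AI catalog model cover the use case?
  cpuSecondsPerRequest: number; // rough CPU time for pre/post-processing
  wallSecondsPerJob: number;    // end-to-end duration of one job
}

// Illustrative thresholds (assumptions, not Cloudflare constants).
const CPU_BUDGET_SECONDS = 30; // the paid-plan CPU limit cited above
const LONG_JOB_SECONDS = 10;   // "tens of seconds" and up counts as long-running

function chooseBackend(w: Workload): { backend: Backend; reasons: string[] } {
  const reasons: string[] = [];
  if (!w.modelInCatalog) reasons.push("custom or fine-tuned model");
  if (w.cpuSecondsPerRequest >= CPU_BUDGET_SECONDS) reasons.push("heavy pre/post-processing");
  if (w.wallSecondsPerJob >= LONG_JOB_SECONDS) reasons.push("long-running job");
  // No reason to leave Workers AI unless at least one criterion applies.
  return { backend: reasons.length > 0 ? "containers" : "workers-ai", reasons };
}
```

A diary summarizer (`modelInCatalog: true`, small CPU and wall time) stays on Workers AI, while a fine-tuned YOLOv8 detector lands on Containers with the first reason attached.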

Minimal Architecture: Rork App → Worker → Container

Here's the shape I use whenever a Rork app needs to call into a Container:

[Rork App] → [Cloudflare Worker (auth, billing, queue)] → [Container (inference)]

The Worker sits in front for three reasons: it verifies the auth token, it runs billing and quota checks (Stripe subscription status, free-tier limits), and it gives you the option of making the job asynchronous via Queues. Requests reach a Container through a Worker binding anyway, so the real question is how much logic that Worker carries. In production, treat it as a proper proxy layer for abuse prevention rather than a bare pass-through.
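For the asynchronous option, the Worker can enqueue a job and answer immediately instead of waiting on the container. The sketch below is one possible shape, not the article's production code: `DetectJob`, `buildDetectJob`, and `enqueueDetect` are hypothetical, and `QueueLike` is a minimal stand-in for the Queues producer binding (only `send` is modeled). It assumes the uploaded image has already been stored somewhere such as R2 and is referenced by key.

```typescript
interface DetectJob {
  jobId: string;
  userId: string;
  imageKey: string;   // e.g. an R2 object key for the uploaded image
  enqueuedAt: number; // epoch milliseconds
}

// Minimal stand-in for a Cloudflare Queues producer binding such as DETECT_QUEUE.
interface QueueLike<T> {
  send(body: T): Promise<void>;
}

// In a Worker, jobId would typically come from crypto.randomUUID().
function buildDetectJob(jobId: string, userId: string, imageKey: string, now: number): DetectJob {
  return { jobId, userId, imageKey, enqueuedAt: now };
}

// Returns what the Worker could send back with HTTP 202 so the app can poll for the result.
async function enqueueDetect(
  queue: QueueLike<DetectJob>,
  job: DetectJob
): Promise<{ status: 202; jobId: string }> {
  await queue.send(job);
  return { status: 202, jobId: job.jobId };
}
```

The app then polls (or receives a push) for the result keyed by `jobId`, and the queue consumer gets retries without the mobile client holding a connection open.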

Here's a typical wrangler.toml:

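name = "rork-ai-backend"
main = "src/index.ts"
compatibility_date = "2026-04-01"

[[containers]]
name = "yolo-detector"
class_name = "YoloDetector" # must match the Container class exported from src/index.ts
image = "./containers/yolo/Dockerfile"
instance_type = "standard-2" # 2 vCPU / 4GB
max_instances = 5

# The Worker reaches the container through a Durable Object binding.
[[durable_objects.bindings]]
name = "YOLO_DETECTOR"
class_name = "YoloDetector"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["YoloDetector"]

[ai]
binding = "AI"

[[queues.producers]]
binding = "DETECT_QUEUE"
queue = "yolo-detect"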

The biggest decision here is instance_type. GPU types (gpu-*) jump the per-second price several-fold, so I strongly recommend trying CPU types first. In my case, converting the model to ONNX Runtime made CPU instances fast enough for production traffic.

The Worker entrypoint looks like this:

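// src/index.ts
import { Container } from "@cloudflare/containers";
// App-specific helpers; the module path is a placeholder for wherever they live in your project.
//   verifyAuth(request, env) resolves to a user ID, or null if the token is invalid
//   checkQuota(userId, env) resolves to true while the user is within plan limits
import { verifyAuth, checkQuota } from "./guards";

// Bindings used by this Worker.
interface Env {
  YOLO_DETECTOR: DurableObjectNamespace<YoloDetector>;
}

export class YoloDetector extends Container {
  defaultPort = 8080; // port the inference server listens on inside the container
  sleepAfter = "30s"; // scale to zero after idle
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === "/api/detect" && request.method === "POST") {
      const userId = await verifyAuth(request, env);
      if (!userId) return new Response("Unauthorized", { status: 401 });

      const allowed = await checkQuota(userId, env);
      if (!allowed) return new Response("Quota exceeded", { status: 402 });

      // One named instance serves every request; use several names to spread load.
      const id = env.YOLO_DETECTOR.idFromName("singleton");
      const container = env.YOLO_DETECTOR.get(id);
      return container.fetch(request);
    }

    return new Response("Not Found", { status: 404 });
  },
};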
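The entrypoint above calls `checkQuota` without showing it. One way it might look is a per-user daily counter, sketched below under stated assumptions: `QuotaState`, `applyQuota`, `KvLike`, the `quota:` key prefix, and the default limit of 20 are all hypothetical, and this version takes a key-value store directly rather than the full `env`. `KvLike` models only the string `get`/`put` calls that Workers KV offers.

```typescript
interface QuotaState {
  day: string;  // UTC day, e.g. "2026-04-28"
  used: number; // requests consumed on that day
}

// Pure decision step: given the stored state, decide and compute the next state.
function applyQuota(
  prev: QuotaState | null,
  limit: number,
  now: Date
): { allowed: boolean; next: QuotaState } {
  const day = now.toISOString().slice(0, 10);
  const used = prev && prev.day === day ? prev.used : 0; // counter resets on a new UTC day
  if (used >= limit) return { allowed: false, next: { day, used } };
  return { allowed: true, next: { day, used: used + 1 } };
}

// Minimal key-value interface (assumption: string values only).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

async function checkQuota(userId: string, kv: KvLike, limit = 20): Promise<boolean> {
  const key = `quota:${userId}`;
  const raw = await kv.get(key);
  const prev = raw ? (JSON.parse(raw) as QuotaState) : null;
  const { allowed, next } = applyQuota(prev, limit, new Date());
  if (allowed) await kv.put(key, JSON.stringify(next));
  return allowed;
}
```

This read-modify-write is not atomic, so concurrent requests can slip past the limit by a few calls. That is usually acceptable for a soft free-tier cap; a strict cap would need a store with atomic updates, such as a Durable Object.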

sleepAfter = "30s" is the cost-control lever. Thirty seconds after the last request, the container shuts down and billing stops. The next request triggers a cold start — about 2–3 seconds in my deployments. If your feature requires consistent latency, set sleepAfter higher or send periodic warm-up pings.
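If you go the warm-up route, it helps to ping only during the hours when traffic justifies it. The sketch below assumes a cron-triggered `scheduled()` handler calls `warmIfPeak`; the function names, the 09:00 to 21:00 UTC window, and the `/healthz` route are hypothetical, and `Pingable` models only the `fetch` call on the container stub.

```typescript
// Peak window in UTC hours; endHour is exclusive and the window may wrap past midnight.
function isPeakHour(hourUtc: number, startHour: number, endHour: number): boolean {
  if (startHour === endHour) return false; // empty window
  return startHour < endHour
    ? hourUtc >= startHour && hourUtc < endHour
    : hourUtc >= startHour || hourUtc < endHour;
}

// Minimal stand-in for the container stub returned by env.YOLO_DETECTOR.get(id).
interface Pingable {
  fetch(url: string): Promise<unknown>;
}

// Intended to be called from a Worker's scheduled() handler on a cron trigger.
async function warmIfPeak(
  container: Pingable,
  now: Date,
  startHour = 9,
  endHour = 21
): Promise<boolean> {
  if (!isPeakHour(now.getUTCHours(), startHour, endHour)) return false;
  await container.fetch("http://container/healthz"); // hypothetical health route
  return true;
}
```

Cron triggers fire at most once per minute, so pings only keep the container warm if `sleepAfter` is longer than the ping interval (for example "2m" with a one-minute cron). With `sleepAfter = "30s"` the container would still sleep between pings.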

Calling It from the Rork-Generated Code

Rork emits React Native + Expo. The fetch call carries the auth token and surfaces the quota state cleanly:

// app/(tabs)/detect.tsx
async function detectObjects(imageUri: string, token: string) {
  const formData = new FormData();
  formData.append("image", {
    uri: imageUri,
    name: "photo.jpg",
    type: "image/jpeg",
  } as any);
 
  const res = await fetch("https://rork-ai-backend.example.workers.dev/api/detect", {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: formData,
  });
 
  if (res.status === 402) throw new Error("PLAN_REQUIRED");
  if (!res.ok) throw new Error(`Detection failed: ${res.status}`);
  return res.json() as Promise<{ objects: Array<{ label: string; score: number }> }>;
}

I throw PLAN_REQUIRED as its own error so the UI can route quota-exceeded events to a paywall modal instead of a generic error toast. That handoff plugs into the subscription flow I describe in the complete monetization guide for Rork apps.
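On the UI side, that routing can be a single pure function. A minimal sketch, assuming the error messages thrown by `detectObjects` above; `UiAction` and `routeDetectError` are hypothetical names, and how the paywall modal or toast is rendered is left to the app.

```typescript
type UiAction = { kind: "paywall" } | { kind: "toast"; message: string };

// Map an error thrown by detectObjects onto the UI reaction it should trigger.
function routeDetectError(err: unknown): UiAction {
  if (err instanceof Error && err.message === "PLAN_REQUIRED") {
    return { kind: "paywall" }; // quota exceeded: open the subscription flow
  }
  const message = err instanceof Error ? err.message : "Something went wrong";
  return { kind: "toast", message }; // everything else: generic error toast
}

// Usage inside a screen component (sketch):
//   try { setObjects((await detectObjects(uri, token)).objects); }
//   catch (e) {
//     const action = routeDetectError(e);
//     if (action.kind === "paywall") openPaywall(); else showToast(action.message);
//   }
```

Keeping the mapping out of the component makes it easy to unit test and to extend later, for example with a dedicated action for 401 responses.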

What This Actually Costs

Real numbers from one of my production apps (≈200 DAU, ≈15K inferences per month):

  • Workers requests: within the free tier
  • Containers compute: ~$4.20 (standard-2, only running during peak hours)
  • R2 storage (input/output images): ~$0.30
  • Total: under $5/month

That's with no GPU — the model was converted to ONNX and runs on CPU. GPU instances will roughly multiply your container cost by 5–10x, so verify CPU isn't enough before reaching for them.

For comparison, running the same workload on an always-on AWS g4dn.xlarge would cost over $300/month. The scale-to-zero behavior of Containers is a decisive advantage for solo developers. If you want to go deeper on the backend layer itself, the Hono + Cloudflare Workers REST API guide covers the surrounding architecture.
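The arithmetic behind that comparison is short enough to write out. The sketch below uses the figures quoted above ($4.20 of container compute for roughly 15,000 inferences) plus one assumption that is not from the article: an on-demand rate of about $0.526/hour for g4dn.xlarge, which varies by region and over time. The function names are hypothetical.

```typescript
const HOURS_PER_MONTH = 730; // average month: 365 * 24 / 12

// Cost of an instance that is billed around the clock.
function alwaysOnMonthlyUsd(hourlyUsd: number): number {
  return hourlyUsd * HOURS_PER_MONTH;
}

// Effective unit cost given a monthly bill and a request count.
function costPerInferenceUsd(monthlyUsd: number, inferences: number): number {
  if (inferences <= 0) throw new RangeError("inferences must be positive");
  return monthlyUsd / inferences;
}

// Assumed g4dn.xlarge on-demand rate; check current AWS pricing for your region.
const gpuBoxMonthly = alwaysOnMonthlyUsd(0.526);       // about 383.98
const containerUnit = costPerInferenceUsd(4.2, 15000); // about 0.00028
```

At that assumed rate the always-on box costs roughly $384/month regardless of traffic, while the scale-to-zero setup works out to about $0.00028 per inference.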

Three Operational Pitfalls I Hit

Three lessons from running this in production:

First, requests that arrive during a cold start. The container boot blocks the response, and if the Worker keeps waiting longer than the mobile client's fetch timeout (often around 60 seconds in popular libraries), the client gives up and the user sees an error even though inference completes successfully on the backend. Set a Worker-side timeout that's tighter than the client's, and surface a clean retry path in the UI.
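On the client, an explicit timeout plus a bounded retry covers the cold-start case. This is a sketch under assumptions, not the article's code: `fetchWithTimeout`, `detectWithRetry`, and `retryDelayMs` are hypothetical helpers, and the 20-second client timeout is a placeholder chosen to sit above whatever budget the Worker enforces.

```typescript
// Exponential backoff with a cap; attempt is 0-based.
function retryDelayMs(attempt: number, baseMs = 1000, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Abort the request if it does not complete within timeoutMs.
async function fetchWithTimeout(url: string, init: RequestInit, timeoutMs: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

async function detectWithRetry(
  url: string,
  init: RequestInit,
  attempts = 3,
  timeoutMs = 20000
): Promise<Response> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchWithTimeout(url, init, timeoutMs);
      if (res.status < 500) return res; // 2xx and 4xx go back to the caller unchanged
      lastError = new Error(`Server error: ${res.status}`);
    } catch (err) {
      lastError = err; // timeout (abort) or network failure
    }
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs(i)));
    }
  }
  throw lastError;
}
```

Each retry is a new request to the Worker, so it may count against the user's quota. If that matters, send an idempotency key the Worker can deduplicate on.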

Second, log aggregation. Container logs are separate from console.log in your Worker by default. I added Sentry inside the container app so Worker errors and Container errors land in the same dashboard.

Third, deploy atomicity. Updating the Worker and pushing a new container image are separate operations, which means a brief window where the new Worker talks to the old container image. I version the API path (/api/v2/detect) so the old and new schemas can coexist during deploys.
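Path versioning in the Worker can stay very small. In the sketch below, `resolveDetectVersion` and `shapeResponse` are hypothetical, and the v2 difference (an extra `modelVersion` field) is invented purely to show two schemas coexisting; the v1 shape matches what `detectObjects` expects.

```typescript
type DetectVersion = "v1" | "v2";

interface DetectedObject {
  label: string;
  score: number;
}

// Unversioned path keeps serving the original schema so older app builds keep working.
function resolveDetectVersion(pathname: string): DetectVersion | null {
  if (pathname === "/api/detect" || pathname === "/api/v1/detect") return "v1";
  if (pathname === "/api/v2/detect") return "v2";
  return null;
}

// Hypothetical schema change: v2 also reports which model build produced the result.
function shapeResponse(
  version: DetectVersion,
  objects: DetectedObject[],
  modelVersion: string
): { objects: DetectedObject[]; modelVersion?: string } {
  return version === "v1" ? { objects } : { objects, modelVersion };
}
```

During a deploy, the old app build keeps calling the unversioned or v1 path and gets the shape it expects, while the new build opts into `/api/v2/detect`.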

Closing Thoughts

Cloudflare Containers fits squarely into the gap that solo developers have struggled with for years: too much for Workers AI alone, too expensive to justify a dedicated GPU box. For Rork-generated apps, the most cost-efficient approach I've found is to start everything on Workers AI, then peel off only the features that hit a real wall and move them to Containers.

Don't migrate everything at once. Try one feature (image classification is a good first candidate) and ship it with wrangler deploy, which builds and pushes the container image alongside the Worker. The moment your Rork app starts behaving like a real AI-backed service, you'll feel the shift in what's possible.
