RORK LAB
AI Models / 2026-06-13 / Advanced

Routing inference on-device first and escaping to the cloud only when it's worth it, in a Rork app

Build a tiered, fallback-based inference router in a Rork (Expo) app: cache first, then on-device, then Private Cloud Compute, then a remote API (Claude/Gemini). Includes working TypeScript covering budgets, timeouts, caching, and image routing.

Tags: Rork, On-device AI, Foundation Models, React Native, Inference router, Cost design


The week WWDC26 wrapped, I was looking again at the cloud bill for a small "generate a one-line comment" feature in an app I run as an indie developer. A few hundred yen a month sounds trivial, but it scales linearly with users, and for a free app that's a quietly heavy recurring cost.

Then Apple's State of the Union landed: developers with fewer than two million first-time downloads can use Foundation Models on Private Cloud Compute for free, and the same Swift API is moving toward image input and server-side third-party models (Claude, Gemini). For the first time, "cheap things on-device for free, expensive things on a paid path" becomes a cost design you can actually build, not just talk about.

There's a catch: Rork generates production Expo (React Native) apps, and React Native can't reach Apple's on-device model directly. What you need is a single place that decides which inference goes down which path — an inference router. This article lays out that design with code you can run.

Why hitting a single API directly falls apart in production

My first naive version called a remote API with fetch per feature. It worked. The moment it hit real usage, several problems erupted at once.

  • The feature died completely offline (open the app on the subway, get an error)
  • The same input was billed every time (even for deterministic tasks like summaries)
  • Light and heavy tasks both flowed to the same expensive model
  • A sloppy retry double-billed me when a request re-fired after a timeout

A smarter model fixes none of this. The root cause is that route selection is scattered across your app logic. Pull it into one layer — the router — and fallback, budgeting, and caching all live inside that layer.
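
As a minimal sketch of what centralizing buys you, the snippet below puts every call behind one `infer` function and joins duplicate requests that are already in flight, so a request that re-fires after a timeout does not bill twice. The names `InferenceRequest`, `Backend`, and `createRouter` are hypothetical and exist only for this illustration; they are not part of Rork, Expo, or any SDK.

```typescript
// Hypothetical names throughout: InferenceRequest, Backend, createRouter.
type InferenceRequest = {
  task: "summarize" | "classify" | "generate";
  input: string;
};

// A backend is any async function that turns a request into text
// (on-device, PCC, or a remote API). Its internals are out of scope here.
type Backend = (req: InferenceRequest) => Promise<string>;

function createRouter(backend: Backend) {
  // Requests currently in flight, keyed by task and input.
  const inFlight = new Map<string, Promise<string>>();

  return function infer(req: InferenceRequest): Promise<string> {
    const key = `${req.task}:${req.input}`;

    // A re-fired request joins the existing call instead of starting a new one.
    const pending = inFlight.get(key);
    if (pending) return pending;

    // Release the entry once the call settles, whether it succeeded or failed.
    const call = backend(req).finally(() => {
      inFlight.delete(key);
    });
    inFlight.set(key, call);
    return call;
  };
}
```

Call sites then use only `infer({ task, input })` and never see which path served them. This sketch deduplicates only while a call is still pending; reusing results after completion is the job of the cache rung.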

The fallback ladder

The structure I use is a ladder: try the cheap, fast path first and drop to the next rung when a path can't handle the task.

  1. Tier 0 — Cache: deterministic tasks (same input, same acceptable output) check a local cache first. A hit costs nothing and touches no network
  2. Tier 1 — On-device: short classification, summarization, and cleanup go to Foundation Models via a native module. Free, low latency, works offline
  3. Tier 2 — Private Cloud Compute: when on-device accuracy isn't enough but you don't want to pay a third party. Used within the free allowance
  4. Tier 3 — Remote API (Claude / Gemini): only the genuinely heavy work — multimodal with image input, or long high-quality generation
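
The Tier 0 rung can be sketched as a small in-memory cache with a time-to-live. This is an illustration under stated assumptions, not the article's implementation: `TaskKind`, `CACHEABLE`, and `ResponseCache` are hypothetical names, and which task kinds count as deterministic is an assumption you would tune per app.

```typescript
// Hypothetical names: TaskKind, CACHEABLE, ResponseCache.
type TaskKind = "summarize" | "classify" | "cleanup" | "chat";

// Assumption: only these kinds are deterministic enough to reuse an old answer.
const CACHEABLE: ReadonlySet<TaskKind> = new Set<TaskKind>([
  "summarize",
  "classify",
  "cleanup",
]);

type CacheEntry = { value: string; expiresAt: number };

class ResponseCache {
  private entries = new Map<string, CacheEntry>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Collapse whitespace so trivially different inputs share one entry.
  private key(kind: TaskKind, input: string): string {
    return `${kind}\n${input.trim().replace(/\s+/g, " ")}`;
  }

  get(kind: TaskKind, input: string): string | undefined {
    if (!CACHEABLE.has(kind)) return undefined;
    const k = this.key(kind, input);
    const hit = this.entries.get(k);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(k); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(kind: TaskKind, input: string, value: string): void {
    if (!CACHEABLE.has(kind)) return; // never store non-deterministic output
    const expiresAt = this.now() + this.ttlMs;
    this.entries.set(this.key(kind, input), { value, expiresAt });
  }
}
```

The cache is keyed on the full normalized input rather than a hash, which rules out collisions at the cost of memory. A persistent store would likely need a hashed key and a size limit, both left out of this sketch.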

The concrete steps for implementing the on-device rung as a native module are covered separately in "Using Apple FoundationModels in a Rork app". For framing the free allowance (PCC) as a three-tier cost design, see "Rebuilding Rork AI cost as three tiers with the free Foundation Models".

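The ladder rests on one assumption: each rung down is more expensive than the one above it, so the router always works from the top. Each task carries a floor ("this needs at least this rung"), which is the cheapest rung allowed to attempt it. The router starts at that floor and drops to the next, more expensive rung only when the current one can't handle the task.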
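
A minimal sketch of that walk, assuming each rung is an async handler that returns `null` or throws when it can't serve the task. `TIERS`, `Task`, `TierHandler`, and `runLadder` are hypothetical names introduced for this example. Treating the cache as an ordinary rung that a higher floor can skip is a simplification of this sketch.

```typescript
// Hypothetical names: TIERS, Tier, Task, TierHandler, runLadder.
// Ordered from cheapest to most expensive.
const TIERS = ["cache", "on-device", "pcc", "remote"] as const;
type Tier = (typeof TIERS)[number];

// `floor` is the cheapest rung allowed to attempt this task.
type Task = { input: string; floor: Tier };

// A handler resolves with text, or with null when the rung cannot serve the task
// (cache miss, model unavailable, and so on). Throwing is treated the same way.
type TierHandler = (task: Task) => Promise<string | null>;

type LadderResult = { tier: Tier; text: string };

async function runLadder(
  task: Task,
  handlers: Partial<Record<Tier, TierHandler>>,
): Promise<LadderResult> {
  const start = TIERS.indexOf(task.floor);

  // Walk from the floor toward the more expensive rungs.
  for (const tier of TIERS.slice(start)) {
    const handler = handlers[tier];
    if (!handler) continue; // rung not wired up on this device or build

    try {
      const text = await handler(task);
      if (text !== null) return { tier, text };
    } catch {
      // A failure on this rung is not fatal: fall through to the next one.
    }
  }

  throw new Error(`No rung from "${task.floor}" downward could handle the task`);
}
```

Returning the tier alongside the text lets the caller log which rung actually served each request, which is useful when checking how much traffic really reaches the paid path.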
