RORK LAB
RORK MAX — Rork Max can now build native Swift apps for iPhone, iPad, Apple Watch, Apple TV, and Vision Pro
PUBLISH — Rork Max offers two-click App Store publishing with no Xcode required, cutting the friction of getting an app shipped
EXPO — The standard Rork is built on React Native (Expo), generating native iOS and Android apps from plain-English descriptions
PRICING — Rork is free to start, with paid plans beginning at $25/month, an accessible tier for solo developers
FUNDING — Rork raised $2.8M from a16z (Andreessen Horowitz) as investment keeps flowing into AI app builders
REVIEW — In real use the keys are generated-code readability and maintainability, Expo-related constraints, and how easily billing, push, and ad SDKs slot in
AI Models · 2026-06-14 · Advanced

On-Device Image Tagging in Rork Max Swift Apps with Foundation Models Image Input

At WWDC26, Apple gave the on-device Foundation Models model image input. Here is how to add image tagging and captioning to a Rork Max Swift app entirely on-device, covering the availability gate, structured output, and Vision interop.

Rork Max · Foundation Models · Image Input · On-Device AI · SwiftUI


When you build a feature that takes a single photo and answers "what is this?" or "what tags belong on it?", the default for years has been to ship the image off to a cloud multimodal API. In the wallpaper app I run as an indie developer, I have repeatedly wanted to auto-assign categories and keywords to newly added images, and each time I ran into the same question: is it really acceptable to send a picture sitting on the user's own device up to my server or a cloud LLM?

WWDC26 changed that calculus. In iOS 27, the on-device Foundation Models model can read images: you drop a picture into the prompt next to the text and ask the model about it. Apple frames this not as a new pipeline but as "a natural extension of the existing prompt builders." Everything you learned in iOS 26, including LanguageModelSession and @Generable, keeps working unchanged; the prompt simply gained a picture.
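The "keeps working unchanged" part can be shown with the iOS 26 API alone. The sketch below defines a hypothetical `WallpaperTags` schema with `@Generable` and `@Guide`, then asks a `LanguageModelSession` for it from a text description. It deliberately stops short of the iOS 27 image attachment: per Apple's framing the image goes into this same prompt, but the exact prompt-builder call is not confirmed here, so it is left out rather than guessed. `generateTags(for:)` and `normalizeTags(_:)` are illustrative names, not part of any framework.

```swift
import Foundation
import FoundationModels

// Hypothetical output shape for a wallpaper tagger; field names are illustrative.
@Generable
struct WallpaperTags {
    @Guide(description: "Three to six short keywords describing the image")
    var tags: [String]

    @Guide(description: "One category, for example nature, city, abstract, or animals")
    var category: String

    @Guide(description: "A one-line caption")
    var caption: String
}

// iOS 26 text-only request. In iOS 27 the image would be added to this same prompt;
// that call is omitted because its exact API is not confirmed here.
func generateTags(for description: String) async throws -> WallpaperTags {
    let session = LanguageModelSession(
        instructions: "You tag phone wallpapers for a gallery app. Keep answers short."
    )
    let response = try await session.respond(
        to: "Tag this wallpaper: \(description)",
        generating: WallpaperTags.self
    )
    return response.content
}

// Model output can repeat tags or vary in case, so normalize before saving.
// Keeps the model's order, drops empties and duplicates.
func normalizeTags(_ raw: [String]) -> [String] {
    var seen = Set<String>()
    return raw
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() }
        .filter { !$0.isEmpty && seen.insert($0).inserted }
}
```

Keeping the schema as a plain `@Generable` struct means the same type should serve unchanged once the image is attached, which is the point Apple's "natural extension" framing makes.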

Using a Rork Max Swift app as the base, this walkthrough assembles an image tagging and captioning feature that runs entirely on-device, from availability checks to Vision interop. Because Rork Max generates native Swift rather than React Native, it fits cleanly with Apple-native frameworks like Foundation Models.

Why "keep the image off the cloud" pays off

Hand image understanding to a cloud LLM and three costs arrive at once: money (per-image inference billing), latency (the network round trip), and a privacy story you have to explain. For apps that handle personal images stored on the device, such as wallpapers, health data, or photos, the third cost weighs the most.

The on-device Foundation Models model lightens all three. Inference stays on the device, so there is no per-image bill, and image understanding adds effectively zero marginal cost even as an indie app grows to a couple million downloads. With no network round trip the feature also works offline, and because the image never leaves the device, the privacy explanation stays short.

It is not a free lunch, though. The on-device model's context window is 4K tokens, versus 32K for the Private Cloud Compute (PCC) server model, and an image consumes part of that budget. Apple says it plainly: larger images consume more tokens and add latency. The design starting point is "measure on-device first, escalate to the server only when you have to."
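To make "measure first, escalate only when you have to" concrete, here is a small routing sketch. It assumes "4K" and "32K" mean 4,096 and 32,768 tokens, and it treats the per-image token cost as a number you measure yourself for your own image sizes, since no fixed figure is given. `InferenceRoute`, `chooseRoute`, the `splitBatch` case, and the 512-token output reserve are all hypothetical. The function only decides where a request should go; it does not call any PCC API.

```swift
// Assumed budgets: "4K" and "32K" read as 4,096 and 32,768 tokens.
let onDeviceContextTokens = 4_096
let privateCloudContextTokens = 32_768

enum InferenceRoute: Equatable {
    case onDevice
    case privateCloudCompute
    case splitBatch  // too large even for PCC: split into smaller requests
}

/// Picks the cheapest route whose context window fits the whole request.
/// `tokensPerImage` is a hypothetical measured estimate, not an Apple-published value.
func chooseRoute(
    promptTokens: Int,
    imageCount: Int,
    tokensPerImage: Int,
    reservedForOutput: Int = 512
) -> InferenceRoute {
    let needed = promptTokens + imageCount * tokensPerImage + reservedForOutput
    if needed <= onDeviceContextTokens { return .onDevice }
    if needed <= privateCloudContextTokens { return .privateCloudCompute }
    return .splitBatch
}
```

With a measured cost of 1,000 tokens per image, a 200-token prompt plus one image needs 1,712 tokens and stays on-device, while six images need 6,712 and would escalate to PCC. Reserving room for the output matters because the generated tags and caption share the same window.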

The shape of the feature: three layers

It helps to think of the implementation in three layers.

  1. Availability gate: confirm the device can run Apple Intelligence (the on-device model) and hide or fall back when it cannot.
  2. Structured tag generation: attach the image to the prompt and receive a fixed shape — tag array, category, one-line caption — via @Generable.
  3. Vision interop and staged escalation: let Vision handle fast, fixed tasks, let the on-device Foundation Models model write the descriptive language, and send only long or multi-image batches to PCC.

Let's build them in order.
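As a starting point for the three layers, here is a minimal sketch that assumes the iOS 26 `SystemLanguageModel.default.availability` API carries over to iOS 27 unchanged, and uses Vision's `VNClassifyImageRequest` as the fast classifier. Rather than wiring Vision in through Foundation Models tool calling, it simply passes Vision's labels into the prompt as text, so the caption step is a text-only stand-in for layer 2. `TaggingMode`, `topLabels`, `visionLabels`, `makeCaption`, `tagWallpaper`, and the 0.3 / 5 thresholds are hypothetical; the iOS 27 call that attaches the image itself is not shown because its exact API is not confirmed here.

```swift
import CoreGraphics
import FoundationModels
import Vision

// Layer 1: availability gate. TaggingMode is a hypothetical app-side type.
enum TaggingMode: Equatable {
    case onDevice
    case fallback(reason: String)
}

func currentTaggingMode() -> TaggingMode {
    switch SystemLanguageModel.default.availability {
    case .available:
        return .onDevice
    case .unavailable(.deviceNotEligible):
        return .fallback(reason: "This device does not support Apple Intelligence.")
    case .unavailable(.appleIntelligenceNotEnabled):
        return .fallback(reason: "Turn on Apple Intelligence in Settings to get captions.")
    case .unavailable(.modelNotReady):
        return .fallback(reason: "The on-device model is still downloading.")
    case .unavailable(let other):
        return .fallback(reason: "On-device model unavailable: \(other)")
    }
}

// Layer 3a: pure ranking helper, kept separate from Vision so it is easy to test.
func topLabels(
    _ candidates: [(identifier: String, confidence: Float)],
    minConfidence: Float = 0.3,
    limit: Int = 5
) -> [String] {
    candidates
        .filter { $0.confidence >= minConfidence }
        .sorted { $0.confidence > $1.confidence }
        .prefix(limit)
        .map { $0.identifier }
}

// Layer 3b: Vision does the fast, fixed classification; it needs no Apple Intelligence.
func visionLabels(for image: CGImage) throws -> [String] {
    let request = VNClassifyImageRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    let candidates = (request.results ?? []).map { observation in
        (identifier: observation.identifier, confidence: observation.confidence)
    }
    return topLabels(candidates)
}

// Layer 2 (text-only stand-in): the on-device model turns labels into a caption.
func makeCaption(from labels: [String]) async throws -> String {
    let session = LanguageModelSession(
        instructions: "Write one short, friendly caption for a phone wallpaper."
    )
    let response = try await session.respond(
        to: "Detected labels: \(labels.joined(separator: ", ")). Write a one-line caption."
    )
    return response.content
}

// Orchestration: Vision labels always ship; the caption appears only when the model is available.
func tagWallpaper(_ image: CGImage) async throws -> (labels: [String], caption: String?) {
    let labels = try visionLabels(for: image)
    guard currentTaggingMode() == .onDevice else {
        return (labels, nil)
    }
    let caption = try await makeCaption(from: labels)
    return (labels, caption)
}
```

The design choice here is that the feature degrades rather than disappears: on devices without Apple Intelligence, users still get Vision's keyword labels, and the gate's `reason` string can drive the UI hint.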
