Rork Lab
App Dev · 2026-06-28 · Advanced

Design On-Device Core ML So Cold Start and Heat Don't Break It

Put on-device Core ML in the native Swift that Rork Max generates and you hit two walls before accuracy: the first inference is slow, and the device heats up and slows down. Here is a design built around cold start and a thermal budget, with working Swift.

Tags: Rork Max, Core ML, Swift, on-device AI, indie developer

Because Rork Max generates native Swift apps, on-device Core ML inference, long awkward to reach from React Native, is now practical even for indie developers. Put it on a real device, though, and you stumble before you ever get to accuracy. The first inference is noticeably slow. After a while the device warms up and everything feels sluggish. Both are design problems about when and how much you run inference, not about whether the model is good.

As an indie developer at Dolice, when I built on-device inference into an app I run, I struggled with a roughly one-second freeze on the very first call. The cause was not model accuracy; it was running the first load and inference on the main thread right at launch.

This article lays out how to design Core ML around two constraints — cold start and a thermal budget — with Swift code.

Why the first inference is slow

A Core ML model runs two heavy operations the first time you use it. One is loading and compiling the model (optimized for the device's Neural Engine); the other is the first inference, which allocates internal buffers. It is normal for the first call to be an order of magnitude slower than later ones.

| Stage | What mainly happens | Felt impact |
| --- | --- | --- |
| First load | Model compile and placement | Hundreds of ms to 1 s |
| First inference | Buffer allocation, warmup | Tens to hundreds of ms |
| Subsequent | Run on the allocated path | Often around 10-30 ms |

The problem is throwing that heavy first call at the moment the user is waiting for a result. The design goal is simple: move the heavy first call earlier, to a time when the user is not waiting.
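
To see the cold/warm gap on your own model, it helps to time the first call and a later call separately. The following is a minimal sketch of such a harness. `FakeModel` and `measureMs` are hypothetical names, and the fake model only simulates a one-time setup cost rather than calling Core ML, so the numbers it prints say nothing about real inference times.

```swift
import Foundation

/// Stand-in for a model: the first call pays a one-time setup cost, later calls reuse it.
/// This is NOT Core ML; it only mimics the cold/warm shape so the harness is runnable.
final class FakeModel {
    private var buffers: [Double]? = nil

    func predict(_ x: Double) -> Double {
        if buffers == nil {
            // One-time "compile and allocate" step, simulated with a large allocation
            buffers = (0..<2_000_000).map { Double($0).squareRoot() }
        }
        return x * 2
    }
}

/// Returns how long `body` takes to run, in milliseconds.
func measureMs(_ body: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000
}

let fakeModel = FakeModel()
let coldMs = measureMs { _ = fakeModel.predict(1.0) }   // first call: pays the setup cost
let warmMs = measureMs { _ = fakeModel.predict(1.0) }   // later call: setup already done
print("cold: \(coldMs) ms, warm: \(warmMs) ms")
```

With a real model you would wrap the load and the first `prediction` call in the same helper and compare them against a later call, measured on the device you ship to.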

Pull model load and warmup out of the launch flow

First, do not load the model synchronously at app launch. Defer the load until it is needed, and run the load and warmup (one inference on dummy input) off the main thread.

import CoreML

// Error thrown when the model could not be loaded
enum InferenceError: Error {
    case unavailable
}

actor InferenceEngine {
    private var model: MyModel?

    // Warm up in the background; call after the first screen appears, not at launch
    func warmUp() async {
        guard model == nil else { return }
        let config = MLModelConfiguration()
        config.computeUnits = .all        // let it use the Neural Engine too
        do {
            let loaded = try MyModel(configuration: config)
            // Run the first inference on dummy input to allocate buffers.
            // `.dummy` is assumed to be a static MyModelInput you define
            // yourself with correctly shaped placeholder data.
            _ = try? loaded.prediction(input: .dummy)
            model = loaded
        } catch {
            model = nil                   // do not block launch even on failure
        }
    }

    func predict(_ input: MyModelInput) async throws -> MyModelOutput {
        if model == nil { await warmUp() }   // lazy fallback if warmUp() was never called
        guard let model else { throw InferenceError.unavailable }
        return try model.prediction(input: input)
    }
}

The caller invokes warmUp() in the brief gap after the first screen renders and before the user starts interacting.

.task {
    // warm up during idle time after the screen shows
    await engine.warmUp()
}

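The actor is what pays off here. Even if predict is called from several places at once, the actor runs those calls one at a time, so the load does not run twice. This holds because warmUp() contains no await between the nil check and the assignment; add a suspension point there and a second caller could slip past the guard. Early on I forgot this protection and loaded the model again on every screen transition, needlessly bloating memory. Concurrent access is a quiet pitfall in on-device inference.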
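
The single-load behavior can be checked without Core ML. The sketch below is an illustration under assumptions: `DemoEngine` and `runConcurrentPredictions` are hypothetical names, a string stands in for the loaded model, and a counter records how many times the "load" ran while several tasks call `predict` concurrently.

```swift
import Foundation

/// Hypothetical stand-in for the inference engine; no real model is loaded.
actor DemoEngine {
    private var model: String?            // stand-in for the loaded model
    private(set) var loadCount = 0        // how many times the "load" ran

    func warmUp() {
        guard model == nil else { return }
        loadCount += 1                    // the expensive load would happen here
        model = "loaded"
    }

    func predict(_ input: Int) -> Int {
        if model == nil { warmUp() }
        return input * 2
    }
}

/// Calls `predict` from `callers` concurrent child tasks and returns the sorted results.
func runConcurrentPredictions(engine: DemoEngine, callers: Int) async -> [Int] {
    await withTaskGroup(of: Int.self, returning: [Int].self) { group in
        for i in 0..<callers {
            group.addTask { await engine.predict(i) }
        }
        var results: [Int] = []
        for await value in group { results.append(value) }
        return results.sorted()
    }
}

let demoEngine = DemoEngine()
let demoResults = await runConcurrentPredictions(engine: demoEngine, callers: 8)
let demoLoads = await demoEngine.loadCount
print("results: \(demoResults), loads: \(demoLoads)")
```

Because `warmUp()` here is synchronous, the counter ends at 1 regardless of how the eight tasks are scheduled.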

WHAT YOU'LL LEARN
What cold start really is (first compile, first inference) and pulling it out of the launch flow with an actor
Gating that reads ProcessInfo.thermalState and steps inference down across full / reduced / suspended
Releasing the model on memory warnings and backgrounding, then warming it up again on return
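
As a rough illustration of the thermal gating named in the list above, the following sketch maps `ProcessInfo.thermalState` to three steps. The `InferenceMode` type, the `inferenceMode(for:)` function, and the choice of which thermal state triggers which step are all assumptions made for this example, not the article's own implementation.

```swift
import Foundation

/// Hypothetical three-step mode, named after the full / reduced / suspended steps.
enum InferenceMode: Equatable {
    case full, reduced, suspended
}

/// Maps a thermal state to a mode. The thresholds are an assumption for illustration.
func inferenceMode(for state: ProcessInfo.ThermalState) -> InferenceMode {
    switch state {
    case .nominal, .fair: return .full
    case .serious:        return .reduced
    case .critical:       return .suspended
    @unknown default:     return .reduced   // be conservative on states added later
    }
}

let currentMode = inferenceMode(for: ProcessInfo.processInfo.thermalState)
print("current mode: \(currentMode)")
```

A caller could check the mode before each inference and, for example, lower the inference rate in `reduced` and skip inference entirely in `suspended`; what each step means in practice depends on the app.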