AI Models / 2026-07-01 / Advanced

Long-Form On-Device Transcription with SpeechAnalyzer in Rork Max's Native Swift

Implementation notes on rebuilding long-form, offline transcription with iOS 26's SpeechAnalyzer and SpeechTranscriber after hitting the walls of SFSpeechRecognizer. Covers model asset downloads, feeding audio through an AsyncStream, drawing volatile vs. final results, and the boundary design for Rork Max native code and bridging from Expo — with the pitfalls I actually hit.


While tinkering with a voice-journaling app as an indie developer, I found that transcription of anything longer than a minute just wouldn't hold up. The SFSpeechRecognizer I was using at the time effectively cut off around the one-minute mark even with on-device recognition, so a long, rambling monologue would stop returning results partway through. Sending audio to a server wasn't a happy alternative either: for a journaling app, privacy matters, and a weak signal means waiting. I couldn't compromise on any of the three requirements (long-form, offline, and private), so the feature sat unfinished for a while.

iOS 26's SpeechAnalyzer solves all three head-on. It's a newly designed API that transcribes long-form audio entirely on device, without sending anything to the cloud. And because Rork Max now generates native Swift, this API is something you can realistically wire into a small indie app. Here is what I learned porting my journaling app over to it, step by step.

What changed from SFSpeechRecognizer

First, let's line up the difference in character between the two. Getting this wrong tends to leave you with a port that "works but is slow."

Aspect           | SFSpeechRecognizer                        | SpeechAnalyzer (iOS 26)
Intended length  | Short utterances and commands             | Long form: meetings, dictation
Where it runs    | On-device possible, but constraints remain | Fully on device once assets are installed
API shape        | Delegate plus request                     | Swift Concurrency (AsyncSequence)
Model management | Opaque, left to the OS                    | Explicit download and management of language assets
Composition      | Mostly monolithic                         | Modules attached to an analysis session

The key point is that SpeechAnalyzer is assembled by "attaching purpose-built modules to an analysis session." For transcription you attach SpeechTranscriber; for voice-activity detection you attach SpeechDetector. A module only processes audio from the point it was attached, so a design that adds capability mid-session reads cleanly.
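
To make the "API shape" row of the table concrete, here is a minimal sketch of the producer/consumer pattern that Swift Concurrency encourages: one side yields chunks into an AsyncStream, the other iterates with for await. It deliberately does not touch the Speech framework. AudioChunk, consume, and runDemo are hypothetical names used only for illustration, and a real app would yield captured audio rather than integers.

```swift
import Foundation

/// Hypothetical stand-in for a chunk of captured audio (not a Speech framework type).
struct AudioChunk: Sendable {
    let index: Int
}

/// Consumes chunks in arrival order and returns one label per chunk.
/// In a real app, this loop is where incoming work would be handled.
func consume(_ chunks: AsyncStream<AudioChunk>) async -> [String] {
    var seen: [String] = []
    for await chunk in chunks {
        seen.append("chunk \(chunk.index)")
    }
    return seen
}

/// Builds a stream, pushes `count` chunks into it, then closes it.
/// The default buffering policy is unbounded, so yielding before
/// the consumer starts does not drop anything.
func runDemo(count: Int) async -> [String] {
    let (stream, continuation) = AsyncStream.makeStream(of: AudioChunk.self)
    for i in 0..<count {
        continuation.yield(AudioChunk(index: i))
    }
    continuation.finish() // ends the `for await` loop in `consume`
    return await consume(stream)
}
```

Compared with a delegate, the consumer reads top to bottom as a single loop, and finishing the continuation is the one explicit signal that input has ended.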

Note that supported platforms are iOS 26, iPadOS 26, macOS 26, visionOS 26, and tvOS 26; the current SDK does not support watchOS. Deciding up front on a split — record on Apple Watch, analyze on the phone — saves trouble later.

Check whether the model asset exists, and fetch it if not

SpeechTranscriber uses a per-language model asset. That asset isn't guaranteed to ship on the device, so before running you check "is this language usable" and "is the asset present," and download it if not. Skip this and you get a bug that silently fails only on first launch — the hardest kind to reproduce. I skipped the check at first and spent half a day puzzled by empty results that appeared only on a fresh device.

Here's a minimal flow that checks whether a locale is supported and waits for the download if the asset isn't installed.

import Speech
 
/// Prepare the assets for the locale used for transcription.
/// The returned Bool means "may we start transcribing in this locale."
func ensureTranscriberAssets(for locale: Locale) async throws -> Bool {
    // 1. Is this locale even a SpeechTranscriber target?
    let supported = await SpeechTranscriber.supportedLocales
    guard supported.contains(where: { $0.identifier(.bcp47) == locale.identifier(.bcp47) }) else {
        return false
    }
 
    // 2. Is the asset installed on the device?
    let installed = await SpeechTranscriber.installedLocales
    if installed.contains(where: { $0.identifier(.bcp47) == locale.identifier(.bcp47) }) {
        return true
    }
 
    // 3. If not, request the download of the needed assets and wait
    let transcriber = SpeechTranscriber(locale: locale, preset: .progressiveLiveTranscription)
    if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
        try await request.downloadAndInstall()
    }
    return true
}

Pick a preset that matches your use case. For live, incremental display, a setting that actively returns volatile results, such as progressiveLiveTranscription, fits well. For processing a finished recording in bulk, a preset that returns only final results avoids flicker on screen.
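
One way to think about volatile versus final results on the display side is sketched below. It assumes each final result supersedes the volatile guesses that preceded it, and it uses plain strings instead of the framework's result type. TranscriptUpdate and TranscriptBuffer are hypothetical names for illustration, not part of the Speech API.

```swift
import Foundation

/// Hypothetical update type: a piece of text plus whether it is final.
struct TranscriptUpdate {
    let text: String
    let isFinal: Bool
}

/// Keeps committed (final) text separate from the latest volatile guess.
struct TranscriptBuffer {
    private(set) var finalized = ""
    private(set) var volatile = ""

    /// Final text is appended once and clears the current guess;
    /// volatile text replaces the previous guess instead of accumulating.
    mutating func apply(_ update: TranscriptUpdate) {
        if update.isFinal {
            finalized += update.text
            volatile = ""
        } else {
            volatile = update.text
        }
    }

    /// What the UI would draw: stable text followed by the current guess.
    var displayText: String { finalized + volatile }
}
```

Keeping the two strings apart lets the UI style the volatile tail differently (for example, dimmed) and means a bulk-processing path can simply ignore everything that is not final.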

Because the download uses data, it's considerate to give users a say in when it happens: prefetch on Wi-Fi, or let them explicitly turn on "speech recognition for English" in settings. In my case I run the download at the end of first-run onboarding and show a progress bar, which made the wait feel less uncertain.
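
The download policy described above can be written down as a small pure function, which keeps it easy to test separately from the Speech calls. This is only a sketch under my own assumptions: DownloadContext, AssetAction, and decideAssetAction are hypothetical names, and the app is assumed to determine the Wi-Fi state and the user's setting by itself.

```swift
import Foundation

/// Hypothetical inputs the app gathers on its own before deciding.
struct DownloadContext {
    let assetInstalled: Bool // e.g. the result of an installed-locale check
    let onWiFi: Bool         // current network state, determined by the app
    let userOptedIn: Bool    // the user enabled the language in settings
}

/// What the app should do about the language asset right now.
enum AssetAction: Equatable {
    case nothingToDo
    case downloadNow
    case askUser
}

/// Prefetch on Wi-Fi, honor an explicit opt-in on any network,
/// and otherwise ask before spending the user's data.
func decideAssetAction(_ ctx: DownloadContext) -> AssetAction {
    if ctx.assetInstalled { return .nothingToDo }
    if ctx.userOptedIn || ctx.onWiFi { return .downloadNow }
    return .askUser
}
```

With this split, the onboarding screen only reacts to the returned action, for example by starting the download and showing the progress bar when it is downloadNow.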
