Once, in a café I'd wandered into with a friend, a song played that felt familiar but neither of us could name, and we sat there stumped for a while. It would be fun, I thought, to have that "hold up your phone and instantly know" experience as a feature of one of my own small apps. I've built a handful of utility apps as an indie developer, and the feel of "point it and it just knows" carries a pleasure that needs no explanation.
The doorway to that is ShazamKit. It's Apple's music-recognition machinery you can embed in an app, matching a song from the ambient sound or from your app's own playback. It used to require you to manage the recording yourself, but with iOS 17's SHManagedSession you can hand most of that off. And because Rork Max now generates native Swift, this API is something you can wire into an indie app cleanly. Here is what I learned building a small "point it and it knows" app, step by step.
Hand-rolled classic, or the managed session?
ShazamKit offers two broad ways to build. Choosing this first settles the rest of your design.
| Aspect | SHSession (hand-rolled) | SHManagedSession (iOS 17+) |
|---|---|---|
| Recording management | You build AVAudioEngine yourself | Left to the framework |
| Mic permission | You request and check it | The session takes care of it |
| Receiving results | Delegate | Async result sequence (`results`) |
| State visibility | Manage it yourself | Observe idle / prerecording / matching |
| Best for | Fine control, e.g. matching against playback | The classic "point and recognize ambient song" |
For the classic use — hold up the phone and identify the song playing around you — I reach for SHManagedSession without hesitation. It carries the recording and mic-permission burden for you, so the amount you write visibly shrinks. I hand-rolled AVAudioEngine at first, meaning to learn, but permissions and buffer wrangling ate more time than expected; switching to SHManagedSession left only the essence.
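
For contrast, here is roughly what the hand-rolled column of the table involves. This is a minimal sketch, not the code from my app: `HandRolledRecognizer` and `onResult` are hypothetical names, and it assumes microphone permission has already been granted and that, on iOS, the audio session was configured for recording elsewhere.

```swift
import AVFAudio
import ShazamKit

/// Hypothetical helper showing the hand-rolled path:
/// your own engine, your own tap, and a delegate for results.
final class HandRolledRecognizer: NSObject, SHSessionDelegate {
    private let engine = AVAudioEngine()
    private let session = SHSession()

    /// Called with the matched title, or nil when nothing matched.
    /// May be invoked off the main thread.
    var onResult: ((String?) -> Void)?

    override init() {
        super.init()
        session.delegate = self
    }

    /// Assumes mic permission and audio session setup were handled by the caller.
    func start() throws {
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)
        // Feed every captured buffer to ShazamKit for streaming matching.
        input.installTap(onBus: 0, bufferSize: 2048, format: format) { [weak self] buffer, time in
            self?.session.matchStreamingBuffer(buffer, at: time)
        }
        engine.prepare()
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }

    // MARK: SHSessionDelegate

    func session(_ session: SHSession, didFind match: SHMatch) {
        onResult?(match.mediaItems.first?.title)
    }

    func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?) {
        onResult?(nil)
    }
}
```

Even in this trimmed form, the permission request, the audio session setup, and the hop back to the main thread are all still left to you, which is the part SHManagedSession takes over.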
Map the state onto SwiftUI
SHManagedSession conforms to Observable, so SwiftUI picks up its state changes directly. There are three states:
- idle: waiting, neither recording nor matching
- prerecording: prepared for matching, prerecording ahead of time
- matching: actively attempting a match
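
As a rough sketch of how those three states could drive a label, assuming you read `session.state` directly from a view: `statusLabel(for:)` and `StatusLabel` are hypothetical names, and the wording of each label is only illustrative.

```swift
import SwiftUI
import ShazamKit

/// Hypothetical helper: one label per session state.
func statusLabel(for state: SHManagedSession.State) -> String {
    switch state {
    case .idle: return "Tap to recognize a song"
    case .prerecording: return "Getting ready…"
    case .matching: return "Listening…"
    @unknown default: return "Working…"
    }
}

struct StatusLabel: View {
    // Assumes the session is created and owned elsewhere and passed in.
    let session: SHManagedSession

    var body: some View {
        // Reading `state` inside `body` lets SwiftUI's observation tracking
        // re-render the text when the session moves between states.
        Text(statusLabel(for: session.state))
            .foregroundStyle(.secondary)
    }
}
```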
Reflecting these three straight onto the button's look tells the user "what it's doing right now." Here's a minimal screen that changes its display by state.

```swift
import SwiftUI
import ShazamKit

@MainActor
final class RecognizerModel: ObservableObject {
    let session = SHManagedSession()
    @Published var title: String?
    @Published var artist: String?
    @Published var isWorking = false

    /// Point and match. Results arrive from the `results` async sequence.
    func recognize() async {
        // Clear the previous result so a stale artist never sits under "No match".
        title = nil
        artist = nil
        isWorking = true
        defer {
            // Return the session to idle when we leave the loop.
            session.cancel()
            isWorking = false
        }

        // Attempt one match, stop on the first result
        for await result in session.results {
            switch result {
            case .match(let match):
                if let item = match.mediaItems.first {
                    self.title = item.title
                    self.artist = item.artist
                }
                return // Stop once we've got a hit
            case .noMatch:
                self.title = "No match"
                return
            case .error(let error, _):
                self.title = "Error: \(error.localizedDescription)"
                return
            @unknown default:
                return
            }
        }
    }
}

struct RecognizeView: View {
    @StateObject private var model = RecognizerModel()

    var body: some View {
        VStack(spacing: 24) {
            if let title = model.title {
                Text(title).font(.title2).bold()
                if let artist = model.artist {
                    Text(artist).foregroundStyle(.secondary)
                }
            } else {
                Text(model.isWorking ? "Listening…" : "Tap to recognize a song")
                    .foregroundStyle(.secondary)
            }
            Button {
                Task { await model.recognize() }
            } label: {
                Image(systemName: "shazam.logo.fill")
                    .font(.system(size: 64))
            }
            .disabled(model.isWorking)
        }
        .padding()
    }
}
```

`session.results` is an async sequence of match results. The loop returns on the first hit because one identified song is enough for this use. To keep identifying continuously, don't return: keep iterating the sequence, and a result arrives each time the playing song changes.
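
A continuous variant might look something like the sketch below. `ContinuousRecognizer`, `appendingIfNew`, and `history` are hypothetical names. Skipping consecutive duplicates is my own assumption, in case the same song reports more than once. I am also assuming that calling `cancel()` on the session and cancelling the task is enough to end the loop.

```swift
import Combine
import ShazamKit

/// Hypothetical pure helper: append a title unless it repeats the latest entry.
func appendingIfNew(_ title: String, to history: [String]) -> [String] {
    history.last == title ? history : history + [title]
}

@MainActor
final class ContinuousRecognizer: ObservableObject {
    private let session = SHManagedSession()
    private var task: Task<Void, Never>?
    @Published private(set) var history: [String] = []

    /// Keep iterating the results sequence instead of returning on the first hit.
    func start() {
        guard task == nil else { return }
        // The task holds on to this object until stop() is called.
        task = Task {
            for await result in session.results {
                if case .match(let match) = result,
                   let title = match.mediaItems.first?.title {
                    history = appendingIfNew(title, to: history)
                }
            }
        }
    }

    /// Stop matching and drop the running task.
    func stop() {
        session.cancel()
        task?.cancel()
        task = nil
    }
}
```

Keeping the de-duplication in a free function makes that piece easy to check without a microphone or a playing song.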