Why On-Device AI Changes Everything for iOS Developers
With iOS 26, Apple gave developers something genuinely exciting — direct access to the same large language model that powers Apple Intelligence. The Foundation Models framework wraps a compact, quantized ~3 billion-parameter model that runs entirely on the user's device. No API keys, no cloud costs, no internet connection required. Every byte of data stays private.
That shift matters way more than it might seem at first glance.
Cloud-based language models introduce latency, require network availability, and raise privacy concerns that can be deal-breakers for many app categories — health, finance, education, journaling. Foundation Models removes all three obstacles in one stroke, which (if you've ever dealt with HIPAA compliance for a health app) is kind of a big deal.
This guide walks through every major capability of the framework: checking model availability, prompting with sessions, streaming responses into SwiftUI views, generating type-safe structured output with the @Generable macro, constraining output with @Guide, calling custom tools, and handling the real-world edge cases that tutorials usually skip. All code targets Xcode 26 and iOS 26.
Prerequisites and Project Setup
Before writing any code, make sure your environment meets these requirements:
- Xcode 26 or later installed on your Mac.
- macOS Tahoe (macOS 26) as the host operating system.
- A physical device with Apple Intelligence support (A17 Pro chip or later, or any M-series chip) for on-device testing. The Simulator works for compilation and basic testing, but model inference is limited.
- Apple Intelligence enabled in Settings > Apple Intelligence & Siri.
Create a new SwiftUI project in Xcode 26 and add the import at the top of any file where you'll be using the framework:
import FoundationModels
That's it. No SPM packages, no CocoaPods, no entitlements. The framework ships right with the SDK.
Checking Model Availability Before You Start
The on-device model isn't guaranteed to be ready. The device might not be eligible, the user might have Apple Intelligence turned off, or the model assets might still be downloading. Skipping this check leads to runtime crashes or silent failures — and honestly, both are unacceptable in production code.
The Quick Check
For a simple guard clause, the isAvailable property is enough:
let model = SystemLanguageModel.default
guard model.isAvailable else {
    // Show a fallback UI or disable AI features
    return
}
The Detailed Check
When you need to show the user why the feature is unavailable and what they can do about it, switch on the availability property instead:
struct AIFeatureView: View {
    var body: some View {
        Group {
            switch SystemLanguageModel.default.availability {
            case .available:
                ChatView()
            case .unavailable(.deviceNotEligible):
                ContentUnavailableView(
                    "Device Not Supported",
                    systemImage: "iphone.slash",
                    description: Text("This feature requires iPhone 15 Pro or later.")
                )
            case .unavailable(.appleIntelligenceNotEnabled):
                ContentUnavailableView(
                    "Enable Apple Intelligence",
                    systemImage: "brain",
                    description: Text("Turn on Apple Intelligence in Settings to use this feature.")
                )
            case .unavailable(.modelNotReady):
                ProgressView("Preparing AI model…")
            default:
                ContentUnavailableView(
                    "Unavailable",
                    systemImage: "exclamationmark.triangle",
                    description: Text("AI features are not available right now.")
                )
            }
        }
    }
}
The .modelNotReady case is temporary — the model assets may still be downloading in the background. Showing a progress indicator and rechecking periodically is the right call here. Don't just throw up an error screen and call it a day.
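One way to handle that rechecking: a minimal sketch of a gate view that polls availability until the model is ready. The `ModelReadinessGate` type and the five-second interval are illustrative conventions, not framework API.

```swift
import SwiftUI
import FoundationModels

struct ModelReadinessGate<Content: View>: View {
    @State private var isReady = SystemLanguageModel.default.isAvailable
    @ViewBuilder var content: () -> Content

    var body: some View {
        if isReady {
            content()
        } else {
            ProgressView("Preparing AI model…")
                .task {
                    // Poll until the model assets finish downloading.
                    while !SystemLanguageModel.default.isAvailable {
                        try? await Task.sleep(for: .seconds(5))
                    }
                    isReady = true
                }
        }
    }
}
```

Wrap any AI-dependent screen in `ModelReadinessGate { ChatView() }` and the progress indicator resolves itself once the assets arrive.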
Creating a Session and Generating Your First Response
All interaction with the model goes through LanguageModelSession. A session holds the conversation transcript — the running history of prompts and responses — and uses it as context for subsequent calls.
import FoundationModels
let session = LanguageModelSession()
let response = try await session.respond(to: "Explain the MVVM pattern in two sentences.")
print(response.content)
// "MVVM separates an app into Model, View, and ViewModel layers..."
The respond(to:) method is asynchronous. It sends the prompt to the on-device model, waits for the full response, and returns a LanguageModelSession.Response whose .content property holds the generated text. Pretty straightforward.
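Because the call is throwing, it pays to handle failures explicitly rather than letting them bubble up. A hedged sketch of defensive handling, where `show(_:)` is a hypothetical helper that updates your UI; the error cases matched are the documented `LanguageModelSession.GenerationError` cases, but treat the list as illustrative rather than exhaustive:

```swift
do {
    let response = try await session.respond(to: userPrompt)
    show(response.content)
} catch let error as LanguageModelSession.GenerationError {
    switch error {
    case .guardrailViolation:
        // The prompt or response tripped the built-in safety guardrails.
        show("That request can't be completed.")
    case .exceededContextWindowSize:
        // The transcript plus prompt no longer fits in the context window.
        show("The conversation is too long. Please start a new one.")
    default:
        show("Generation failed. Please try again.")
    }
} catch {
    show("Unexpected error: \(error.localizedDescription)")
}
```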
Adding System Instructions
Instructions set the model's persona, tone, and constraints before the user ever sends a prompt. Unlike prompts, instructions are defined once by the developer and persist across the entire session.
let session = LanguageModelSession(
    instructions: """
        You are a senior iOS engineer who gives concise, code-first answers.
        Always include a Swift code example when relevant.
        Never invent API names that do not exist in the Apple SDK.
        """
)

let answer = try await session.respond(
    to: "How do I debounce a search field in SwiftUI?"
)
print(answer.content)
Instructions and prompts serve different roles. Instructions shape how the model responds; prompts supply what it responds to. Keeping them separate makes your code easier to reason about and — this is the part that matters in practice — harder for users to override through creative prompt injection.
Streaming Responses Into SwiftUI Views
Calling respond(to:) waits for the entire answer before returning. That works fine for short outputs, but for anything longer than a sentence or two the delay feels sluggish. Streaming fixes this by delivering partial results as the model generates them, token by token.
import SwiftUI
import FoundationModels
struct StreamingDemoView: View {
    @State private var output = ""
    @State private var isGenerating = false

    private let session = LanguageModelSession(
        instructions: "You are a helpful Swift programming assistant."
    )

    var body: some View {
        ScrollView {
            Text(output)
                .padding()
                .contentTransition(.opacity)
                .animation(.smooth, value: output)
        }
        .overlay(alignment: .bottom) {
            Button("Generate Explanation") {
                Task { await generate() }
            }
            .buttonStyle(.borderedProminent)
            .disabled(isGenerating)
            .padding()
        }
    }

    private func generate() async {
        isGenerating = true
        output = ""
        do {
            let stream = session.streamResponse(
                to: "Explain how Swift actors prevent data races. Include a code example."
            )
            for try await partial in stream {
                output = partial.content
            }
        } catch {
            output = "Generation failed: \(error.localizedDescription)"
        }
        isGenerating = false
    }
}
A few things worth noticing here. The streamResponse(to:) method returns an asynchronous throwing sequence of partial snapshots. Each snapshot contains the cumulative text generated so far, so you simply assign it to your state variable — no need to append strings manually. The .contentTransition(.opacity) modifier gives the text a smooth fade-in as new tokens appear, which is a nice touch that takes zero extra effort.
Guided Generation: Type-Safe Structured Output with @Generable
Free-form text is great for chat interfaces, but most real features need structured data. You want the model to return a recipe with separate fields for title, ingredients, and steps — not a blob of markdown you have to parse yourself at 2 AM. The @Generable macro makes this possible through constrained decoding: the model is forced to produce valid output matching your Swift types at the token level.
Defining a Generable Type
import FoundationModels
@Generable
struct QuizQuestion {
    var question: String
    var choices: [String]
    var correctAnswerIndex: Int
    var explanation: String
}
At compile time, @Generable synthesizes a JSON schema from the struct's stored properties and an initializer that decodes the model's output into a fully typed Swift value. Every stored property must itself be generable — primitive types like String, Int, Bool, Double, arrays of generable types, and optionals all work out of the box.
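Composition follows naturally from that rule: generable types can contain other generable types, and enums work as well. A short sketch; the type names here are invented for illustration:

```swift
import FoundationModels

@Generable
enum Difficulty {
    case easy, medium, hard
}

@Generable
struct Quiz {
    var title: String
    var difficulty: Difficulty      // nested generable enum
    var questions: [QuizQuestion]   // nested generable struct
}
```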
Generating a Typed Response
let session = LanguageModelSession(
    instructions: "Generate quiz questions about Swift programming."
)

let question: QuizQuestion = try await session.respond(
    to: "Create a question about optionals",
    generating: QuizQuestion.self
).content
print(question.question) // "What is the default value of an optional in Swift?"
print(question.choices) // ["nil", "0", "false", "empty string"]
print(question.correctAnswerIndex) // 0
No JSON parsing. No string manipulation. The result is a native Swift struct you can use immediately in your view models, databases, or network payloads. I genuinely wish more frameworks worked this way.
Streaming Structured Output
Guided generation supports streaming too. The framework generates a companion PartiallyGenerated type where every property is optional. Fields fill in one by one as the model produces them:
let stream = session.streamResponse(
    to: "Create a question about closures",
    generating: QuizQuestion.self
)

for try await partial in stream {
    if let q = partial.question {
        print("Question ready: \(q)")
    }
    if let choices = partial.choices {
        print("Choices ready: \(choices)")
    }
}
This is perfect for SwiftUI views that display each field as soon as it becomes available — it creates a polished progressive disclosure effect that users really appreciate.
Fine-Tuning Output with the @Guide Macro
So @Generable controls the structure of the output. But what about the content? That's where @Guide comes in — it lets you attach natural-language descriptions and programmatic constraints to individual properties.
@Generable
struct RecipeCard {
    @Guide(description: "A short, catchy recipe title under 60 characters")
    var title: String

    @Guide(description: "Preparation time in minutes", .range(5...120))
    var prepTimeMinutes: Int

    @Guide(.count(4...8))
    var ingredients: [String]

    @Guide(description: "Step-by-step cooking instructions")
    var steps: [String]

    @Guide(.anyOf(["Easy", "Medium", "Hard"]))
    var difficulty: String
}
Here's what each guide does:
- description: provides natural-language context to the model about what the property should contain.
- .range(5...120) constrains a numeric value to a specific range.
- .count(4...8) constrains the number of elements in an array.
- .anyOf([...]) restricts the value to one of the specified options — basically an enum without defining one.
You can also use regex patterns with @Guide to constrain string formats — for example, ensuring a property matches an email pattern or a specific identifier format. Swift's regex builder syntax is fully supported here.
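A hedged sketch of what that looks like, assuming the `.pattern` generation guide (which accepts a Swift `Regex`); the property names and patterns are illustrative:

```swift
import FoundationModels

@Generable
struct ContactCard {
    @Guide(description: "A lowercase email address", .pattern(/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/))
    var email: String

    @Guide(description: "A product SKU like ABC-1234", .pattern(/[A-Z]{3}-\d{4}/))
    var sku: String
}
```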
One important detail that'll save you debugging time: properties are generated in declaration order. If steps depends on ingredients, put ingredients first. The model can only reference properties it has already generated.
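A concrete illustration of that ordering rule: the summary property is declared last, so by the time the model generates it, the fields it summarizes already exist in the output. The `MealPlan` type is invented for this example.

```swift
import FoundationModels

@Generable
struct MealPlan {
    @Guide(.count(3...5))
    var ingredients: [String]

    @Guide(description: "Steps that use only the ingredients above")
    var steps: [String]

    @Guide(description: "A one-sentence summary of the ingredients and steps")
    var summary: String   // declared last: it depends on everything above
}
```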
Tool Calling: Extending the Model with Live Data
The on-device model has no access to the internet, your app's database, or any external service. Its knowledge is frozen at training time. Tool calling bridges that gap by letting you define Swift functions that the model can invoke during generation to fetch real-time data or perform actions.
Defining a Tool
A tool is a struct conforming to the Tool protocol. You provide a name, description, an Arguments struct (annotated with @Generable), and a call method:
import FoundationModels
struct FetchWeatherTool: Tool {
    let name = "fetchWeather"
    let description = "Fetches the current weather for a given city name"

    @Generable
    struct Arguments {
        @Guide(description: "The city name to look up weather for")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // In production, call a real weather API here
        let weather = try await WeatherService.current(for: arguments.city)
        return "\(weather.condition), \(weather.temperature)°C in \(arguments.city)"
    }
}
Registering Tools with a Session
let weatherTool = FetchWeatherTool()
let session = LanguageModelSession(
tools: [weatherTool],
instructions: """
You are a travel planning assistant.
When the user asks about weather, always use the fetchWeather tool to get current conditions.
"""
)
let response = try await session.respond(
to: "What's the weather like in Tokyo right now?"
)
print(response.content)
// "It's currently partly cloudy and 18°C in Tokyo."
The model decides autonomously whether and when to call a tool based on the prompt and the tool's description. You don't explicitly invoke the tool — the model does it during generation, receives the result, and incorporates it into its answer. It's surprisingly seamless once you see it in action.
Multiple Tools
You can register multiple tools in a single session. The model will choose the appropriate tool for each query — or call several tools in sequence if the task requires it:
let session = LanguageModelSession(
    tools: [weatherTool, hotelSearchTool, restaurantFinderTool],
    instructions: "You are a travel concierge. Use your tools to give accurate, real-time information."
)
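The hotelSearchTool and restaurantFinderTool above are placeholders. For completeness, here is a sketch of what one of them might look like; `HotelAPI` is a hypothetical stand-in for your own backend, not a real framework type:

```swift
import FoundationModels

struct HotelSearchTool: Tool {
    let name = "searchHotels"
    let description = "Searches for hotels in a city, filtered by maximum nightly price"

    @Generable
    struct Arguments {
        @Guide(description: "The destination city")
        var city: String

        @Guide(description: "Maximum nightly price in USD", .range(30...2000))
        var maxPrice: Int
    }

    func call(arguments: Arguments) async throws -> String {
        // HotelAPI is a stand-in for your real networking layer.
        let hotels = try await HotelAPI.search(city: arguments.city, maxPrice: arguments.maxPrice)
        return hotels.map { "\($0.name): $\($0.price)/night" }.joined(separator: "\n")
    }
}
```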
Session Management and Conversation Context
A LanguageModelSession maintains a transcript of all prompts and responses. Each new call to respond(to:) or streamResponse(to:) includes the full transcript as context, so the model can reference earlier parts of the conversation.
let session = LanguageModelSession()
// First turn
_ = try await session.respond(to: "My favorite programming language is Swift.")
// Second turn — the model remembers the first turn
let response = try await session.respond(to: "Why is my favorite language a good choice for iOS?")
print(response.content)
// "Swift is an excellent choice for iOS development because..."
Here's the catch, though. The current combined input and output token limit is 4,096. For long conversations, the earliest messages will eventually get dropped from context. If your app needs to maintain critical context across many turns, consider summarizing previous exchanges and injecting the summary as an instruction update.
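One way to implement that hand-off, as a minimal sketch. The summarization prompt and the idea of seeding the fresh session through its instructions are app-level conventions I'm assuming here, not framework requirements:

```swift
import FoundationModels

func rolloverSession(from old: LanguageModelSession) async throws -> LanguageModelSession {
    // Ask the old session to compress its own history...
    let summary = try await old.respond(
        to: "Summarize our conversation so far in under 100 words."
    ).content

    // ...then seed a fresh session with that summary as context.
    return LanguageModelSession(
        instructions: "Context from the earlier conversation: \(summary)"
    )
}
```

Note that the summarization call itself consumes tokens, so trigger the rollover well before you hit the limit rather than after.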
Inspecting the Transcript
You can inspect what the model actually sees by reading the session's transcript property. This is invaluable for debugging — it shows you every prompt, response, and tool call in order:
for entry in session.transcript.entries {
    print(entry)
}
One Session per Feature
Create a new LanguageModelSession for each independent feature or conversation thread. Reusing a single session across unrelated features pollutes the transcript and wastes the limited context window. Trust me on this one — I learned the hard way when my recipe generator started referencing weather data from a completely different feature.
Performance Optimization: Making It Feel Instant
The on-device model is fast, but loading it into memory takes time — especially the first call after a cold start. Here are some concrete strategies to minimize perceived latency.
Prewarm the Model
Call prewarm() as early as possible when you know the user will likely use AI features. This loads model weights into memory in the background:
@main
struct MyApp: App {
    init() {
        // Loads the model weights into memory ahead of the first request.
        // prewarm() is fire-and-forget; it doesn't block launch.
        LanguageModelSession().prewarm()
    }

    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
If your app only uses the model on a specific screen, prewarm when that screen's parent appears instead of at launch. No need to consume resources for a feature the user might never reach.
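A sketch of that screen-scoped approach; `AssistantViewModel` is a hypothetical type assumed to own the session, so the same instance that gets prewarmed also serves the later request:

```swift
import SwiftUI
import FoundationModels

struct AssistantScreen: View {
    @State private var viewModel = AssistantViewModel() // owns the LanguageModelSession

    var body: some View {
        AssistantContent(viewModel: viewModel)
            .onAppear {
                // Warm the model only once the user actually reaches this feature.
                viewModel.session.prewarm()
            }
    }
}
```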
Use Streaming by Default
Even when you don't need to display partial results, streaming feels faster because the first tokens arrive almost immediately. Users perceive the feature as responsive even if total generation time is identical. Perception is reality when it comes to UX.
Keep @Generable Types Small
Generating a struct with 20 properties is noticeably slower than generating one with 5. If you need complex output, consider breaking it into multiple smaller generations or generating an array of simpler items instead.
Control Temperature for Determinism
A lower temperature concentrates sampling on the highest-probability tokens, producing more focused, predictable output — useful for extraction and classification tasks where creativity isn't wanted:
let options = GenerationOptions(temperature: 0.3)
let response = try await session.respond(to: prompt, options: options)
Real-World Architecture: Putting It All Together
Alright, let's bring everything together. Here's a complete, production-ready pattern that wraps Foundation Models into a clean SwiftUI architecture. This example builds a recipe suggestion feature that streams structured output into the UI.
import SwiftUI
import FoundationModels

// MARK: - Model

@Generable
struct Recipe {
    @Guide(description: "A creative recipe name")
    var name: String

    @Guide(.count(3...10))
    var ingredients: [String]

    @Guide(description: "Step-by-step cooking instructions")
    var steps: [String]

    @Guide(.range(5...180))
    var cookingTimeMinutes: Int

    @Guide(.anyOf(["Easy", "Medium", "Hard"]))
    var difficulty: String
}

// MARK: - ViewModel

@Observable
final class RecipeViewModel {
    var recipe: Recipe.PartiallyGenerated?
    var isGenerating = false
    var errorMessage: String?

    private var session: LanguageModelSession?

    func generate(prompt: String) async {
        guard SystemLanguageModel.default.isAvailable else {
            errorMessage = "AI features are not available on this device."
            return
        }
        isGenerating = true
        errorMessage = nil
        recipe = nil

        let session = LanguageModelSession(
            instructions: "You are a professional chef. Generate creative, practical recipes."
        )
        self.session = session

        do {
            let stream = session.streamResponse(
                to: prompt,
                generating: Recipe.self
            )
            for try await partial in stream {
                recipe = partial
            }
        } catch {
            errorMessage = "Failed to generate recipe: \(error.localizedDescription)"
        }
        isGenerating = false
    }
}

// MARK: - View

struct RecipeSuggestionView: View {
    @State private var viewModel = RecipeViewModel()
    @State private var userPrompt = ""

    var body: some View {
        NavigationStack {
            ScrollView {
                VStack(alignment: .leading, spacing: 16) {
                    promptField
                    if let error = viewModel.errorMessage {
                        Text(error)
                            .foregroundStyle(.red)
                            .font(.callout)
                    }
                    if let recipe = viewModel.recipe {
                        recipeCard(recipe)
                    }
                }
                .padding()
            }
            .navigationTitle("Recipe AI")
        }
    }

    private var promptField: some View {
        HStack {
            TextField("What are you in the mood for?", text: $userPrompt)
                .textFieldStyle(.roundedBorder)
            Button("Generate") {
                Task { await viewModel.generate(prompt: userPrompt) }
            }
            .buttonStyle(.borderedProminent)
            .disabled(userPrompt.isEmpty || viewModel.isGenerating)
        }
    }

    @ViewBuilder
    private func recipeCard(_ recipe: Recipe.PartiallyGenerated) -> some View {
        VStack(alignment: .leading, spacing: 12) {
            if let name = recipe.name {
                Text(name)
                    .font(.title2.bold())
            }
            if let difficulty = recipe.difficulty {
                Text(difficulty)
                    .font(.caption)
                    .padding(.horizontal, 8)
                    .padding(.vertical, 4)
                    .background(.blue.opacity(0.1), in: .capsule)
            }
            if let ingredients = recipe.ingredients {
                Text("Ingredients")
                    .font(.headline)
                ForEach(ingredients, id: \.self) { item in
                    Label(item, systemImage: "circle.fill")
                        .font(.subheadline)
                }
            }
            if let steps = recipe.steps {
                Text("Instructions")
                    .font(.headline)
                ForEach(Array(steps.enumerated()), id: \.offset) { index, step in
                    HStack(alignment: .top) {
                        Text("\(index + 1).")
                            .fontWeight(.semibold)
                            .frame(width: 24, alignment: .trailing)
                        Text(step)
                    }
                    .font(.subheadline)
                }
            }
        }
        .animation(.smooth, value: recipe.name)
    }
}
This pattern separates concerns cleanly: the @Generable struct defines the data contract, the @Observable view model manages the session lifecycle and error states, and the view simply binds to partially generated output. Each field appears in the UI the moment the model produces it — and that progressive reveal feels really polished in practice.
Common Pitfalls and How to Avoid Them
Forgetting the Availability Check
Calling respond(to:) when the model is unavailable throws an error. Always guard with an availability check first — and provide a meaningful fallback UI rather than a generic error screen. Your users will thank you.
Declaring @Generable Properties in the Wrong Order
The model generates properties in declaration order. If a property depends on another property's value, it must come after the dependency. For example, a summary field should be declared after the properties it summarizes. This one bites people more often than you'd think.
Exceeding the 4,096 Token Limit
The combined input (instructions + transcript + prompt) and output can't exceed 4,096 tokens. For long conversations, create a new session periodically and inject a summary of the previous context as instructions. It's a bit of extra work, but it keeps things running smoothly.
Using Foundation Models for Tasks It Wasn't Designed For
The on-device model excels at language understanding, summarization, extraction, and short-form generation. It's not a general-knowledge chatbot and shouldn't be used as one. For complex reasoning, multi-step math, or tasks requiring vast world knowledge, consider combining it with a cloud API as a fallback.
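One way to structure that hybrid approach, sketched below. `CloudClient` is a hypothetical wrapper around whatever cloud API you use, and `estimatedComplexity(of:)` is an invented routing heuristic; only the Foundation Models calls are real API:

```swift
import FoundationModels

func answer(_ prompt: String) async throws -> String {
    let model = SystemLanguageModel.default

    // Prefer the free, private, offline path when it can handle the job.
    if model.isAvailable && estimatedComplexity(of: prompt) == .simple {
        return try await LanguageModelSession().respond(to: prompt).content
    }

    // Fall back to a cloud model for heavy reasoning or broad world knowledge.
    return try await CloudClient.shared.complete(prompt: prompt)
}
```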
Frequently Asked Questions
What devices support the Foundation Models framework?
Foundation Models requires Apple Intelligence, which is available on iPhone 15 Pro and later (A17 Pro chip), all iPads and Macs with M1 or later, and Apple Vision Pro. The device must also be running iOS 26, iPadOS 26, macOS Tahoe, or visionOS 26. And yes, Apple Intelligence needs to be enabled in Settings — it's not on by default.
Does Foundation Models work offline?
Yes! Once the model assets have been downloaded to the device, inference runs entirely on-device with no network dependency. This makes it ideal for privacy-sensitive apps, remote or low-connectivity environments, and features that simply must work regardless of network conditions. The initial model download does require an internet connection, though.
How is Foundation Models different from using the ChatGPT or Claude API?
The key differences are privacy (data never leaves the device), cost (no per-token API fees), latency (no network round-trips), and offline support. The trade-off is capability — the ~3B parameter on-device model is smaller than cloud models and less capable at complex reasoning or tasks requiring broad world knowledge. In practice, many apps benefit from using Foundation Models for latency-sensitive or privacy-critical tasks while falling back to a cloud API for more demanding queries.
Can I use @Generable with enums and nested types?
Absolutely. The @Generable macro works with both structs and enums. You can nest generable types inside other generable types — for example, a @Generable struct Quiz containing an array of @Generable struct Question. Every stored property must itself be a generable type (primitives, arrays, optionals, or other @Generable types).
What is the maximum token limit for input and output?
The current combined limit for input (instructions, transcript, and prompt) and output is 4,096 tokens. For context, one token is roughly three-quarters of a word in English. If you need to process longer content, break it into chunks and process each chunk in a separate session call. You can also keep sessions shorter by summarizing earlier conversation turns rather than carrying the full transcript forward.