If you've been building apps with Rork and wondering how to add a genuinely useful AI assistant, this guide is for you. There's a lot of content about Gemini integration out there, but Claude tends to get overlooked — which is a shame, because the Claude API is surprisingly clean to work with and pairs really well with Rork's development workflow.
In this tutorial, we'll build a personal AI memo assistant using Claude claude-sonnet-4-6 (Anthropic's latest mid-tier model). We're going beyond a basic chat screen — we'll handle streaming responses, conversation history, and token cost management in a way that's actually ready for production.
Why Claude claude-sonnet-4-6?
As of May 2026, claude-sonnet-4-6 is Anthropic's best balance between capability and cost. For a conversational assistant app, there are three things that stand out:
The cost-to-quality ratio is excellent. Opus 4.6 is the most powerful model, but for assistant-style interactions, claude-sonnet-4-6 performs comparably at roughly one-third the token cost. That matters a lot when you're paying per message.
The context window is large enough for real conversations. You can pass in substantial conversation history or reference documents without hitting limits. Users notice when an AI "remembers" what they said three messages ago, and it makes the app feel fundamentally different.
The multilingual quality is strong. If you're building for non-English markets, Claude handles nuanced language particularly well compared to alternatives.
What We're Building: A Personal AI Memo Assistant
By the end of this tutorial, you'll have an app that:
- Lets users have back-and-forth conversations with Claude
- Maintains conversation context across multiple messages
- Shows streaming responses (text appears incrementally, not all at once)
- Saves and restores conversation history locally
This pattern applies to almost any AI-powered app: cooking assistants, study helpers, journaling apps, customer support bots. Learn it once, use it everywhere.
Getting Started: API Key Setup
Before touching Rork, grab an API key from Anthropic's console at console.anthropic.com. Keep it as an environment variable — never hardcode it in your source files.
Kick things off in Rork with this prompt:
Create an AI memo assistant app with the following:
- Chat UI with speech bubbles (user on right, AI on left)
- Text input and Send button at the bottom
- Claude API integration (model: claude-sonnet-4-6)
- Conversation history state that persists between messages
- Loading indicator while waiting for AI response
Keep the design clean and minimal with a calm color palette.
Once Rork generates the base structure, we'll layer in the API logic step by step.
Step 1: Claude API Integration
Create a dedicated service file to keep API logic separate from UI components. This makes testing and debugging much easier.
// services/claude.ts
// Claude claude-sonnet-4-6 API integration service

const CLAUDE_API_URL = 'https://api.anthropic.com/v1/messages';

export interface Message {
  role: 'user' | 'assistant';
  content: string;
}

interface ClaudeRequest {
  model: string;
  max_tokens: number;
  messages: Message[];
  system?: string;
}

/**
 * Send messages to Claude and get a response
 * @param messages Conversation history (alternating user/assistant messages)
 * @param systemPrompt Defines Claude's role and behavior
 */
export async function callClaude(
  messages: Message[],
  systemPrompt?: string
): Promise<string> {
  const apiKey = process.env.EXPO_PUBLIC_ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error('Anthropic API key is not configured');
  }

  const requestBody: ClaudeRequest = {
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages,
    system: systemPrompt, // omitted from the JSON body when undefined
  };

  const response = await fetch(CLAUDE_API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify(requestBody),
  });

  if (!response.ok) {
    // The error body is not guaranteed to be JSON (for example, a proxy error page),
    // so fall back to the HTTP status if parsing fails
    const error = await response.json().catch(() => null);
    const detail = error?.error?.message ?? `HTTP ${response.status}`;
    throw new Error(`Claude API error: ${detail}`);
  }

  const data = await response.json();
  // The reply text lives in the first content block
  return data.content?.[0]?.text ?? '';
}

Two things are worth calling out here.
Always handle errors explicitly. Mobile apps run in all kinds of network conditions, with misconfigured environments, and users who find surprising ways to break things. If you swallow errors silently, you'll spend hours debugging production crashes that a proper error message would have surfaced immediately.
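Use the EXPO_PUBLIC_ prefix for environment variables. In Rork (which is built on Expo), only variables with this prefix are accessible on the client side. It's a common gotcha that wastes 30 minutes the first time you encounter it. Be aware of the trade-off, though: EXPO_PUBLIC_ values are inlined into the app bundle at build time, so anyone who unpacks a shipped app can read the key. That is fine for prototyping, but for a production release you should route requests through a backend you control and keep the key there.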
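To make the error-handling point concrete, here is a minimal sketch that turns an HTTP status into a message the user can act on. The name `describeApiError` and the message wording are illustrative and not part of the service file above. The status codes follow Anthropic's documented error types (401 for authentication, 429 for rate limits, 529 for overload), which are worth verifying against the current API reference.

```typescript
// Hypothetical helper: maps an HTTP status to a user-facing message
// and reports whether retrying the same request is reasonable.
export interface ApiErrorInfo {
  userMessage: string;
  retryable: boolean;
}

export function describeApiError(status: number): ApiErrorInfo {
  if (status === 401 || status === 403) {
    // Credentials problem: retrying will not help
    return {
      userMessage: 'The API key was rejected. Check your configuration.',
      retryable: false,
    };
  }
  if (status === 429) {
    // Rate limited: the same request may succeed after a pause
    return {
      userMessage: 'Too many requests. Please wait a moment and try again.',
      retryable: true,
    };
  }
  if (status >= 500) {
    // Covers 500 (server error) and 529 (overloaded)
    return {
      userMessage: 'The service is temporarily unavailable. Please try again.',
      retryable: true,
    };
  }
  // Any other failure (for example a 400) points at the request itself
  return {
    userMessage: 'The request could not be completed.',
    retryable: false,
  };
}
```

One way to use it: call `describeApiError(response.status)` in the `!response.ok` branch and only show a Retry button when `retryable` is true.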
Step 2: Conversation History Management
The "contextual memory" experience — where the AI remembers what you said earlier — comes down to how you manage the message array that gets sent with each request. Here's a custom hook that handles it cleanly:
// hooks/useChat.ts
// Conversation management hook

import { useState, useCallback } from 'react';
import { callClaude, Message } from '../services/claude';

const SYSTEM_PROMPT = `You are a helpful personal assistant.
Help users organize their thoughts, notes, and ideas.
Keep responses concise and warm in tone.`;

// Only send the most recent N messages to control token costs
const MAX_HISTORY = 10;

export function useChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  const sendMessage = useCallback(async (userText: string) => {
    // Ignore empty input and sends while a request is in flight
    if (!userText.trim() || isLoading) return;

    const userMessage: Message = { role: 'user', content: userText };
    const updatedMessages = [...messages, userMessage];

    // Optimistic update: show the user's message immediately
    setMessages(updatedMessages);
    setIsLoading(true);
    setError(null);

    try {
      // Trim history to MAX_HISTORY before sending to keep costs predictable
      let recentMessages = updatedMessages.slice(-MAX_HISTORY);
      // An even-sized window over alternating turns that ends on a user message
      // starts on an assistant message. Drop it so the request opens with a
      // user turn, which the Messages API has required.
      if (recentMessages[0]?.role === 'assistant') {
        recentMessages = recentMessages.slice(1);
      }
      const assistantText = await callClaude(recentMessages, SYSTEM_PROMPT);

      const assistantMessage: Message = {
        role: 'assistant',
        content: assistantText,
      };
      setMessages(prev => [...prev, assistantMessage]);
    } catch (err) {
      const errorMessage = err instanceof Error
        ? err.message
        : 'Failed to send message. Please try again.';
      setError(errorMessage);
      // Revert the optimistic update so the user can retry
      setMessages(messages);
    } finally {
      setIsLoading(false);
    }
  }, [messages, isLoading]);

  const clearHistory = useCallback(() => {
    setMessages([]);
    setError(null);
  }, []);

  return { messages, isLoading, error, sendMessage, clearHistory };
}

The MAX_HISTORY = 10 constant is doing quietly important work. As conversations get longer, sending the full history every time adds up fast, both in latency and cost. Keeping only the most recent 10 messages means the context window stays tight, responses come back faster, and costs stay predictable. Adjust this number based on your app's use case.
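The feature list promises that history is saved and restored locally, which the hook on its own does not do. One possible approach, sketched here under assumptions, is to serialize the message array to a string and validate it when reading it back. The names `HISTORY_KEY`, `serializeHistory`, and `parseHistory` are hypothetical, and the `Message` type is repeated only so the block is self-contained.

```typescript
// Sketch only: these names are illustrative, not part of the tutorial's code.
export interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Hypothetical storage key for the saved conversation
export const HISTORY_KEY = 'memo-assistant/history';

// Turn the message array into a string suitable for a key-value store
export function serializeHistory(messages: Message[]): string {
  return JSON.stringify(messages);
}

// Type guard: accepts only objects shaped like a Message
function isMessage(value: unknown): value is Message {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as { role?: unknown; content?: unknown };
  return (
    (candidate.role === 'user' || candidate.role === 'assistant') &&
    typeof candidate.content === 'string'
  );
}

// Parse a stored string back into messages.
// Missing or corrupt data yields an empty history instead of throwing.
export function parseHistory(raw: string | null): Message[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.filter(isMessage) : [];
  } catch {
    return [];
  }
}
```

In an Expo app, one common store is `@react-native-async-storage/async-storage`: write with `AsyncStorage.setItem(HISTORY_KEY, serializeHistory(messages))` whenever `messages` changes, and read once on mount with `parseHistory(await AsyncStorage.getItem(HISTORY_KEY))`. Validating on read means a malformed entry costs the user their saved history rather than crashing the app on launch.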
Step 3: Streaming Responses
The difference between "AI returns an answer after 3 seconds" and "AI appears to be typing in real time" is enormous from a user experience perspective. Claude's API supports server-sent events (SSE) streaming, and adding it is more straightforward than most developers expect:
// services/claude.ts (add streaming variant)

export async function callClaudeStreaming(
  messages: Message[],
  onChunk: (text: string) => void,
  onComplete: () => void,
  systemPrompt?: string
): Promise<void> {
  const apiKey = process.env.EXPO_PUBLIC_ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error('Anthropic API key is not configured');
  }

  const response = await fetch(CLAUDE_API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6',
      max_tokens: 1024,
      stream: true, // Enable SSE streaming
      messages,
      system: systemPrompt,
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();
  if (!reader) {
    throw new Error('Failed to get response stream reader');
  }

  // Holds a trailing partial line until the next network chunk completes it
  let buffer = '';

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        // Only "data:" lines carry a JSON payload; skip event names and blanks
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6);
        if (data === '[DONE]') continue;

        try {
          const parsed = JSON.parse(data);
          // Text arrives in delta.text; other event types have no text and are skipped
          const text = parsed.delta?.text ?? '';
          if (text) onChunk(text);
        } catch {
          // Skip a malformed line rather than aborting the whole stream
        }
      }
    }
  } finally {
    reader.releaseLock(); // Always release, even on error
    onComplete(); // Runs on success and on failure so the UI can leave its loading state
  }
}

The finally block is non-negotiable. If you don't release the reader lock, whether the stream completed normally or threw an error, subsequent requests will silently stall. It's the kind of bug that only shows up under specific conditions and is painful to diagnose. One caveat: React Native's built-in fetch has historically not exposed response.body as a readable stream, so if the reader check above throws on a device, you may need a streaming-capable fetch implementation such as the one shipped in recent Expo SDKs (expo/fetch).
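The buffering step inside the read loop is the part that is easiest to get wrong, so it can help to isolate it as a pure function and test it without a network. The sketch below is one way to do that; `extractTextDeltas` and `SseParseResult` are hypothetical names, and the event shape (a `delta.text` field on `data:` lines) is taken from the parsing logic above.

```typescript
// Result of consuming the text accumulated so far
export interface SseParseResult {
  texts: string[]; // text deltas found in complete lines
  rest: string;    // trailing partial line to prepend to the next chunk
}

// Hypothetical helper: splits buffered SSE text into complete lines,
// pulls out delta.text values, and returns the unfinished remainder.
export function extractTextDeltas(buffer: string): SseParseResult {
  const lines = buffer.split('\n');
  // The last element is either '' (buffer ended on a newline) or a partial line
  const rest = lines.pop() ?? '';
  const texts: string[] = [];

  for (const rawLine of lines) {
    const line = rawLine.replace(/\r$/, ''); // tolerate CRLF line endings
    if (!line.startsWith('data: ')) continue;
    try {
      const parsed = JSON.parse(line.slice(6)) as { delta?: { text?: unknown } };
      const text = parsed.delta?.text;
      if (typeof text === 'string' && text.length > 0) texts.push(text);
    } catch {
      // Malformed payload: skip this line and keep going
    }
  }

  return { texts, rest };
}
```

Inside the loop you would call `extractTextDeltas(buffer + decoded)`, pass each entry of `texts` to `onChunk`, and carry `rest` over as the new buffer. Because a JSON payload split across two network chunks stays in `rest` until its newline arrives, it is never parsed half-finished.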
Step 4: Wiring It Into the UI
With the logic layer built, have Rork update the chat UI:
Update the chat screen:
1. Disable the Send button while isLoading is true
2. Show a typing animation for the AI's response while it streams in
3. Display error messages in a dismissible red banner with a Retry button
4. Add a "Clear conversation" button in the header
5. Auto-scroll to the latest message when new content arrives
The useChat hook is already implemented — connect sendMessage,
clearHistory, isLoading, and error to the appropriate UI elements.
Review what Rork generates and verify that each of the four hook values maps to the right UI component. Fine-tune with follow-up prompts, or edit directly in Rork Max.
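For the streaming display in item 2, the state update needs to grow the last assistant bubble as chunks arrive instead of adding a new bubble per chunk. A minimal sketch of that update as a pure function follows; `appendChunk` is a hypothetical name, and the `Message` type is repeated so the block stands alone.

```typescript
export interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Hypothetical helper: returns a new array in which `chunk` is appended to the
// trailing assistant message. If the last message is from the user (or the list
// is empty), a new assistant message is started instead.
export function appendChunk(messages: Message[], chunk: string): Message[] {
  const last = messages[messages.length - 1];
  if (last && last.role === 'assistant') {
    // Replace the last element rather than mutating it, so React sees a change
    return [...messages.slice(0, -1), { ...last, content: last.content + chunk }];
  }
  return [...messages, { role: 'assistant', content: chunk }];
}
```

Assuming the hook is switched over to `callClaudeStreaming`, the `onChunk` callback could be `chunk => setMessages(prev => appendChunk(prev, chunk))` and `onComplete` could be `() => setIsLoading(false)`. Using the functional form of `setMessages` matters here because many chunks can arrive between renders.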
Managing API Costs: Numbers Worth Knowing
Claude API pricing is per-token — you pay for what you send and receive. For a conversational assistant built on claude-sonnet-4-6, here's a rough estimate:
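An average conversation turn (user message + AI response, around 500 tokens combined) costs a fraction of a cent. Keep in mind that each request also resends the system prompt and up to MAX_HISTORY earlier messages as input tokens, so the billed total per turn is larger than the visible message pair. Even so, a user who has 15 conversations per day generates only a modest monthly bill; the exact figure depends on conversation length and the per-token rates in effect, so check Anthropic's pricing page before budgeting. Until you have significant scale, cost isn't the thing to lose sleep over.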
That said, once your app starts gaining traction, track your cost-per-active-user. If you're planning a subscription tier, you want that number well under your subscription price. The MAX_HISTORY trim and keeping max_tokens reasonable (1024 is generous for most assistant responses) will handle the majority of optimization.
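To put a number on cost-per-active-user, the arithmetic is simple enough to keep in a small function. This is a sketch under stated assumptions: `estimateMonthlyCost`, `Pricing`, and `Usage` are hypothetical names, and the rates must come from Anthropic's current pricing page because none are hardcoded here.

```typescript
export interface Pricing {
  inputPerMillion: number;  // USD per million input tokens
  outputPerMillion: number; // USD per million output tokens
}

export interface Usage {
  inputTokensPerTurn: number;  // include the resent history and system prompt
  outputTokensPerTurn: number;
  turnsPerDay: number;
  daysPerMonth?: number;       // defaults to 30
}

// Hypothetical helper: estimated monthly API cost in USD for one active user
export function estimateMonthlyCost(pricing: Pricing, usage: Usage): number {
  const days = usage.daysPerMonth ?? 30;
  const costPerTurn =
    (usage.inputTokensPerTurn * pricing.inputPerMillion +
      usage.outputTokensPerTurn * pricing.outputPerMillion) / 1e6;
  return costPerTurn * usage.turnsPerDay * days;
}
```

As a worked example with placeholder rates (not actual prices) of 2 USD per million input tokens and 10 USD per million output tokens, a user sending 10 turns per day at 1,000 input and 500 output tokens per turn comes to (1,000 × 2 + 500 × 10) / 1,000,000 = 0.007 USD per turn, or about 2.10 USD per month. Note how input tokens grow with MAX_HISTORY, since every request carries the trimmed history again.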
From Here
You now have a working AI assistant app with proper streaming, context management, and error handling. The architecture here — separate service layer, custom hook for state, prompts to Rork for UI updates — scales well as your app grows.
The natural next step is to experiment with the system prompt. Changing just those few lines transforms the same codebase into a different product: a cooking assistant, a writing coach, a language tutor. That flexibility is one of the things that makes Claude-powered apps worth building.
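As a rough illustration of that idea, the system prompt can be selected from a small table of presets. Everything here is hypothetical: the `AssistantKind` type, the `getSystemPrompt` function, and the cooking and writing prompts are invented for the example; only the memo prompt comes from the hook in Step 2.

```typescript
// Hypothetical set of product variants built on the same codebase
export type AssistantKind = 'memo' | 'cooking' | 'writing';

const SYSTEM_PROMPTS: Record<AssistantKind, string> = {
  // The prompt used in the useChat hook
  memo: `You are a helpful personal assistant.
Help users organize their thoughts, notes, and ideas.
Keep responses concise and warm in tone.`,
  // Illustrative prompts, written for this example only
  cooking: `You are a friendly cooking assistant.
Suggest recipes and substitutions based on what the user has on hand.
Keep instructions short and step by step.`,
  writing: `You are a supportive writing coach.
Give specific, actionable feedback on the user's drafts.
Point out one strength before suggesting changes.`,
};

// Look up the system prompt for a given variant
export function getSystemPrompt(kind: AssistantKind): string {
  return SYSTEM_PROMPTS[kind];
}
```

Passing `getSystemPrompt('cooking')` in place of `SYSTEM_PROMPT` is then the only change needed to switch products, and the `Record` type makes the compiler flag any variant that lacks a prompt.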
For more on prompt design for AI features, see the prompt engineering mastery guide. For extending this into a more complex multi-feature AI app, the Claude API assistant app deep-dive covers additional patterns.