Dev Tools · 2026-04-30 · Advanced

Rork × AI Moderation Production Guide: Reporting, Auto-Review, and Tiered Sanctions for UGC Apps

A complete production-grade guide to layering AI moderation, reporting, and tiered sanctions into your Rork UGC app. Includes a working three-layer pipeline using OpenAI Moderation, Perspective API, and Cloudflare Workers Queues.

The week after I shipped my first UGC app to the App Store with Rork, I received over thirty abuse reports. It was a small community app I was running solo, but the kinds of violations that showed up were nothing I had anticipated: text containing self-harm signals, photos of children uploaded without consent, impersonation accounts using celebrity names. There were days when I spent six hours just handling reports.

What I learned from that experience is this: if you don't design moderation in three layers from day one, you'll be crushed by manual work the moment you launch. This article walks you through how to build production-grade moderation into a Rork-built UGC app — pre-submission checks, on-submission auto-review, and post-publication report handling — using the actual code I run in production today (Cloudflare Workers + Queue + KV).

We'll cover where OpenAI Moderation API shines and where it falls short, how to choose between it and Perspective API, what Apple and Google explicitly require in their guidelines, and what the EU DSA and regional child-protection laws demand. Everything an indie developer needs at implementation time, in one place.

Why moderation must be designed in three layers

The mistake I made on first launch was trying to consolidate every decision into a single "on-submission auto-review" layer. When you put everything in one place, three problems hit at once: API costs balloon, misjudgments have a wide blast radius, and there's no way to handle anything reported after publication.

These days, I design every UGC feature with three distinct layers:

Pre-submission (client-side preflight): Before the user taps Submit, the client filters out obvious violations: empty strings, exceeded length, URL-only posts, emoji spam. In my app this alone cut server load by about a third.
On-submission (server-side moderation): When a post hits the API, we enqueue it for AI moderation and split the result into three buckets: instant publish, hold for review, or instant reject.
Post-publication: Reports on already-published content, additional training, and the human-review pipeline. This layer also includes the appeal mechanism that's now legally required in many jurisdictions.

Connect these three layers with Cloudflare Workers Queues and you get a clean balance between cost and response speed. The submission API enqueues and returns immediately, a worker runs the actual judgment in the background and writes the result to KV, and the frontend shows a "Reviewing" state until the verdict is final.
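The status values that travel between the layers can be pinned down in one small shared type. This is a minimal sketch, assuming the four status strings the worker code later in this article writes to KV; the label text and the helper names `labelFor` and `isFinal` are hypothetical.

```typescript
// Status values match the ones the worker code in this article writes to KV.
type PostStatus = "pending_moderation" | "published" | "rejected" | "needs_review";

// Hypothetical UI labels; adjust the wording to your app.
const STATUS_LABEL: Record<PostStatus, string> = {
  pending_moderation: "Reviewing",
  needs_review: "Reviewing",
  published: "Published",
  rejected: "Not published",
};

// A verdict is final once the post is either live or rejected.
function isFinal(status: PostStatus): boolean {
  return status === "published" || status === "rejected";
}

function labelFor(status: PostStatus): string {
  return STATUS_LABEL[status];
}
```

Mapping both `pending_moderation` and `needs_review` to the same label keeps the user-facing message stable while a human looks at the post.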

Pre-submission: client-side preflight in Rork

Rork compiles into React Native components, so we can attach a lightweight check at the TextInput and ImagePicker layer. This isn't a security boundary. It's a noise filter that stops most obvious garbage before it ever leaves the device.

// app/components/PostComposer.tsx
import { useState } from "react";
import { TextInput, View, Pressable, Text } from "react-native";
 
const NG_PATTERNS = [
  /^https?:\/\/\S+$/,                    // URL-only posts not allowed
  /(.)\1{20,}/,                          // Same character repeated 21+ times
  /[\u{1F300}-\u{1F9FF}]{30,}/u,         // 30+ consecutive emojis
];
 
const MIN_CHARS = 2;
const MAX_CHARS = 1000;
 
type PreflightResult =
  | { ok: true }
  | { ok: false; reason: string };
 
function preflight(text: string): PreflightResult {
  const trimmed = text.trim();
  if (trimmed.length < MIN_CHARS) {
    return { ok: false, reason: `Posts must be at least ${MIN_CHARS} characters.` };
  }
  if (trimmed.length > MAX_CHARS) {
    // The check allows exactly MAX_CHARS, so the message says "or fewer"
    return { ok: false, reason: `Posts must be ${MAX_CHARS} characters or fewer.` };
  }
  for (const pattern of NG_PATTERNS) {
    if (pattern.test(trimmed)) {
      return { ok: false, reason: "This post format is not allowed." };
    }
  }
  return { ok: true };
}
 
export function PostComposer({ onSubmit }: { onSubmit: (text: string) => void }) {
  const [text, setText] = useState("");
  const [error, setError] = useState<string | null>(null);
 
  const handleSubmit = () => {
    const result = preflight(text);
    if (!result.ok) {
      setError(result.reason);
      return;
    }
    setError(null);
    onSubmit(text.trim());
    setText("");
  };
 
  return (
    <View>
      <TextInput
        value={text}
        onChangeText={setText}
        multiline
        placeholder="What's on your mind?"
        maxLength={MAX_CHARS + 100}
      />
      {error && <Text style={{ color: "#d33" }}>{error}</Text>}
      <Pressable onPress={handleSubmit}>
        <Text>Post</Text>
      </Pressable>
    </View>
  );
}

A critical note: this preflight does not replace server-side validation. The client is tamperable, so the server must run the same checks independently. Preflight exists for UX (instant error feedback, fewer wasteful API calls) and to reduce server load — nothing more.

The reason these three patterns are first in the filter is that they're exactly the patterns I saw concentrated during the early days of my UGC launch. URL-only posts are spammers probing the system, character repetition is script-kiddie smoke testing, and emoji spam is the classic signature of teenage pile-on participation. At one point about 30% of the 5,000 daily posts hit these patterns, and the preflight alone cut server load by a third.
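The rule that the server must repeat the client's checks is easiest to keep when both sides call one function. The sketch below assumes a shared module (the file name `shared/validatePost.ts` and the function name `validatePostText` are hypothetical) that the React Native client and the Worker both import. It reuses the three patterns and the length limits from the composer above.

```typescript
// Hypothetical shared module (e.g. shared/validatePost.ts), imported by both
// the client and the Worker so the rules cannot drift apart.
const MIN_CHARS = 2;
const MAX_CHARS = 1000;

const BLOCKED_PATTERNS: RegExp[] = [
  /^https?:\/\/\S+$/,            // URL-only post
  /(.)\1{20,}/,                  // same character 21+ times in a row
  /[\u{1F300}-\u{1F9FF}]{30,}/u, // 30+ consecutive emoji
];

type ValidationResult =
  | { ok: true; text: string }
  | { ok: false; reason: string };

function validatePostText(input: unknown): ValidationResult {
  // The server receives arbitrary JSON, so check the type before anything else.
  if (typeof input !== "string") return { ok: false, reason: "not_a_string" };
  const text = input.trim();
  if (text.length < MIN_CHARS) return { ok: false, reason: "too_short" };
  if (text.length > MAX_CHARS) return { ok: false, reason: "too_long" };
  if (BLOCKED_PATTERNS.some((pattern) => pattern.test(text))) {
    return { ok: false, reason: "blocked_format" };
  }
  return { ok: true, text };
}
```

Returning machine-readable reason codes lets the client translate them into user-facing messages while the server logs them as-is.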

On-submission auto-review: when to use OpenAI Moderation vs Perspective API

After comparing options in production, the stack I've settled on is: Perspective API for the first text classification pass, OpenAI Moderation to re-evaluate gray-zone text, and a dedicated vision classifier for images. Here's how each tool fits:

Perspective API (Google Jigsaw): Free. Six categories (TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT). Multi-language including Japanese, English, Spanish. Default rate limit is 1 QPS, requestable to higher tiers.
OpenAI Moderation API: Free (omni-moderation-latest). More granular categories: hate, harassment, self-harm, sexual, violence, etc. Accepts image input on omni-moderation-latest. Average latency around 200ms.
Vision-specific image classifiers: AWS Rekognition Content Moderation, Google Cloud Vision SafeSearch, Azure Content Moderator. For NSFW, violence, and weapons detection in images, these outperform general-purpose models.

In practice: Perspective handles initial text triage, OpenAI re-evaluates anything in the gray zone, and Rekognition handles images. Perspective is excellent at multilingual support, while OpenAI is meaningfully better at picking up sarcasm and self-harm hints from context. Running both serially adds latency, but if you wrap them in a Cloudflare Workers Queue, the user never notices.

Server-side pipeline: Cloudflare Workers + Queue + KV

Here's the core of the actual pipeline I run in production. The three things that matter: the submission API returns immediately, judgment runs in the background, and the frontend reads the verdict from KV.

// workers/src/post-router.ts
import { Hono } from "hono";
 
type Env = {
  POSTS_KV: KVNamespace;
  MODERATION_QUEUE: Queue<ModerationJob>;
  PERSPECTIVE_API_KEY: string;
  OPENAI_API_KEY: string;
};
 
type ModerationJob = {
  postId: string;
  userId: string;
  text: string;
  imageUrls: string[];
  createdAt: number;
};
 
const app = new Hono<{ Bindings: Env }>();
 
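app.post("/api/posts", async (c) => {
  // Malformed JSON falls back to an empty body and is rejected below
  const body = await c.req.json().catch(() => ({}));
  const text: unknown = body.text;
  const imageUrls: unknown = body.imageUrls ?? [];

  // Assumption: an auth layer in front of this route sets X-User-Id from a
  // verified session. Never trust this header if clients can send it directly.
  const userId = c.req.header("X-User-Id");
  if (!userId) {
    return c.json({ error: "Unauthorized" }, 401);
  }
 
  // Re-run preflight on the server (client is tamperable)
  if (typeof text !== "string" || text.trim().length < 2 || text.trim().length > 1000) {
    return c.json({ error: "Invalid length" }, 400);
  }
  if (!Array.isArray(imageUrls) || !imageUrls.every((u) => typeof u === "string")) {
    return c.json({ error: "Invalid imageUrls" }, 400);
  }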
 
  const postId = crypto.randomUUID();
  const post = {
    id: postId,
    userId,
    text,
    imageUrls,
    status: "pending_moderation" as const,
    createdAt: Date.now(),
  };
 
  // Save as pending and return 202 immediately
  await c.env.POSTS_KV.put(`post:${postId}`, JSON.stringify(post), {
    expirationTtl: 60 * 60 * 24 * 30, // 30-day cleanup target
  });
 
  // Enqueue moderation asynchronously
  await c.env.MODERATION_QUEUE.send({
    postId,
    userId,
    text,
    imageUrls,
    createdAt: post.createdAt,
  });
 
  return c.json({ postId, status: "pending_moderation" }, 202);
});
 
app.get("/api/posts/:id", async (c) => {
  const id = c.req.param("id");
  const post = await c.env.POSTS_KV.get(`post:${id}`, "json");
  if (!post) return c.json({ error: "Not found" }, 404);
  return c.json(post);
});
 
export default app;

Why does immediate return matter? Not just UX. When misjudgments happen, "rolling back a published post" is far more costly — both in recovery work and emotional load on the user — than "holding a post in review." Compared to the opaque shadowban approach you see on platforms like X, telling users explicitly "we're reviewing this" produces fewer disputes in my experience.
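On the client, the "we're reviewing this" state comes down to polling `GET /api/posts/:id` until the status leaves `pending_moderation`. A minimal sketch of that loop follows, assuming exponential backoff; the function names, the delays, and the attempt limit are illustrative choices rather than values from my production app.

```typescript
type PostStatus = "pending_moderation" | "published" | "rejected" | "needs_review";

// Delay before the next poll (attempt is 0-based): 1s, 2s, 4s, ... capped at maxMs.
function pollDelayMs(attempt: number, baseMs = 1000, maxMs = 15000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// fetchStatus and sleep are injected so the loop can be exercised without a network.
async function pollUntilSettled(
  fetchStatus: () => Promise<PostStatus>,
  maxAttempts = 8,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<PostStatus> {
  let status: PostStatus = "pending_moderation";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    status = await fetchStatus();
    // needs_review also ends polling: human review can take hours.
    if (status !== "pending_moderation") return status;
    await sleep(pollDelayMs(attempt));
  }
  return status; // still pending, so keep showing "Reviewing"
}
```

In the app, `fetchStatus` would wrap the GET request and return the `status` field of the stored post.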

The consumer worker that processes the queue looks like this:

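// workers/src/moderation-consumer.ts
// Env and ModerationJob are the types declared in post-router.ts;
// export them there or move them to a shared types file.
type Verdict = "approved" | "rejected" | "needs_review";

// Only the response fields this worker reads
type PerspectiveResponse = {
  attributeScores: Record<string, { summaryScore: { value: number } }>;
};
type OpenAIModerationResponse = {
  results: { flagged: boolean; categories: Record<string, boolean> }[];
};
type Post = {
  id: string;
  userId: string;
  text: string;
  imageUrls: string[];
  status: "pending_moderation" | "published" | "rejected" | "needs_review";
  createdAt: number;
  moderatedAt?: number;
  moderationVerdict?: Verdict;
  flaggedReason?: string;
};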
 
async function checkText(text: string, env: Env): Promise<Verdict> {
  // (1) First-pass triage with Perspective API
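  const perspectiveRes = await fetch(
    `https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=${env.PERSPECTIVE_API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        comment: { text },
        // SEXUALLY_EXPLICIT is not requested: it is an experimental,
        // English-only attribute, and its score is never read below.
        requestedAttributes: {
          TOXICITY: {},
          SEVERE_TOXICITY: {},
          THREAT: {},
        },
        languages: ["ja", "en"],
      }),
    },
  );
  // Throw on HTTP errors so the queue handler retries instead of reading missing fields
  if (!perspectiveRes.ok) {
    throw new Error(`Perspective API returned ${perspectiveRes.status}`);
  }
  const perspective = (await perspectiveRes.json()) as PerspectiveResponse;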
 
  const scores = perspective.attributeScores;
  const severe = scores.SEVERE_TOXICITY.summaryScore.value;
  const threat = scores.THREAT.summaryScore.value;
 
  if (severe > 0.8 || threat > 0.7) return "rejected";
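  if (severe < 0.3 && threat < 0.3) {
    // Skip OpenAI to save cost on clearly safe posts.
    // Caveat: Perspective has no self-harm attribute, so a non-toxic self-harm
    // post also exits here and never reaches the OpenAI category check below.
    return "approved";
  }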
 
  // (2) Re-evaluate gray zone with OpenAI
  const openai = await fetch("https://api.openai.com/v1/moderations", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "omni-moderation-latest", input: text }),
  }).then((r) => r.json() as Promise<OpenAIModerationResponse>);
 
  const result = openai.results[0];
  if (result.flagged) {
    // Self-harm and minor-protection violations are auto-rejected regardless of threshold
    if (
      result.categories["self-harm/intent"] ||
      result.categories["self-harm/instructions"] ||
      result.categories["sexual/minors"]
    ) {
      return "rejected";
    }
    // Other flagged content goes to human review
    return "needs_review";
  }
  return "approved";
}
 
export default {
  async queue(batch: MessageBatch<ModerationJob>, env: Env) {
    for (const message of batch.messages) {
      const job = message.body;
      try {
        const textVerdict = await checkText(job.text, env);
 
        const post = (await env.POSTS_KV.get(`post:${job.postId}`, "json")) as Post | null;
        if (!post) {
          message.ack();
          continue;
        }
 
        post.status =
          textVerdict === "approved"
            ? "published"
            : textVerdict === "rejected"
              ? "rejected"
              : "needs_review";
        post.moderatedAt = Date.now();
        post.moderationVerdict = textVerdict;
 
        await env.POSTS_KV.put(`post:${job.postId}`, JSON.stringify(post));
        message.ack();
      } catch (err) {
        // Log the cause, then retry on transient failures (DLQ on max retries)
        console.error("moderation failed", job.postId, err);
        message.retry({ delaySeconds: 30 });
      }
    }
  },
};

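The retry logic exists to keep posts from sitting in pending forever after a transient API failure. With max_retries = 3 and a dead_letter_queue configured in wrangler.toml, a message that still fails once its retries are used up moves to the dead-letter queue. The dead-letter queue only stores those messages; a consumer attached to it can then mark the post needs_review so it reaches a human instead of staying pending.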
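A dead-letter consumer can be sketched as follows. This is an illustration under stated assumptions, not my production code: `KVLike` and `DeadLetter` are minimal stand-ins for the real KV and queue message types so the block type-checks on its own, and the `moderation_unavailable` reason string is hypothetical.

```typescript
type PostStatus = "pending_moderation" | "published" | "rejected" | "needs_review";
type Post = { id: string; status: PostStatus; flaggedReason?: string; moderatedAt?: number };

// Pure step: a post whose moderation job exhausted its retries goes to human review.
function escalateAfterRetries(post: Post, now: number): Post {
  if (post.status !== "pending_moderation") return post; // already decided elsewhere
  return {
    ...post,
    status: "needs_review",
    flaggedReason: "moderation_unavailable", // hypothetical reason code
    moderatedAt: now,
  };
}

// Minimal stand-ins for the KV namespace and the queue message.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
interface DeadLetter {
  body: { postId: string };
  ack(): void;
}

// Call this from the queue() handler of the worker bound to the dead-letter queue.
async function handleDeadLetters(messages: DeadLetter[], kv: KVLike): Promise<void> {
  for (const message of messages) {
    const raw = await kv.get(`post:${message.body.postId}`);
    if (raw) {
      const updated = escalateAfterRetries(JSON.parse(raw) as Post, Date.now());
      await kv.put(`post:${updated.id}`, JSON.stringify(updated));
    }
    message.ack();
  }
}
```

Keeping the state change in a pure function makes the escalation rule testable without any Cloudflare bindings.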

User reports and the "instant mute" pattern

Reports are post-publication handling — they apply to content that's already live. The crucial design choice here is that the moment a report arrives, the offending content should disappear from the reporter's view (soft mute). Full deletion or permanent bans can come later. What matters first is making the reporter feel safe, which builds long-term trust.

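// workers/src/report-handler.ts
// Registered on the same Hono app as post-router.ts. Env also needs the
// REPORTS_KV, MUTES_KV, and COUNTERS_KV bindings used below.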
app.post("/api/posts/:id/report", async (c) => {
  const postId = c.req.param("id");
  const reporterId = c.req.header("X-User-Id");
  const { reason, detail } = await c.req.json();
 
  if (!reporterId || !reason) {
    return c.json({ error: "Bad request" }, 400);
  }
 
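  // One report per reporter per post: a repeat report must not inflate the counter
  const reportKey = `report:${postId}:${reporterId}`;
  if (await c.env.REPORTS_KV.get(reportKey)) {
    return c.json({ ok: true, duplicate: true });
  }

  const reportId = crypto.randomUUID();
  await c.env.REPORTS_KV.put(
    reportKey,
    JSON.stringify({
      reportId,
      postId,
      reporterId,
      reason,
      detail: typeof detail === "string" ? detail.slice(0, 500) : undefined,
      createdAt: Date.now(),
    }),
  );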
 
  // Soft-mute the post for the reporter immediately
  const muteKey = `mute:${reporterId}:${postId}`;
  await c.env.MUTES_KV.put(muteKey, "1");
 
  // Auto-flip to needs_review when the report count crosses the threshold
  const reportCount = await incrementReportCounter(c.env, postId);
  if (reportCount >= 3) {
    const post = (await c.env.POSTS_KV.get(`post:${postId}`, "json")) as Post | null;
    if (post && post.status === "published") {
      post.status = "needs_review";
      post.flaggedReason = "auto_threshold";
      await c.env.POSTS_KV.put(`post:${postId}`, JSON.stringify(post));
    }
  }
 
  return c.json({ ok: true, reportId });
});
 
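// KV has no atomic increment and is eventually consistent, so concurrent reports
// can lose a count. That is tolerable for a soft threshold; use a Durable Object
// if you need an exact counter.
async function incrementReportCounter(env: Env, postId: string): Promise<number> {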
  const key = `report_count:${postId}`;
  const current = parseInt((await env.COUNTERS_KV.get(key)) || "0", 10);
  const next = current + 1;
  await env.COUNTERS_KV.put(key, String(next), { expirationTtl: 60 * 60 * 24 * 7 });
  return next;
}

The "three reports flips to needs_review" threshold is calibrated from my own ops experience. A threshold of one lets a single malicious or mistaken report pull any post into review; five or more is too slow, and users churn while the content stays up. Three has been the sweet spot in practice: an occasional mistaken report no longer triggers a review on its own, and I've seen effectively zero false bans.
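The threshold only resists abuse if it counts distinct reporters rather than raw report events. A minimal sketch of that rule is below; the type and function names are hypothetical, and the assumption that each reporter counts once per post follows from the `report:${postId}:${reporterId}` key used above.

```typescript
type PostReport = { postId: string; reporterId: string };

const REPORT_THRESHOLD = 3;

// Count each reporter once per post, however many times they report it.
function distinctReporterCount(reports: PostReport[], postId: string): number {
  const reporters = new Set<string>();
  for (const report of reports) {
    if (report.postId === postId) reporters.add(report.reporterId);
  }
  return reporters.size;
}

function shouldFlagForReview(reports: PostReport[], postId: string): boolean {
  return distinctReporterCount(reports, postId) >= REPORT_THRESHOLD;
}
```

With this rule, one account reporting the same post three times still counts as a single report.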

Tiered sanctions and the appeal window

Apple App Store Review Guidelines 1.2 and Google Play Developer Program Policy require UGC apps to provide block, report, moderation, and appropriate consequences. Beyond that, the EU's DSA (Digital Services Act, fully in force since 2024) requires platforms to give sanctioned users a clear path to appeal.

The tiered sanctions I run look like this:

Tier 1 (soft warning): A modal on next app open: "Your post X may have violated community guidelines." The post isn't deleted. A 7-day cooldown applies before it counts as recurrence.
Tier 2 (48-hour suspension): Posting, commenting, and reporting are disabled. The account is read-only for 48 hours.
Tier 3 (7-day suspension): Same restrictions as Tier 2 but for a full week.
Tier 4 (permanent ban): All write actions disabled. Read-only forever. Account deletion is user-initiated only.
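The four tiers map naturally onto a small lookup. The sketch below assumes one tier of escalation per confirmed violation, which is my simplification: the list above does not spell out the escalation rule, and the 7-day cooldown is left out. The names `sanctionFor` and `canWrite` are hypothetical.

```typescript
type Sanction =
  | { tier: 1; kind: "warning" }
  | { tier: 2 | 3; kind: "suspension"; durationHours: number }
  | { tier: 4; kind: "permanent_ban" };

const HOUR_MS = 60 * 60 * 1000;

// Assumption: each confirmed violation moves the user up exactly one tier.
function sanctionFor(confirmedViolations: number): Sanction {
  if (confirmedViolations <= 1) return { tier: 1, kind: "warning" };
  if (confirmedViolations === 2) return { tier: 2, kind: "suspension", durationHours: 48 };
  if (confirmedViolations === 3) return { tier: 3, kind: "suspension", durationHours: 24 * 7 };
  return { tier: 4, kind: "permanent_ban" };
}

// Write access: warnings never block, bans always block, suspensions expire.
function canWrite(sanction: Sanction, appliedAt: number, now: number): boolean {
  switch (sanction.kind) {
    case "warning":
      return true;
    case "permanent_ban":
      return false;
    case "suspension":
      return now >= appliedAt + sanction.durationHours * HOUR_MS;
  }
}
```

Read access is deliberately not modeled: every tier above keeps it.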

At every tier, an "appeal" link must be visible, and the appeal must reach a human within 24 hours. This single design choice covers both the DSA's appeal expectations and what Apple and Google reviewers look for. Apple's App Privacy labels describe data collection rather than moderation features, so if you store reports and appeals, declare that data there under User Content.

When an appeal arrives, the re-evaluation must be done by a human, not the AI. If you let the same model re-judge the same input, it returns the same verdict and users get nothing. I block out 30 minutes weekly as "Appeal Review Time" and process them in batch.

Legal requirements: DSA, child-protection laws, Apple/Google guidelines

Compliance is unglamorous, but treating it lightly invites store rejection, regulatory orders, and at worst — fines. Here are the minimum requirements an indie developer needs to clear:

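EU DSA (fully applicable since February 2024): Obligations scale with size. Every hosting service, regardless of size, must offer a report (notice-and-action) form and explain moderation decisions to the affected user. The 45M figure (average monthly active EU users) is the threshold for very large online platforms, not for SMEs. Micro and small enterprises are exempt from some platform duties, including the annual transparency report, but an appeals path and published numbers are cheap to build and worth having anyway.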
Japan's revised Provider Liability Limitation Act (2022): Disclosure procedures for offending posts are streamlined. In practice, you need a public report channel and a reasonable response window.
Tokyo Metropolitan Youth Healthy Development Ordinance and similar prefectural laws: UGC services accessible to under-18s must "efficiently address" sexual, violent, and self-harm content.
Apple App Review Guideline 1.2: Block, report, moderation, and appropriate consequences are required. Violations risk app removal even after launch.
Google Play Developer Program Policy "User Generated Content": Equivalent to Apple's requirements, plus additional rules for the Designed for Families program.

Practically, the highest-leverage move is to publish a /legal/transparency-report page with quarterly numbers (reports received, auto-rejected, human-reviewed, appeals filed, appeals upheld vs overturned). One page covers the DSA's transparency expectations, shows users how moderation actually works, and pre-empts most store-review questions.
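The quarterly numbers can be derived from a plain event log. This is a minimal sketch assuming you record one event per moderation action; the event shape and field names are hypothetical. It reads "upheld" as the original sanction standing and "overturned" as the appeal succeeding, so adjust the naming if your report uses the terms differently.

```typescript
type ModerationEvent =
  | { type: "report_received" }
  | { type: "auto_rejected" }
  | { type: "human_reviewed" }
  | { type: "appeal_filed" }
  | { type: "appeal_resolved"; outcome: "sanction_upheld" | "sanction_overturned" };

type TransparencySummary = {
  reportsReceived: number;
  autoRejected: number;
  humanReviewed: number;
  appealsFiled: number;
  sanctionsUpheld: number;
  sanctionsOverturned: number;
};

// Fold one quarter's events into the counts shown on the transparency page.
function summarize(events: ModerationEvent[]): TransparencySummary {
  const summary: TransparencySummary = {
    reportsReceived: 0,
    autoRejected: 0,
    humanReviewed: 0,
    appealsFiled: 0,
    sanctionsUpheld: 0,
    sanctionsOverturned: 0,
  };
  for (const event of events) {
    switch (event.type) {
      case "report_received": summary.reportsReceived++; break;
      case "auto_rejected": summary.autoRejected++; break;
      case "human_reviewed": summary.humanReviewed++; break;
      case "appeal_filed": summary.appealsFiled++; break;
      case "appeal_resolved":
        if (event.outcome === "sanction_upheld") summary.sanctionsUpheld++;
        else summary.sanctionsOverturned++;
        break;
    }
  }
  return summary;
}
```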

Common mistakes and pitfalls

Patterns I've seen indie developers (myself included) fall into:

1. "Delete instantly on report"

This is a gift to bad actors. Coordinated mass-reporting from multiple accounts can silence anyone they dislike, killing community self-regulation. Always enforce report-count thresholds plus human review.

2. Treating AI moderation as the final verdict

OpenAI Moderation and Perspective API both have ~3-8% misclassification rates on English (higher on Japanese). Make AI the final judge and you'll spend your days answering "why was my post deleted?" tickets. Use AI to split into auto-reject (clearly bad), auto-approve (clearly fine), and needs_review (gray zone). Humans handle the gray.

3. Forgetting the self-harm and minor-protection exception

Standard moderation rejects when scores cross a threshold. The categories self-harm/intent, self-harm/instructions, and sexual/minors are different — these must auto-reject regardless of threshold, and self-harm cases require an immediate support-resource link. This isn't just an Apple/Google requirement; it's a baseline ethical line.
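The exception can be written as a pure function over the moderation result. The sketch below assumes the `flagged` and `categories` fields of the OpenAI Moderation response; the `Decision` type, the `decide` function, and the `showSupportResources` flag are hypothetical names for illustration.

```typescript
type ModerationResult = { flagged: boolean; categories: Record<string, boolean> };
type Decision = {
  verdict: "approved" | "rejected" | "needs_review";
  showSupportResources: boolean;
};

// Categories that are rejected outright, whatever the scores are.
const HARD_REJECT_CATEGORIES = ["self-harm/intent", "self-harm/instructions", "sexual/minors"];

function decide(result: ModerationResult): Decision {
  const hit = (name: string): boolean => result.categories[name] === true;
  // Any self-harm category, including the general one, triggers the support link.
  const selfHarm = Object.keys(result.categories).some(
    (name) => name.startsWith("self-harm") && hit(name),
  );
  if (HARD_REJECT_CATEGORIES.some(hit)) {
    return { verdict: "rejected", showSupportResources: selfHarm };
  }
  if (result.flagged) {
    return { verdict: "needs_review", showSupportResources: selfHarm };
  }
  return { verdict: "approved", showSupportResources: false };
}
```

Separating the support-resource flag from the verdict lets the client show help links even when the post itself goes to human review rather than being rejected.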

4. Dropping multilingual support

Even Japanese-language apps get flooded with English spam. Perspective's languages field declares the language the comment is written in; leave it out and the API tries to auto-detect. If you hardcode ["ja"], English posts are scored as if they were Japanese and the results are unreliable. Detect the language per request with a library like franc and pass the matching code.
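Per-request detection can be illustrated without any dependency. The block below is a crude stand-in for a real detector such as franc, not a replacement: it only separates Japanese script from everything else, it will also label Chinese text as "ja" because the two share ideographs, and the function names are hypothetical.

```typescript
// Hiragana, katakana, and CJK unified ideographs.
const JAPANESE_SCRIPT = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/;

type LanguageCode = "ja" | "en";

// Heuristic only: any Japanese-script character means "ja", otherwise "en".
function guessLanguage(text: string): LanguageCode {
  return JAPANESE_SCRIPT.test(text) ? "ja" : "en";
}

// Build the Perspective request body with the detected language for this post.
function buildPerspectiveBody(text: string) {
  return {
    comment: { text },
    requestedAttributes: { SEVERE_TOXICITY: {}, THREAT: {} },
    languages: [guessLanguage(text)],
  };
}
```

Swapping `guessLanguage` for a proper detector later changes one function and leaves the request builder untouched.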

5. Anonymizing reporter IDs entirely

It's tempting to fully anonymize reporters, but you need reporter-ID-level counts to detect mass-reporting attacks. A user filing 100 reports in a day is almost certainly weaponizing the report system.
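Spotting that pattern takes one pass over recent reports. A minimal sketch, assuming each stored report keeps `reporterId` and `createdAt` as in the report handler above; the function name and the default limit of 100 per 24 hours are illustrative and should be tuned to your traffic.

```typescript
type ReportRecord = { reporterId: string; createdAt: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// Return the reporters who filed at least `limit` reports inside the window ending at `now`.
function heavyReporters(
  reports: ReportRecord[],
  now: number,
  limit = 100,
  windowMs = DAY_MS,
): string[] {
  const counts = new Map<string, number>();
  for (const report of reports) {
    if (now - report.createdAt > windowMs) continue; // outside the window
    counts.set(report.reporterId, (counts.get(report.reporterId) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count >= limit)
    .map(([reporterId]) => reporterId);
}
```

Reports from flagged accounts can then be excluded from the three-report threshold until a human has looked at them.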

6. Not monitoring Cloudflare Workers Queue backlog

When the queue piles up, posts stay "in review" for hours from the user's perspective. From day one, watch the queue's backlog metric in the Cloudflare dashboard (it is also exposed through the GraphQL Analytics API) and alert when the backlog exceeds 100.

Pre-launch checklist

Run through this list the day before release. This is the same checklist I personally walk through, end to end:

Submission API returns 202 and KV stores the post as pending_moderation
Queue picks up the post within 5 seconds and the worker runs moderation
Obviously bad content (insults, self-harm signals, identity attacks) lands in rejected on real-input testing
Obviously safe content (ordinary diary entries, photos) lands in approved
Filing 3 reports auto-flips the post to needs_review
Appeal-form submissions reach the admin email
The transparency report renders at /legal/transparency-report
App Store Connect: stored user content (posts, reports, appeals) is declared in the App Privacy details, and the review notes explain how block/report/moderation work
Google Play Console "User-generated Content" questionnaire answered (yes / has reporting / has moderation)
Self-harm support links (region-appropriate, e.g. 988 Suicide & Crisis Lifeline) appear when self-harm classification fires

Clear this list and your UGC app is in good shape for Apple/Google review, the DSA, and domestic ordinances. There's no such thing as perfect moderation, but the combination of three-layer design, AI auto-screen, human review, and an appeal path is, in my view, the strongest setup an indie developer can realistically ship today.

After launch, spend 30 minutes a day reviewing the needs_review queue manually for the first week. You'll start spotting AI misjudgment patterns, and your threshold tuning will improve dramatically. Today, the smallest possible first step is to grab a Perspective API key and run a single classification against staging.
