
---
title: Migrating the Vision Script from OpenAI to Claude
description: How scripts/vision.ts was rewritten to use the Anthropic SDK with claude-opus-4-6 and tool use instead of OpenAI's json_schema response format.
pubDate: 2026-03-19T12:00:00+01:00
category: en/development
tags:
  - astro
  - photography
  - openai
seriesParent: obsidian-to-vps-pipeline-with-sync-pull-and-redeploy
seriesOrder: 3
---

The script that generates photo sidecar files — `scripts/vision.ts` — was originally written against the OpenAI API. Moving it to Claude was mostly a rewrite of the request envelope; the surrounding infrastructure — EXIF extraction, concurrency control, batching, CLI flags — stayed exactly the same.

## What the script does

`scripts/vision.ts` processes new JPG files in the photo albums directory. For each image without a `.json` sidecar, it:


1. Extracts EXIF metadata with `exiftool` (camera, lens, aperture, ISO, focal length, shutter speed, GPS)
2. Sends the image to an AI Vision API to generate alt text, title suggestions, and tags
3. Merges both into a JSON file written next to the image

The resulting sidecar drives the photo stream on this site — alt text for accessibility, titles for the detail page, EXIF for the metadata panel.
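For illustration, a merged sidecar could look roughly like this. The key layout and EXIF field names here are hypothetical; only `title_ideas`, `description`, and `tags` correspond to the tool schema shown further down:

```json
{
  "exif": {
    "camera": "…",
    "lens": "…",
    "aperture": "f/2.8",
    "iso": 200,
    "focalLength": "35 mm",
    "shutterSpeed": "1/250"
  },
  "vision": {
    "title_ideas": ["…", "…"],
    "description": "Alt text describing the photo.",
    "tags": ["…"]
  }
}
```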

## Dependency and environment

```bash
# before
pnpm add openai

# after
pnpm add @anthropic-ai/sdk
```

```bash
# before
OPENAI_API_KEY=sk-...

# after
ANTHROPIC_API_KEY=sk-ant-...
```

## Client

```ts
// before
import OpenAI from "openai";
const openai = new OpenAI();

// after
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({ maxRetries: 0 });
```

`maxRetries: 0` disables the SDK's built-in retry behaviour. The script manages its own retry loop with configurable backoff, so double-retrying would be redundant.
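The post doesn't show that retry loop; as a rough sketch, a self-managed backoff wrapper around an API call could look like this (the constants, helper names, and loop shape are illustrative assumptions, not the script's actual code):

```ts
// Illustrative sketch of a self-managed retry loop with exponential backoff.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 2_000;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// A server-provided retry-after hint (in ms) overrides exponential backoff.
function backoffDelayMs(attempt: number, retryAfterMs: number | null): number {
  return retryAfterMs ?? BASE_DELAY_MS * 2 ** attempt;
}

// Retries fn on retryable errors, rethrowing once attempts are exhausted.
async function withRetries<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!isRetryable(error) || attempt >= MAX_ATTEMPTS - 1) throw error;
      await sleep(backoffDelayMs(attempt, null));
    }
  }
}
```

With a wrapper like this in place, leaving the SDK's own retries enabled would multiply the attempt count, which is why the client above turns them off.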

## Structured output

**Problem:** The script needs a guaranteed-shape JSON response — title ideas, description, tags — with no parsing surprises. OpenAI and Claude approach this differently.

**Implementation (before):** OpenAI used a `json_schema` response format to constrain the model output:

```ts
const completion = await openai.chat.completions.create({
  model: "gpt-5.1-chat-latest",
  max_completion_tokens: 2048,
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "visionResponse",
      strict: true,
      schema: { ... },
    },
  },
  messages: [{ role: "user", content: [...] }],
});

const result = JSON.parse(completion.choices[0].message.content);
```

**Implementation (after):** Claude uses tool use with `tool_choice: { type: "tool" }` to force a specific tool call — this guarantees the model always responds with the schema, with no JSON parsing step:

```ts
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 2048,
  tools: [{
    name: "vision_response",
    description: "Return the vision analysis of the image.",
    input_schema: {
      type: "object",
      additionalProperties: false,
      properties: {
        title_ideas: { type: "array", items: { type: "string" } },
        description: { type: "string" },
        tags: { type: "array", items: { type: "string" } },
      },
      required: ["title_ideas", "description", "tags"],
    },
  }],
  tool_choice: { type: "tool", name: "vision_response" },
  messages: [{
    role: "user",
    content: [
      {
        type: "image",
        source: { type: "base64", media_type: "image/jpeg", data: encodedImage },
      },
      { type: "text", text: prompt },
    ],
  }],
});

const toolUseBlock = response.content.find((b) => b.type === "tool_use");
const result = toolUseBlock.input; // already a typed object, no JSON.parse needed
```

**Solution:** `toolUseBlock.input` arrives as a typed object — no `JSON.parse`, no schema re-validation. The image content block format also differs: OpenAI uses `{ type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } }`, while Anthropic uses a dedicated `source` block with `type: "base64"` and a separate `media_type` field.
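The SDK types a tool_use block's `input` loosely, so downstream code typically narrows it to a concrete shape. A hedged sketch, assuming an interface that mirrors the `input_schema` above (the names `VisionResult` and `parseVisionResult` are mine, not the script's):

```ts
// Hypothetical TypeScript shape mirroring the tool's input_schema.
interface VisionResult {
  title_ideas: string[];
  description: string;
  tags: string[];
}

// tool_choice guarantees the tool gets called, but input is typed loosely,
// so a small runtime check makes the narrowing explicit.
function parseVisionResult(input: unknown): VisionResult {
  const candidate = (input ?? {}) as Partial<VisionResult>;
  if (
    !Array.isArray(candidate.title_ideas) ||
    typeof candidate.description !== "string" ||
    !Array.isArray(candidate.tags)
  ) {
    throw new Error("vision_response input did not match the expected shape");
  }
  return candidate as VisionResult;
}
```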

## Rate limit handling

**Problem:** The script catches rate limit errors and retries with exponential backoff. The old code detected 429s by poking at a raw `status` property and regex-parsing the error message for a retry-after hint — fragile in both directions.

**Implementation:** Switch to the SDK's typed exception class:

```ts
// before — checking a raw status property
function isRateLimitError(error: unknown): boolean {
  return (error as { status?: number }).status === 429;
}

function extractRetryAfterMs(error: unknown): number | null {
  // parsed "Please try again in Xs" from error message
}

// after — using Anthropic's typed exception
function isRateLimitError(error: unknown): boolean {
  return error instanceof Anthropic.RateLimitError;
}

function extractRetryAfterMs(error: unknown): number | null {
  if (!(error instanceof Anthropic.RateLimitError)) return null;
  const retryAfter = error.headers?.["retry-after"];
  if (retryAfter) {
    const seconds = Number.parseFloat(retryAfter);
    if (Number.isFinite(seconds) && seconds > 0) return Math.ceil(seconds * 1000);
  }
  return null;
}
```

**Solution:** `instanceof Anthropic.RateLimitError` is the detector, and the `retry-after` header is read off the exception directly. Everything else — EXIF extraction, concurrency control, batching, file writing, CLI flags — stayed exactly the same.

## Why `claude-opus-4-6`

`claude-opus-4-6` is Anthropic's most capable model and handles dense visual scenes, low-light photography, and culturally specific subjects well. For a batch script that runs offline before a deploy, quality matters more than latency.

## What to take away

- Claude's tool use with `tool_choice: { type: "tool" }` is a cleaner way to get structured output than OpenAI's `json_schema` — the result comes back as a typed object, no `JSON.parse` step.
- Image payloads differ: OpenAI takes a data-URL in `image_url`; Anthropic wants a `source` block with explicit `media_type`.
- Use the SDK's typed exception (`Anthropic.RateLimitError`) and read `retry-after` from its headers — don't regex the error message.
- Set `maxRetries: 0` on the SDK if you already have your own backoff loop. Double-retrying wastes tokens and quota.