Add new posts for Image Voice Memos, Initial VPS Setup on Debian, Local Webmention Avatars, Security Headers for Astro with Caddy, and Setting up Forgejo Actions Runner
- Created a new post on Image Voice Memos detailing a macOS app for browsing photos and recording voice memos with automatic transcription.
- Added a guide for Initial VPS Setup on Debian covering system updates, user creation, and SSH hardening.
- Introduced a post on caching webmention avatars locally at build time to enhance privacy and comply with CSP.
- Documented the implementation of security headers for an Astro site behind Caddy, focusing on GDPR compliance and CSP.
- Set up a Forgejo Actions runner for self-hosted CI/CD, detailing the installation and configuration process for automated deployments.
parent: 9d22d93361
commit: 4bf4eb03b1
69 changed files with 4904 additions and 344 deletions
---
title: Migrating the Vision Script from OpenAI to Claude
description: 'How scripts/vision.ts was rewritten to use the Anthropic SDK with claude-opus-4-6 and tool use instead of OpenAI''s json_schema response format.'
pubDate: '2026-03-19T12:00:00+01:00'
category: en/development
tags:
  - astro
  - photography
  - openai
seriesParent: obsidian-to-vps-pipeline-with-sync-pull-and-redeploy
seriesOrder: 3
---

The script that generates photo sidecar files — `scripts/vision.ts` — was originally written against the OpenAI API. Moving it to Claude was mostly a rewrite of the request envelope; the surrounding infrastructure — EXIF extraction, concurrency control, batching, CLI flags — stayed exactly the same.

## What the script does

`scripts/vision.ts` processes new JPG files in the photo albums directory. For each image without a `.json` sidecar, it:

1. Extracts EXIF metadata with `exiftool` (camera, lens, aperture, ISO, focal length, shutter speed, GPS)
2. Sends the image to an AI Vision API to generate alt text, title suggestions, and tags
3. Merges both into a JSON file written next to the image

The resulting sidecar drives the photo stream on this site — alt text for accessibility, titles for the detail page, EXIF for the metadata panel.

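The merge in step 3 can be sketched as follows. The field names here are illustrative assumptions, not the script's actual sidecar schema:

```ts
// Sketch of the step-3 merge: EXIF fields from exiftool plus the AI vision
// output, combined into the object written next to the image as <name>.json.
// All field names are illustrative, not the script's real schema.
interface ExifData {
  camera?: string;
  lens?: string;
  aperture?: string;
  iso?: number;
}

interface VisionResult {
  title_ideas: string[];
  description: string;
  tags: string[];
}

function buildSidecar(exif: ExifData, vision: VisionResult) {
  return {
    ...exif,
    alt: vision.description,        // alt text for accessibility
    titleIdeas: vision.title_ideas, // title candidates for the detail page
    tags: vision.tags,
  };
}
```
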
## Dependency and environment

```bash
# before
pnpm add openai

# after
pnpm add @anthropic-ai/sdk
```

```bash
# before
OPENAI_API_KEY=sk-...

# after
ANTHROPIC_API_KEY=sk-ant-...
```

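The SDK picks up `ANTHROPIC_API_KEY` from the environment on its own, but a batch script benefits from failing fast at startup rather than on the first request. A minimal sketch (the helper name is mine, not from `scripts/vision.ts`):

```ts
// Fail fast on a missing environment variable instead of erroring
// mid-batch on the first API request. requireEnv is an illustrative
// helper, not part of scripts/vision.ts.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`${name} is not set; export it before running the script`);
  }
  return value;
}

// usage: const apiKey = requireEnv("ANTHROPIC_API_KEY");
```
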
## Client

```ts
// before
import OpenAI from "openai";
const openai = new OpenAI();

// after
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({ maxRetries: 0 });
```

`maxRetries: 0` disables the SDK's built-in retry behaviour. The script manages its own retry loop with configurable backoff, so double-retrying would be redundant.

## Structured output

**Problem:** The script needs a guaranteed-shape JSON response — title ideas, description, tags — with no parsing surprises. OpenAI and Claude approach this differently.

**Implementation (before):** OpenAI used a `json_schema` response format to constrain the model output:

```ts
const completion = await openai.chat.completions.create({
  model: "gpt-5.1-chat-latest",
  max_completion_tokens: 2048,
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "visionResponse",
      strict: true,
      schema: { ... },
    },
  },
  messages: [{ role: "user", content: [...] }],
});

const result = JSON.parse(completion.choices[0].message.content);
```

**Implementation (after):** Claude uses tool use with `tool_choice: { type: "tool" }` to force a specific tool call — this guarantees the model always responds with the schema, with no JSON parsing step:

```ts
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 2048,
  tools: [{
    name: "vision_response",
    description: "Return the vision analysis of the image.",
    input_schema: {
      type: "object",
      additionalProperties: false,
      properties: {
        title_ideas: { type: "array", items: { type: "string" } },
        description: { type: "string" },
        tags: { type: "array", items: { type: "string" } },
      },
      required: ["title_ideas", "description", "tags"],
    },
  }],
  tool_choice: { type: "tool", name: "vision_response" },
  messages: [{
    role: "user",
    content: [
      {
        type: "image",
        source: { type: "base64", media_type: "image/jpeg", data: encodedImage },
      },
      { type: "text", text: prompt },
    ],
  }],
});

const toolUseBlock = response.content.find((b) => b.type === "tool_use");
if (!toolUseBlock) throw new Error("No tool_use block in response");
const result = toolUseBlock.input; // already a typed object, no JSON.parse needed
```

**Solution:** `toolUseBlock.input` arrives as a typed object — no `JSON.parse`, no schema re-validation. The image content block format also differs: OpenAI uses `{ type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } }`, while Anthropic uses a dedicated `source` block with `type: "base64"` and a separate `media_type` field.

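The payload difference can be made concrete with two small builders that take the same JPEG buffer. Both helper names are mine, not code from the script:

```ts
// Build the image content block each API expects from the same JPEG buffer.
// Both helpers are illustrative, not code from scripts/vision.ts.
function openaiImageBlock(jpeg: Buffer) {
  return {
    type: "image_url",
    image_url: { url: `data:image/jpeg;base64,${jpeg.toString("base64")}` },
  };
}

function anthropicImageBlock(jpeg: Buffer) {
  return {
    type: "image",
    source: { type: "base64", media_type: "image/jpeg", data: jpeg.toString("base64") },
  };
}
```
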
## Rate limit handling

**Problem:** The script catches rate limit errors and retries with exponential backoff. The old code detected 429s by poking at a raw status property and regex-parsing the error message for a retry-after hint — fragile in both directions.

**Implementation:** Switch to the SDK's typed exception class:

```ts
// before — checking a raw status property
function isRateLimitError(error: unknown): boolean {
  return (error as { status?: number }).status === 429;
}

function extractRetryAfterMs(error: unknown): number | null {
  // parsed "Please try again in Xs" from the error message
  const match = /try again in ([\d.]+)s/.exec(error instanceof Error ? error.message : "");
  return match ? Math.ceil(Number.parseFloat(match[1]) * 1000) : null;
}

// after — using Anthropic's typed exception
function isRateLimitError(error: unknown): boolean {
  return error instanceof Anthropic.RateLimitError;
}

function extractRetryAfterMs(error: unknown): number | null {
  if (!(error instanceof Anthropic.RateLimitError)) return null;
  const retryAfter = error.headers?.["retry-after"];
  if (retryAfter) {
    const seconds = Number.parseFloat(retryAfter);
    if (Number.isFinite(seconds) && seconds > 0) return Math.ceil(seconds * 1000);
  }
  return null;
}
```

**Solution:** `instanceof Anthropic.RateLimitError` is the detector, and the `retry-after` header is read off the exception directly. Everything else — EXIF extraction, concurrency control, batching, file writing, CLI flags — stayed exactly the same.

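Wired together, a retry loop around one vision request looks roughly like this sketch. The constants and the callback parameters are assumptions; the two helpers above would be passed in as the predicates:

```ts
// Sketch of a retry loop with exponential backoff and a retry-after hint.
// MAX_ATTEMPTS and BASE_DELAY_MS are assumed constants; the real limits
// come from CLI flags in scripts/vision.ts.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 2_000;

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,        // e.g. isRateLimitError
  retryAfterMs: (error: unknown) => number | null, // e.g. extractRetryAfterMs
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!isRetryable(error) || attempt >= MAX_ATTEMPTS) throw error;
      // Prefer the server's retry-after hint, fall back to exponential backoff.
      const delay = retryAfterMs(error) ?? BASE_DELAY_MS * 2 ** (attempt - 1);
      await sleep(delay);
    }
  }
}
```
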
## Why claude-opus-4-6

`claude-opus-4-6` is Anthropic's most capable model and handles dense visual scenes, low-light photography, and culturally specific subjects well. For a batch script that runs offline before a deploy, quality matters more than latency.

## What to take away

- Claude's tool use with `tool_choice: { type: "tool" }` is a cleaner way to get structured output than OpenAI's `json_schema` — the result comes back as a typed object, no `JSON.parse` step.
- Image payloads differ: OpenAI takes a data-URL in `image_url`; Anthropic wants a `source` block with explicit `media_type`.
- Use the SDK's typed exception (`Anthropic.RateLimitError`) and read `retry-after` from its headers — don't regex the error message.
- Set `maxRetries: 0` on the SDK if you already have your own backoff loop. Double-retrying wastes tokens and quota.