---
title: 'Caching webmention avatars locally at build time'
description: 'A small Astro helper that downloads webmention author photos during the build, dedupes them, and serves them locally — for a strict CSP, stronger privacy, and better availability.'
pubDate: 'Apr 22 2026'
category: en/tech
tags:
- astro
- webmentions
- privacy
- csp
- indieweb
- gdpr
translationKey: local-webmention-avatars
---
This post follows up on my [security-header rollout](/en/security-headers-astro-caddy/). Once a strict Content Security Policy with `img-src 'self' data:` was live, the next scan surfaced a new issue: mention photos on the post pages were gone.
The obvious fix — opening `img-src` to `'self' data: https:` — treats the symptom, not the actual problem. This post describes the approach that solves both at once: the CSP stays strict, and readers never make a single request to the external avatar host.
## Problem
My `Webmentions.astro` component renders the facepile and replies with external avatar URLs pulled straight from the webmention.io API. Simplified to the relevant part:
```astro
---
// Simplified — the real component renders a full facepile and reply list.
const all = await fetchMentions(targetStr);
---
{all.map((m) => (
  <img src={m.author?.photo} alt={m.author?.name} width="48" height="48" loading="lazy" />
))}
```
Three problems in one:
1. **CSP conflict.** `img-src 'self' data:` blocks external images. The alternative `img-src 'self' data: https:` would water the directive down to near-uselessness.
2. **Reader privacy.** Every visit to a page with mentions causes the reader's browser to hit `avatars.webmention.io` — a tracking vector they never opted into. The avatar host sees IP address, user agent, referer, and the path of every post with mentions. Under GDPR Art. 5.1.c (data minimisation) that's unnecessary disclosure of personal data.
3. **Availability.** If webmention.io is down or slow, avatars are missing or delay rendering.
All three disappear if the browser never loads the avatars from the third party in the first place.
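To make the CSP conflict concrete, here is a rough sketch (my own, not a real CSP parser) of which image sources `img-src 'self' data:` permits: only `data:` URIs and same-origin URLs get through.

```typescript
// Rough illustration, not a real CSP parser: which image URLs would
// `img-src 'self' data:` allow for a given site origin?
function passesImgSrcSelfData(src: string, origin: string): boolean {
  if (src.startsWith('data:')) return true; // data: URIs are allowed
  if (src.startsWith('/')) return true;     // root-relative path resolves to 'self'
  try {
    return new URL(src).origin === origin;  // absolute URL must match 'self'
  } catch {
    return false;                           // unparseable source: blocked
  }
}
```

An external `https://avatars.webmention.io/...` URL fails this check while a local `/images/webmention/...` path passes, which is exactly the gap the build-time cache closes.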
## The idea
Webmentions are already fetched **at build time**, not at read time. Which means every avatar URL is known while the post is being rendered. So: download the images in the same pass, store them locally, and rewrite the HTML to point at the local path.
The implementation needs four pieces:
- A cache directory under `public/` that survives across builds.
- A mirror into `dist/`, because Astro copies `public/` to `dist/` **before** rendering — files added to `public/` later would be missing from the current build's output.
- Deduplication so the same avatar doesn't get downloaded twice when one author has multiple mentions on the same page (liked and reposted, for example).
- A garbage-collection pass at build end that removes orphaned avatars — otherwise files pile up forever once a user deletes their like or comment.
## Implementation
The helper file `src/lib/avatar-cache.ts`:
```ts
import { createHash } from 'node:crypto';
import { existsSync } from 'node:fs';
import { copyFile, mkdir, readdir, unlink, writeFile } from 'node:fs/promises';
import path from 'node:path';
const PUBLIC_DIR = path.resolve(process.cwd(), 'public', 'images', 'webmention');
const DIST_DIR = path.resolve(process.cwd(), 'dist', 'images', 'webmention');
const URL_PATH = '/images/webmention';
const EXT_BY_MIME: Record<string, string> = {
'image/jpeg': 'jpg',
'image/png': 'png',
'image/webp': 'webp',
'image/gif': 'gif',
'image/avif': 'avif',
'image/svg+xml': 'svg',
};
const memo = new Map<string, Promise<string | null>>();
const usedFilenames = new Set<string>();
async function mirrorToDist(filename: string, srcPath: string) {
try {
await mkdir(DIST_DIR, { recursive: true });
const dest = path.join(DIST_DIR, filename);
if (!existsSync(dest)) await copyFile(srcPath, dest);
} catch {
// dist/ doesn't exist during `astro dev` — ignore
}
}
async function download(url: string): Promise<string | null> {
const hash = createHash('sha1').update(url).digest('hex').slice(0, 16);
await mkdir(PUBLIC_DIR, { recursive: true });
try {
const res = await fetch(url, { signal: AbortSignal.timeout(10_000) });
if (!res.ok) return null;
const mime = (res.headers.get('content-type') ?? '').split(';')[0].trim().toLowerCase();
const ext = EXT_BY_MIME[mime] ?? 'jpg';
const filename = `${hash}.${ext}`;
const filepath = path.join(PUBLIC_DIR, filename);
if (!existsSync(filepath)) {
const buf = Buffer.from(await res.arrayBuffer());
await writeFile(filepath, buf);
}
usedFilenames.add(filename);
await mirrorToDist(filename, filepath);
return `${URL_PATH}/${filename}`;
} catch {
return null;
}
}
export function cacheAvatar(url: string | undefined): Promise<string | null> {
if (!url) return Promise.resolve(null);
if (!/^https?:\/\//i.test(url)) return Promise.resolve(url);
let pending = memo.get(url);
if (!pending) {
pending = download(url);
memo.set(url, pending);
}
return pending;
}
async function sweepDir(dir: string): Promise<number> {
if (!existsSync(dir)) return 0;
const entries = await readdir(dir);
let removed = 0;
await Promise.all(
entries.map(async (name) => {
if (!usedFilenames.has(name)) {
await unlink(path.join(dir, name));
removed++;
}
}),
);
return removed;
}
export async function sweepCache(): Promise<{ removed: number }> {
const publicRemoved = await sweepDir(PUBLIC_DIR);
await sweepDir(DIST_DIR);
return { removed: publicRemoved };
}
```
Four details that are easy to miss:
- **SHA1 of the URL as filename** dedupes deterministically. The same avatar always gets the same filename, regardless of how many builds it appears in.
- **MIME type from the response header** drives the extension. Relying on the URL suffix is fragile — webmention.io serves some avatars behind URLs without an extension.
- **`memo` map** prevents duplicate downloads within a build. `Webmentions.astro` calls `cacheAvatar` via `Promise.all`; without memoisation the same author who liked and reposted the same page would be fetched twice in parallel.
- **`usedFilenames` set plus `sweepCache`** turns this into a mark-and-sweep cache. Every cache hit during the build marks a filename as "used"; at build end the sweep deletes anything unmarked. Without it, avatars from deleted mentions (a like taken back, a comment removed) would linger in the folder forever.
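The filename derivation is easy to sanity-check in isolation. This standalone sketch mirrors the hashing in `download` above (the function name `avatarFilename` is mine):

```typescript
import { createHash } from 'node:crypto';

// Mirrors the cache's naming scheme: first 16 hex chars of the URL's SHA-1,
// plus an extension. Same URL in, same filename out, on every build.
function avatarFilename(url: string, ext = 'jpg'): string {
  const hash = createHash('sha1').update(url).digest('hex').slice(0, 16);
  return `${hash}.${ext}`;
}
```

Two mentions by the same author therefore resolve to the same file, and a re-run never produces a second copy.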
In `Webmentions.astro` the cache step runs right after `fetchMentions`:
```astro
---
import { cacheAvatar } from '~/lib/avatar-cache';
// ...
const all = await fetchMentions(targetStr);
await Promise.all(
all.map(async (m) => {
if (m.author?.photo) {
const local = await cacheAvatar(m.author.photo);
if (local) m.author.photo = local;
else delete m.author.photo;
}
}),
);
---
```
If the download fails (timeout, 404, whatever), `author.photo` is removed — the component then falls back to the initial-based avatar automatically.
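The fallback component itself isn't shown in this post, but an initial-based avatar typically reduces to something like this (purely illustrative; `avatarInitials` is not a name from the codebase):

```typescript
// Hypothetical sketch of an initial-based fallback: derive one or two
// letters from the author's name when no photo is available.
function avatarInitials(name: string | undefined): string {
  if (!name?.trim()) return '?';
  const parts = name.trim().split(/\s+/);
  const first = parts[0][0] ?? '';
  const last = parts.length > 1 ? parts[parts.length - 1][0] ?? '' : '';
  return (first + last).toUpperCase();
}
```

The component renders those letters on a colored circle instead of an `<img>`, so a failed download degrades gracefully rather than leaving a broken image.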
### Garbage collection as an Astro integration
`sweepCache` can only run once every page has been rendered — any earlier and it would delete avatars a later page still needs. Astro's `astro:build:done` hook is exactly the right moment.
The integration `src/integrations/avatar-cache-sweep.ts`:
```ts
import type { AstroIntegration } from 'astro';
import { sweepCache } from '../lib/avatar-cache';
export default function avatarCacheSweep(): AstroIntegration {
return {
name: 'avatar-cache-sweep',
hooks: {
'astro:build:done': async ({ logger }) => {
const { removed } = await sweepCache();
if (removed > 0) {
logger.info(`swept ${removed} orphaned webmention avatar${removed === 1 ? '' : 's'}`);
}
},
},
};
}
```
Registered in `astro.config.mjs`:
```js
import { defineConfig } from 'astro/config';
import mdx from '@astrojs/mdx';
import sitemap from '@astrojs/sitemap';
import avatarCacheSweep from './src/integrations/avatar-cache-sweep';
export default defineConfig({
integrations: [
mdx(),
sitemap({ /* ... */ }),
avatarCacheSweep(),
],
});
```
A useful side effect: `astro:build:done` never fires in `astro dev`. The sweep only runs on `astro build` — exactly where it matters. During development the cache stays untouched, so a dev server can't accidentally evict an avatar that the next production build still needs.
### Why the `/images/webmention/` path
My first attempt used `/webmention-avatars/` at the site root. That works, but it doesn't match the Caddy rule I already had for caching:
```caddy
header /images/* Cache-Control "public, max-age=604800"
```
Under `/images/webmention/...` this rule applies automatically — a week of browser caching for avatars, no extra config needed.
### .gitignore
The cache folder doesn't belong in the repo — it gets populated during each build if empty, and otherwise persists from a previous run:
```
public/images/webmention/
```
On CI without persistent caching the avatars are re-fetched every build. Since they're tiny JPEGs, that's not a problem.
## Solution
The built HTML now points every avatar at the local cache, along these lines (hash and name invented for illustration):
```html
<img src="/images/webmention/3fa04ac1b7e2c9d8.jpg" alt="Jane Doe" loading="lazy">
```
Four wins at once:
1. **CSP stays strict.** `img-src 'self' data:` is enough — no free pass for arbitrary HTTPS sources.
2. **Zero third-party requests on page load.** Readers fetch the avatars from the site's own server; the avatar host never sees a single reader request. GDPR Art. 5.1.c satisfied.
3. **Availability decoupled.** Rendering the avatars only depends on the build having succeeded, not on webmention.io being reachable at view time.
4. **Self-cleaning cache.** Retracted likes and deleted comments take their avatar with them on the next build. The folder stays lean, no file carcasses from people no longer linked — data minimisation in practice, not just on paper.
The Caddy `/images/*` caching rule applies out of the box. On every build that produces something new, the avatars sit in `dist/images/webmention/` — unchanged for previously seen URLs, freshly fetched for new ones.
## What to take away
- **External resources known at build time belong in the cache.** It's an old recommendation for Google Fonts — the same argument applies to webmention avatars, external icons, Mastodon badges, and anything else tied to a stable URL.
- **CSP compliance and privacy point at the same fix.** Stop loading the avatars externally and the CSP doesn't have to be relaxed **and** readers' IPs don't leak.
- **URL-hash dedup is trivial and robust.** No database, no external config — a cache folder plus SHA1 is all you need.
- **Availability is an underrated privacy bonus.** Self-hosted resources don't disappear when the external provider raises prices, changes the API, or shuts down.
- **A cache without garbage collection is a data graveyard.** Mark-and-sweep is five lines of code and stops the pile-up — whether you care about disk space or data minimisation.