Migration · Advanced
Backfill historical AI outputs with signed receipts
Most teams adopting AI provenance already have a log of AI outputs in their database. This recipe walks the log and produces a signed receipt for each historical entry — idempotent, rate-aware, and safe to resume after a crash.
Honest framing — what backfill can and can't prove
- Can prove: the AI output existed in this exact form at the moment of backfill (timestamp on receipt = backfill time, not original generation time). Useful for "this content existed before any future challenge" claims.
- Can NOT prove: the AI output existed at its original generation time. The receipt timestamp is the signing time. If your original logs include a generation timestamp, you can include that in metadata, but it's only as trustworthy as your log infrastructure.
- Stronger pattern: if you have access to cryptographic timestamping at the original generation time (e.g., your logs were anchored via OpenTimestamps as they were written), the backfill receipt + the original anchor together form a chain back to the original time.
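To make the claimed-versus-signed distinction concrete, here is a minimal sketch of how a verifier might surface the gap. It assumes the `backfill|original-gen-time=<ISO timestamp>` marker that this recipe's worker writes into `promptHash`, and a receipt signing time passed in as an ISO string. The helper name `describeBackfillGap` and the `TimeGap` shape are hypothetical, not part of the CertNode SDK.

```typescript
// Hypothetical helper: only the marker format comes from this recipe.
const BACKFILL_PREFIX = 'backfill|original-gen-time='

interface TimeGap {
  claimedGeneratedAt: Date // taken from your own logs: a claim, not a proof
  signedAt: Date // receipt signing time: what the receipt actually attests
  gapMs: number // signedAt minus claimedGeneratedAt, in milliseconds
}

function describeBackfillGap(promptHash: string, signedAt: string): TimeGap | null {
  // Receipts without the backfill marker carry no claimed time to compare.
  if (!promptHash.startsWith(BACKFILL_PREFIX)) return null

  const claimed = new Date(promptHash.slice(BACKFILL_PREFIX.length))
  const signed = new Date(signedAt)

  // Reject unparseable timestamps instead of returning NaN gaps.
  if (Number.isNaN(claimed.getTime()) || Number.isNaN(signed.getTime())) return null

  return {
    claimedGeneratedAt: claimed,
    signedAt: signed,
    gapMs: signed.getTime() - claimed.getTime(),
  }
}
```

A large `gapMs` is expected for backfilled rows. It tells a reader how far the verifiable signing time sits from the unverified generation time recorded in your logs.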
Schema: track backfill state
Add a column to your AI output log to track which rows have been backfilled. This makes the operation idempotent and resume-safe.
ALTER TABLE ai_output_log
  ADD COLUMN certnode_receipt_id TEXT,
  ADD COLUMN certnode_signed_at TIMESTAMPTZ,
  ADD COLUMN backfill_attempted_at TIMESTAMPTZ;

CREATE INDEX idx_ai_output_log_backfill
  ON ai_output_log (backfill_attempted_at)
  WHERE certnode_receipt_id IS NULL;

Worker: batched, idempotent, rate-aware
import { CertNode } from '@certnode/sdk'
import { PrismaClient } from '@prisma/client'

// Assumed setup: `db` is a Prisma client exposing ai_output_log as `aiOutputLog`
const db = new PrismaClient()
const cert = new CertNode({ apiKey: process.env.CERTNODE_API_KEY! })

const BATCH_SIZE = 50
const RATE_LIMIT_DELAY_MS = 60 // 1000 req/min = 60ms between calls

async function backfillBatch() {
  // 1. Claim a batch of unsigned rows (mark them attempted so concurrent
  //    workers don't grab the same rows)
  const batch = await db.$transaction(async (tx) => {
    const rows = await tx.aiOutputLog.findMany({
      where: {
        certnode_receipt_id: null,
        OR: [
          { backfill_attempted_at: null },
          // Retry rows where attempt was >1h ago (stuck workers)
          { backfill_attempted_at: { lt: new Date(Date.now() - 60 * 60 * 1000) } },
        ],
      },
      orderBy: { generated_at: 'asc' },
      take: BATCH_SIZE,
    })

    // Mark attempted
    await tx.aiOutputLog.updateMany({
      where: { id: { in: rows.map((r) => r.id) } },
      data: { backfill_attempted_at: new Date() },
    })

    return rows
  })

  if (batch.length === 0) return { processed: 0, done: true }

  // 2. Sign each row, respecting rate limits
  let processed = 0
  for (const row of batch) {
    try {
      const signed = await cert.signAIOutput({
        output: row.ai_output,
        model: row.model ?? undefined,
        provider: row.provider ?? undefined,
        // Backfill metadata: encode the original generation time in promptHash
        // so verifiers can see the claimed-vs-signed time gap
        promptHash: `backfill|original-gen-time=${row.generated_at.toISOString()}`,
      })

      // 3. Persist the receipt ID: this is the idempotency anchor
      await db.aiOutputLog.update({
        where: { id: row.id },
        data: {
          certnode_receipt_id: signed.receiptId,
          certnode_signed_at: new Date(signed.signedAt),
        },
      })
      processed++

      // 4. Rate-limit sleep
      await new Promise((resolve) => setTimeout(resolve, RATE_LIMIT_DELAY_MS))
    } catch (err) {
      // 5. On error, leave backfill_attempted_at set; the row will be retried
      //    after the 1h stale window. Log and continue with the next row.
      console.error('[backfill] failed for row', row.id, err)
    }
  }

  return { processed, done: false }
}

// Run as a cron or one-shot worker:
async function runUntilDone() {
  while (true) {
    const result = await backfillBatch()
    console.log('[backfill] processed batch:', result)
    if (result.done) {
      // Rows that failed in this run stay unsigned until the stale window passes.
      console.log('[backfill] no claimable rows left; failed rows become retryable after 1h')
      break
    }
  }
}

Idempotency notes
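- Why mark attempted before signing: two concurrent workers grabbing the same row would create two receipts for the same content. Step 1 marks each row as attempted before signing starts. For that claim to be exclusive across workers, the read and the update must be atomic: under PostgreSQL's default READ COMMITTED isolation, a plain SELECT followed by an UPDATE in one transaction can still hand the same rows to two workers, so either run a single worker or lock rows as they are selected (for example with SELECT ... FOR UPDATE SKIP LOCKED).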
- Why the 1h stale window: a worker that crashed mid-batch leaves rows with backfill_attempted_at set but no receipt. Retry them after 1h so legitimate slow runs don't get double-signed but stuck rows recover.
- Why store the receipt ID: idempotent lookups. Re-running the backfill never re-signs a row that already has a receipt. Safe to invoke repeatedly until done.
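The claim rule described in these notes can be sketched as a pure predicate plus a single-statement claim query. This is an illustration under assumptions, not the recipe's official worker. It assumes PostgreSQL, an `id` primary key, and the column names from the schema above. `isClaimable`, `ClaimState`, and `CLAIM_BATCH_SQL` are hypothetical names.

```typescript
const STALE_WINDOW_MS = 60 * 60 * 1000 // the recipe's 1h stale window

interface ClaimState {
  certnode_receipt_id: string | null
  backfill_attempted_at: Date | null
}

// Mirrors the worker's WHERE clause: unsigned, and either never attempted
// or attempted longer ago than the stale window.
function isClaimable(row: ClaimState, now: Date, staleMs: number = STALE_WINDOW_MS): boolean {
  if (row.certnode_receipt_id !== null) return false // already signed: never re-sign
  if (row.backfill_attempted_at === null) return true // never attempted
  return now.getTime() - row.backfill_attempted_at.getTime() > staleMs
}

// Assumed PostgreSQL variant of the claim: the inner SELECT locks the rows it
// picks and skips rows locked by another worker, so two workers running this
// concurrently receive disjoint batches. $1 is the batch size.
const CLAIM_BATCH_SQL = `
UPDATE ai_output_log
SET backfill_attempted_at = now()
WHERE id IN (
  SELECT id
  FROM ai_output_log
  WHERE certnode_receipt_id IS NULL
    AND (backfill_attempted_at IS NULL
         OR backfill_attempted_at < now() - interval '1 hour')
  ORDER BY generated_at ASC
  LIMIT $1
  FOR UPDATE SKIP LOCKED
)
RETURNING *;
`
```

The predicate is convenient for unit-testing the retry policy without a database. The SQL is one way to make the claim atomic when several workers run at once.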
Cost + cap planning
At $0.01/receipt above the 100/mo free tier, a 1M-row backfill costs about $10,000 at the list rate, before volume discounts. Rates step down at 10K, 100K, and 1M signings/mo (see pricing). For backfills above 100K rows, contact contact@certnode.io for batch pricing: backfill workloads differ from steady-state usage, so we can quote them separately.
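The arithmetic above can be sketched as a small planning helper. It uses the list price ($0.01), the 100-receipt free tier, and the 1000 req/min figure from the worker's rate-limit comment. It ignores volume discounts, whose exact rates are not given here. The function name and parameters are hypothetical.

```typescript
interface BackfillEstimate {
  billableReceipts: number
  costUsd: number // list price only, no volume discount applied
  minutesAtRateLimit: number // lower bound: ignores request latency and retries
}

function estimateBackfill(
  rows: number,
  freeTier: number = 100,
  priceCents: number = 1, // $0.01 per receipt, kept in cents to avoid float drift
  requestsPerMinute: number = 1000,
): BackfillEstimate {
  const billableReceipts = Math.max(0, rows - freeTier)
  return {
    billableReceipts,
    costUsd: (billableReceipts * priceCents) / 100,
    minutesAtRateLimit: rows / requestsPerMinute,
  }
}
```

For 1,000,000 rows this gives 999,900 billable receipts, $9,999 at list price, and at least 1,000 minutes (roughly 16.7 hours) of signing time for a single worker.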