Migration · Advanced

Backfill historical AI outputs with signed receipts

Most teams adopting AI provenance already have a log of AI outputs in their database. This recipe walks the log and produces a signed receipt for each historical entry — idempotent, rate-aware, and safe to resume after a crash.

Honest framing — what backfill can and can't prove

  • Can prove: the AI output existed in this exact form at the moment of backfill (timestamp on receipt = backfill time, not original generation time). Useful for "this content existed before any future challenge" claims.
  • Can NOT prove: the AI output existed at its original generation time. The receipt timestamp is the signing time. If your original logs include a generation timestamp, you can include that in metadata, but it's only as trustworthy as your log infrastructure.
  • Stronger pattern: if your logs were cryptographically timestamped as they were written (e.g., anchored via OpenTimestamps), the original anchor proves the logged content existed at generation time, and the backfill receipt signs that same content. Together they form a chain back to the original time, as long as you can show the signed output is byte-identical to what was anchored.
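To connect a backfill receipt to an earlier anchor, you need to show both cover the same bytes. A minimal sketch, assuming each log entry's output was hashed with SHA-256 over its UTF-8 bytes before anchoring. `contentDigest` and `matchesAnchoredDigest` are hypothetical helpers, and how you retrieve the anchored digest depends on your timestamping setup:

```typescript
import { createHash } from 'node:crypto'

// SHA-256 hex digest of the output's UTF-8 bytes.
// Assumption: this is the same digest that was anchored when the log entry was written.
function contentDigest(output: string): string {
  return createHash('sha256').update(output, 'utf8').digest('hex')
}

// True when the output being backfilled is byte-identical to what was anchored,
// i.e. the original anchor and the new receipt cover the same content.
function matchesAnchoredDigest(output: string, anchoredDigestHex: string): boolean {
  return contentDigest(output) === anchoredDigestHex.toLowerCase()
}
```

Any normalization (trimming whitespace, re-encoding) applied between logging and backfill breaks the match, so sign the output exactly as it was stored.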

Schema: track backfill state

Add a column to your AI output log to track which rows have been backfilled. This makes the operation idempotent and resume-safe.

ALTER TABLE ai_output_log
  ADD COLUMN certnode_receipt_id TEXT,
  ADD COLUMN certnode_signed_at TIMESTAMPTZ,
  ADD COLUMN backfill_attempted_at TIMESTAMPTZ;

CREATE INDEX idx_ai_output_log_backfill
  ON ai_output_log (backfill_attempted_at)
  WHERE certnode_receipt_id IS NULL;
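The three columns put every row in one of three states, which is what makes the job resume-safe. A sketch of that classification, assuming the column names above and the worker's 1-hour stale window; the `BackfillRow` type and `backfillState` function are illustrative, not part of any library:

```typescript
type BackfillRow = {
  certnode_receipt_id: string | null
  backfill_attempted_at: Date | null
}

type BackfillState = 'signed' | 'in_flight' | 'claimable'

const STALE_WINDOW_MS = 60 * 60 * 1000 // 1 hour, matching the worker

function backfillState(row: BackfillRow, now: Date = new Date()): BackfillState {
  // A stored receipt ID is final: the row is never signed again.
  if (row.certnode_receipt_id !== null) return 'signed'
  // Never attempted: any worker may claim it.
  if (row.backfill_attempted_at === null) return 'claimable'
  // Attempted recently: assume a live worker still owns it; older attempts are stale.
  const ageMs = now.getTime() - row.backfill_attempted_at.getTime()
  return ageMs > STALE_WINDOW_MS ? 'claimable' : 'in_flight'
}
```

Because the index is partial (`WHERE certnode_receipt_id IS NULL`), it only contains rows that are not yet signed, so it shrinks as the backfill progresses.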

Worker: batched, idempotent, rate-aware

import { PrismaClient } from '@prisma/client'
import { CertNode } from '@certnode/sdk'

const db = new PrismaClient()
const cert = new CertNode({ apiKey: process.env.CERTNODE_API_KEY! })

const BATCH_SIZE = 50
const RATE_LIMIT_DELAY_MS = 60 // 1000 req/min = 60,000 ms / 1000 = 60 ms between call starts

// Columns returned by the claim query (adjust `id` to your primary-key type)
type AiOutputRow = {
  id: string
  ai_output: string
  model: string | null
  provider: string | null
  generated_at: Date
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function backfillBatch() {
  // 1. Claim a batch of unsigned rows in a single statement. FOR UPDATE SKIP LOCKED
  //    makes concurrent workers skip rows another worker is claiming, and the
  //    UPDATE marks them attempted atomically, so no two workers get the same row.
  //    Rows whose attempt is >1h old are reclaimed (stuck or crashed workers).
  const batch = await db.$queryRaw<AiOutputRow[]>`
    UPDATE ai_output_log
    SET backfill_attempted_at = now()
    WHERE id IN (
      SELECT id FROM ai_output_log
      WHERE certnode_receipt_id IS NULL
        AND (backfill_attempted_at IS NULL
             OR backfill_attempted_at < now() - interval '1 hour')
      ORDER BY generated_at ASC
      LIMIT ${BATCH_SIZE}
      FOR UPDATE SKIP LOCKED
    )
    RETURNING id, ai_output, model, provider, generated_at
  `

  if (batch.length === 0) return { processed: 0, done: true }

  // 2. Sign each row, pacing calls to respect the rate limit
  let processed = 0
  for (const row of batch) {
    const startedAt = Date.now()
    try {
      const signed = await cert.signAIOutput({
        output: row.ai_output,
        model: row.model ?? undefined,
        provider: row.provider ?? undefined,
        // Backfill metadata: encode the original generation time in promptHash
        // so verifiers can see the claimed-vs-signed time gap
        promptHash: `backfill|original-gen-time=${row.generated_at.toISOString()}`,
      })

      // 3. Persist the receipt ID: this is the idempotency anchor
      await db.aiOutputLog.update({
        where: { id: row.id },
        data: {
          certnode_receipt_id: signed.receiptId,
          certnode_signed_at: new Date(signed.signedAt),
        },
      })
      processed++
    } catch (err) {
      // 4. On error, leave backfill_attempted_at set: the row will be retried
      //    after the 1h stale window. Log and continue with the next row.
      console.error('[backfill] failed for row', row.id, err)
    } finally {
      // 5. Pace every call, failures included, so errors such as rate-limit
      //    rejections don't turn into a tight loop. Sleep only for the part of
      //    the 60 ms interval the call itself didn't use.
      await sleep(Math.max(0, RATE_LIMIT_DELAY_MS - (Date.now() - startedAt)))
    }
  }

  return { processed, done: false }
}

// Run as a cron or one-shot worker:
async function runUntilDone() {
  while (true) {
    const result = await backfillBatch()
    console.log('[backfill] processed batch:', result)
    if (result.done) {
      // Rows that failed within the last hour aren't claimable yet; a later
      // run picks them up once the stale window has passed.
      console.log('[backfill] no claimable rows left')
      break
    }
  }
}

runUntilDone()
  .catch((err) => {
    console.error('[backfill] fatal error', err)
    process.exitCode = 1
  })
  .finally(() => db.$disconnect())
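On the verifier side, the marker the worker writes into `promptHash` can be parsed back out to report the gap between the claimed generation time and the signing time. A sketch, assuming the verifier has the receipt's `promptHash` string and its signing time as a `Date`; `parseBackfillMarker` is a hypothetical helper, and the claimed time is only as trustworthy as the original logs:

```typescript
type BackfillClaim = {
  claimedGeneratedAt: Date
  gapMs: number // signing time minus claimed generation time
}

const MARKER_PREFIX = 'backfill|original-gen-time='

function parseBackfillMarker(promptHash: string, signedAt: Date): BackfillClaim | null {
  // Receipts signed at generation time won't carry the backfill marker
  if (!promptHash.startsWith(MARKER_PREFIX)) return null
  const claimed = new Date(promptHash.slice(MARKER_PREFIX.length))
  // Reject malformed timestamps rather than reporting a NaN gap
  if (Number.isNaN(claimed.getTime())) return null
  return { claimedGeneratedAt: claimed, gapMs: signedAt.getTime() - claimed.getTime() }
}
```

A negative gap means the claimed generation time is later than the signing time, which points to bad log data rather than a valid backfill.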

Idempotency notes

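  • Why mark attempted before signing: two concurrent workers that grabbed the same row would create two receipts for the same content, so each row must be claimed before signing starts. The claim also has to be atomic. A find-then-update inside a transaction is not enough at PostgreSQL's default READ COMMITTED isolation, because two workers can both read the same unclaimed rows before either one writes. Lock rows as you select them (SELECT ... FOR UPDATE SKIP LOCKED) or claim them with a single UPDATE ... RETURNING.
  • Why the 1h stale window: a worker that crashed mid-batch leaves rows with backfill_attempted_at set but no receipt. Retrying them after 1h lets stuck rows recover, while a live worker, which finishes a 50-row batch in a few seconds plus API latency, never has its rows reclaimed and double-signed. Keep the window well above your slowest healthy batch.
  • Why store the receipt ID: it makes re-runs idempotent. Re-running the backfill never re-signs a row that already has a receipt, so it is safe to invoke repeatedly until done. The one gap is a crash between a successful signing call and the database write: that row has no stored receipt ID, so it is re-signed after the stale window and ends up with two receipts for the same content.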
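The double-claim race can be reproduced in memory. This is a self-contained simulation, not database code: `naiveClaim` reads, yields to the event loop (standing in for a database round trip), then writes, while `atomicClaim` selects and marks in one synchronous step, playing the role of a locked `UPDATE ... RETURNING`. All names here are hypothetical.

```typescript
type Row = { id: number; attemptedBy: string | null }

const makeRows = (n: number): Row[] =>
  Array.from({ length: n }, (_, i) => ({ id: i, attemptedBy: null }))

// Read, await (like a DB round trip), then mark: another worker can read in between.
async function naiveClaim(rows: Row[], worker: string, size: number): Promise<number[]> {
  const picked = rows.filter((r) => r.attemptedBy === null).slice(0, size)
  await Promise.resolve()
  for (const r of picked) r.attemptedBy = worker
  return picked.map((r) => r.id)
}

// Select and mark with no await in between, so no other worker can interleave.
async function atomicClaim(rows: Row[], worker: string, size: number): Promise<number[]> {
  const picked = rows.filter((r) => r.attemptedBy === null).slice(0, size)
  for (const r of picked) r.attemptedBy = worker
  await Promise.resolve()
  return picked.map((r) => r.id)
}

// Run two workers concurrently and count the row ids both of them claimed.
async function overlap(claim: typeof naiveClaim): Promise<number> {
  const rows = makeRows(10)
  const [a, b] = await Promise.all([claim(rows, 'A', 5), claim(rows, 'B', 5)])
  return a.filter((id) => b.includes(id)).length
}
```

With the naive version both workers claim rows 0 to 4; with the atomic version they split the table into disjoint batches. In PostgreSQL, the row locks taken by `FOR UPDATE SKIP LOCKED` provide that same no-interleaving guarantee across processes.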

Cost + cap planning

At $0.01 per receipt above the 100/mo free tier, a 1M-row backfill costs about $10,000 at the flat rate, before volume discounts. Rates step down at 10K, 100K, and 1M signings/mo (see pricing). For backfills above 100K rows, email contact@certnode.io for batch pricing: backfill workloads differ from steady-state usage, and we can quote them separately.
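For planning, the flat-rate cost ceiling and the minimum runtime can be estimated together. A sketch assuming the whole backfill falls in one billing month, the $0.01 list rate with no volume discounts (the discounted tier rates aren't listed here), and the worker's pacing of 1000 requests/min; `estimateBackfill` is hypothetical:

```typescript
type BackfillEstimate = {
  billableReceipts: number
  costUsdUpperBound: number // flat rate, before volume discounts
  minMinutes: number // at the worker's pacing; API latency and retries only add time
}

function estimateBackfill(
  rows: number,
  { freePerMonth = 100, centsPerReceipt = 1, requestsPerMinute = 1000 } = {},
): BackfillEstimate {
  const billableReceipts = Math.max(0, rows - freePerMonth)
  // Work in integer cents to avoid floating-point drift
  const costCents = billableReceipts * centsPerReceipt
  return {
    billableReceipts,
    costUsdUpperBound: costCents / 100,
    minMinutes: Math.ceil(rows / requestsPerMinute),
  }
}
```

For 1M rows this gives 999,900 billable receipts, $9,999 at list price, and at least 1,000 minutes (about 17 hours) of runtime at that pacing.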