SynthID vs cryptographic provenance, watermarking vs signing
Google's SynthID embeds an invisible statistical watermark into AI output. CertNode AI Provenance attaches a cryptographic signature that travels alongside the content. They are not the same approach; they solve adjacent problems in different ways. This comparison is candid about which one wins for which job.
The two approaches in one sentence each
SynthID (watermarking): embed a statistical signal directly into the AI-generated content (in image pixels, audio samples, text token distributions) so the content itself carries a detectable signal that another tool can read.
Cryptographic provenance (CertNode): attach a separate signed record to the content at the moment of generation. The signature lives in metadata or a sidecar receipt, not in the content itself.
Both answer the question "is this AI-generated, and from where?". They answer it with different kinds of evidence.
How watermarking works
SynthID modifies the model's output distribution at generation time so the resulting content contains a statistical pattern imperceptible to humans but recoverable by a SynthID detector. For text, the model preferentially selects tokens that fit a hidden pattern. For images, the watermark perturbs pixel values in ways invisible to viewers but detectable by the matching detector.
Watermarking is integrated into the model itself. Google can ship SynthID for Gemini outputs because Google controls the Gemini decoding step. A third party cannot retroactively watermark someone else's model.
How cryptographic provenance works
At generation time, your application calls the AI model normally, then signs the output with a cryptographic API. The receipt lives outside the content, in a database, in metadata, in an attached file. The receipt is verified by recomputing the content hash and validating the signature against a public key.
Cryptographic provenance is model-agnostic. Whether you are calling Claude, GPT, Mistral, Llama, or your own fine-tune, the signing step is identical. The model provider does not need to be involved.
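The sign-then-verify flow above can be sketched in a few lines. This is an illustrative sketch, not the CertNode API: the receipt fields, key handling, and Ed25519 choice (via the third-party `cryptography` package) are all assumptions, and a real deployment would add an independent timestamp.

```python
import hashlib
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# One signing key for the application. In practice the key is managed by
# the provenance service, not generated inline like this.
signing_key = Ed25519PrivateKey.generate()
public_key = signing_key.public_key()

def sign_output(content: str, model: str, provider: str) -> dict:
    # Hash the content and sign a receipt that lives *outside* the content.
    receipt = {
        "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
        "model": model,
        "provider": provider,
        "timestamp": int(time.time()),
    }
    payload = json.dumps(receipt, sort_keys=True).encode()
    receipt["signature"] = signing_key.sign(payload).hex()
    return receipt

def verify_receipt(content: str, receipt: dict) -> bool:
    # Recompute the content hash, then check the signature with the public key.
    if hashlib.sha256(content.encode()).hexdigest() != receipt["content_sha256"]:
        return False
    unsigned = {k: v for k, v in receipt.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(receipt["signature"]), payload)
        return True
    except Exception:
        return False

output = "The generated answer text."
r = sign_output(output, model="claude-sonnet", provider="anthropic")
print(verify_receipt(output, r))        # True
print(verify_receipt(output + "!", r))  # False: any edit breaks the hash
```

Because the signing step only needs the output string, the same code path covers every model and provider, which is what makes the approach model-agnostic.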
Side-by-side
| | SynthID (watermark) | CertNode (sign) |
|---|---|---|
| Where the signal lives | Inside the content itself | Outside the content (sidecar receipt) |
| Survives content extraction | Yes (signal embedded) | Only if receipt accompanies content |
| Robust to paraphrasing / edits | Partial (text watermark degrades with rewriting) | No (any edit breaks the hash match; a feature, not a bug) |
| Works across models | No (model-specific) | Yes (provider-agnostic) |
| Verification confidence | Statistical (probabilistic) | Cryptographic (binary, mathematical) |
| Court admissibility framing | Unclear (statistical inference) | FRE 902(13)/(14) self-authenticating |
| Independent timestamp | No (no temporal claim) | Yes (RFC 3161 + Bitcoin anchor) |
| User-controlled | No (model provider controls) | Yes (you sign, you store) |
| Multi-model stack | Each model needs its own watermark | One signing scheme covers all |
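To make the "any edit breaks the hash match" and "binary, mathematical" rows concrete: a one-character edit produces a completely different SHA-256 digest, so verification against a signed hash either passes exactly or fails. A minimal stdlib sketch (the example strings are invented):

```python
import hashlib

original = "Quarterly revenue grew 12%."
edited = "Quarterly revenue grew 13%."  # one-character edit

h1 = hashlib.sha256(original.encode()).hexdigest()
h2 = hashlib.sha256(edited.encode()).hexdigest()

# The avalanche effect: a single-character change scrambles the digest.
# Signed provenance therefore yields a binary verdict (exact match or
# failure), whereas a watermark detector returns a probability that
# degrades gradually as the content is edited.
print(h1 == h2)  # False
```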
When watermarking is the right answer
Watermarking wins for a specific use case: detecting AI content in the wild after it has been extracted from its origin context. If a piece of text or an image gets copy-pasted, screenshotted, redistributed, the cryptographic receipt does not travel with it. The watermark might.
Concrete example: a social platform wants to flag AI-generated images even when a user uploads them without metadata. The image was AI-generated by Gemini, the metadata was stripped, the file was re-encoded. SynthID can still detect the watermark inside the pixels. A CertNode receipt sitting in a database somewhere is useless to the platform that received the file alone.
Watermarking is the right answer for: platform-side AI content moderation, social-media bot detection (in theory), tracking AI content's spread.
When cryptographic provenance is the right answer
Cryptographic signing wins for: attesting that you generated this specific content at this specific time using this specific model, in a way that holds up to skeptical examination. Audit, compliance, litigation, regulated industries.
Concrete example: enterprise procurement asks an AI startup "how do you audit your AI usage?" The answer they want is "every output is signed at generation time with a verifiable receipt that includes model, provider, and timestamp from an independent third party." A SynthID watermark inside the content does not answer that question because the signal was added by Google, not by the AI startup, and it does not carry the application-level metadata (which user, which project, which prompt class).
Cryptographic provenance is the right answer for: EU AI Act Article 50 compliance, FRE 902 admissibility, enterprise audit trails, multi-model production stacks, regulated industries.
Where they overlap
They can run together without conflict. A Gemini-generated image can carry both SynthID inside the pixels (model-side detection signal) and a CertNode receipt outside the file (application-side audit trail). Different layers, different audiences, no contradiction.
For most teams, picking one is enough. Pick by what question you are answering: "is this AI?" (watermarking) versus "did our system generate this and can we prove it?" (cryptographic provenance).
The detection problem watermarking inherits
Watermarking is sometimes pitched as a solution to AI detection. It isn't, exactly. It's better than statistical-pattern detection (GPTZero, Turnitin, etc.) because the signal is intentionally embedded, not inferred. But it shares detection's structural weaknesses:
- Adversaries can attempt to remove the watermark (paraphrase tools, image filters, transcoding).
- The watermark is only as good as its detector, and detectors produce both false positives and false negatives.
- The model provider holds the watermark key. Verification is centralized; you trust Google to verify a Gemini watermark.
- It tells you "this content came from a watermarked model" but not "this specific output was generated by your specific application at a specific time."
We argue the AI ecosystem will need both layers: watermarking for spread tracking, cryptographic signing for audit trails. The failure mode to avoid is treating watermarking as a substitute for signing in compliance contexts.
If your question is "can we prove our system generated this?"
CertNode is the answer. SynthID can complement, not replace.