I have a love-hate relationship with AI. I love using it to help me build apps and websites, develop and test ideas, proofread blog posts and so on.
But! Something that has bothered me for a while, and should bother a lot more people than it seems to, is that every major AI model was trained on human-created content. Articles, blog posts, novels, code, poems, essays — all hoovered up from the internet and fed into systems that can now produce passable imitations of the real thing.
The people who created that content got nothing. No attribution. No compensation. No acknowledgement that their work was used at all.
And there's currently no way to prove it. If an AI produces something that draws on your work, you can't demonstrate that connection. You can't even prove you wrote the original in the first place — not in any way that's cryptographically verifiable, timestamped, and independent of a platform that might disappear tomorrow.
So, with the help of AI, I'm building something that might address the problem of AI.
What I'm trying to build
Keep Digital Human is a provenance tool for creators. That's a fancy way of saying that it lets you prove you made something, and when.
You paste your text or file into a form in your browser, and without any of your data being sent anywhere, the system creates a cryptographic fingerprint of your content. A SHA-256 hash, if you want the technical term. It's a string of characters that, for all practical purposes, can only be produced by that exact piece of text or file. Change a single comma or pixel and you'd get a completely different fingerprint.
That fingerprint gets timestamped and stored. The original content never leaves your browser. We never see it. We never store it. All the system keeps is the fingerprint, the timestamp, and your identity.
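If you'd like to see what a fingerprint looks like, here's a minimal Python sketch. The real tool runs in your browser, so this isn't its code. The function name `fingerprint` and the choice of UTF-8 encoding are my assumptions for the example.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the text (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "Keep digital human, one post at a time."
edited = "Keep digital human; one post at a time."  # a single character changed

print(fingerprint(original))
print(fingerprint(edited))
print(fingerprint(original) == fingerprint(edited))
```

The two printed fingerprints have nothing visibly in common, even though the texts differ by one character.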
You then get a public verification page — a URL you can share with anyone — that proves this specific content existed at this specific time, authored by you.
Why this matters
Think of it as a digital deed of authorship.
Right now, if you publish a blog post, the only proof you wrote it is... that it's on your blog. If someone copies it, or an AI ingests it, or it appears in a training dataset — what evidence do you have? A WordPress timestamp that you could have faked? A screenshot that proves nothing?
A cryptographic fingerprint is different. It's mathematical proof. You can independently verify it. You can't forge it. You can't backdate it. It exists as an objective record that this exact content was registered at this exact moment by this exact person.
What's more, the system then allows you to:
- prove your ownership later by re-checking the original text or file you registered
- compare your work against AI output to see whether the similarities suggest it was used for training

Versioning — because work evolves
Creative work doesn't stand still. You publish an article, then you edit it. You revise a manuscript. You remaster a track. The second version is still yours, but it's got a different fingerprint.
So we've built versioning into the system. When you update a piece of work, you can register the new version and link it to the original. The system maintains a full version history — a chain of provenance records showing how your work evolved over time, each one independently verifiable, each one timestamped.
It's like track changes, but cryptographic and permanent.
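One way to picture a version chain is a list of records where each new version points back at the fingerprint of the one before it. The sketch below is only that, a picture. The `ProvenanceRecord` fields, the `register` and `history` functions and the in-memory `store` are all hypothetical, not the actual KDH record format.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical fields, chosen for illustration only.
    fingerprint: str
    registered_at: str
    previous: Optional[str]  # fingerprint of the version this one updates


def register(text: str, previous: Optional[ProvenanceRecord] = None) -> ProvenanceRecord:
    """Fingerprint the text and link it to the earlier version, if there is one."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ProvenanceRecord(
        fingerprint=digest,
        registered_at=datetime.now(timezone.utc).isoformat(),
        previous=previous.fingerprint if previous else None,
    )


def history(latest: ProvenanceRecord, store: dict[str, ProvenanceRecord]) -> list[str]:
    """Walk back from the latest version to the original, newest first."""
    chain = []
    current: Optional[ProvenanceRecord] = latest
    while current is not None:
        chain.append(current.fingerprint)
        current = store.get(current.previous) if current.previous else None
    return chain


v1 = register("First draft of my article.")
v2 = register("Second draft of my article, now with edits.", previous=v1)
store = {record.fingerprint: record for record in (v1, v2)}
print(history(v2, store))
```

Each record can still be checked on its own, because its fingerprint depends only on the content of that version.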
The open standard
Provenance is only useful if it's not locked inside one system. If Keep Digital Human is the only place that can read its records, we've built a walled garden, not infrastructure.
So we've published the KDH Provenance Record as an open standard. It's a specification that defines how provenance records should be structured, discovered, and verified — and anyone can implement it.
The standard includes several ways for machines to find and verify provenance records automatically:
- Structured data embedded in web pages, so search engines and AI crawlers can see provenance information without having to visit the Keep Digital Human site
- A discovery endpoint (a .well-known URL, if you're technically inclined) that lists registered works in a machine-readable format
- Embeddable badges: a small visual widget you can add to your website that links back to your verification certificate
- A bulk verification API that lets AI platforms check hundreds of content fingerprints in a single request
If an AI company wanted to check whether its training data included registered works, it now has a standardised way to do so. Whether they choose to is another question, but the mechanism exists.
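To give a feel for what a bulk check involves, here's a rough Python sketch of the logic on the checking side. It does not reproduce the real API. The function name `bulk_check`, the three result labels and the in-memory set of registered fingerprints are all assumptions for illustration.

```python
import re

# A SHA-256 fingerprint written as hex is exactly 64 characters from 0-9 and a-f.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def bulk_check(fingerprints: list[str], registered: set[str]) -> dict[str, str]:
    """Label each submitted fingerprint as registered, unknown or malformed."""
    results = {}
    for value in fingerprints:
        candidate = value.strip().lower()
        if not SHA256_HEX.fullmatch(candidate):
            results[value] = "malformed"
        elif candidate in registered:
            results[value] = "registered"
        else:
            results[value] = "unknown"
    return results


registered_works = {"a" * 64}  # stand-in for the real registry
print(bulk_check(["a" * 64, "b" * 64, "not-a-hash"], registered_works))
```

The point is that only fingerprints change hands. Nobody has to send or reveal the underlying content to run the check.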
Catching AI in the act
We've built an AI detection scanner. The system probes multiple AI models with a series of carefully designed queries related to your work and analyses whether the responses show signs of drawing on your content.
AI doesn't copy; it synthesises. So the scanner uses ten different probe strategies, each testing a different way an AI might reproduce your work. They include:
- asking the AI to write about the same topic
- giving it the opening sentence and asking it to continue
- asking it to paraphrase your text
- asking it to mimic your writing style
- testing whether it can recall specific facts or arguments from your work
- even asking it directly who wrote a particular passage
For each probe, the system measures text similarity using n-gram analysis, which essentially checks how many short phrases overlap between your original work and the AI's response. It breaks this down into trigram scores (three-word sequences) and five-gram scores (five-word sequences), and identifies the longest matching phrase.
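Here's a simplified Python sketch of that kind of overlap measurement. It's my illustration rather than the scanner's actual code. The tokenising rule, the scoring formula (the share of the original's n-grams that reappear in the response) and all the function names are assumptions.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All distinct runs of n consecutive words."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_score(original: str, response: str, n: int) -> float:
    """Fraction of the original's n-grams that also appear in the response."""
    source = ngrams(words(original), n)
    if not source:
        return 0.0
    return len(source & ngrams(words(response), n)) / len(source)


def longest_shared_phrase(original: str, response: str) -> str:
    """Longest run of consecutive words found in both texts."""
    a, b = words(original), words(response)
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                row[j] = prev[j - 1] + 1  # extend the run ending at the previous words
                if row[j] > best_len:
                    best_len, best_end = row[j], i
        prev = row
    return " ".join(a[best_end - best_len:best_end])


original = "The quick brown fox jumps over the lazy dog near the river"
response = "A quick brown fox jumps over a sleeping cat near the river"
print(overlap_score(original, response, 3))
print(overlap_score(original, response, 5))
print(longest_shared_phrase(original, response))
```

Short sequences overlap by chance quite often, which is why the longer five-word matches and the longest shared phrase carry more weight as evidence.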
Is this definitive proof that an AI was trained on your work? No. But it's evidence. When you accumulate enough data points across enough works, patterns emerge.
The detection dashboard gives you a full breakdown of every scan — which models showed the strongest similarity, which probe strategies were most effective, and exactly which phrases matched.
The API and browser extension
For creators who want to integrate provenance into their workflow, there's a public API. You can register works, check fingerprints, and retrieve provenance records programmatically. There's also a batch registration endpoint for when you want to register an entire back-catalogue in one go.
We're also building Chrome and Firefox browser extensions. These let you select text on any web page and register it with a right-click, and check whether the page you're reading contains registered content. They can even prompt you to register when they detect you're about to publish something on platforms like Medium or WordPress.
The bigger picture
I'm not pretending this solves everything. Clearly it doesn't. It's very early days and everything is more or less in test mode. But the fundamentals are there.
Consider where AI is heading. It's becoming the operating layer for everything. The thing that mediates your communication, summarises your reading, acts on your behalf. In that world, the question of "who made this?" becomes critical.
Not because of some abstract principle about intellectual property. Because creators deserve to be part of the value chain. If your writing is good enough to train an AI, it's good enough to be attributed and compensated.
The vision — and this is a vision, not a current reality — is a system where:
- Creators can prove they authored something and when
- AI usage of that content can be tracked and attributed
- Micropayments flow back to creators proportional to how much their work is used
- Human-created content carries a verifiable "made by a human" signal
We're somewhere between steps one and two.
What this isn't
This isn't blockchain-bro nonsense. There are no tokens. There's no cryptocurrency. There's no speculative investment opportunity.
What we're using is the useful part of that technology — cryptographic verification. The same mathematical principles that make blockchain work, applied to the problem of trust and attribution.
This also isn't DRM. I'm not trying to restrict what anyone does with content. I'm trying to create a record of who made it. Those are very different things.
Try it out
The system works. You can register content right now, including:
- text
- images
- documents
- audio
- video
You get a real cryptographic fingerprint, a real verification page, a downloadable provenance certificate, and an embeddable badge for each item. You can register works individually or in batches, track versions, run AI detection scans, and monitor your content across multiple AI platforms.
Whether it all matters depends on whether enough creators use it to establish a critical mass of provenance records, and whether AI platforms eventually choose (or are compelled) to respect them.
I don't know if that will happen. But I know the alternative — doing nothing and hoping the problem resolves itself — definitely isn't a solution.
What you can do
If you're a creator — writer, journalist, poet, musician, artist, filmmaker, blogger, vlogger, coder, essayist, anyone who makes things with words, sounds or images — go to keepdigitalhuman.com and register something. It's free. It takes thirty seconds. Your content never leaves your browser.
Because we never store your original content, you need to keep it safe yourself. If you ever need to prove you registered something, you'll need the exact same file or text to regenerate the matching fingerprint. And if you update your work, register the new version — it takes seconds and builds your provenance chain.
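To show why "exact same" really means exact, here's a small Python sketch that re-hashes a file and compares it with a stored fingerprint. The function names and the sample file are made up for the example, and it assumes the registered fingerprint is the SHA-256 hash of the file's raw bytes.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 65536) -> str:
    """SHA-256 hex digest of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, registered_fingerprint: str) -> bool:
    """True only if the file is byte-for-byte the one that was registered."""
    return hmac.compare_digest(file_fingerprint(path), registered_fingerprint.lower())


with tempfile.TemporaryDirectory() as folder:
    draft = Path(folder) / "essay.txt"
    draft.write_bytes(b"My essay, exactly as registered.\n")
    registered = file_fingerprint(draft)
    print(matches(draft, registered))  # same bytes as registered

    # Re-saving with Windows line endings changes the bytes, so the match fails.
    draft.write_bytes(b"My essay, exactly as registered.\r\n")
    print(matches(draft, registered))
```

So keep an untouched copy of whatever you registered. An editor that quietly changes line endings or re-encodes the file is enough to break the match.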
At worst, you'll have a timestamped proof of authorship that exists independently of any platform. At best, you'll be part of building the infrastructure that gives creators a seat at the table in the AI economy.
The silly shirt, in this case, is a cryptographic hash. Wear it. Tell everyone you're a creator.
Keep Digital Human is free and open. The provenance standard is published for anyone to implement, the API is documented, and the code is on GitHub.