
Licensing of audits hosted on the OpenVet Registry

The OpenVet Registry hosts audits produced by its users, and the whole point is that those audits are freely accessible to anyone who wants to consume them. I am building the registry, hosting it, and licensing it as AGPL because I think audits should be a public good and supply-chain security should not be a subscription model.

“Freely accessible” sounds nice in a marketing sentence, but it is not a legally meaningful statement. If we want consumers to actually be able to use the audits they download — read them, depend on them, redistribute them, build on top of them — we need to encode what “freely accessible” means in terms that copyright law understands. That means licensing.

This post is about how I’m planning to approach that.

Audits are original work #

One thing that makes the licensing question tractable is that audits, as OpenVet defines them, are completely original work. They do not embed the code they describe. An audit can annotate a piece of code, but the annotation refers to the source by file path and line number — it does not include the source itself. Findings, claims, and the report text are all written by the auditor.

This matters because it means a single audit can be covered by a single license. There is no embedded third-party material that needs to be tracked or licensed separately. The auditor wrote the audit, the auditor owns the copyright, and the auditor gets to pick the license.
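As an illustration of what "refers to the source without including it" can mean in practice, here is a minimal sketch of an annotation record. The Annotation type, its field names and the locator format are hypothetical, not OpenVet's actual schema. The point is only that the record holds a path, a line range and the auditor's note, with no copy of the audited code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical audit annotation: a locator plus the auditor's own note."""

    path: str        # file path inside the audited package
    start_line: int  # first annotated line (1-based, inclusive)
    end_line: int    # last annotated line (inclusive)
    note: str        # text written by the auditor; no source code is stored

    def __post_init__(self) -> None:
        # Reject line numbers below 1 and inverted ranges.
        if not 1 <= self.start_line <= self.end_line:
            raise ValueError("invalid line range")

    def locator(self) -> str:
        # A reference to the audited code, not a copy of it.
        return f"{self.path}:{self.start_line}-{self.end_line}"


ann = Annotation("src/parser.rs", 10, 24, "Index into a user-supplied slice is not bounds-checked.")
print(ann.locator())  # src/parser.rs:10-24
```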

The license attribute #

Audits already have a mechanism for carrying metadata about the audit itself (as opposed to data about the thing being audited); we call these attributes. The natural place to encode the license is as one of these attributes, set to an SPDX license identifier such as CC0-1.0 or CC-BY-4.0.
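A minimal sketch of reading that attribute, assuming attributes are available as a plain string-to-string mapping and that the key is literally license (both assumptions, not OpenVet's documented format). The regular expression follows the SPDX rule that short identifiers consist of letters, digits, "-" and "."; it checks the shape of a value, not whether the value is on the SPDX license list. Note that the SPDX identifier for CC0 is CC0-1.0.

```python
from __future__ import annotations

import re

# SPDX short identifiers are built from letters, digits, "-" and ".".
# This checks the shape of the value only, not membership in the SPDX list.
SPDX_IDSTRING = re.compile(r"[A-Za-z0-9.\-]+")


def license_of(attributes: dict[str, str]) -> str | None:
    """Return the audit's license identifier, or None if the attribute is unset."""
    value = attributes.get("license")  # hypothetical attribute key
    if value is None:
        return None
    if not SPDX_IDSTRING.fullmatch(value):
        raise ValueError(f"not an SPDX identifier: {value!r}")
    return value


print(license_of({"license": "CC-BY-4.0"}))  # CC-BY-4.0
print(license_of({}))                         # None
```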

That answers the how. The harder questions are:

  • What licenses are reasonable for audits to be published under?
  • Should the registry require specific licenses, or accept anything?

What licenses make sense #

The registry only exists because audits get more useful when they are shared. If somebody publishes an audit under a license that prevents others from using it — anything non-commercial, anything that forbids redistribution, anything bespoke that consumers need a lawyer to read — there is no point in hosting that audit on a public registry. The whole value proposition collapses.

So the set of acceptable licenses needs to be narrow enough that consumers can use audits without thinking about it, and wide enough to cover the reasonable preferences authors have. My argument is that, at least for now, that set should be exactly two licenses: CC0 and CC-BY-4.0.

Both are well-understood, both are permissive enough to be useful, and they differ in exactly one meaningful way:

  • CC0 is effectively a public-domain dedication. The author gives up their rights and the work can be used by anyone for any purpose with no obligations attached.
  • CC-BY-4.0 allows the same uses, but requires attribution: if you redistribute or build on the work, you have to credit the original author.

Either license makes audits useful in the way I want them to be. A consumer can pick up an audit, improve it, and re-publish the improved version. If the original was CC0, no strings attached. If it was CC-BY-4.0, the improved version needs to credit the original author. Both of these are reasonable workflows for a collaborative auditing ecosystem.
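The re-publishing workflow can be sketched as a small function. The Audit type and derive() are hypothetical, and the sketch makes one simplifying assumption beyond the text: an improved version of a CC-BY-4.0 audit stays under CC-BY-4.0 and carries the full chain of earlier authors in its credits.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CC0 = "CC0-1.0"
CC_BY = "CC-BY-4.0"


@dataclass
class Audit:
    author: str
    license: str
    credits: list[str] = field(default_factory=list)  # earlier authors to attribute


def derive(original: Audit, new_author: str, new_license: str) -> Audit:
    """Sketch of re-publishing an improved audit under the two allowed licenses."""
    if original.license == CC0:
        # Public-domain dedication: either allowed license, nothing to credit.
        if new_license not in (CC0, CC_BY):
            raise ValueError(f"{new_license} is not on the allow-list")
        return Audit(new_author, new_license)
    if original.license == CC_BY:
        # Simplification: keep CC-BY-4.0 and carry the attribution chain forward.
        if new_license != CC_BY:
            raise ValueError("this sketch keeps derivatives of CC-BY-4.0 audits under CC-BY-4.0")
        return Audit(new_author, CC_BY, credits=[*original.credits, original.author])
    raise ValueError(f"{original.license} is not on the allow-list")


alice = Audit("alice", CC_BY)
bob = derive(alice, "bob", CC_BY)
print(bob.credits)  # ['alice']
```

Keeping the whole credit chain is the conservative choice: each re-publication of a CC-BY-4.0 audit adds the previous author rather than replacing them.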

This is not a novel choice. RustSec uses the same pair: their own original advisory content is published under CC0, and they also ingest GitHub advisories, which are CC-BY-4.0. The intersection of “useful for a public security database” and “common enough that people will actually pick it” lands in roughly the same place for both of us.

Why enforce it, and why now #

The registry could accept any license and let consumers sort it out. I think that would be a mistake.

If we do not enforce a license attribute from the start, we end up with a collection of audits with unclear licensing. Some will have no license at all (which, under default copyright law, means consumers cannot legally redistribute them). Some will have hand-rolled licenses that consumers have to read individually. Cleaning that up after the fact is much harder than getting it right from day one — you have to track down every author and ask them to re-license, and any author who has gone silent leaves a permanent hole in the dataset.

It is always possible to relax the requirement later. If, in a year, it turns out that there is a strong case for adding a third license to the allow-list, that is a one-line registry change and existing audits remain valid. Going the other direction — tightening the rules after the fact — would mean breaking or removing existing audits, which is the kind of thing that erodes trust in a registry.
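The asymmetry between relaxing and tightening can be shown with a hypothetical check that lists which already-published audits a new allow-list would reject. The function name, the audit IDs and the placeholder license EXAMPLE-1.0 are invented for illustration.

```python
from __future__ import annotations

ALLOW_LIST = frozenset({"CC0-1.0", "CC-BY-4.0"})


def invalidated_by(new_allow_list: frozenset[str], published: dict[str, str]) -> list[str]:
    """Return IDs of published audits whose license the new allow-list would reject."""
    return sorted(audit_id for audit_id, lic in published.items() if lic not in new_allow_list)


published = {"audit-1": "CC0-1.0", "audit-2": "CC-BY-4.0"}

relaxed = ALLOW_LIST | {"EXAMPLE-1.0"}  # placeholder for a hypothetical third license
tightened = ALLOW_LIST - {"CC-BY-4.0"}  # dropping an existing entry

print(invalidated_by(relaxed, published))    # []
print(invalidated_by(tightened, published))  # ['audit-2']
```

A relaxed list is a superset of the old one, so it can never reject an audit the old list accepted; a tightened list can.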

So: enforce a license attribute, and limit it to CC0 or CC-BY-4.0.

Cross-posting from other sources #

One of the use cases I want this scheme to support is cross-posting existing vulnerability data into the audit format. I would like to be able to write a tool that ingests, say, RustSec advisories, and generates (either automatically or with human review) an audit whose findings and claims encode what that advisory says. The same goes for GitHub Security Advisories, and for CVE data.

The CC0 / CC-BY-4.0 allow-list covers this cleanly for the sources I care about most:

  • RustSec advisories are CC0, so they can be republished under either CC0 or CC-BY-4.0 with no friction.
  • GHSA entries are CC-BY-4.0, so a cross-posted audit can carry that license and the attribution requirement is satisfied by crediting the original advisory.
  • CVE data is more nuanced. The structured facts (affected versions, identifiers, references) are not copyrightable on their own, so summarizing them in our own words produces an original audit that we can license freely. Verbatim copying of CVE descriptions would fall under the CVE program’s terms of use, which is a special case I do not want to add to the allow-list.

In other words, the policy is “audits in the registry are CC0 or CC-BY-4.0”, and the cross-posting workflow has to produce audits that fit inside that policy. Any source whose terms do not fit gets summarized in the audit author’s own words, or it does not get cross-posted.
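A sketch of how a cross-posting tool could apply this policy when choosing a license for a generated audit. The function, the source names and the attribution wording are hypothetical. Defaulting RustSec-derived and summarized CVE audits to CC0 is an assumption (the policy permits either license for those); GHSA-derived audits keep CC-BY-4.0 with a credit line, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrossPost:
    license: str                # SPDX identifier for the generated audit
    attribution: Optional[str]  # credit line the source requires, if any


def cross_post_license(source: str, advisory_id: str, verbatim: bool) -> CrossPost:
    """Choose a license for an audit generated from an external advisory (policy sketch)."""
    if source == "rustsec":
        # CC0 source: no obligations, so the default here is CC0 as well.
        return CrossPost("CC0-1.0", None)
    if source == "ghsa":
        # CC-BY-4.0 source: keep the license and credit the original advisory.
        return CrossPost("CC-BY-4.0", f"Based on GitHub Security Advisory {advisory_id}")
    if source == "cve":
        if verbatim:
            # Verbatim descriptions fall under the CVE terms of use, which are not allow-listed.
            raise ValueError("summarize CVE descriptions in your own words instead of copying them")
        # Facts restated in the author's own words make an original audit.
        return CrossPost("CC0-1.0", None)
    raise ValueError(f"no cross-posting policy for source {source!r}")


print(cross_post_license("ghsa", "GHSA-xxxx-xxxx-xxxx", verbatim=False))
```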

Implementation #

The default workflow for creating audits uses the OpenVet command-line interface. I am changing this tooling to set a default license of CC0 and print a message saying it has done so. The attribute can be overridden with the audit commands, like this:

openvet audit attribute license --set CC-BY-4.0
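The client-side default could look roughly like the following sketch: if no license is set, insert CC0 and say so; an explicitly set license is left alone. The function name, the message text and the attribute key are hypothetical rather than OpenVet's actual code, and the sketch assumes the CC0 default is stored as the SPDX identifier CC0-1.0.

```python
from __future__ import annotations

import sys

DEFAULT_LICENSE = "CC0-1.0"  # assumed SPDX form of the CC0 default


def ensure_license(attributes: dict[str, str]) -> dict[str, str]:
    """Return a copy of the attributes with a license set, defaulting to CC0."""
    if "license" in attributes:
        # An explicit choice (e.g. CC-BY-4.0) always wins.
        return dict(attributes)
    # Tell the author that a default was applied, rather than doing it silently.
    print(f"note: no license set; defaulting to {DEFAULT_LICENSE}", file=sys.stderr)
    return {**attributes, "license": DEFAULT_LICENSE}
```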

In the registry, I am implementing a check that requires the license attribute to be present and to be one of the licenses on the allow-list. OpenVet itself does not force you to use either of these licenses; only the registry requires published audits to carry one of them.
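A minimal sketch of the registry-side check, assuming the same string-to-string attribute mapping and a hypothetical LicenseRejected error. A missing attribute is rejected separately from a disallowed one, so the author gets a specific message in each case.

```python
from __future__ import annotations

ALLOW_LIST = frozenset({"CC0-1.0", "CC-BY-4.0"})


class LicenseRejected(Exception):
    """Raised when an audit cannot be published because of its license."""


def check_publishable(attributes: dict[str, str]) -> str:
    """Registry gate: the license must be present and on the allow-list."""
    license_id = attributes.get("license")
    if license_id is None:
        # No license means default copyright: consumers could not redistribute it.
        raise LicenseRejected("audit has no license attribute")
    if license_id not in ALLOW_LIST:
        allowed = ", ".join(sorted(ALLOW_LIST))
        raise LicenseRejected(f"license {license_id!r} is not accepted; use one of: {allowed}")
    return license_id
```

Because the check runs only at publish time, using OpenVet locally with any license, or none, is unaffected.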

Summary #

  • Audits are original work, so a single license per audit is enough.
  • The license will be encoded as a license attribute on the audit, using an SPDX identifier.
  • The registry will only accept CC0 or CC-BY-4.0.
  • Enforcing this from the start avoids a permanent licensing mess in the dataset; the allow-list can always be relaxed later.
  • The same allow-list happens to make it straightforward to cross-post audits derived from RustSec, GHSA, and (with care) CVE data.