Annex B: Getting started writing SPDX 3 (Informative)

(a.k.a My First SPDX File)

This guide is designed to walk you through the concepts behind an SPDX document, by walking through writing one by hand. While it is possible to write all your SPDX documents by hand, we would recommend looking at the various language bindings that are available for crafting more complex documents. Nevertheless, walking through an example of a hand written document can be instructive into how SPDX documents work to better understand concepts that are at play, even when using language bindings.

All of the provided fragments listed here are intended to be used to construct a complete a valid SPDX JSON document when concatenated together

If you do would like to construct the complete example from this Markdown file, use the following command:

cat getting-started.md | awk '/^```json/, $0=="```" {if ($0 !~ /^```.*/ ) print}'

Please note that all descriptions of properties, classes, etc. are non-normative; that is they are intended to help you understand what is going on in simpler language, but are not necessarily complete. Links to the full official documentation are provided where possible.

The Preamble

All documents need to start somewhere, and SPDX documents are no exception.

The root of all SPDX documents will be a JSON object, so start with that:

{

Next, we need to identify that the document is an SPDX 3 JSON-LD document, which is done with:

    "@context": "https://spdx.org/rdf/3.0.0/spdx-context.jsonld",

SPDX documents are designed to be a strict subset of JSON-LD, such that they can be parsed using either a full JSON-LD parser if you need the full power of linked documents or RDF, or a much simpler JSON parser if all you care about is extracting meaningful SPDX data from the document.

Because the document is valid JSON-LD, the @context must be provided to tell the JSON-LD parser how to expand the human readable names in the document into full IRIs (don't worry if you don't know what that means, it's not really that important). You can think of this line as telling us "This is an SPDX document, and this provided URL tells us how to decode it". The SPDX JSON Schema will force you to put the correct value here when validating a document.

Now, we need to specify the list of objects that we want to create in this document. JSON-LD has a special way of specifying this list using the @graph property of the root object like so:

    "@graph": [

Tell us about yourself

Our first SPDX object is going to be a Person that tells us who is writing this document (you!), so lets get started with it:

        {
            "type": "Person",

This is the basic format for any object in SPDX; all objects have one required property named type that tells us what this object actually is, so here we say this is a Person.

Next, we need to name our object:

            "spdxId": "http://spdx.example.com/Person/JoshuaWatt",

Most objects can have some sort of "ID" property that gives it a name. In the case of Person, that property is called spdxId (inherited from Element). This property is the URI that should give this object a universally unique name. Although this property looks like a HTTP URL, it is in fact not. Technically speaking, a URL defined a Location, where as a URI defines an Identifier (i.e. the name by which something is known). In all likelihood, a URI is not a resolvable location from whence you can do an HTTP GET to retrieve data, but rather just a way of constructing a namespaced identifier. This identifier can be used within this document to refer to this object (more on that later), or it can be referenced from other documents to refer to this specific object (although in that case there needs to be additional information to describe how to find this document). URI's are considered to be universally unique, so any objects constructed with this URI are considered to be the same object, and any references to this URI is a reference to this specific object we are creating.

If you work for a company, own a domain, etc. it is encouraged to use that (or some subdomain of it) in place of spdx.example.com.

In practice, many spdxId values will have some sort of hash or random UUID-like string incorporated to make them unique.

Moving on from this, we have:

            "creationInfo": "_:creationinfo",

All SPDX objects derived from Element must specify how they were created by linking to a CreationInfo object. It is important to know the providence of where objects come from; but more on this later.

            "name": "Joshua Watt",

The optional name property is inherited from the Element class, and means "the common name for the thing", or in this case, your name.

As our last step, we want to indicate another way by which You are known to the world; specifically your E-mail address.

To do this we first need to use the (optional) externalIdentifier property which Person inherits from Element:

            "externalIdentifier": [

This property is an array of ExternalIdentifier objects, so start by adding one to the array:

                {
                    "type": "ExternalIdentifier",

Again notice this uses the type property to identify what the object is. However it should be noted that this is our first object that is not derived from Element, and therefore it does not need a spdxId property.

Next, lets add the relevant information about your email address:

                    "externalIdentifierType": "email",
                    "identifier": "JPEWhacker@gmail.com"

Two properties are used here. First, [externalIdentifierType][Property_externalIdentifierType] is used to indicate what type of external identifier this is. There are many choices, but in the case we are specifying your email address, so we choose the value email. The second property is the indentifier property which is the actual string identifier (in this case, your email address).

We are now done with our Person, so close it all out and prepare for the next object:

                }
            ]
        },

Where did all this stuff come from?

Our next object is going to be a CreationInfo object. It is required to provide one for every SPDX document, as all objects derived from Element must link to one in their creationInfo property to indicate where they came from.

Note that the CreationInfo describes where a SPDX Element itself came from (that is, who wrote the actual JSON). This is a distinct concept from describing where the thing an Element describes comes from, which is covered later.

Lets get started:

        {
            "type": "CreationInfo",

Hopefully this is making sense. We are saying this object is a CreationInfo.

            "@id": "_:creationinfo",

This object also has an @id similar to the spdxId of our person, but it is subtly different First of all, this one is not a URI like our Person, but instead starts with a _:. This type of identifier is known as a blank node. Blank nodes serve a similar purpose to the URI of the spdxId, however they only have scope within this SPDX document. What this means is that it be impossible to reference this CreationInfo by name outside of this document. Inside the document, you can use this identifier to refer to this object. The string after the _: is arbitrary and you may choose whatever unique (within the document) string that you choose.

It should be noted that CreationInfo does not derive from Element class (like our previous example of ExternalIdentifier), and as such the @id property is technically optional. However, since we will need to refer to this object at other places in the document, we must give it an identifier. This also means that this object does not have a mandatory creationInfo property (which makes sense since it would be a circular reference). Finally, CreationInfo is only allowed to have a blank node identifier.

If you look back at the Person we just created, you'll notice that its creationInfo property has the string value that matches the @id of this object; this is how objects are linked together by reference in SPDX.

Next, we need to specify which version of the SPDX spec that elements linking to this CreationInfo are conforming to:

            "specVersion": "3.0.0",

Now, we need to use the createdBy property to indicated who (or what) created the elements that are linked to this CreationInfo:

            "createdBy": [
                "http://spdx.example.com/Person/JoshuaWatt"
            ],

This property is a list of reference to any class that derives from Agent. Since you are the person writing the document, put a single list item that is the spdxId of your Person element here to link them together. Note that even though this is using a full URI instead of a blank node, this is linking in the same way as creationInfo described above.

Also, it is worth noting that this does indeed create a circular reference between our Person.creationInfo property and CreationInfo.createdBy property. This is fine in SPDX, as objects are not required to be a Directed Acyclical Graph (DAG).

Finally, we need to specify the date that any objects linking to this CreationInfo were created using the created property and close out the object:

            "created": "2024-03-06T00:00:00Z"
        },

Use today's date and time in ISO 8601 with the format: "%Y-%m-%dT%H:%M:%SZ". The timezone should always be UTC.

Describing the Document

SPDX requires that information about the document itself be provided. In order to do this, we must create a SpdxDocument object, so lets do that now:

        {
            "type": "SpdxDocument",
            "spdxId": "http://spdx.example.com/Document1",
            "creationInfo": "_:creationinfo",

SpdxDocument derives from Element, so it has 3 required properties, type, spdxId and creationInfo. We've seen all of these properties before in Person, so hopefully this getting more familiar. Note that we again link back out our previous CreationInfo object.

Next, we need to indicate which Profiles our document uses using the profileConformance property. This can be used by consumers of the document to quickly determine if the information they want is in the document (for example, if a user wants to find CVE data, but the security profile is not present, there is no reason to continue looking in this document).

            "profileConformance": [
                "core",
                "software"
            ],

In this case, we are saying this document conforms to the core profile (all SPDX documents should include this), and the software profile, since we will be describing some software later.

The final property we need to define is rootElement. This property is a list of Element (or any subclass of Element) references. Add this now and close our our SpdxDocument:

            "rootElement": [
                "http://spdx.example.com/BOM1"
            ]
        },

The purpose of this property is to indicate the "interesting" element(s) in the document. Since a document can contain a large number of elements, it might be difficult for a consumer of the document to know what the focus of the document is. This property clarifies that by suggesting which element(s) a user should look at to start navigating. While it is possible to have more than one root element, it is rare to need more than one.

Careful readers of the SpdxDocument documentation will note that we have omitted the element (derived from the ElementCollection parent class). Technically speaking, the property should link to all the elements that are in the document using this property. However because this would be error prone, it is implied that all Element objects present in the @graph (that is, all the objects we are writing) are implicitly added to the element property.

A Complete Document!

At this point, we have a completed SPDX document (albeit, one that has an unresolved references in SpdxDocument.rootElement). This is a fully valid document because it has the SPDX 3.0 preamble, and the required SpdxDocument object, which in turn requires a valid CreationInfo, which we've provided. Finally, the CreationInfo requires an Agent to describe who or what created the Elements in the document, which we've provided by writing a Person object which describes you.

While this is the minimal example, it may feel long. However, as we continue in the document it should become more apparent how reuse of these 3 objects (particularly the CreationInfo) helps reduce total document size while still conveying precise information. In addition, there are other options to make a more compact document that are not covered yet, such as referring to a external Agent instead of encoding it in the document.

Lets Add Some Software!

Now that we have the basic valid document, its time to start adding some interesting data to it. Lets start with a fictitious software package called amazing-widget which we distribute as a tarball for users to download and run.

To start with, we need to define a software_Package object the defines how our software is distributed. In this case, the software_Package will be describing a tarball which someone can download, but it can be almost any unit of content that can be used to distribute software (either as binaries or source). See the documentation for more details.

Lets define our package:

        {
            "type": "software_Package",
            "spdxId": "http://spdx.example.com/amazing-widget",
            "creationInfo": "_:creationinfo",

This should be familiar by now. Note the reuse of our previous CreationInfo.

Also note that this is our first element that is outside of the Core profile in SPDX. In this specific case, the class is defined in the Software profile, and as such is prefixed with software_. Any classes and properties that are defined in a profile other than Core will be prefixed with the lower case profile name + _ to disambiguate them from classes and properties with the same name in other profiles.

Again, we can use Element.name to give the common name for our package:

            "name": "amazing-widget",

Importantly, even though this is a class defined in the Software profile, name is defined in core so it does not get prefixed. When writing objects, pay attention to which profile the property is defined in, as that sets the prefix (the documentation should make it clear what the serialized name of a property is if you are unsure TODO: It does not yet).

Next, we will define what version the amazing-widget package is using software_packageVersion, and where the user could download this package from using software_downloadLocation (both are optional):

            "software_packageVersion": "1.0",
            "software_downloadLocation": "http://dl.example.com/amazing-widget_1.0.0.tar",

These are our first two examples of properties not defined in the Core profile, and as such they get the software_ prefix.

Now, we should define when this software was packaged using the (optional) builtTime property, so that downstream users can tell how old it is:

            "builtTime": "2024-03-06T00:00:00Z",

Note that we are back in the Core profile properties here (specifically, builtTime is a property of Artifact in Core)

Next, we want to indicate who actually made the package we are describing. This is done using the (optional) originatedBy array property:

            "originatedBy": [
                "http://spdx.example.com/Person/JoshuaWatt"
            ],

In this example, you can put a single element that references your Person spdxId here to indicate that you actually made the package. Note that while we are using the same spdxId as we used in the CreationInfo, this is not required. originatedBy is the property that we used to describe who made the actual package being described by the software_Package and not the JSON object itself.

Finally, we would like to inform consumers of our SPDX how they can validate the package to ensure its contents have not changed, or to check if a file that they have is the same one being described by this document. This is done using the verifiedUsing property, which is an array of IntegrityMethod objects (or subclasses).

            "verifiedUsing": [
                {
                    "type": "Hash",
                    "algorithm": "sha256",
                    "hashValue": "f3f60ce8615d1cfb3f6d7d149699ab53170ce0b8f24f841fb616faa50151082d"
                }
            ]
        },

Specifically, we are using the Hash subclass of integrity method to indicate that the SHA-256 checksum of the package file is f3f60ce8615d1cfb3f6d7d149699ab53170ce0b8f24f841fb616faa50151082d

Whats in our Package?

Describing that we have a distributed package is a great start, but we are able to go further (although this is not mandatory!). Our next object is going to describe all the files contained in our software_Package by using software_File.

Lets get started with our first file, the program executable:

        {
            "type": "software_File",
            "spdxId": "http://spdx.example.com/amazing-widget/main",
            "creationInfo": "_:creationinfo",
            "name": "/usr/bin/amazing-widget",
            "verifiedUsing": [
                {
                    "type": "Hash",
                    "algorithm": "sha256",
                    "hashValue": "ee4f96ed470ea288be281407dacb380fd355886dbd52c8c684dfec3a90e78f45"
                }
            ],
            "builtTime": "2024-03-05T00:00:00Z",
            "originatedBy": [
                "http://spdx.example.com/Person/JoshuaWatt"
            ],

We've seen all this before, so hopefully it all makes sense.

While it's great to have a file, it's not easy to tell what purpose this file serves. We might be able to infer that its an executable program from the name, but SPDX provides the ability for us to directly specify this using the (optional) software_primaryPurpose and software_additionalPurpose properties (derived from sofware_Artifact):

            "software_primaryPurpose": "executable",
            "software_additionalPurpose": [
                "application"
            ],

A software_Artifact can have as many purposes a you want to describe, but there should always be a software_primaryPurpose property defined before any software_additionalPurpose are added.

Finally, as one last bit of information, we'll say what the copyright text of the program is using the (optional) software_copyrightText property and close out our file:

            "software_copyrightText": "Copyright 2024, Joshua Watt"
        },

Lets add one more file for fun. This one will describe a config file for our program:

        {
            "type": "software_File",
            "spdxId": "http://spdx.example.com/amazing-widget/config",
            "creationInfo": "_:creationinfo",
            "name": "/etc/amazing-widget.cfg",
            "verifiedUsing": [
                {
                    "type": "Hash",
                    "algorithm": "sha256",
                    "hashValue": "89a2e80bc48c4dd10044c441af0fc6fdad5d31b2fa391cb2cf9c51dbf4200ed9"
                }
            ],
            "builtTime": "2024-03-05T00:00:00Z",
            "originatedBy": [
                "http://spdx.example.com/Person/JoshuaWatt"
            ],
            "software_primaryPurpose": "configuration"
        },

Linking things together with Relationships

Now we've described our software_Package, and two software_Files that should be contained in it, but we have one small problem: there is nothing that tells us that our files are actually contained by the package.

In order to do this, we must introduce the SPDX Relationship. These are a very powerful concept in SPDX that allows linking Elements and describing how they are related.

Relationships themselves are also derived from SPDX Elements, so we need the required three properties to start a new one:

        {
            "type": "Relationship",
            "spdxId": "http://spdx.example.com/amazing-widet-contains",
            "creationInfo": "_:creationinfo",

Next, we need to say what the relationship between our objects is going to be. We do this using the relationshipType property:

            "relationshipType": "contains",

The full list of what a Relationship can describe is defined by the RelationshipType vocabulary (a fancy work for enumeration). There are a lot of possible options, and each one has a specific meaning and restrictions on what types it can relate, so read the documentation to find the specific one you need and how to use it. In our case, we are using contains which is defined as "The from Element contains each to Element". Perfect.

Now, we need to describe what Elements are being connected. Relationships always have a directionality associated with them: you can think of them as an arrow pointing from their from property to their to properties. from is always required and must be a single object, whereas to is a list of zero or more objects. Lets write the JSON to express this:

            "from": "http://spdx.example.com/amazing-widget",
            "to": [
                "http://spdx.example.com/amazing-widget/config",
                "http://spdx.example.com/amazing-widget/main"
            ],

This is the minimum required to define a Relationship, but we want to add one more property to convey additional information and close out the object:

            "completeness": "complete"
        },

The completeness property is very useful as it indicates if we know that this Relationship can be considered to describe all we know about the type of relationship or not. For example, by stating that this relationship is complete, we are saying that our package contains those 2 files, and only those 2 files. We could have also stated that the relationship was incomplete in which case we are stating that we know we didn't list all the files, and other are included. Alternatively, we could have stated that the relationship completeness was noAssertion meaning we don't know if we captured all the files or not. If this property is omitted, it's assumed to be noAssertion.

Wrapping it all up in a BOM

We've made great progress, and we are almost done. For our final step, we want to wrap up everything we know about the package into a "Software Bill of Materials".

This is done by creating a software_Sbom object:

        {
            "type": "software_Sbom",
            "spdxId": "http://spdx.example.com/BOM1",
            "creationInfo": "_:creationinfo",

Note that this is the object referenced by the rootElement of our SpdxDocument, since it is the primary subject of our entire document.

software_Sbom derives from ElementCollection just like SpdxDocument, so it has the same rootElement property. In this case, it is the subject of the SBOM, which is our software_Package:

            "rootElement": [
                "http://spdx.example.com/amazing-widget"
            ],

Unlike SpdxDocument however, there is no implicit value for the element property. Instead, we need to list all the elements that are part of this SBOM (think of this as the line items in the SBOM). In our specific case, this is the software_Files that part of our package, but if you had any other elements related to the package (e.g. licenses, security information, etc.) those would also be included:

            "element": [
                "http://spdx.example.com/amazing-widget/main",
                "http://spdx.example.com/amazing-widget/config"
            ],

Finally, we need to specify what type(s) of BOM this is using the software_sbomType property:

            "software_sbomType": [
                "build"
            ]
        }

This property is effectively indicating at what point in the software lifecycle this SBOM was generated. Since we are describing an executable program, build seems the most likely.

Closing it all up

Now that we are all done, we have a few things to clean up, namely that we need to close the @graph list and the root object, so lets do that now:

    ]
}

Congratulations! You just wrote your first SPDX document! Hopefully this walk through has been instructive and you are ready to get started with SPDX!