Using Kosli to signal a change freeze

Like many software teams, here at Kosli we use a continuous delivery approach. This means that every commit to our trunk is automatically built, tested, and deployed to our production-like staging environment. This provides us with the confidence that every build is potentially deployable to production. We use our staging environment to perform final exploratory testing before we deploy to production.

Deployments to production are “on-demand”. Any developer on the team can deploy the current staging version to production, as needed, using a simple command.

We noticed a pattern around deployments: the developer wanting to deploy would ask the team if it was “ok to deploy?”. This might be in a Slack message or on our shared team video call. 99%* of the time the answer would be “👍”, but occasionally someone would ask to wait. There were a few reasons for this. Often, a small escaped bug had been picked up in exploratory testing but not yet corrected. Sometimes, a long-running process was active in production and a deployment might interrupt it.

Asking “ok to deploy?” is fine when there are other folks around, but it creates uncertainty when there’s something to deploy NOW and you’re on your own. We’re a small team, distributed across different time zones, so this isn’t unusual.

We wanted a way to signal the exceptional circumstance where a deployment to production shouldn’t happen. A change-freeze of sorts. Our needs were:

Add a gate to the automated deployment process that prevents deployments happening when changes are frozen.
Record who set the change freeze, when they set it, and why.
Record when the change-freeze ended.

Naturally, for us, Kosli came to mind as a useful way to record this.

The nodeploy flag

The first thing we needed was a way to raise a flag so that someone could signal when production changes were frozen. We decided to use Kosli’s tag command to mark our production environment with a nodeploy tag, showing that deployments were off-limits.

Kosli tags are key=value pairs, so for this change freeze we settled on using nodeploy=<timestamp>.

$ kosli tag env "prod-aws" --set nodeploy="2024-10-31T10.44.48"
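The timestamp does not have to be typed by hand. As a minimal sketch, the helper below builds a value in the same shape as the example above; the function name nodeploy_timestamp is hypothetical, and the use of UTC is an assumption, since the post does not say which time zone its timestamps use. The kosli command is only printed here, not executed.

```shell
# Build a tag value shaped like the example above: YYYY-MM-DDTHH.MM.SS
# Assumption: UTC. The dots in the time part mirror the example's format.
nodeploy_timestamp() {
  date -u +"%Y-%m-%dT%H.%M.%S"
}

timestamp="$(nodeploy_timestamp)"

# Print the tag command that would be run, rather than running it.
echo "kosli tag env \"prod-aws\" --set nodeploy=\"${timestamp}\""
```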

Preventing deployments

To stop deployments happening we add a simple check in the deployment pipeline that verifies the nodeploy flag is not set:

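# Read the nodeploy tag from the production environment (jq prints "null" when it is absent)
nodeploy=$(kosli get env "prod-aws" --output json | jq -r .tags.nodeploy)
# Any other value means a change freeze is in force, so fail this pipeline step
if [ "${nodeploy}" != "null" ]; then
  exit 1
fi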
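One thing a gate like this has to decide is what to do when the lookup itself fails and the value comes back empty. The sketch below is one possible variant, not the check we run: the function name check_nodeploy and the exit codes are hypothetical. It takes the string printed by jq as its argument and treats an empty value as an error instead of as permission to deploy.

```shell
# check_nodeploy VALUE
# VALUE is the string printed by `jq -r .tags.nodeploy`:
#   "null"      -> tag absent, deployment allowed (returns 0)
#   ""          -> lookup produced nothing, treated as an error (returns 2)
#   other value -> change freeze in force (returns 1)
check_nodeploy() {
  value="$1"
  if [ -z "${value}" ]; then
    echo "could not read the nodeploy tag: refusing to deploy" >&2
    return 2
  fi
  if [ "${value}" = "null" ]; then
    echo "nodeploy not set: ok to deploy"
    return 0
  fi
  echo "nodeploy set at ${value}: deployment blocked" >&2
  return 1
}

# In a pipeline step (not run here):
#   check_nodeploy "$(kosli get env "prod-aws" --output json | jq -r .tags.nodeploy)" || exit 1
```

Failing closed on an empty value is a design choice: a network error or an expired API token would otherwise look the same as "no freeze".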

Recording the details

None of the steps so far have really relied on Kosli; environment tags are just a convenient way to signal the change-freeze. But Kosli is designed to record all of the facts about a process, so we use a Flow to record every time the nodeploy tag is added to production.

A Kosli Flow represents a process, like a build pipeline, or in our case the process for freezing changes to production. Every time you execute that process you can record the facts in a Trail. In Kosli, we call the facts you record Attestations. In Kosli terms, we want to make an Attestation whenever we set or remove the nodeploy flag.

To record starting a change-freeze we can use an attestation like:

$ kosli attest generic --flow "prod-aws-nodeploy" \
                       --trail "2024-10-31T10.44.48" \
                       --name "nodeploy-on" \
                       --description "Demo for blog post"

To stop the change-freeze:

$ kosli attest generic --flow "prod-aws-nodeploy" \
                       --trail "2024-10-31T10.44.48" \
                       --name "nodeploy-off"

The resulting Kosli Trail then provides us with the audit trail recording all of the details of that change-freeze.

Putting it all together

To make this as easy as possible for the team to use, we created a small script for setting, removing, and asserting whether the nodeploy flag is in use. You can see the full script here.

nodeploy on

$ ./bin/nodeploy.sh -r "Demo for blog post" on
Setting nodeploy for prod-aws
Tag(s) [nodeploy] added for env 'prod-aws'
trail '2024-10-31T10.44.48' was begun
generic attestation 'nodeploy-on' is reported to trail: 2024-10-31T10.44.48

nodeploy off

$ ./bin/nodeploy.sh off
Removing nodeploy for prod-aws
Tag(s) [nodeploy] removed for env 'prod-aws'
generic attestation 'nodeploy-off' is reported to trail: 2024-10-31T10.44.48

nodeploy assert

$ ./bin/nodeploy.sh assert
Checking nodeploy for prod-aws: ok.
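For readers who want to build something similar, here is a rough sketch of how the on and off steps might be wired together. It is not the script linked above: the helper names (run, nodeploy_on, nodeploy_off) and the DRY_RUN switch are hypothetical, the `--unset` flag for removing a tag is an assumption to check against the Kosli CLI reference, and the trail-creation step implied by the output above is left out. With DRY_RUN=1, which is the default here, the kosli commands are printed instead of executed.

```shell
# Hypothetical on/off wrapper sketch; not the script linked in this post.
ENV_NAME="prod-aws"
FLOW_NAME="prod-aws-nodeploy"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# nodeploy_on REASON TIMESTAMP
# The timestamp is reused as both the tag value and the trail name.
nodeploy_on() {
  reason="$1"
  trail="$2"
  run kosli tag env "${ENV_NAME}" --set "nodeploy=${trail}"
  run kosli attest generic --flow "${FLOW_NAME}" --trail "${trail}" \
    --name "nodeploy-on" --description "${reason}"
}

# nodeploy_off TIMESTAMP
# TIMESTAMP is the value currently stored in the nodeploy tag.
nodeploy_off() {
  trail="$1"
  # Assumption: --unset is the flag that removes a tag.
  run kosli tag env "${ENV_NAME}" --unset "nodeploy"
  run kosli attest generic --flow "${FLOW_NAME}" --trail "${trail}" \
    --name "nodeploy-off"
}

nodeploy_on "Demo for blog post" "2024-10-31T10.44.48"
nodeploy_off "2024-10-31T10.44.48"
```

Storing the timestamp as the tag value means the off step can look up which trail to close by reading the tag back from the environment, without anyone having to remember it.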

Next steps

We’d love to hear whether something like this would be useful for you and your team. Do you have change-freeze windows? How do you record what happens during these windows? Do you need a mechanism to override the change-freeze for particular forms of change?

Don’t hesitate to get in touch!

*Probably. We didn’t really measure it.
