Imagine you’re a Fintech CTO 🤓 with several teams and tens of microservices. Do you know what’s currently running in prod? How about yesterday? A week ago? Last month? And if you do know what’s in prod, do you also know how it got there? 🤔
Getting answers to these questions isn’t straightforward, but for teams in regulated industries it’s essential. You have to know what has changed, when it changed, and who changed it. You have to know what was running in prod at 2.30am last Thursday, or indeed at any other given time. ⏰
In this post we’ll show you how to answer these questions - and get rid of a bunch of paperwork - by automating a secure chain of custody across your pipelines. In only 5 steps. 👍
- Build a “black box recorder” for the pipelines
- Ensure binary provenance
- Automate risk controls
- Segregation of duties and approvals
- Environment change logs
Ready? Let’s dive in! 🐬
1. Build a “black box” recorder for the pipelines
The first step to compliance automation is to build a “black box”◼️ to automatically record every event in our software delivery process - from the initial git commit to the final push to prod.
I’m using the black box analogy because our software process needs to have an automated record of every change that is both infallible and indestructible. 💪
It should also be based on an append-only journal where every change is non-modifiable. Without this feature we will not be able to prove compliance in a way that will satisfy regulators.
Now we can eliminate a lot of the manual change documentation with automation. 🤖
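To make the "append-only" idea concrete, here is a minimal sketch of a journal in which every entry carries the hash of the entry before it. This is an illustration only, not how Kosli or any particular product stores its data: the `Journal` class, its method names, and the sample events are all made up for this example. Hash chaining doesn't stop someone from editing an old entry, but it does make the edit detectable.

```python
import hashlib
import json


class Journal:
    """Hypothetical append-only event log; each entry is chained to the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry

    def __init__(self):
        self._entries = []

    def append(self, event: dict) -> str:
        """Add an event and return the hash that seals it into the chain."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = json.dumps(event, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        self._entries.append({"prev": prev_hash, "event": body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or removed entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev_hash + entry["event"]).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


# Sample events (invented for illustration)
journal = Journal()
journal.append({"type": "commit", "sha": "abc123", "author": "alice"})
journal.append({"type": "build", "commit": "abc123", "status": "passed"})
print(journal.verify())
```

If anyone later rewrites the first event, `verify()` returns `False`, because the stored hashes no longer match the content.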
2. Ensure binary provenance
Keep in mind our overall objective - we want to know what’s running in prod and how it got there. To be cryptographically sure of this we need to rewind 🔙 all the way back to the build process to establish binary provenance.
Binary provenance is a fancy term, but it boils down to two simple concepts:
- How to identify software
- How to connect this identity to its ingredients
For software identification, we really favour cryptographic fingerprints 🐾 (Kosli uses SHA256 sums, for example). A fingerprint like this is, for all practical purposes, a unique identifier for any sequence of bytes.
Forget about file names, versions, strings, schema, metadata, or anything else that can be easily changed. You want the SHA. 👈
You can then know with certainty that any time you see this same collection of bytes - regardless of where you find it 🔍 - it’s the same binary because the prints match. If they don’t match it’s either a completely different artifact or, less likely, someone has modified the original.
And that means you can’t qualify one thing and then deploy another without realising the discrepancy. 🕵️🕵️
To establish binary provenance we take the SHA256 fingerprint of every artifact we build and record it securely in our black box, together with the source code that created it and any other relevant meta-information, such as links to build logs. 📊
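Here is a rough sketch of what fingerprinting an artifact could look like, using Python's standard `hashlib`. The `fingerprint` and `provenance_record` helpers, the record's field names, and the placeholder commit and build-log values are assumptions for this example, not a real tool's schema.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path, chunk_size=65536) -> str:
    """Return the SHA256 hex digest of a file, read in chunks to handle large artifacts."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(artifact_path, commit, build_url) -> dict:
    """Hypothetical record linking a fingerprint to the ingredients that produced it."""
    return {
        "fingerprint": fingerprint(artifact_path),
        "name": Path(artifact_path).name,  # informational only; the SHA is the identity
        "commit": commit,
        "build_url": build_url,
    }


# Two files with different names but identical bytes share one fingerprint.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "app-1.0.bin"
    renamed = Path(tmp) / "totally-different-name.bin"
    original.write_bytes(b"compiled bytes")
    renamed.write_bytes(b"compiled bytes")
    same = fingerprint(original) == fingerprint(renamed)
    record = provenance_record(
        original, commit="<git commit sha>", build_url="<link to build log>"
    )

print(same)
```

The file name is kept in the record purely as a convenience. The identity is the digest, which is why renaming the file changes nothing.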
3. Automate Risk Controls
Risk controls of different descriptions are usually defined by your Secure Software Development Lifecycle (SSDLC). For example, in the process documentation we might have things like:
- Code must be unit tested
- All merges to master must have an approved pull request
- Code must pass integration & contract tests
- Code must have passed security analysis
These controls can be anything we like, but whatever they are we should do them in our pipeline. Remember, implementing tasks as gates 🚧 is a mistake and these are no different. That’s why we automate all of these processes and make sure they’re executed for every change. Whatever our risk controls are, we automate them! 🤖🤖🤖🤖🤖
The task of conforming to our process is now automatic, and we can proceed to automate the collection of all the evidence that these tasks have been done against the relevant binary in our black box. 😎
So, we start with the SHA of our binary, which is our initial fingerprint, and as we qualify it and pass it through the risk controls in our pipeline we add the evidence to the binary and record it in our black box.
This gives us a complete history of where our binaries came from and all of the changes and events that they have been subject to. And because our black box is append-only we’re only ever adding ➕➕➕ new evidence.
That means we can go back in time and look at any point in the history of our pipeline. We can still re-run stages of the pipelines or supersede earlier results, but nothing is overwritten in place: each of these is a new commit in the database, and the black box captures it.
We now have a ready-made audit trail that’s secure 🔐 and incorruptible. 🥳
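As a sketch of how evidence might be collected against a fingerprint, the snippet below keeps an append-only list of results and works out which controls are still outstanding. The control names mirror the example list above, but the `Evidence` and `EvidenceLog` types and the placeholder fingerprint are invented for illustration.

```python
from dataclasses import dataclass, field

# Assumed control names, loosely based on the example SSDLC list above
REQUIRED_CONTROLS = {"unit-tests", "pull-request", "integration-tests", "security-scan"}


@dataclass(frozen=True)
class Evidence:
    fingerprint: str       # SHA of the binary the evidence is about
    control: str           # which risk control was run
    passed: bool
    details_url: str = ""  # e.g. a link to the test report


@dataclass
class EvidenceLog:
    _records: list = field(default_factory=list)

    def report(self, evidence: Evidence) -> None:
        """Only ever append; earlier records are never edited or removed."""
        self._records.append(evidence)

    def history(self, fingerprint: str) -> list:
        return [e for e in self._records if e.fingerprint == fingerprint]

    def outstanding(self, fingerprint: str) -> set:
        """Controls with no evidence yet, or whose most recent result failed."""
        latest = {}
        for e in self.history(fingerprint):
            latest[e.control] = e.passed  # later records win, earlier ones stay on file
        return {c for c in REQUIRED_CONTROLS if not latest.get(c, False)}


log = EvidenceLog()
fp = "sha256:example"  # placeholder fingerprint
log.report(Evidence(fp, "unit-tests", True))
log.report(Evidence(fp, "pull-request", True))
log.report(Evidence(fp, "integration-tests", False))
log.report(Evidence(fp, "security-scan", True))
before = log.outstanding(fp)

# Re-running the failed stage adds a new record instead of replacing the old one.
log.report(Evidence(fp, "integration-tests", True))
after = log.outstanding(fp)
print(before, after, len(log.history(fp)))
```

The failed integration run is still in the history after the re-run passes, which is exactly what an auditor wants to see.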
4. Segregation of duties and approvals
“But what about the human in the loop?” I hear you ask. 🙋🏽
In high-security environments there’s usually a requirement that a person with a designated role is tasked with approving the risk of changes to production. It’s a way to mitigate insider threats and ensure accountability by making a human 👨🏻‍💼 responsible for reviewing proposed changes and signing them off.
But as we’re already following steps 1-3, this step can be automated from our CI system, from Slack, from Git, or from inside the black box itself. 👍 Remember, we have a record of every change, so our designated human can predefine all the criteria needed for a release candidate to pass to production. ➡️
For example, if we can verify binary provenance and check for all of the necessary risk controls in the pipeline, the deployment step can be automated because all the required evidence is there. The key point is that even if people are required to approve, it doesn’t have to be a high-ceremony affair as long as you manage risk and documentation automatically.
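One way to picture predefined release criteria is a function that lists everything still blocking a deployment. This is a sketch under assumptions: the `ReleaseCandidate` fields, the rule that an author can't approve their own change, and all the names below are illustrative, and your own policy may well differ.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ReleaseCandidate:
    fingerprint: str
    commit: Optional[str]        # provenance: the commit the binary was built from
    author: str
    passed_controls: frozenset   # controls with passing evidence on record
    approved_by: Optional[str] = None


def deployment_blockers(candidate, required_controls, approvers) -> list:
    """Return the reasons this candidate cannot be deployed; empty means good to go."""
    blockers = []
    if not candidate.commit:
        blockers.append("no provenance: fingerprint is not linked to a commit")
    for control in sorted(set(required_controls) - candidate.passed_controls):
        blockers.append(f"missing evidence: {control}")
    if candidate.approved_by is None:
        blockers.append("not approved")
    elif candidate.approved_by not in approvers:
        blockers.append("approver does not hold the designated role")
    elif candidate.approved_by == candidate.author:
        blockers.append("author cannot approve their own change")
    return blockers


REQUIRED = {"unit-tests", "security-scan"}  # assumed policy
candidate = ReleaseCandidate(
    fingerprint="sha256:example",
    commit="abc123",
    author="alice",
    passed_controls=frozenset(REQUIRED),
    approved_by="carol",
)
ok = deployment_blockers(candidate, REQUIRED, approvers={"carol"})

# Same candidate, but approved by its own author
self_approved = replace(candidate, approved_by="alice")
blocked = deployment_blockers(self_approved, REQUIRED, approvers={"carol", "alice"})
print(ok, blocked)
```

Because the check is just a function over recorded evidence, it can run from CI, a chat command, or anywhere else, and its result can itself be written to the black box.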
5. Environment Change Logs
The final step is to log 📊 every change made to every service in production. By doing this for every change we can know what’s currently in production, what was in production on a given date, what the last 10 deployments were, etc. 👀
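To show how a change log answers "what was running at 2.30am last Thursday?", here is a small sketch that replays recorded changes up to a given moment. The `EnvironmentLog` class, the service names, the dates and the fingerprints are all invented for this example.

```python
import bisect
from datetime import datetime


class EnvironmentLog:
    """Hypothetical append-only log of changes to one environment."""

    def __init__(self):
        self._times = []    # timestamps, kept in chronological order
        self._changes = []  # (service, fingerprint) pairs; fingerprint None = removed

    def record(self, when: datetime, service: str, fingerprint) -> None:
        if self._times and when < self._times[-1]:
            raise ValueError("changes must be recorded in chronological order")
        self._times.append(when)
        self._changes.append((service, fingerprint))

    def running_at(self, when: datetime) -> dict:
        """Replay every change up to and including `when`."""
        upto = bisect.bisect_right(self._times, when)
        state = {}
        for service, fingerprint in self._changes[:upto]:
            if fingerprint is None:
                state.pop(service, None)
            else:
                state[service] = fingerprint
        return state

    def last_deployments(self, n: int = 10) -> list:
        return list(zip(self._times, self._changes))[-n:]


prod = EnvironmentLog()
prod.record(datetime(2024, 1, 1, 9, 0), "payments", "sha256:aaa")
prod.record(datetime(2024, 1, 3, 14, 0), "ledger", "sha256:bbb")
prod.record(datetime(2024, 1, 5, 10, 0), "payments", "sha256:ccc")

snapshot = prod.running_at(datetime(2024, 1, 4, 2, 30))
print(snapshot)
```

Since the log stores fingerprints rather than version strings, each answer can be traced straight back to the provenance and evidence recorded for that exact binary.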
And that’s basically all there is to it. A secure chain of custody is all about creating a trail of breadcrumbs by recording where our binaries come from, and how they change as they make their way through our pipeline to production. ⏩⏩⏩⏩⏩⏩✅
Cool or what? 😎
If you have any questions about this blog or want to talk about doing DevOps in a regulated space, feel free to book some time with me. I’d love to hear 👂 about the specific challenges in your industry.
TVM! Mike. 🙏