If you’re using containers to deploy your software, it is important to be aware of potential vulnerabilities within your container images. These may be introduced through dependencies in your built image, or perhaps through dependencies within the base image(s) used to build your image.
Snyk is one of the most popular tools for scanning container images for vulnerabilities, and you may well already run a snyk container test when you deploy code through your CI pipeline. However, new vulnerabilities (and their fixes) are discovered constantly, so it isn’t enough to scan just before deployment. An image with a squeaky clean Snyk scan at deployment might be subject to several high-severity vulnerabilities the very next day, and if your software is deployed infrequently it can remain vulnerable for days, weeks, maybe even months.
Finding new vulnerabilities running in production
The answer is simple: scan your images after deployment as well as before, ideally on a regular schedule, so that you are alerted to (and can address) vulnerabilities as soon as possible. Manually rerunning Snyk scans across all your environments every day or week quickly becomes tedious, so here we discuss a simple method to automate the process using Kosli, Snyk and your CI pipeline of choice, giving you regular scanning and real-time feedback on the results through Kosli’s compliance system.
In this article we show how we have implemented this process for the production environment (aws-prod) of https://cyber-dojo.org. Since cyber-dojo comprises numerous individual micro-services, each with its own Docker image, automated scanning is significantly more convenient than scanning manually!
Environments and Flows in Kosli
Before we jump in, there are two key Kosli concepts you need to be familiar with: environments and flows.
A Kosli environment is a place where you can track how a runtime environment, which can contain numerous artifacts, changes over time. Here is cyber-dojo’s aws-prod environment:
A Kosli Flow is a place to record the artifacts produced by your CI pipeline along with any associated evidence (e.g. test results, Jira tickets). Each Flow has a template specifying the evidence you require for an artifact to be considered compliant. The compliance status of a Flow or an environment is determined by the compliance status of the currently-running artifact(s) therein. Here is cyber-dojo’s nginx Flow:
How to get the data from Kosli
To identify the images we want to scan, we need data from cyber-dojo’s aws-prod environment. If you install the Kosli CLI, a single command outputs JSON containing details of all the artifacts currently running in the environment:
kosli get snapshot "${KOSLI_ENVIRONMENT}" --output=json > "${snapshot_json_filename}"
Here is some of the resulting JSON:
The JSON for artifacts with provenance (i.e. we know the Flow) can then be parsed to extract the required details:
- Artifact name
- Artifact fingerprint SHA
- Flow name
- Git commit SHA
In cyber-dojo’s snyk-scans repository we use jq and bash to create variables called FLOW, GIT_COMMIT, FINGERPRINT and ARTIFACT_NAME.
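As a rough illustration of that parsing step, the sketch below uses jq to turn a snapshot into one tab-separated line per artifact, and a bash loop to populate the four variables. The JSON field names (artifacts, flow_name, git_commit, fingerprint, name) and both helper function names are assumptions made for this example, not the confirmed Kosli schema or the actual cyber-dojo script, so check them against your own snapshot output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints one tab-separated line per artifact that has
# provenance. The field names used here are ASSUMED, not the confirmed schema.
artifacts_with_provenance()
{
  local -r snapshot_json_filename="${1}"
  jq --raw-output '
    .artifacts[]
    | [.flow_name, .git_commit, .fingerprint, .name]
    # Keep only artifacts where all four details are non-empty strings.
    | select(all(.[]; type == "string" and length > 0))
    | @tsv
  ' "${snapshot_json_filename}"
}

# Hypothetical helper: sets FLOW, GIT_COMMIT, FINGERPRINT and ARTIFACT_NAME
# for each artifact in turn, then runs the given command or function.
for_each_artifact()
{
  local -r snapshot_json_filename="${1}"
  local -r callback="${2}"
  while IFS=$'\t' read -r FLOW GIT_COMMIT FINGERPRINT ARTIFACT_NAME
  do
    # Detach stdin so the callback cannot swallow the remaining lines.
    "${callback}" < /dev/null
  done < <(artifacts_with_provenance "${snapshot_json_filename}")
}
```

A call such as for_each_artifact "${snapshot_json_filename}" scan_and_report would then run a (hypothetical) scan_and_report function once per artifact. Artifacts without a known Flow are skipped, because without a Flow there is nowhere to report their evidence.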
Run a Snyk Scan and report the evidence
We can now use these variables to run a snyk container scan and report the result to Kosli. Each cyber-dojo service repository has a .snyk policy file listing any vulnerabilities that we want the scan to ignore, for example because no fix is currently available or because the vulnerability does not affect the service. This file can be retrieved using cURL and stored in a temporary file for the snyk scan.
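One way to make that download more defensive is sketched here. The helper name fetch_snyk_policy is hypothetical, and falling back to an empty policy (so that nothing is ignored) when the download fails is an assumption about what you would want, not something the cyber-dojo scripts are known to do.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: downloads a policy file from the given URL into a fresh
# temporary directory and prints the path of the saved file.
fetch_snyk_policy()
{
  local -r policy_url="${1}"
  local policy_dir
  policy_dir="$(mktemp -d)"
  # Keep the conventional .snyk file name inside the temporary directory.
  local -r policy_filename="${policy_dir}/.snyk"
  # --fail makes curl exit non-zero on HTTP errors such as 404, rather than
  # saving the error page as if it were the policy.
  if ! curl --fail --silent --show-error --location \
         --output "${policy_filename}" "${policy_url}"
  then
    echo "WARNING: could not fetch ${policy_url}; using an empty policy" >&2
    # ASSUMPTION: an empty policy (ignore nothing) is the safest fallback.
    : > "${policy_filename}"
  fi
  echo "${policy_filename}"
}
```

The printed path can be captured with command substitution and passed to the scan as its policy path. A separate temporary directory per artifact also keeps one service's policy from being picked up by the scan of another.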
For each artifact contained in the Kosli snapshot we now run the snyk scan:
run_snyk_scan()
{
    local -r snyk_output_json_filename="${1}"
    # Use fingerprint in image name for absolute certainty of image's identity.
    local -r image_name="${ARTIFACT_NAME}@sha256:${FINGERPRINT}"
    local -r snyk_policy_filename=.snyk
    # All cyber-dojo microservice repos hold a .snyk policy file.
    # This is an empty file when no vulnerabilities are turned-off.
    # Ensure we get the .snyk file for the given artifact's git commit.
    curl "https://raw.githubusercontent.com/cyber-dojo/${FLOW}/${GIT_COMMIT}/.snyk" > "${snyk_policy_filename}"
    set +e
    snyk container test "${image_name}" \
        --json-file-output="${snyk_output_json_filename}" \
        --severity-threshold=medium \
        --policy-path="${snyk_policy_filename}"
    set -e
}
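Because set +e lets the script continue whatever snyk returns, a scan that found vulnerabilities and a scan that could not run at all look the same to the caller. The following is a minimal sketch of one way to tell them apart, based on the exit codes documented for the Snyk CLI (0: no vulnerabilities found, 1: vulnerabilities found, 2 or 3: the scan itself failed). The helper names classify_snyk_exit and run_and_classify are hypothetical and are not part of the cyber-dojo scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: maps a Snyk CLI exit status to a short label.
classify_snyk_exit()
{
  local -r exit_status="${1}"
  case "${exit_status}" in
    0) echo "clean" ;;        # scan completed, no vulnerabilities reported
    1) echo "vulnerable" ;;   # scan completed, vulnerabilities reported
    *) echo "scan-failed" ;;  # 2, 3 or anything else: no trustworthy result
  esac
}

# Hypothetical wrapper: runs any command without tripping set -e, sends the
# command's own output to stderr, and prints only the label on stdout.
run_and_classify()
{
  local exit_status=0
  "$@" 1>&2 || exit_status=$?
  classify_snyk_exit "${exit_status}"
}
```

For example, outcome="$(run_and_classify snyk container test "${image_name}")" captures the label while the scan report still appears on the terminal. A caller could then choose to report evidence only when the outcome is not scan-failed, so that a network or authentication error is not mistaken for a scan result.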
A Kosli Flow not only records the provenance of an artifact, it can also determine its compliance with an SDLC policy. Every cyber-dojo artifact requires Snyk scan evidence to be compliant, so we need to re-report this evidence to the artifact’s Flow.
kosli report evidence artifact snyk \
    --fingerprint="${FINGERPRINT}" \
    --flow="${FLOW}" \
    --name=snyk-scan \
    --scan-results="${snyk_output_json_filename}"
How to automate Snyk scans in production
Now that we can retrieve the environment data, scan the images and report back to Kosli, we want a way to have this happen automatically and periodically. This is easily achieved by setting up a workflow in your CI pipeline that runs on a cron schedule. The workflow then runs at whatever time(s) you’ve specified, calling the script(s) for pulling the Kosli data, scanning and reporting. With GitHub Actions, where cron schedules are evaluated in UTC, the workflow trigger looks like this (although of course Kosli works with any CI):
name: Weekly Snyk scan of aws-prod
on:
  workflow_dispatch:
  schedule: # At 09:00 every Saturday
    - cron: '0 9 * * SAT'
If the new Snyk scan does not find any vulnerabilities, the relevant Kosli artifact remains compliant; otherwise it becomes non-compliant, alerting you to the problem and giving you the opportunity to address it quickly.
Regardless of the outcome, the snyk scan evidence will be attached to the relevant artifact, giving you proof that the process is working and your images are being regularly scanned. It is only the most recent evidence that determines an artifact’s compliance status, so you can be certain that you’re being kept up to date with the most recent scans.
In the following snapshot, cyber-dojo’s production environment is non-compliant due to a failing snyk scan in the nginx Flow - thanks to Kosli we were able to fix the problem and rapidly regain compliance!
Conclusion - security depends on continuous environment monitoring
An important lesson to take away from this is that your security ultimately depends on the software running in production. It’s easy to assume that deployments made with secure base images, thoroughly scanned and tested before they’re released, can be shipped to production and then forgotten about.
But it’s worth bearing in mind that new CVEs are announced every day, and the only way to know if your environments are affected is to run your scans again in the way we have just demonstrated.