A hot take 🔥🔥 from a kind place.
Before I start throwing sparks around I want to make clear that I think there’s lots of benefits to capturing everything as code in git. Static definitions, recipes and specs for how we make our software are useful in all kinds of ways.
However, those definitions don’t help us to understand our dynamic environment, and that’s my essential problem with GitOps. Lots of claims are made for GitOps - it offers better security, historical records, and a solution to drift and reconciliation. I find myself wondering whether any of this is really true, and in this article I’ll explain why.
GitOps makes me think of the old Hans Christian Andersen tale, about what’s real and what’s imagined. The emperor declares he’s wearing clothes, but what if he’s actually not wearing anything at all?
What is GitOps?
Before we dig in, let’s set a baseline for what we describe as GitOps, based on Weaveworks’ four principles:
- The entire system is described declaratively.
- The canonical desired system state is versioned in Git.
- Approved changes that can be automatically applied to the system.
- Software agents to ensure correctness and alert on divergence.
Just like the agile manifesto, these four principles are pretty easy to accept. But, as with agile, turning theory into practice is the interesting part.
"I saw the best minds of my generation destroyed by madness...while trying to set up continuous integration."
— Austin Bingham (@austin_bingham) August 18, 2022
What does GitOps look like in practice?
GitOps strongly centers on the idea of software agents continuously running to converge system state with desired state.
So, how do we reconcile these states using a typical GitOps approach?
We install an operator (or agent) into our cluster which “pulls” (more on that later) the desired state from a git config repo, makes decisions, and adjusts workloads accordingly.
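To make the "makes decisions" step concrete, here is a minimal sketch of what such an operator might do on each pass. It is not how any particular GitOps tool is implemented: the `Action` type, the `plan_changes` function, and the workload names are all hypothetical, and desired and actual state are reduced to simple name-to-image mappings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str                    # "create", "update", or "delete"
    name: str                    # workload name
    image: Optional[str] = None  # target image; None for deletes


def plan_changes(desired: dict[str, str], actual: dict[str, str]) -> list[Action]:
    """Compare desired state (read from git) with actual state (read from the cluster)."""
    actions = []
    for name, image in sorted(desired.items()):
        if name not in actual:
            actions.append(Action("create", name, image))
        elif actual[name] != image:
            actions.append(Action("update", name, image))
    # Anything running that git does not describe gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(Action("delete", name))
    return actions


# Hypothetical example data: what git says vs. what is running.
desired = {"web": "web:1.4", "worker": "worker:2.0"}
actual = {"web": "web:1.3", "cron": "cron:0.9"}

for action in plan_changes(desired, actual):
    print(action)
```

In this sketch the operator would update `web`, create `worker`, and delete `cron`. A real operator then has to apply those actions to the cluster, which is where the push vs. pull question comes back later.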
This operator-based approach is offered as an alternative to a standard DevOps pipeline, which “pushes” change to the cluster.
Ok, so we’ve outlined the theory and described the basic practice. Now for the upsides of GitOps. How do they materialize when we start to implement?
Extra security with GitOps?
First up - added security. What’s the benefit of taking the “pull-based” approach as opposed to simply pushing a change to our cluster? The main advantage is that with GitOps your CI server doesn’t have production access, so we can say that this improves our security.
Aspect | Traditional | DevOps | GitOps |
---|---|---|---|
Infrastructure | Imperative | Declarative | Declarative |
Desired State | Untraceable | Versioned | Versioned |
Change Approvals | Tickets | Pull Requests | Pull Requests |
Deployments | Manual | CI Event | + GitOps Operator |
Security | Manual | Secrets in CI | + Secrets in infra |
However, is there really any additional security in this setup? If the CI system can update configurations, how does GitOps prevent rogue workloads from being deployed by a malicious actor with access to CI? 🤔
Versioning and environment history
Another major selling point for GitOps is the versioned history for the environment. That is kinda true, but you also get this with plain old DevOps, assuming your pipeline and deployment information is in the source repo. This history is useful, but it isn’t a true record of how environments have actually changed (more on this later).
Rollback
Is rollback simpler with GitOps? I’m of the opinion that you’re better off with regular old DevOps by just reverting the commit. The benefit here is that it makes rollback a standard developer workflow and versioned with the source repository. Something doesn’t work? Simply git revert.
Disaster recovery
What happens when your whole cluster goes down? What happens when you want to bring up a new cluster? Those are fair questions. But most teams aren’t rolling out blue/green clusters. Most companies have a static cluster or set of clusters. Most disaster recovery wouldn’t be hampered by the need to run deployment pipelines, and I think this should be scripted without the need for GitOps.
So, yeah, I’m skeptical about the benefits. But I have more reservations when we start to look at the trade-offs we have to make when we implement GitOps. Let’s take a look at those.
The challenges with GitOps
The first big challenge with GitOps is the effect it has on our pipelines. Splitting the deployment away from the earlier stages of the pipelines causes them to become distributed. From a value stream perspective this makes it hard to understand the overall path from commit to production. It disconnects earlier stages of qualification from later ones.
This matters because it removes the developer feedback from the value stream. In this setup, if a deployment fails, where does the feedback come from? How do developers get information on the deployment process? How can they enhance the deployment process with their own notifications? How can they improve the deployment process?
The second side effect is that separating these stages into two toolsets increases the gap between development and operations.
Typically the GitOps tooling is run and managed by a central platform team, while the CI system is often in the domain of the development teams. As Marshall McLuhan said, “We shape our tools and thereafter our tools shape us”.
The gap is widened even further by using a separate config repo to store the desired states.
It is common to have git repositories centered around individual microservices with a separate common repo for describing the desired state of environments. One is code and developer centric, one is operations centric. Also, it is not unusual to have to write glue pipeline scripts to update the config repo.
Revisiting push vs. pull
The major innovation in GitOps seems to be to move the operations to a pull-based model. This seems like a big change, but on closer inspection I don’t think that’s actually true. [Thanks to my good friend Henrik Hoegh for spelling this out to me]
Typically, a GitOps operator reads a config from a git repo, applies zero-to-many transformations on it, and then pushes it into the Kubernetes API server. Which is exactly what your deployment tool does in the push-based model! With GitOps we distribute our pipeline over two asynchronous tools, using a git repository as a semaphore, but with both approaches we push changes into our cluster.
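A toy illustration of that point, under stated assumptions: `FakeCluster` stands in for the Kubernetes API server and only records what it is sent, and `render`, `ci_pipeline_deploy`, and `gitops_operator_tick` are hypothetical names, not functions from any real tool. The sketch only shows that the final step is the same call in both models.

```python
class FakeCluster:
    """Stand-in for the Kubernetes API server; it only records what it receives."""

    def __init__(self):
        self.applied = []

    def apply(self, manifest: dict) -> None:
        self.applied.append(manifest)


def render(config: dict) -> dict:
    """One illustrative transformation step (think templating)."""
    return {"kind": "Deployment", "name": config["name"], "image": config["image"]}


def ci_pipeline_deploy(config: dict, cluster: FakeCluster) -> None:
    """Push model: the pipeline renders the config and pushes it to the API."""
    cluster.apply(render(config))


def gitops_operator_tick(config_repo: list, cluster: FakeCluster) -> None:
    """'Pull' model: the operator reads from git, renders, then pushes to the API."""
    for config in config_repo:
        cluster.apply(render(config))


config = {"name": "web", "image": "web:1.4"}
pushed, pulled = FakeCluster(), FakeCluster()

ci_pipeline_deploy(config, pushed)      # deployment step of a pipeline
gitops_operator_tick([config], pulled)  # one pass of an operator

# Both clusters received exactly the same request.
print(pushed.applied == pulled.applied)
```

The difference between the two is where the code runs and what triggers it, not the direction in which the change reaches the cluster.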
It’s great for Drift and Reconciliation though, right?
Another big headline advantage of GitOps is the reconciliation loop - the automated repairing of any drift or manual change. Any undocumented changes are erased and the environment is reconciled with the git definition.
At face value, this seems like a massive bonus. However, I take a different view on this too. Before we jump to reconciling undocumented changes we need to ask why they have happened in the first place. Maybe we don’t want them reconciled? There could be a very good reason to do a manual change, and we might not want the environment to be automatically repaired.
Another reason might be sabotage, in which case we definitely want humans in the loop to investigate and manage the situation. In either case, configuration drift should cause a proper incident management process to take place, not just a Slack message from a reconciliation loop that disappears into the ether.
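As a rough sketch of that alternative, drift could be escalated to people instead of being silently overwritten. Everything here is an assumption made for illustration: the `Drift` record, the `detect_drift` and `handle_drift` functions, and especially `open_incident`, which stands in for whatever incident management system a team actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Drift:
    name: str
    desired: Optional[str]  # None: git does not describe this workload
    actual: Optional[str]   # None: the workload is missing from the cluster


def detect_drift(desired: dict[str, str], actual: dict[str, str]) -> list[Drift]:
    """List every workload whose running image differs from the git definition."""
    names = sorted(desired.keys() | actual.keys())
    return [
        Drift(name, desired.get(name), actual.get(name))
        for name in names
        if desired.get(name) != actual.get(name)
    ]


def handle_drift(drifts: list[Drift], open_incident: Callable[[Drift], str]) -> list[str]:
    """Escalate each drift to a human instead of reverting it automatically."""
    return [open_incident(drift) for drift in drifts]


# Hypothetical incident hook; a real one would call an incident management system.
incidents = []

def open_incident(drift: Drift) -> str:
    ticket = f"INC-{len(incidents) + 1}: {drift.name} desired={drift.desired} actual={drift.actual}"
    incidents.append(ticket)
    return ticket


desired = {"web": "web:1.4"}
actual = {"web": "web:1.4-hotfix", "debug-shell": "busybox:latest"}
tickets = handle_drift(detect_drift(desired, actual), open_incident)
for ticket in tickets:
    print(ticket)
```

Here the manual hotfix and the unexpected `debug-shell` workload each produce an incident for someone to look at, and nothing is changed in the cluster until they do.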
And on the technical side, I feel that Kubernetes already has a reconciliation loop. You describe your deployments and configuration declaratively, and it is Kubernetes’ job to make that true. Layering reconciliation loops feels like adding unnecessary complexity.
The ~~map~~ git repo is not the territory
We like to think the git config repository is equivalent to how things change, but in reality there is a gap between these static definitions and what is actually happening in the dynamic DevOps automation. All this talk about GitOps providing a “Single Source of Truth” is simply not true. If I want to find out what was actually running on Thursday night there is no simple way to get there.
The GitOps configuration provides no insight into manual changes, scaling events, failed reconciliation and many other edge cases. These types of events cause incidents but GitOps provides no situational awareness when they happen.
When an incident occurs what we really need is to understand how things have actually changed. A big problem with modern GitOps is developers and ops teams having little or no true record of the actual changes that occur. We need to be clear that desired states are not actual states.
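To show what a record of actual changes could look like, here is a small sketch of an append-only change log that can answer the "what was running on Thursday night" question. The `ChangeEvent` type, the `running_at` function, and the sample events and timestamps are all made up for illustration; they do not describe any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    at: datetime
    workload: str
    image: Optional[str]  # None means the workload was removed
    source: str           # e.g. "pipeline", "manual", "autoscaler"


def running_at(events: list[ChangeEvent], moment: datetime) -> dict[str, str]:
    """Replay recorded changes up to `moment` to rebuild what was actually running."""
    state: dict[str, str] = {}
    for event in sorted(events, key=lambda e: e.at):
        if event.at > moment:
            break
        if event.image is None:
            state.pop(event.workload, None)
        else:
            state[event.workload] = event.image
    return state


# Made-up history: a pipeline deploy, a manual hotfix, then another deploy.
events = [
    ChangeEvent(datetime(2022, 8, 17, 10, 0), "web", "web:1.3", "pipeline"),
    ChangeEvent(datetime(2022, 8, 18, 21, 30), "web", "web:1.3-hotfix", "manual"),
    ChangeEvent(datetime(2022, 8, 19, 9, 0), "web", "web:1.4", "pipeline"),
]

thursday_night = datetime(2022, 8, 18, 23, 0)
print(running_at(events, thursday_night))
```

In this example the git history would only ever show `web:1.3` and `web:1.4`, while the event log shows that the manual hotfix was what was really running on Thursday night.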
A static view of change is beneficial but limited
I began by saying that I’m all for putting the recipes, definitions, and specs for our desired DevOps in version control. It offers us all kinds of benefits. Let’s remind ourselves of what they are:
- Better transparency: enables sharing, reviewing and auditing in a familiar technology
- Code tools and workflows: enables branching/pull-request based approaches to integrate change
- Better Quality: allows you to add linters, checkers and static analysis in the automation processes, and enforces consistency of changes
- Immutability: helps minimize configuration drift
- Centralization: can help reduce “configuration sprawl”: the configuration of processes spread over multiple unconnected systems
So far so good - but every static definition has a dynamic execution. There are real events taking place, asynchronously and automatically, with results we need to record and understand.
Static Definition | Dynamic Execution |
---|---|
Build script | Compilation/Packaging |
Test suite | Test runs |
Deployment file | Deployments |
Docker file | Docker image builds |
Infrastructure model | Infrastructure changes |
The dynamic world is, quite literally, where the action is. Working back from a GitOps definition to events, changes, ordering, and dependencies is not easy for developers.
If you’re still wondering why this matters, the Google SRE book tells us that “70% of outages are due to changes in a live system.” So, when things go wrong, the dynamic world should be the first place we look for answers.
Conclusions - Who’s GitOps?
Much like the agile manifesto, the loose definition of GitOps means that it can and will be applied in all kinds of different ways. A broad church approach is the perfect nerd-snipe. Is Terraform GitOps? Maybe? I dunno!
And like agile, everyone experiences FOMO. What if this is the next big thing? Do we jump on the bandwagon for fear of being left behind? With agile we need to ask “who’s agile?” Maybe we need to ask “who’s GitOps?” As ever, what we really need to ask ourselves is - who are we serving with these tools and what problems are we trying to solve?
At the end of The Emperor’s New Clothes the people around the emperor continue to praise his new outfit, even as they come to realize he’s completely naked. It’s an embarrassing situation that any of us can get into when we go all in on something and want it to be true. Like the kid in the crowd who shouts “But the Emperor has nothing at all on!” sometimes it’s good to uncomplicate everything by just saying what you see.