John Willis, Distinguished Researcher at Kosli, dives into Investments Unlimited – the latest novel from IT Revolution. It’s about an investment bank dealing with DevOps, DevSecOps, and IT risk. John is co-author of this bestseller, and he shares the story behind the book, how and why it was created, and the real-life lessons it holds for all regulated software organizations.
Filmed at “Exploring DevOps, Security, Audit Compliance, and Thriving in the Digital Age” in Oslo on December 8th, 2022.
Full Transcript:
OK, so without further ado, I’d like to welcome John Willis to the stage. As we give him a round of applause, I’d also like to suggest that one of the things we did at DevOpsDays Oslo last month was that every time a speaker takes a drink from a glass of water, it’s nice to give them a little clap – just to keep their spirits going. Get this man some water!
Funny story: I’ve been working on this idea, for lack of a better term, called DevOps Automated Governance for like five years. Really, I could not find a vendor that I thought did what we were sort of driving and building – and I was building this with some of the biggest industry people. I mean these people, if you had a product, they would’ve known about it. I ran into Mike at Open Spaces and I was saying, “I don’t think anybody does this?” And he said, “We do!” And I was like, ‘Here we go, another joker that I’m going to have to beat down…’
And I was like:
“Do you do this?” and he goes, “Yeah, we do that.”
“Alright, well do you do this?!”
“Yeah.”
And I was like “OK, maybe I better come over to your booth and find out what you’re doing.”
So we started with a fight! So he’s a good bloke, he didn’t hold it against me!
Alright, so the book. You know, the thing is, I’m not going to sell you the book. What I want to do is tell you the interesting story of why the book came to be. And it really was sort of an industry problem trying to be solved by some people who had this problem.
The people who wrote the book: Helen Beal did massive work at Lloyd’s of London; Bill Bensing did something called Dead Sword, one of the biggest DevSecOps projects in the Department of Defense; Jason Cox runs basically all commercial properties at Disney; and John Rzeszotarski was a top-of-house first fellow at Capital One. I think I calculated 150 years’ worth of experience in this book. And although it’s a fictional story, they’re all basically true stories.
Just quickly, I’ve done a lot of things: I think it’s about 12 books, somewhere around 10 or 12 startups (most of them disastrous flaming burnouts – but hopefully that’s not the case with Kosli!), but I’ve had a couple of wins recently. I sold a company to Docker, I sold a company to Dell – in fact, that’s when my mother-in-law was finally like, “Oh, now I get what you do!” She could never figure it out until I said I’d sold a company to Dell.
The green book is the one we’re really going to focus on, which led us to this one. And I’ve had a 10-year project – sort of my white whale, if you will – a Deming book that should be done by the end of this year. Ten years I’ve been working on that, and it’ll be out there pretty soon.
So this is the book. Depending on when you hit enter, Investments Unlimited could be number one, number eight, or number ten in some category. I think it’s important to talk about the people behind it – I mean, there are certainly the nine authors, but for me, Topo Pal: brilliant, unbelievable, an egoless person. You know, go search for some of his presentations – a physicist who became a computer person – and around 2017 he and I were working with Gene Kim. We were just having this conversation about how terrible internal audit and IT risk is, and the low efficacy. And one day over a bunch of beers we said, “Blockchain!” – but, nope, sorry. He came back and reported that, back in 2017, when you mentioned Blockchain in a large bank, it got taken away from you. So then we wrote this paper, and the guy in the middle, John Rzeszotarski, literally went back after we wrote the paper and built a system. And I became a mentor for that, so these two guys were pretty instrumental in the story that leads up to this novel.
The other thing is Gene Kim basically invites about 40 or 50 of us to Portland every year (during the pandemic it was virtual) and we work on these research papers, breaking out into groups. It’s all Creative Commons, they’re all out there – there are like 100 papers, not just security, it’s everything DevOps, all industry leaders, an unbelievable resource. Back in 2015 we wrote something where we wanted to discuss the idea of separation of duties and how you could redefine the definition of it with DevOps, right? And that’s a well-told story already.
But the one that was really fun: in 2018 we did Dear Auditor, and I’ll talk a little more about that, but it was sort of a tongue-in-cheek apology letter to the auditors, and at the end there were another 10 or 15 pages of an actual risk control matrix, which we really defined.
And that led us to this conversation about ‘Could we do something like a Blockchain-ish way to do the evidence, the attestations that control the gates of the way we deliver software?’ – and that became the book Investments Unlimited, which was supposedly going to be version two of that.
So the risk control matrix in Dear Auditor is interesting. You can see real quick there are like 53 risks across 10 categories, and you can see the PAM, the SoD, the SBOM, the DAR – but the one that sort of woke Topo and me up was number eight: the divergence of audit evidence from developer evidence. That was the thing that just stuck out, like that’s part of the low efficacy, high toil of what we do in audit in a lot of – especially large – banks, but certainly everywhere. And so we sat down and did this green book I’ve been talking about, which was like, OK, if we’re going to do this thing, what are our goals and objectives?
Well, we certainly want to shorten the audit time. Most large banks – most large anythings – spend 30-40 days a year, even as we speak today, doing audits, and the audit efficacy is terrible, right? I’ve gone into CIOs’ organizations and spent time doing qualitative analysis for them.
“How do we do, John?”
“Terrible!”
Because when you look at the subjective nature of the way the audits are done against the real data – I mean, we could spend the whole next hour talking about how terrible that is. And then ultimately it’s a trust model. Like, why do we have CABs? Because we haven’t built the right trust.
So these were the three primitives that we were going into this thing on. And so we sat down and wrote it, and it started out as a reference architecture – we wanted to increase the efficacy while reducing the toil, with the key being the whole Blockchain conversation that started this idea. Maybe it’s not going to be Blockchain, but it’s going to be an automated model that turns subjective evidence into objective evidence. Throw away the fancy words: basically from a service record that says, “Blah blah blah, I did this, blah blah blah, I did that, here’s a screen print, blah blah blah, go look at the log”, to digitally signed attestations that are immutable, non-tamperable, and completely automated. Right? How about that?
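To make that concrete, here’s a minimal sketch of what one of those digitally signed attestations might look like. The format and every field name are purely illustrative – not taken from the paper or from any particular product:

```yaml
# Hypothetical attestation record - all field names here are illustrative.
# Instead of "go look at the log", the evidence is a structured,
# digitally signed claim that can be verified at any later date.
attestation:
  artifact: app-service:1.4.2
  artifact_sha256: "9f2c1a..."       # fingerprint of the exact image that was built
  control: unit-test-coverage
  result: pass
  measured_value: 84                 # percent coverage actually observed
  threshold: 80                      # the value agreed with the second line up front
  produced_by: ci-pipeline/build-1337
  timestamp: "2022-12-08T10:15:00Z"
  signature: "MEUCIQ..."             # digital signature, so the record can't be quietly edited
```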
And so the idea then is, OK, what do we do? Traditionally we move from manual gates to continuous checks. We turn 30 or 40-day audits into zero days, hit enter, and see what the world looks like. We go from checklists to risk controls as code. So the idea is we start building an abstraction layer for our code.
You know, what drives me nuts is when I talk to second lines – I was just speaking to a large Asian bank three weeks ago, and the guy I was talking to was second line (used to be first line, now second; three lines of defense, right?). He said his manager came to him and said, “Why are you looking at code? Why are you always in the weeds?” And he said, “Because I don’t trust the first line’s data.” So there’s just a broken trust model. And so we moved to this idea of a DSL-like function – human readable, machine interpretable, in version control – so that what we say we want to do is what happens.
And here’s the real key: once we have an abstraction and move away from telling the second line to look at Snyk logs or OpenControl or Kubernetes logs, the second line can actually get into design requirements, and you can agree on a construct of ‘these are the things, and they’re trusted’.
If I tell you that there’s a trusted primitive that says it has to be 80% test coverage, the second line doesn’t have to go and look at a log. And by the way, we don’t want that to happen, right? And the controls become self-documented, because now the agreement is on the primitive in the DSL, right?
So now we build this level of trust. It’s not “Why did I fail the audit?” or “Why did you do this? Give me this screen print!” – it’s “We agreed this is the construct of the agreement.” And at that point, we get to arbitrate over whether the construct is correct or not. We don’t have to argue about “That’s the risk control, but I don’t understand how to give you that data.”
And then last but not least is the thing that gets most interesting. Once you build that level of trust through this abstraction, you actually start seeing – and this is what warms my heart, having done this for 40 years now, which is pretty scary – an increase in what are called self-identified risks. So now, instead of developers and operations being reactive – “Don’t tell the auditors that, don’t explain that to them, because that’s going to add another five days” – they’re actually going back to the second line and saying, “Hey, I think I’ve got another idea that we should probably put into risk control.” They’re being creative, because now they understand why we’re doing this. You know you’ve really broken down the wall of confusion when that happens.
And so in that green book we tried to sit down and think about ‘What are the stages?’. So if you think about what we’re going to try to create – immutable data, change subjective to objective – how do we think about the world in terms of stages where we can build common controls and actors?
And so these are just three examples, like risk controls for container scanning. Now, some of them might be pass/fail, some of them might be percentage based, right? So for example, if I can put in a DSL that says, “All dependencies in a container build must pass”, that’s pretty straightforward.
If today it’s Snyk and tomorrow it’s Aquasec, and then some other fancy product that doesn’t exist yet, it doesn’t matter – you can plug and play those, because the abstraction agreed between second and first line is already the transaction. And then a risk control might be something like: not only did it have test coverage and pass, but maybe it’s a percentage – like we want 70% test coverage for this particular application. And then something like a leaky-looky that will scan for secrets.
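As a sketch of what that DSL could look like in version control – again, this layout is illustrative, not the actual format from the green book or from Kosli – the three controls above might be written something like this:

```yaml
# Hypothetical risk-controls-as-code file, kept in version control.
# First and second line agree on these primitives; the tool behind each
# control (Snyk today, Aquasec tomorrow) can be swapped without changing it.
risk_controls:
  - id: RC-01
    stage: build
    control: dependency-scan      # pass/fail: all container dependencies must pass
    requirement: pass
  - id: RC-02
    stage: test
    control: test-coverage        # percentage based, agreed per application
    requirement:
      minimum_percent: 70
  - id: RC-03
    stage: source
    control: secrets-scan         # the "leaky-looky": no secrets in the repo
    requirement: pass
```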
But again, the key is the abstraction, which is the DSL, where we can trust that the source is non-tamperable. So we don’t have to have conversations about:
“Well, how did you do that?”
“Well, I use Snyk.”
“Well, how do I know Snyk did the right thing?”
“Here’s the log.”
I had one second line tell me the answer in the ServiceNow record was a pointer to the code in GitHub. I mean, that’s not the way we should be doing business.
This is a longer list that we tried to address in the book – the first book, the green book. What’s interesting is that – remember we talked about self-identified risk – in some of these compliance regimes, like PCI DSS or even HIPAA, something like cyclomatic complexity might not be a required control. But it does make the code stronger, it does create behavior changes, and it really does make you safer – even change size or branching strategy.
What you find is that not only do the numbers improve, but you start seeing the behavior change. At one bank, for years they were trying to convince people to adopt TDD, and once they actually started putting these on a graph in front of everybody, all these teams started raising their hands: “Hey, can we get that training for test-driven development? You know, container scanning?”
So we did it for each of the stages. The idea is we had an original prototype for a YAML-based DSL, but, you know, coming into Kosli, when I saw what Kosli was doing I was like, “OK, good, I don’t want to do another startup – I’m 63 and getting too old for this game.” And there were just a couple of minor missing pieces – not even missing. Even this week, some of the stuff that we accomplished – John, do you want to raise your hand? We started out with the concept of working with a bank on a pure YAML-based DSL, and we’re going to get there, I think, pretty quick based on what we’ve accomplished in one week.
Being able to just take the Kosli command line interface, we already accomplished basically an Investments Unlimited demo this week, starting on Monday. So that’s how powerful the product is. But again – human readable, machine interpreted, in version control – that’s an artifact. And the actual artifact that is this abstraction, this DSL, whatever you want to call it, becomes an attestation along with all the other things.
So if in a month or so it changes – like the rules change, or there’s a new executive order – we can go back and say, “Well, it was this way when this was deployed, but it was this way when it shipped.” And Mike will do a great presentation later on how that works. But the other thing I liked about Kosli is that I’ve always looked at this as a bottom-up approach, right? Like, I need to gather all this data in the pipeline and throw it into basically a Merkle tree, or just some type of non-tamperable data – Mike will do the rest there.
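One way to picture that bottom-up, non-tamperable structure – a simplified sketch, not Kosli’s actual internal format – is each attestation carrying the hash of the one before it, so that editing any earlier entry breaks the chain:

```yaml
# Hypothetical hash-chained evidence log (Merkle-style, heavily simplified).
# Changing any earlier entry would invalidate every hash that follows it.
- seq: 41
  control: dependency-scan
  result: pass
  entry_hash: "c41f..."
  prev_hash: "b7a0..."
- seq: 42
  control: test-coverage
  result: pass
  entry_hash: "d89e..."
  prev_hash: "c41f..."   # links back to entry 41, so entry 41 can't be rewritten
```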
But this is important – this is the place I want to get to real fast with what we’re doing at Kosli, because this is what we were doing at the bank. If we have that DSL and we have all those attestations and gates, I want to put this screen in Bamboo, or Travis, or CircleCI, or any of that, so the developer’s right there. There’s no confusion: when the developer sees red, they know it was defined within the design requirements with the second line, and it takes them right to the DSL or right to the failure.
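As a sketch of what that gate might look like from the developer’s side – a generic YAML pipeline step, where check-controls is a hypothetical script rather than a real CLI – the pipeline simply verifies that every agreed control has a valid attestation and goes red if not:

```yaml
# Hypothetical CI job in a generic YAML pipeline syntax.
# "check-controls" is an illustrative script: it reads risk-controls.yml,
# verifies the signature on each attestation for this artifact,
# and exits non-zero on any failed or missing control, turning the gate red.
deploy-gate:
  stage: deploy
  script:
    - ./check-controls --controls risk-controls.yml --artifact "$IMAGE_SHA"
```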
Anyway, the company narrative is this small investment company that’s about to fail an audit, basically. And the CEO gets this notification in the US from the OCC saying that if they fool around, they’ll lose their license. So they get everybody together, and it’s written in the Phoenix Project style, so there’s a sort of Socratic dialogue – there’s somebody who helps everybody understand.
One of my favorite lines in the book is: “Your DevOps has failed you.” In other words, DevOps – a lot of people have done incredible work with DevOps, but they forgot to take security along, right? And you had to explain to them that they needed to do a lot more work.
There’s an interesting story around Equifax – maybe I’ll do that in the next presentation. But here’s the organization chart, and one of the things we purposely tried to create is a confusing org chart in this fictional company. The reason I point out Equifax – I’m going to have to steal a couple of minutes, aren’t I? – is that it’s become a classic example, because the CISO reported to the chief legal officer, and they lost like $5 billion in market cap because of a breach.
And when the US Congress did an investigation on the breach, they interviewed the CISO, and the first question they asked the CISO was: “When you went to work for Equifax, did you think it was odd that you were reporting to the chief legal officer?” She said: “Yes, but I figured they knew what they were doing.”
But here’s the really scary question. They asked her: “When you knew that the PII was breached and the data was breached, why didn’t you contact the CIO?” And the answer was they didn’t think about it. Because that’s what happens when you have an org like this. When you have the chief risk compliance officer reporting to the CEO, the CISO reporting to the CEO, and the CIO reporting to the CEO, you’ve created three silos. And this is not uncommon in most corporations.
By definition, you have created a sort of Conway’s Law for security, if that makes sense. And so there’s Bill, and they get to MRIAs. The other interesting thing was that we made up the story of this company having what they call matters requiring attention – basically notifications from the OCC that you’re doing bad things, that you’ve failed investigations.
They had 15 of them in our story. And the MRIA is like, “Hey, we’re pretty pissed right now” – it means you’ve got a matter requiring immediate attention. So we made up this story where the CEO gets it and calls all hands – the CEO doesn’t screw around. But halfway through the book, we found out there’s a true story exactly like it: Mitsubishi had a North American bank that actually had 15 open MRIAs, ignored them, and got a cease and desist. In the banking business, when you get a cease and desist, you’re no longer a bank.
So the good news is they go ahead and they figure out how to fix it, a hero’s journey. They start small, they pick out eight different attestations and gates. This is the demo we built this week.
And here’s another fun thing about demos – demoing 101! In the book they do the demo – imagine you’re in a room and they finally come out of the room like, “I think we’re going to fix this, we’re going to do this digitally signed attestation stuff!” – and everything works.
And everybody is like, “Big deal!” So it’s like, “Wait a minute, you’ve got to do a demo where it breaks!” So we forced the break, where the checksum fails on an image – and yeah, that’s it.