Knight Capital Group, Inc. was a global financial services firm engaged in market making, electronic execution, and institutional sales and trading. It was one of the leading market makers in the USA, with more than 1,800 registered representatives serving approximately 31,000 active retail brokerage accounts. In 2012, a software error at the company led to a catastrophic trading loss and a collapse in its stock price. And it could have been avoided.
What happened to Knight Capital?
Knight Capital made money through making markets in equities and equity options, electronically executing trades on behalf of broker-dealers, high-frequency trading firms and institutional investors, and facilitating cash transfers between securities exchanges.
On August 1st 2012, the company’s employees deployed an upgrade to SMARS, Knight’s automated order router, to support the New York Stock Exchange’s new Retail Liquidity Program (RLP). The deployment left defective code running on Knight’s US platform, which then placed errant trades in NYSE-listed stocks. Knight was not able to prevent the malfunction from occurring.
Over the course of 45 minutes, SMARS routed millions of orders into the market, resulting in over 4 million executions in 154 stocks and representing over 397 million shares. By the time Knight stopped sending orders, it had a net long position in 80 stocks worth approximately $3.5 billion and a net short position in 74 stocks worth roughly $3.15 billion. Knight ultimately lost over $460 million due to these unwanted positions.
How did it happen? A textbook example of GRC failure
The new RLP code in SMARS, which ran on eight servers, was intended to replace unused older code in the router. At the time of the deployment, however, that old code was still present on all of those servers. This older code had previously been used for functionality called “Power Peg,” which Knight had stopped using many years earlier. Even though Power Peg had not been used in years, it was still accessible and callable at the time of the RLP deployment on August 1st 2012.
The new RLP code also repurposed a flag that had formerly been used to activate the Power Peg code. As part of the upgrade, Knight planned to remove the old Power Peg code so that turning on this flag would activate the new RLP functionality rather than Power Peg. Knight had stopped using Power Peg in 2003. In 2005, however, part of the old Power Peg code (the function that tracked cumulative shares) was moved to an earlier point in the SMARS code sequence.
According to the Securities and Exchange Commission’s (SEC) cease-and-desist proceedings, this Power Peg code was not retested after it was moved to see whether the application would still work properly if utilized. During the installation of the new RLP code, the code was only deployed to seven of the eight servers. The SEC’s cease-and-desist proceedings emphasized that a second technician did not review the change.
In DevOps parlance, there was no peer review on the merge request. There were no established procedures in place that required such a review. In the end, orders sent to the eighth server, which still had the old code, triggered Power Peg through the repurposed flag, and that server began sending erroneous orders into the market. This left one of the largest high-frequency trading (HFT) firms with a $460 million loss within 45 minutes and on the brink of bankruptcy within days.
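The failure mode described above, one flag meaning two different things on a partially upgraded fleet, can be illustrated with a minimal sketch. Everything here is hypothetical (the flag name, the `Order` type, and the routing functions); it is not Knight's code, only a toy model of seven upgraded servers and one that was missed.

```python
from dataclasses import dataclass

# Hypothetical flag name; the real flag is not named in the public record.
REPURPOSED_FLAG = "flag_x"


@dataclass(frozen=True)
class Order:
    symbol: str
    flags: frozenset


def route_new(order: Order) -> str:
    """Server on the new build: the flag selects the RLP path."""
    return "RLP" if REPURPOSED_FLAG in order.flags else "DEFAULT"


def route_old(order: Order) -> str:
    """Server still on the old build: the same flag wakes up the legacy path."""
    return "POWER_PEG" if REPURPOSED_FLAG in order.flags else "DEFAULT"


# Seven servers received the new build, one was missed.
fleet = [route_new] * 7 + [route_old]


def dispatch(order: Order, servers: list) -> list:
    """Show how each server in the fleet would interpret the same order."""
    return [handler(order) for handler in servers]


order = Order("XYZ", frozenset({REPURPOSED_FLAG}))
results = dispatch(order, fleet)
print(results.count("RLP"), results.count("POWER_PEG"))  # 7 1
```

The point of the sketch is that neither build is wrong in isolation. The hazard comes from reusing a flag while any server can still interpret it the old way.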
The Securities and Exchange Commission’s findings
The following is a summary of the Securities and Exchange Commission’s cease-and-desist order dated October 16, 2013.
- Knight did not have an adequate written description of its risk management controls as part of its books and records in a manner consistent with the SEC’s Market Access Rule (Rule 15c3-5).
- Knight did not have technology governance controls and supervisory procedures sufficient to ensure the orderly deployment of new code or to prevent the activation of code no longer intended for use.
- Knight did not have controls and supervisory procedures reasonably designed to guide employees’ responses to significant technological and compliance incidents.
- Knight did not adequately review its business activity in connection with its market access to assure the overall effectiveness of its risk management controls and supervisory procedures.
- Knight’s 2012 annual CEO certification was defective because it did not certify Knight’s risk management controls and supervisory procedures.
Knight Capital and DevOps Automated Governance
Although the Securities and Exchange Commission’s cease-and-desist proceedings did a great job outlining the incident, there is very little other public information with which to analyze it comprehensively. Therefore, most post-incident analyses of the Knight Capital incident, including this one, should be considered counterfactual reasoning. However, for a DevOps Automated Governance discussion, we can highlight some likely contributors to the August 1st 2012 incident.
Probably the most glaring observation is that Knight didn’t seem to have any evidence of how they deployed their software. One could further surmise from the cease-and-desist proceedings that they were not automating their deployments. Again, this might be a counterfactual. However, this line from the cease-and-desist proceedings strongly suggests that the deployment was a manual process.
“Knight’s technicians did not copy the new code to one of the eight SMARS computer servers.”
If Knight had a DevOps Automated Governance system, they could have made a solid response to the cease-and-desist by showing immutable, digitally signed evidence of what happened instead of ad hoc post-investigation reviews.
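As a rough illustration of what "immutable, digitally signed evidence" could look like, here is a minimal sketch of a hash-chained, HMAC-signed deployment log. It assumes a shared demo key and hypothetical server, build, and actor names; a real system would more likely use asymmetric signatures and a managed key store.

```python
import hashlib
import hmac
import json

# Assumption: a demo key for illustration only, never a production secret.
SIGNING_KEY = b"demo-key-not-for-production"


def sign(payload: dict, prev_digest: str) -> dict:
    """Create a record whose digest covers the payload and the previous digest."""
    body = json.dumps({"payload": payload, "prev": prev_digest}, sort_keys=True)
    digest = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "prev": prev_digest, "digest": digest}


def verify_chain(records: list) -> bool:
    """Recompute every digest in order; any edit or reordering breaks the chain."""
    prev = ""
    for rec in records:
        expected = sign(rec["payload"], prev)["digest"]
        if rec["prev"] != prev or not hmac.compare_digest(expected, rec["digest"]):
            return False
        prev = rec["digest"]
    return True


# Hypothetical server, build, and actor names.
log = []
for server in ["smars-01", "smars-02"]:
    prev = log[-1]["digest"] if log else ""
    log.append(sign({"server": server, "build": "rlp-1.0", "actor": "deploy-bot"}, prev))

print(verify_chain(log))  # True
log[0]["payload"]["build"] = "tampered"
print(verify_chain(log))  # False
```

Because each record's digest depends on the one before it, altering an earlier entry after the fact invalidates the chain, which is what makes such a log usable as evidence.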
For example, if the deployment was automated using a tool like Chef, Puppet, or Ansible, they could have created immutable evidence for the deployment. Furthermore, it would have been less likely that the eighth server would have been missed if they were using automation.
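Whatever tool performs the rollout, a post-deployment check can compare what each server reports against the server inventory. The sketch below is a hypothetical, tool-agnostic version of that check (it does not use the Chef, Puppet, or Ansible APIs). The server names and build contents are invented for illustration.

```python
import hashlib


def checksum(artifact: bytes) -> str:
    """SHA-256 fingerprint of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def find_drift(expected: str, inventory: list, reported: dict) -> list:
    """Return servers that are missing from the report or run a different build."""
    return [name for name in inventory if reported.get(name) != expected]


# Hypothetical builds and an eight-server inventory.
new_build = b"rlp build"
old_build = b"legacy build"
expected = checksum(new_build)
inventory = [f"server-{i}" for i in range(1, 9)]

# Seven servers report the new build; the eighth still reports the old one.
reported = {f"server-{i}": checksum(new_build) for i in range(1, 8)}
reported["server-8"] = checksum(old_build)

drift = find_drift(expected, inventory, reported)
print(drift)  # ['server-8']
```

Driving the check from the inventory rather than from the deployment report means a server that was skipped entirely is flagged as well as one running the wrong build.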
No evidence of Separation of Duties
Along those same lines, Knight had no evidence that they were following Separation of Duties (SoD) principles. DevOps Automated Governance, at a minimum, would have shown evidence of their awareness and intent of SoD.
A common practice in a DevOps Automated Governance system is to evaluate all of the collected evidence against defined controls at a gate before a change is deployed. Such a gate would have blocked the non-compliant activity and recorded immutable attestations of the decision. With control gates, not only would Knight have had immutable evidence, but the non-compliant activity would have been flagged and the deployment might have been stopped.
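A control gate of this kind can be sketched as a set of named checks evaluated against the evidence for a change. The controls, field names, and sample values below are assumptions chosen to mirror the issues discussed in this article; they are not taken from the SEC order or from any specific governance product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvidence:
    author: str
    reviewers: tuple
    deployer: str
    tests_passed: bool
    servers_targeted: int
    servers_in_inventory: int


# Each control is a named predicate over the evidence (all names hypothetical).
CONTROLS = {
    "peer_review": lambda e: any(r != e.author for r in e.reviewers),
    "separation_of_duties": lambda e: e.deployer != e.author,
    "tests_passed": lambda e: e.tests_passed,
    "full_fleet_targeted": lambda e: e.servers_targeted == e.servers_in_inventory,
}


def evaluate_gate(evidence: ChangeEvidence) -> tuple:
    """Return (allowed, failed_controls) for a proposed change."""
    failures = [name for name, check in CONTROLS.items() if not check(evidence)]
    return (not failures, failures)


# Illustrative values only: no reviewer, same person deploys, 7 of 8 servers.
change = ChangeEvidence(
    author="tech-a",
    reviewers=(),
    deployer="tech-a",
    tests_passed=False,
    servers_targeted=7,
    servers_in_inventory=8,
)
allowed, failures = evaluate_gate(change)
print(allowed, failures)  # False, with all four controls listed as failures
```

In a fuller system, the gate decision itself would be recorded as a signed attestation so that both the evidence and the outcome can be audited later.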
Almost all of the well-known compliance frameworks require some form of evidence of testing and review procedures including, but not limited to, GDPR, HIPAA, NIST, PCI DSS, and SOX. In the SEC cease-and-desist proceedings, it was observed multiple times that changes to the old and repurposed Power Peg along with the new SMARS software were not tested and reviewed.
Conclusion
In the end, we can observe that Knight had a very poorly developed risk management strategy that included poor compliance evidence. Their lack of automation across some of their product offerings severely limited their ability to respond and communicate effectively. Finally, they did not seem to have any evidence of compliance testing or reviews in place - even though the prevailing industry best practices required this type of control.
Had Knight had DevOps Automated Governance and risk management in place, they likely would have been able to communicate the facts quickly and effectively to regulators and customers alike - mitigating damage and adverse impact.