The Strangler Pattern is a pattern for safely and carefully retiring old code. The idea is simple - you run the old code and new code live, in production, side-by-side, checking that the new code behaves exactly the same as the old code. Once you are confident it does, you retire the old code.
In a previous blog post we showed you how to get Python test coverage faster without killing your server. This 5-minute blog post shows how Kosli uses Python decorators to strangle old code. It describes strangling at the code level only, not at the database level. The Strangler module and its tests are available in the public tdd repo.
Suppose one of the Python classes we wish to strangle is called Artifact:
class Artifact:

    def __init__(self, flow, docs):
        ...

    @property
    def created_at(self):
        ...

    def add_junit_evidence(self, evidence, timestamp):
        ...
We start by renaming Artifact to OldArtifact and creating a new class called Artifact. All methods and properties of Artifact proxy to OldArtifact via two class-level strangler decorators:
- @strangled_method, illustrated on add_junit_evidence.
- @strangled_property, illustrated on created_at.
Both decorators take an argument that controls which code (old or new) is on the mainline and whether the old code, the new code, or both are run. If both the old and new code are run, their behaviours are compared. The argument is always one of the following four values:
OLD_ONLY = ("old", True, False)  # mainline==old, call old only
OLD_MAIN = ("old", True, True)   # mainline==old, call both & compare
NEW_MAIN = ("new", True, True)   # mainline==new, call both & compare
NEW_ONLY = ("new", False, True)  # mainline==new, call new only
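The decorator code further down calls helpers such as check_use, call_old and old_is_main without showing their bodies. The following is a minimal sketch of how they might read these tuples, assuming the three fields mean (mainline, call old?, call new?) as the comments above suggest. The bodies and the _LEVELS name are guesses for illustration, not the code from the tdd repo.

```python
OLD_ONLY = ("old", True, False)
OLD_MAIN = ("old", True, True)
NEW_MAIN = ("new", True, True)
NEW_ONLY = ("new", False, True)

# Hypothetical name: the set of legal levels.
_LEVELS = (OLD_ONLY, OLD_MAIN, NEW_MAIN, NEW_ONLY)


def check_use(use):
    # Fail at decoration time if the level is not one of the four above.
    if use not in _LEVELS:
        raise ValueError(f"unknown strangler level: {use!r}")


def old_is_main(use):
    return use[0] == "old"


def new_is_main(use):
    return use[0] == "new"


def call_old(use):
    return use[1]


def call_new(use):
    return use[2]


def call_both(use):
    return call_old(use) and call_new(use)
```

Checking the level when the decorator is applied means a typo shows up at import time rather than on the first request.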
OLD_ONLY
After this first refactoring step the Artifact class looks like this:
from .old.artifact import OldArtifact
from strangler import *

@strangled_method("add_junit_evidence", use=OLD_ONLY)
@strangled_property("created_at", getter=OLD_ONLY)
class Artifact:

    def __init__(self, flow, docs):
        self.old = OldArtifact(flow, docs)
Decoration can now proceed incrementally, one property or method at a time, at differing levels of progression. For simplicity each example below shows both decorations at the same level.
OLD_MAIN
Now create a NewArtifact class and start implementing the new functionality in it. Any Artifact method/property decorated with OLD_MAIN will automatically detect any difference in old-new behaviour. Response times will go up, because both implementations now run on every call. We found that in production the decorators kept finding small edge-case differences well after our unit tests were passing.
from .old.artifact import OldArtifact
from .new.artifact import NewArtifact
from strangler import *

@strangled_method("add_junit_evidence", use=OLD_MAIN)
@strangled_property("created_at", getter=OLD_MAIN)
class Artifact:

    def __init__(self, flow, docs):
        self.old = OldArtifact(flow, docs)
        self.new = NewArtifact(flow, docs)
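To make the effect of OLD_MAIN concrete, here is a deliberately stripped-down, self-contained sketch. It is not the Strangler module: compare_with_new, the differences list, the short_sha method and the two toy classes are all hypothetical names, and differences are collected in a list rather than logged.

```python
differences = []  # hypothetical stand-in for a production log


def compare_with_new(name):
    """Simplified class decorator: old is the mainline, new is also
    called, and any difference in behaviour is recorded."""
    def decorator(cls):
        def method(self, *args, **kwargs):
            # Mainline call: its result (or exception) is what the caller sees.
            old_result = getattr(self.old, name)(*args, **kwargs)
            try:
                new_result = getattr(self.new, name)(*args, **kwargs)
                if new_result != old_result:
                    differences.append((name, args, old_result, new_result))
            except Exception as exc:
                # The secondary call must never break the mainline.
                differences.append((name, args, old_result, exc))
            return old_result
        setattr(cls, name, method)
        return cls
    return decorator


class OldArtifact:
    def short_sha(self, sha):
        return sha[:7]


class NewArtifact:
    def short_sha(self, sha):
        return sha[:7].lower()  # edge case: the old code preserved case


@compare_with_new("short_sha")
class Artifact:
    def __init__(self):
        self.old = OldArtifact()
        self.new = NewArtifact()


artifact = Artifact()
assert artifact.short_sha("abcdef0123") == "abcdef0"  # identical behaviour
assert artifact.short_sha("ABCDEF0123") == "ABCDEF0"  # caller still gets the old result
assert len(differences) == 1                          # but the mismatch was recorded
```

The caller never notices the second implementation. Only the recorded difference reveals that the new code is not yet a faithful replacement.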
NEW_MAIN
Once you are confident the new implementation of a method/property is behaving identically you can switch it to the mainline. We found that small sets of dependent methods/properties often had to be switched to the mainline in unison. Production is still checking for differences in behaviour.
from .old.artifact import OldArtifact
from .new.artifact import NewArtifact
from strangler import *

@strangled_method("add_junit_evidence", use=NEW_MAIN)
@strangled_property("created_at", getter=NEW_MAIN)
class Artifact:

    def __init__(self, flow, docs):
        self.old = OldArtifact(flow, docs)
        self.new = NewArtifact(flow, docs)
NEW_ONLY
Before deleting the old code you can turn it off. This allows a rapid switch back to NEW_MAIN should that be needed. Response time should improve as the old code is no longer being run (except for the __init__).
from .old.artifact import OldArtifact
from .new.artifact import NewArtifact
from strangler import *

LEVEL = NEW_ONLY

@strangled_method("add_junit_evidence", use=LEVEL)
@strangled_property("created_at", getter=LEVEL)
class Artifact:

    def __init__(self, flow, docs):
        self.old = OldArtifact(flow, docs)
        self.new = NewArtifact(flow, docs)
When all methods and properties of Artifact are NEW_ONLY you can delete the Artifact and OldArtifact classes and rename NewArtifact to Artifact.
The delegation decorators
The strangled_method decorator curries all arguments inside a functor. The strangled_property is very similar; it has two functors, one wrapping the getter, the other wrapping the setter. Note: In practice we also needed custom handling for __iter__ after discovering the behaviour of some of our iterators was non-deterministic. See Postel’s Law.
def strangled_method(name, *, use):
    check_use(use)

    def decorator(cls):
        def func(target, *args, **kwargs):
            class Functor:
                def __init__(self):
                    self.args = args
                    self.kwargs = kwargs

                def __call__(self, obj):
                    f = getattr(obj, name)
                    return f(*args, **kwargs)

            return strangled_f(cls, name, use, target, Functor())

        setattr(cls, name, func)
        return cls

    return decorator
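The post does not show strangled_property, so the following is only a sketch of how the two-functor shape described above might look. The setter= keyword is an assumption (only getter= appears in the examples), and strangled_f here is a trivial stand-in that calls the mainline object only, so that the block runs on its own.

```python
OLD_ONLY = ("old", True, False)


def strangled_f(cls, name, use, obj, f):
    # Stand-in for the real helper: call the mainline only, no comparison.
    return f(getattr(obj, use[0]))


def strangled_property(name, *, getter, setter=None):
    # Assumed signature: a level for the getter, optionally one for the setter.
    def decorator(cls):
        def get(target):
            class Getter:
                def __init__(self):
                    self.args = ()
                    self.kwargs = {}

                def __call__(self, obj):
                    return getattr(obj, name)

            return strangled_f(cls, name, getter, target, Getter())

        def set_(target, value):
            class Setter:
                def __init__(self):
                    self.args = (value,)
                    self.kwargs = {}

                def __call__(self, obj):
                    setattr(obj, name, value)

            return strangled_f(cls, name, setter, target, Setter())

        # Install a read-only property unless a setter level was given.
        prop = property(get, set_ if setter is not None else None)
        setattr(cls, name, prop)
        return cls

    return decorator


class OldArtifact:
    def __init__(self):
        self._created_at = 1700000000

    @property
    def created_at(self):
        return self._created_at


@strangled_property("created_at", getter=OLD_ONLY)
class Artifact:
    def __init__(self):
        self.old = OldArtifact()


assert Artifact().created_at == 1700000000
```

Both functors expose args and kwargs so that a helper which records the call arguments can treat methods and properties uniformly.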
strangled_f is a helper function used by strangled_method and strangled_property. It defines a Call class (another functor) that calls the curried Functor, passing it either self.old or self.new.
def strangled_f(cls, name, use, obj, f):
    class Call:
        def __init__(self, age):
            self.age = age
            self.args = f.args
            self.kwargs = f.kwargs

        def __repr__(self):
            return repr(self._target())

        def __call__(self):
            return f(self._target())

        def _target(self):
            return getattr(obj, self.age)

    return strangled(cls, name, use, Call('old'), Call('new'))
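The two layers of functor can be easier to follow in isolation. The sketch below uses module-level classes instead of the closures above, and the Old, New, Wrapper and greet names are hypothetical. It shows the same idea: the arguments are captured once, and the choice of old or new target is deferred until the call is made.

```python
class Functor:
    """Captures the arguments now; the target object is supplied later."""

    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs

    def __call__(self, obj):
        return getattr(obj, self.name)(*self.args, **self.kwargs)


class Call:
    """Binds a Functor to the 'old' or 'new' attribute of a wrapper."""

    def __init__(self, wrapper, age, f):
        self.wrapper = wrapper
        self.age = age
        self.f = f

    def __call__(self):
        return self.f(getattr(self.wrapper, self.age))


class Old:
    def greet(self, who):
        return f"hello {who}"


class New:
    def greet(self, who):
        return f"hi {who}"


class Wrapper:
    def __init__(self):
        self.old = Old()
        self.new = New()


f = Functor("greet", "world")
wrapper = Wrapper()
old_call = Call(wrapper, "old", f)
new_call = Call(wrapper, "new", f)

assert old_call() == "hello world"
assert new_call() == "hi world"
```

Because each Call takes no arguments, the code that runs and compares the two calls needs to know nothing about the method being strangled.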
All methods and properties thus end up in strangled which has three parts:
- Call the old and/or new code using the wrapped_call helper function to capture their behaviour in dicts.
- If both old and new are being run, compare their behaviour by comparing the two dicts.
- Return the result (or raise an exception) from the old or new code depending on which is on the mainline.
def strangled(cls, name, use, old, new):
    if call_old(use):
        old_call = wrapped_call(old, old_is_main(use))
    if call_new(use):
        new_call = wrapped_call(new, new_is_main(use))
    if call_both(use):
        strangled_check(cls, name, old_call, new_call)

    call = old_call if old_is_main(use) else new_call
    if call["exception"] is None:
        return call["result"]
    else:
        raise call["exception"]
import traceback

def wrapped_call(func, is_main):
    try:
        exception = None
        trace = ""
        result = func()
    except Exception as exc:
        exception = exc
        trace = traceback.format_exc()
        result = Raised()

    def safe_repr():
        # repr() of the target may itself raise; never let that escape.
        try:
            return repr(func)
        except Exception as exc:
            return f"Exception: {exc}"

    return {
        "is": "primary" if is_main else "secondary",
        "result": result,
        "exception": exception,
        "trace": trace.split("\n"),
        "repr": safe_repr(),
        "args": func.args,
        "kwargs": func.kwargs
    }
def strangled_check(cls, name, old, new):
    o_exc = old["exception"]
    n_exc = new["exception"]

    neither_raised = o_exc is None and n_exc is None
    both_raised = not (o_exc is None or n_exc is None)

    if neither_raised:
        try:
            if old["result"] == new["result"]:  # [1]
                return
            else:
                summary = ...
        except Exception as exc:
            summary = ...
    elif both_raised:
        if type(o_exc) is type(n_exc):  # [2]
            return
        else:
            summary = ...
    else:
        summary = ...

    def loggable(d):
        ...

    diff = {
        "summary": summary,
        "time": now().strftime("%Y-%m-%dT%H:%M:%SZ"),
        "call": f"{cls.__name__}.{name}",
        "old": loggable(old),
        "new": loggable(new),
    }

    if in_tests():  # [3]
        raise StrangledDifference(diff)
    else:
        log_difference(diff)  # [4]
Notes:
[1] Checking for the same behaviour relies on well-behaved __eq__ implementations.
[2] Handling the case when both the old code and the new code raise an exception can be tricky. For example, suppose the old and new code both use a common library function with a syntax error!
[3] When running tests we want any difference in old/new behaviour to become an exception.
[4] When running in production we definitely don’t want an exception - log_difference must not leak any exceptions.
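Notes [3] and [4] can be sketched as follows. This is an illustration under stated assumptions, not the code from the repo: it assumes tests run under pytest (which sets the PYTEST_CURRENT_TEST environment variable while a test is executing), that differences go to a standard logging logger named "strangler", and that the diff is serialised as JSON.

```python
import json
import logging
import os

logger = logging.getLogger("strangler")  # assumed logger name


class StrangledDifference(Exception):
    """Raised in tests so that any old/new difference fails loudly."""

    def __init__(self, diff):
        # default=repr keeps the message readable for non-JSON values.
        super().__init__(json.dumps(diff, indent=2, default=repr))
        self.diff = diff


def in_tests():
    # Assumption: pytest is the test runner.
    return "PYTEST_CURRENT_TEST" in os.environ


def log_difference(diff):
    # Must never raise: a failure to log must not become a failed request.
    try:
        logger.error("strangled difference: %s", json.dumps(diff, default=repr))
    except Exception:
        try:
            # Fall back to the one field that is known to be a plain string.
            logger.error("strangled difference in %r (not serialisable)", diff.get("call"))
        except Exception:
            pass
```

The nested try is deliberate: serialising the diff can fail (for example when a repr raises), and the fallback itself is guarded so that nothing escapes into the request.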
Summary
At Kosli we’ve done two major internal restructurings. Both times we used Python decorators to slowly and carefully strangle the old code. We did this with no server downtime. There was a slight increase in response time over several weeks, while the old and new code were both being run and any differences in behaviour were being ironed out.