On feature switches
Once a team has a stable and trusted way to get releases deployed quickly, potentially multiple times a day, it becomes very useful to have a way to switch features or changes ON and OFF in the production environment.
Here we see why a team would want to do this, how it can be done and what useful things one should think of when doing so.
Reasons
Building a new feature or a new version of a feature is always a bit risky. In an agile environment you want to ship something quickly, get feedback quickly and iterate accordingly.
One solution that I have seen is to have dedicated environments to deploy the feature branch to. The dev team, potentially the QA team or even customers might have access to it. In the case of customers it might be their own production environment.
The main drawback of this approach is that it encourages the changes to be wide and deep. The branch will live long (and prosper) and, although developers might regularly merge the main branch (trunk in the rest of this post) into it, there will be a seriously big patch to merge into trunk at the end.
Another, but similar, option is to keep the branch separate and not merge or deploy it until it’s complete. This not only brings the same issue as the previous approach but also seriously limits the ability of QA and customers to test the feature early and often.
An improvement over both is to have a way to switch features ON and OFF, with or without additional conditions around them.
Whichever stack your product is based on, you can find libraries to do this or build your own (sometimes a saner approach). Let’s go through an overview of different levels of such switches.
Simple switch
The simplest approach is to have a split in the code, usually in the views, with a condition to display one side or another based on the value of an environment variable or a value stored in a database.
Here is a simple example within a haml partial.
    %h3= author.name
    %ul
      - if ENV['PHONE_CONTACT']
        %li= author.phone
      %li= author.email
Here we rely on the content of an environment variable to decide whether or not we display the phone attribute of the author object.
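One caveat with a check like this: values in ENV are always strings, so a variable set to "false" or "0" is still truthy in Ruby. A small helper can make the intent explicit. This is only a sketch: the name env_switch_on? and the list of values treated as ON are assumptions of mine, not something a library provides.

```ruby
# Hypothetical helper: interprets an environment variable as an ON/OFF switch.
# The list of accepted values is an assumption; adjust it to your conventions.
ON_VALUES = %w[1 true on yes].freeze

def env_switch_on?(name, env = ENV)
  # A missing variable yields nil, which to_s turns into "" (OFF).
  ON_VALUES.include?(env[name].to_s.strip.downcase)
end
```

In the partial above, the condition would then read env_switch_on?('PHONE_CONTACT').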
Switch with conditions
The next step up is to attach a condition to the feature, beyond the simple “this is ON”. This requires a database of some sort, and gives more precise control over who the feature is displayed to.
Usually this can be used to activate a feature for only one user, a group of users, users belonging to an organisation or coming from a specific part of the world.
This is really where it starts to get useful. Let’s see how that could look.
    %h3= author.name
    %ul
      %li
        - if FeatureFlag.active?(:phone_call, user_id: user_id)
          = link_to "Call", dial_call(author.phone)
        - else
          Your account doesn’t support phone calls yet.
Keeping track of feature activation in a database makes it easy to activate or deactivate features quickly from a web or command line interface. There is no need to change environment variables and restart the application server, which is much smoother for end users.
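To make the idea of conditions more concrete, here is a minimal in-memory sketch. The ScopedFlags class and its method names are hypothetical, and a Set stands in for the database table. It assumes a feature counts as active for a user when it has been activated either for that user or for their organisation.

```ruby
require "set"

# Hypothetical in-memory stand-in for a feature flag table.
# Each entry is a [feature, scope, id] triple, e.g. [:phone_call, :user, 42].
class ScopedFlags
  def initialize
    @entries = Set.new
  end

  def activate(feature, scope, id)
    @entries.add([feature, scope, id])
  end

  def deactivate(feature, scope, id)
    @entries.delete([feature, scope, id])
  end

  # Active when the flag is set for the user or for the user's organisation.
  def active?(feature, user_id:, organisation_id: nil)
    @entries.include?([feature, :user, user_id]) ||
      @entries.include?([feature, :organisation, organisation_id])
  end
end

flags = ScopedFlags.new
flags.activate(:phone_call, :organisation, 7)
flags.active?(:phone_call, user_id: 1, organisation_id: 7) # true, via the organisation
flags.active?(:phone_call, user_id: 2, organisation_id: 8) # false, no matching entry
```

Other scopes (a group of users, a region) would just be more values for the scope element of the triple.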
Refactoring and feature switches
When one is refactoring code it’s often useful to compare the performance or result of old and new code. It can also be useful to track how far along a migration is.
One useful library to do this is GitHub’s Scientist. When paired with a feature switch library it allows performance and reliability to be measured. This is most practical for changes that are not just aesthetic and that run deeper than the views.
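The sketch below is not Scientist’s API. It is a hand-rolled illustration of the same idea, with hypothetical names (timed, compare_paths, the report keys): the old code path always runs and its result is always the one returned, while the new path only runs when the switch is on, and a small report compares the two.

```ruby
# Hand-rolled illustration of an "experiment"; this is NOT Scientist's API.
# All names here (timed, compare_paths, the report keys) are hypothetical.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  value = yield
  [value, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Always returns the result of the old code path. When the switch is on,
# the new path also runs and a comparison report is handed to `publish`.
def compare_paths(enabled:, old_path:, new_path:, publish:)
  old_value, old_seconds = timed { old_path.call }
  return old_value unless enabled

  report =
    begin
      new_value, new_seconds = timed { new_path.call }
      { matched: old_value == new_value,
        old_seconds: old_seconds, new_seconds: new_seconds }
    rescue StandardError => e
      # An error raised by the new path is reported, not propagated.
      { matched: false, error: e.class.name }
    end
  publish.call(report)
  old_value
end

reports = []
total = compare_paths(
  enabled: true, # in practice, the result of a feature switch check
  old_path: -> { [1, 2, 3].inject(0) { |sum, n| sum + n } },
  new_path: -> { [1, 2, 3].sum },
  publish: ->(report) { reports << report }
)
# total is 6 and reports.first[:matched] is true
```

The publish callable is where timings and mismatches would be forwarded to whatever metrics system the team uses.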
Handling complexity
One thing I often hear when I suggest using feature switches to handle adding or refactoring big features is that it’d introduce a lot of complexity.
Alas, this is usually used as a reason to keep feature branches running for a long time.
This is where the team needs to decide what their culture is about this. In my opinion, the benefit of splitting any big change into smaller chunks outweighs any complexity that might be added by using feature switches to isolate the changes.
This kind of usage takes time to get used to, so it’s a good idea to start by using feature switches for smaller changes and to add metrics when needed.
Libraries and implementation
Feature switching libraries exist for many languages and frameworks out there, and so do GitHub’s Scientist and statsd clients. Some are simple and rely on Redis only, while others support multiple databases.
Yet, it's also fairly easy to implement a feature switch yourself, especially when the stack includes an ORM. After all, one just has to keep track of features and potentially the resources the condition depends on.
In a way, if there is no record for a feature associated with a resource then it can be decided that the feature is not active for that resource.
An ActiveRecord example
Let's have a look at how we could implement a simple feature flag library relying on ActiveRecord.
The simplest implementation can just rely on checking if a record with a specific key (a feature name) is present.
    class FeatureFlag < ApplicationRecord
      # True when a record exists for the given feature name.
      def self.active?(key)
        exists?(key: key)
      end
    end
Then you can use a simple FeatureFlag.active?(:email) within any view, controller action or model method.
What if we want to check if a feature is activated for a specific user? Well, then your model would also need a user_id column and an index that groups both the key and the user_id columns for best performance. And you could then rewrite the active? method in the following way.
    class FeatureFlag < ApplicationRecord
      # With user_id left as nil, this matches flags stored without a user.
      def self.active?(key, user_id: nil)
        exists?(key: key, user_id: user_id)
      end
    end
Notice that we specify a default value of nil for the user_id parameter of the method. That way we can still use active? as we did with just the key, or with both a key and a user_id value.
How then do you create feature flags with such an implementation? Well, this is an ActiveRecord model so you can rely on the usual create method in your trusty Rails console or through a back office controller action.
Let's say you have super admin users in the back office of your product. You can view details of each user or their company account. And for each you can have a little “feature flags” section listing the features that are activated for this user or the company. Buttons or links pointing to an update action in a feature_flags_controller, for example, would then be in charge of toggling the feature on/off depending on the current value.
Toggling is just a way to say “if you find an entry then delete it, and if you don’t find an entry then create one”. Deleting an entry would deactivate the feature, creating an entry would activate it.
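The toggle logic itself is small. Here is a sketch of it in plain Ruby, where a Set of [key, user_id] pairs stands in for the feature_flags table; the FlagToggler class and its methods are hypothetical names used only for this illustration.

```ruby
require "set"

# Hypothetical in-memory stand-in for the feature_flags table:
# one [key, user_id] pair per activated feature.
class FlagToggler
  def initialize
    @rows = Set.new
  end

  def active?(key, user_id: nil)
    @rows.include?([key, user_id])
  end

  # Deletes the entry when it exists, creates it otherwise.
  # Returns the new state: true when the feature is now active.
  def toggle(key, user_id: nil)
    row = [key, user_id]
    if @rows.delete?(row)
      false
    else
      @rows.add(row)
      true
    end
  end
end
```

With the ActiveRecord model, the same shape would be a lookup followed by either a destroy or a create. A unique index on the key and user_id pair is probably worth having there, so that two admins clicking at the same time cannot create duplicate rows.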
Metrics
Keeping track of some metrics related to a feature switch can be very useful to assess if your code change is worth it, if it improves a situation or if a task is done.
Performance metrics
Using GitHub’s Scientist, for example, one can measure and keep track of the performance of two or more implementations and compare them. Experiments tied to a feature switch can be activated, deactivated and scaled using that feature switch. Metrics can be sent to a statsd or Prometheus instance for monitoring.
But this can also be done with the first implementation of a new feature with or without Scientist.
In both cases this is great to ensure that your implementation is performing within a window you have defined.
Progress metrics
Sometimes a set of data needs to be migrated and we need to know how far along that migration is. Let’s take the case of migrating a session store.
You can’t really do a stop and start approach: the product needs to stay online without a period of downtime.
Thankfully a session store doesn’t really hold data forever. There can be a window of time for which sessions are persisted, and after that time has passed the session expires and disappears.
A solution of double read and one write can be used. The idea is that, for every connection asking for its session back, the backend first tries to read that session from the new session store and, if it is not there, checks the old one. If the data is found in the old store it’s copied to the new store before being sent back in the response.
The next time a connection asks for that session it will be served from the new session store.
This behaviour can be triggered with a feature switch and tracked with a pair of simple metrics.
Every time a read to the old store is done a counter metric can be updated in statsd. And every time a read to the new store is done a different counter metric can be updated in statsd. In the same way, the number of writes to the new store can be tracked.
The first two metrics will allow the team to track the progression of the migration through the rise of the number of reads to the new store and the fall of the reads to the old store. Once reads to the old store have stopped, the migration can be considered done.
Then the feature switch, the metrics and the old data store can be removed or cleaned up.
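Here is a rough sketch of the double read and one write approach. Plain hashes stand in for the two session stores and for the statsd counters, and the class name and metric names are hypothetical. The feature switch check that would decide whether this reader is used at all is left out to keep the example short.

```ruby
# Sketch of "double read, one write". Plain hashes stand in for the two
# session stores and for statsd counters; every name here is hypothetical.
class MigratingSessionReader
  attr_reader :counters

  def initialize(old_store:, new_store:)
    @old_store = old_store
    @new_store = new_store
    @counters = Hash.new(0)
  end

  def fetch(session_id)
    @counters["sessions.new_store.read"] += 1
    session = @new_store[session_id]
    return session if session

    @counters["sessions.old_store.read"] += 1
    session = @old_store[session_id]
    return nil unless session

    # Copy the session over so the next request is served by the new store.
    @new_store[session_id] = session
    @counters["sessions.new_store.write"] += 1
    session
  end
end

old_store = { "abc" => { user_id: 42 } }
new_store = {}
reader = MigratingSessionReader.new(old_store: old_store, new_store: new_store)

reader.fetch("abc") # found in the old store, copied to the new one
reader.fetch("abc") # served from the new store
# reader.counters now holds 2 new store reads, 1 old store read, 1 new store write
```

Note that in this sketch an unknown session id also counts as a read to the old store, so counting old store hits separately may give a cleaner signal for when the migration is over.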
Conclusion
We all work to build and improve products. The way we do it does matter and often we need to push ourselves to do so in a good way. Not necessarily the best way but a good way.
I like to keep some important points in mind when I work on a product. Those points are like guard rails that keep me going in the right direction. One of them is “keep your branches small”. I have done so many long branches when I was younger that I know it causes some big pains down the road. First at review time, then when it comes to merging and then when you have to deal with the consequences.
If a change in the product seems to call for a long-winded set of changes it probably means you are not looking at the problem the right way. Step back, and think about it again. This time aim for finding the first thing that needs to change, what metrics can be tied to the change and where the feature switches can be placed. It can be in the middle of a view to replace a table with a list. It can be in the router to display a different view altogether.
Get that reviewed, get that deployed, collect and analyse your metrics and do the next step.
Making software relies heavily on cutting big problems into ones small enough that we can solve them. Feature flags are a nice tool for doing so, and for plugging metrics around the questions that come with changing things or introducing new ones.