SLA-Driven Development

A service-level agreement is a useful method for setting expectations on deliverability, and incentivizing quick turnaround. It's also a great way to motivate a development team. SLAs encourage goal-setting and collective buy-in internally. They improve your users' perception of your software and the people who maintain it. Best of all, they provide concrete, measurable criteria to evaluate the success of your team at a larger-scale level than issue-by-issue, or the regressions introduced in a given release. Whether your business model is software as a service or software as a product (free products included), you can easily use the fundamental components of an SLA to reap positive benefits. In this post, I will describe some guidelines for implementing an SLA in your team.

But we don't have "clients". Why should we have an SLA?

SLAs are useful for everyone. They give your team an opportunity to sit down and decide what a sensible turnaround is for important issues with your software. And if you feel like you don't have any clients, think again; if your software is a product that fuels other business within your organization, you are your own client! This applies even more to a client-oriented business -- you are your foremost client, since failing to serve external clients is a disservice to your own business.

You may already have a release process. Well, you should. You may already have a process for prioritizing issues. But an important part of introducing an SLA into the mix is the line of communication it creates. It gives you an opportunity to open a dialogue among all affected groups over what your team's priorities should be at any given time, what is a reasonable time for arriving at a solution, and what consequences or incentives (if any) exist to discourage failure or promote success. This brings me to my first guideline:

Keep It Collaborative

Think of all of the people involved in your product: developers, managers, users, designers, QA engineers, etc. If you don't have an SLA, try this simple exercise: walk up to one person in each of these roles and ask how long it should reasonably take to fix a bug that completely broke the ability to make money with your product. Now ask them what will happen if it takes longer than that to fix that bug. However many people you ask, you'll get that many different answers.

With an SLA in place, everyone who uses or develops the product should be on the same page with respect to what the expectations are. If they're unfamiliar with the specifics, they will at least know they can reference a document and find out without asking around. When was the last time you got everyone in a room together, or looking over the same document, to agree on how long it should take to fix something that could make or break your business? Wouldn't you feel a bit safer if you knew everyone had an idea of how long was too long? And what if, perhaps, certain outcomes were rewarded or penalized in advance? If, for instance, you were to give developers a nominal bonus (say $25) any time a bug of the highest severity was addressed within the SLA (or perhaps even within a shorter time limit), would you feel more or less comfortable that the issue would be addressed in time to prevent harm to your business?

Now think about the different kinds of people involved in the software from another angle: there are the users or clients, obviously; then there are those involved in implementing user requests -- technical or managerial. In most organizations, a layer of customer- or client-facing individuals mediates between the users, who generally want everything right away, and the group tasked with delivering what the users want -- preferably at a pace that doesn't keep them burning the midnight oil or rushing their work. Everyone needs an open line of communication when developing an SLA -- either as an agreement within a single organization or a contract between more than one. This is a method of achieving mutual understanding and reaching a compromise on what counts as a reasonable turnaround. It also improves the likelihood that the process in the SLA will be followed, since fewer parties will feel that the agreement has been forced upon them; instead, they will see it as the result of a negotiation they were involved in.

Keep It Granular

A well-written SLA should describe the various scenarios from the reporting of a bug, to its prioritization, all the way to the concrete turnaround time for delivery of its solution. This means you need at least three separate classifications for reported bugs:

A high-severity issue
- Usually financially impacting, but can also be highly time-sensitive
- Needs a fast turnaround time
- Generally requires a hot-fix or off-cycle release, and sometimes a patch
- These may require additional process in advance of the solution (such as reverting to a previously stable release) in order to ensure the product's integrity
A medium-severity issue
- Functionality that is broken, but not financially vital or time-sensitive
- Should be addressed during the current in-development release
A low-severity issue
- Minor UI issues, or an unexpected result in an edge case
- Not financially impacting in any way; the product is still usable
- Can be prioritized to the backlog, to be addressed within an agreed-upon number of releases

If you have N levels of granularity, you should have N different turnaround times, and N different criteria. So make sure that you're not overcomplicating things, but also make sure that the scenarios you describe effectively capture the issues you encounter from the origination of the bug report to its resolution and incorporation into a stable release.
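One way to keep yourself honest about "N levels, N turnaround times, N criteria" is to write the tiers down as data. The following is only a rough sketch in Python: the names (Severity, SlaTier, SLA_TIERS, classify) and the yes/no questions used to classify a bug are my own illustration of the three levels above, not a prescribed scheme, and your team would fill in its own wording.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class SlaTier:
    criteria: str    # how a bug qualifies for this level
    turnaround: str  # what the SLA promises for this level


# Illustrative wording only, paraphrasing the three levels described above.
SLA_TIERS = {
    Severity.HIGH: SlaTier(
        criteria="financially impacting or highly time-sensitive",
        turnaround="hot-fix or off-cycle release",
    ),
    Severity.MEDIUM: SlaTier(
        criteria="broken functionality, not financially vital or time-sensitive",
        turnaround="current in-development release",
    ),
    Severity.LOW: SlaTier(
        criteria="minor UI issue or edge case, product still usable",
        turnaround="backlog, within an agreed-upon number of releases",
    ),
}

# N levels of granularity means N criteria and N turnaround times.
assert set(SLA_TIERS) == set(Severity)


def classify(financial_impact: bool, time_sensitive: bool, broken: bool) -> Severity:
    """Map three hypothetical triage questions onto a severity level."""
    if financial_impact or time_sensitive:
        return Severity.HIGH
    if broken:
        return Severity.MEDIUM
    return Severity.LOW


if __name__ == "__main__":
    level = classify(financial_impact=False, time_sensitive=False, broken=True)
    print(level.value, "->", SLA_TIERS[level].turnaround)
```

If adding a fourth level means you can't fill in both fields for it, that's a hint you're overcomplicating things.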

Keep It Reasonable

This really only applies to issues of a higher level of severity. Anything that can be prioritized for the current in-process releases is not that difficult to handle. If there's one positive thing the Agile methodology has brought to software development, it's the philosophy that an in-development feature with a threatened deadline has the option of changing deadline or changing scope. Don't be afraid to iterate more or delay a new feature if it means addressing a medium-severity issue of greater importance to the client or the organization. Make sure your SLA includes language describing the risk to meeting deadlines for agreed-upon feature changes or requests.

Different groups may have different expectations on what is a reasonable turnaround time to address an issue of a given severity level. However, in the words of Lil Wayne, numbers don't lie. Promising to deploy a hotfix within 4 hours of an initial bug report, if your average issue resolution time is 3 hours and time to prepare a release is a half hour, is a dangerous game to play; it will only set you up for failure.

It only takes one vexing bug to leave your team looking incompetent. It's better to under-promise and over-deliver here. Don't give yourself an excess of padding, but make sure that you've taken into account the worst-case development time for your highest-severity issues, and the time it might take to QA the solution and manage a release for it.
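To make the arithmetic concrete, here is a small sketch in Python that checks a promised turnaround against the time you actually need. The function name sla_slack_hours is made up for illustration. The 4-hour promise, 3-hour average fix, and half-hour release prep come from the example above; the 6-hour worst-case fix and 1-hour QA pass are numbers I invented, so plug in your own team's history.

```python
def sla_slack_hours(promised: float, fix: float, qa: float, release: float) -> float:
    """Hours left over (negative means a miss) after fix, QA, and release."""
    return promised - (fix + qa + release)


# Average case from the example: 4h promise, 3h fix, 0.5h release prep, no QA counted.
average_case = sla_slack_hours(promised=4, fix=3, qa=0, release=0.5)

# Worst case, using invented numbers: a 6h fix plus a 1h QA pass.
worst_case = sla_slack_hours(promised=4, fix=6, qa=1, release=0.5)

if __name__ == "__main__":
    print("average-case slack (hours):", average_case)
    print("worst-case slack (hours):", worst_case)
    if worst_case < 0:
        print("The promise does not survive one vexing bug.")
```

On averages, the 4-hour promise leaves only half an hour to spare; with a worst-case fix and any QA at all, it is already blown. The number you commit to should come from the second calculation, not the first.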

Since an issue of the highest severity is a stop-the-presses kind of issue, it's okay to entertain options here that may otherwise be outside of process -- for instance, patching the solution in advance of deploying a hotfix or off-cycle release -- in order to meet the SLA. You should include language that enumerates these options and explains the risk for each scenario. If a solution is found within the SLA, and on-time delivery depends on whether or not you opt for one of the riskier scenarios, the parties involved should again be informed of the choices. At this point, the SLA should be considered met, whether or not the riskier option is invoked.

Incentivize Your Team

An SLA isn't much more than a mission statement unless there is language that explains what happens if the terms are not met for a given issue. This portion of the document might differ between what's provided to, say, a developer compared to a client. For a client, it may involve future discounts or credits. For a developer, it may affect something like bonuses or vacation time. That's not to say that every time a developer does his or her job, you should be shelling out extra cash. This is really just an example of one way to put some spurs on your agreement.

For the good kind of developer, the kind you should hope to have in your organization, meeting the agreement and maintaining the integrity of the product and the company that builds it should be incentive enough. However, if you're in an organization that does performance bonuses, it may be worth using the SLA as an indicator of performance. This is a good way to introduce a level of transparency that can serve as a motivating factor. Another passing thought on this is perhaps to include language in the "developer version" of the SLA that comps time spent outside of an eight-hour work day to deliver a high-priority issue at a certain rate -- for instance, one hour of paid vacation for every four hours spent.

For what it's worth, we don't do any of that for the developers in our group. None of this kind of stuff is going to buy a sense of ownership, but there are a lot of organizations that modulate their compensation based on performance, so I thought I'd give the option. My opinion is that paying a fair wage and making sure the people who do the brunt of the work have an opportunity to contribute to the process of developing the agreement should be a better motivator. Involvement builds ownership. A bonus program in the wrong hands can be executed poorly or arbitrarily, and is just as much a dividing factor as it is a motivating factor.

It's All About Integrity

A service-level agreement is, in the end, an opportunity to put some real concrete language around the scenarios that arise in the normal process of maintaining a software product. It also sets expectations for deliverability on issues of routine maintenance. By further defining how issues are evaluated, and agreeing on a reasonable turnaround time, you can set your team up for success -- both in the eyes of stakeholders and in their own eyes. Another great thing about this exercise is that it will make developers feel protected from the whims of an emotional client. Doing this for your team will give them breathing room, and the peace of mind of knowing that they are being looked out for in the grand scheme of product development.

Think about it: you have an opportunity to turn "I need it here yesterday" into "I should expect this up and running in 12 (or 24, or 48) hours". You're protecting the integrity of your organization by preventing the attitude that deliverability should be instant. Without something in place to modulate this, you're already fighting a losing battle the moment a bug report is filed.

You have an opportunity to keep middling UI issues that a single stakeholder thinks are the end of the world from causing yet another late night in the office for no reason other than that stakeholder's peace of mind. Expect it to be handled by noon tomorrow -- not midnight tonight. Remember that SLA you helped us develop? You're protecting the morale of your team, and thus its integrity as a working unit. There's no need to fatigue your employees with long hours and the fear that they may be kept late, so long as expectations are adequately set between all groups internally.

Every organization can use an SLA. It's the starting point at which all issues, and their solutions, are prioritized. It introduces transparency, and it gives everyone a bit of breathing room. And, executed effectively, it's probably the closest to collective bargaining we're going to see in the world of software development. So take advantage of it.