Archives for October 2013

How to design Performance SLAs that your Business will love

Service Level design discussions are one of the most interesting parts of designing a contract. When you find yourself in the midst of one, eventually someone asks your advice: “What Performance Service Level Standards does one see in the industry today? Do you have any Service Level Agreement examples?”

The search for best practices is tempting. Why reinvent the wheel, right?

But there are no silver bullets. Copying performance SLA definitions from the market never really helps you to truly bring out the value of your service. A good SLA for performance is one that is in tune with the nature of your business.

Flux Capacitor

photo credit: Stuck in Customs via photopin cc

What is the nature of your business?

You should start by asking the question: “What is the nature of your business? And what does performance mean in the context of your business?” Dissect and boil down the nature of the business to the effect it has on the IT landscape and consequently the IT service you are trying to measure.

Over time, three distinct patterns of services have emerged from such discussions. While there could be a mix of these patterns in any given service, one of them usually dominates the others and forms the base of your Service Level Objective.

The three Patterns of Service

Reaction-Intensive Services:

  • An e-commerce website selling books online is an example of such a business, where the website along with its underlying applications and infrastructure should orchestrate together to deliver information within the desired reaction time.
  • If you were buying such a service as a Platform as a Service, you would define Performance SLA Management based on the reaction time for events like searching for books, retrieving information on a particular book, clicking “Send to Shopping Cart”, etc.
  • It would also make sense to further analyse whether your reaction-intensive business is based on retrieving information efficiently or servicing interactive requests quickly.
  • Therefore a performance SLA for an interface to a credit risk agency (retrieving information) would be designed differently from the e-commerce bookstore example that we have been describing.
  • Equally important is whether the service you are measuring has to show a consistent average reaction time or whether it is acceptable to have erratic response times during periods of high traffic. Such decisions can have a major impact on the price that you pay for the service.
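To make the difference between a consistent average and erratic peaks concrete, here is a minimal Python sketch. The sample response times, the function name and the choice of the 95th percentile are assumptions made up for illustration; they are not industry benchmarks.

```python
import statistics


def summarize_response_times(samples_ms, percentile=95):
    """Return (mean, nearest-rank percentile) for a list of response times."""
    ordered = sorted(samples_ms)
    # Nearest-rank method, computed with integers to avoid rounding surprises.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return statistics.mean(ordered), ordered[rank - 1]


# Hypothetical measurements in milliseconds for two services.
steady = [300] * 20                  # always 300 ms
erratic = [100] * 18 + [2100, 2100]  # mostly fast, two very slow requests

steady_mean, steady_p95 = summarize_response_times(steady)
erratic_mean, erratic_p95 = summarize_response_times(erratic)

print(f"steady:  mean={steady_mean} ms, p95={steady_p95} ms")
print(f"erratic: mean={erratic_mean} ms, p95={erratic_p95} ms")
```

Both services report the same average of 300 ms, but the 95th percentile is 300 ms for the first and 2100 ms for the second. An SLA on the average alone would not tell them apart, which is why you need to decide upfront which of the two you are paying for.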

Volume-Intensive Services:

  • A batch-processing service like reconciling bank transactions is a typical volume-intensive business. Performance in such a business is measured very differently from a reaction-intensive business.
  • In such a situation, you should study the patterns in the transactions: are there peaks and troughs in the volume demand, are there differences in the complexity of the processing logic for various types of transaction batches? What is the impact of these patterns on the performance expectations?
  • Answer these questions, and look for patterns. Study the make-or-break scenario. This will help you design the service levels that you need to keep the business running at average volumes, at peak demand and at low intensity periods.

Deadline-Intensive Services:

  • The third type of service is one that is fraught with deadlines and has to deliver on critical due dates.
  • A typical example is month-end and year-end processing software. Such software is expected to perform complicated calculations and handle large volumes of data at certain times of the year. These services simply have to work as expected at exactly these times of the year.
  • Such a service oscillates between low intensity periods and periodic peaks where the software and its infrastructure are firing on all six cylinders with punishing volumes of data.
  • Performance Service Level Calculation for such systems will have to cover both types of usage and the performance expectations during each of these two states.

Dissecting your service

Designing an SLA concept to measure and track the performance of your service will start with what might sound like a navel-gazing exercise.

Step 1: What is the nature of your business?
Analysing the nature of your business and the demands that this nature puts on the IT services will tell you what kind of performance SLAs you should be designing.

Step 2: What is the impact on your IT landscape?
You will also find that this impacts different parts of your IT landscape differently. If you are defining your service at such a component level, separate out the parts of your application landscape that are reaction-intensive from those that are volume-intensive.

Step 3: What is the granularity of your service? At what level are you measuring?
Analyse the granularity of the service that you are establishing. Check whether this service is delivered by an entire stack of components. Are you measuring the behaviour of a part of your application landscape? What are the expectations on this part (reaction/volume/deadline)? Or are you measuring the performance of a business process (with underlying interconnected components across the technical stack)? What is the nature of this business process (reaction/volume/deadline)? What does this imply for expectations in performance?

The above three steps should lead you in the right direction to create Performance SLAs that mirror your business needs.

There is no one-size-fits-all

As you can see, there are no one-size-fits-all Performance SLAs, and searching for best-practice Performance SLAs is futile. Instead, take the time to study the intricacies of the business you are serving and use the patterns above to design a Performance Service Level measurement that not only matches your business, but also closely demonstrates the value that your IT service is adding to the business.

You have often looked for the chance to demonstrate that your IT service contributes to the business – here is your chance to put some skin in the game.

Do you have performance based SLA definitions today for your IT service? Do they truly mirror the nature of your business?

The Zen of Service Level Agreements: 4 Design Principles for an Optimal SLA

Designing Service Level Agreements is an art. It is important to invest time and energy into defining a good Service Level Agreement upfront in an engagement to avoid unnecessary friction downstream.

This post summarizes the four Design Principles of Optimal Service Level Agreements.

Pedestrian Bridge in B+W

Design Principle #1 – It is all about Emphasis

Bring focus and emphasis into the Service Level Agreement Metrics that you design. Don’t spread yourself thin and define multiple metrics across multiple dimensions. A sea of metrics will only lead to mediocre performance across the board. Here are three steps that will bring focus into your SLA design.

Design Principle #2: Do the Math

You have just finished defining your Service Level Agreement metrics and targets. You are satisfied that you now have a true measure of your service and your outsourcing contract. However, if you have not done your math, this could backfire with unintended consequences. Here are five mathematical traps that can set your SLA definitions up for failure.

Design Principle #3: Design for Resilience

A contract lifetime of 3-5 years can feel long in fast-paced industries like telecommunications and media. Even brick-and-mortar industries could see a complete business cycle in this time, and encounter at least one downturn during the life of the contract. Situations like these change priorities and expectations. This raises the question: Can your Service Level Agreements and KPI target values adjust to these changes? Here are three factors to consider so that the SLAs adjust to the demands of the situation.

Design Principle #4: Bringing Balance

A balanced Service Level Agreement is a sign that you and your provider have a healthy working arrangement. Do you have an early warning system built into your Service Level Measurement? Have you balanced the risk? Here are three ways to balance the risk in your SLA design.

If you have designed your SLA metrics well, your laser-focused attention on these metrics while managing your service can lead to step changes in quality.

What has been your experience with your current SLA model? Which part of your Service Level Agreement will you redesign after going through the above four Design Principles?

photo credit: w4nd3rl0st (InspiredinDesMoines) via photopin cc

3 practical tips to help you balance your SLA

Designing an SLA system is a balancing act. The art lies in how you choose to balance the risk between two parties. A skewed SLA definition might look good at first, but can prove harmful to both parties in the long run.

 Zen Habit #4: Balance

Here are three tips to bring balance into your Service Level Agreement.

Do not dwell in the past; do not dream of the future, concentrate the mind on the present moment. – Buddha

Walk the Risk Tightrope

  • Outsourcing is essentially the transfer of risk from one party to another in a way that the party that has the most skills to handle the risk through operational expertise and authority also has the responsibility for execution.
  • If this risk is imbalanced, you will end up either paying too much for the service, or you will not be able to attract a serious provider to sign up to these conditions. In either case, unbalanced risk is a losing option for both parties.
  • Therefore balance the SLA system so that you focus the Service Level Measurement rigour on the more critical parts of the operation. Do not spread yourself thin across your service portfolio.
  • For less critical parts of the operation, use key performance indicators (KPIs) which act as a dashboard to the system but do not attract any penalties.

Risk/Reward – the Yin and Yang of SLAs

  • Balance Penalties with Reward – Encourage the right behavior more effectively with a carrot and stick approach, and not only with the stick.
  • Raise the performance bar on the critical parts of the operation by fixing robust SLA targets and ensuring them with penalties.
  • But equally reward the focus of the provider when they over-achieve.
  • This has two benefits:
    • This will focus the provider on the critical parts of the operation.
    • When your system also equally rewards over-performance, this gets factored into the price of the service and offsets the risk premium.
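A carrot-and-stick scheme like the one above can be sketched in a few lines of Python. The function name, the stretch target and the 5% rates are hypothetical placeholders, not recommended values; your contract would define its own.

```python
def fee_adjustment(measured, target, stretch_target, monthly_fee,
                   penalty_pct=5, bonus_pct=5):
    """Return the signed adjustment to the monthly fee.

    Negative values are penalties, positive values are rewards.
    Assumes a higher-is-better metric such as availability in percent.
    """
    if measured < target:
        # The stick: the SLA target was missed.
        return -monthly_fee * penalty_pct / 100
    if measured >= stretch_target:
        # The carrot: the provider over-achieved.
        return monthly_fee * bonus_pct / 100
    return 0.0


# Hypothetical contract: 99.5% target, 99.9% stretch target, fee of 100,000.
missed = fee_adjustment(99.2, 99.5, 99.9, 100_000)
on_target = fee_adjustment(99.6, 99.5, 99.9, 100_000)
over_achieved = fee_adjustment(99.95, 99.5, 99.9, 100_000)
print(missed, on_target, over_achieved)
```

Keeping the penalty and the bonus the same size is only one possible design choice; the point is that both directions exist.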

Adopt the Football Foul Card approach

  • Adopt the yellow and red card approach from football into your Service Level Measurement System.
  • Build a “yellow card” – an early warning system into your SLA measurements.
  • When a provider is dangerously close to defaulting on an SLA target, an early warning in the reporting should alert both you and your provider.
  • In the “yellow card” dialog, don’t chastise your provider, but caution him.
  • Have a constructive dialog with your provider on an early warning sign – find out mutually what both parties can do to avoid a service level default (the red card). While a provider might feel the pain of a penalty on a default, you face a reputational risk that a penalty payment cannot remedy.
  • Decide on the remedial actions that both parties will take to make sure that the service returns to normalcy.
  • Lastly, set a date in the future where you will meet to see the effect of the remedial action and whether this has positively affected the service measurement.
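The yellow and red card idea can be expressed as a simple classification of each measurement. This is a minimal sketch; the function name and the warning margin of 0.2 percentage points are assumptions chosen for illustration.

```python
def card_for(measured, target, warning_margin):
    """Classify a higher-is-better measurement against an SLA target."""
    if measured < target:
        return "red"      # service level default
    if measured < target + warning_margin:
        return "yellow"   # dangerously close: start the early warning dialog
    return "green"


# Hypothetical availability target of 99.5% with a 0.2 point warning band.
for value in (99.9, 99.6, 99.4):
    print(value, card_for(value, target=99.5, warning_margin=0.2))
```

The yellow band gives both parties time to agree on remedial actions before a penalty is triggered.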

A balanced Service Level Agreement is a sign that you and your provider have a healthy working arrangement. It is important to invest time and energy into defining this upfront in an engagement to avoid unnecessary friction downstream.

Do you have an early warning system built into your Service Level Measurement? Have you balanced the risk?

Other posts in the Service Level Management series:

SLA – Zen Habit #1: Emphasis
SLA – Zen Habit #2: Do the Math
SLA – Zen Habit #3: Design for Resilience

photo credit: uteart via photopin cc

Can your Service Level Agreement stand the test of time?

Application Maintenance Contracts are usually signed for three to five years at a time – can your Service Level Agreement withstand the demands throughout the life of the contract?

A contract lifetime of 3-5 years can feel long in fast-paced industries like telecommunications and media. Even brick-and-mortar industries could see a complete business cycle in this time, and encounter at least one downturn during the life of the contract. Situations like these change priorities and expectations.

This raises the question: Can your Service Level Agreements and KPI target values adjust to these changes?

Contracts and, in turn, the Service Level Agreement need to be designed for resilience so that they adjust to the demands of the situations.

Bamboo is flexible, bending with the wind but never breaking, capable of adapting to any circumstance. It suggests resilience, meaning that we have the ability to bounce back even from the most difficult times – Ping Fu

SLAs and the SLA Management system need to be designed keeping not just the present, but also the possibilities of the future in mind.

Zen Habit #3: Design for Resilience

Design for Flexibility

  • Define Service Level Agreement aspects for flexibility – think along the dimensions (time, speed, availability) of each construct and factor.
  • Priorities change over the life of a business contract as the business moves through upturns and downturns.
  • A sales system has different priorities and expectations when sales are high (in an upturn) – the entire emphasis is on the Availability of the system. Defining a critical SLA for Availability would return its value in spite of its higher cost of service.
  • In a downturn, there might be a need to tune down this SLA to “keep the lights on” for the system and save costs.
  • In summary: design your SLA system so that you can change the target values during the life of the contract. Negotiate the corresponding impact on price.

Raise the Service Level performance bar

  • A provider takes over the scope of the contract through a transition that usually lasts 3-6 months and then cuts over into steady state.
  • Many contracts are written so that the SLAs and KPIs are valid starting Day 1 of steady state.
  • Sometimes it might be wiser to start with a lenient SLA target in the beginning as the provider is learning the ropes, and then raise the bar on the performance over the duration of the contract.
  • By the end of Year 1 at the latest, the provider should be hitting the KPI values that you have targeted.
  • As the provider gets better over time, one can also tune the penalty system. This would be reflected in higher penalties towards the latter half of the contract. Here are further tips for designing incentives and penalties.
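One way to picture a rising performance bar is a target that ramps up linearly during the first year of steady state. The sketch below assumes a linear ramp over 12 months and made-up start and final targets; a real contract might raise the bar in quarterly steps instead.

```python
def target_for_month(month, start_target, final_target, ramp_months=12):
    """SLA target in a given month of steady state (month 0 = cut-over)."""
    if month >= ramp_months:
        return final_target
    # Linear interpolation between the lenient start and the final target.
    return start_target + (final_target - start_target) * month / ramp_months


# Hypothetical ramp from a lenient 98.0% to the targeted 99.5%.
for month in (0, 6, 12, 24):
    print(month, target_for_month(month, start_target=98.0, final_target=99.5))
```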

Backtest your SLA target values

  • Before releasing your KPI target values, backtest them using the data over the last five years.
  • Did you face a downturn in your industry over the last five years? How did the priorities change? What were the demands on response time versus the cost of service?
  • How did the demand flare in the upturn? What were the new added demands on availability of systems?
  • How do the SLA targets that you have defined match the above values? Have you left room in the contract to adjust the values with a corresponding effect on price?
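A backtest can be as simple as replaying a proposed target against past measurements and counting the breaches. The history below is invented for illustration, and the function name is an assumption; in practice you would load your own reporting data.

```python
def backtest(history, target):
    """Return the periods whose measured value would have breached the target."""
    return [period for period, value in history if value < target]


# Hypothetical quarterly availability figures (percent).
history = [
    ("2012-Q1", 99.6), ("2012-Q2", 99.1), ("2012-Q3", 99.7),
    ("2012-Q4", 98.8), ("2013-Q1", 99.5), ("2013-Q2", 99.8),
]

for target in (99.5, 99.7):
    breaches = backtest(history, target)
    print(f"target {target}%: {len(breaches)} of {len(history)} periods breached: {breaches}")
```

If a target would have been breached in most past periods, it is either too ambitious or it needs room in the contract to be adjusted, with the corresponding effect on price.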

Bring it together – build a resilient system

  1. Take inspiration from best-in-class SLA and KPI definitions for your topic and industry – adopt a Service Level Agreement checklist
  2. Study the varying demands in the upturns and downturns of your industry.
  3. Design a Service Level Measurement system that can quickly adjust to the changing priorities in your business.
  4. Backtest your SLA and KPI target values with past data

 

Has your Service Level Management System reflected the changing priorities over the last five years? How have the demands of your business changed over time? Have you thought of what impact this could have on your SLA definition?

 

Other posts in the Service Level Management series:

SLA – Zen Habit #1: Emphasis

SLA – Zen Habit #2: Do the Math

 

photo credit: miyukiutada via photopin cc

 

5 ways to ensure your SLA metrics backfire

This is a continuation of my series of posts on the Zen of Service Level Agreement Metrics.

Zen Habit #2: Do the math

You have just finished defining your Service Level Agreement metrics and targets. You are satisfied that you now have a true measure of your service and your outsourcing contract. However, if you have not done your math, this could backfire with unintended consequences.

Gimnasia para el Cerebro

Here are five mathematical traps that can set your SLA definitions up for failure.

The Percentage SLA Trap

  • How did you define a Service Level Agreement metric target – is it a percentage or a number? Are you measuring Quality metrics or Performance metrics?
  • A percentage value for an SLA focuses the operation on ensuring acceptable behavior across a total number of events, while you could use a number value for an SLA to focus only on the outages.
  • Percentage value SLA metrics are fine for measuring overall performance, but you might want to use number value SLA metrics for critical events that can damage the reputation of your service. Therefore a very visible and critical service is better served by counting the maximum number of allowed outages per time period instead of only measuring a percentage of availability.
  • Your operation could face damage in reputation due to a critical outage that attracts attention. A percentage-based SLA metric can have the law of averages work against you.
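Here is a small Python sketch of how a percentage target and a count target can disagree about the same month. The outage durations, the 99.5% target and the limit of two outages are all hypothetical numbers.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month


def availability_pct(outage_minutes):
    """Availability in percent, given a list of outage durations in minutes."""
    return 100 * (1 - sum(outage_minutes) / MINUTES_PER_MONTH)


# Hypothetical month with three visible outages.
outages = [40, 35, 50]

availability = availability_pct(outages)
meets_percentage_sla = availability >= 99.5  # percentage-based target
meets_count_sla = len(outages) <= 2          # at most two outages per month

print(f"availability: {availability:.2f}%")
print("percentage SLA met:", meets_percentage_sla)
print("count SLA met:", meets_count_sla)
```

The month comfortably passes the percentage target while breaching the count target, which is exactly the case where the law of averages hides events your customers will remember.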

The Expensive Mean SLA Trap

  • It sounds counter-intuitive: An SLA should not reflect the service you expect from the provider.
  • Mathematically, the expectation of a service is the statistical mean of the service. On a normal (Gaussian) distribution of events, this means that about 50% of the events will be worse than the mean.
  • An SLA defined at the statistical mean is an example of what I call a “Set to Fail” SLA. A good working relationship cannot be established on such an SLA definition.
  • An ideal SLA should be set to the left of the mean, below the level of service you expect, to guarantee a minimum service for you – and your provider should find the level of risk acceptable.
  • You might be tempted to say “well, let the provider handle this risk” – but there is no free lunch. You will be unnecessarily paying the risk premium that your provider would calculate into the price.
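The effect of placing the target at the mean can be checked with Python's standard library. The sketch assumes, as the argument above does, that the measured service level follows a normal distribution; the mean of 99.0% and the standard deviation of 0.3 points are invented values.

```python
from statistics import NormalDist

# Hypothetical service: monthly availability with mean 99.0% and a
# standard deviation of 0.3 percentage points, assumed normally distributed.
mean, sigma = 99.0, 0.3
monthly_availability = NormalDist(mu=mean, sigma=sigma)

# Probability that a month falls below the target, i.e. breaches the SLA.
breach_at_mean = monthly_availability.cdf(mean)
breach_two_sigma_below = monthly_availability.cdf(mean - 2 * sigma)

print(f"target at the mean:       {breach_at_mean:.1%} of months breach")
print(f"target 2 sigma below it:  {breach_two_sigma_below:.1%} of months breach")
```

Under these assumptions, a target at the mean is breached in half of all months, while a target two standard deviations to the left is breached in roughly 2% of months. Real service data is rarely perfectly Gaussian, so treat this as an illustration of the principle rather than a sizing method.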

The Law of Small Numbers Trap

  • The total number of events also plays an unwitting role – this is when the law of small numbers can work against what you want to achieve.
  • If the number of events in a time period is very small, the likelihood of an SLA breach increases.
  • If the likelihood of an SLA breach is high, then you have forced the provider to include the penalty payment into the price, which means you are paying higher than you need to for a service.
  • You are also encouraging a provider to leave an issue unresolved once an SLA breach occurs.
  • A provider who has taken a penalty payment against margin is likely to save the effort of resolving an issue on which they have already been penalized. You lose out in the end. This is also a “Set to Fail” SLA.
  • So if you are defining an SLA metric on infrequent occurrences, you have two options in your SLA Management: either combine similar services to increase the sample size, or use the ladder principle: the smaller the sample size, the more lenient the SLA metric should be.
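The law of small numbers can be made visible with a simple binomial model. The sketch below assumes independent events with a constant success probability, and the 97% true success rate and 95% target are hypothetical figures.

```python
from math import comb


def breach_probability(n_events, true_success_pct, target_pct):
    """Probability that the observed success rate lands below the target.

    Assumes independent events that each succeed with the same probability
    (a simple binomial model).
    """
    p = true_success_pct / 100
    # Smallest number of successes that still meets the target (integer ceiling).
    needed = (target_pct * n_events + 99) // 100
    return sum(
        comb(n_events, k) * p**k * (1 - p) ** (n_events - k)
        for k in range(needed)
    )


# Hypothetical provider that truly succeeds 97% of the time, measured
# against a 95% target.
p_small = breach_probability(10, 97, 95)    # 10 events per period
p_large = breach_probability(200, 97, 95)   # 200 events per period
print(f"10 events:  {p_small:.1%} chance of a breach")
print(f"200 events: {p_large:.1%} chance of a breach")
```

With only 10 events, a single failure breaches the target, so a provider who is better than the target still breaches in roughly one period out of four. With 200 events the same provider breaches in only a few percent of periods.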

Sizing for Statistical Significance Trap

  • The observation time for your SLA should allow statistical significance for the measure you have in mind.
  • If the time period is too short, then you are again approaching the “set-to-fail” danger.
  • Example: Let’s take an SLA that allows a maximum of 3 critical errors per work order in User Acceptance Test per quarter. This assumes that you have enough work orders to reach a statistical sample size. However, if your operation does not have this volume of work orders, you reach the “set-to-fail” threshold – and you might want to increase the duration in the measure to six months to capture the sample size of work orders.
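A rough way to size the observation period is to divide the sample size you want by the number of events you actually see per month. The minimum sample of 30 work orders and the volume of 5 work orders per month are placeholders for illustration, not statistical recommendations.

```python
import math


def observation_months(min_sample_size, events_per_month):
    """Months needed to collect at least `min_sample_size` events."""
    return math.ceil(min_sample_size / events_per_month)


# Hypothetical operation: 5 work orders per month, 30 wanted per period.
print(observation_months(min_sample_size=30, events_per_month=5))
```

With these assumed numbers, a quarterly measure would only see 15 work orders, so the period would need to be extended to six months.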

The Safety Net Trap

  • Be aware that if a provider pays penalties due to an unresolved event that causes an SLA breach, they are no longer motivated to resolve this event.
  • Resolving such an event means adding more effort and eating into a margin that has already been eroded.
  • Your SLA mechanism should catch all such breach-events and incentivize their resolution.
  • Software programming languages have exception handling as a language feature – you need “exception handling” in your SLA strategy as well.

Service Level measurement is all about ensuring that the mathematics encourage the right behaviour. Doing the math diligently will take care that well-intentioned Service Level Agreements actually achieve the desired effect.

Have you done the math on your SLA metric definitions? Is it encouraging the right behaviour?

photo credit: solofotones via photopin cc

The Zen of Service Level Agreement Metrics – Emphasis

A series of my posts over the next weeks will look at the art of designing service level agreement metrics and SLAs. And I will try to distill this art into some key practices that have worked for me over multiple contracts – sitting on both sides of the table. We start with today’s cup of tea: Emphasis.

Zen Habit #1: EMPHASIS

Unbrella

Less is more

Bring focus and emphasis into the Service Level Agreement Metrics that you design. Don’t spread yourself thin and define multiple metrics across multiple dimensions. A sea of metrics will only lead to mediocre performance across the board.

Picture yourself after you define service level agreements and the ensuing SLA Management:

  • Do you see yourself trying to follow through on all these metrics every month?
  • How will you answer the question “do these metrics depict the true nature of our business?”
  • How will you decide which SLA metrics to focus on?

Ask yourself:

  • What happens if your provider is not doing well on one metric but well on the other? Is that good? Is that the behaviour you want?
  • Are you sure that the SLA metrics are not correlated with each other?
  • Which of these metrics will you use to drive your service improvement?

Sometimes the hardest part is not letting go of SLAs, but starting all over

You might have measured reams of metrics every month over the past years. And now you are trying to focus on the few that will drive the performance of your service.

How will you decide which metrics to focus on in your new service definition? Start by studying the business that your IT serves – which measures serve them best? Get to the root of your SLA definition.

  • Is it speed of reaction? Then measure the speed at which your provider implements small development jobs. Measure Function-points / week.
  • Is it predictability of budget? Then measure how accurately your provider can estimate.
  • Is it reliability of service? ... you get the drift.

Add the above to your Service Level Agreement Checklist. Be extremely picky and choose the SLAs which comprehensively mirror the style of your business. In this way, when you drop the non-essential SLAs, your business client still supports the key ones that you have chosen to keep.

Learn to let go, but stay on top of your SLAs

As you define your new SLAs and move towards a new managed service, move away from measuring the inputs of the service. Start measuring the outputs of the service in your Service Level Agreement. Don’t measure your managed service by the speed at which your service provider onboards a team member. Measure your provider by the speed at which they deliver, and by how reliably they stay within the planned budget.

And finally, stay on top of what you have decided to measure – if you have chosen your SLA metrics well, your laser-focused attention on these metrics can lead to step changes in the service you offer.

photo credit: ecstaticist via photopin cc
