When does redundancy become redundant?
Tl;dr
Redundancy is needed for a team to be productive, but there is such a thing as too much.
Instaface
Picture a software development team with three members - Alice, Bob, and Carol. All three of these team members are software developers, so they write code and build features for the team’s product, a hot new social media site called Instaface. As well as their usual software development work, each team member has some other responsibility. Alice is the team’s leader, the creator and owner of Instaface. Bob is a quality engineer, and he tests the output of the team’s work, making sure that Instaface remains a great product. Carol is the team’s infrastructure guru: she knows what is under the hood and what keeps Instaface ticking over. Note that we are assuming that these additional roles are genuinely above and beyond what is usually expected of a software engineer - Alice, Bob, and Carol are experts and owners of their domains.
These extra responsibilities are not just for show, or to pad CVs. These are vital parts of the software development journey. If Bob were not testing new Instaface features, the team would not catch mistakes in their code, so the team’s first sign of things going wrong would be users making complaints or deleting their Instaface accounts. If Carol did not maintain and monitor the team’s servers, then one out-of-date dependency could lead to a security issue which brings the app to its knees. If Alice wasn’t providing some vision and guidance on what the product should be, then Instaface would scare away potential users. These are necessary roles, and the team simply could not go without them.
So, what happens if one of these roles becomes unavailable? What happens when Alice, Bob, or Carol need to go to hospital, or have a holiday booked, or (as the traditional framing of this question puts it) are hit by a bus? We know that we cannot just ignore these duties, but it is probably not a good idea to wait for the whole team to be back together before making progress, as they may end up waiting a while. To make sure that the team does not grind to a halt on every absence, they might double up on some of these responsibilities. Alice can take on some infrastructure responsibilities, Bob can make some decisions on product, Carol can test new features. Now the team is twice as resilient - the single absence which once ground production to a halt is now two absences. Our bus factor has gone from one to two.
This is not the end of the story though. The team are not completely safe. What if there are two absences? In this case, one of our critical roles cannot be filled. What do they do now? Well, they could triple up on these roles, so we end up with 3 product owners, 3 testers, and 3 infrastructure engineers, meaning that two members of the team could be absent and production does not need to stop. This sounds ok, but let’s stop and think. If two members of the team are absent, then only 33% of the team are available. Less than half. Is this a situation which we want to prepare for? Does the Instaface team want to accept a situation where only one member of staff is available as a genuine possibility? Perhaps they do, but would the same be true if the team grew to 10 people? 20? 100? I would argue that it is not reasonable for a large team to prepare for a situation where most of its members are going to be unavailable for a prolonged period of time.
As well as this situation not quite feeling right, there is a cost associated with this preparation. To properly train an engineer to be a QE, or a product owner, or whatever else, they need time out of their day to day, a training course, quarterly refreshers. Very quickly we are in a position where each person who is covering some responsibility requires thousands of pounds of investment. In a large team (or indeed, a large institution), this is not reasonable at all. If Instaface becomes a team of 100 people, then they would not prepare for a situation where only one of these people is keeping all of the lights on. We need to ask ourselves, where should the lines be drawn? What extent of preparation is reasonable, and what can we not justify?
The numbers
Coverage and impact
There are a few elements at play here. Let’s try to catch the essence of each of them and see what impact it has. Firstly, we have a few roles in our team which need to be done. It is not the case that each of these roles is either filled or not filled. In a development team, each of these roles will be covered by a group of people. With this in mind, we can say that the amount that a role is ‘covered’ at a given point in time can be measured as a proportion of ‘full’. We can denote this proportion as $c$.
As we have established, coverage is not the full picture. If one of these roles is completely unavailable, then the team cannot work. We can describe this effect by talking about the impact of the role. If a team usually has 5 people carrying out testing duties, and 1 of them is unavailable, then the impact is not catastrophic. If 4 of them are unavailable, this is more of an issue. We can capture this phenomenon by introducing a new value - the impact of the role, based on the proportion of coverage. As a first stab at defining this value, let us write it as a function of coverage, $I(c)$.
Absence
Any member of a team will take some time off at some point. There are cultural differences across the world here, so this may cause more of an issue to an American team than it would to a European one. Despite this, the point stands - people are going to be unavailable at some point. The question is, how often? A sensible way of describing this is to pose the question as: “for any member of the team, what is the probability that they will be available on a given day?”. Here in the UK, 34 days off per year seems a reasonable guess. We can compare this against the number of non-weekend days in a year, roughly $52 \times 5 = 260$, to find that the probability of a member of the team being available on a given day is about $a = 1 - 34/260 \approx 0.87$.
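As a quick check of that arithmetic, here is a minimal sketch. The function name `availability` and the figure of 260 working days (52 weeks of 5 days) are my own assumptions for illustration, not fixed parts of the model.

```python
# Assumption: a year has 52 weeks of 5 non-weekend days each.
WORKING_DAYS_PER_YEAR = 52 * 5


def availability(days_off: float, working_days: float = WORKING_DAYS_PER_YEAR) -> float:
    """Probability that one person is available on a given working day."""
    if not 0 <= days_off <= working_days:
        raise ValueError("days_off must be between 0 and working_days")
    return 1 - days_off / working_days


# 34 days off per year, the UK guess used in the text.
uk_availability = availability(34)
print(f"{uk_availability:.2f}")  # prints 0.87
```

Swapping in a different leave allowance, or a different count of working days, is just a matter of changing the arguments.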
Costs
The costs associated with providing additional roles in a team can vary a lot, and are open to interpretation. As mentioned previously, there are training costs and days off during training to be considered. These are not all though - once a software developer is also a QE, they would be justified in arguing that they are now worth more. Management take some convincing on this point, as to them the developer is simply doing what is required by the job, but the outside world does not see it in this light. To Instaface, Carol is the software engineer who also manages the team’s servers, but to Grambook, she is a software engineer and an infrastructure engineer. If that leads to a job offer which is £10’000 above her current rate, then it is likely that Instaface need to either match that, or find a replacement (which may cost them even more).
With this in mind, we need to decide on a cost for each of our developers to become experts/owners for each different role which they are taking on. This is a tricky number to put a finger on. For the sake of my own simplicity, I am going to continue with my estimate of £10’000 per additional role. These numbers should be specifically chosen to fit the team, and the role in question. That is, we want a new parameter $k_r$ for each role $r$, which includes salary uplift, training, lost productivity, and management overhead.
Putting it all together
Now that we have defined our parameters, we can see how they fit together. Firstly, we need a baseline of value provided by the team, $V$, meaning some value which we say the team provides in a year when we assume that everything is working fine. The impact of each role, which we talked about previously, has a multiplicative effect on the value, giving us a new baseline of
$$V \prod_{r \in R} I(c_r)$$
where $R$ is our set of roles and $I(c_r)$ is the impact of role $r$ at its coverage $c_r$.
To get an idea of this value $c_r$, we can unpack what it represents. We want to find an estimate of the proportion of coverage for a given role $r$ on a random day. We want to define this estimate in terms of:
- Our availability constant, $a$
- The number of team members who are filling the role, say $n_r$
- A value to represent the demand of the role on a yearly basis, $d_r$
We can think of this demand as the number of people we would need to have working on the role to ensure that all tasks are completed.
We are assuming here that the absences are completely independent of each other. This is probably not the case. For example, if someone has had annual leave approved on a given day, their colleagues covering the same role will be slightly less likely to be approved at the same time. Some more care could be applied to defining the value of $c_r$. For our assumptions, it just becomes
$$c_r = \frac{a \, n_r}{d_r}$$
As a final addition to our coverage, we can notice that having more than the required number of people assigned to a certain role adds no value (it would probably do the opposite). As such, we can cap our value for $c_r$ at 1, so that $c_r = \min(1, a \, n_r / d_r)$.
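To make the coverage estimate concrete, here is a small sketch. It assumes coverage is the expected number of available people (availability times head count) divided by demand, capped at 1. The function and parameter names are mine, chosen for illustration.

```python
def coverage(availability: float, people: int, demand: float) -> float:
    """Expected share of a role's demand that is covered on a random day.

    Assumes independent absences, so the expected number of people present
    is availability * people. Anything above the demand adds nothing,
    hence the cap at 1.
    """
    if demand <= 0:
        raise ValueError("demand must be positive")
    if people < 0:
        raise ValueError("people cannot be negative")
    return min(1.0, availability * people / demand)


# One person covering a role that needs two, at 87% availability.
print(round(coverage(0.87, 1, 2), 3))  # prints 0.435

# Three people covering the same role: the cap kicks in.
print(coverage(0.87, 3, 2))  # prints 1.0
```

The cap is the important design choice here. Without it, piling ever more people onto a role would look like it keeps adding value.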
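Finally, we look at our costs. These have an additive effect on our value, to the tune of our cost estimate multiplied by $n_r$, the number of people filling each role $r$. We can finally define our ‘redundancy adjusted value’, or RAV:
$$\mathrm{RAV} = V \prod_{r \in R} I(c_r) - \sum_{r \in R} k_r n_r$$
where $V$ is the baseline value, $I(c_r)$ is the impact of role $r$ at its coverage $c_r$, and $k_r$ is the yearly cost of one person taking on role $r$.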
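Here is a rough sketch of how the pieces could be wired together in code. It assumes the structure described above: baseline value, multiplied by the impact of each role, minus cost per person times head count for each role. The `Role` class, the function names, and especially the `linear_impact` placeholder are my own assumptions, and the impact function is deliberately left pluggable. With a different impact function the numbers will change, so treat this as a toy rather than a way to reproduce the figures in the examples.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, Iterable


@dataclass(frozen=True)
class Role:
    name: str
    people: int             # how many team members fill the role
    demand: float           # how many people the role needs
    cost_per_person: float  # yearly cost of one person taking on the role


def coverage(availability: float, people: int, demand: float) -> float:
    """Expected share of demand covered on a random day, capped at 1."""
    return min(1.0, availability * people / demand)


def linear_impact(c: float) -> float:
    """Placeholder assumption: impact is simply equal to coverage."""
    return c


def redundancy_adjusted_value(
    baseline: float,
    roles: Iterable[Role],
    availability: float,
    impact: Callable[[float], float] = linear_impact,
) -> float:
    """Baseline scaled by every role's impact, minus the cost of all role holders."""
    roles = list(roles)
    value = baseline * prod(
        impact(coverage(availability, r.people, r.demand)) for r in roles
    )
    cost = sum(r.cost_per_person * r.people for r in roles)
    return value - cost


# Made-up numbers: one role, two people for a demand of two, 90% availability.
toy = [Role("qa", people=2, demand=2, cost_per_person=1.0)]
print(round(redundancy_adjusted_value(100.0, toy, availability=0.9), 2))  # prints 88.0
```

Passing a different `impact` callable is the intended way to experiment with how harshly low coverage should be punished.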
So what?
How do we interpret this value now that we have it? Well, to answer that question, let’s look at some examples. Firstly, let’s visit our friends back at Instaface.
Example 1 - Instaface
We need to make some assumptions about how Instaface are doing. Firstly, let’s assume that their app is still quite small, but that it is turning a decent profit. Let’s put their annual profit at £200’000, so $V$ = £200’000. We mentioned previously that the team have found that ideally they would have 2 people covering each of our roles, so for each role $r$, $d_r = 2$. For our final assumption, let’s stick with the aforementioned annual cost of £10’000 for each role taken on by a member of the team. In the case that Alice, Bob, and Carol all stick with their individual areas of expertise, the redundancy adjusted value becomes:
So having no redundancy for any of the roles turns the team’s £200’000 profit into one of £31’536. Quite the loss.
If, instead, the team doubled up on these roles as we said previously, what we end up with is:
So adding this redundancy maintains a much greater amount of the team’s value.
Example 2 - Grambook
Let’s look at a bigger organisation, Grambook. We slightly reframe what it is that we are looking at in this example. Rather than looking at sharing roles among members in a software development team, in this example we look at how policies should be set at a corporate level. The underlying point still remains though - the question we are asking is how many developers should carry out additional tasks.
Let’s suppose that Grambook is an organisation which generates £10 billion each year, tends to demand the work of 500 product owners, 2000 infrastructure engineers, and 1500 QEs, and generally finds that these roles tend to cost £20’000, £35’000, and £15’000 per person per year respectively, on top of normal salaries.
If Grambook did not consider their redundancy at all, they would see the demand that each area has, and assign 500 developers to be product owners, 2000 to be infrastructure engineers, and 1500 as QEs. In this case, our RAV becomes:
Now taking some redundancy into account, Grambook might decide to assign slightly more of each role - 600 product owners, 2200 infrastructure engineers, and 2000 QEs. To save chalk I will not show the calculation, but the RAV in this case is £9’580’000’000, nearly £500 million extra by taking on more.
If, instead, Grambook wanted to really make sure that there would always be someone there to cover all QE, infra, and product owner roles, they might double the number of each of them - assigning 1000 POs, 4000 infrastructure engineers, and 3000 QEs. In this case, the RAV would be £9’390’000’000. Yes, this is a better situation than having no redundancy at all, but the returns are diminishing. Grambook have lost around £200 million by being too cautious.
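The shape of this trade-off can be seen in a toy sweep over head count for a single role. This is only a sketch with made-up figures: it assumes coverage is availability times head count over demand, capped at 1, and that impact is simply equal to coverage. All names and numbers are hypothetical, and it does not reproduce the Grambook figures.

```python
def coverage(availability: float, people: int, demand: float) -> float:
    """Expected share of demand covered on a random day, capped at 1."""
    return min(1.0, availability * people / demand)


def role_value(baseline: float, availability: float, people: int,
               demand: float, cost_per_person: float) -> float:
    """Value for one role, assuming impact equals coverage (a placeholder)."""
    return baseline * coverage(availability, people, demand) - cost_per_person * people


def best_head_count(baseline: float, availability: float, demand: float,
                    cost_per_person: float, max_people: int) -> int:
    """Head count between 1 and max_people giving the highest value."""
    return max(
        range(1, max_people + 1),
        key=lambda n: role_value(baseline, availability, n, demand, cost_per_person),
    )


# Hypothetical role: needs 10 people, each role holder costs 10'000 a year.
best = best_head_count(1_000_000, 0.87, demand=10, cost_per_person=10_000, max_people=30)
print(best)  # prints 12
```

Under these assumptions, each extra person is worth having until coverage hits its cap. After that point every additional person only adds cost, which is the "too cautious" situation described above.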
Conclusion
What is the take-home then? How does this change our ways of working? For one, I would like to suggest that the model I have put forward has legs. Sure, improvements could be made, and the constants which we define are important, but these are just the details. At heart we have a tool here which will let us organise our teams better. That is good. Distributing work is always going to be a balancing act, so you might as well give yourself the upper hand.