Some time ago, at a conference, I met Dave, an Operations Lead from a UK-based automotive company. We spent quite a while discussing their cloud migration and the DevOps transformation happening at the same time. The results were impressive: in less than a year they went from a siloed organisation, with Devs and Ops barely talking to each other, to an organisation with a platform team delivering tools and autonomous, cross-functional DevOps teams owning business-oriented services. As a result, time-to-market went down, while the quality of software and the stability of services went up. Using the terminology introduced in the “State of DevOps 2018” report, I’d classify them as high performers.
Nevertheless, what struck me most was what I heard when we talked about the beginning of this journey:
“We knew that as an Operations team we didn’t have experience with the Cloud and DevOps, but at the same time we didn’t want to outsource this service to a software house; we didn’t want to fire anyone. At the end of the day, we ended up hiring several contractors whose goal was to teach us everything.”
First, I entirely agree that releasing experienced staff with domain knowledge simply because they lack some technical skills is the worst thing one could do.
Secondly, there are several different DevOps working models (topologies) and they picked one of them quite intuitively.
Analysing our internal teams working on different projects for different clients over the last two years, I found this particular topology followed relatively often, yet it’s not the only option. In their book “Team Topologies”, Skelton and Pais identified almost ten different topologies. In this blog post I will focus only on the three most common ones and, based on our experience, suggest when each is recommended, as well as point out the key benefits and risks.
How does it work?
In this model, sometimes called Fully Shared Ops Responsibilities or DevOps Managed Service, a product team (the Team) is fully responsible for the service it creates. The Team has a great deal of autonomy and a wide range of responsibilities. On the one hand, they pick frameworks, services and tools by themselves; on the other, they need broader competencies, and if they choose poorly, they bear the consequences. Assembling a minimal number of people with sufficient skills is the key challenge here, so let’s quickly recap what they will be responsible for:
Development – this covers architecture design, development of new features and bug fixing. Other product teams may implement changes in a service owned by the Team simply because they need them (this is common in organisations with a DevOps culture), but such changes must always be reviewed and accepted by the Team. Remember, they are fully responsible for this service. If the service is down, nobody will accept the explanation ‘it’s not my code that is broken’.
Environment provisioning – the Team needs to make sure they have the required infrastructure. They prepare and maintain tools to provision (and decommission) IT environments. What’s more, they decide whether something requires a dedicated tool and process. Perhaps it’s not a common activity and investing in automation doesn’t make sense? It’s up to them. If the required resources can be assessed precisely (e.g. for an application used only by internal employees), this can be done effectively even without the cloud; otherwise it’s simply too risky, or even impossible.
Build, package, release – the Team decides what the delivery process looks like, from coding to production. How do they integrate, which tests run and when, how is security addressed? Teams classified as elite performers can push new code to production in less than one hour. Since in this topology the Team owns the whole delivery chain, the sky is the limit, and becoming one of those super-efficient teams is only a matter of competencies and priorities.
Monitoring & alerting – since they run it, they need to know what’s going on in the production environment. The Team is responsible for the availability of the service, so it’s in their best interest to have insight into what’s happening there and to be proactively informed about any malfunction.
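To make this concrete, here is a minimal sketch of an alerting rule such a team might own, in Prometheus rule-file syntax. The service name, metric and thresholds are hypothetical, not taken from any team described here:

```yaml
# Hypothetical Prometheus alerting rule: page the owning team when the
# service's 5xx error rate stays above 5% for 10 minutes.
groups:
  - name: myservice-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{job="myservice", code=~"5.."}[5m]))
            / sum(rate(http_requests_total{job="myservice"}[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "myservice 5xx error rate above 5% for 10 minutes"
```

Because the Team owns both the code and the alert, there is no hand-off: the people being paged are the people who can fix the problem.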
When does it make sense to use it?
There are a lot of aspects the Team has to manage, but once properly resourced, it can be highly effective. This model works well when you need to quickly build product ideas and test them against the market; when your top priority is short time-to-market; or when you need to build a Proof of Concept or a pilot product. However, once a business owner confirms that the new service is valuable, the Team may have to invest time in adapting the product and delivery process to the standards followed by the company.
This topology also makes sense if your organisation has just a few product teams and there is not much value in standardisation. After all, if you have just a few bikes in your garage, you aren’t going to spend money and time on the sophisticated tools used by professional bike-fleet servicing companies.
If you work in a highly regulated industry and every product needs to be compliant from day one, then you need to pay special attention to whether the Team is aware of those policies. It is absolutely doable, but it is achieved through education of the Team rather than through a formal delivery process or provided tools.
In a nutshell, the key challenge is resourcing, and the Team needs to be very good at following the “you aren’t gonna need it” rule; even so, they will inevitably build their own solutions and in-house tools.
How does it work?
In the Team Topologies book this model is called Ops as Infrastructure-as-a-Service; however, I will use the term Mature Operations to emphasise that this model depends on a true platform team, and in many cases such teams were formed from former Ops/Infra teams.
In this topology the Team takes care of the whole service, similarly to Full DevOps; however, in this case they can and should use the tools provided by the platform team.
Environment provisioning – let’s assume hosting is in the cloud, so the platform team can provide scripts (e.g. Terraform scripts) that ease provisioning. Of course, the Team doesn’t have to use them and can prepare everything from scratch by themselves, BUT if they use the platform’s tools and there are issues with the provisioned resources, the platform team will help to solve them; otherwise, they will be left alone. Additionally, for elements that need to be standardised (e.g. unified tagging of resources required for Cost Management activities), such tools give it for free.
Build, package, release – mature platform teams provide not only automation servers as a service but also pipelines as code (Jenkins Pipelines, Azure Pipelines, CircleCI Pipelines, to name a few). The Team must focus on using them effectively and continuously giving the platform team feedback on what new features are required.
Monitoring & alerting – some companies still transfer responsibility to support teams once a new service has stabilised and active development of new features is minimal. In such cases, standardisation in the monitoring & alerting area makes the transition process much simpler.
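As an illustration, the provisioning tooling mentioned above often takes the form of a shared Terraform module maintained by the platform team. The module source, inputs and tags below are assumptions, not a real platform’s API:

```hcl
# Hypothetical use of a platform-team Terraform module. The module bakes in
# the company-wide standards (e.g. unified tagging for Cost Management),
# so every consuming team gets them for free.
module "service_environment" {
  source = "git::https://git.example.com/platform/terraform-environment.git"

  service_name = "payments-api" # hypothetical service
  environment  = "test"

  # Teams can add their own tags; the standard ones are applied by the module.
  extra_tags = {
    team = "product-team-a"
  }
}
```

If provisioning done this way fails, the platform team helps debug it; a team that writes everything from scratch is on its own.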
When does it make sense to use it?
This is a very good model if your company has many internal teams or cooperates with many vendors. Reused components provide standardisation, so it’s easier for the client to take over ownership of a delivered product. The same standardisation can help satisfy regulatory requirements.
The greatest risk is related to the level of cooperation between product teams and platform teams. Product teams should refrain as much as possible from building custom tools, but in order to be able to do so, platform teams must collect feedback and quickly implement the requested features. The lack of such a culture, multiplied by the fact that the product and platform teams are physically separate, is a recipe for failure.
How does it work?
In this model, which I will call DevOps Advocacy, the Team stands between Devs (product teams) and Ops/Infra. Its main goal is to spread DevOps concepts through knowledge sharing, transparency and cooperation. They educate Ops and effectively push them toward the Mature Operations model. They prepare tools for product teams and, if necessary, introduce standards and policies, but in a manner that still allows product teams to be effective. Let’s look at a few examples:
Environment provisioning – the Team can extend and enrich everything provided by the Ops and Infrastructure teams. A common challenge is that those tools aren’t offered in a self-service mode, and the Team’s main goal is to remedy that. An example is an automation server (e.g. Jenkins) as a service, but without any predefined pipelines. Being aware of the best practices, the Team can create such pipelines and speed up the kick-off phase of each new product. Additionally, the Team can periodically review whether the environments and services created by product teams follow best practices as well as the company’s rules.
Build, package, release – if the Ops or Infrastructure team has an offering that could be used by product teams but isn’t available as self-service, the Team tries to mitigate that. Again, it can be an automation server, static code analysis or security scans. One of our clients had a process in which every release candidate was analysed by a dedicated team, which usually took two weeks. The Team found that this step could be effectively replaced by integrating Zed Attack Proxy and Trivy scans into the Continuous Integration process, reducing it to less than 30 minutes.
Monitoring & alerting – on the one hand, the Team provides tools affecting the delivery process that take care of standardisation and thus ease monitoring (e.g. resource naming conventions, tagging, logging infrastructure, exposure of key metrics). On the other hand, by working closely with Operations and educating them, the Team significantly simplifies Ops’ work and increases the overall level of automation.
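As a sketch of the security-scan example above, such a step could look like the following CI job (GitHub Actions syntax here purely for illustration; image names, the target URL and severity thresholds are assumptions, not the client’s actual setup, and the runner is assumed to have Trivy installed):

```yaml
# Hypothetical CI job replacing a manual, weeks-long security review with
# automated scans on every build.
security-scans:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Scan the container image for known vulnerabilities with Trivy
      run: trivy image --exit-code 1 --severity HIGH,CRITICAL myservice:latest
    - name: Run an OWASP ZAP baseline (passive) scan against a test deployment
      run: |
        docker run --rm -t ghcr.io/zaproxy/zaproxy:stable \
          zap-baseline.py -t https://myservice.test.example.com
```

The pipeline fails fast on HIGH/CRITICAL findings instead of waiting weeks for a manual report.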
When does it make sense to use it?
If you need to optimise delivery processes and shorten time-to-market by introducing automation, such a Team will do the job. They will decide what is worth automating (because it’s used by several teams) and what should remain manual work (e.g. a small new service that will be built in two weeks doesn’t need a full-blown pipeline). However, in order to be efficient, they need time to understand the organisation’s challenges as well as its structure.
Thanks to insight into how their tools are being used, they can become either gatekeepers or advocates. A gatekeeper won’t allow you to do something wrong; an advocate will teach you what a good approach is and what is bad practice. DevOps paradigms strongly suggest the latter; however, let’s be realistic: different organisations have different needs, so it’s very often a combination of both. How the Team works depends on the company’s strategy for cooperating with vendors. If a vendor only produces software and hands it over to the client, then a gatekeeper makes more sense. If a vendor produces software and will be responsible for the service for at least a few months, then an advocate makes more sense.
Another use case for this model is when you want to turn your Operations team into a platform team. A properly resourced Team will not only have the technical competencies required to implement the company’s strategy, but will also be very good at coaching and knowledge sharing. The same applies if you want to change the way your development teams work: if you want to move away from pure software delivery towards development teams working in DevOps mode and being responsible for the service end-to-end, then this is your model as well.
Bear in mind that a Team working in the DevOps Advocacy model may be limited in areas owned by the operations/infrastructure team. To be able to optimise change, they will need a strong mandate from business stakeholders; more importantly, all sides must understand and follow the main DevOps paradigms, namely knowledge sharing, transparency and collaboration.
Summary and Conclusions
This model allows you to quickly introduce a lot of knowledge and expertise into an organisation without affecting its current structure, and this is exactly what Dave and his company did. They didn’t ask a vendor to provide the whole Team; instead, they hired several contractors, but the model is the same. After several months, once the product teams and Operations had learnt new techniques and tools and got used to the shared responsibility model, they smoothly moved to Mature Operations. Well done! Nevertheless, bear in mind that not every company can or should aspire to this particular model. What mostly determines the best topology is the size of your company, the level of regulation in your business domain, and whether and how you want to interact with vendors.