You know how it goes – you’ve got your project running, a backlog is set up, stories completed, and then the client asks you to change direction. For example, they want you to modify the communication between microservices from asynchronous to synchronous because it may reduce management costs.
Well… You can react in many ways. You can, for one, see it as an opportunity to learn something new. This is how I got to know the service mesh.
What is a service mesh?
A service mesh is a configurable infrastructure layer for applications based on a microservice architecture. Its aim is to improve internal communication between microservices.
There are two leading products on the market that deal with service mesh:
- Istio – hosted only inside a Kubernetes cluster (I worked with this one).
- Linkerd – hosted in Kubernetes and outside of it.
What’s the main purpose of a service mesh?
Using as few words as possible: a service mesh makes dealing with microservice architecture easier. The microservice architecture assumes that our system consists of multiple loosely coupled services. Those services need to communicate with each other, and every architect designing such a system should consider the following subjects:
- Service discovery
- Inter-service communication failure strategy (e.g. retries)
- Load balancing
- Monitoring and logging
A service mesh makes it much easier to design and implement those features in a microservice world. These subjects are cross-cutting, so they should be handled the same way for every microservice. Of course, there are occasional exceptions, but these subjects appear in most projects.
Additionally, if the system consists of multiple services, and those services can each run in multiple versions, such an environment becomes hard to manage and monitor. Products that implement a service mesh can help with this problem as well. A service mesh can even span multiple environments. We get one control plane where we can configure communication rules between microservices, plus other tools (described further in this document) that can be used to monitor and trace our application and cluster.
How does a service mesh work?
I’ll focus on Istio here, since it’s the solution I’ve worked with.
Istio implements the service mesh architecture using the sidecar proxy pattern. This means that Istio injects a sidecar proxy for every service, and all communication between microservices goes through these sidecar proxies.
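Conceptually, a pod after injection runs two containers: your application and the Envoy-based proxy next to it. A simplified sketch of the result (service name and images are illustrative; a real injected manifest contains much more, e.g. an init container that sets up iptables rules):

```yaml
# Simplified sketch of a pod after Istio sidecar injection (illustrative only).
apiVersion: v1
kind: Pod
metadata:
  name: orders
  labels:
    app: orders
spec:
  containers:
  - name: orders               # the application container
    image: example/orders:1.0  # hypothetical image name
    ports:
    - containerPort: 8080
  - name: istio-proxy          # injected Envoy sidecar; all traffic flows through it
    image: istio/proxyv2
```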
The sidecar proxy in Istio is represented by an extended version of the Envoy proxy. The main responsibilities of the sidecar proxy are listed below:
- Dynamic service discovery
- Load balancing
- TLS termination
- HTTP/2 and gRPC proxies
- Circuit breakers
- Health checks
- Staged rollouts with %-based traffic split
- Fault injection
- Rich metrics
To perform the above tasks, the proxy needs to stay constantly connected to a control plane, which provides the configuration and current statistics on which the Envoy proxy bases its decisions. Additionally, the following features may be configured:
- Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
- Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection.
- A pluggable policy layer and configuration API supporting access controls, rate limits and quotas.
- Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.
- Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.
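As an example of fine-grained traffic control, retries and per-try timeouts can be declared in an Istio VirtualService instead of in application code. A minimal sketch (the `orders` service name is an assumption for illustration):

```yaml
# Minimal sketch: declarative retries for an assumed "orders" service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: orders
spec:
  hosts:
  - orders                # Kubernetes service name (illustrative)
  http:
  - route:
    - destination:
        host: orders
    retries:
      attempts: 3         # retry a failed request up to 3 times
      perTryTimeout: 2s   # each attempt gets at most 2 seconds
```

Applying this resource changes the retry behavior of every sidecar that routes traffic to the service, with no change to the service itself.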
More details about how Istio works can be found here: https://istio.io/docs/concepts/what-is-istio/
It’s important to understand that Istio is a group of connected products that together create a service mesh ecosystem. When installing Istio, you may choose to include tools such as Prometheus, Jaeger (tracing), Grafana, Kiali, and more.
When deployed into a Kubernetes cluster, Istio comes with a default configuration that lets you start working at no additional cost. It is possible to change the default configuration for better customization of the functionalities; this requires configuring the Istio components and marking your Kubernetes components with the labels described in your cluster config. Based on those labels, a sidecar proxy is created and knows which configuration to retrieve from the control plane.
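For example, one common way to mark Kubernetes components is labeling a namespace so that Istio automatically injects sidecars into pods created there. A sketch (the namespace name is an assumption):

```yaml
# Enabling automatic sidecar injection for a namespace (name is illustrative).
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    istio-injection: enabled   # Istio injects a sidecar into new pods in this namespace
```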
When should you use service mesh?
When you need to create a microservice application with synchronous communication.
Currently (June 2019), there is extremely limited support for asynchronous communication. Gwen Shapira from Confluent made a great video presenting ways to connect Kafka with a service mesh, but all of them should be treated as workarounds.
Currently, the community is working on adding support for the AMQP protocol.
Service mesh pros and cons
- Once you finally get Istio configured, you get a nice ecosystem of tools for configuring, managing, and monitoring your cluster. It’s awesome.
- You don’t need to implement the following capabilities from scratch (it is much easier to design and introduce them):
- Logging and monitoring of the application
- Logging and monitoring of the cluster
- Retries policy
- Load balancing
- Service discovery
- Circuit breakers
- It may decrease the time required to set up some common features of a distributed system. Additionally, we don’t solve these problems every day, so it helps to be able to fall back on established good practices.
- Settings are not stored in a microservice config file; they are defined in the control plane that lives in the Kubernetes cluster. If you want to change some routing rules, you can do it without redeploying the application.
- The service mesh is an immature technology, so if you decide to use it, you will struggle with the following problems:
- Documentation is often outdated and not synchronized between the projects that make up the Istio ecosystem (June 2019).
- A lack of articles describing how to do things – you need to sit down, make changes, and see what happens.
- Open issues on GitHub – you often have to read them to figure out how to do something.
- Version mismatches between products. In our case, Istio worked perfectly, but other tools like Jaeger didn’t. We had to upgrade Istio and the other tools in the ecosystem to solve the issues with Jaeger. To limit this problem, it is better to deploy Istio using Helm charts.
- If an Istio instance uses other tools (Jaeger, Prometheus, Grafana, Kiali) and some of them have issues related to their immaturity… Well, no one wants to be in that situation.
- Istio is easy to configure with default settings in Kubernetes, because there are many sources describing how. That may be enough for a development environment, but higher environments like test would require custom configuration. If there is no experienced person on the project, this configuration may be time-consuming.
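To illustrate the point above about changing routing rules without redeploying: a staged rollout with a percentage-based traffic split can be applied as a single VirtualService change. A sketch (service name and subsets are assumptions; the subsets would be defined in a separate DestinationRule, not shown):

```yaml
# Sketch of a 90/10 traffic split between two versions of an assumed "reviews" service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1        # defined in a DestinationRule (not shown)
      weight: 90          # 90% of traffic stays on v1
    - destination:
        host: reviews
        subset: v2
      weight: 10          # 10% goes to the new version
```

Shifting more traffic to v2 is then just a matter of editing the weights and reapplying the resource.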
Giants like Google, IBM, and Lyft are involved in the creation of Istio, and they are planning to adopt it themselves (as are Netflix and other companies, large and small). The solution may still be immature, but it already finds its uses, so given the size of the players planning to use it on a day-to-day basis, it’s definitely worth a try.