Some notes on the talk “Fully Realizing the Microservices Vision with Service Mesh” by Arijit Mukherji of SignalFx at AWS re:Invent 2018 (DEV312).
Find the video at https://www.youtube.com/watch?v=eTHhsbKfpWg
A service mesh is an infrastructure layer for service-to-service communication. It makes that communication visible (observable), manageable and controllable.
To put it another way, a service mesh is a happy marriage between a proxy and a policy.
How do microservices talk to each other?
- Directly: via IP address or domain name; similar to direct dialing someone on the phone
- Service discovery: when a service comes up, it registers itself, and callers look it up (analogy: a telephone directory)
- Service mesh: add a Layer 7 proxy between the services (analogy: you have an assistant who looks up the number and makes the call for you)
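To make the directory and assistant analogies concrete, here is a minimal sketch of my own (not from the talk). The class names, the `fake_send` transport, and the address are all hypothetical; a real mesh proxy does far more than pick the first registered address.

```python
from typing import Callable, Dict, List


class ServiceRegistry:
    """The 'telephone directory': services register themselves on startup."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, []).append(address)

    def lookup(self, service: str) -> List[str]:
        return list(self._instances.get(service, []))


class Proxy:
    """The 'assistant': looks up the number and places the call for you."""

    def __init__(self, registry: ServiceRegistry, send: Callable[[str, str], str]) -> None:
        self._registry = registry
        self._send = send

    def request(self, service: str, path: str) -> str:
        addresses = self._registry.lookup(service)
        if not addresses:
            raise LookupError(f"no instances registered for {service!r}")
        # Simplification: always use the first registered instance.
        return self._send(addresses[0], path)


def fake_send(address: str, path: str) -> str:
    # Hypothetical transport that just echoes what it would have called.
    return f"GET {address}{path}"


registry = ServiceRegistry()
registry.register("billing", "10.0.0.5:8080")
proxy = Proxy(registry, fake_send)
reply = proxy.request("billing", "/invoices")
```

The caller only names the service ("billing"); it never sees an IP address, which is the point of putting the proxy in between.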
Planes:
- Data plane: where the requests flow
- Control plane: A policy between the services that controls how the proxies are behaving; it controls how the communications are happening
Common use cases of the service mesh:
1. Error handling
- Automatic retry on errors
- Circuit breaker
2. Load Balancing
- Proxy acts as a Layer 7 Load Balancer
- Every microservice-to-microservice call can be load balanced without each service (and development team) needing to worry about it
3. Request routing
- Route requests in interesting ways (I’m guessing this could be things like “don’t allow HTTP GET on /private”, canary-type control, or intelligent routing based on the needs of a customer or, indeed, of your internal infrastructure)
4. Security
- e.g. encrypt/decrypt data
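As a rough illustration of the first two use cases (automatic retries and Layer 7 load balancing living in the proxy rather than in each service), here is a toy sketch. `SidecarProxy`, `flaky_send`, and the backend names are my own hypothetical stand-ins, not anything shown in the talk.

```python
import itertools
from typing import Callable, Optional, Sequence


class SidecarProxy:
    """Toy proxy: round-robin load balancing plus retry on connection errors."""

    def __init__(self, backends: Sequence[str], max_attempts: int = 3) -> None:
        self._rotation = itertools.cycle(backends)
        self.max_attempts = max_attempts

    def call(self, send: Callable[[str], str]) -> str:
        """Try successive backends until one answers or attempts run out."""
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            backend = next(self._rotation)
            try:
                return send(backend)
            except ConnectionError as exc:
                last_error = exc  # retry on the next backend in the rotation
        raise RuntimeError("all attempts failed") from last_error


def flaky_send(backend: str) -> str:
    # Hypothetical transport: backend "a" is down, the others answer.
    if backend == "a":
        raise ConnectionError(f"{backend} unreachable")
    return f"200 OK from {backend}"


proxy = SidecarProxy(["a", "b", "c"])
first = proxy.call(flaky_send)   # "a" fails, so the call is retried on "b"
second = proxy.call(flaky_send)  # the rotation continues with "c"
```

The calling service just invokes `proxy.call(...)`; it contains no retry or balancing logic of its own.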
How did we get here?
Lessons learned from networking…
1. Standardize on a simple architecture
- The problem with having communication code as part of our app (even as a library dependency) is that in a polyglot environment you need to rewrite that common code in each language
- With proxies and service meshes, we pull that code into an independent binary/service that can be pushed and configured independently
2. Replace humans with bots
- Autogenerate and deploy policy driven configs
3. Adapt configuration based on feedback
- Continually observe the environment and react to it
- Typically, systems have desired and observed states, which are not always the same
- Use the discrepancies to report feedback. Feedback is critical.
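The desired-versus-observed idea can be sketched as a small reconciliation loop. This is my own minimal illustration; the setting names (`retries`, `timeout_ms`) and the `apply` callback are assumptions, and a real control plane would observe live proxies rather than a dictionary.

```python
from typing import Callable, Dict, Optional, Tuple


def find_drift(
    desired: Dict[str, int], observed: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {setting: (observed_value, desired_value)} for every mismatch."""
    return {
        key: (observed.get(key), want)
        for key, want in desired.items()
        if observed.get(key) != want
    }


def reconcile(
    desired: Dict[str, int],
    observed: Dict[str, int],
    apply: Callable[[str, int], None],
) -> Dict[str, Tuple[Optional[int], int]]:
    """Push each drifted setting toward its desired value; report the drift."""
    drift = find_drift(desired, observed)
    for key, (_, want) in drift.items():
        apply(key, want)
    return drift


desired = {"retries": 3, "timeout_ms": 500}
observed = {"retries": 1, "timeout_ms": 500}

# Here "applying" a setting simply writes it into the observed state.
first_pass = reconcile(desired, observed, observed.__setitem__)
second_pass = reconcile(desired, observed, observed.__setitem__)
```

The returned drift is the feedback: the first pass reports what was wrong, and an empty second pass confirms the system has converged.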
Service mesh can assist in the following
A) Code deployments
- Fully automated deployments
- Many deployments still require manual involvement.
- Can we automate the entire workflow?
- Specify in config how the deployment should go
e.g.
1. Deploy v2
2. Route x% to v2 (canary)
3. Monitor functionality, performance, errors, comparing v1 to v2
4. Rollback or continue
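The four steps above could be sketched roughly as follows. This is an illustration under my own assumptions: the thresholds (`max_error_delta`, `max_latency_ratio`), the traffic stages, and the `observe` callback are hypothetical and were not specified in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class VersionStats:
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    v1: VersionStats,
    v2: VersionStats,
    max_error_delta: float = 0.01,   # hypothetical threshold
    max_latency_ratio: float = 1.2,  # hypothetical threshold
) -> str:
    """Step 4: compare v2 against v1 and return 'continue' or 'rollback'."""
    if v2.error_rate - v1.error_rate > max_error_delta:
        return "rollback"
    if v2.p99_latency_ms > v1.p99_latency_ms * max_latency_ratio:
        return "rollback"
    return "continue"


def run_canary(
    stages: Sequence[int],
    observe: Callable[[int], Tuple[VersionStats, VersionStats]],
) -> int:
    """Steps 2-4: shift traffic stage by stage; return the final v2 percentage."""
    for percent in stages:
        v1, v2 = observe(percent)  # step 3: monitor both versions at this split
        if canary_decision(v1, v2) == "rollback":
            return 0  # all traffic goes back to v1
    return stages[-1]


def healthy(percent: int) -> Tuple[VersionStats, VersionStats]:
    return VersionStats(1000, 5, 100.0), VersionStats(1000, 6, 110.0)


def degraded(percent: int) -> Tuple[VersionStats, VersionStats]:
    return VersionStats(1000, 5, 100.0), VersionStats(1000, 30, 110.0)


good_rollout = run_canary([5, 25, 50, 100], healthy)
bad_rollout = run_canary([5, 25, 50, 100], degraded)
```

Because the proxies control the traffic split, the whole workflow can be driven from configuration without anyone watching a dashboard.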
B) Runtime behavior optimization
- Handle errors, retries, timeouts (already discussed)
- Circuit breaking
If you have n instances of a service, and x are not working well (e.g., failing or slow), then since the proxies are sending requests on your behalf, the proxies can stop routing to those impaired instances
- Cost and performance optimization
If you are in a service mesh environment,
– For cost savings: selecting instances that are closer from a network point of view (e.g. in the same region or AZ) may save network costs (???)
– For performance optimization: you might route to the instances that are responding fastest.
In other words, you can change behavior on the fly.
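A simplified sketch of both ideas, ejecting impaired instances and preferring the fastest responders, might look like this. The class, the failure threshold, and the instance names are my own assumptions; real circuit breakers also re-probe ejected instances after a cooldown, which is omitted here.

```python
from typing import Dict, List, Sequence


class OutlierAwareRouter:
    """Toy router: skip instances with repeated failures, prefer low latency."""

    def __init__(self, instances: Sequence[str], failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold  # hypothetical default
        self._failures: Dict[str, int] = {i: 0 for i in instances}
        self._latency_ms: Dict[str, float] = {i: 0.0 for i in instances}

    def record(self, instance: str, ok: bool, latency_ms: float) -> None:
        """Feed back the outcome of one request sent to `instance`."""
        if ok:
            self._failures[instance] = 0
            self._latency_ms[instance] = latency_ms
        else:
            self._failures[instance] += 1

    def healthy(self) -> List[str]:
        """Instances whose consecutive failure count is below the threshold."""
        return [i for i, n in self._failures.items() if n < self.failure_threshold]

    def pick(self) -> str:
        """Route to the healthy instance with the lowest last-seen latency."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy instances")
        return min(candidates, key=lambda i: self._latency_ms[i])


router = OutlierAwareRouter(["a", "b", "c"])
router.record("a", ok=True, latency_ms=20.0)
router.record("b", ok=True, latency_ms=35.0)
router.record("c", ok=True, latency_ms=50.0)
fastest = router.pick()

for _ in range(3):  # "a" starts failing; after three failures it is ejected
    router.record("a", ok=False, latency_ms=0.0)
after_ejection = router.pick()
```

Swapping the latency key for a "same AZ first" preference would give the cost-saving variant; the routing change happens in the proxy, not in the services.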
C) Testing – Chaos Engineering
Service mesh can aid chaos engineering
- Error simulation
It can help simulate errors such as errors on a specific host, x% of all requests failing, or errors for a particular customer or region
- Automate chaos experiments
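A proxy-side fault injector covering two of the cases above (errors on a specific host, and x% of all requests failing) could be sketched like this. `FaultInjector` and its parameters are hypothetical names of my own, not the API of any particular mesh.

```python
import random
from typing import Iterable, Optional


class FaultInjector:
    """Toy fault injection: fail targeted hosts always, others x% of the time."""

    def __init__(
        self,
        failure_percent: float = 0.0,
        target_hosts: Iterable[str] = (),
        seed: Optional[int] = None,
    ) -> None:
        self.failure_percent = failure_percent
        self.target_hosts = set(target_hosts)
        self._rng = random.Random(seed)  # seeded so experiments are repeatable

    def should_fail(self, host: str) -> bool:
        """Decide whether the proxy should return an injected error."""
        if host in self.target_hosts:
            return True
        # random() is in [0, 1), so 0% never fails and 100% always fails.
        return self._rng.random() * 100 < self.failure_percent


injector = FaultInjector(failure_percent=10, target_hosts=["host-7"], seed=42)
targeted = injector.should_fail("host-7")
injected = sum(injector.should_fail("host-1") for _ in range(10_000))
```

Since the decision is made in the proxy, the services under test need no chaos-specific code, and the experiment can be switched on and off through configuration.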
D) Service mesh and monitoring
Service mesh will
1. Increase breadth of coverage
- Fix spotty adoption through auto-instrumentation of ALL communication
- Unified vendor-agnostic target for all telemetry (Metrics, Logs, and traces)
2. Establish a high bar on quality
And consolidate monitoring
3. Provide feedback