Akshat's Blog

I share my perception of Life, the Universe and Everything here.

Load Balancing in gRPC for Frontend Backend Communication Part:1


We needed a strategy for load balancing incoming gRPC requests from our user-facing applications (Android and iOS). The reasons for choosing a strategy are covered here; implementation details can be found in Part 2.

Need for gRPC

I work at Go-Jek as a product engineer. We have multiple use cases where gRPC makes more sense than a traditional REST API for frontend-backend communication: a customer application live-tracking a driver’s location, recording a driver’s location, chat sessions, etc. The feature we are currently building is small and could even have been implemented with a REST API, but we decided to use it as an opportunity to explore gRPC.1
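To make the streaming use cases concrete, here is a hypothetical protobuf service definition for driver-location tracking. All names here are illustrative inventions for this sketch, not Go-Jek's actual API:

```protobuf
// Hypothetical service sketch; message and service names are illustrative.
syntax = "proto3";

package tracking;

message Location {
  string driver_id = 1;
  double latitude = 2;
  double longitude = 3;
  int64 timestamp_ms = 4;
}

message Ack {
  bool ok = 1;
}

service DriverTracking {
  // Driver app streams location updates to the backend.
  rpc RecordLocation(stream Location) returns (Ack);
  // Customer app subscribes to a driver's live location.
  rpc TrackDriver(Location) returns (stream Location);
}
```

Long-lived streams like these are exactly where gRPC's HTTP/2 transport pays off over request-per-update REST polling.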

Choosing a load balancing (LB) strategy

I started with the official gRPC load balancing blog post.

First decision

The first decision was choosing between proxy and client-side LB.

In proxy LB, the clients do not know about the backend servers, so the clients can be untrusted. This architecture is typically used for user-facing services, where clients from the open internet connect to servers in a data center. Clients make requests to the LB, the LB forwards each request to one of the backends, and the backends report load back to the LB.

In client-side LB, the client knows about multiple backend servers and chooses one for each RPC. The client receives load reports from the backend servers and implements the LB algorithm itself. In simpler configurations server load is ignored and the client just round-robins between the available servers.
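In that simplest configuration, the client-side picker is nothing more than a thread-safe round-robin over a static server list. A minimal sketch (the addresses are placeholders):

```python
import itertools
import threading

class RoundRobinPicker:
    """Minimal client-side round-robin over a static list of backends."""

    def __init__(self, addresses):
        self._cycle = itertools.cycle(addresses)
        self._lock = threading.Lock()

    def pick(self):
        # Each outgoing RPC asks the picker for the next backend address.
        with self._lock:
            return next(self._cycle)

picker = RoundRobinPicker(["10.0.0.1:50051", "10.0.0.2:50051"])
print(picker.pick())  # 10.0.0.1:50051
print(picker.pick())  # 10.0.0.2:50051
print(picker.pick())  # wraps back to 10.0.0.1:50051
```

Even this toy version hints at the first con below: it has to be reimplemented in every client language (Kotlin, Swift, ...), and it still needs service discovery to populate the address list.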

This was an easy choice: I chose proxy LB.

Here are the cons of client-side LB:

  • Have to implement same logic in multiple languages
  • Client cannot be trusted
  • Client has to take care of service discovery

For the sake of balance, here are the cons of proxy LB:

  • LB is in the data path
  • Higher latency
  • LB throughput may limit scalability

Sidenote: client-side lookaside LB may seem promising, but the client would still carry a lot of logic that could easily be moved into the LB.

Second decision

Now comes a much harder choice: transport-level (L4) LB or application-level (L7) LB.

I chose transport-level LB because:

  • RPC load doesn’t vary much across connections (at least for the current feature)
  • Latency is paramount (although for frontend-backend communication, a difference of a few milliseconds at the LB level is unlikely to matter)
  • Application-level LB means a far more complex code base

Some cons of application-level LB are:

  • Difficult to scale
  • Requires very high-performance hardware (with the maturity of infrastructure at Go-Jek this is not really an issue, but is omttwa😅)

One major reason for choosing transport-level LB is that application-level LB would have been overkill for us. For the current feature we don’t need session stickiness, which is one of the major reasons people choose application-level LB.
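Transport-level LB in front of gRPC backends amounts to a plain TCP proxy that round-robins HTTP/2 connections. A minimal sketch of what that looks like in Envoy (field names follow Envoy's v3 API and may differ in older versions; the addresses and names are placeholders):

```yaml
# Illustrative Envoy config: L4 (transport-level) round-robin over gRPC
# backends. Addresses and cluster names are placeholders for this sketch.
static_resources:
  listeners:
  - name: grpc_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 50051 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: grpc_tcp
          cluster: grpc_backends
  clusters:
  - name: grpc_backends
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: grpc_backends
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: backend.internal, port_value: 50051 }
```

Note the trade-off baked into this sketch: balancing happens per connection, so all RPCs multiplexed on one long-lived HTTP/2 connection land on the same backend, which is fine only when per-connection load is roughly even, as in our case.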

These blogs helped me weigh the pros and cons of transport-level LB vs application-level LB.2

Third decision

The largest amount of yak-shaving went into this decision. I knew in my heart all along that I’d use envoyproxy as the LB, but to be thorough I went through a whole lot of blogs3 just to be completely sure.

I won’t go into the reasons for not choosing HAProxy or NGINX. Envoy wins hands-down because it fits the following use cases we needed:

  • Our frontend application can switch to HTTP/1.1-based REST APIs any time it wants😍 Envoy’s gRPC bridge filter allows gRPC requests to be sent to Envoy over HTTP/1.1; Envoy then translates them to HTTP/2 for transport to the target server, and translates the response back to HTTP/1.14
  • If we decide to switch to application-level LB, Envoy is the only one of these that supports it for HTTP/2.
  • Future of Envoy is promising5
  • Has first class support for HTTP/2
  • Hot restarts
  • Multithreaded architecture
  • Service discovery (Go-Jek is rapidly moving towards an architecture where a service mesh is inevitable)
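The gRPC bridge from the first bullet can also be sketched in config. The outline below shows the shape of it: an HTTP/1.1-facing listener with the `grpc_http1_bridge` HTTP filter, routing to an upstream cluster that speaks HTTP/2. Field names follow Envoy's v3 API and may differ across versions; addresses and names are placeholders, not our actual setup:

```yaml
# Illustrative sketch: Envoy accepts gRPC-framed requests over HTTP/1.1
# and forwards them to the backend over HTTP/2. Names are placeholders.
static_resources:
  listeners:
  - name: bridge_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: grpc_bridge
          http_filters:
          - name: envoy.filters.http.grpc_http1_bridge
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_http1_bridge.v3.Config
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: grpc_backend }
  clusters:
  - name: grpc_backend
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    # Speak HTTP/2 to the upstream gRPC server.
    http2_protocol_options: {}
    load_assignment:
      cluster_name: grpc_backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: backend.internal, port_value: 50051 }
```

This is what gives us the escape hatch: the mobile apps can talk plain HTTP/1.1 to Envoy while the backends stay gRPC-only.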

In Part 2, I will walk through an LB implementation using envoyproxy. The WIP can be found here: github link

I hope this blog was helpful 😀. Please leave your thoughts in the comments section below. You can reach out to me on email: akshat.iitj⚽gmail.com