Health checking gRPC servers on Kubernetes
Author: Ahmet Alp Balkan (Google)
Update (December 2021): Kubernetes now has built-in gRPC health probes starting in v1.23. To learn more, see Configure Liveness, Readiness and Startup Probes. This article was originally written about an external tool to achieve the same task.
gRPC is on its way to becoming the lingua franca for communication between cloud-native microservices. If you are deploying gRPC applications to Kubernetes today, you may be wondering about the best way to configure health checks. In this article, we will talk about grpc-health-probe, a Kubernetes-native way to health check gRPC apps.
If you're unfamiliar, Kubernetes health checks (liveness and readiness probes) are what keep your applications available while you're sleeping. They detect unresponsive pods and mark them unhealthy: a failing liveness probe causes the container to be restarted, and a failing readiness probe takes the pod out of Service traffic until it recovers.
Kubernetes versions before v1.23 do not support gRPC health checks natively. This leaves gRPC developers with the following three approaches when they deploy to Kubernetes:
- httpGet probe: Cannot be natively used with gRPC. You need to refactor your app to serve both gRPC and HTTP/1.1 protocols (on different port numbers).
- tcpSocket probe: Opening a socket to a gRPC server is not meaningful, since the probe only checks that the port accepts connections and never reads a response from the application.
- exec probe: This periodically invokes a program inside the container. In the case of gRPC, this means you implement a health RPC yourself, then write and ship a client tool with your container.
Can we do better? Absolutely.
Introducing “grpc-health-probe”
To standardize the "exec probe" approach mentioned above, we need:
- a standard health check "protocol" that can be implemented in any gRPC server easily.
- a standard health check "tool" that can query the health protocol easily.
Thankfully, gRPC has a standard health checking protocol. It can be used easily from any language. Generated code and the utilities for setting the health status are shipped in nearly all language implementations of gRPC.
If you implement this health check protocol in your gRPC apps, you can then use a standard/common tool to invoke this Check() method to determine server status.
The next thing you need is the "standard tool", and that is grpc-health-probe.
With this tool, you can use the same health check configuration in all your gRPC applications. This approach requires you to:
- Find the gRPC "health" module in your favorite language and start using it (example Go library).
- Ship the grpc_health_probe binary in your container.
- Configure a Kubernetes "exec" probe to invoke the "grpc_health_probe" tool in the container.
In this case, executing "grpc_health_probe" will call your gRPC server over localhost, since they are in the same pod.
What's next
The grpc-health-probe project is still in its early days and needs your feedback. It already supports features such as communicating with TLS servers and configurable connection/RPC timeouts.
If you are running a gRPC server on Kubernetes today, try implementing the gRPC Health Checking Protocol, use grpc-health-probe in your deployments, and share your feedback.
Further reading
- Protocol: gRPC Health Checking Protocol (health.proto)
- Documentation: Kubernetes liveness and readiness probes
- Article: Advanced Kubernetes Health Check Patterns