
Kuberhealthy is an operator for synthetic monitoring and continuous validation. It ships metrics to Prometheus and enables you to package your synthetic monitoring as Kubernetes manifests.
Kuberhealthy schedules, tracks, monitors, and manages the Kubernetes pods that run your checks. The diagram below shows how the pieces connect:
```mermaid
graph TB
    prometheus["Prometheus /metrics"]
    browser["Web Interface"]
    subgraph cluster["Kubernetes Cluster"]
        svc["Kuberhealthy Service :80"]
        crds["HealthCheck CRDs"]
        controller["Kuberhealthy Controller"]
        pod["Check Pod\n(api-smoke-test)"]
        svc --> controller
        controller -->|watches| crds
        controller -->|schedules| pod
        pod -->|"POST /check"| controller
    end
    prometheus --> svc
    browser --> svc
```
Kuberhealthy provides the HealthCheck custom resource definition. Each HealthCheck tells Kuberhealthy to start a short-lived checker pod on a schedule.
The checker pod runs your validation logic, then reports back to Kuberhealthy. Results flow to the built-in status UI, JSON API, and Prometheus metrics endpoint.
To install Kuberhealthy, apply the Helm, Kustomize, or ArgoCD manifests. Once the controller is running, you can start applying HealthCheck resources.
Helm (recommended)

```sh
helm install kuberhealthy deploy/helm/kuberhealthy -n kuberhealthy --create-namespace
```

Kustomize

```sh
kubectl apply -k github.com/kuberhealthy/kuberhealthy/deploy/kustomize/base?ref=main
```

ArgoCD

```sh
kubectl apply -f deploy/argocd/kuberhealthy.yaml
```
Port-forward the service:

```sh
kubectl -n kuberhealthy port-forward svc/kuberhealthy 8080:80
```

Open http://localhost:8080 to see the status UI, then apply a HealthCheck or build your own.

What a HealthCheck CRD looks like

This is the core object Kuberhealthy manages. It tells the controller which checker container to run, how often to run it, and how long to wait before considering the run failed. You can list HealthChecks with kubectl get healthcheck or the short form kubectl get hc.
This example uses the built-in deployment check, which creates a test deployment, rolls it out, and tears it down on a schedule:
```yaml
apiVersion: kuberhealthy.github.io/v2
kind: HealthCheck
metadata:
  name: deployment
  namespace: kuberhealthy
spec:
  runInterval: 10m
  timeout: 5m
  podSpec:
    spec:
      containers:
        - name: deployment
          image: docker.io/kuberhealthy/deployment-check:v0.1.1
          env:
            - name: CHECK_DEPLOYMENT_REPLICAS
              value: "4"
            - name: CHECK_DEPLOYMENT_ROLLING_UPDATE
              value: "true"
          resources:
            requests:
              cpu: 25m
              memory: 15Mi
            limits:
              cpu: "1"
      serviceAccountName: deployment-sa
```
See CHECK_CREATION.md for the environment variables injected into every check pod and how to build your own.
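To make the runInterval and timeout fields from the example HealthCheck concrete, here is a minimal Go sketch that parses the two values and computes when a run would be considered failed. It assumes the values are Go-style duration strings (which "10m" and "5m" are) and that the next run is scheduled one interval after the previous start. Both are assumptions for illustration, not a description of the controller's actual scheduler. The names schedule, parseSchedule, and deadline are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// schedule holds the two timing fields from the HealthCheck spec.
// This type is hypothetical and exists only for this illustration.
type schedule struct {
	RunInterval time.Duration
	Timeout     time.Duration
}

// parseSchedule converts the manifest's string values into durations.
// Assumption: the values use Go duration syntax, as "10m" and "5m" do.
func parseSchedule(runInterval, timeout string) (schedule, error) {
	ri, err := time.ParseDuration(runInterval)
	if err != nil {
		return schedule{}, fmt.Errorf("invalid runInterval %q: %w", runInterval, err)
	}
	to, err := time.ParseDuration(timeout)
	if err != nil {
		return schedule{}, fmt.Errorf("invalid timeout %q: %w", timeout, err)
	}
	return schedule{RunInterval: ri, Timeout: to}, nil
}

// deadline returns the moment a run that began at start would time out.
func (s schedule) deadline(start time.Time) time.Time {
	return start.Add(s.Timeout)
}

func main() {
	s, err := parseSchedule("10m", "5m")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	start := time.Date(2024, 1, 15, 14, 30, 0, 0, time.UTC)
	fmt.Println("run must finish by:", s.deadline(start).Format(time.RFC3339))
	// Assumption: the following run starts one interval after this one.
	fmt.Println("next run starts at:", start.Add(s.RunInterval).Format(time.RFC3339))
}
```

With the values from the manifest, a run that starts at 14:30 has until 14:35 to report, and under the stated assumption the next run begins at 14:40.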
Apply your HealthCheck:

```
> kubectl apply -f api-smoke-test.yaml
healthcheck.kuberhealthy.github.io/api-smoke-test created

> kubectl get healthcheck
NAME             NAMESPACE      LAST RUN   AGE   OK
api-smoke-test   kuberhealthy   2m ago     2d    true
```
Consume check status with Prometheus:

/metrics (Prometheus)

```
kuberhealthy_check{check="api-smoke-test",namespace="kuberhealthy",status="1"} 1
kuberhealthy_check{check="db-connectivity",namespace="kuberhealthy",status="0"} 1
kuberhealthy_check_duration_seconds{check="api-smoke-test",namespace="kuberhealthy"} 0.23
kuberhealthy_check_success_total{check="api-smoke-test",namespace="kuberhealthy"} 142
```
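As a rough illustration of reading these samples outside Prometheus, the Go sketch below scans metrics text and lists the checks whose status label is "0". It assumes that status="0" marks a failing check and that labels appear in exactly the order shown in the sample. Both are inferences from the sample output, and the function name failingChecks is hypothetical. A real consumer would normally let Prometheus scrape the endpoint and query the series there.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// checkLine matches kuberhealthy_check samples in the label order shown in
// the sample output. Assumption: the label order is stable.
var checkLine = regexp.MustCompile(
	`^kuberhealthy_check\{check="([^"]+)",namespace="([^"]+)",status="([01])"\} 1$`)

// failingChecks returns "namespace/check" for every sample whose status
// label is "0". Assumption: status="0" means the check is failing.
func failingChecks(metrics string) []string {
	var failing []string
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		m := checkLine.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m != nil && m[3] == "0" {
			failing = append(failing, m[2]+"/"+m[1])
		}
	}
	return failing
}

func main() {
	sample := `kuberhealthy_check{check="api-smoke-test",namespace="kuberhealthy",status="1"} 1
kuberhealthy_check{check="db-connectivity",namespace="kuberhealthy",status="0"} 1
kuberhealthy_check_duration_seconds{check="api-smoke-test",namespace="kuberhealthy"} 0.23`
	fmt.Println(failingChecks(sample))
}
```

Lines for other metric families, such as kuberhealthy_check_duration_seconds, do not match the pattern and are skipped.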
Fetch the status by API:

/json

```json
{
  "ok": true,
  "checks": {
    "kuberhealthy/api-smoke-test": {
      "ok": true,
      "errors": [],
      "lastRun": "2024-01-15T14:32:01Z",
      "runDuration": "230ms"
    }
  }
}
```
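A consumer of the JSON status could decode it along these lines. This is a minimal Go sketch that mirrors only the fields visible in the sample response (ok, checks, errors, lastRun, runDuration); the struct and function names are hypothetical, the real payload may carry more fields, and the failing db-connectivity entry in main is made-up input for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// checkStatus mirrors one entry under "checks" in the sample response.
type checkStatus struct {
	OK          bool     `json:"ok"`
	Errors      []string `json:"errors"`
	LastRun     string   `json:"lastRun"`
	RunDuration string   `json:"runDuration"`
}

// status mirrors the top-level object in the sample response.
type status struct {
	OK     bool                   `json:"ok"`
	Checks map[string]checkStatus `json:"checks"`
}

// failing decodes a status document and returns the sorted names of all
// checks whose "ok" field is false.
func failing(body []byte) ([]string, error) {
	var s status
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, fmt.Errorf("decoding status: %w", err)
	}
	var names []string
	for name, c := range s.Checks {
		if !c.OK {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return names, nil
}

func main() {
	// Hypothetical input: the second check and its error text are invented.
	body := []byte(`{"ok": false, "checks": {
		"kuberhealthy/api-smoke-test": {"ok": true, "errors": []},
		"kuberhealthy/db-connectivity": {"ok": false, "errors": ["dial timeout"]}
	}}`)
	names, err := failing(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

In a cluster, the body would come from an HTTP GET against the /json path of the Kuberhealthy service, for example through the port-forward on localhost:8080.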
Get started with CHECK_CREATION.md and the HealthCheck registry, then pick a check client for your language:
| Language | Client |
|---|---|
| Go | github.com/kuberhealthy/kuberhealthy/v3/pkg/checkclient |
| Python | kuberhealthy |
| TypeScript | @kuberhealthy/kuberhealthy |
| JavaScript | @kuberhealthy/kuberhealthy |
| Rust | kuberhealthy |
| Ruby | kuberhealthy |
| Java | Maven / Gradle |
| Bash | Shell script helper |
Example: Go check that validates an internal API

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/kuberhealthy/kuberhealthy/v3/pkg/checkclient"
)

func main() {
	// Call the internal API's health endpoint.
	resp, err := http.Get("http://my-api.default.svc.cluster.local/health")
	if err != nil {
		// The request itself failed (for example a DNS or connection error).
		checkclient.ReportFailure([]string{fmt.Sprintf("request failed: %s", err)})
		return
	}
	defer resp.Body.Close()

	// Any status other than 200 OK counts as a failed check.
	if resp.StatusCode != http.StatusOK {
		checkclient.ReportFailure([]string{fmt.Sprintf("unexpected status %d from /health", resp.StatusCode)})
		return
	}

	checkclient.ReportSuccess()
}
```
The check client handles KH_REPORTING_URL, KH_RUN_UUID, and deadline enforcement automatically.
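For a sense of what a check client does on your behalf, the sketch below reports a result by hand using only the Go standard library. It reads KH_REPORTING_URL and KH_RUN_UUID from the environment, as named above. The JSON payload shape (OK and Errors fields) and the kh-run-uuid header name are assumptions made for illustration, as are the names report, buildReport, and send. Consult CHECK_CREATION.md for the actual contract, and prefer the check client in real checks.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// report is an ASSUMED payload shape, not confirmed by this README.
type report struct {
	OK     bool     `json:"OK"`
	Errors []string `json:"Errors"`
}

// buildReport encodes a result; a run with no errors is reported as OK.
func buildReport(errs []string) ([]byte, error) {
	if errs == nil {
		errs = []string{} // encode as [] rather than null
	}
	return json.Marshal(report{OK: len(errs) == 0, Errors: errs})
}

// send posts the payload to the reporting URL and tags it with the run UUID.
func send(url, runUUID string, payload []byte) error {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("kh-run-uuid", runUUID) // ASSUMED header name

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("report rejected: %s", resp.Status)
	}
	return nil
}

func main() {
	payload, err := buildReport(nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	url, runUUID := os.Getenv("KH_REPORTING_URL"), os.Getenv("KH_RUN_UUID")
	if url == "" {
		// Outside a check pod there is nowhere to report to.
		fmt.Println("KH_REPORTING_URL not set; would send:", string(payload))
		return
	}
	if err := send(url, runUUID, payload); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

This sketch does not enforce the run deadline, which is one reason to use the check client instead.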
See the full documentation index in docs/README.md.
See docs/ADOPTERS.md for organizations running Kuberhealthy in production.
To contribute, look for issues with the good first issue tag, or join #kuberhealthy in the Kubernetes Slack workspace.