github.com/asobti/kube-monkey @v0.5.3 sqlite

276 symbols · 1,048 edges · 36 files · 62 documented (22%)
README


kube-monkey is an implementation of Netflix's Chaos Monkey for Kubernetes clusters. It randomly deletes Kubernetes (k8s) pods in the cluster, encouraging and validating the development of failure-resilient services.

Join us at #kube-monkey on Kubernetes Slack.


kube-monkey runs at a preconfigured hour (run_hour, which defaults to 8 am) on weekdays and builds a schedule of deployments that will face a random Pod death sometime during the same day. The time range during which the random Pod death might occur is configurable and defaults to 10 am to 4 pm.

kube-monkey can be configured with a list of namespaces to blacklist (any deployments within a blacklisted namespace will not be touched).

To disable the blacklist, provide [""] in the blacklisted_namespaces config.param.
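To make the blacklist rule concrete, here is a minimal Go sketch of the namespace check. It assumes, based only on the sentence above, that a blacklist consisting solely of [""] switches the check off; isBlacklisted and blacklistEnabled are hypothetical names, not kube-monkey's API.

```go
package main

import "fmt"

// blacklistEnabled reports whether the configured blacklist is active.
// Assumption: a blacklist of exactly [""] means "disabled", per the README.
func blacklistEnabled(blacklist []string) bool {
	return !(len(blacklist) == 1 && blacklist[0] == "")
}

// isBlacklisted reports whether deployments in namespace must be left untouched.
func isBlacklisted(namespace string, blacklist []string) bool {
	if !blacklistEnabled(blacklist) {
		return false
	}
	for _, ns := range blacklist {
		if ns == namespace {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBlacklisted("kube-system", []string{"kube-system"}))   // true
	fmt.Println(isBlacklisted("kube-system", []string{""}))              // false: blacklist disabled
	fmt.Println(isBlacklisted("app-namespace", []string{"kube-system"})) // false
}
```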

Opting-In to Chaos

kube-monkey works on an opt-in model and will only schedule terminations for Kubernetes (k8s) apps that have explicitly agreed to have their pods terminated by kube-monkey.

Opt-in is done by setting the following labels on a k8s app:

kube-monkey/enabled: Set to "enabled" to opt in to kube-monkey.
kube-monkey/mtbf: Mean time between failures (in days). For example, if set to "3", the k8s app can expect to have a Pod killed approximately every third weekday.
kube-monkey/identifier: A unique identifier for the k8s app. This is used to identify the pods that belong to a k8s app, as Pods inherit labels from their k8s app. So, if kube-monkey detects that app foo has enrolled to be a victim, kube-monkey will look for all pods that have the label kube-monkey/identifier: foo to determine which pods are candidates for killing. The recommendation is to set this value to be the same as the app's name.
kube-monkey/kill-mode: The default behavior is for kube-monkey to kill only ONE pod of your app. You can override this behavior by setting the value to:
  * kill-all if you want kube-monkey to kill ALL of your pods regardless of status (including not ready and not running pods). Does not require kill-value. Use this label carefully.
  * fixed if you want to kill a specific number of running pods, given by kill-value. If you overspecify, it will kill all running pods and issue a warning.
  * random-max-percent to specify, with kill-value, the maximum % of running pods that can be killed. At the scheduled time, a uniformly random % of the running pods, up to that maximum, will be terminated.
  * fixed-percent to specify, with kill-value, a fixed % of running pods to kill. At the scheduled time, that fixed % of the running pods will be terminated.

kube-monkey/kill-value: Specify the value for kill-mode:
  * if fixed, provide the integer number of pods to kill
  * if random-max-percent, provide a number from 0-100 to specify the max % of pods kube-monkey can kill
  * if fixed-percent, provide a number from 0-100 to specify the % of pods to kill

Example of opted-in Deployment killing one pod per purge

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
        kube-monkey/mtbf: '2'
        kube-monkey/kill-mode: "fixed"
        kube-monkey/kill-value: '1'
[... omitted ...]

For newer versions of Kubernetes, you may need to add the labels to the k8s app's metadata as well.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
  labels:
    kube-monkey/enabled: enabled
    kube-monkey/identifier: monkey-victim
    kube-monkey/mtbf: '2'
    kube-monkey/kill-mode: "fixed"
    kube-monkey/kill-value: '1'
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
[... omitted ...]
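Because Kubernetes label values are always strings (hence the quoted '2' and '1' in the manifests above), anything consuming kube-monkey/mtbf has to parse it first. The sketch below shows one way to validate the opt-in labels and derive the pod selector built from kube-monkey/identifier; parseEnrollment and enrollment are hypothetical names, not kube-monkey code.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label keys as documented in the README.
const (
	labelEnabled    = "kube-monkey/enabled"
	labelIdentifier = "kube-monkey/identifier"
	labelMTBF       = "kube-monkey/mtbf"
)

// enrollment summarizes an app's opt-in labels (hypothetical type).
type enrollment struct {
	Identifier string
	MTBF       int
}

// parseEnrollment validates the opt-in labels of a k8s app.
func parseEnrollment(labels map[string]string) (enrollment, error) {
	if labels[labelEnabled] != "enabled" {
		return enrollment{}, fmt.Errorf("app has not opted in (%s=%q)", labelEnabled, labels[labelEnabled])
	}
	id := labels[labelIdentifier]
	if id == "" {
		return enrollment{}, fmt.Errorf("missing %s label", labelIdentifier)
	}
	// Label values are strings, so the mean time between failures must be parsed.
	mtbf, err := strconv.Atoi(labels[labelMTBF])
	if err != nil || mtbf < 1 {
		return enrollment{}, fmt.Errorf("invalid %s label %q", labelMTBF, labels[labelMTBF])
	}
	return enrollment{Identifier: id, MTBF: mtbf}, nil
}

// podSelector is the label selector matching the app's pods.
func (e enrollment) podSelector() string {
	return labelIdentifier + "=" + e.Identifier
}

func main() {
	e, err := parseEnrollment(map[string]string{
		"kube-monkey/enabled":    "enabled",
		"kube-monkey/identifier": "monkey-victim",
		"kube-monkey/mtbf":       "2",
	})
	if err != nil {
		fmt.Println("not eligible:", err)
		return
	}
	fmt.Println(e.podSelector(), e.MTBF) // kube-monkey/identifier=monkey-victim 2
}
```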

Overriding the apiserver

Use cases:

  • Since client-go does not explicitly support cluster DNS (the code carries a "// TODO: switch to using cluster DNS." note), you may need to override the apiserver.
  • If you are running an unauthenticated system, you may need to force the http apiserver endpoint.

To override the apiserver, specify it in the config.toml file:

[kubernetes]
host="https://your-apiserver-url.com:apiport"

How kube-monkey works

Scheduling time

Scheduling happens once a day on weekdays; this is when the termination schedule for the current day is generated. During scheduling, kube-monkey will:
1. Generate a list of eligible k8s apps (k8s apps that have opted in, that are not blacklisted if a blacklist is specified, and that are whitelisted if a whitelist is specified).
2. For each eligible k8s app, flip a biased coin (bias determined by kube-monkey/mtbf) to determine whether a pod for that k8s app should be killed today.
3. For each victim, calculate a random time when a pod will be killed.

Termination time

This is the randomly generated time during the day when a victim k8s app will have a pod killed. At termination time, kube-monkey will:
1. Check whether the k8s app is still eligible (it has not opted out, been blacklisted, or been removed from the whitelist since scheduling).
2. Check whether the k8s app has updated its kill-mode and kill-value.
3. Kill pods according to kill-mode and kill-value.

Docker Images

Docker images for kube-monkey can be found on Docker Hub.

Building

Clone the repository and build the container.

git clone https://github.com/asobti/kube-monkey.git
cd kube-monkey
make build
make container

Configuring

kube-monkey is configured through environment variables or a TOML file placed at /etc/kube-monkey/config.toml, and it expects the configmap to exist before the kube-monkey deployment is created.

Configuration keys and descriptions can be found in config/param/param.go

Example config.toml file

[kubemonkey]
dry_run = true                           # Terminations are only logged
run_hour = 8                             # Run scheduling at 8am on weekdays
start_hour = 10                          # Don't schedule any pod deaths before 10am
end_hour = 16                            # Don't schedule any pod deaths after 4pm
blacklisted_namespaces = ["kube-system"] # Critical apps live here
time_zone = "America/New_York"           # Set tzdata timezone example. Note the field is time_zone not timezone

Example environment variables

KUBEMONKEY_DRY_RUN=true
KUBEMONKEY_RUN_HOUR=8
KUBEMONKEY_START_HOUR=10
KUBEMONKEY_END_HOUR=16
KUBEMONKEY_BLACKLISTED_NAMESPACES=kube-system
KUBEMONKEY_TIME_ZONE=America/New_York
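The environment variable names in this example appear to follow a simple rule: join the TOML section and key with an underscore and upper-case the result. The sketch below encodes that rule; it is inferred from the example pairs only, so treat it as an assumption for keys outside the [kubemonkey] section. envVarFor is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarFor maps a "section.key" TOML path to an environment variable name
// by replacing "." with "_" and upper-casing (rule inferred from the examples).
func envVarFor(tomlKey string) string {
	return strings.ToUpper(strings.ReplaceAll(tomlKey, ".", "_"))
}

func main() {
	for _, k := range []string{
		"kubemonkey.dry_run",
		"kubemonkey.run_hour",
		"kubemonkey.start_hour",
		"kubemonkey.end_hour",
		"kubemonkey.blacklisted_namespaces",
		"kubemonkey.time_zone",
	} {
		fmt.Printf("%-36s -> %s\n", k, envVarFor(k))
	}
}
```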

Example Config to test kube-monkey works by enabling debug mode

Note: this will keep attacking pods every 60s regardless of what you configured for start_hour and end_hour.

[debug]
enabled= true
schedule_immediate_kill= true

Notifications

Kube-monkey supports notifications and can notify an endpoint of your choice after an attack. The endpoint can be a Slack webhook or a custom API.

Example Config for posting attack notifications to an HTTP endpoint

[notifications]
  enabled = true
  reportSchedule = true
  [notifications.attacks]
    endpoint = "http://url1"
    message = "message1"
    headers = ["header1Key:header1Value","header2Key:header2/Value"]
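Each headers entry packs a header name and value into one string. Here is a minimal sketch of turning such entries into a Go http.Header, assuming each entry is split at its first colon so that values like header2/Value, or values that themselves contain colons, survive intact. parseHeaders is a hypothetical helper, not kube-monkey's parser.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseHeaders converts "Key:Value" entries into an http.Header,
// splitting each entry at its first colon only.
func parseHeaders(entries []string) (http.Header, error) {
	h := http.Header{}
	for _, e := range entries {
		k, v, ok := strings.Cut(e, ":")
		if !ok || strings.TrimSpace(k) == "" {
			return nil, fmt.Errorf("malformed header entry %q", e)
		}
		// Add canonicalizes the key, e.g. "header1Key" becomes "Header1key".
		h.Add(strings.TrimSpace(k), strings.TrimSpace(v))
	}
	return h, nil
}

func main() {
	h, err := parseHeaders([]string{"header1Key:header1Value", "header2Key:header2/Value"})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(h.Get("header2Key")) // header2/Value
}
```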

Placeholders

The message supports the following placeholders:
* {$name}: victim's name
* {$kind}: victim's kind
* {$namespace}: victim's namespace
* {$timestamp}: attack's time from Unix epoch in milliseconds
* {$time}: attack's time
* {$date}: attack's date
* {$error}: result's error, if any
* {$kubemonkeyid}: kube-monkey id (set using the KUBE_MONKEY_ID env variable, otherwise empty)

  message: '{
            "what": "Kube-monkey({$kubemonkeyid}) attack of {$name} in {$namespace}",
            "who": "{$name}",
            "when": {$timestamp}
           }'

Headers support a special placeholder that retrieves the value of an environment variable. This is useful when calling an API with a protected endpoint. A typical scenario is passing an API token to the Kube-monkey container: the token is stored in a Kubernetes Secret and passed to the container via an environment variable.

headers = ["api-key:{$env:API_TOKEN}", "Content-Type:application/json"]

{$env:API_TOKEN} will be replaced by the environment variable API_TOKEN value.

Note: if the environment variable does not exist, the notification call is NOT cancelled. The value resolves to an empty string, and a warning shows up in the logs.

Deploying

Manually:

1. First, deploy the expected kube-monkey-config-map configmap in the namespace you intend to run kube-monkey in (for example, the kube-system namespace). Make sure to define the key name as config.toml.

For example: kubectl create configmap km-config --from-file=config.toml=km-config.toml or kubectl apply -f km-config.yaml

2. Run kube-monkey as a k8s app within the Kubernetes cluster, in a namespace that has permissions to kill Pods in other namespaces (e.g. kube-system).

See dir examples/ for example Kubernetes yaml files.

3. You should be able to see debug logs with kubectl logs -f deployment.apps/kube-monkey --namespace=kube-system, where deployment.apps/kube-monkey is the k8s deployment for kube-monkey.

Helm Chart

See How to install kube-monkey with Helm.

Logging

kube-monkey uses glog and supports all of glog's command-line features. To specify a custom v level or a custom log directory on the pod, see args: ["-v=5", "-log_dir=/path/to/custom/log"] in the example deployment file.

Standardized glog levels (list their uses with grep -r V\([0-9]\) *):

L0: None
L1: Highest-level current status info and errors with terminations
L2: Successful terminations
L3: More detailed schedule status info
L4: Verbose debugging of schedule and config info
L5: Auto-resolved inconsequential issues

More resources: See the k8s logging page suggesting community conventions for logging severity

Instructions on how to get this working on OpenShift 3.x

git clone https://github.com/asobti/kube-monkey.git
cd examples
oc login http://someserver/ -u system:admin
oc project kube-system
oc create -f configmap.yaml
oc -n kube-system adm policy add-role-to-user -z deployer system:deployer
oc -n kube-system adm policy add-role-to-user -z builder system:image-builder
oc -n kube-system adm policy add-role-to-group system:image-puller system:serviceaccounts:kube-system
oc run kube-monkey --image=docker.io/ayushsobti/kube-monkey:v0.4.0 --command -- /kube-monkey -v=5 -log_dir=/var/log/kube-monkey
oc volume dc/kube-monkey --add --name=kubeconfigmap -m /etc/kube-monkey -t configmap --configmap-name=kube-monkey-config-map

OpenShift 4.x

git clone https://github.com/asobti/kube-monkey.git
cd examples
oc login http://someserver/ -u system:admin
oc project kube-system
oc create -f configmap.yaml
oc -n kube-system adm policy add-cluster-role-to-user edit -z default --rolebinding-name kube-monkey-edit
oc run kube-monkey --image=docker.io/ayushsobti/kube-monkey:v0.3.0 --command -- /kube-monkey -v=5 -log_dir=/var/log/kube-monkey
oc set volume dc/kube-monkey --add --name=kubeconfigmap -m /etc/kube-monkey -t configmap --configmap-name=kube-monkey-config-map

Ways to contribute

See How to Contribute

License

This project is licensed under the Apache License v2.0 - see the LICENSE file for details.

Extension points: exported contracts (how you extend this code)

VictimSpecificAPICalls (interface, no doc, 4 implementers): internal/pkg/victims/victims.go
VictimKillNumberGenerator (interface, no doc, 2 implementers): internal/pkg/victims/victims.go
VictimBaseTemplate (interface, no doc, 1 implementer): internal/pkg/victims/victims.go
VictimAPICalls (interface, no doc, 1 implementer): internal/pkg/victims/victims.go
Victim (interface, no doc): internal/pkg/victims/victims.go

Core symbols: most depended-on inside this repo

Name · called by 43 · internal/pkg/victims/victims.go
Victim · called by 39 · internal/pkg/chaos/chaos.go
Kind · called by 34 · internal/pkg/victims/victims.go
Error · called by 25 · internal/pkg/chaos/chaosresult.go
Namespace · called by 16 · internal/pkg/victims/victims.go
Add · called by 15 · internal/pkg/schedule/schedule.go
ReplacePlaceholders · called by 10 · internal/pkg/notifications/util.go
KillValue · called by 10 · internal/pkg/victims/victims.go

Shape

Function 152
Method 106
Struct 13
Interface 5

Languages

Go 100%

Modules by API surface

internal/pkg/victims/victims.go: 45 symbols
internal/pkg/config/config_test.go: 24 symbols
internal/pkg/config/config.go: 24 symbols
internal/pkg/victims/victims_test.go: 19 symbols
internal/pkg/chaos/chaos_test.go: 16 symbols
internal/pkg/notifications/util_test.go: 13 symbols
internal/pkg/chaos/chaosmock.go: 13 symbols
internal/pkg/chaos/chaos.go: 11 symbols
internal/pkg/schedule/schedule_test.go: 10 symbols
internal/pkg/schedule/schedule.go: 8 symbols
internal/pkg/notifications/util.go: 6 symbols
internal/pkg/victims/factory/statefulsets/eligible_statefulsets_test.go: 5 symbols

Dependencies from manifests, versioned

github.com/davecgh/go-spew v1.1.1 · 1×
github.com/evanphx/json-patch v5.6.0+incompatible · 1×
github.com/fsnotify/fsnotify v1.6.0 · 1×
github.com/go-logr/logr v1.2.4 · 1×
github.com/go-openapi/jsonpointer v0.20.0 · 1×
github.com/go-openapi/jsonreference v0.20.2 · 1×
github.com/go-openapi/swag v0.22.4 · 1×
github.com/golang/glog v1.1.2 · 1×
github.com/google/gnostic-models v0.6.8 · 1×

For agents

$ claude mcp add kube-monkey \
  -- python -m otcore.mcp_server <graph>
