This project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as EC2 maintenance events, EC2 Spot interruptions, ASG Scale-In, ASG AZ Rebalance, and EC2 Instance Termination via the API or Console. If these events are not handled, your application code may not stop gracefully, may take longer to recover full availability, or may accidentally schedule work to nodes that are going down.
The aws-node-termination-handler (NTH) can operate in two different modes: Instance Metadata Service (IMDS) or the Queue Processor.
The aws-node-termination-handler Instance Metadata Service Monitor will run a small pod on each host to perform monitoring of IMDS paths like /spot or /events and react accordingly to drain and/or cordon the corresponding node.
The aws-node-termination-handler Queue Processor will monitor an SQS queue of events from Amazon EventBridge for ASG lifecycle events, EC2 status change events, Spot Interruption Termination Notice events, and Spot Rebalance Recommendation events. When NTH detects that an instance is going down, it uses the Kubernetes API to cordon the node so that no new work is scheduled there, then drains it, removing any existing work. The termination handler Queue Processor requires AWS IAM permissions to monitor and manage the SQS queue and to query the EC2 API.
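The cordon-then-drain order described above can be sketched as a list of Kubernetes API calls. This is a minimal illustration, not NTH's actual implementation: the function names are hypothetical, pods are plain dictionaries shaped like Kubernetes pod objects, and skipping DaemonSet-managed pods is an assumption that mirrors `kubectl drain --ignore-daemonsets`.

```python
from typing import Iterable, List, Tuple


def cordon_patch() -> dict:
    """Patch body that marks a node unschedulable (the field `kubectl cordon` sets)."""
    return {"spec": {"unschedulable": True}}


def is_daemonset_pod(pod: dict) -> bool:
    """True if any owner reference of the pod is a DaemonSet."""
    owners = pod.get("metadata", {}).get("ownerReferences", [])
    return any(owner.get("kind") == "DaemonSet" for owner in owners)


def plan_node_shutdown(node_name: str, pods: Iterable[dict]) -> List[Tuple[str, str, dict]]:
    """Return the ordered (method, path, body) calls: cordon first, then evictions."""
    actions = [("PATCH", f"/api/v1/nodes/{node_name}", cordon_patch())]
    for pod in pods:
        if is_daemonset_pod(pod):
            continue  # assumption: DaemonSet pods are left in place
        meta = pod["metadata"]
        body = {
            "apiVersion": "policy/v1",
            "kind": "Eviction",
            "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        }
        path = f"/api/v1/namespaces/{meta['namespace']}/pods/{meta['name']}/eviction"
        actions.append(("POST", path, body))
    return actions
```

Cordoning before evicting matters: if pods were evicted first, the scheduler could place their replacements back on the node that is about to go away.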
You can run the termination handler on any Kubernetes cluster running on AWS, including self-managed clusters and those created with Amazon Elastic Kubernetes Service. If you're using EKS managed node groups, you don't need the aws-node-termination-handler.
⚠️ Note: Windows Server 2019 support has been removed as GitHub Actions no longer supports this version. Please migrate to Windows Server 2022. For more details, see GitHub's deprecation announcement.
Both modes (IMDS and Queue Processor) monitor for events affecting your EC2 instances, but each supports different types of events and has its own deployment requirements.
In IMDS mode, NTH must be deployed as a Kubernetes DaemonSet.
Please note that IMDS mode does not support ASG lifecycle hooks, but it does support the ASG target lifecycle state. When using IMDS mode with the ASG target lifecycle state, ASG updates the instance metadata to Terminated before it terminates the node. NTH monitors the path latest/meta-data/autoscaling/target-lifecycle-state for changes and cordons and drains the node when the target state is set to Terminated.
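As a rough illustration of this polling behavior, the sketch below reads the target lifecycle state through IMDSv2 and fires a callback once the state becomes Terminated. It is not NTH's code: `fetch_target_state`, `watch`, the polling interval, and the `on_terminated` callback are all hypothetical names chosen for the example, and the fetch function only works when run on an EC2 instance.

```python
import time
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254"
STATE_PATH = "/latest/meta-data/autoscaling/target-lifecycle-state"


def fetch_target_state(timeout: float = 2.0) -> str:
    """Read the ASG target lifecycle state via IMDSv2 (token, then metadata)."""
    token_req = urllib.request.Request(
        IMDS + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    state_req = urllib.request.Request(
        IMDS + STATE_PATH, headers={"X-aws-ec2-metadata-token": token}
    )
    with urllib.request.urlopen(state_req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def watch(
    fetch: Callable[[], str],
    on_terminated: Callable[[], None],
    interval: float = 2.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the state is 'Terminated', then call on_terminated once."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if fetch() == "Terminated":
            on_terminated()  # e.g. cordon and drain this node
            return True
        polls += 1
        sleep(interval)
    return False  # gave up after max_polls without seeing 'Terminated'
```

On an instance, `watch(fetch_target_state, drain_this_node)` would be the typical call, where `drain_this_node` stands in for whatever cordon-and-drain routine you use. Passing `fetch` and `sleep` as parameters keeps the loop testable without network access.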
In Queue Processor mode, NTH must be deployed as a Kubernetes Deployment. This mode also requires some additional infrastructure setup (including an SQS queue and EventBridge rules).
The Queue Processor can handle node termination through both ASG Lifecycle Termination Hooks and Instance State Change Events. The details below describe how AWS EC2 acts in each case to allow a graceful shutdown. Pick the option best suited to your configuration and workloads.
When using the ASG Lifecycle Hooks, ASG first sends the lifecycle action notification then waits until it has been completed or times out. This allows time for NTH to receive the notification via SQS, cordon and drain the node, and then complete the lifecycle action. Once the ASG receives the completion it then instructs EC2 to terminate the instance.
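The receive, drain, complete sequence can be sketched as follows. This is a simplified illustration under stated assumptions rather than NTH's implementation: it assumes the SQS message body is the EventBridge event for an ASG termination lifecycle action, and `handle_message`, `cordon_and_drain`, and `complete` are hypothetical names. With boto3, `complete` could be `boto3.client("autoscaling").complete_lifecycle_action`.

```python
import json
from typing import Callable, Optional

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def completion_params(sqs_body: str, result: str = "CONTINUE") -> Optional[dict]:
    """Build CompleteLifecycleAction parameters from a lifecycle event, if it is one."""
    detail = json.loads(sqs_body).get("detail", {})
    if detail.get("LifecycleTransition") != TERMINATING:
        return None  # not a termination lifecycle action
    return {
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleHookName": detail["LifecycleHookName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
        "InstanceId": detail["EC2InstanceId"],
        "LifecycleActionResult": result,
    }


def handle_message(
    sqs_body: str,
    cordon_and_drain: Callable[[str], None],
    complete: Callable[..., object],
) -> bool:
    """Drain the node first, then tell the ASG it may terminate the instance."""
    params = completion_params(sqs_body)
    if params is None:
        return False
    cordon_and_drain(params["InstanceId"])
    complete(**params)  # the ASG waits for this call or for the hook timeout
    return True
```

The ordering is the point of the sketch: the lifecycle action is completed only after the drain returns, because the ASG instructs EC2 to terminate as soon as it receives the completion.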
When using the EC2 Console or EC2 API to terminate the instance, a state-change notification is sent and the instance termination is started. EC2 does not wait for a "continue" signal before beginning to terminate the instance. Terminating an EC2 instance should trigger a graceful operating system shutdown, which sends a SIGTERM to the kubelet, which in turn starts shutting down pods by propagating that SIGTERM to the containers on the node. If the containers do not shut down within the kubelet's podTerminationGracePeriod (the Kubernetes default is 30 seconds), the kubelet sends a SIGKILL to forcefully terminate them. Setting podTerminationGracePeriod to a maximum of 90 seconds (probably a bit less than that) delays the termination of pods, which helps with graceful shutdown.
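For the grace period to help, the application inside the container has to react to SIGTERM. A minimal sketch of such a handler is shown below; the names `install_handler` and `serve` are hypothetical, and a real service would add its own cleanup after `serve` returns.

```python
import signal
import threading

# Set when SIGTERM arrives; the work loop checks it between units of work.
shutdown_requested = threading.Event()


def _on_sigterm(signum, frame):
    """Signal handler: only record the request, do the real cleanup elsewhere."""
    shutdown_requested.set()


def install_handler() -> None:
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, _on_sigterm)


def serve(handle_one, poll_seconds: float = 0.1) -> None:
    """Process work until SIGTERM is received, then return so cleanup can run."""
    while not shutdown_requested.is_set():
        handle_one()
        shutdown_requested.wait(poll_seconds)
```

Cleanup that runs after `serve` returns has to finish within the grace period, since a SIGKILL cannot be caught or delayed.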
You can set NTH to send heartbeats to ASG in Queue Processor mode. This allows for a much longer grace period (up to 48 hours) for termination than the maximum heartbeat timeout of two hours. The feature is useful when pods require a long time to drain or when you need a shorter heartbeat timeout with a longer grace period.
- Until the heartbeat timeout expires, the termination process is held at the `Terminating:Wait` stage.
- Configure a termination lifecycle hook on the ASG (required), with a heartbeat timeout longer than the `Heartbeat Interval`. Each heartbeat signal resets this timeout, extending the duration that an instance remains in the `Terminating:Wait` state. Without this lifecycle hook, the instance will terminate immediately when a termination event occurs.
- Configure `Heartbeat Interval` (required) and `Heartbeat Until` (optional). NTH operates normally without heartbeats if neither value is set. If only the interval is specified, `Heartbeat Until` defaults to 172800 seconds (48 hours) and heartbeats will be sent. `Heartbeat Until` must be provided with a valid `Heartbeat Interval`, otherwise NTH will fail to start. Any invalid values (wrong type or out of range) will also prevent NTH from starting.

`Heartbeat Interval` (Required)
- Helm value: `heartbeatInterval`
- CLI flag: `heartbeat-interval`

`Heartbeat Until` (Optional)
- Only valid together with `Heartbeat Interval`
- Helm value: `heartbeatUntil`
- CLI flag: `heartbeat-until`

Example:
- `Heartbeat Interval`: 1000 seconds
- `Heartbeat Until`: 4500 seconds
- `Heartbeat Timeout`: 3000 seconds

| Time (s) | Event | Heartbeat Timeout (HT) | Heartbeat Until (HU) | Action |
|---|---|---|---|---|
| 0 | Start | 3000 | 4500 | Termination Event Received |
| 1000 | HB1 Issued | 2000 -> 3000 | 3500 | Send Heartbeat |
| 2000 | HB2 Issued | 2000 -> 3000 | 2500 | Send Heartbeat |
| 3000 | HB3 Issued | 2000 -> 3000 | 1500 | Send Heartbeat |
| 4000 | HB4 Issued | 2000 -> 3000 | 500 | Send Heartbeat |
| 4500 | HB Expires | 2500 | 0 | Stop Heartbeats |
| 7000 | Termination | - | - | Instance Terminates |
Note: The instance can terminate earlier if its pods finish draining and are ready for termination.
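The arithmetic behind the table can be reproduced with a short simulation. This is only a model of the timeline, not NTH code: `heartbeat_timeline` is a hypothetical helper, and it assumes that heartbeats are sent at multiples of the interval strictly before `Heartbeat Until` elapses, and that a heartbeat only counts if it arrives before the current timeout expires.

```python
from typing import List, Tuple


def heartbeat_timeline(interval: int, until: int, timeout: int) -> Tuple[List[int], int]:
    """Return (heartbeat times, termination time), all in seconds from the event."""
    if interval <= 0 or until <= 0 or timeout <= 0:
        raise ValueError("interval, until and timeout must be positive")
    heartbeats = []
    deadline = timeout  # when the lifecycle hook's heartbeat timeout would expire
    t = interval
    # Assumption: a heartbeat is sent only while 'until' has not elapsed
    # and the instance is still waiting (the deadline has not passed).
    while t < until and t < deadline:
        heartbeats.append(t)
        deadline = t + timeout  # each heartbeat resets the timeout
        t += interval
    return heartbeats, deadline
```

With the example values (interval 1000, until 4500, timeout 3000), this yields heartbeats at 1000, 2000, 3000, and 4000 seconds and a termination time of 7000 seconds, matching the table. If the interval is not shorter than the timeout, no heartbeat lands in time and the instance terminates when the original timeout expires.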
helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set heartbeatInterval=1000 \
  --set heartbeatUntil=4500 \
  # other inputs..
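The rules for combining the two heartbeat settings, as described earlier, can be summarized in a small validation sketch. It is an illustration only: `resolve_heartbeat_config` is a hypothetical function, and the bounds it checks (positive integers, `Heartbeat Until` at most 48 hours) are assumptions, since the exact ranges NTH accepts are not listed here.

```python
from typing import Optional, Tuple

DEFAULT_HEARTBEAT_UNTIL = 172800  # 48 hours, the documented default


def resolve_heartbeat_config(
    interval: Optional[int], until: Optional[int]
) -> Optional[Tuple[int, int]]:
    """Return (interval, until) in seconds, or None when heartbeats are disabled."""
    if interval is None and until is None:
        return None  # neither value set: NTH runs without heartbeats
    if interval is None:
        raise ValueError("heartbeatUntil requires a valid heartbeatInterval")
    if until is None:
        until = DEFAULT_HEARTBEAT_UNTIL  # only the interval was given
    for name, value in (("heartbeatInterval", interval), ("heartbeatUntil", until)):
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer number of seconds")
        if value <= 0:
            raise ValueError(f"{name} must be positive")  # assumed lower bound
    if until > DEFAULT_HEARTBEAT_UNTIL:
        raise ValueError("heartbeatUntil cannot exceed 48 hours")  # assumed upper bound
    return interval, until
```

Raising an error for every invalid combination mirrors the documented behavior that NTH fails to start rather than silently ignoring a bad value.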
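- An instance cannot remain in the `terminating:wait` state indefinitely; the grace period is capped at 48 hours.
- Lifecycle heartbeats are supported only in Queue Processor mode. Setting `enableSqsTerminationDraining=false` and specifying heartbeat flags is prevented in Helm. Directly editing deployment settings to bypass this will cause NTH to fail.
- Heartbeats can be issued only for as many instances as the `workers` flag allows simultaneously.

| Feature | IMDS Processor | Queue Processor |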
|---|---|---|
| Spot Instance Termination Notifications (ITN) | ✅ | ✅ |
| Scheduled Events | ✅ | ✅ |
| Instance Rebalance Recommendation | ✅ | ✅ |
| ASG Termination Lifecycle Hooks | ❌ | ✅ |
| ASG Termination Lifecycle State Change | ✅ | ❌ |
| AZ Rebalance Recommendation | ❌ | ✅ |
| Instance State Change Events | ❌ | ✅ |
| Issue Lifecycle Heartbeats | ❌ | ✅ |
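
When choosing a mode, it can help to check the feature table programmatically. The sketch below simply encodes the table above as a dictionary; `SUPPORT` and `modes_supporting` are hypothetical names for illustration and are not part of NTH.

```python
from typing import Iterable, Set

# Feature support copied from the table above.
SUPPORT = {
    "Spot Instance Termination Notifications (ITN)": {"IMDS", "Queue"},
    "Scheduled Events": {"IMDS", "Queue"},
    "Instance Rebalance Recommendation": {"IMDS", "Queue"},
    "ASG Termination Lifecycle Hooks": {"Queue"},
    "ASG Termination Lifecycle State Change": {"IMDS"},
    "AZ Rebalance Recommendation": {"Queue"},
    "Instance State Change Events": {"Queue"},
    "Issue Lifecycle Heartbeats": {"Queue"},
}


def modes_supporting(features: Iterable[str]) -> Set[str]:
    """Return the modes that support every requested feature."""
    modes = {"IMDS", "Queue"}
    for feature in features:
        modes &= SUPPORT[feature]  # KeyError signals an unknown feature name
    return modes
```

An empty result means no single mode covers the requested combination, for example lifecycle hooks together with lifecycle state change.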

Kubernetes versions supported by each NTH release:

| NTH Release | K8s v1.32 | K8s v1.31 | K8s v1.30 | K8s v1.29 | K8s v1.28 | K8s v1.27 | K8s v1.26 | K8s v1.25 |
|---|---|---|---|---|---|---|---|---|
| v1.25.6 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| [v1.25.5](htt |