
github.com/aws/aws-node-termination-handler @v1.25.6


479 symbols · 1,999 edges · 62 files · 156 documented (33%) · v1.25.6 · 2026-04-02 · ★ 1,759 · 11 open issues

README

AWS Node Termination Handler

Gracefully handle EC2 instance shutdown within Kubernetes

Project Summary

This project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as EC2 maintenance events, EC2 Spot interruptions, ASG Scale-In, ASG AZ Rebalance, and EC2 Instance Termination via the API or Console. If these events are not handled, your application code may not stop gracefully, may take longer to recover full availability, or may have work accidentally scheduled onto nodes that are going down.

The aws-node-termination-handler (NTH) can operate in two different modes: Instance Metadata Service (IMDS) or the Queue Processor.

The aws-node-termination-handler Instance Metadata Service Monitor will run a small pod on each host to perform monitoring of IMDS paths like /spot or /events and react accordingly to drain and/or cordon the corresponding node.
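The following is a minimal sketch of what checking one IMDS path involves. It assumes IMDSv2 (a session token fetched with a PUT to `/latest/api/token`) and uses the Spot path `/latest/meta-data/spot/instance-action`, which returns 404 until an interruption is scheduled. `checkSpotAction` and `parseInstanceAction` are hypothetical helpers written for illustration. They are not NTH's own functions, which live in `pkg/ec2metadata`. NTH polls repeatedly, whereas this example performs a single check.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// imdsBase is the link-local IMDS endpoint reachable from an EC2 instance.
const imdsBase = "http://169.254.169.254"

// InstanceAction mirrors the JSON body served at spot/instance-action.
type InstanceAction struct {
	Action string `json:"action"` // "terminate", "stop" or "hibernate"
	Time   string `json:"time"`   // planned interruption time (UTC)
}

// parseInstanceAction decodes a spot/instance-action response body.
func parseInstanceAction(body []byte) (InstanceAction, error) {
	var a InstanceAction
	err := json.Unmarshal(body, &a)
	return a, err
}

// checkSpotAction (hypothetical) fetches an IMDSv2 token, then queries the
// Spot path. It returns (nil, nil) when no interruption is pending (HTTP 404).
func checkSpotAction(client *http.Client) (*InstanceAction, error) {
	tokReq, err := http.NewRequest(http.MethodPut, imdsBase+"/latest/api/token", nil)
	if err != nil {
		return nil, err
	}
	tokReq.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "21600")
	tokResp, err := client.Do(tokReq)
	if err != nil {
		return nil, err
	}
	token, err := io.ReadAll(tokResp.Body)
	tokResp.Body.Close()
	if err != nil {
		return nil, err
	}
	if tokResp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("IMDS token request failed: status %d", tokResp.StatusCode)
	}

	req, err := http.NewRequest(http.MethodGet, imdsBase+"/latest/meta-data/spot/instance-action", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-aws-ec2-metadata-token", string(token))
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotFound {
		return nil, nil // nothing scheduled yet
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected IMDS status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	a, err := parseInstanceAction(body)
	if err != nil {
		return nil, err
	}
	return &a, nil
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	a, err := checkSpotAction(client)
	switch {
	case err != nil:
		fmt.Println("IMDS not reachable (not running on EC2?):", err)
	case a == nil:
		fmt.Println("no Spot interruption scheduled")
	default:
		// A termination handler would cordon and drain the node here.
		fmt.Printf("Spot interruption: %s at %s\n", a.Action, a.Time)
	}
}
```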

The aws-node-termination-handler Queue Processor will monitor an SQS queue of events from Amazon EventBridge for ASG lifecycle events, EC2 status change events, Spot Interruption Termination Notice events, and Spot Rebalance Recommendation events. When NTH detects that an instance is going down, it uses the Kubernetes API to cordon the node so that no new work is scheduled there, then drains it to remove any existing work. The Queue Processor requires AWS IAM permissions to monitor and manage the SQS queue and to query the EC2 API.
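Each SQS message body carries an EventBridge event. The sketch below shows how such a body might be decoded to find the instance to cordon and drain. The `detail-type` strings and detail field names follow the EventBridge formats for EC2 and Auto Scaling events as I understand them. `instanceToDrain` is a hypothetical helper. NTH's real parsing lives in `pkg/monitor/sqsevent` and covers more event types, including rebalance recommendations and scheduled events. This sketch also omits the EC2 API lookup that maps an instance ID to a node name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventBridgeEvent is the subset of the EventBridge envelope used here.
type eventBridgeEvent struct {
	DetailType string          `json:"detail-type"`
	Detail     json.RawMessage `json:"detail"`
}

// ec2Detail covers the EC2 Spot warning and state-change detail fields.
type ec2Detail struct {
	InstanceID string `json:"instance-id"`
	State      string `json:"state"` // only present on state-change events
}

// asgDetail covers the ASG termination lifecycle action detail.
type asgDetail struct {
	EC2InstanceID string `json:"EC2InstanceId"`
}

// instanceToDrain (hypothetical) returns the instance ID that should be
// cordoned and drained, or "" if the message does not call for action.
func instanceToDrain(body string) (string, error) {
	var ev eventBridgeEvent
	if err := json.Unmarshal([]byte(body), &ev); err != nil {
		return "", err
	}
	switch ev.DetailType {
	case "EC2 Spot Instance Interruption Warning":
		var d ec2Detail
		if err := json.Unmarshal(ev.Detail, &d); err != nil {
			return "", err
		}
		return d.InstanceID, nil
	case "EC2 Instance State-change Notification":
		var d ec2Detail
		if err := json.Unmarshal(ev.Detail, &d); err != nil {
			return "", err
		}
		// Assumption: only states that mean the instance is going away matter.
		switch d.State {
		case "stopping", "stopped", "shutting-down", "terminated":
			return d.InstanceID, nil
		}
		return "", nil
	case "EC2 Instance-terminate Lifecycle Action":
		var d asgDetail
		if err := json.Unmarshal(ev.Detail, &d); err != nil {
			return "", err
		}
		return d.EC2InstanceID, nil
	}
	return "", nil // other event types are ignored in this sketch
}

func main() {
	msg := `{"source":"aws.ec2","detail-type":"EC2 Spot Instance Interruption Warning",
	         "detail":{"instance-id":"i-0123456789abcdef0","instance-action":"terminate"}}`
	id, err := instanceToDrain(msg)
	fmt.Println(id, err)
}
```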

You can run the termination handler on any Kubernetes cluster running on AWS, including self-managed clusters and those created with Amazon Elastic Kubernetes Service. If you're using EKS managed node groups, you don't need the aws-node-termination-handler.

⚠️ Note: Windows Server 2019 support has been removed as GitHub Actions no longer supports this version. Please migrate to Windows Server 2022. For more details, see GitHub's deprecation announcement.

Major Features

Both modes (IMDS and Queue Processor) monitor for events affecting your EC2 instances, but each supports different types of events. Both modes have the following:

Helm installation and event configuration support
Webhook feature to send shutdown or restart notification messages
Unit & integration tests

Instance Metadata Service (IMDS) Processor

Must be deployed as a Kubernetes DaemonSet.

Monitors EC2 Instance Metadata for:
Spot Instance Termination Notifications
Scheduled Events
Instance Rebalance Recommendations
Autoscaling Group Target Lifecycle State changes

IMDS Processor with ASG Target Lifecycle State change

Please note that IMDS does not support lifecycle hooks, but it does support lifecycle state change. When using IMDS mode with the ASG target lifecycle state, ASG will update instance metadata to be Terminated before it terminates the node. NTH will monitor the path latest/meta-data/autoscaling/target-lifecycle-state for changes and will cordon and drain when the target state is set to Terminated.
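The state-change check can be sketched as follows. The idea is to remember the last value read from `latest/meta-data/autoscaling/target-lifecycle-state` and to act once that value changes to `Terminated`. The IMDS read itself is left out because it follows the usual IMDSv2 request pattern. `lifecycleWatcher` is a hypothetical type, not part of NTH.

```go
package main

import "fmt"

// lifecycleWatcher (hypothetical) tracks the last observed target lifecycle state.
type lifecycleWatcher struct {
	last string
}

// observe records the latest state and reports whether this reading is the
// transition into "Terminated", so cordon/drain is triggered only once.
// A watcher that starts when the state is already "Terminated" fires on its
// first reading, because the initial last value is "".
func (w *lifecycleWatcher) observe(state string) bool {
	changed := state != w.last
	w.last = state
	return changed && state == "Terminated"
}

func main() {
	w := &lifecycleWatcher{}
	// Simulated successive readings of the IMDS path.
	for _, s := range []string{"InService", "InService", "Terminated", "Terminated"} {
		if w.observe(s) {
			fmt.Println("target state is Terminated: cordon and drain")
		} else {
			fmt.Println("state:", s, "(no action)")
		}
	}
}
```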

Queue Processor

Must be deployed as a Kubernetes Deployment. Also requires some additional infrastructure setup (including SQS queue, EventBridge rules).

Monitors an SQS Queue for:
Spot Instance Termination Notifications
Scheduled Events (via AWS Health)
Instance Rebalance Recommendations
ASG Termination Lifecycle Hooks to handle the following:
- ASG Scale-In
- Availability Zone Rebalance
- Unhealthy Instances, and more
Instance State Change events

The Queue Processor can handle node termination through either ASG Lifecycle Termination Hooks or Instance State Change Events. The sections below describe how EC2 carries out a graceful shutdown in each case. Pick the one that best suits your configuration and workloads.

Queue Processor with ASG Lifecycle Hooks

When using the ASG Lifecycle Hooks, ASG first sends the lifecycle action notification then waits until it has been completed or times out. This allows time for NTH to receive the notification via SQS, cordon and drain the node, and then complete the lifecycle action. Once the ASG receives the completion it then instructs EC2 to terminate the instance.

Queue Processor with Instance State Change Events

When you terminate the instance through the EC2 Console or EC2 API, a state-change notification is sent and instance termination starts immediately. EC2 does not wait for a "continue" signal before it begins terminating the instance. Terminating an EC2 instance should trigger a graceful operating system shutdown. That shutdown sends SIGTERM to the kubelet, which in turn starts shutting down pods by propagating SIGTERM to the containers on the node. If the containers do not exit within the pod's termination grace period (`terminationGracePeriodSeconds`, Kubernetes default 30s), the kubelet sends SIGKILL to terminate them forcefully. Setting the grace period to at most 90 seconds (preferably slightly less) delays pod termination and gives workloads more time to shut down gracefully.

Issuing Lifecycle Heartbeats

You can configure NTH to send heartbeats to ASG in Queue Processor mode. This allows a termination grace period of up to 48 hours, much longer than the two-hour maximum heartbeat timeout. The feature is useful when pods need a long time to drain, or when you want a shorter heartbeat timeout combined with a longer grace period.

How it works

When NTH receives an ASG lifecycle termination event, it starts sending heartbeats to ASG to renew the heartbeat timeout associated with the ASG's termination lifecycle hook.
The heartbeat timeout acts as a timer that starts when the termination event begins.
Until the timeout reaches zero, the termination process is held at the Terminating:Wait stage.
By issuing heartbeats, graceful termination duration can be extended up to 48 hours, limited by the global timeout.

How to use

Configure a termination lifecycle hook on ASG (required). Set the heartbeat timeout value to be longer than the Heartbeat Interval. Each heartbeat signal resets this timeout, extending the duration that an instance remains in the Terminating:Wait state. Without this lifecycle hook, the instance will terminate immediately when a termination event occurs.
Configure Heartbeat Interval (required) and Heartbeat Until (optional). NTH operates normally without heartbeats if neither value is set. If only the interval is specified, Heartbeat Until defaults to 172800 seconds (48 hours) and heartbeats will be sent. Heartbeat Until must be provided with a valid Heartbeat Interval, otherwise NTH will fail to start. Any invalid values (wrong type or out of range) will also prevent NTH from starting.

Configurations

`Heartbeat Interval` (Required)

Time period between consecutive heartbeat signals (in seconds)
Specifying this value enables heartbeats
Range: 30 to 3600 seconds (30 seconds to 1 hour)
Helm value (values.yaml or --set): heartbeatInterval
CLI flag: heartbeat-interval
Default value: none (heartbeats are disabled unless this is set)

`Heartbeat Until` (Optional)

Duration over which heartbeat signals are sent (in seconds)
Must be provided with a valid Heartbeat Interval
Range: 60 to 172800 seconds (1 minute to 48 hours)
Helm value (values.yaml or --set): heartbeatUntil
CLI flag: heartbeat-until
Default value: 172800 (48 hours)
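The rules from "How to use" and the ranges above can be expressed as a single validation step. `heartbeatConfig` and `validateHeartbeat` are hypothetical, and the sketch assumes that 0 means "not set". NTH's real checks may word their errors differently or run at a different point in startup.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultHeartbeatUntil (48 hours) applies when only the interval is set.
const defaultHeartbeatUntil = 172800

// heartbeatConfig (hypothetical) holds both settings in seconds; 0 means unset.
type heartbeatConfig struct {
	Interval int // heartbeat-interval
	Until    int // heartbeat-until
}

// validateHeartbeat applies the documented rules and fills in the default.
// enabled is false when neither value is set.
func validateHeartbeat(c heartbeatConfig) (cfg heartbeatConfig, enabled bool, err error) {
	if c.Interval == 0 && c.Until == 0 {
		return c, false, nil // heartbeats disabled; NTH runs normally
	}
	if c.Interval == 0 {
		return c, false, errors.New("heartbeat-until requires a valid heartbeat-interval")
	}
	if c.Interval < 30 || c.Interval > 3600 {
		return c, false, fmt.Errorf("heartbeat-interval %d out of range [30, 3600]", c.Interval)
	}
	if c.Until == 0 {
		c.Until = defaultHeartbeatUntil
	}
	if c.Until < 60 || c.Until > 172800 {
		return c, false, fmt.Errorf("heartbeat-until %d out of range [60, 172800]", c.Until)
	}
	return c, true, nil
}

func main() {
	cases := []heartbeatConfig{{}, {Interval: 1000}, {Interval: 1000, Until: 4500}, {Until: 4500}, {Interval: 10}}
	for _, c := range cases {
		got, on, err := validateHeartbeat(c)
		fmt.Printf("%+v -> enabled=%v until=%d err=%v\n", c, on, got.Until, err)
	}
}
```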

Example Case

Heartbeat Interval: 1000 seconds
Heartbeat Until: 4500 seconds
Heartbeat Timeout: 3000 seconds

Time (s)	Event	Heartbeat Timeout (HT)	Heartbeat Until (HU)	Action
0	Start	3000	4500	Termination Event Received
1000	HB1 Issued	2000 -> 3000	3500	Send Heartbeat
2000	HB2 Issued	2000 -> 3000	2500	Send Heartbeat
3000	HB3 Issued	2000 -> 3000	1500	Send Heartbeat
4000	HB4 Issued	2000 -> 3000	500	Send Heartbeat
4500	HB Expires	2500	0	Stop Heartbeats
7000	Termination	-	-	Instance Terminates

Note: The instance can terminate earlier if its pods finish draining and are ready for termination.
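The arithmetic in the table can be reproduced with a small simulation. It assumes that heartbeats land exactly at multiples of the interval, with no startup delay, and that no heartbeat is sent at or after Heartbeat Until. It also assumes the instance terminates when the last-armed heartbeat timeout expires, capped by the global timeout described under Important Notes. `simulateHeartbeats` is a hypothetical helper and ignores early termination after a finished drain.

```go
package main

import "fmt"

// simulateHeartbeats returns how many heartbeats are sent and the time (s)
// at which the instance terminates if draining never finishes early.
// Assumes interval > 0 (enforced by the documented 30-3600 s range).
func simulateHeartbeats(interval, until, timeout int) (sent, terminateAt int) {
	// Global timeout: 48 hours or 100x the heartbeat timeout, whichever is smaller.
	global := 100 * timeout
	if global > 172800 {
		global = 172800
	}
	expiry := timeout // the heartbeat timeout starts with the termination event
	for t := interval; t < until && t < global; t += interval {
		if t >= expiry {
			break // the timeout ran out before this heartbeat could be sent
		}
		sent++
		expiry = t + timeout // each heartbeat resets the timeout
	}
	if expiry > global {
		expiry = global
	}
	return sent, expiry
}

func main() {
	sent, at := simulateHeartbeats(1000, 4500, 3000)
	fmt.Printf("heartbeats=%d, terminate at t=%ds\n", sent, at)
}
```

With the example values, this reports four heartbeats and termination at t=7000s, matching the last row of the table.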

Example Helm Command

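# Add any other chart values you need (for example --set queueURL=...) before running.
helm upgrade --install aws-node-termination-handler \
  oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set heartbeatInterval=1000 \
  --set heartbeatUntil=4500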

Important Notes

Be aware of the global timeout. Instances cannot remain in a wait state indefinitely: the global timeout is 48 hours or 100 times the heartbeat timeout, whichever is smaller. This is the maximum amount of time you can keep an instance in the Terminating:Wait state.
Lifecycle heartbeats are only supported in Queue Processor mode. Setting enableSqsTerminationDraining=false and specifying heartbeat flags is prevented in Helm. Directly editing deployment settings to bypass this will cause NTH to fail.
The heartbeat interval should be sufficiently shorter than the heartbeat timeout. There is a time gap between instance startup and NTH initialization, so setting the interval equal to or only slightly smaller than the timeout can let the heartbeat timeout expire before the first heartbeat is issued. Leave adequate buffer time for NTH to finish initializing.
Issuing heartbeats is part of the termination process. The maximum number of instances that NTH can handle termination concurrently is limited by the number of workers. This implies that heartbeats can only be issued for up to the number of instances specified by the workers flag simultaneously.

Which one should I use?

Feature	IMDS Processor	Queue Processor
Spot Instance Termination Notifications (ITN)	✅	✅
Scheduled Events	✅	✅
Instance Rebalance Recommendation	✅	✅
ASG Termination Lifecycle Hooks	❌	✅
ASG Termination Lifecycle State Change	✅	❌
AZ Rebalance Recommendation	❌	✅
Instance State Change Events	❌	✅
Issue Lifecycle Heartbeats	❌	✅

Kubernetes Compatibility

NTH Release	K8s v1.32	K8s v1.31	K8s v1.30	K8s v1.29	K8s v1.28	K8s v1.27	K8s v1.26	K8s v1.25
v1.25.6	✅	✅	✅	✅	❌	❌	❌	❌
v1.25.5	…

Extension points exported contracts — how you extend this code

Monitor (Interface)

Monitor is an interface which can be implemented for various sources of interruption events [5 implementers]

pkg/monitor/types.go

UptimeFuncType (FuncType)

UptimeFuncType cleans up function arguments or return type.

pkg/uptime/common.go

IEC2Helper (Interface)

(no doc) [1 implementer]

pkg/ec2helper/ec2helper.go

DrainTask (FuncType)

DrainTask defines a task to be run when draining a node

pkg/monitor/types.go
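The sketch below shows what adding a new event source through the `Monitor` extension point might look like. It assumes that the interface in `pkg/monitor/types.go` has the shape `Monitor() error` plus `Kind() string`. It re-declares that shape locally, together with a simplified and hypothetical `InterruptionEvent`, so that the example compiles on its own. Check `types.go` for the real definitions before implementing against them. `flagMonitor` is an invented example source.

```go
package main

import (
	"fmt"
	"time"
)

// Monitor mirrors the assumed shape of pkg/monitor.Monitor.
type Monitor interface {
	Monitor() error // called periodically to look for interruption events
	Kind() string   // short name identifying the event source
}

// InterruptionEvent is a simplified, hypothetical stand-in for NTH's type.
type InterruptionEvent struct {
	EventID   string
	Kind      string
	NodeName  string
	StartTime time.Time
}

// flagMonitor (hypothetical) reports one interruption once check() says
// the node is going away, for example by reading a custom API or flag file.
type flagMonitor struct {
	nodeName string
	check    func() bool
	events   chan<- InterruptionEvent
	fired    bool
}

func (m *flagMonitor) Kind() string { return "CUSTOM_FLAG" }

// Monitor emits at most one event; later calls are no-ops.
func (m *flagMonitor) Monitor() error {
	if m.fired || !m.check() {
		return nil
	}
	m.fired = true
	m.events <- InterruptionEvent{
		EventID:   "custom-" + m.nodeName,
		Kind:      m.Kind(),
		NodeName:  m.nodeName,
		StartTime: time.Now(),
	}
	return nil
}

func main() {
	events := make(chan InterruptionEvent, 1)
	var m Monitor = &flagMonitor{nodeName: "node-a", check: func() bool { return true }, events: events}
	if err := m.Monitor(); err != nil {
		fmt.Println("monitor error:", err)
		return
	}
	ev := <-events
	fmt.Println(ev.Kind, ev.NodeName, ev.EventID)
}
```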

Core symbols most depended-on inside this repo

New

called by 55

pkg/ec2metadata/ec2metadata.go

pkg/monitor/sqsevent/sqs-monitor.go


Shape

Function 291

Method 136

Struct 46

Interface 4

FuncType 2

Languages

Go 100%

Modules by API surface

pkg/node/node.go · 51 symbols

pkg/ec2metadata/ec2metadata_test.go · 30 symbols

pkg/node/node_test.go · 29 symbols

pkg/monitor/sqsevent/sqs-monitor_test.go · 28 symbols

pkg/monitor/sqsevent/sqs-monitor.go · 17 symbols

pkg/ec2metadata/ec2metadata.go · 16 symbols

pkg/observability/opentelemetry_test.go · 14 symbols

pkg/interruptioneventstore/interruption-event-store.go · 13 symbols

pkg/test/aws-mocks.go · 11 symbols

pkg/monitor/sqsevent/sqs-monitor_internal_test.go · 11 symbols

pkg/logging/versioned.go · 11 symbols

pkg/webhook/webhook_test.go · 10 symbols

Dependencies from manifests, versioned

github.com/Azure/go-ansiterm v0.0.0-2021061722524 · 1×

github.com/MakeNowJust/heredoc v1.0.0 · 1×

github.com/Masterminds/goutils v1.1.1 · 1×

github.com/Masterminds/semver/v3 v3.2.0 · 1×

github.com/Masterminds/sprig/v3 v3.2.3 · 1×

github.com/aws/aws-sdk-go v1.55.4 · 1×

github.com/beorn7/perks v1.0.1 · 1×

github.com/cespare/xxhash/v2 v2.3.0 · 1×

github.com/chai2010/gettext-go v1.0.2 · 1×

github.com/davecgh/go-spew v1.1.1 · 1×

github.com/emicklei/go-restful/v3 v3.11.0 · 1×

github.com/evanphx/json-patch v5.6.0+incompatible · 1×

