hub / github.com/InternLM/InternLM / monitor_exception

Method monitor_exception

internlm/monitor/monitor.py:151–160

Catch and format exception information, send alert message to Feishu.

(self, alert_address: str = None, excp_info: str = None)

Source from the content-addressed store, hash-verified

149         self.last_step_loss = cur_step_loss
150
151     def monitor_exception(self, alert_address: str = None, excp_info: str = None):
152         """Catch and format exception information, send alert message to Feishu."""
153         filtered_trace = excp_info.split("\n")[-10:]
154         format_trace = ""
155         for line in filtered_trace:
156             format_trace += "\n" + line
157         send_alert_message(
158             address=alert_address,
159             message=f"Catch Exception from {socket.gethostname()} with rank id {gpc.get_global_rank()}:{format_trace}",
160         )
161
162     def handle_sigterm(self, alert_address: str = None):
163         """Catch SIGTERM signal, and send alert message to Feishu."""
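The trace handling on lines 153–156 keeps only the last 10 lines of the traceback string and prefixes each with a newline. A minimal standalone sketch of that step follows. The helper name `format_trace_tail` and its `max_lines` parameter are hypothetical and not part of InternLM. The `None` guard is also an addition: as listed, the method would raise `AttributeError` if `excp_info` were left at its default of `None`.

```python
from typing import Optional


def format_trace_tail(excp_info: Optional[str], max_lines: int = 10) -> str:
    """Return the last `max_lines` lines of a traceback, each prefixed by a newline.

    Hypothetical helper mirroring lines 153-156 of the listing; the guard for
    a missing traceback is an assumption added here, not original behavior.
    """
    if not excp_info:
        return ""
    tail = excp_info.split("\n")[-max_lines:]
    # Same result as the loop in the listing: "\n" + line for every kept line.
    return "".join("\n" + line for line in tail)


# A 12-line stand-in for a traceback; only the last 10 lines survive.
trace = "\n".join(f"frame {i}" for i in range(1, 13))
print(format_trace_tail(trace))
```

Because the tail is taken from the end of the string, the final exception line is kept, while the earliest frames are dropped for long tracebacks.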

Callers 1

train.py · File · 0.80
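The source above does not show how train.py invokes the method, so the following is only a sketch of one plausible calling pattern: wrap the work in `try`/`except` and pass `traceback.format_exc()` as `excp_info`. `StubMonitor`, `run_with_alert`, `failing_step`, and the webhook URL are all hypothetical stand-ins. The stub records calls instead of sending anything to Feishu.

```python
import traceback


class StubMonitor:
    """Hypothetical stand-in for the monitor object; records calls only."""

    def __init__(self):
        self.calls = []

    def monitor_exception(self, alert_address=None, excp_info=None):
        self.calls.append((alert_address, excp_info))


def run_with_alert(step_fn, monitor, alert_address):
    """Run step_fn; on failure, forward the formatted traceback to the monitor."""
    try:
        step_fn()
    except Exception:
        # traceback.format_exc() returns the active exception's traceback as a str.
        monitor.monitor_exception(
            alert_address=alert_address,
            excp_info=traceback.format_exc(),
        )
        return False
    return True


def failing_step():
    raise RuntimeError("loss is NaN")


monitor = StubMonitor()
ok = run_with_alert(failing_step, monitor, "https://example.invalid/webhook")
```

This sketch swallows the exception and returns `False` for brevity. A real training script would more likely re-raise after alerting so the job still fails visibly.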

Calls 2

send_alert_message · Function · 0.85
get_global_rank · Method · 0.80

Tested by

no test coverage detected