2025-06-05 Registration Failure on GRR
Affected Services
- Grand Rapids (GRR) Voice
Event Summary
A critical system service crash on the Grand Rapids (GRR) core server caused a brief but complete service disruption for devices registered to GRR. The Network Management Service (NMS) crashed at 2:27 PM ET, immediately triggering automatic failover protocols that redirected call traffic and device registrations to alternate servers. The service auto-recovered within 2 minutes, allowing all devices to re-register and resume normal operations.
Event Timeline
June 5, 2025
2:27 PM ET - The Network Management Service (NMS) on the Grand Rapids core server experienced a critical crash. This service controls device registration, call routing, call processing, and other essential voice functions. The crash resulted in:
- Immediate disconnection of active calls on GRR
- Device registration failures for GRR-registered phones
- Automatic failover activation to alternate servers (ATL, IAD, PHX)
2:29 PM ET - The NMS service automatically restarted and began recovery operations. Devices that had failed over to alternate servers started re-registering back to the GRR server as normal operations resumed.
2:33 PM ET - Full service restoration confirmed. All device registrations and call processing returned to 100% normal operation on the GRR server. No manual intervention was required for the recovery process.
2:53 PM ET - Our support team engaged vendor support to investigate the core dump file generated during the crash. This investigation will help identify the root cause and prevent future occurrences.
Root Cause
The Network Management Service (NMS), which serves as the core component for device registration and call routing on the GRR server, experienced an unexpected critical failure. This crash caused a complete service interruption for all GRR-registered devices.
The NMS handles multiple essential functions including:
- Device registration and authentication
- Call routing and processing logic
- Session management for active calls
- Integration with carrier services
When this service failed, it immediately triggered our automatic failover mechanisms, redirecting traffic to healthy servers while the service underwent automatic recovery.
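As an illustration of this failover behavior, the sketch below shows a minimal health-check-and-redirect decision in Python. The server names come from this report; the health-state mapping, function names, and selection logic are placeholders for illustration, not the production implementation.

```python
# Minimal sketch of registration failover: if the primary server (GRR) is
# unhealthy, devices are redirected to the first healthy alternate.
# The health table below is simulated; a real check would probe each
# server's NMS endpoint.
from typing import Optional

PRIMARY = "GRR"
ALTERNATES = ["ATL", "IAD", "PHX"]

# Simulated health state (placeholder data, not live status).
SERVER_HEALTHY = {"GRR": False, "ATL": True, "IAD": True, "PHX": True}

def is_healthy(server: str) -> bool:
    return SERVER_HEALTHY.get(server, False)

def select_registration_target() -> Optional[str]:
    """Return the primary if healthy, otherwise the first healthy alternate."""
    if is_healthy(PRIMARY):
        return PRIMARY
    for server in ALTERNATES:
        if is_healthy(server):
            return server
    return None  # no healthy server available

if __name__ == "__main__":
    print(select_registration_target())  # prints "ATL" while GRR is down
```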
Impact Summary
- Duration: 2 minutes (2:27 PM - 2:29 PM ET); full restoration confirmed at 2:33 PM ET
- Scope: All devices registered to the Grand Rapids (GRR) server
- Services Affected: Voice calling (inbound and outbound), device registration
- Automatic Recovery: NMS service auto-restarted without manual intervention
- Failover Performance: Alternate servers successfully handled redirected traffic
- Customer Impact: Brief call disconnections, followed by automatic re-registration
The short duration and automatic recovery minimized customer impact, though users experienced momentary service interruption during the transition.
Future Preventative Action
Immediate Actions Taken
No immediate preventative actions were required, as the incident involved an unexpected service crash with successful automatic recovery. Our existing failover mechanisms performed as designed.
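For context on what the automatic recovery involves, the sketch below shows a generic restart-supervision loop: the service is relaunched whenever it exits abnormally. The command path, restart delay, and logic are assumptions for illustration only and are not the actual supervisor configuration for the NMS.

```python
# Hypothetical supervision loop: restart a crashed service without manual
# intervention. Command path and delay are placeholders, not the real NMS setup.
import subprocess
import time

NMS_COMMAND = ["/usr/local/bin/nms", "--config", "/etc/nms/nms.conf"]  # assumed path
RESTART_DELAY_SECONDS = 5

def supervise(command):
    """Run the service and restart it whenever it exits with a nonzero code."""
    while True:
        process = subprocess.Popen(command)
        exit_code = process.wait()
        if exit_code == 0:
            break  # clean shutdown, stop supervising
        print(f"Service exited with code {exit_code}; restarting in {RESTART_DELAY_SECONDS}s")
        time.sleep(RESTART_DELAY_SECONDS)

if __name__ == "__main__":
    supervise(NMS_COMMAND)
```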
Long-term Actions
Vendor Investigation: Vendor support will conduct a comprehensive analysis of the core dump file generated during the NMS service crash. This investigation will:
- Identify the specific cause of the service failure
- Determine if this represents a known issue or new bug
- Inform appropriate fixes in future NMS service versions
- Provide recommendations for additional monitoring or preventative measures
Monitoring Enhancement: We will review our current monitoring thresholds for the NMS service to ensure we have adequate early warning for similar issues.
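As a rough sketch of the kind of threshold review in scope, the example below evaluates a set of metrics against alert limits. The metric names, limits, and windows are assumptions for illustration, not our actual monitoring configuration.

```python
# Hypothetical threshold evaluation: raise an alert when a metric meets or
# exceeds its limit within a time window. All values are illustrative.
from dataclasses import dataclass

@dataclass
class Threshold:
    metric: str
    limit: int
    window_minutes: int

THRESHOLDS = [
    Threshold(metric="nms_crash_count", limit=1, window_minutes=60),
    Threshold(metric="registration_failures", limit=50, window_minutes=5),
]

def evaluate(metrics: dict) -> list:
    """Return alert messages for any metric that meets or exceeds its limit."""
    alerts = []
    for t in THRESHOLDS:
        observed = metrics.get(t.metric, 0)
        if observed >= t.limit:
            alerts.append(
                f"ALERT: {t.metric}={observed} over last {t.window_minutes}m "
                f"(limit {t.limit})"
            )
    return alerts

if __name__ == "__main__":
    print(evaluate({"nms_crash_count": 1, "registration_failures": 12}))
```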