
2020-08-14 - Portal, API, SNAPMobile Outage (Resolved)

Written by Marissa Orsini

Updated at April 22nd, 2023


Table of Contents

  • Affected Services
  • Event Summary
  • Event Timeline (All times 24-hour format, EST)
  • Root Cause Analysis
  • Future Preventative Action

Event Description: Portal, API, SNAPMobile Outage

Event Start Time: 2020-08-14 5:05 EST

Event End Time: 2020-08-14 7:23 EST

RFO Issue Date: 2020-08-19

 

Affected Services

Some phones registered to Atlanta lost registration. Devices configured with SRV or UDP failed over. Devices manually registered to core1-atl did not regain registration.
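The difference between SRV-configured and manually registered devices can be illustrated with a minimal sketch. It assumes a device that walks its DNS SRV records in priority order and registers to the first reachable target. The hostnames, priorities, and the `pick_registrar` helper are hypothetical and for illustration only; only the name core1-atl comes from this report, and actual phone firmware behavior may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first (RFC 2782)
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def pick_registrar(records: List[SrvRecord],
                   is_reachable: Callable[[str], bool]) -> Optional[SrvRecord]:
    """Return the first reachable record, trying lower priorities first."""
    # Simplification: within one priority, higher weight is tried first.
    # RFC 2782 specifies a weighted random choice instead.
    for record in sorted(records, key=lambda r: (r.priority, -r.weight)):
        if is_reachable(record.target):
            return record
    return None


# Hypothetical hostnames; only "core1-atl" appears in the report.
records = [
    SrvRecord(priority=10, weight=100, port=5060, target="core1-atl.example.net"),
    SrvRecord(priority=20, weight=100, port=5060, target="core2-backup.example.net"),
]

down = {"core1-atl.example.net"}  # simulate the Atlanta core being unavailable


def reachable(host: str) -> bool:
    return host not in down


# A device using SRV sees every record and can fail over.
srv_choice = pick_registrar(records, reachable)
# A device pinned to a single host has nothing to fail over to.
static_choice = pick_registrar(records[:1], reachable)

print(srv_choice.target if srv_choice else "no registrar available")
print(static_choice.target if static_choice else "no registrar available")
```

In this sketch the SRV-configured device moves to the backup target, while the device pinned to a single host has no alternative and stays unregistered until that host returns.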


Event Summary

On July 8th, 2020, the ACS cluster crashed on three separate occasions, each causing a 30-second outage. Several phones lost registration and the ability to make or receive calls.


Event Timeline (All times 24-hour format, EST)

July 8th, 2020

  • 14:19 ACS cluster crashed
  • 14:19 NOC team was notified
  • 14:19 ACS cluster restored after 30 seconds
  • 14:19 ACS cluster crashed
  • 14:19 NOC team was notified
  • 14:19 ACS cluster restored after 30 seconds
  • 14:20 NOC team was notified
  • 14:20 First report identified by partner
  • 14:21 ACS restored along with device registration
  • 14:22 Second report identified by partner
  • 15:39 Tier 3 & 4 engineers began an investigation. A report was posted to Discord.
  • 15:40 ACS cluster crashed
  • 15:40 NOC team was notified
  • 15:40 ACS cluster restored after 30 seconds

Root Cause Analysis

While troubleshooting with our senior engineers, we determined that the crash occurred inside the RTP layer of the switch, specifically in an object called CRTPRelayTap. This suggests the crash was likely triggered during an audio tap used for audio monitoring.

It has been marked as a bug and will be corrected via a patch.


Future Preventative Action

The patch is undergoing testing and is due to be installed on 07/30/2020 during our maintenance window. 

Update: The patch was applied on 07/30/2020 and confirmed to resolve the issue.


