Incident: 5-minute outage due to a RAID storage failure
Incident Date: 2019-05-19 21:21 - 21:26 EDT
Affected Services: All
A drive failure on our primary database server caused a 5-minute outage. The failure manifested in a way that prevented our high-availability database cluster from self-healing. No data or transactions were lost, and no funding events were impacted.
Our gateway and portals were unavailable for approximately 5 minutes. Based on run rates from the prior week, we estimate that approximately 3,000 transactions and 1,000 merchants were impacted.
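For readers who want to see how a run-rate estimate of this kind works, the arithmetic can be sketched as follows. This is a minimal illustration, not the calculation actually used: the function name and the per-minute rate are assumptions. The 600 transactions per minute figure is not a measured value; it is simply what the published estimate of 3,000 transactions implies over a 5-minute window. The merchant count is not reproduced here because distinct merchants do not scale linearly with outage duration.

```python
def estimate_impacted_transactions(rate_per_minute: float, outage_minutes: float) -> int:
    """Estimate impacted transactions as (typical rate) x (outage duration)."""
    if rate_per_minute < 0 or outage_minutes < 0:
        raise ValueError("rate and duration must be non-negative")
    return round(rate_per_minute * outage_minutes)


# Outage window from the incident header: 21:21 to 21:26 EDT.
OUTAGE_MINUTES = 5

# Hypothetical rate: implied by the published estimate (3000 / 5),
# not taken from real traffic data.
ASSUMED_TRANSACTIONS_PER_MINUTE = 600

estimate = estimate_impacted_transactions(ASSUMED_TRANSACTIONS_PER_MINUTE, OUTAGE_MINUTES)
print(f"Estimated impacted transactions: {estimate}")
```

In practice the rate would likely be taken from the same day of week and time of day in the prior week, since transaction volume varies over the day.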
The primary server was manually forced offline, which allowed our database cluster to self-heal by failing over to a healthy node. The failed drive on the primary node was replaced.
We experienced approximately 5 minutes of downtime due to a storage failure. We may need to perform additional maintenance during Monday morning’s maintenance window (04:30 EDT). We will provide a postmortem within 2 days.
The Gateway issue is resolved and service has been fully restored. We will provide a postmortem / incident report within 2 business days.
We have confirmed that the Gateway service is currently experiencing widespread disruptions. Our network operations center is investigating the issue and will update this incident as additional information becomes available.
We will provide regular updates until the situation is corrected.