Emergency Maintenance on Network Infrastructure
Scheduled Maintenance Report for Pax8
Postmortem

Incident Report

Planned emergency maintenance was carried out last night to remedy issues between our border routers and the core network. These issues were caused by a bug dating back to August 2019. The fix was recommended by Juniper support after they identified a bug that can present itself under very specific circumstances.

The work was undertaken in two parts: first on the backup router, then, after confirming that the backup change was successful, on the primary router. All work and outages occurred within the advertised maintenance window of 10pm to 2am.

Unfortunately, the change to the backup router uplink caused a downstream issue that affected all devices connected to the backbone network and the primary router. This initial outage began at 10:10pm and was resolved when connectivity was restored at 10:45pm.

At this point we were halfway through the required changes, and the Network team decided to complete the work rather than leave the system in an unstable state, as Juniper support had warned could happen. We completed the change on the primary router uplink at 11:45pm; this also caused some instability in dynamic routing across the backbone network, which cleared at midnight.

Some instabilities remain, specifically in the Azure Stack environment, and these are being raised with Juniper support this morning. The Network team are working through the individual issues to resolve them as soon as possible.

Network Team

Posted Dec 13, 2019 - 14:18 NZDT

Completed
The scheduled maintenance has been completed.
Posted Dec 13, 2019 - 02:00 NZDT
Update
Services have been restored. The team are still testing and verifying the changes implemented.
Posted Dec 13, 2019 - 01:37 NZDT
Update
Testing underway. Some services are still in the process of being restored.
Posted Dec 13, 2019 - 00:44 NZDT
Update
Remediation steps have been completed. Systems are being brought online and post-change testing is currently underway.
Posted Dec 13, 2019 - 00:26 NZDT
Update
Remediation steps are currently underway. An update will follow once the work is complete.
Posted Dec 13, 2019 - 00:10 NZDT
Update
The team are performing further remediation tasks that may cause an additional 5-10 minute disruption to service.
Posted Dec 12, 2019 - 23:54 NZDT
Update
We are currently testing individual systems that may still be affected.
Posted Dec 12, 2019 - 23:53 NZDT
Update
A small number of customer VMs are still offline and we are working on restoring service to those customers impacted.
Posted Dec 12, 2019 - 23:24 NZDT
Update
A detailed postmortem of tonight's Auckland network outage will be published during business hours tomorrow, once more details are known.
Posted Dec 12, 2019 - 22:57 NZDT
Update
Network services have been restored in our Auckland datacentre. Engineers are currently monitoring the situation. The outage began at 22:17 and was resolved by 22:45.
Posted Dec 12, 2019 - 22:53 NZDT
Update
The scheduled network change has resulted in loss of network services in our Auckland datacentre. Engineers are currently reverting the change as quickly as possible in order to restore service.
Posted Dec 12, 2019 - 22:41 NZDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Dec 12, 2019 - 22:00 NZDT
Scheduled
The Auckland datacentre will be undergoing emergency maintenance on one of our Internet border routers.

There will be a disruption of 1-5 minutes to IP transit between the Internet and internal services while a new configuration is applied to the border routers.

There will be no impact to backend services, storage or services running inside the datacentre.
Posted Dec 06, 2019 - 13:02 NZDT
This scheduled maintenance affected: Managed Compute - Powered by VMware (Auckland) and Network Services (Auckland).