Incident Details

DC5 Data Centre Network Issue [11/08/2023 14:47]

Resolved [24/08/2023 16:04]

== Root Cause ==

On Friday 11th August at 14:40, monitoring systems detected a significant issue with traffic routing via the Data Centre's DDoS mitigation solution, triggering a Major Incident response. Core network devices in the DC5 and THN2 London data centres were failing to handle traffic as expected. The service disruption was caused by a routing problem within the DC5 London Data Centre. Under normal operating conditions, traffic would have been rerouted via an additional, resilient London Data Centre. However, a failure by a third-party supplier meant that the route to the resilient Data Centre was unavailable for the full duration of the incident.

The Data Centre encountered a significant issue with the routing of traffic by its DDoS mitigation solution. This was a complex issue that required a lengthy investigation across multiple appliances in the DC5 data centre. The investigation confirmed that the issue was in the network layer, and the Data Centre then made the necessary amendments, which restored service.

Customers may have experienced disruptions in DNS services for domain names hosted on our network. Our DNS servers, namely ns1.interdns.co.uk and ns2.interdns.co.uk, are hosted in separate data centres within London, each on distinct IP ranges. These servers are designed to ensure uninterrupted DNS service, but because this incident affected both data centres, DNS services were impacted.
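For readers who want to verify this kind of separation for their own nameservers, the following is a minimal sketch of an address-range check, not a description of our tooling. The function name `shared_prefixes`, the /24 grouping and the sample addresses (taken from reserved documentation ranges, not the real addresses of ns1 and ns2) are all assumptions made for illustration.

```python
import ipaddress
from itertools import combinations


def shared_prefixes(ns_addresses, prefix_len=24):
    """Return pairs of nameservers that have an address in the same network.

    ns_addresses maps a nameserver name to a list of its IP addresses.
    prefix_len is an assumed grouping size; /24 is only an illustrative choice
    and is meaningful for IPv4 addresses only.
    """
    networks = {
        name: {
            ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
            for addr in addrs
        }
        for name, addrs in ns_addresses.items()
    }
    overlaps = []
    for a, b in combinations(sorted(networks), 2):
        common = networks[a] & networks[b]
        if common:
            overlaps.append((a, b, sorted(str(net) for net in common)))
    return overlaps


# Hypothetical addresses from documentation ranges, NOT the real server IPs.
example = {
    "ns1.interdns.co.uk": ["192.0.2.10"],
    "ns2.interdns.co.uk": ["198.51.100.10"],
}
print(shared_prefixes(example))  # an empty list means no shared /24
```

A check like this only confirms that the addresses sit in different ranges. As this incident showed, distinct ranges do not rule out a shared dependency further upstream.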

== Next Steps ==

The data centre has undertaken internal reviews. Its technical teams have analysed the root cause and defined a detailed action plan, which includes an immediate review of appliance configuration, software upgrades, resiliency validation and process improvements.

We have begun a strategic initiative to enhance our DNS infrastructure. Our plan includes expanding our presence into additional data centres and establishing two entirely independent network setups. These measures are intended to safeguard against similar disruptions in future and to ensure the continued reliability of our DNS services.

We apologise for the disruption and inconvenience this has caused you and your customers and appreciate your patience and understanding during this time.

Update [11/08/2023 20:46]

All services are back. Please consider them to be at risk whilst the data centre works on the issue in hand. A full explanation of the issue will be provided in due course, but the focus has been on restoring stability.

Update [11/08/2023 20:25]

We are aware that the issue has recurred this evening. We are liaising with the data centre.

Update [11/08/2023 15:28]

We are starting to see connectivity to the data centre return. The data centre has not yet confirmed what the issue is. We will provide further details once it is understood.

Investigating [11/08/2023 14:47]

We are currently investigating an issue affecting the DC5 data centre. This will impact access to our control panel as well as shared hosting and email services. Updates to follow.
