Major incident on our hosting infrastructure in Luxembourg

Jan 9, 2020  - written by  in Incidents

Situation update: January 13th, 2020, 3:00 PM (CET)

On Wednesday, January 8, 2020, at 6:53 AM Pacific (14:53 UTC), an incident occurred on one of our ZFS storage units used for our PaaS and IaaS hosting services (Gandi Simple Hosting and Gandi Cloud, respectively).
As of Monday, January 13, 2020, we can announce that we have successfully restored the data.
  • PaaS/Simple Hosting instances have been started.
  • IaaS/Cloud Hosting instances must be started by customers.
If you encounter any problems, please contact support.
We will send you a complete postmortem as soon as possible.

Situation update: January 10th, 2020, 12:00 PM (CET)

Since January 8th, 2020, a storage incident has been affecting some of our IaaS/PaaS customers at our Luxembourg datacenter.

Our technical teams are still working to recover the data. At present, we can share the following information:
  • we have managed to import the ZFS data pool
  • the data is still being copied to another storage unit
  • about 50% of the copy is complete at this time
  • the remaining steps of the data recovery can only begin once this copy has been completed
  • we cannot yet guarantee the integrity of the data
This incident has no impact on any other Gandi services:
  • Domain names
  • DNS
  • Email
  • SSL Certificates

Situation update: January 9th, 2020

On Wednesday, January 8, 2020, at 6:53 AM Pacific (14:53 UTC), an incident occurred on one of our ZFS storage units used for our PaaS and IaaS hosting services (Gandi Simple Hosting and Gandi Cloud, respectively).

The storage unit became unavailable, causing a service interruption for all PaaS and IaaS services whose disks were hosted on that unit.

We followed the established procedures:

  • transfer control of the data to an emergency machine
  • inform affected customers by email

In addition, from the moment we became aware of the incident, we posted live updates on our Twitter accounts @gandinoc, @gandi_net, and @gandibar.

Importing the data on the emergency machine was not possible because of metadata corruption, the cause of which is still unknown.

We’ve since been trying to force the data import, a maneuver that requires distributing valid metadata.

Despite our technical teams’ best efforts to restore the data on the affected storage unit, we have not yet been able to recover it. At the time of posting, the outcome of this operation remains uncertain.

This type of incident is extremely rare and in this case is limited to a single storage unit.

We will provide a full postmortem as soon as we can.

We’re very sorry for this truly unfortunate incident, and we offer our sincere apologies to everyone affected.

The Gandi team