The Gandi Community

Incident with Simple Hosting (Resolved)

Following an incident that occurred during an update, some Simple Hosting websites stopped responding correctly (they display an error message). We have identified the problem and are correcting it at this time. It is neither necessary nor recommended that you reboot your instance.

We will post more information here when we have an update for you.

The following updates are for February 19, 2013 (all times CET):

13:08 The origin of the problem has been found. We are verifying the solution and will apply it shortly.

13:14 The operation is still in progress. 25% of the platform is impacted.

13:44 The script did not work. We are correcting it and testing it on several instances before launching it for all the others. We can confirm that no data has been lost.

14:25 The script works. We are applying it to all of the affected instances. It will take about one hour for all of them to be fixed.

15:53 The update is taking longer than expected. Estimated time to resolution is 16:50-17:00. 

 

Incident resolved. Here are the technical details:

All of the affected instances have now been restarted. We will handle any residual issues on a case-by-case basis.

The deployment of a migration script failed, and all Simple Hosting instances were affected. A configuration change that should have been applied on the next restart was instead applied directly to the Apache service, and the logs were rotated. In parallel, an automatic recovery was executed on the instances that were in the middle of a migration. The end result was that these instances were started with a partial update applied.

To correct the problem, we had to stop the majority of the instances and determine which ones were in an inconsistent state. We then restarted the instances and forced the migration of the incompletely updated systems. This took longer than expected, which is why our initial recovery estimates were inaccurate.

No data was lost during this incident, and your instance should be fully functional.

Please accept our apologies for this incident. This week, we will be discussing how to ensure that an incident of this kind does not happen again.