On Monday, March 12, a major network issue affected our Gandi Mail service, causing problems for our customers with sending and receiving email over the course of several hours. This incident has been resolved and Gandi Mail now functions normally. All emails sent during this incident have since been delivered, with the exception of those sent to Yahoo email addresses, where our servers have been temporarily blacklisted.
The underlying issue was related to the migration from our legacy data center, FR-SD2, to our new data centers. In particular, on Sunday we migrated the Gandi Mail platform off of FR-SD2.
On Monday morning, Paris time, as the typically high volume of Monday traffic arrived, we noticed a problem in the network architecture of the recently deployed servers: the network capacity was not sufficient to absorb the added traffic.
As a result, this added traffic overwhelmed the entire network and rendered our other services unstable as well. Our teams first intervened to isolate the network traffic generated by Gandi Mail so that we could keep it from taking down our other services.
By midday Monday, Paris time, Gandi Mail was completely isolated but remained unstable.
At the same time, our engineers reworked Gandi Mail’s network architecture in order to increase its capacity. At the beginning of the afternoon on Monday, Paris time, we were able to deploy this new architecture and the strain on the network began to decrease.
In order to help accelerate this process, we decided Monday afternoon Paris time to relax the rules on our email volume monitoring tool. This tool allows us to limit the impact of large spam campaigns regularly launched by botnets through our customers’ email accounts (without their knowledge).
Unfortunately, this inadvertently let a spam campaign through, and a botnet was able to send a large volume of email through our servers. We quickly realized what was going on, but not before several hundred thousand emails had been sent, which led several email service providers, specifically Yahoo and Microsoft (including Hotmail and Outlook), to block email from our servers.
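To illustrate why relaxing a volume rule can let a spam campaign through, here is a minimal sketch of a per-account volume check. It is not our actual monitoring tool: the function name, the input format, and the thresholds are all hypothetical and chosen only for the example.

```python
from collections import Counter


def accounts_over_limit(sent_log, max_per_window):
    """Return the accounts that sent more messages than the limit allows.

    sent_log is a hypothetical input: one sending-account name per message
    sent during the monitoring window.
    """
    counts = Counter(sent_log)
    return {account for account, n in counts.items() if n > max_per_window}


# Illustrative data: one normal account and one account abused by a botnet.
log = ["alice"] * 3 + ["compromised"] * 500

print(accounts_over_limit(log, max_per_window=100))   # strict rule
print(accounts_over_limit(log, max_per_window=1000))  # relaxed rule
```

With the strict limit the abused account is flagged, while with the relaxed limit the same burst goes unnoticed.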
Emails sent and received during this time were being added to a growing backlog in our email wait list.
Thanks to this wait list, emails that can’t be processed immediately can be held until such time as they can be. For example, if you send an email to somebody but the incoming email server for that person’s email service is not available, the email isn’t lost, but is put aside to be sent again later. Based on the number of times sending the email has failed, the email gets placed on lists with increasingly long wait times (that is, the frequency of each attempt to send the email goes down).
As a result, even once service was restored, emails sent during the incident took longer to go out than new emails sent after the incident was resolved.
This is why, at midday Tuesday, Paris time, some emails sent on Monday morning during the incident had still not been sent.
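The wait list behavior described above can be sketched roughly as follows. This is only an illustration of the general idea of increasingly long wait times: the delay values, class, and function names are assumptions made for the example and do not reflect our actual mail server configuration.

```python
from dataclasses import dataclass

# Hypothetical wait times (in minutes) before each new delivery attempt.
# Illustrative values only, not the real configuration.
RETRY_DELAYS_MINUTES = [5, 15, 60, 240]


@dataclass
class QueuedEmail:
    recipient: str
    failed_attempts: int = 0


def next_retry_delay(email: QueuedEmail) -> int:
    """Return the wait (in minutes) before the next attempt."""
    if email.failed_attempts == 0:
        return 0  # never failed: can be sent right away
    # Each failure moves the email to a slower list, up to the last one.
    tier = min(email.failed_attempts, len(RETRY_DELAYS_MINUTES)) - 1
    return RETRY_DELAYS_MINUTES[tier]


def record_failure(email: QueuedEmail) -> int:
    """Register a failed delivery and return the wait before the next try."""
    email.failed_attempts += 1
    return next_retry_delay(email)


email = QueuedEmail("someone@example.com")
print([record_failure(email) for _ in range(6)])
```

Under these assumed values, an email that failed several times during the incident sits on the slowest list, so it may wait hours for its next attempt even though the service is healthy again, while a brand-new email is sent immediately.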
The current situation:
- We have stabilized Gandi Mail’s servers; sending and receiving emails from either of our two hosted webmail services or from an email client, including emails containing attachments, now works normally.
- Emails sent to Yahoo email addresses, however, are still not delivered and our teams are actively working with Yahoo to get our servers unblocked.
- All other emails sent during Monday’s incident have now been delivered.
If you are still running into problems with sending and receiving email, please use the form at help.gandi.net to report the problem, including whether the issue is with sending or receiving email, the software being used (an email client like Thunderbird or webmail), and the sender and receiver email addresses involved.
This incident occurred in the context of closing a legacy data center, FR-SD2, which we have previously communicated about in more depth. This was a major project for our technical teams that took months of preparation, including the migration of portions of all of our services, from Simple Hosting infrastructure, to Gandi Cloud, to Gandi Blog, and of course Gandi Mail.
The migration ends this week, with the final closure of FR-SD2. At that point, we’ll be able to take a step back and take what lessons we can from the difficulties encountered along the way and the impact those had on the quality of service provided and our own internal processes.
As a reminder, this was a necessary migration, required to get our services off of the now outdated FR-SD2 data center infrastructure and onto our new infrastructure, which will enable us to further evolve the services we offer. Our new data centers, FR-SD3, FR-SD4, and FR-SD5, will allow us, and by extension you, to take advantage of a superior network architecture and new equipment, and will allow us to improve overall performance on all of our services.
Of course, we are aware of the impact, in some cases severe, of these various issues on the quality of our services and we sincerely apologize for any inconvenience they may have caused you. Please rest assured that every team at Gandi is working hard to get us through the final steps of this transition and leave your services with us in the best shape possible.