January 20, 2017
The host machine of one of our EU-based Google Cloud load balancer instances shut down unexpectedly, and the instance had to be restarted on a new host machine. Google Cloud usually live-migrates instances from one host machine to another without downtime; however, this was a serious hardware failure that did not allow the process to complete in time. The new load balancer instance was up and running 4 minutes after our engineers were notified, and after we ran the necessary tests to ensure its consistency, it began serving requests 7 minutes after the initial compute.instances.hostError event. We apologize for the inconvenience.
One of our load balancers was overwhelmed and we resolved the issue as soon as possible. The downtime was around 5 minutes, and our engineers have taken the necessary steps to avoid such incidents in the future. We apologize for the inconvenience!
The issue has been resolved.
One of our host machines, which had been running for an extended period without any issues, experienced a kernel crash because of a bug in the Btrfs driver. After a quick assessment of the situation, we decided to hard reboot the instance and restore the running containers one by one, checking data consistency as we went.
We have reported the bug to the official Linux kernel development list. The developers confirmed that there’s a bug in the Btrfs driver.
We are very sorry for the inconvenience. We are working together with the developers to uncover the cause and to patch it as soon as possible!
Thanks for your understanding!
The issue has been resolved.
Google Cloud shut one of our VMs down without notice. One of the disks attached to the machine experienced a complete failure, and thus we are unable to restart the machine. Google Cloud is working on restoring the disk.
July 28, 3:44 pm
We are continuing to experience an issue with one of our US-based Google Compute Engine instances. We apologize for any inconvenience affected clients are experiencing. Google Cloud continues to provide updates to us as they work through this.
July 28, 4:32 pm
Google Cloud has determined the reason for the failure and is working to replace the disk. Access to our snapshots is also blocked for now. We apologize for any inconvenience affected clients are experiencing.
July 28, 7:05 pm
The affected disk no longer shows as erroneous, and we have started creating a snapshot of the VM instance. We are also in the process of booting the machine now.
July 28, 7:10 pm
The snapshot creation lasted about 5 minutes, and once it was finished we performed a manual hard reboot on the affected VM instance. We sincerely apologize for any inconvenience caused. We are re-architecting our entire system to prevent a situation like this in the future.
We are investigating reports of an issue with one of our US-based Google Cloud Compute Engine instances. We will publish an update as soon as we know more.
July 26, 2:10 pm
We are experiencing an issue with one of our US-based Google Compute Engine instances.
For everyone who is affected, we apologize for any inconvenience you may be experiencing. We are already working on tracking down the issue.
July 26, 7:28 pm
The issue with the US-based Google Compute Engine instance has been identified. The root of the problem is degraded disk performance in one of our ZFS pools on the VM instance, which may cause intermittent service for those clients whose sites are hosted on this machine.
We are already working on resolving the issue and making appropriate improvements to prevent or minimize future recurrence.
July 27, 2:50 am
After identifying the root cause, we performed a manual hard reboot on the affected VM instance. The total downtime was 60 minutes, the duration of the emergency maintenance.
To prevent future incidents, we’ll live-migrate the affected containers to another host machine. The process won’t affect your sites in any way, and there won’t be any downtime.
We sincerely apologize for falling below the level of service you rely on!
Our server provider for our legacy VPS server architecture is currently investigating issues affecting sites hosted in the Frankfurt and Tokyo data centers, and also possibly in the Dallas data center.
April 22, 6:24 pm
The connectivity issue has been resolved.
Our server provider is currently investigating network connectivity issues in our Atlanta data center. We’re keeping an eye on the issue and we’ll update you as soon as anything changes.
Our server provider is currently investigating network connectivity issues in our Atlanta data center. We’re in contact with them continuously and will have updates for you soon.
January 1, 8:44 pm
This is a distributed DoS attack that is targeting the infrastructure in Atlanta. There is no ETA at the moment, but our server provider’s engineers and leadership team members are working closely to solve this issue as soon as they can.
January 2, 3:44 am
Here’s an update from our server provider: “Our network operations and systems teams have been working non-stop for the last ~36 hours toward a resolution of the Atlanta outage. We have acquired a dedicated transit link that is now directly connected to the Linode network, and we are waiting for this transit provider to apply DDoS mitigation hardening, after which we believe that Atlanta should be restored to full service.”
January 2, 9:23 am
Our server provider believes that they’ve closed all of the attack vectors that could lead to a DDoS attack taking down the entire Atlanta data center.