Update: The broad outage across many AWS services appeared to have been resolved at approximately 16:50 UTC. “With the network device issues resolved, we are now working towards recovery of any impaired services. We will provide additional updates for impaired services within the appropriate entry in the Service Health Dashboard,” according to a message on AWS Status.
Problems within the Amazon Web Services infrastructure caused large chunks of the Internet to load slowly or not at all starting at 10:30 a.m. ET/15:30 GMT on Dec. 7, according to data from real-time outage monitoring service DownDetector. Amazon said the problems were in the US-EAST-1 region, which refers to Amazon's data centers in Virginia, and impacted Elastic Compute Cloud (EC2), Connect, DynamoDB, Glue, Athena, Timestream, Chime and other AWS services hosted in that region.
Network monitoring company ThousandEyes posted updates throughout the day. The screenshot from the ThousandEyes console shows that the API endpoint using the AWS API Gateway began to time out after 10 seconds. "Corresponding with the HTTP timeouts, we see greatly increased transaction times of between 20-30 seconds, as well as transaction timeouts," the company noted.
"We also saw widespread impact to Amazon’s EC2 service across multiple regions, including in the U.S., Europe, and APJC, although the user impact varied depending on user IP address. Amazon’s S3 service also appeared to be impacted. Both of these services are dependencies for many non-Amazon apps and services, so collateral impacts may be broad," ThousandEyes said.
"The root cause of this issue is an impairment of several network devices," Amazon said in an update, and noted that recovery is being impeded by the fact that the outage impacted Amazon's own monitoring and incident response tools. AWS customers may be unable to login using root login credentials, the company said in an update, and recommended "using IAM Users or Roles for authentication."