How a typo at Amazon took down part of the internet


    You might recall when a disruption to Amazon Web Services (AWS) on Tuesday brought down a slew of websites for users across the East Coast. Now Amazon is letting us know the cause and, as it turns out, the massive cloud-computing network is vulnerable to simple human error.

    In a blog post, the company explained that the Amazon Simple Storage Service (S3) team was debugging an issue with its billing system when a team member executed a command meant to remove a small number of servers.

    “Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended,” the post read.

    As a result, each of the affected systems required a full restart. During that time, S3 wasn’t able to service requests, causing websites relying on those servers to slow down or go completely offline. They included sites like Netflix, Quora, Slack, Coursera, Square, Twilio, and Medium. Amazon itself acknowledged it was experiencing issues that hindered AWS’s ability to show the errors on its health dashboard.

    In response to the error, Amazon outlined some safeguards it’s putting in place to avoid a repeat. For instance, the tool the employee used to remove server capacity will now do so more slowly. The company had also planned to repartition the subsystem into smaller cells later this year to make debugging more manageable; it is now moving that work up to begin right away.
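
    For readers curious what "removing capacity more slowly" could look like in practice, here is a minimal Python sketch. It is not Amazon's tool: the function name `plan_removal`, its parameters, and the server names are all hypothetical, and the minimum-capacity floor and the rejection of unknown server names are assumptions added for illustration. The idea is simply that a mistyped input should fail loudly, and that even a valid request gets carried out in small batches.

```python
def plan_removal(active, requested, max_per_batch=2, min_capacity=3):
    """Split a removal request into small batches (hypothetical sketch).

    active        -- names of servers currently in service
    requested     -- names of servers the operator wants to remove
    max_per_batch -- assumed cap on how many servers go away at once
    min_capacity  -- assumed floor on servers that must stay in service
    """
    active_set = set(active)

    # Reject names that are not in service; a typo should stop the command.
    unknown = [name for name in requested if name not in active_set]
    if unknown:
        raise ValueError(f"unknown servers: {unknown}")

    # Drop duplicate names while keeping the operator's order.
    unique = list(dict.fromkeys(requested))

    # Refuse any request that would leave too few servers running.
    remaining = len(active_set) - len(unique)
    if remaining < min_capacity:
        raise ValueError(
            f"request leaves {remaining} servers, below the floor of {min_capacity}"
        )

    # Hand back small batches so removal happens gradually.
    return [
        unique[i:i + max_per_batch]
        for i in range(0, len(unique), max_per_batch)
    ]
```

    With six servers in service, a request to remove three of them would come back as one batch of two followed by one batch of one, while a request to remove four would be refused outright because it would leave only two running.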

    Amazon also issued an apology for the impact the outage had on businesses. “Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further.”

    System outage aside, AWS is continuing to play an ever larger role in Amazon’s finances and business offerings. In Q4 2016, the public cloud provider raked in $926 million in operating income and $3.53 billion in revenue and made up 8 percent of Amazon’s total revenue for the quarter.

    Kelly Paik writes about science and technology for Fanvive. When she's not catching up on the latest innovations, she spends her free time painting and roaming places with languages she can't speak, because she rather enjoys fumbling through cities and picking things off the menu through a process of eeny, meeny, miny, moe.