An Analysis of a Global Cloud Computing System Failure

The paper digs into the true root causes of Skype’s global outage last Christmas, focusing on which mechanisms, tools, and operational and engineering practices could have prevented the failure from escalating into a global outage, or from happening at all. It derives 11 practical lessons on how to build reliable clouds and other types of large-scale systems (including those built on cloud platforms). Since the first version of this paper was written, another spectacular outage took place, this time a regional outage of the Amazon EC2 cloud. Amazon’s postmortem suggests that following these guidelines would have prevented or contained that failure as well.

Senior Manager, Architecture Innovation, Accenture

Author’s Bio:

Alex brings over two decades of technology leadership experience. His interests include cloud, Internet-scale computing, dependable systems, and SOA. Alex built his first global cloud last century, delivered Amazon’s Auto Scaling, and launched several Cloud 2.0 initiatives. He is now working on a system designed for 1.2 billion users.

Short URL: http://vertical-cloud.com/?p=3464

Posted on Mar 27 2012. Filed under Administrator, Architecture, Availability, Engineer, Enterprise, Featured, Information Technology, Intermediate, Large, Manager, Mid-Sized, Operations, Performance, Public Clouds, Reliability, Services, Testing, Transformation.