Facebook engineering team reveals underlying causes of October 4th outage
The engineering team at Facebook has revealed the underlying factors that led to a six-hour global outage that also affected WhatsApp and Instagram.
The global social networking company lost billions of dollars to the outage, which started at about 4.43pm (WAT) on Monday.
According to NetBlocks, which tracks internet outages and their impact, the outage had already cost the global economy about $160m per hour.
Facebook's stock fell by about 5.5 per cent. Mark Zuckerberg, the chief executive officer of Facebook, has also lost nearly $7bn since the outage began.
Facebook and its affiliates have 2.9 billion monthly active users. The shutdown is reminiscent of 2019, when Facebook suffered its biggest outage to date (a 24-hour shutdown).
For hours, users navigating to the Facebook, Instagram, and WhatsApp websites saw a server error, which indicated there was an issue with Facebook’
This was confirmed in a blog post, ‘Update about the October 4th outage’, written by Santosh Janardhan, vice president, Engineering and Infrastructure at Facebook.
He started by apologizing: “To all the people and businesses around the world who depend on us, we are sorry for the inconvenience caused by today’s outage across our platforms.
“We’ve been working as hard as we can to restore access, and our systems are now back up and running.”
Janardhan wrote that the underlying cause of this outage also impacted many of the internal tools and systems “we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem”.
“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication”.
The disruption to network traffic, he said, had a cascading effect on the way Facebook’s data centers communicate, bringing its services to a halt.
“Our services are now back online and we’re actively working to fully return them to regular operations.
“We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime.
“People and businesses around the world rely on us every day to stay connected. We understand the impact outages like these have on people’s lives, and our responsibility to keep people informed about disruptions to our services.
“We apologize to all those affected, and we’re working to understand more about what happened today so we can continue to make our infrastructure more resilient”.