By Gemma Joyce, Apr 17
Published February 18th 2016
How many seconds are in a minute? When the answer is 61, the internet has a meltdown.
Because the Earth's rotation is gradually slowing down, a leap second needs to be added every now and then to keep the world's clocks aligned with the Earth's rotation, and so with the Sun and the stars.
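Part of the trouble is that a lot of everyday software has no slot for a 61st second at all. As one illustration (Python's standard `datetime` module, which is not something the sites below are claimed to have used; `can_represent` is a hypothetical helper), the 2012 leap second at 23:59:60 UTC on 30 June cannot even be constructed, and adding one second to 23:59:59 jumps straight to midnight:

```python
from datetime import datetime, timedelta


def can_represent(year, month, day, hour, minute, second):
    """Return True if datetime accepts the given timestamp, False otherwise."""
    try:
        datetime(year, month, day, hour, minute, second)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59
        return False


# The last ordinary second before the 2012 leap second is fine.
print(can_represent(2012, 6, 30, 23, 59, 59))  # True
# The leap second itself, 23:59:60, is rejected.
print(can_represent(2012, 6, 30, 23, 59, 60))  # False

# Naive arithmetic skips the leap second entirely.
print(datetime(2012, 6, 30, 23, 59, 59) + timedelta(seconds=1))  # 2012-07-01 00:00:00
```

Software built this way quietly pretends the extra second never happened, which is harmless until two parts of a system disagree about what time it is.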
The problem is that every time it has happened in recent memory, large sections of the internet have not been able to cope and have gone offline.
Reddit, Twitter, LinkedIn and Slack are just some of the sites that have been affected in the last few years.
But it’s not just rogue seconds that take websites offline. Depending on where you live in the world, you may remember Twitter being mysteriously unavailable just one month ago.
Not long afterwards Instagram, Tinder and Facebook went down too.
As we increasingly rely on social media to keep in touch with friends, stay up to date with the news and find funny videos, the prospect of being cut off is enough to send us into panic mode.
The 2012 leap second took Reddit down, among other sites.
Reddit runs on Linux, the open source operating system, and its servers nearly ground to a halt after the leap second was added. The problem lay in the kernel's high-resolution timer (hrtimer) subsystem, which is used when an application is sleeping while it waits for the operating system to complete another task.
In some cases it sets an alarm clock to wake the sleeping application if the OS is taking too long.
When the leap second hit, these timers were left running a second ahead of the rest of the OS. The alarm clocks rang early, waking countless sleeping applications at once and overloading the machines' CPUs.
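The failure can be sketched as a toy simulation. This is not kernel code: `count_wakeups` is a hypothetical function, times are whole milliseconds, and each simulated application simply re-arms a 500 ms alarm every time it wakes. When the timer's clock runs one second ahead of the clock used to set the alarms, every alarm shorter than a second is already "due" the moment it is set, so applications wake on every tick instead of once every half-second:

```python
def count_wakeups(num_apps, timeout_ms, offset_ms, duration_ms, tick_ms=10):
    """Count alarm wakeups over `duration_ms` of real time.

    Each app sets an alarm `timeout_ms` ahead of the real clock and re-arms
    it every time it wakes. The timer checks alarms against its own clock,
    which runs `offset_ms` ahead of the real one.
    """
    now = 0
    deadlines = [timeout_ms] * num_apps  # every app starts with one alarm set
    wakeups = 0
    while now < duration_ms:
        timer_now = now + offset_ms  # the timer's (possibly wrong) idea of "now"
        for i, deadline in enumerate(deadlines):
            if deadline <= timer_now:
                wakeups += 1
                deadlines[i] = now + timeout_ms  # re-arm using the real clock
        now += tick_ms
    return wakeups


# Correct clock: 1,000 apps each wake once in one second.
print(count_wakeups(1000, 500, 0, 1000))     # 1000
# Timer clock one second ahead: every app wakes on every 10 ms tick.
print(count_wakeups(1000, 500, 1000, 1000))  # 100000
```

A hundredfold jump in wakeups, with no useful work done on any of them, is the kind of load that pinned CPUs across many servers at the same moment.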
“Almost every time we have a leap second, we find something,” Linus Torvalds, Linux’s creator, told Wired. “It’s really annoying, because it’s a classic case of code that is basically never run, and thus not tested by users under their normal conditions.”
In July 2015 several websites crashed: Instagram, Pinterest, Amazon and Netflix went offline for 40 minutes, and thousands of other sites were affected too.
Initially it was thought that the leap second bug had struck again, until the real reason emerged. These websites all rely on Amazon Web Services, which suffered a brief network disruption on July 1st.
Amazon hosts a huge number of websites (11.6 million as of May 2013), so any problem with AWS affects large portions of the net.
When Amazon realized how big a problem the connectivity issue was causing, it paused all requests to its servers so it could bring additional capacity online and restart the servers.
Basically, they had to turn it off and on again. Luckily the problem was resolved before the hour was out, meaning Instagram users were free to continue posting gorgeously filtered photos of their food before it had time to go cold.
January 2015 saw Facebook, Instagram and Tinder go down for about an hour, and this time the reason was disputed.
Facebook blamed the problem on ‘a change that affected our configuration systems’. Some claimed that the huge snowstorm that was pummeling the US East Coast had knocked out its servers.
One other possible explanation emerged 30 minutes into the outage, when a hacker group called Lizard Squad claimed responsibility for bringing the world’s favorite social media sites down.
Facebook denied the group was responsible, but Lizard Squad has a history of attacks: it brought down Xbox Live and PlayStation Network over Christmas 2014 with a DDoS (distributed denial-of-service) attack.
So how do people react when their favorite network goes down? In a word: panic.
We have grown so used to being connected via social networks that, for some people, constant access to them is as important as electricity and oxygen. The response tends to combine outrage, fear and humor.
When Twitter was briefly offline in January 2015, users were quick to laugh at the support team for tweeting about the outage, since of course nobody experiencing problems would have been able to see that tweet.
Some people use it as an opportunity to explore the real world.
Others see it as a sign of the coming apocalypse.
Next time a social network goes down, we recommend staying calm, stepping away from the computer, and trying again in 30 minutes.
Social networks hate going down for any amount of time, so you can rest assured that a lot of developers are running around in a panic while being shouted at. Before too long, the service should be up and running again, and normal life will resume.