The digits zero and one are the natural language of computers. Almost anything can be represented inside a computer's memory simply by arranging zeros and ones in the proper sequence. However, because most computer memory stores each of these binary digits (bits) as nothing more than a microscopic electrical charge, bits are also susceptible to the conditions of their physical environment.
Our bits are stored inside increasingly compact devices that operate in the harsh environment of planet Earth. Many of these devices are routinely subjected to extreme temperatures, as well as hazards such as cosmic rays, which strike the Earth's surface as often as 10,000 times per square meter per minute. Under conditions like these, a stored one occasionally flips to become a zero, or vice versa.
For ordinary Internet users, bit errors can have a profound effect on Internet traffic. For example, a single flipped bit can turn the domain name "s.ytimg.com" into "snytimg.com". When this happens, Internet traffic originally destined for YouTube is sent to a completely different address. That's because, in ASCII, the letter n differs from the dot character by only one binary digit.
Other characters share a similar relationship. The letter o and the forward slash (/) differ by only one binary digit, as do the letter c and the character #. These characters can also cause mischief in the routing of Internet traffic. There is even a word for registering these bit-error domains: bitsquatting. Misdirecting Internet traffic to malicious bitsquatted domains has serious implications for computer security. But bit errors can also have far worse, even life-threatening, consequences.
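To make the bit arithmetic concrete, here is a minimal Python sketch. The function names and the simplified hostname alphabet are illustrative assumptions, not part of any real bitsquatting tool. XOR-ing two character codes exposes the bits in which they differ, and flipping each bit of each character in a domain, one at a time, lists the bitsquat variants that still use hostname characters.

```python
import string

# Simplifying assumption: lowercase ASCII letters, digits, hyphen, and dot
# are the only characters allowed in a hostname.
HOSTNAME_CHARS = set(string.ascii_lowercase + string.digits + "-.")

def bit_distance(a: str, b: str) -> int:
    """Count the bits that differ between two ASCII characters."""
    return bin(ord(a) ^ ord(b)).count("1")

def flip_bit(text: str, index: int, bit: int) -> str:
    """Return a copy of `text` with one bit of one character inverted."""
    flipped = chr(ord(text[index]) ^ (1 << bit))  # XOR toggles exactly one bit
    return text[:index] + flipped + text[index + 1:]

def bitsquat_candidates(domain: str) -> set[str]:
    """Every string one bit flip away from `domain` that uses only hostname characters."""
    candidates = set()
    for i in range(len(domain)):
        for bit in range(8):
            variant = flip_bit(domain, i, bit)
            # Flipping bit 5 of a letter only changes its case; uppercase is
            # excluded because DNS names are case-insensitive.
            if variant[i] in HOSTNAME_CHARS:
                candidates.add(variant)
    return candidates

for a, b in [("n", "."), ("o", "/"), ("c", "#")]:
    print(a, b, bit_distance(a, b))  # prints 1 for each pair

print(flip_bit("s.ytimg.com", 1, 6))                        # snytimg.com
print("snytimg.com" in bitsquat_candidates("s.ytimg.com"))  # True
```

The filter is deliberately loose: it does not enforce full hostname syntax, such as the rule that a label may not begin or end with a hyphen, so a real survey of bitsquat domains would need further checks.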
Consider a 2005 advisory from St. Jude Medical in Mississauga, Ontario, to doctors who had surgically implanted one of five models of implantable cardioverter defibrillators (ICDs). These devices use electric shocks to stimulate the heart muscle and help prevent sudden cardiac arrest. According to the advisory, cosmic radiation-induced bit flips affecting ICD memory chips "can trigger a temporary loss of pacing function and permanent loss of defibrillation support." Among the 36,000 installed devices, the advisory said, there were 60 reported cases of the anomaly, a significant failure rate of about 0.17%.
Fasten your seat belt
In Australia in 2008, Qantas Flight QF72 was carrying more than 300 passengers at cruising altitude when it suddenly nose-dived 650 feet. The pilots managed to bring the plane back to its original altitude before it plunged again, this time falling 400 feet. Some passengers were thrown from their seats, and some were ejected from their seatbelts, according to a 313-page report by the Australian Transport Safety Bureau (ATSB). Some passengers were flung so violently that the impact damaged the cabin ceiling.
The ATSB investigation was able to eliminate almost every potential cause of the failure except one: a bit error in one of the airplane's computers, caused by cosmic radiation. According to the ATSB report, "The CPU modules for the two affected units did not have error detection and correction (EDAC)."
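The ATSB quote does not say which EDAC scheme was missing, so the following is only an illustration of the principle. Error-correcting memory stores extra check bits alongside the data so that a single flipped bit can be located and repaired when the word is read back; commercial ECC DIMMs, for example, store 72 bits for every 64 bits of data. The Python sketch below uses the classic Hamming(7,4) code as a deliberately tiny stand-in, and its function names are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Positions 1..7 hold p1 p2 d1 p3 d2 d3 d4 (parity bits sit at powers of two).
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # recheck positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # binary position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]], syndrome

word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                   # simulate a single bit flip at position 6
print(hamming74_decode(word))  # ([1, 0, 1, 1], 6)
```

Flipping any one of the seven stored bits yields a nonzero syndrome that points at the damaged position. Two simultaneous flips would be miscorrected, which is why practical memory controllers typically add one more parity bit for double-error detection (SECDED).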
Bit errors were also the focus of attention in a series of highly publicized lawsuits against Toyota Motor Corp. over an alleged flaw in the electronic throttle control system that caused cars to accelerate spontaneously out of control. Last fall, the company settled a lawsuit in Oklahoma City after a jury returned a $3 million verdict in favor of two victims of a crash (one of whom died). An expert witness testified that a single flipped bit in the car's computer memory, perhaps caused by cosmic radiation, could trigger runaway acceleration, and that the working memory in the throttle system did not have EDAC. Just this week, Toyota reached a $1.2 billion settlement with the US Department of Justice after a criminal probe of the carmaker's safety record related to unintended acceleration.
As we surround ourselves with more and more so-called smart devices, it's important to be mindful of consequences that may not be obvious from the start. Gartner predicts that by 2020 there will be more than 26 billion Internet-connected "things," not including PCs, tablets, or smartphones. These things will range from smart home climate controllers and door locks to cloud-connected picture frames, and even smart Crock-Pots and toilets. All of them are susceptible to bit errors, and few are likely to be protected against them, because adding error-correcting (ECC) memory inflates the base cost of an item beyond what consumers are willing to pay.
A 2009 study of DRAM in Google's servers found the rate of these errors in the wild to average anywhere from 25,000 to 75,000 FIT (failures in time per billion hours of operation) per Mbit. If 26 billion things are connected to the Internet by 2020, together they will log 26 billion device-hours of operation every hour, which works out to between 650,000 and 1,950,000 errors per hour for each megabit of memory per device. A modest installation of only 128 megabytes of RAM contains 1,024 megabits. Thus, if every thing carried at least that much memory, we could expect anywhere from 665.6 million to nearly 2 billion (1,996,800,000) errors per hour across the entire Internet of Things.
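The estimate is simple enough to check mechanically. The sketch below reproduces it in Python under the same assumptions the paragraph makes: all 26 billion things run around the clock, and each has 128 megabytes (1,024 megabits) of RAM. The constant and function names are illustrative only.

```python
# Inputs from the paragraph above; 128 MB per device is the article's assumption.
FIT_PER_MBIT_LOW = 25_000   # errors per billion device-hours per Mbit
FIT_PER_MBIT_HIGH = 75_000
THINGS = 26_000_000_000     # Gartner's forecast for 2020
MBIT_PER_THING = 128 * 8    # 128 megabytes = 1,024 megabits

def errors_per_hour(fit_per_mbit: int) -> int:
    """Expected bit errors per hour across all things, assuming every device runs continuously."""
    device_hours_per_hour = THINGS  # each device logs one hour of operation every hour
    # Integer division is exact here because 26 billion is a multiple of 1 billion's factors used.
    errors_per_mbit = fit_per_mbit * device_hours_per_hour // 1_000_000_000
    return errors_per_mbit * MBIT_PER_THING

print(f"{errors_per_hour(FIT_PER_MBIT_LOW):,}")   # 665,600,000
print(f"{errors_per_hour(FIT_PER_MBIT_HIGH):,}")  # 1,996,800,000
```

Because FIT is defined per billion device-hours, the device count simply scales the rate; doubling the assumed memory per device would double both totals.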
These errors will undoubtedly affect us all. Let's chat about how in the comments.