Folks, how many of you remember the good old days when the IT guy would show up at your desk with a floppy disk or CD carrying the latest patches? It was the perfect time for some fun chit-chat while the updates did their thing on the screen. On one of those days, amid the usual laughter and tech talk, we heard about a tragic aircraft disaster that affected us deeply.
The Tenerife Airport Disaster
The Tenerife airport disaster took place on March 27, 1977, when two Boeing 747 passenger jets collided on the runway at Los Rodeos Airport on the Spanish island of Tenerife. The collision happened as KLM Flight 4805 began its takeoff in dense fog while Pan Am Flight 1736 was still on the runway. The impact and subsequent fire resulted in the deaths of all 248 people on the KLM plane and 335 of the 396 people on the Pan Am plane, leaving only 61 survivors in the front section of the latter aircraft. With a total of 583 fatalities, this disaster remains the deadliest accident in aviation history.
The most intriguing aspect of this disaster is the chain of events that led to it. Let’s look at them one by one.
Event 1: The Diversion
On March 27, 1977, a terrorist bombing at Gran Canaria Airport forced multiple flights, including KLM Flight 4805 and Pan Am Flight 1736, to be diverted to Los Rodeos Airport on the Spanish island of Tenerife.
Event Probability: 1/1000
Event 2: The Capacity Limitation
Los Rodeos was a regional airport that couldn't easily handle the traffic diverted from Gran Canaria, including five large airliners. With only one runway and one major taxiway, the diverted planes had to park on the taxiway, making it unusable for taxiing.
Event Probability: 1/100
Event 3: The Fog
As the diverted flights awaited clearance to return to Gran Canaria, dense fog began to settle over Los Rodeos Airport. The heavy fog significantly reduced visibility, making it challenging for pilots and air traffic controllers to see the runways and taxiways clearly.
Event Probability: 1/100
Event 4: Miscommunication and Language Barriers
With visibility severely compromised, clear and precise communication became crucial. However, a series of miscommunications and language barriers between the flight crews and air traffic control led to confusion about taxiing instructions and takeoff clearance. The KLM crew mistakenly assumed they had received takeoff clearance from air traffic control. Despite the lack of explicit permission, the captain of the KLM flight initiated the takeoff roll, believing the runway was clear.
Event Probability: 1/10
Event 5: Runway Confusion
At the same time, the crew of Pan Am Flight 1736 had been instructed to follow the KLM aircraft and exit the runway. Due to the poor visibility and unclear signage, they missed the designated exit and continued down the runway, unaware of the KLM plane's intentions.
Event Probability: 1/100
Event 6: The Collision
As KLM Flight 4805 accelerated for takeoff, the Pan Am crew suddenly saw the KLM aircraft emerging from the fog, heading straight toward them. In a desperate attempt to avoid a collision, the Pan Am crew tried to turn sharply off the runway, but it was too late. The KLM aircraft collided with the Pan Am plane, resulting in a catastrophic explosion and fire. The exact timing of both planes being on the runway at the same moment was the result of all previous delays and errors compounding into this fatal incident.
Event Probability: 1/1000
Combined Probability
1/1000 * 1/100 * 1/100 * 1/10 * 1/100 * 1/1000
= 1 / 10,000,000,000,000
Treating the six events as independent and multiplying these rough estimates puts the odds at 1 in 10 trillion. The incident was that rare and that unfortunate, yet it occurred in our times and claimed 583 precious lives.
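The multiplication can be checked with a short Python sketch. It simply reuses the rough per-event estimates from the sections above and assumes the events are independent, which is a simplification (the diversion, for example, directly caused the capacity problem). The dictionary keys and the name `tenerife_combined` are illustrative only.

```python
from fractions import Fraction
from math import prod

# Rough per-event estimates copied from the sections above.
tenerife_events = {
    "diversion": Fraction(1, 1000),
    "capacity limitation": Fraction(1, 100),
    "fog": Fraction(1, 100),
    "miscommunication": Fraction(1, 10),
    "runway confusion": Fraction(1, 100),
    "collision timing": Fraction(1, 1000),
}

# Multiplying is only valid if the events are treated as independent.
tenerife_combined = prod(tenerife_events.values())

# prints: Combined probability: 1 in 10,000,000,000,000
print(f"Combined probability: 1 in {tenerife_combined.denominator:,}")
```

Using `Fraction` instead of floats keeps the result exact, so the denominator can be read off directly.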
It has ingrained in us the importance of reducing each risk individually, rather than dismissing the whole chain because its combined probability looks negligible from a cost perspective.
Yet to this day, modern risk assessment still tends to base decisions on combined probability rather than on reducing the likelihood of each individual event.
CrowdStrike Strikes Microsoft
On July 19, 2024, we faced another incident that is still causing widespread inconvenience and mayhem at the time of writing this blog.
The Incident
The CrowdStrike and Microsoft incident began with a CrowdStrike update that caused compatibility issues with Windows, leading to widespread system crashes and Blue Screen of Death (BSOD) errors.
The Impact
This incident had a global impact because the combination of Windows and CrowdStrike is so widely adopted in modern critical infrastructure, such as:
Aviation
Healthcare
Border and Immigration Services
Manufacturing
Emergency Response Systems
As before, it is worth walking through the events that led to this incident one by one:
Event 1: Windows in Critical Infrastructure
Windows is widely adopted in critical sectors such as healthcare, finance, and aviation due to its reliability, comprehensive support, and market dominance.
Event Probability: 1/2
Event 2: CrowdStrike on Windows
CrowdStrike's advanced threat detection and response capabilities have made it a preferred choice in the cybersecurity market, particularly for protecting endpoints.
Event Probability: 1/3
Event 3: Windows Update / Third-Party Update Compatibility Issues
Compatibility issues between updates from different vendors are infrequent, but still plausible given the complexity of software environments.
Event Probability: 1/100
Event 4: Blue Screen Errors (BSOD) due to Updates
The specific occurrence of BSOD due to update conflicts is relatively rare but possible given the right circumstances.
Event Probability: 1/1000
Combined Probability
1/2 * 1/3 * 1/100 * 1/1000
= 1 / 600,000
Interestingly, that works out to a 1 in 600,000 chance, close to one in a million, that this could occur.
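A small Python sketch helps show why odds this long can still come up. It first reproduces the product of the four estimates above, then asks how likely at least one occurrence becomes when the same gamble is repeated many times. The helper `chance_of_at_least_one` and the opportunity counts are hypothetical and chosen only for illustration. The sketch also assumes each opportunity is independent.

```python
from fractions import Fraction
from math import prod

# Rough per-event estimates copied from the CrowdStrike sections above.
crowdstrike_events = [
    Fraction(1, 2),     # Windows in critical infrastructure
    Fraction(1, 3),     # CrowdStrike on Windows
    Fraction(1, 100),   # update compatibility issue
    Fraction(1, 1000),  # BSOD due to the update
]
crowdstrike_combined = prod(crowdstrike_events)
print(f"Combined probability: 1 in {crowdstrike_combined.denominator:,}")


def chance_of_at_least_one(p, opportunities):
    """Chance that an event with per-opportunity probability p occurs
    at least once across the given number of independent opportunities."""
    return 1 - (1 - float(p)) ** opportunities


# Hypothetical opportunity counts, for illustration only.
for n in (1, 1_000, 600_000, 6_000_000):
    chance = chance_of_at_least_one(crowdstrike_combined, n)
    print(f"{n:>9,} opportunities -> {chance:.4%}")
```

The takeaway under these assumptions: a tiny per-opportunity probability stops being negligible once the number of opportunities approaches the size of its denominator.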
The Mitigation
Slow is smooth, and smooth is fast
Small is good, and good is efficient
In our opinion, the step one could take is:
Minimize each risk individually rather than writing them off collectively
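To make this concrete, here is a minimal Python sketch using the rough CrowdStrike estimates from earlier. It compares the baseline chain with two hypothetical mitigations applied to a single event: one that makes it 100 times less likely, and one that rules it out entirely. The helper `with_mitigation` and the reduction factor are assumptions for illustration, not measurements.

```python
from fractions import Fraction
from math import prod

# The rough CrowdStrike estimates, in the order the events were listed.
baseline = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 100), Fraction(1, 1000)]


def with_mitigation(probabilities, index, factor):
    """Return a copy in which the event at `index` is `factor` times less likely."""
    mitigated = list(probabilities)
    mitigated[index] = mitigated[index] / factor
    return mitigated


# Hypothetical: a mitigation makes event 4 (BSOD due to updates) 100x less likely.
reduced = with_mitigation(baseline, 3, 100)

# Hypothetical: event 4 is ruled out entirely, which breaks the whole chain.
eliminated = list(baseline)
eliminated[3] = Fraction(0)

for label, events in [("baseline", baseline), ("reduced", reduced), ("eliminated", eliminated)]:
    print(f"{label:>10}: {prod(events)}")
```

Every improvement to a single link carries through to the whole chain, and removing any one link brings the combined probability to zero.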
We are thankful that this incident carried no major human cost, but it is important to remember that nature DOES NOT always repeat its lessons kindly.