In this chapter Woods and Patterson explain how events like incidents can cause a drastic increase in the need to coordinate with others as well as an increase in the amount of cognitive function that we as responders must bring to bear.
Many of us have probably experienced this already. The authors use it as a lens to understand why there can be a gap between how technology is predicted to help us and how it actually plays out in practice, often causing more trouble and more, or at least different, failure modes than the things it replaced.
The authors begin by defining “the escalation principle”: as the situations we face move away from normal, everyday problems toward the chaotic and abnormal, we increasingly have to coordinate with more people and think harder about the problem at hand.
The more the underlying system is disturbed, the more difficult the problem becomes, the faster we need to move, and the more we need to coordinate with others and take in information.
On top of this, we often have to use technology systems, whether that’s some form of automation, a modeling product, or something else. If they were not designed with this sort of escalation in mind, then at the very point where we need them most they can burden us the most.
Drawing from Woods’s previous work, there is a multistep process that occurs during an event:
- The original fault occurs, effects cascade through the system
- While that’s happening, responders are trying to deal with it. Cognitive demand increases.
- At the same time, because the fault is propagating through the system, experts may be brought in, so now we need to coordinate more.
This is a dynamic process: it is continuing to change over time both as the problems are felt throughout the system, but also as responders take action and change the incident itself.
These are the moments where we need our tools the most, but also where they can hamper us the most. As the authors put it: “Difficulties arise because interacting with the technological devices is a source of workload as well as a potential source of support.”
This is the sort of thing we want to avoid when building tools for ourselves, for teams, for anyone. We want our tools to help people when they are at their busiest, when the most data is being thrown at them, not just when things are quiet.
Of course, people cope in their own ways as this happens. When tools fail to take escalation into account, responders are likely to change their strategies, minimize communication, or in some cases ignore a particular system altogether so they can focus on other things.
The authors use a space shuttle launch as an example of how various demands can escalate. It starts just after liftoff: a controller noticed low hydraulic fluid levels in an auxiliary power unit (APU), which they took to mean a hydraulic leak. That set off questions, both of themselves and of the system. Did they have to do something right away? How bad was the leak? Could it endanger the astronauts? Where was the line at which they would abort for this problem?
The controller determined the leak was slow enough that it wasn’t over the abort threshold, so the mission could proceed. At the same time, another group was informing the flight director and others in mission control of the leak. The authors point out that these people coordinate in a process that is “cognitively economical,” which helps everyone stay up to date. They use “voice loops” that allow everyone to talk and hear each other across multiple channels.
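The controller’s threshold reasoning can be sketched as a toy calculation. This is a hypothetical illustration only, not the actual flight rule: the function, parameter names, units, and numbers are all invented for the sake of the sketch.

```python
# Hypothetical sketch of "is this leak over the abort threshold?" reasoning.
# All names and numbers are invented; this is not a real NASA procedure.

def must_abort(level_now, level_earlier, minutes_elapsed,
               min_safe_level, minutes_until_apu_shutdown):
    """Return True if the projected fluid level at APU shutdown
    would fall below the minimum safe level."""
    # Estimate the leak rate from two observations.
    leak_rate = (level_earlier - level_now) / minutes_elapsed  # units/min
    # Project the level forward to when the APU is no longer needed.
    projected = level_now - leak_rate * minutes_until_apu_shutdown
    return projected < min_safe_level

# A slow leak (0.2 units/min) with plenty of margin: proceed, don't abort.
print(must_abort(level_now=90.0, level_earlier=92.0, minutes_elapsed=10,
                 min_safe_level=40.0, minutes_until_apu_shutdown=8))  # False
```

The point of the escalation principle is everything this sketch leaves out: the real judgment involved deciding how trustworthy the readings were, who else needed to know, and what the answer meant for the rest of the mission.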
Next, they had to plan how to respond to this new information about the leak: how to balance keeping everyone safe while continuing the mission, goals that can be in conflict. Ultimately they decided to change the order in which the APUs were shut down so they could get more information on the problem, and communicated this to the astronauts.
All the while, there was new communication to be done, new things to assess, new things to respond to. Experts on the APUs were brought in, raising new issues and resolving others. Meetings were held to decide what needed to change in the mission and how to calibrate everyone’s expectations.
Ultimately, additional personnel were brought in and brought up to speed so they could help with all these demands. This example demonstrates how escalation can occur: it all started with something very small, noticing a drop in hydraulic fluid level, but ballooned into multiple teams, multiple meetings, and multiple changes.
This escalation principle helps explain a pattern the authors had observed before: practitioners being impacted by technology in this way. They call it “clumsy automation,” a term Wiener coined in 1989.
Clumsy automation happens when a human cannot coordinate well with a machine, and as a result the machine or system is only really useful during periods of low demand and calm workload.
This stands in stark contrast to how these systems were justified and funded: they were supposed to remove work from busy people. Despite that, the authors have found they often create new work, forcing users to adopt new strategies, know more, and communicate even more at the very time they can least accommodate it. This opens up new risks that perhaps didn’t exist in simpler times.
“Our fascination with the possibilities afforded by automation often obscures the fact that new automated devices also create new burdens and complexities for the individuals and teams of practitioners responsible for operating, troubleshooting, and managing high-consequence systems.”
Was there ever a more apt description of things that we should watch out for when building tools?
The authors also touch on a second pattern they’ve noticed: trying to make automation increasingly intelligent, even to the point of creating AI, still doesn’t solve the problem. This is primarily because of how it comes up with answers. These “expert systems” didn’t behave like a normal member of the team or like other humans might. They don’t care how busy or focused you are; they may still throw lots of data at you. They come to you with conclusions, but you can’t see how they got there. This causes people to adapt, often by ignoring them.
So why do things get built this way? The authors suggest there are likely many reasons, but the escalation principle offers a partial view. Because the technology works in lower-demand scenarios, when things aren’t going crazy, it’s easy to design only for those scenarios. Lower workload is the norm; periods of high escalation are relatively rare. Additionally, responders adapt, coping successfully despite these tools being burdensome. When they succeed, whether by ignoring the new system entirely or by some other means, that success can mask the burden.
So what can we do about it?
The authors give us a few tips about things that we need to make sure we’re designing for and considering. Specifically:
- How is new knowledge going to be integrated during an escalating situation?
- How are more resources going to be obtained in the situation to help deal with these new demands?
- How are those people brought in going to get up to speed quickly?
“The concept of escalation is not simply about problems, demands on cognition or on collaboration, or technological artifacts. Rather, it captures a dynamic interplay between all these factors.”