- “An excellent strategy in one context can be the worst in another”
- “Attentional checking and action planning are tightly connected”
- “Poor interface and display system designs can degrade control of attention and monitoring by creating tunnel vision or getting lost effects”
- “Human performance can improve if one frees up the processing capacity of the human in the loop”
This is a paper by De Keyser and Woods from Systems Reliability Assessment: Proceedings of the Ispra Course in which they discuss the problem of operators diagnosing complex problems becoming fixated, both in their reasoning and in their behavior.
Primarily, they cover how fixation occurs, with a few suggestions on how to improve. Having an idea of what leads to fixation and what it looks like can be useful, though.
They define several different patterns of fixation:
- “Everything but that” - when one seems to have a lot of different theories but never the right one. From the outside this looks like jumping from one action to another and never seeing results. This is the same as what Rasmussen calls “thematic vagabonding”.
- “This and nothing else” - when someone is stuck on just one strategy or one goal and is set in their ways. From the outside this is the easiest one to notice because of how often the person will repeat their actions or statements without result. There are cases of this type of fixation where they will note the lack of results, but still won’t change their strategy. They have some sort of hypothesis about the cause or about what the correct responses are, and can’t move on to other ones.
- “Everything is okay” - an operator doesn’t react to change. This can be the case even when there are a lot of cues and evidence that something is wrong; they will still ignore them. This can occur because ignoring cues may not be wrong in all cases. For example, with false alarms, especially false alarms happening very often, it can make sense to ignore them. This can be a form of alarm fatigue.
Fixation events are heavily influenced by “attentional checking”, when we’re engaged and purposefully bringing our attention to something. Of course, we have limited resources, so we must spend our attention in limited places. We cannot simply take in all of the possible information in complex situations, so we must pick some subset. We then compare what we’re seeing to our mental model of the system or situation. It’s a mismatch between the observed state and our model of the normal state that will cause us to intervene.
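As a loose software analogy (my sketch, not something from the paper), attentional checking is a bit like a monitoring loop that can only afford to sample a chosen subset of signals and compares them against a model of what “normal” looks like; the signal names and thresholds below are made up:

```python
# Hypothetical sketch: attentional checking as a monitoring loop.
# We can only sample a subset of signals, and we intervene only when
# an observed value mismatches our model of the "normal" state.

EXPECTED = {  # our "mental model": (min, max) considered normal
    "queue_depth": (0, 100),
    "error_rate": (0.0, 0.01),
    "p99_latency_ms": (0, 500),
}

def check(observed: dict) -> list:
    """Return the signals whose observed values fall outside the model."""
    mismatches = []
    for signal, (low, high) in EXPECTED.items():
        value = observed.get(signal)
        if value is None:
            continue  # outside our field of attention; we never sampled it
        if not (low <= value <= high):
            mismatches.append(signal)
    return mismatches

# It's the mismatch, not the raw data, that triggers intervention:
if check({"queue_depth": 4200, "error_rate": 0.002}):
    print("observed state disagrees with the model; time to intervene")
```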
There are three different parts of attentional checking:
Sensitivity - The ability of an investigator to see the differences between different states. Typically, the more experienced someone is, the better they’re able to discriminate very fine details in different state changes. You may have observed this in others, especially if you are not a car person: your mechanic can hear the sound of an engine, note that it’s different, and maybe even know the problem from that.
Degree of extension - This is essentially how widely you spread your attention. It’s influenced by a lot of things, like how high your mental workload is. The amount of information that you can take in can decrease even as risk increases. The organization and its structure may also come into play here. Research has shown that if you have a really rigid organizational structure, you may be prone to restricting your attention to just the area you can control.
Depth of checking - This is all about what level of abstraction you are observing. If you are taking action at a very low level, you might be tempted to look for results at only that very low level and potentially miss what’s happening at the higher level.
For example, if you make a configuration file change, you may restart a daemon to see if that change actually took, but may not zoom out to look at the broader system. That mismatch between abstraction levels means the chance to see the relevant cue will be missed. This isn’t always low to high; it can be high to low. Any difference in the level of abstraction being checked can be prone to this.
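As a hedged sketch of what checking at more than one abstraction level can look like in practice (the daemon name, port, and health endpoint here are hypothetical):

```python
# Illustrative sketch: verify a change at multiple abstraction levels,
# not just at the level where the action was taken.
import subprocess
import urllib.request

def low_level_check() -> bool:
    """Did the daemon actually come back up with the new config?"""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "mydaemon"],  # hypothetical unit name
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return False  # no systemctl on this machine
    return result.stdout.strip() == "active"

def high_level_check() -> bool:
    """Zoom out: is the broader system still healthy end to end?"""
    try:
        with urllib.request.urlopen("http://localhost:8080/healthz", timeout=5) as resp:
            return resp.status == 200  # hypothetical health endpoint
    except OSError:
        return False

# Checking only low_level_check() is exactly the abstraction-level
# mismatch described above; pair it with the higher-level view.
print("daemon active:", low_level_check(), "| system healthy:", high_level_check())
```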
So why do we do this? We do these things so that we can “economically” reason in cases where we can’t take in everything. But at some point, we need to do some sort of checking again as the system changes over time, in order to be effective. This reassessment is what would cause us to rethink our course of action.
Other fixation reasons
Some fixation can occur because we just don’t have the knowledge to recognize or respond to a given situation. When we don’t know, we are prone to making mistakes. This manifests in a few different ways:
The functional behavior of the system is not well understood - This is when we don’t understand all the links, all the variables, and how they interact. Training can often correct this.
No understanding of the relationship between environment and process - They say this has more to do with experience than with training, though system simulation might help.
Not enough understanding of the different states possible in a system or its failure modes - This is especially difficult for people new to the system or process, as incidents tend to produce abnormal states.
“The process of action is not well controlled” - This happens if a responder understands what’s occurred but doesn’t know how to fix it. This can involve things like missing or not knowing the side effects of different actions, or not knowing about the state of the system over time.
“Gaps in knowledge do not lead necessarily to fixation errors in most cases. If an operator has good feedback on the effects of his or her actions and if there are very visible traces of the evolution of the system, the external conditions to structure his experience are present”
De Keyser and Woods tell us that these gaps in knowledge don’t usually lead to fixation errors if a responder has the ability to get feedback from the system and to see visible traces of how the system is evolving and changing, and what external effects are present.
This is something to keep in mind, similar to last week, as we develop observability systems.
Fixation can also occur because of the problem of “inert knowledge”. This is where you know something in one context, or know the underlying fact, but you’re not able to call it to mind or apply it in a different situation, or perhaps even realize it’s relevant. This can come up especially in difficult incidents, because a responder may not have had to put the pieces together in this way before.
Responders can also get stuck because they don’t have a way to formulate the problem. This can be the result of organizational goal conflicts; resolving them often requires a compromise between goals and shifting priorities. A big part of problem formulation is deciding what hypotheses make sense and are worth pursuing. This is especially important during periods of limited resources and time pressure. Fixation errors here can be the result of not expanding the list of potential hypotheses that would fit the evidence.
The authors cite a study by Paul Johnson examining “experienced medical diagnosticians’ performance on a problem prone to fixation. One class of errors occurred in hypothesis generation and included failure to call to mind the correct alternative”.
There’s also the problem of stopping early as we generate our hypotheses, especially if we find a highly plausible hypothesis early on. The previously mentioned Johnson study “also found revision errors in the performance of experienced diagnosticians where they were unable to shift from a highly plausible but incorrect initial hypothesis to the actually correct hypothesis.” This tells us that it’s not a simple matter of knowing more or having more expertise; these problems can happen to us regardless of our skill level.
Fixation errors can also occur depending on how we assess a situation and begin to diagnose it. It’s possible to create situational assessments that resist fixation, but in order to do that we need to reason about multiple views. If we fall prey to confirmation bias when gathering evidence, seeking only information that confirms what we already think, we can get stuck as well. There’s also the problem, as Rasmussen puts it, of “over reliance on familiar processing shortcuts”. Being aware of these pitfalls can help us avoid them, or at least recognize them in ourselves.
These revision failures can also happen because multiple explanations can each account for some part of what we’re seeing. Because of this, a new piece of evidence might be dismissed as belonging to some other part, which keeps us on the current track instead of reconsidering.
So how can we prevent and fix these fixation errors? There are three approaches:
- Change the world and the environment that these actions are taking place in so that it’s not as prone to encouraging fixation.
- We could also improve how we as individuals or teams problem-solve so that our situational assessment skills are better.
- We can simply recognize that in these complex, event-driven incidents we’re going to make some mistakes, and instead focus on recovering from fixations before they can cause damage.
Changing the environment is one approach. At first this might seem strange: how can we change the environment of the problem? We can’t prevent the problem, of course, but we do have some ability to control how we experience it. The authors tell us that “the variability of the environment also increases the risk of errors”. I think standardizing things like incident response structure would help reduce the variability of the environment. “A good design of the systems or process itself and efficient flow of information may prevent many errors”. We could also use more consistent observability tools across different products or areas to help reduce this variability and maintain that efficient flow of information.
Often, training is cited as the solution for some of these problems, but the authors acknowledge that “the inadequacy of training sessions for developing operational skill reflects ignorance of the real conditions of work in complex environments”.
Cognitive ergonomics in design
Next, we can look at how our systems can work to support us. The authors call this “cognitive ergonomics”. They cite the absence of feedback as something that can reduce the chance to understand system behavior. If you can’t see the results of your actions very easily, it’s hard to know what the state of the system is and which actions are correct.
Monitoring and field of attention
This is especially relevant to us in software as we tend to build and consume different dashboards. “Online representations of the world can help or hinder problem solvers to recognize what information or strategies are relevant to the problem at hand”.
The authors also tell us that our tools can either work to help us or hurt us: “Poor interface and display system designs can degrade control of attention and monitoring by creating tunnel vision or getting lost effects”
One way we can do better here is to integrate more data into our views. Often our representations are modeled off of hardware gauges, where there’s one sensor for one display, like an odometer.
There’s always a chance that a responder or operator has so much faith in their own judgment that they ignore all other signals, typically because of some lack of faith in the reliability of the instruments or machines themselves. This can happen when there are a lot of false alarms, especially coupled with situations where it’s very difficult to get more evidence to know whether or not a monitor or alarm is right.
Having a high number of false alarms can also cause poor performance, as people will learn to work around them. Ideally we would reduce false alarm rates, potentially by indicating some notion of how likely an alarm is to be real. Additionally, the sort of integrated display we talked about, one that shows different indicators for the same part of the system state, could help. This is a good one to remember, as our dashboard tools often do this already.
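As one hedged sketch of what “indicating how likely an alarm is” might look like, we could grade an alarm by how many independent indicators of the same part of the system corroborate it (the signal names and grading scheme here are invented for illustration):

```python
# Hypothetical sketch: grade an alarm's likelihood by corroboration
# across independent signals for the same part of the system, rather
# than paging on any single noisy threshold crossing.

def alarm_confidence(signals):
    """signals maps independent indicator names to 'is this abnormal?'"""
    corroborating = sum(signals.values())
    if corroborating == len(signals):
        return "high"    # every independent view agrees: page someone
    if corroborating > 1:
        return "medium"  # partial agreement: surface prominently
    return "low"         # a single noisy signal: log it, don't page

print(alarm_confidence({
    "disk_alert": True,        # the raw alarm that fired
    "write_errors": False,     # independent signals for the same subsystem
    "latency_elevated": False,
}))  # -> "low": one uncorroborated signal, likely a false alarm
```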
Further, we can help by developing these tools, dashboards, or other cognitive aids: “human performance can improve if one frees up the processing capacity of the human in the loop”.
An example of fixation
De Keyser closes by giving us an analysis of a fixation incident, in this case from water treatment.
An operator at a water treatment plant showed up for his shift at night. Before he got there, the flow rate between tanks had been almost doubled. He then faced a job that was unfamiliar and fairly rare: moving water from one tank to another until they’d all been filled.
Alarms went off, indicating that the tanks had begun to overfill. The operator didn’t react to them, but was able to say in hindsight that he had noticed them. This continued for about six hours.
He ignored the alarm because it didn’t actually mean a tank was full; it meant the tank was only 80% full, so he saw no need to respond when he heard it. This is a good example of why we need actionable alerts: if an alert is not actionable, it likely shouldn’t be an alert.
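As a small sketch of that point (the thresholds are invented for illustration), the idea is to alert at the level where a response is actually required, rather than at a purely informational one:

```python
# Hypothetical sketch: only page when the condition demands action.
# An 80%-full warning that operators are expected to ignore trains
# people to ignore alarms; reserve the alarm for where action is needed.
from typing import Optional

def tank_alert(fill_fraction: float) -> Optional[str]:
    if fill_fraction >= 0.95:
        return "PAGE: tank about to overflow, divert flow now"
    # Below the actionable threshold, record the level but stay silent;
    # an alert nobody should act on shouldn't be an alert.
    return None

for level in (0.80, 0.96):
    print(level, "->", tank_alert(level))
```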
- Fixation can occur because of the way we formulate problems or develop hypotheses
- It’s possible to change the analysis environment to discourage fixation
- Feedback in automation can help reduce fixation
- System design can help or hurt
- Changing problem solving strategies can help
- Reducing false alarms can help people select a strategy that doesn’t involve ignoring alerts
- Setting up computer systems that people can offload work to can help
- Dashboards that integrate more data instead of having a 1:1 relationship between a signal and gauge can help give a better view of the underlying system