The Psychology Behind Alarm Response: Insights from Human Factors

Alarm systems are vital to process control, but poorly designed alarms can compromise operator performance. We explore the complexities of alarm management and highlight how human factors can optimise control system design.

Control systems in highly automated plant are complex. Often the operation of entire process plants will be managed from a central control room by one or two control room operators, with limited on-plant intervention.

Alarms are a key component of process control. Operators oversee multiple separate process areas, often from a single distributed control system (DCS). It is impossible for an operator to have visibility of all process parameters at one time, and well-designed process control and alarm systems support the operator by providing early indication of process changes requiring action.

Alarm management is a complex topic. A poorly designed alarm system can undermine operator response. Many incidents in the high hazard sectors, including Three Mile Island, Texas City and Milford Haven, have been attributed to weaknesses in alarm management.

It is essential that high hazard sites operate with confidence that, when a high-criticality alarm arises, those charged with the task of responding to the alarm can indeed do so. This confidence is particularly important at times of high workload or elevated alarm levels, for example during a serious plant upset. Failure to ascertain whether a reliable operator response is probable undermines the foundations upon which the entire alarm system is based. 

However, providing this verification can be difficult. The number of variables associated with the effective design and presentation of an alarm can be significant. In short, there are often many alarms to assess and many factors to consider for each alarm.

Alarm system performance metrics – are all relevant problem metrics being addressed?

Whenever alarm management is discussed, the conversation quickly shifts to alarm metrics, which describe the frequency with which alarms arise. There is a range of important metrics to consider. However, the attention-grabbing metric is often the average alarm rate. This is clearly important as it provides an indication of overall operator workload. A high average alarm rate means that operators will spend a disproportionate amount of time simply responding to process alarms throughout the shift. Not only is this stressful and tiring, but it also takes attention away from proactive control of the system and efforts to keep the process within healthy control margins.

However, companies will often focus on this single number as an improvement initiative when alarm metrics point to separate, or (sometimes) worse, problems with alarm system performance. Examples include high alarm rates during alarm ‘floods’, which arise during a process upset, and large numbers of standing alarms (alarms that have been acknowledged but which remain active).

These present different types of challenge to operators. High numbers of alarms in a ‘flood’ can often become unmanageable, overwhelming the capacity of the operator to identify the cause of the initial upset and respond as required. Meanwhile, high numbers of standing alarms can add to the visual ‘clutter’ of a control system (populating the alarm list and being permanently visible on the control system mimics). This can ‘mask’ important alarm signals by making information on the Human-Machine Interface (HMI) more difficult to process.
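The three metrics discussed above can be illustrated with a minimal sketch. It assumes a simple alarm event log; the `AlarmEvent` structure, the function names, the tags and the sample data are all hypothetical and are not taken from any DCS product or from the guidance cited in this article. Any acceptability thresholds would need to come from the relevant guidance, so none are hard-coded here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AlarmEvent:
    """One alarm occurrence in a hypothetical event log."""
    tag: str
    raised: datetime
    acknowledged: bool = False
    cleared: Optional[datetime] = None  # None means the alarm is still active


def average_rate_per_10_min(events: List[AlarmEvent], start: datetime, end: datetime) -> float:
    """Mean number of alarms raised per 10-minute period between start and end."""
    periods = (end - start) / timedelta(minutes=10)
    raised = sum(1 for e in events if start <= e.raised < end)
    return raised / periods


def peak_in_any_10_min(events: List[AlarmEvent]) -> int:
    """Largest number of alarms raised within any sliding 10-minute window."""
    times = sorted(e.raised for e in events)
    window = timedelta(minutes=10)
    peak, left = 0, 0
    for right, t in enumerate(times):
        # Advance the left edge until the window spans less than 10 minutes.
        while t - times[left] >= window:
            left += 1
        peak = max(peak, right - left + 1)
    return peak


def standing_alarms(events: List[AlarmEvent]) -> List[str]:
    """Tags of alarms that have been acknowledged but remain active."""
    return sorted(e.tag for e in events if e.acknowledged and e.cleared is None)


# Illustrative 12-hour shift: four background alarms plus a 14-alarm flood.
start = datetime(2024, 1, 1, 6, 0)
end = start + timedelta(hours=12)
events = []
for i in range(4):
    raised = start + timedelta(minutes=30) + i * timedelta(hours=2)
    # The first background alarm is acknowledged but never clears (standing).
    cleared = None if i == 0 else raised + timedelta(minutes=5)
    events.append(AlarmEvent(f"BG-{i}", raised, acknowledged=True, cleared=cleared))
for j in range(14):
    raised = start + timedelta(hours=8, seconds=20 * j)
    events.append(AlarmEvent(f"FLOOD-{j}", raised))

print(average_rate_per_10_min(events, start, end))  # 0.25
print(peak_in_any_10_min(events))                   # 14
print(standing_alarms(events))                      # ['BG-0']
```

In this invented data set the shift average looks healthy at 0.25 alarms per 10 minutes, yet the same log contains a burst of 14 alarms in under five minutes and one standing alarm. This is the sense in which a single average figure can hide the other two problems.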

Looking beyond the metrics – human interaction with the system

From a Human Factors (HF) perspective, any improvement strategy needs to focus on the problem in its entirety. Poor alarm management has been implicated in many high-profile disasters. Whilst excessive alarm load was recognised as an important factor in each of these incidents, other inadequacies relating to alarm design, presentation and management were also identified. For example, the investigation into the Longford gas plant explosion found that the plant was habitually run beyond alarm set points, whilst the Texaco Milford Haven investigation identified poor prioritisation and delayed alarm response as contributory factors.

Focusing solely on metrics provides limited insight into how operators interact with a DCS to respond to an alarm and whether, and where, the operator encounters any difficulty in doing so. 

Weaknesses in alarm system design which we have seen in the course of our work include:

Sometimes these issues can be dismissed by senior management because operators have ‘learned’ to cope with poor alarm systems over time and developed strategies for doing so. However, this often under-estimates the impact poor systems can have on operators who must sit in front of these systems for up to 12 hours at a time. It is sometimes easy to lose sight of the criticality of this position and the fact that, on a bad day, a sub-standard HMI could easily set these operators up to fail. Operators having to deal with poor control systems will often describe how stressful and mentally tiring their shifts can be.

The contrast can be stark when exposed to a well-designed, optimised control and alarm system. There are marked reductions in operator workload – this can be seen in terms of how operators are navigating the system, identifying process deviations in advance of alarms and making proactive control changes. It’s not just the low alarm rates that strike you, it’s the way the alarm information is presented – alarms are quicker to identify and diagnose, with response requiring less effort. 

Human Factors is intrinsic to good control system and interface design. Industry guidance such as EEMUA 191 and BS EN 62682 is an excellent source of information, and HF principles feature heavily within it. For good reason, this guidance relates to all aspects of alarm system management and to the wider organisational arrangements which should be in place to support the design, maintenance and improvement of alarm systems (e.g. how an alarm should be presented within the DCS, and what information and functionality should be available to support operators in navigating to the required controls to execute a response).

Alarm response and situational awareness – a method of assessment

Human Reliability Associates developed an assessment tool, and associated guidance, to support assessment of the HF aspects of alarm system design. This is based on EEMUA 191 and is intended to reliably and rapidly analyse critical alarms against the alarm system design principles. The intention of the tool, called the Alarm Review Tool (ART), was to examine all HF aspects of alarm response from signal presentation, availability of DCS information for diagnosis, to execution of response. 

The objective was to provide a technique to assess how the alarm and control system maximises the situational awareness of those tasked with responding to alarms. Situational awareness is the term commonly used to describe how people perceive, understand and respond to the situation around them.

To achieve this objective, the process organises the guidance presented in EEMUA 191 according to a simple information processing framework (see Rasmussen, 1986). Such a model describes how operators interacting with an HMI make sense of events unfolding around them. It proposes that a stimulus is identified (perceived) and decoded (diagnosed), and that computations are then made (plans developed) which prompt an output (purposeful action).

Stages of alarm response by an operator

The image above shows the stages of alarm response by an operator. When an alarm signal arises, the operator must identify that signal, diagnose the alarm cause and location, determine the appropriate response (sometimes amongst alternatives) then execute the response. A well-designed alarm system should support the operator during each discrete stage of alarm response. 

The tool provides questions against which response to an alarm can be determined for each of these phases. Depending on the answers provided, the tool provides suggestions for modification or improvement of the control system to support more reliable operator response. 
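As a rough illustration of how a stage-by-stage review of a single alarm might be recorded, the sketch below groups yes/no answers under the four stages of alarm response described above. It is not the Alarm Review Tool and does not reproduce its questions: the class, the alarm tag and the question wording are hypothetical placeholders invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# The four stages of alarm response described in the text.
STAGES = ("identify", "diagnose", "determine_response", "execute")


@dataclass
class AlarmReview:
    """Yes/no review answers for one alarm, grouped by response stage."""
    tag: str
    answers: Dict[str, List[Tuple[str, bool]]] = field(default_factory=dict)

    def record(self, stage: str, question: str, satisfied: bool) -> None:
        """Store one answer; reject stage names outside the four-stage model."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.answers.setdefault(stage, []).append((question, satisfied))

    def weak_stages(self) -> List[str]:
        """Stages with at least one unsatisfied question, in response order."""
        return [s for s in STAGES
                if any(not ok for _, ok in self.answers.get(s, []))]

    def unreviewed_stages(self) -> List[str]:
        """Stages for which no question has been answered yet."""
        return [s for s in STAGES if not self.answers.get(s)]


# Hypothetical alarm tag and placeholder questions (not taken from ART).
review = AlarmReview("LAH-101")
review.record("identify", "Is the alarm distinguishable from lower-priority alarms?", True)
review.record("diagnose", "Is the cause of the alarm clear from the display?", False)
review.record("determine_response", "Is the required response documented for the operator?", True)

print(review.weak_stages())        # ['diagnose']
print(review.unreviewed_stages())  # ['execute']
```

Keeping the answers grouped by stage makes it easy to see where in the response sequence an alarm is weakest, and which stages have not yet been assessed at all.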

Reviewing the usability of an alarm using ART

Failure to verify the reliability of operator response to critical alarms risks missing serious alarm system deficiencies which may inhibit successful operator response. Alarm system metrics, when used in isolation, provide limited insight into the types of alarm design issues which present challenges to the operator during response to individual alarms.

We developed this tool into industry guidance for the Energy Institute (EI): Guidance for optimising operator plant situational awareness by rationalising control room alarms. This is not intended to replace existing alarm management guidance such as EEMUA 191, which remains seminal guidance for this topic. However, the EI guidance aims to distil the key HF guidance into a method against which critical alarms can be assessed. It can also support design projects by clearly outlining important HF and control system design principles which should be incorporated during process design and development.

Conclusions  

Alarm management is a complex safety management topic and excellent industry guidance (e.g. EEMUA 191 and BS EN 62682) exists to support different aspects of the topic. It is important also to recognise how closely many of the alarm management issues raised here are associated with more general aspects of HMI design (EEMUA 201 is another excellent guidance source recommended for this area). 

Achieving sustained improvements to poorly functioning alarm systems requires significant commitment. It can be complex and time-consuming, and requires the input of operators and of control system, process and HF specialists.

However, it is important, when both designing an alarm system and embarking on a process of alarm system improvement, never to lose sight of the operator at the end of the alarm signal – specifically how a critical alarm presents to them and how easy it is for them to enact a response.

Alarm metrics are important. It goes without saying that healthy alarm rates will maximise the chance of successful operator response to alarms. However, it is important to develop an improvement strategy that reflects the totality of the problems that metrics are highlighting. 

Similarly, to maximise the success of response to each alarm, it is vital that control system design principles that align with good practice are developed and applied. This is of even greater importance when developing new projects, where there is the opportunity to get the alarm system ‘right’ from the outset.

