Thursday, January 30, 2014

You Can’t Fix Stupid, And Other Myths of Safety Management

We’ve all seen it – employees doing things that just boggle our minds, whether it’s knowingly violating procedures during high-risk operations or just missing what seem like extremely obvious signs that something is wrong. Just last week we were at a maintenance shutdown where an incident occurred in a confined space: the entrant ignored orders from both his attendant and his own supervisor to leave the space. He then exited quickly when his supply of fresh air cut out because his air hoses weren’t connected correctly and his escape bottle wasn’t turned on properly.

It’s a common picture, right? We’d have safe organizations if it weren’t for all these unreliable people! You just can’t fix stupid.

If there is a more ignorant phrase commonly used by safety professionals, we don’t know what it is. You can’t fix stupid! We don’t blame people who say it; we’ve even said it ourselves from time to time. But that statement represents a gross and dangerous oversimplification of human behavior in complex systems. Here are a few reasons why:

1. It’s not stupid! Often when people say “you can’t fix stupid” or similar phrases, they are describing behavior in hindsight with only a small part of the picture. The fact of the matter is that the behaviors we describe as “stupid,” or even call “errors,” often earn those labels only because of their results, not because of the behaviors themselves. When people push boundaries and throw off the shackles of conformity and it leads to success, we call those people innovators and entrepreneurs and we thank them for taking the initiative. When it doesn’t work out, we point out how we knew better and marvel at how stupid they were to think it would work.

The point is that it’s not the behavior that’s stupid, it’s our descriptions that are stupid and irrational. We often look at the same behavior and call it something different depending on the result.

Consider this: if the confined space entrant hadn’t run out of air, he likely would have gotten his job done more quickly. What do you suppose would have been the result?

2. You can fix it! Whether you call it stupid or not, we do have influence over human performance. Let’s just assume that the problem was that the person was extraordinarily dumb – who decided that this person was the right person to do that job? At SCM our mascot is a bulldog named Winifred. If we put Winifred in an Algebra class and she doesn’t get a passing grade whose fault is that – Winifred’s for not studying enough, or ours for putting the dog in a situation where she could not possibly succeed?

Social science research suggests that behavior is decided by a combination of three influences – individual factors (personality, intelligence, etc.), social factors (peer influences, team interactions, etc.), and environmental factors (tools and equipment, incentives, organizational culture, etc.). Organizations have influence, directly or indirectly, over all three areas. For example, in the confined space example above we can look at all three areas and ask questions such as:

How were these employees selected to do this high risk job and what training and experience do they have (individual factors)?

Why weren’t the interactions between the entrant, the attendant, and the supervisor more positive and how can we ensure that those relationships create safety rather than reduce it, such as through crew resource management training (social factors)?

What are the incentives in place for the organization and why was the respiratory protection system so prone to failure (environmental factors)?

The bottom line here is that phrases such as “you can’t fix stupid” provide an excuse for safety managers not to look deeper into the system to maximize human performance. Often when we resort to these sorts of phrases to explain behavior, it is a symptom of an organization that expects human nature to conform to its system. That’s sort of like hoping that water will start to flow uphill all by itself. If, instead of putting the blame on people (at any level in the organization), we look deeper at the individual, social, and environmental influences and begin to create an organization that leverages these influences to maximize human performance, we may find that rather than being the weakest link in our safety system, people actually become the most reliable and important part of creating safety in our organizations.

Wednesday, January 22, 2014

Management of Change by Managing Assumptions

Research on human cognition teaches us that most of us go through life on autopilot most of the time. We don’t have the mental resources necessary to actively think through every action and decision, so we do most things without thinking. This works most of the time because we’ve built up a lifetime of rules and mental short-cuts that allow us to operate that way safely. For example, think about driving – you don’t have to ask or think about which side the accelerator pedal will be on. It’s always on the right side, and the brake pedal is just to the left of it. The next time you get into a car to drive you’re likely not going to check where the pedals are.

Think about that for a minute – why not? How do you know that the pedals will be where you believe they are? Because that’s where the pedals have always been. You’re basing your future actions on past knowledge. And that works a large majority of the time because things like that often don’t change. So you go through life with the assumption that the pedals in your car won’t change and you’ll be safe in that assumption most of the time.

Until you’re wrong.

Obviously it’s unlikely the pedals on your car will change any time soon. However, it’s amazing when you think about how much of the time you operate based on the assumption that things are the same today as they were yesterday. And that’s where the trouble starts. When those assumptions no longer hold true we are at risk of making mistakes, and sometimes those mistakes carry extreme consequences.

Take for example a project that we’re currently working on for a client. It’s a maintenance shutdown (a “turnaround”) at a chemical plant. We’ve worked with this client for years and we’ve had over a decade of annual turnarounds without a lost-time injury. A record like that is certainly something to be proud of, but as we’ve discussed in other posts, safety is more than what doesn’t happen, so we are always trying to get the client to look at their current risks, rather than just focusing on past success. And by doing that we found some interesting things:

  1. This maintenance shutdown had to be shortened, but the number of jobs scheduled was not reduced proportionately, so the number of jobs scheduled per day has not dropped significantly and in some cases has increased.
  2. The engineers planning the process are very young and inexperienced, with an average tenure at the plant of about 2 years.
  3. Most of the contractors brought in to do the project are either new to the plant or have worked at the plant before but are bringing in work crews that have not.
  4. The plant’s corporate safety group recently rolled out a new permitting system for all jobs, including confined space and lockout/tagout, that plant employees are still getting used to.

Individually, any one of these issues is a small increase in risk. Together they lead to a cascade of assumptions that are no longer valid. A normal shutdown is successful when you have experienced crews working with experienced planners, engineers, and project managers, using safety systems they are familiar with. However, none of those conditions holds this time.

This situation shows the value of a robust management of change process. Each of the above conditions exists as a result of changes that are relatively commonplace. When you’re dealing with the commonplace, the assumption is that everything is fine (because common things are usually not that risky). But the management of change processes at the plant, mixed with a healthy preoccupation with failure, identified these conditions, and we are working with the plant to build in systems that show employees that their normal assumptions are invalid and help them reduce the risks.

A key learning point in developing an effective management of change process is to manage assumptions. Here are some best practices for managing assumptions in a management of change process:

  1. Identify what assumptions in the process, workflow, or systems are safety critical. For everything to work out as you’d like it to, what assumptions are you relying on (tools, people, environmental conditions, training, management processes, engineering processes, economic conditions, etc.) and what are the consequences if those assumptions are invalid?
  2. Once you identify those assumptions that are safety critical, identify what aspects of the assumptions you have control over and what you don’t. For example, while we may not have total control over the work crews that contractors send us (although sometimes we do), we do have control over the training programs for engineers and project managers.
  3. For those areas we have control over, make sure you develop systems to ensure that what needs to happen does happen. For example, one lesson from this shutdown is to build in better training systems for engineers and project managers. This program will include measurements of competency, so that those with less experience are identified and receive more training.
  4. For those areas we do not have control over, we need to build in resilience and ensure that we’re not pushing safety boundaries. For example, for this shutdown more site-specific training/orientation and supervision have been provided to contractor employees, and engineers and project managers are looking for ways to relieve other pressures on contractors, such as scheduling pressures.

How you do it depends on the job. But when making safety and risk management decisions, ask yourself, what assumptions are you making and what if you’re wrong?

Tuesday, January 14, 2014

Unknown Unknowns – Being Safe While Dealing with Uncertainty

In the early 1980s, following major industrial accidents such as Three Mile Island, Charles Perrow proposed an interesting theory called “Normal Accident Theory.” The idea behind Normal Accident Theory is that more and more humans are operating in complex environments because of advances in technology and complex social structures. This complexity creates situations where it is difficult, if not impossible, to predict all the ways a system can fail, and if we can’t predict all the ways the system can fail then we can’t necessarily prevent all of the failures. Therefore, the theory proposes that we should essentially get used to major accidents happening (hence the name “Normal Accident Theory”) and take advantage of these accidents to learn what we can when we can about the system in a way that is virtually impossible through other means.

The theory is an interesting one, to say the least, but we here at SCM are not so pessimistic as to say that accidents should be considered “normal.” (And we should say that a number of very interesting and useful theories of organizational risk management have come as a response to Normal Accident Theory, such as High Reliability Organization Theory and Resilience Engineering.) However, Normal Accident Theory does a good job of highlighting an important concept that many safety professionals often don’t directly address – uncertainty.

We all intuitively would admit that none of us knows everything there is to know about the operations that we oversee. It’s not necessarily that we can’t know everything in all cases (although that’s sometimes the case), we just don’t have the time and/or resources. This is the source of our uncertainty. Our lives, our organizations, our worlds are far too complex. We sometimes think we understand everything, but most of the time this is a gross oversimplification. Often even knowing all the constituent parts within a system is not good enough. You must understand how each component within the system relates to other components and how that relationship affects other components, etc.

Can uncertainty lead to risk? Well, according to the International Organization for Standardization (ISO) it absolutely does, as the definition of risk according to ISO is the “effect of uncertainty on objectives.” Even if you don’t agree with this definition (many don’t) you do have to admit that failure to address uncertainty can lead to increased risk. For example, one of the organizational issues in the NASA Columbia disaster was how the organization dealt with anomalies, such as foam breaking off the external tank and striking the orbiter, which was the direct cause of the accident. These anomalies were sources of significant uncertainty and the failure to account for that uncertainty led to increased unmitigated risk, ultimately leading to disaster.

Many organizations we have looked at have highly developed safety management systems built upon a solid foundation of prevention, which is extremely important. However, a big flaw in many of these management systems is that they fail to account for uncertainty. They account for all the things they know about through their prevention programs, but they don’t think about the unknown unknowns, the things they don’t know about.

The problem, of course, becomes one of self-delusion – an organization may point to its prevention programs and say that it is operating safely, when in fact the unknown unknowns are pushing the organization dangerously close to the edge of a cliff and a major incident is just around the corner.

So how do we deal with uncertainty in our organizations? We must build resilience into our management systems. Organizations and people have a tendency to push boundaries, but we must ensure that we don’t let them push too far and that we always build in extra capacity to adapt to uncertainty. Some ideas for engineering resilience into our management systems include:
  • Ensuring robust reporting systems are in place to identify weak signals, which may be the only signals we get of impending danger.
  • Training employees not only how to handle normal operations, but how to handle surprises in the workplace (which includes, but is more than, just emergency response training). Your employees will naturally adapt to circumstances, but maybe not in the way you would like them to unless you give them the tools to do so.
  • Ensuring that prevention through design and management of change processes include allowance for that uncertainty. Design in safety factors or buffers and ensure that operators understand why those safety factors or buffers are in place and the consequences for violating them.
  • Consider uncertainty in your risk assessment processes. When assessing an operation, don’t only consider the things that you know are hazards. Are there things you don’t know or aren’t sure of? How do/could they influence the risks you may face?

There are more ideas that you can incorporate, and we certainly want to avoid going around being the proverbial Chicken Little, but we also need to understand that there is always a level of uncertainty in our operations. If we build resilience into our systems we may be able to deal with the risks from that uncertainty.

Tuesday, January 7, 2014

STOP WORK! – The Problem With Safety Being Everyone’s Responsibility

We go into a lot of organizations and one of the things we hear frequently is that everyone at the organization is responsible for safety. By this they typically mean that each person should feel empowered to take actions to ensure that they or others are safe. One way that you see a lot of organizations making this philosophy actionable is that they institute a “stop work” program. The idea is that if an employee witnesses something unsafe they have authority to stop the job and take the steps necessary to make sure that the risks are sufficiently reduced.

Sounds like a great idea, right?

One way to define the safety profession is to say that our job is to identify the unintended consequences of human and organizational actions. People typically don’t intend to hurt themselves or others, but their actions often result in them or others getting hurt. Why? The person or organization made a plan to do something (i.e. a job) but they didn’t identify and/or manage the risks that their plan would expose them to, i.e. the unintended consequence of their actions. So our job as safety professionals is to make people and/or organizations aware of the unintended consequences and eliminate or make acceptable the risks derived from those consequences.

Here’s the kicker – even actions designed to make us safe can have unintended consequences. That doesn’t mean these actions are bad, just that they, like everything else, need to be thought out to ensure that in the process of trying to make things better we don’t accidentally make things worse.

Take the “safety is everyone’s responsibility” philosophy, and specifically the policy that all employees have the ability to “stop work” if they see an unsafe condition. In psychology there’s a concept called “diffusion of responsibility.” The idea is that if a person is around other people he or she is less likely to take responsibility for an action. You see this in emergencies, where people in a crowd are less likely to call for emergency services than a single person who witnesses the emergency (“I’m sure one of these people will call for help”). You also see this in charity situations, where people are less likely to give to charities when in the presence of others. Basically, the individual in a group puts the responsibility for action on the group, and not on him- or herself, making individual action less likely.

Now apply that to “stop work” policies – if an employee walks through your job site and sees an unsafe condition but also sees others in the area, diffusion of responsibility predicts that they will be less likely than normal to take action. There are other psychological processes that could be involved as well but you can see what the conclusion is. We end up with a safety policy that looks great on paper but is meaningless in real life and may actually create a false sense of security for the organization and the employees.

Don’t get us wrong; we have seen “stop work” policies that are effective and we do advocate that organizations give employees stop work authority. However, if we ignore the unintended consequences we may end up doing little or nothing to improve safety in our organizations. An effective “stop work” policy needs the following three components:

  • Management commitment. You need managers and supervisors at every level fully on board with the process. If you try to implement the program but you have managers sending a different message to employees the program will fail.
  • Training and communication. Just telling employees that they have this authority isn’t enough. You need to drive home to employees why it’s important that they, individually, take responsibility and take the appropriate actions. If possible give employees scenarios so they can visualize how they would respond to different situations. Remember, stopping work is easier said than done, so make sure employees have the tools they need to fully implement it.
  • Follow through. For the first few months of implementing the policy (or reinvigorating the policy, as the case may be) you need to be extremely responsive any time someone stops work. If they do, unless the employee did something egregious, you need to put them on a pedestal, even (especially) if they didn’t need to shut down the job because a later investigation found no unsafe condition. Do not allow any punishment for anyone who stops a job (again, unless they did something egregious); otherwise you will kill the program. Remember, you are attempting to make people act differently than they normally do, so you need to give them incentives to do so and remove any punishing elements.

Keep these three components in mind. Without the right foundation in place a “stop work” program, or any program based on the idea that safety is everyone’s responsibility, will not succeed.