Wednesday, April 30, 2014

The Blind Spots of Behavioral Observation Programs

Behavioral observation programs are a mainstay in many safety systems that are looking to move beyond compliance and get employees involved. The idea is pretty straightforward – have employees observe other employees doing job tasks. The observers then judge whether the behavior is “safe” or “unsafe” and provide immediate feedback to the employees who did the tasks. You seem to accomplish a lot with a program such as this, including:
  • Immediate and specific feedback to employees for “unsafe” behaviors, which enhances learning;
  • Employees get involved in the process and take ownership of safety at the site; and,
  • You get another feedback loop that you can use to identify exposures and risks at the site (you can also use it as a handy metric).

This sounds like a panacea for all your safety performance needs. So what’s the problem?

Well, the problem with most behavioral observation programs is that they have blind spots, both practical and foundational, that the programs don’t account for.

Let’s start with a practical example. When it comes to identifying “safe” and “unsafe” behaviors, your employees are far more likely to identify obvious “unsafe” behaviors that lead to smaller accidents than they are to identify the less obvious behaviors that fall into a grey area and, not coincidentally, are more associated with serious injuries and disasters. For example, behavioral observation programs are very good at identifying whether employees are using the required PPE for a given task. However, these programs are not very good at identifying whether technical procedures that are only indirectly related to safety are being followed, or whether those procedures are even adequate for the reality the employees are facing. In cases where deviance from procedures has been normalized, you might have employees mark a given task as “safe” because that’s the way the job is normally done, without realizing the risks involved. So the program provides an unreliable data source, leading you to think that your system is “safe” when, in reality, you are drifting toward danger.

The bottom line from a practical perspective – behavior observation works for obvious behaviors. If “safe” and “unsafe” behaviors are less obvious, though, the behavior observation program may be a false indicator.

This leads to the foundational blind spot of behavior observation programs – they tend to assume that behavior is either “safe” or “unsafe.” This is categorically false. Behavior is inherently tied to context, and almost any behavior you can think of is safe in one context and unsafe in another. Even the proverbial safety “no-no,” running with scissors, is sometimes the right thing to do (medical professionals run with scissors all the time in emergency situations).

Now it may be possible to identify a behavior that is always unsafe (using some definition of “safe” and “unsafe”), no matter what the context. But that’s not the point. If we have to think hard to find a behavior that is always unsafe, is the idea that behavior is either “safe” or “unsafe” really a useful concept?

What if instead of a behavioral observation program we just had a performance observation program? Instead of judging whether the employee is doing things right or wrong, we just observe and try to understand how employees are doing work. Then, we ask questions (not just about the things we think they did wrong!), listen to stories, and try to find the best way to do the job in the context in which it is to be done. With a rich understanding of the reality that employees at the sharp end face, instead of telling them that what they are doing is wrong, we give them the tools (equipment, knowledge, time, etc.) they need to adapt their behaviors to the contexts they face. We move past the obvious things and get to the real story of how work is performed in the organization. We move from a place of judgment to a place of cooperation. Then we not only get the basic advantages of traditional behavior observation programs noted above, we also eliminate the blind spots and build a foundation of trust between ourselves and the real source of safety in our organizations – our workers.

Wednesday, April 23, 2014

Success Investigations: Safety’s (Underutilized) Secret Weapon

A worker puts the finishing touches on a project that she has been working on for the better part of two days. It was a bit of a challenge, some of the design drawings that engineering gave to her weren’t accurate, but she was able to figure it out and improvise a solution. Before leaving for the day she sees the site safety manager near the break room. The safety manager sees that the worker is done and immediately rushes to action. He furiously pulls out his phone and begins to coordinate getting a team together, while also yelling at the site supervisor to preserve the scene. A thorough investigation is done to identify the causal factors that led to the success and a report is issued to facilitate lessons learned.

Ok…the above almost never happens, right? Safety professionals don’t investigate success. Our job is to look at failures. Success usually isn’t even on our radar screen. We’re too busy as is. If we had to investigate every time a job went off without a problem we’d never see the light of day again. Besides, isn’t there enough data from looking at failures to make looking at successes a waste of time?

Hold that thought for a minute and consider the following quote from “Risk”, by John Adams:

"Road engineers with their accident statistics frequently dismiss condescendingly the fears of people living alongside busy roads with good accident records, heedless of the likelihood that the good accident records reflect the careful behavior of people who believe their roads to be dangerous."

Wow, what a quote! Consider the implication here – things are working out, a version of success, but not because of those whose job it is to create a safe system. The system could appear safe simply because people are overcoming an unsafe system.

The idea is pretty straightforward – if we have low accident/incident statistics there are a few potential reasons:
  1. You could have a safe system and the statistics reflect that.
  2. You could have an unsafe system and the statistics reflect the ability of your workers to overcome that unsafe system.
  3. You could have an unsafe system, but you’re just really lucky that you’re not having a lot of incidents.

So in two out of the three cases your system is unsafe, you just don’t know it yet. Of course, you could argue that you would figure this out if an incident happened and you conducted a thorough investigation of it. But that means you have to wait for something bad to happen before you can learn the truth, and that just doesn’t seem right.

What if, instead, you occasionally got out from behind that desk and went out to investigate why things go right the majority of the time? Sidney Dekker advocates at least a two-hour blackout period for managers and supervisors, including safety managers, where all electronic devices must be powered off and the manager/supervisor just observes how work is being done. Have conversations with employees to get their perspectives on why things are working out. Many times we’ll find a story similar to the fictional one at the beginning of this post – our employees overcoming obstacles to create safety on a day-to-day basis using a combination of improvisation, creativity, and intelligence. Perhaps if those road engineers from the quote above got out from behind their desks they would see something similar.

How you do this can vary from informal processes, like those discussed above, to formal reviews, such as where you have a debriefing after key projects to identify lessons learned. A lot of really great things can happen when we start to investigate successes in addition to failures.
  • First, you can identify failures before they happen, as opposed to just after.
  • Second, you get a much clearer picture of the environment that your employees operate in, which builds understanding and trust between you and your employees. It also helps you recommend better interventions that will work with the workflow, rather than inhibit it.
  • Finally, it allows you to identify the reasons for success and make recommendations to increase the likelihood of success, in line with the concepts of Safety II.

So the next time you see your employees clocking out for the day safe and sound, don’t just pat yourself on the back for a job well done. Ask yourself what allowed that to happen. Once you understand that you may be able to make it happen again tomorrow.

Wednesday, April 16, 2014

Procedure Violations from the Human Performance Perspective

In March of 2005 the BP Texas City Refinery experienced one of the worst industrial disasters in the history of the United States: 15 people were killed and hundreds were injured. The accident sent shockwaves through the occupational safety and health world, primarily because refineries such as Texas City appeared to have such a great safety record (at least in terms of the standard way people define safety – incident statistics). Just as with other such accidents, we all sought to understand why such a horrible tragedy could occur.

Well, at a very basic level the Texas City Refinery explosion was caused by a violation of procedures. One of the biggest precipitating causal factors was that operators who were restarting the equipment after a maintenance shutdown controlled the flow of materials into the process unit manually to keep the level inside the tower (which eventually overflowed) at 9 feet, whereas the procedures for that unit required the operators to maintain flow using an automated system to keep the level in the tower at 6.5 feet. This violation of procedures led to a cascade of events that culminated in the tower filling to the top (about 158 feet), causing a release and subsequent explosion.

A pretty cut-and-dried case of a violation. Intentional unsafe behavior led to the accident. Name, blame, shame, and retrain.

Of course it’s never that simple. If we take a step back and recognize that these operators were not stupid and almost certainly didn’t want to die or get anyone else killed, we get a much clearer (and more interesting) picture of what went on here.

So first – why did the operators knowingly choose to violate a procedure? Well, let’s first identify two criteria that are necessary and sufficient for someone to knowingly violate a procedure or rule. First, the person has to believe that no significant punishment will come to them. This punishment can be formal, such as discipline, or informal, such as the procedure violation leading to an accident. So the operators in this case believed that if they violated the procedure there was no significant risk of being disciplined and no significant risk of an accident happening. Did they not understand that overflowing the tower would be a significant risk? Of course they did. But we have to understand that there’s a large difference between 9 feet and 158 feet. By raising the level a few feet, what could the harm be?

Second, to choose to violate a procedure the person has to believe that the violation of the procedure will lead to some positive outcome. What was the potential positive outcome here? The operators frequently violated the procedure because doing so would allow them to avoid the risk of damaging sensitive equipment in the process and start the overall process more efficiently. Why would they do this? Because the procedures didn’t match their reality and because they are rewarded for efficiency and production.

So what’s the real picture we see here? We see the operators responding to the pressures of the organization, poorly written procedures, an abstract and poorly defined concept of the risks they face, and a poorly designed and maintained system. The interesting thing is that despite these poor conditions our workers usually find ways to achieve success.

Obviously some of the deeper system issues include looking at the goals of the organization, which is very important in this case. But how can we help the operators on the ground make better decisions in these contexts in a meaningful way? The learning from Texas City is that operators are not robots that you can program to act in a certain way. Humans take in information from multiple sources and balance that information in order to come up with actions that they feel make the most sense to them in the moment. This process is extremely important and is very effective most of the time (people usually don’t fail, they usually succeed). Our job is to help our employees make better decisions in those moments in two key ways:

Make sure employees have a clear picture of reality and the risks they are facing. At Texas City, a combination of poor maintenance (e.g. failed high-level alarms) and poor design decisions (e.g. tower level indicators that could not read levels above 9 feet) meant the operators had no clear picture of what was happening. If the operators had known exactly how much product was in the tower, they may have been able to stop the cascade of events. Instead, having no idea how much product was actually in the tower created a situation where the operators’ mental model did not match reality, leading to further mistakes. Giving employees a better picture of the risks involved in an operation, in a meaningful way, gives them the tools they need to make better decisions in those moments when they must choose between production and safety. If employees perceive the risks as high enough, because we’ve provided them with a clear and accurate picture of the situation, they are far more likely to make the safe decision.

Make mistakes and errors forgiving. A simple question – why was the tower even designed to allow the inflow of product that would fill the whole tower? Couldn’t they have designed the system so that, similar to your bathtub, at a certain level the product flows into an overflow tank?

So the next time you see a procedure violation, don’t let your first reaction be to blame the employee. What is the context in which the violation occurred? What are the competing goals and realities that the employees had to face? Was the broken rule or violated procedure consistent with the reality of the work to be done? How were the risks of a violation made obvious to them, or hidden from them? Answering these questions can go a long way toward not only making your employees safe, but also maximizing human performance in your organization.

Tuesday, April 8, 2014

Meerkats, Complexity, and Safety

We watched a TED Talk recently that was fascinating (click here if you want to watch it too). In it scientist Nicolas Perony details complex behaviors of animals such as puppies, bats, and meerkats. Each of these animals displayed interesting, complex behaviors that were not easily predictable. Take the meerkats for example – when the group of meerkats came to a busy road with vehicle traffic that had to be crossed, the group stopped and crossed one at a time, so that the group leader did not have to go first. Basically, the idea was to have the others go first to see if it was safe before the group leader crossed. As Perony explains, if the group leader is hurt it can have devastating effects on the pack.

Think about this for a second. The meerkats hadn’t had to deal with cars before the last few decades or so. How did they figure out a strategy to protect themselves and their group against such a relatively new threat? They must have had a meerkat safety meeting where they discussed the hazards, wrote out detailed procedures, and then provided thorough training to ensure that all meerkats were aware of the risks. Of course you need the safety officer meerkat there to enforce these rules as well, because you can’t really fix stupid, right? How else can we explain how these animals, far less intelligent than us, came up with such complex safety behaviors?

Obviously, we’re joking a bit here. In reality, as Perony explains, the meerkats likely follow a few simple rules – starting with the basic rule of seeking things that lead to success (e.g. food) and avoiding things that lead to failure (e.g. getting run over) – and those simple rules produce the complex behaviors, and ultimately the safety, we observe. In fact, if we complicated the system by imposing explicit rules we might actually be less safe in some circumstances (see the example of Perony’s complicated robot). As he explains, simplicity leads to complexity, which leads to resilience. Simplicity allows the meerkats to adapt to changing circumstances (e.g. the incursion of humans and vehicles into their habitat) and still succeed.

Contrast this approach with the current approaches to occupational safety and health. Our workers have the same innate instinct as meerkats, in that they want success and they don’t want to get hurt or killed at work. The difference of course is that our workers are much smarter than meerkats. So we have a smart, motivated workforce (relatively speaking) – how then do we manage their safety? We build complicated safety programs, policies, and procedures designed to protect our workers from themselves.

Consider for a minute if the world that Perony shows us applied to our workplaces. What if, by applying more complicated rules, policies, and procedures, rather than making our organizations safer, we are making them more brittle, less able to adapt to changing circumstances, and more likely to fail?

If, instead of a management system built upon a foundation of command and control, we tap into the vast reserves of motivation and innovation in our workforces by creating management systems built upon the ideas of facilitation and trust, we may find that relaxing control leads to simplicity, which leads to complexity, which leads to resilience, safety, and success.