Using Tiger Teams during a major incident

Tiger Teams have often been used in crisis management. They are groups of experts assigned to investigate or solve technical and systemic problems. In an Information Technology (IT) crisis such as a Major Incident (an incident that results in severe negative business consequences), Tiger Teams should therefore be deployed.

Tiger Teams were first used in the early years of space flight, and one of their early successes was solving a navigation problem in the Apollo space program. Unacceptably large errors had been discovered in the navigation technology: a lunar landing required an accuracy of 200 metres, but the early lunar probes achieved only 2 000 metres. A JPL Tiger Team solved the problem by discovering mascons (lunar mass concentrations), which had to be accounted for in space navigation. Tiger Teams have continued to enjoy success in space and aviation. Among the best known are those formed during the Apollo 13 accident, which gave rise to the slogan "Failure is not an option." During that accident a number of problems had to be solved, including keeping the crew alive, extending the air supply, charging the batteries, restarting the command module, navigating without a computer and keeping up morale.

Tiger Teams are often used in information security incidents, where they are also known as 'Red Teams'. In a Major Incident, up to six Tiger Teams can be established to assist in resolving the incident: the Echo, Delta, Romeo, Whisky, Bravo and Alpha teams. Each is briefly described below.

Echo team
The Echo team is the Escalations Team. It owns the Major Incident from cradle to grave and is responsible for stakeholder communications. The Major Incident Manager is comparable to the Flight Director of Apollo 13, who was operationally responsible for Mission Control. The team establishes communications between the service desk, customers, IT executives and other stakeholders. It needs to be aware of the pressures created by any service level agreements that set out the client's expectations for resolving outages.

Delta team
The Delta team is responsible for diagnostics and collaborates with the resources responsible for detection. Crucially, once the impacted components have been identified, they should be matched with the identities listed in the Configuration Management Database (CMDB). The team investigates the possible causes, including immediate and visible causes as well as any proximate causes identified. If appropriate, it also completes root causation. The team's priority is to deliver a candidate repair or fix. In an Information Security incident this team is often called the 'Red Team'.
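The CMDB matching step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ConfigurationItem` fields, the `match_impacted_components` helper and the case-insensitive name match are hypothetical and do not reflect any particular CMDB product or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigurationItem:
    """Hypothetical, simplified CMDB record (fields are assumptions)."""
    ci_id: str
    name: str
    owner: str


def match_impacted_components(impacted_names, cmdb):
    """Split impacted component names into CMDB matches and unknowns.

    Matching is by case-insensitive name, which is an assumed rule;
    a real CMDB may key on serial numbers, hostnames or other identifiers.
    """
    index = {ci.name.lower(): ci for ci in cmdb}
    matched, unmatched = [], []
    for name in impacted_names:
        ci = index.get(name.strip().lower())
        if ci is not None:
            matched.append(ci)
        else:
            unmatched.append(name)
    return matched, unmatched


# Example usage with made-up records.
cmdb = [
    ConfigurationItem("CI-1", "db-server-01", "database team"),
    ConfigurationItem("CI-2", "core-switch-a", "network team"),
]
matched, unmatched = match_impacted_components(
    ["DB-Server-01 ", "mystery-box"], cmdb
)
```

Components that end up in `unmatched` are worth flagging early, since a component with no CMDB identity has no recorded previous state to recover to.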

Romeo team
The Romeo team executes the repair, which includes both recovery (the component has been returned to its previous state as listed in the CMDB) and restoration (normal business operations have resumed). The team receives its input from the Delta team. It may have to deal with the logistics of component repair and should be familiar with the standard operating procedures of the components being repaired.

Whisky team
The Whisky team is responsible for implementing workarounds. Unlike the Delta team, which concentrates on a resolution or fix, the Whisky team typically has a known workaround that can be implemented temporarily to restore the impacted service to operation, thereby minimizing the negative business consequences. If no known workaround is available, the team investigates a potential alternative. Most importantly, the workaround (whether known or potential) and its expected delivery time are communicated to the Echo team, who determine whether it is prudent to implement it. They do so by weighing it against the expected repair and fix recommendations from the Delta team.

Bravo team
The Bravo team is responsible for business continuity and serves the purpose of business resumption in the event of a high-level Major Incident. Viewed another way, this is a total workaround. The Bravo team often collaborates with the Whisky team on aspects of business continuity that can be used as workarounds. If no workaround, fix or repair is forthcoming, the Echo team may engage the Bravo team to start recovering the business to an alternative geographic location known as a disaster recovery site.
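The Echo team's choice between the Whisky workaround, the Delta fix and Bravo's business continuity option can be sketched as a simple decision function. This is only an illustration under stated assumptions: the `Option` type, the `choose_course_of_action` name and the rule of comparing expected delivery times against a `min_saving_minutes` threshold are hypothetical. In practice the Echo team would also weigh risk, service level agreements and business impact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    """A candidate course of action with its expected delivery time."""
    team: str
    description: str
    eta_minutes: int


def choose_course_of_action(
    workaround: Optional[Option],
    fix: Optional[Option],
    min_saving_minutes: int = 15,  # assumed threshold, not from the article
) -> str:
    """Return 'workaround', 'fix' or 'business_continuity'.

    Assumed rule: prefer the workaround only if it restores service
    at least `min_saving_minutes` sooner than the fix would.
    """
    if workaround is None and fix is None:
        # Nothing is forthcoming: engage the Bravo team.
        return "business_continuity"
    if workaround is None:
        return "fix"
    if fix is None:
        return "workaround"
    saving = fix.eta_minutes - workaround.eta_minutes
    return "workaround" if saving >= min_saving_minutes else "fix"


# Example usage with made-up estimates.
whisky = Option("Whisky", "fail over to standby node", 20)
delta = Option("Delta", "apply vendor patch", 90)
decision = choose_course_of_action(whisky, delta)
```

The threshold exists because a temporary workaround still has to be undone later, so a marginal time saving may not justify implementing it.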

Alpha team
The Alpha team is responsible for producing an analysis of the Major Incident after it has been resolved. The team's objective is to complete the following statement:

The Major Incident affected the following ____ in ____. ____ minutes unavailable and/or ____ minutes degraded. ____ affected by ____. ____ further root causation. Escalated to ____.
This Major Incident affected the company ____ usual. The outage was ____ normal. The risk is ____ average.

An important consideration is the analysis of risk. Another tool, provided by Dee Smith and Associates, known as the Rapid Risk Assessment tool, can assist in this area. This assessment will identify potential countermeasures (measures available to mitigate any risks) or any threats not yet mitigated.

Major Incidents are the most stressful events that occur within IT crisis management. IT service management spans people, processes and technology. Tiger Teams are best suited to managing the people component of an IT crisis, especially within the Major Incident process. The different teams require different skill sets, and by creating the teams described above, IT resources can be allocated to their areas of expertise.
