The most important IT process of them all


ITIL treats the Major Incident process as a special case of the incident management process and notes its close relationship to problem management.  However, the Major Incident process deserves greater clarity and specification, because in many large enterprises it is crucial for overcoming a crisis.  A Major Incident is typically defined as an incident with severe negative business consequences, and an important duty of any designated Information Technology (IT) resource is to deal with Major Incidents in a structured manner.  This series of articles addresses the process specifically, and crisis management in general, because when it comes down to the wire, how we deal with a crisis is the most important process of them all.



The above is the ITIL incident management process, which provides an overview of the normal incident process.  In a time of crisis, however, when a Major Incident has occurred, an expanded incident lifecycle needs to be followed.  ITIL defines it loosely, as per the diagram below:




A strong Major Incident discipline has large benefits for the enterprise.  A detailed analysis of these incidents makes the detailed cost of downtime available, which lets the enterprise build business cases for service improvements and tools.  Analyzing the consequences of Major Incidents will also enable the IT discipline in an enterprise to be more effective and efficient.

Major Incidents have a direct relationship to problem management, as the underlying triggers are usually problems.  The following questions need to be answered for any problem encountered in an enterprise:

*What is the problem?

*Why is there a problem?

*When did the problem happen?

*How did the problem occur?

*Where did the problem manifest itself?

*Who has been experiencing this problem?

The Major Incident process, when diligently executed, not only helps identify the problems an enterprise is experiencing but also goes some way toward answering the above questions.

The benefits of this approach are not immediately obvious, but they relate largely to financial management.  Within IT, the cost base is always the IT budget, and it is not always easy to prove a return on investment against that budget.  Many ITIL-based initiatives, a CMDB implementation for example, add to the IT budget without a corresponding measurable reduction.  The Major Incident process is crucial because it focuses on business consequences and on measuring the costs associated with a return to service.  This is where a financial justification can be extracted: the costs associated with a return to service influence the business budget base.

We are addressing business problems and not IT problems.

The route to these problems is lit up by a crucial component of Incident Management known as the expanded incident lifecycle.  A forthcoming article will focus on the detailed recording of incident times and on the additional checkpoint events in the lifecycle.

Why is time important?  Remember that we are working toward a business financial justification, which is all about money, and as Benjamin Franklin stated, time is money.  An analysis of these times will help clarify some of the following potential issues:

*When is the business impacted by Major Incidents? Is it at recognized stages like month end?

*Is the return to service being prioritized?

*Are we detecting incidents quickly? Are the systems being suitably managed or monitored?

*Are the incidents correctly diagnosed? Is the diagnosis performed within expected time parameters? Are technicians suitably trained?

*Are repair processes initiated within suitable time limits after diagnosis? Is there a logistics issue?

*Are restore times adequate? Is there an issue around continuity or dated technology?

*Does the system start processing and become functional in a useful manner to the business in an acceptable time period after being restored? Are there cumbersome interface issues?
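
Each of the questions above maps to the time elapsed between two checkpoints in the lifecycle. As a minimal sketch of that analysis in Python, assuming six recorded checkpoints whose names are hypothetical (the forthcoming article may define them differently) and using made-up timestamps, the duration of each phase could be derived like this:

```python
from datetime import datetime

# Hypothetical checkpoint names, ordered to follow the questions above.
CHECKPOINTS = [
    "occurred",             # the incident starts
    "detected",             # monitoring or a user notices it
    "diagnosed",            # the cause is identified
    "repair_started",       # repair is initiated
    "restored",             # the system is restored
    "business_functional",  # the business can use the system again
]


def phase_durations(timestamps):
    """Return the minutes elapsed between each pair of consecutive checkpoints."""
    durations = {}
    for start, end in zip(CHECKPOINTS, CHECKPOINTS[1:]):
        delta = timestamps[end] - timestamps[start]
        durations[f"{start} -> {end}"] = delta.total_seconds() / 60
    return durations


# Illustrative, made-up timestamps for a single Major Incident.
incident = {
    "occurred": datetime(2024, 1, 31, 9, 0),
    "detected": datetime(2024, 1, 31, 9, 20),
    "diagnosed": datetime(2024, 1, 31, 10, 5),
    "repair_started": datetime(2024, 1, 31, 10, 15),
    "restored": datetime(2024, 1, 31, 11, 0),
    "business_functional": datetime(2024, 1, 31, 11, 30),
}

for phase, minutes in phase_durations(incident).items():
    print(f"{phase}: {minutes:.0f} min")
```

Collected over many incidents, a consistently long gap in one phase points at the matching question: slow detection suggests a monitoring issue, slow diagnosis a training issue, and so on.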

The Major Incident process, and a disciplined and integrated approach to dealing with it, will provide insight into the above and in all probability the financial justification to correct the issues found.  Within IT, cost justification is always problematic.  We will demonstrate in future articles how the following broad cost categories assist in creating transparency:

*Repair cost – the cost most often budgeted by IT. This is the down time multiplied by the pro rata cost of the resources used.

*Return to service cost – the cost to the business of the associated unavailability. This is made up of the following costs:

*Down time multiplied by population affected and normalized head count charge.

*Down time multiplied by missed revenue opportunity costs per hour.

*Overtime and other direct costs associated with the Major Incident.

*Soft costs such as customer perception and legal charges or penalties.
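
The cost categories above can be sketched as a small calculation. This is only an illustration: the field names and all figures are assumptions made for the example, and the last four items in the list are treated as the components of the return to service cost.

```python
from dataclasses import dataclass


@dataclass
class MajorIncidentCosts:
    """Inputs for one Major Incident; all field names are hypothetical."""
    downtime_hours: float
    resource_cost_per_hour: float      # pro rata cost of the IT resources used
    people_affected: int               # population affected by the outage
    headcount_charge_per_hour: float   # normalized head count charge
    missed_revenue_per_hour: float     # missed revenue opportunity cost
    overtime_and_direct_costs: float = 0.0
    soft_costs: float = 0.0            # e.g. legal charges or penalties

    def repair_cost(self):
        # Downtime multiplied by the pro rata cost of the resources used.
        return self.downtime_hours * self.resource_cost_per_hour

    def return_to_service_cost(self):
        # Sum of the four business-side components listed above.
        productivity = (self.downtime_hours * self.people_affected
                        * self.headcount_charge_per_hour)
        revenue = self.downtime_hours * self.missed_revenue_per_hour
        return (productivity + revenue
                + self.overtime_and_direct_costs + self.soft_costs)


# Made-up figures for illustration only.
costs = MajorIncidentCosts(
    downtime_hours=2.5,
    resource_cost_per_hour=400.0,
    people_affected=200,
    headcount_charge_per_hour=30.0,
    missed_revenue_per_hour=5000.0,
    overtime_and_direct_costs=2000.0,
    soft_costs=10000.0,
)
print(costs.repair_cost())             # 1000.0
print(costs.return_to_service_cost())  # 39500.0
```

With these invented numbers the business-side cost is far larger than the repair cost that IT normally budgets for, which is the gap the financial justification rests on.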

Many IT shops approach their justifications by putting the cart before the horse: they identify a tool or service and then go looking for a justification.  A mature Major Incident process makes it possible to start from an analysis of impact, and then decide to resolve the underlying concerns by investigating the tools or services that address the issues most important for the business to solve.  This firmly attaches the cart behind the horse, and as such the Major Incident process becomes the most important IT process of them all.
