Business continuity planning
Business continuity planning (or business continuity and resiliency planning) is the process of creating systems of prevention and recovery to deal with potential threats to a company.[1]
A business continuity plan is a plan to continue operations if a place of business is affected by a disaster. Disasters vary in scale, from localized short-term incidents, through building-wide problems lasting days, to the permanent loss of a building. Such a plan typically explains how the business would recover its operations or move them to another location after damage by events such as natural disasters, theft, or flooding. For example, if a fire destroys an office building or data center, the people and the business or data center operations would relocate to a recovery site.
Any event that could negatively impact operations is included in the plan, such as supply chain interruption or the loss of or damage to critical infrastructure (major machinery or computing/network resources). As such, risk management must be incorporated as part of BCP.[2] In the US, government entities refer to the process as continuity of operations planning (COOP).[3]
In December 2006, the British Standards Institution (BSI) released an independent standard for BCP — BS 25999-1. Prior to the introduction of BS 25999, BCP professionals relied on information security standard BS 7799, which only peripherally addressed BCP to improve an organization's information security procedures. BS 25999's applicability extends to all organizations. In 2007, the BSI published BS 25999-2 "Specification for Business Continuity Management", which specifies requirements for implementing, operating and improving a documented business continuity management system (BCMS).
Business continuity management is standardised across the UK by British Standards (BS) through BS 25999-2:2007 and BS 25999-1:2006. BS 25999-2:2007 is the British Standard for business continuity management across all organizations, including industry and its sectors. The standard provides a best practice framework to minimize disruption during unexpected events that could bring business to a standstill. It offers a practical plan for dealing with most eventualities, from extreme weather conditions to terrorism, IT system failure and staff sickness.[4]
This document was superseded in November 2012 by the British standard BS ISO 22301:2012.[5]
In 2004, following crises in the preceding years, the UK government passed the Civil Contingencies Act 2004 (The Act). This provides the legislation for civil protection in the UK.
The Act is separated into two distinct parts: Part 1 focuses on local arrangements for civil protection, establishing a statutory framework of roles and responsibilities for local responders. Part 2 focuses on emergency powers, establishing a modern framework for the use of special legislative measures that might be necessary to deal with the effects of the most serious emergencies.
The Act tells responders and planners that businesses need continuity planning measures in place in order to survive and continue to thrive, while working to keep the impact of an incident as small as possible.[6]
Analysis
The analysis phase consists of impact analysis, threat analysis and impact scenarios.
Business impact analysis (BIA)
A business impact analysis (BIA) differentiates critical (urgent) and non-critical (non-urgent) organization functions/activities. Critical functions are those whose disruption is regarded as unacceptable. Perceptions of acceptability are affected by the cost of recovery solutions. A function may also be considered critical if dictated by law. For each critical (in scope) function, two values are then assigned:
- Recovery Point Objective (RPO) – the acceptable latency of data that will not be recovered. For example, is it acceptable for the company to lose 2 days of data?[7]
- Recovery Time Objective (RTO) – the acceptable amount of time to restore the function.
The recovery point objective must ensure that the maximum tolerable data loss for each activity is not exceeded. The recovery time objective must ensure that the Maximum Tolerable Period of Disruption (MTPoD) for each activity is not exceeded.
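The two constraints above can be checked mechanically. The following is a minimal sketch, assuming each critical function is recorded with its RPO, RTO, maximum tolerable data loss and MTPoD expressed in hours. The class, field names and sample figures are hypothetical and are not taken from any standard.

```python
from dataclasses import dataclass


@dataclass
class CriticalFunction:
    """Hypothetical BIA record; all durations are in hours."""
    name: str
    rpo_hours: float            # Recovery Point Objective
    rto_hours: float            # Recovery Time Objective
    max_data_loss_hours: float  # maximum tolerable data loss
    mtpod_hours: float          # Maximum Tolerable Period of Disruption


def objective_violations(fn: CriticalFunction) -> list[str]:
    """Return readable problems; an empty list means the objectives are consistent."""
    problems = []
    if fn.rpo_hours > fn.max_data_loss_hours:
        problems.append(f"{fn.name}: RPO exceeds maximum tolerable data loss")
    if fn.rto_hours > fn.mtpod_hours:
        problems.append(f"{fn.name}: RTO exceeds MTPoD")
    return problems


# Illustrative figures only: a 2-day RPO against a 1-day tolerance for data loss.
payroll = CriticalFunction("payroll", rpo_hours=48, rto_hours=24,
                           max_data_loss_hours=24, mtpod_hours=72)
print(objective_violations(payroll))
# ['payroll: RPO exceeds maximum tolerable data loss']
```

In this example the RTO is acceptable (24 hours against an MTPoD of 72 hours), but the RPO would need to be tightened.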
Next, the impact analysis results in the recovery requirements for each critical function. Recovery requirements consist of the following information:
- The business requirements for recovery of the critical function, and/or
- The technical requirements for recovery of the critical function
Threat and risk analysis (TRA)
Once recovery requirements have been defined, each potential threat may require unique recovery steps. Common threats include:
- Epidemic
- Earthquake
- Fire
- Flood
- Cyber attack
- Sabotage (insider or external threat)
- Hurricane or other major storm
- Utility outage
- Terrorism/Piracy
- War/civil disorder
- Theft (insider or external threat, vital information or material)
- Random failure of mission-critical systems
- Power cut
The impact of an epidemic can be regarded as purely human, and may be alleviated with technical and business solutions. However, if people behind these plans are affected by the disease, then the process can stumble.
During the 2002–2003 SARS outbreak, some organizations grouped staff into separate teams, and rotated the teams between primary and secondary work sites, with a rotation frequency equal to the incubation period of the disease. The organizations also banned face-to-face intergroup contact during business and non-business hours. The split increased resiliency against the threat of quarantine measures if one person in a team was exposed to the disease.
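The rotation scheme described above can be sketched as a small scheduling function. This is an illustration under stated assumptions: two sites, teams numbered from zero, and a rotation period equal to the incubation period. The function name and the 10-day figure used in the example are placeholders, not medical guidance.

```python
def site_for_team(team_index: int, day: int, incubation_days: int,
                  sites=("primary", "secondary")) -> str:
    """Return the work site for a team on a given day (day 0 starts the scheme).

    Teams swap sites once per rotation period, and the period is set equal
    to the incubation period of the disease.
    """
    if incubation_days <= 0:
        raise ValueError("incubation_days must be positive")
    completed_periods = day // incubation_days
    return sites[(team_index + completed_periods) % len(sites)]


# Placeholder incubation period of 10 days, for illustration only.
for day in (0, 9, 10, 20):
    print(day, site_for_team(0, day, 10), site_for_team(1, day, 10))
```

With two teams and two sites, the teams never occupy the same site on the same day, which is the property that limits the effect of a quarantine on one team.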
Impact scenarios
After the applicable threats have been identified, impact scenarios are considered to support the development of a business recovery plan. Business continuity testing plans may document scenarios for each identified threat and impact scenario. More localized impact scenarios, for example the loss of a specific floor in a building, may also be documented. The BC plans should reflect the requirements to recover the business from the widest possible damage. The risk assessment should develop impact scenarios that are applicable to the business or the premises from which it operates. For example, it might not be logical to consider a tsunami in the Middle East, since the likelihood of such a threat is negligible.
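One way to apply the point about negligible likelihood is to screen threats against a threshold before writing scenarios for them. The sketch below assumes that each threat has an estimated annual likelihood for the premises in question. The function name, the threshold and every number shown are invented for illustration and are not real risk estimates.

```python
def applicable_threats(likelihood_by_threat: dict[str, float],
                       threshold: float = 0.01) -> list[str]:
    """Keep threats whose estimated likelihood meets the threshold, most likely first."""
    kept = [(p, name) for name, p in likelihood_by_threat.items() if p >= threshold]
    # Sort by descending likelihood, then by name so that ties are stable.
    return [name for p, name in sorted(kept, key=lambda item: (-item[0], item[1]))]


# Hypothetical estimates for a single site.
site_estimates = {"fire": 0.05, "power cut": 0.30, "tsunami": 0.0001}
print(applicable_threats(site_estimates))
# ['power cut', 'fire']
```

A threat that is screened out for one site may still matter for another, so the estimates would need to be kept per premises.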
Recovery requirement
After the analysis phase, business and technical recovery requirements precede the solutions phase. Asset inventories allow for quick identification of deployable resources. For an office-based, IT-intensive business, the plan requirements may cover desks, human resources, applications, data, manual workarounds, computers and peripherals. Other business environments, such as production, distribution and warehousing, will need to cover these elements but are likely to have additional issues.
The robustness of an emergency management plan is dependent on how much money an organization or business can place into the plan. The organization must balance realistic feasibility with the need to properly prepare. In general, every $1 put into an emergency management plan will prevent $7 of loss.[8]
Solution design
The solution design phase identifies the most cost-effective disaster recovery solution that meets two main requirements from the impact analysis stage. For IT purposes, this is commonly expressed as the minimum application and data requirements and the time in which the minimum application and application data must be available.
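The selection step can be sketched as choosing the cheapest option that satisfies both requirements. This sketch assumes the two requirements are expressed as the RPO and RTO values assigned during the business impact analysis. The candidate solutions, their costs and their achievable recovery figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateSolution:
    """Hypothetical recovery option with its cost and achievable objectives."""
    name: str
    annual_cost: float
    achievable_rpo_hours: float
    achievable_rto_hours: float


def cheapest_compliant(candidates: list[CandidateSolution],
                       rpo_hours: float,
                       rto_hours: float) -> Optional[CandidateSolution]:
    """Return the lowest-cost candidate meeting both objectives, or None."""
    compliant = [c for c in candidates
                 if c.achievable_rpo_hours <= rpo_hours
                 and c.achievable_rto_hours <= rto_hours]
    return min(compliant, key=lambda c: c.annual_cost, default=None)


# Illustrative options and figures only.
options = [
    CandidateSolution("nightly offsite backup", 10_000, 24, 72),
    CandidateSolution("warm standby site", 60_000, 4, 8),
    CandidateSolution("hot site with replication", 250_000, 0.25, 1),
]
choice = cheapest_compliant(options, rpo_hours=6, rto_hours=12)
print(choice.name if choice else "no compliant solution")
# warm standby site
```

If no candidate is compliant, the function returns None, which would suggest revisiting either the requirements or the list of candidates.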
Outside the IT domain, the preservation of hard copy information (such as contracts) and of skilled staff, and the restoration of embedded technology in a process plant, must be considered. This phase overlaps with disaster recovery planning methodology. The solution phase determines:
- crisis management command structure
- secondary work sites
- telecommunication architecture between primary and secondary work sites
- data replication methodology between primary and secondary work sites
- applications and data required at the secondary work site
- physical data requirements at the secondary work site.
Implementation
The implementation phase involves policy changes, material acquisitions, staffing and testing.
Testing and organizational acceptance
The purpose of testing is to achieve organizational acceptance that the solution satisfies the recovery requirements. Plans may fail to meet expectations due to insufficient or inaccurate recovery requirements, solution design flaws or solution implementation errors. Testing may include:
- Crisis command team call-out testing
- Technical swing test from primary to secondary work locations
- Technical swing test from secondary to primary work locations
- Application test
- Business process test
At minimum, testing is conducted on a biannual schedule.
The 2008 book Exercising for Excellence, published by the British Standards Institution, identified three types of exercises that can be employed when testing business continuity plans.
Tabletop exercises
Tabletop exercises typically involve a small number of people and concentrate on a specific aspect of a BCP. They can easily accommodate complete teams from a specific area of a business.
Another form involves a single representative from each of several teams. Typically, participants work through a simple scenario and then discuss specific aspects of the plan. An example scenario is a fire discovered outside working hours.
The exercise consumes only a few hours and is often split into two or three sessions, each concentrating on a different theme.
Medium exercises
A medium exercise is conducted within a "Virtual World" and brings together several departments, teams or disciplines. It typically concentrates on multiple BCP aspects, prompting interaction between teams. The scope of a medium exercise can range from a few teams from one organisation co-located in one building to multiple teams operating across dispersed locations. The environment needs to be as realistic as practicable and team sizes should reflect a realistic situation. Realism may extend to simulated news broadcasts and websites.
A medium exercise typically lasts a few hours, though it can extend over several days. It typically involves a "Scenario Cell" that adds pre-scripted "surprises" throughout the exercise.
Complex exercises
A complex exercise aims to have as few boundaries as possible. It incorporates all the aspects of a medium exercise. The exercise remains within a virtual world, but maximum realism is essential. This might include no-notice activation, actual evacuation and actual invocation of a disaster recovery site.
While start and stop times are pre-agreed, the actual duration might be unknown if events are allowed to run their course.
Maintenance
Maintenance of a BCP manual, on a biannual or annual cycle, is broken down into three periodic activities.
- Confirmation of information in the manual, roll out to staff for awareness and specific training for critical individuals.
- Testing and verification of technical solutions established for recovery operations.
- Testing and verification of organization recovery procedures.
Issues found during the testing phase often must be reintroduced to the analysis phase.
Information/targets
The BCP manual must evolve with the organization. Activating the call tree verifies the notification plan's efficiency as well as contact data accuracy. Like most business procedures, business continuity planning has its own jargon. Organisation-wide understanding of business continuity jargon is vital and glossaries are available.[9] Types of organisational changes that should be identified and updated in the manual include:
- Staffing
- Important clients
- Vendors/suppliers
- Organization structure changes
- Company investment portfolio and mission statement
- Communication and transportation infrastructure such as roads and bridges
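The call tree activation mentioned above lends itself to a simple simulation that reports who was notified and whose contact data needs correcting. This is a sketch only: the Contact class, the reachable flag (standing in for the outcome of a real call) and the sample names are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Contact:
    """Hypothetical call-tree node; 'calls' lists the people this contact phones."""
    name: str
    reachable: bool
    calls: list["Contact"] = field(default_factory=list)


def activate(root: Contact) -> tuple[list[str], list[str]]:
    """Walk the call tree breadth-first and return (notified, not_reached)."""
    notified, not_reached = [], []
    queue = deque([root])
    while queue:
        person = queue.popleft()
        (notified if person.reachable else not_reached).append(person.name)
        # Assumption: a fallback caller covers for anyone unreachable,
        # so the people below them in the tree are still contacted.
        queue.extend(person.calls)
    return notified, not_reached


tree = Contact("crisis lead", True, [
    Contact("IT lead", True, [Contact("database administrator", False)]),
    Contact("facilities lead", True),
])
print(activate(tree))
# (['crisis lead', 'IT lead', 'facilities lead'], ['database administrator'])
```

The not_reached list is the part that feeds back into maintenance, since it points to contact details in the manual that may be out of date.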
Technical
Specialized technical resources must be maintained. Checks include:
- Virus definition distribution
- Application security and service patch distribution
- Hardware operability
- Application operability
- Data verification
- Data application
Testing and verification of recovery procedures
As work processes change, previous recovery procedures may no longer be suitable. Checks include:
- Are all work processes for critical functions documented?
- Have the systems used for critical functions changed?
- Are the documented work checklists meaningful and accurate?
- Do the documented work process recovery tasks and supporting disaster recovery infrastructure allow staff to recover within the predetermined recovery time objective?
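The last check in the list above can be approximated by adding up the documented task durations and comparing the total with the recovery time objective. The sketch assumes the tasks run one after another and that durations are recorded in minutes. The task names and figures are hypothetical.

```python
def fits_within_rto(task_minutes: dict[str, float],
                    rto_minutes: float) -> tuple[bool, float]:
    """Return (fits, slack_minutes) for recovery tasks performed sequentially."""
    total = sum(task_minutes.values())
    return total <= rto_minutes, rto_minutes - total


# Illustrative recovery tasks for one critical function.
tasks = {
    "restore database": 90,
    "redirect network traffic": 30,
    "validate application": 45,
}
print(fits_within_rto(tasks, rto_minutes=240))
# (True, 75)
```

A negative slack value indicates by how many minutes the documented procedure overruns the objective. Tasks that can run in parallel would need a different calculation.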
References
Notes
- Elliot, D.; Swartz, E.; Herbane, B. (1999) Just waiting for the next big bang: business continuity planning in the UK finance sector. Journal of Applied Management Studies, Vol. 8, No, pp. 43–60. Here: p. 48.
- Intrieri, Charles (10 September 2013). "Business Continuity Planning". Flevy. Retrieved 29 September 2013.
- http://www.fema.gov/guidance-directives
- British Standards Institution (2006). Business continuity management – Part 1: Code of practice. London.
- British Standards Institution (2012). Societal security – Business continuity management systems – Requirements. London.
- Cabinet Office. (2004). overview of the Act. In: Civil Contingencies Secretariat Civil Contingencies Act 2004: a short. London: Civil Contingencies Secretariat
- May, Richard. "Finding RPO and RTO".
- "Can Your Organization Survive a Natural Disaster?". Boston University. Retrieved 22 December 2014.
- Glossary of Business Continuity Terms
Bibliography
- Business Continuity Planning, FEMA, Retrieved: June 16, 2012
- Continuity of Operations Planning (no date). U.S. Department of Homeland Security. Retrieved July 26, 2006.
- Purpose of Standard Checklist Criteria For Business Recovery (no date). Federal Emergency Management Agency. Retrieved July 26, 2006.
- NFPA 1600 Standard on Disaster/Emergency Management and Business Continuity Programs — PDF (2010). National Fire Protection Association.
- United States General Accounting Office Y2k BCP Guide (August 1998). United States Government Accountability Office.
Further reading
International Organization for Standardization
- ISO/IEC 27001:2005 (formerly BS 7799-2:2002) Information Security Management System
- ISO/IEC 27002:2005 (renumbered from ISO/IEC 17799:2005) Information Security Management – Code of Practice
- ISO/IEC 27031:2011 Information technology - Security techniques - Guidelines for information and communication technology readiness for business continuity
- ISO/PAS 22399:2007 Guideline for incident preparedness and operational continuity management
- ISO/IEC 24762:2008 Guidelines for information and communications technology disaster recovery services
- IWA 5:2006 Emergency Preparedness
- ISO 22301:2012 Societal security - Business continuity management systems - Requirements
- ISO 22313:2012 Societal security - Business continuity management systems - Guidance
- ISO/TS 22315:2015 Societal security - Business continuity management systems - Guidelines for business impact analysis (BIA)
British Standards Institution
- BS 25999-1:2006 Business Continuity Management Part 1: Code of practice
- BS 25999-2:2007 Business Continuity Management Part 2: Specification
Others
- James C. Barnes. A Guide to Business Continuity Planning. ISBN 978-0471530152.
- Kenneth L Fulmer. Business Continuity Planning, A Step-by-Step Guide. ISBN 978-1931332217.
- Richard Kepenach. Business Continuity Plan Design, 8 Steps for Getting Started Designing a Plan.
- Judy Bell. Disaster Survival Planning: A Practical Guide for Businesses. ISBN 978-0963058003.
- Dimattia, S. (November 15, 2001). "Planning for Continuity". Library Journal: 32–34.