IT Accident Protocol

=== Emergency & Maintenance Website Shutdown Protocol ===
Effective Date: 19 Feb 2025
 
Responsible Team: IT Department
 
Affected Websites: All sites under Bloomex’s management
----
 
=== 1. Purpose ===
To establish a clear process for emergency and scheduled website shutdowns, ensuring proper internal communication and minimizing financial and operational impact.
----
 
=== 2. Scope ===
This protocol applies to all planned and unplanned maintenance that results in website downtime, whether full or partial, for Bloomex and its associated platforms.
----
 
=== 3. Procedure for Emergency Website Shutdown ===
 
==== Step 1: Initial Assessment ====
 
* Determine the severity of the issue:
** Is the issue critical (server crash, DDoS attack, major bug)?
** Is it non-critical but urgent (performance degradation, minor bug fix, necessary updates)?
* Classify the expected downtime:
** Short (<15 min): No major disruption.
** Medium (15 min to 1 hour): Stakeholder notification is required from this tier upward.
** Long (>1 hour): Major issue requiring extended work.
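
The downtime tiers above can be expressed as a small helper. This is a minimal sketch, not part of the protocol itself: the function name `classify_downtime` is hypothetical, and placing exactly 15 minutes and exactly 1 hour in the Medium tier is an assumption based on the ranges as written.

```python
from datetime import timedelta


def classify_downtime(expected: timedelta) -> str:
    """Map an expected downtime to the tier names used in Step 1.

    Assumption: the boundaries 15 min and 1 hour both belong to "medium",
    following the "<15 min" and ">1 hour" wording of the other two tiers.
    """
    if expected < timedelta(minutes=15):
        return "short"
    if expected <= timedelta(hours=1):
        return "medium"
    return "long"


if __name__ == "__main__":
    for minutes in (5, 15, 60, 90):
        print(minutes, classify_downtime(timedelta(minutes=minutes)))
```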
 
==== Step 2: Notify Key Stakeholders ====
 
* If expected downtime is 15 minutes or longer (Medium or Long), notify the following:
** Marketing Team (Skype/Email/SMS): Inform them ASAP so they can pause ads.
** Customer Support Team (Skype): To manage customer expectations.
** Senior Management (optional, based on severity).
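
The notification rule can be sketched as a function that returns the teams to contact. The names here are illustrative only. Two assumptions are made: a downtime of exactly 15 minutes counts as notifiable (matching the Medium tier in Step 1), and "based on severity" is read as "include Senior Management when the issue is critical".

```python
from datetime import timedelta

# Threshold taken from Step 2; treating exactly 15 minutes as notifiable
# is an assumption.
NOTIFY_THRESHOLD = timedelta(minutes=15)


def stakeholders_to_notify(expected: timedelta, critical: bool) -> list[str]:
    """Return the teams to contact for a given expected downtime."""
    if expected < NOTIFY_THRESHOLD:
        return []
    teams = ["Marketing Team", "Customer Support Team"]
    if critical:
        # Assumption: a critical issue is severe enough to escalate.
        teams.append("Senior Management")
    return teams


if __name__ == "__main__":
    print(stakeholders_to_notify(timedelta(minutes=30), critical=True))
```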
 
==== Step 3: Execute the Shutdown & Maintenance ====
 
* Post a temporary maintenance page stating the reason and estimated time of restoration.
* Document the incident in the IT log, noting:
** Date & time
** Nature of the issue
** Actions taken
** Expected restoration time
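
One way to keep the four log fields consistent is a small record type that serializes to JSON. This is a sketch under assumptions: the class name, field names, and JSON output are hypothetical and do not describe the format of the existing IT log.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class IncidentLogEntry:
    """The four fields listed in Step 3 (field names are illustrative)."""
    occurred_at: datetime
    nature: str
    actions_taken: list[str]
    expected_restoration: datetime

    def to_json(self) -> str:
        record = asdict(self)
        # datetime objects are not JSON-serializable, so store ISO 8601 text.
        record["occurred_at"] = self.occurred_at.isoformat()
        record["expected_restoration"] = self.expected_restoration.isoformat()
        return json.dumps(record)


if __name__ == "__main__":
    entry = IncidentLogEntry(
        occurred_at=datetime(2025, 2, 19, 14, 5, tzinfo=timezone.utc),
        nature="DDoS attack",
        actions_taken=["Enabled maintenance page", "Blocked offending IP ranges"],
        expected_restoration=datetime(2025, 2, 19, 15, 0, tzinfo=timezone.utc),
    )
    print(entry.to_json())
```

Using timezone-aware timestamps avoids ambiguity when staff in different time zones read the log.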
 
==== Step 4: Update Marketing & Stakeholders ====
 
* Send real-time updates at significant milestones:
** Issue identified
** Work in progress
** Estimated completion time
* Confirm when the site is back online so marketing can resume ads.
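
Status updates are easier to scan when every message follows the same shape. The sketch below formats one message per milestone; the `Milestone` enum, the `format_update` function, and the message wording are all hypothetical, and sending the message over Skype, email, or SMS is left out.

```python
from enum import Enum


class Milestone(Enum):
    """Milestones from Step 4, plus the final "back online" confirmation."""
    ISSUE_IDENTIFIED = "Issue identified"
    WORK_IN_PROGRESS = "Work in progress"
    ESTIMATED_COMPLETION = "Estimated completion time"
    SITE_RESTORED = "Site back online"


def format_update(site: str, milestone: Milestone, detail: str = "") -> str:
    """Build a one-line status message for stakeholders."""
    message = f"[{site}] {milestone.value}"
    if detail:
        message += f": {detail}"
    if milestone is Milestone.SITE_RESTORED:
        # Remind marketing that ads can be switched back on.
        message += " (ads can be resumed)"
    return message


if __name__ == "__main__":
    print(format_update("example.com", Milestone.ESTIMATED_COMPLETION, "15:00 UTC"))
    print(format_update("example.com", Milestone.SITE_RESTORED))
```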
 
----
 
=== 4. Procedure for Scheduled Maintenance ===
 
==== Step 1: Planning the Maintenance ====
 
* Schedule maintenance during off-peak hours (e.g., late at night).
* Assess the expected downtime and whether it will affect core functions.
* Prepare a rollback plan in case of failure.
 
==== Step 2: Pre-Notification (At Least 24 Hours Before Scheduled Maintenance) ====
 
* Notify key stakeholders:
** Marketing Team (to pause ads)
** Customer Support
** Senior Management (if high-impact)
* Post a notification banner on the website (if applicable).
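
The 24-hour rule reduces to simple date arithmetic. A minimal sketch, with hypothetical function names, that computes the latest acceptable notification time and checks whether a notice was sent early enough:

```python
from datetime import datetime, timedelta

# Minimum lead time stated in the Step 2 heading.
MIN_NOTICE = timedelta(hours=24)


def notification_deadline(maintenance_start: datetime) -> datetime:
    """Latest moment at which stakeholders may still be notified."""
    return maintenance_start - MIN_NOTICE


def notice_is_sufficient(notified_at: datetime, maintenance_start: datetime) -> bool:
    """True when the notice went out at least 24 hours before the start."""
    return notified_at <= notification_deadline(maintenance_start)


if __name__ == "__main__":
    start = datetime(2025, 3, 2, 2, 0)
    print(notification_deadline(start))
    print(notice_is_sufficient(datetime(2025, 3, 1, 9, 0), start))
```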
 
==== Step 3: Execution & Communication ====
 
* Follow the shutdown process in Section 3, Step 3, including posting a maintenance page.
* Provide real-time updates to marketing and customer support.
 
==== Step 4: Completion & Post-Maintenance Review ====
 
* Confirm site functionality before marking the task complete.
* Notify stakeholders that the site is operational.
* Analyze the impact and document lessons learned.
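
Confirming site functionality can be partly automated with a basic HTTP check before stakeholders are told the site is operational. The sketch below uses only the Python standard library. The list of pages to check is a hypothetical placeholder, and treating "HTTP 200 on every page" as healthy is an assumption; it does not replace a manual check of checkout or other core flows.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 503.
        return error.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def all_pages_healthy(
    urls: Iterable[str],
    get_status: Callable[[str], Optional[int]] = http_status,
) -> bool:
    """True only if every page responds with HTTP 200."""
    return all(get_status(url) == 200 for url in urls)


if __name__ == "__main__":
    # Hypothetical pages; replace with the real pages to verify.
    pages = ["https://example.com/", "https://example.com/cart"]
    print(all_pages_healthy(pages))
```

The `get_status` parameter exists so the check can be exercised without network access, for example in a dry run.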
 
----
 
=== 5. Additional Recommendations ===
 
* Use an automated alert system for marketing notifications (e.g., SMS/email alert to pause ads).
* Keep a downtime log with details to improve future handling (currently tracked in Grafana).
* Create a script that pauses and resumes ads automatically.
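
As a starting point for the ad pause and restore script, the outline below shows one possible structure. It is a sketch only: `AdsPlatform`, `pause_campaigns`, and `resume_campaigns` are hypothetical names, not calls from any real advertising API, and `RecordingPlatform` is a stand-in used to demonstrate the flow. A real script would wrap each ad provider's own client behind the same two methods.

```python
from typing import Iterable, Protocol


class AdsPlatform(Protocol):
    """Hypothetical interface each ad provider wrapper would implement."""
    name: str

    def pause_campaigns(self) -> None: ...
    def resume_campaigns(self) -> None: ...


class RecordingPlatform:
    """Stand-in that only records its state; it calls no external service."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.paused = False

    def pause_campaigns(self) -> None:
        self.paused = True

    def resume_campaigns(self) -> None:
        self.paused = False


def pause_all(platforms: Iterable[AdsPlatform]) -> list[AdsPlatform]:
    """Pause every platform and return the ones that were paused."""
    paused = []
    for platform in platforms:
        platform.pause_campaigns()
        paused.append(platform)
    return paused


def resume_all(paused: Iterable[AdsPlatform]) -> None:
    """Resume only the platforms that this script paused earlier."""
    for platform in paused:
        platform.resume_campaigns()


if __name__ == "__main__":
    platforms = [RecordingPlatform("search"), RecordingPlatform("social")]
    paused = pause_all(platforms)
    print([p.name for p in paused])
    resume_all(paused)
```

Keeping resume as a separate, explicit call means ads come back only after someone confirms the site is online, rather than automatically when the maintenance job exits.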

Latest revision as of 12:45, 20 February 2025
