What are your best practices for outage notifications? How do you keep the IT leadership team aware in real time? How does your support desk / organization get involved and when?

CIO in Banking, 2 months ago
- Before you even start, it's important to understand what makes an incident Critical, P1, P2 or P3. This will be specific to the platform, the business function and the timing of the incident. For example, an incident on a finance system at month end is likely to have a higher impact than the same incident earlier in the month, as is an incident on a trading system during a market event.

- For critical incidents (those notified to the board), a bridge is kept open at all times, and anyone can dial in for an update. There are usually also scheduled updates, every hour or every two hours, where all key people join to report progress to key stakeholders. A gold and/or platinum call will have been set up, where business heads join to report business, client, regulatory and financial impacts, and technology joins with remediation updates. The relevant board member attends to get updates. Email communication goes out every hour or every two hours.

- For P1 incidents, technical staff can follow updates in the ServiceNow commentary or join the incident bridge. Business heads can wait for the email communication (every two hours) and/or attend bronze/silver calls, depending on the client, regulatory and financial impacts.

- An organisation needs an agreed incident classification, definitions of platinum, gold, silver and bronze calls, and a set frequency for emails. The policy should also cover who writes the emails, in what format, and so on.

- ITIL is a good framework to follow and adapt to your organisation.

- Live updates, in my experience, are never really live unless you are on the incident bridge call and watching it unfold in real time. When people say they want real-time updates, they usually just want the last meaningful update, which they can get from an email or by joining the bridge call.
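
As a rough illustration of the points above, the classification and update cadence could be written down as a small policy table. This is only a sketch under stated assumptions: the names `CommsPolicy`, `POLICIES`, `classify` and `next_email_due` are hypothetical, the hourly interval for Critical is one of the two options mentioned (hourly or every two hours), and the "raise one level during a sensitive window" rule is an assumed way to model timing, not something prescribed here or by ITIL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommsPolicy:
    """Communication rules for one severity level (hypothetical structure)."""
    notify_board: bool
    email_interval: Optional[timedelta]   # None means no scheduled emails
    stakeholder_calls: Tuple[str, ...]    # call tiers business heads may attend


# Assumed values, loosely following the answer above; adapt to your own policy.
POLICIES = {
    "Critical": CommsPolicy(True, timedelta(hours=1), ("gold", "platinum")),
    "P1": CommsPolicy(False, timedelta(hours=2), ("bronze", "silver")),
}

# Lowest to highest severity.
SEVERITY_ORDER = ["P3", "P2", "P1", "Critical"]


def classify(base_severity: str, in_sensitive_window: bool) -> str:
    """Raise severity one level if the incident falls in a sensitive window,
    for example month end for a finance system (assumed rule)."""
    idx = SEVERITY_ORDER.index(base_severity)
    if in_sensitive_window:
        idx = min(idx + 1, len(SEVERITY_ORDER) - 1)
    return SEVERITY_ORDER[idx]


def next_email_due(severity: str, last_sent: datetime) -> Optional[datetime]:
    """Return when the next stakeholder email is due, or None if the
    severity has no scheduled email cadence in this sketch."""
    policy = POLICIES.get(severity)
    if policy is None or policy.email_interval is None:
        return None
    return last_sent + policy.email_interval
```

For example, `classify("P2", in_sensitive_window=True)` gives `"P1"`, and `next_email_due("P1", last_sent)` gives a time two hours after `last_sent`. Keeping the rules in one table makes it easier to agree on them up front and to check who should be hearing what, and when.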
Global Chief Cybersecurity Strategist & CISO in Healthcare and Biotech, 2 months ago
One of the best resources that explains how companies should handle outages and define responsibilities is the "Incident Response Handbook" by the Incident Response Consortium. It provides comprehensive guidelines and best practices for handling incidents, including how to assign roles and responsibilities during an outage, how to communicate internally and externally, and how to coordinate the resolution process effectively. It's widely regarded as a valuable resource for organizations looking to improve their incident response capabilities. Here is the resource link: https://www.incidentresponse.org/resources/useful-links/