When reporting system uptime for a SaaS product, how are you handling partial impact or intermittent issues where the system is running but some customers are experiencing issues while others are unimpacted? Is there a best practice for determining what is reported in system uptime metrics as "down"? With all of the redundancies built in, the application we're supporting is rarely truly down; most issues fall into this category of partial/intermittent, where the set of impacted users/customers is limited. Unfortunately, it can often be challenging to identify exactly which customers had issues and for how long. How have you handled similar issues?

Director in Manufacturing, a year ago
We did have an "Impacted" status, but we also struggled to put hard measurements on the degree of impact. If it was a manufacturing system and we were still able to meet production quotas, it did not count as "Down"; we would just report it as Yellow or Caution. Any time there was an actual production impact where final output couldn't be produced, it was "Down". That would include, for example, accounts receivable in SAP being unable to send out invoices. Even though SAP was functioning, if bills were not going out by end of day, it was considered a SEVERITY 2 outage. If no functions of SAP were operational, that was SEVERITY 1 - ALL HANDS ON DECK!!

When it comes to the "Up/Down" judgement call, an Operations Director had to apply wisdom to make the metric decision, and they needed to be able to defend that decision in the monthly metrics.
Chief Technology Officer in Software, a year ago
We defined P1, P2, P3, and P4:

P1 - System down for all users
P2 - System up but critical functionality down for more than 5% of users
P3 - System up but critical functionality down for less than 1% of users
P4 - Feature issue
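
For anyone who wants to automate this kind of triage, here is a minimal sketch of the four levels as a function. The names (`Priority`, `classify`) and the inputs are hypothetical, not taken from any tool mentioned in this thread. The comment does not say how an impact between 1% and 5% of users is classified, so the sketch returns `None` for that band instead of guessing.

```python
from enum import Enum
from typing import Optional


class Priority(Enum):
    P1 = "System down for all users"
    P2 = "System up, critical functionality down for more than 5% of users"
    P3 = "System up, critical functionality down for less than 1% of users"
    P4 = "Feature issue"


def classify(system_down: bool,
             critical_impacted: bool,
             pct_users_impacted: float) -> Optional[Priority]:
    """Map an incident onto the P1-P4 levels listed above.

    All three inputs are assumptions made for this sketch; measuring
    pct_users_impacted is the hard part the original question is about.
    """
    if system_down:
        return Priority.P1
    if not critical_impacted:
        return Priority.P4
    if pct_users_impacted > 5.0:
        return Priority.P2
    if pct_users_impacted < 1.0:
        return Priority.P3
    # The 1%-5% band is not defined in the comment; leave it to a human.
    return None
```

Returning `None` for the undefined band forces a manual decision, which is in line with the judgement call described earlier in the thread.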
SVP - Software Engineering in Finance (non-banking), a year ago
This is tough to measure and does not fit easily into a typical uptime measure, but we use third-party tools like AlertSite to check uptime independently of whatever measures our vendor provides, hitting these systems every 5 minutes. It might not catch every downtime, but if there truly are frequent intermittent system issues, patterns will eventually show up over time because of the frequency of the independent checks.
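
As a rough illustration of what fixed-interval independent checks give you, the sketch below turns a list of probe results into an availability percentage and an estimated downtime. It does not use AlertSite's API; `availability_from_probes` and its parameters are hypothetical, and the downtime figure assumes each failed probe stands for one full interval, which is only an approximation.

```python
from typing import Sequence, Tuple


def availability_from_probes(results: Sequence[bool],
                             interval_minutes: float = 5.0) -> Tuple[float, float]:
    """Return (availability_pct, estimated_downtime_minutes).

    results: one entry per probe, True if the check succeeded.
    interval_minutes: spacing between probes (5 minutes in the comment above).
    """
    if not results:
        raise ValueError("at least one probe result is required")
    failures = sum(1 for ok in results if not ok)
    availability_pct = 100.0 * (len(results) - failures) / len(results)
    # Assumption: a failed probe counts as a full interval of downtime.
    estimated_downtime = failures * interval_minutes
    return availability_pct, estimated_downtime


# Four probes, one failure.
print(availability_from_probes([True, True, False, True]))  # (75.0, 5.0)
```

An outage shorter than the probe interval can fall between two checks, which is why a single day of results says little and the longer-term pattern matters more.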
Manager in Construction, a year ago
I'd agree with the other comment here describing the key issue types and the % impact on the user base. If it is affecting a small number of users, then it's a P3.

P1 - System down for all users
P2 - System up but critical functionality down for more than 5% of users
P3 - System up but critical functionality down for less than 1% of users
P4 - Feature issue
VP of IT in Software, a year ago
For the most part, we count a partial outage as a full outage unless the partial affects only very limited functionality. For intermittent disruptions, if it's very up and down, we count the entire period; otherwise we evaluate it on a case-by-case basis and might count a large percentage of the time as outage time.
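
One way to make the "very up and down, count the entire period" rule repeatable is to merge outage intervals that are separated by only a short recovery. This is a sketch under assumptions: `merge_flapping`, `uptime_pct`, and the `max_gap` threshold are hypothetical, and the case-by-case evaluation mentioned above is not modelled.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_minute, end_minute)


def merge_flapping(outages: List[Interval], max_gap: float) -> List[Interval]:
    """Merge outages whose recovery gap is at most max_gap minutes.

    max_gap is an assumed threshold for "very up and down"; the merged
    interval counts the short recoveries in between as downtime too.
    """
    merged: List[Interval] = []
    for start, end in sorted(outages):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def uptime_pct(outages: List[Interval], period_minutes: float,
               max_gap: float) -> float:
    """Uptime over the period after merging flapping outages."""
    down = sum(end - start for start, end in merge_flapping(outages, max_gap))
    return 100.0 * (period_minutes - down) / period_minutes


# Two blips 2 minutes apart become one 6-minute outage; the third stays separate.
print(merge_flapping([(0, 2), (4, 6), (60, 65)], max_gap=5))  # [(0, 6), (60, 65)]
```

Whatever threshold is chosen should be written down ahead of time, so the number can be defended when the monthly metrics are reviewed.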
