SAP Knowledge Base Article - Public

3011155 - SAP SuccessFactors Operations Best Practice: Event & Alert Management and Incident Management

Symptom

You operate SAP SuccessFactors and want to learn about SAP Best Practice Cloud Operations. You want to establish efficient and effective Event Management and Incident Management for this cloud solution.

Environment

SAP SuccessFactors (SaaS)

Reproducing the Issue

You need to design, adjust, or otherwise establish Event Management/Alert Management as well as Incident Management for your SAP SuccessFactors environment.

Cause

SAP Best Practice Cloud Operations are not yet followed.

Resolution

Event Management

Event & Alert Management allows you to handle events and alerts efficiently and effectively. It consumes data from the monitoring use cases and calculates alerts. These alerts are then followed up using predefined root cause analysis and alert follow-up procedures/guides, including notification management and automatic event reaction.
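
As a rough illustration of how an alert can be calculated from monitoring data, the following Python sketch raises an alert only when a metric stays above a threshold for several consecutive samples. This is a common way to make alerts robust against short spikes. The metric name, the threshold, and the `min_consecutive` parameter are assumptions for illustration. This is not the calculation logic of any SAP tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alert:
    metric: str
    value: float
    threshold: float


def calculate_alerts(metric: str, samples: Iterable[float], threshold: float,
                     min_consecutive: int = 3) -> List[Alert]:
    """Return one alert per streak of `min_consecutive` or more samples above `threshold`."""
    alerts: List[Alert] = []
    streak = 0
    for value in samples:
        if value > threshold:
            streak += 1
            # Fire exactly once, when the streak first reaches the required length.
            if streak == min_consecutive:
                alerts.append(Alert(metric, value, threshold))
        else:
            # A sample at or below the threshold resets the streak.
            streak = 0
    return alerts


# Hypothetical response-time samples in milliseconds.
alerts = calculate_alerts("response_time_ms", [70, 95, 96, 97, 98, 60, 95], threshold=90)
print(alerts)  # one alert, raised at the third consecutive breach (value 97)
```

A longer `min_consecutive` suppresses more noise but delays the alert. The right trade-off depends on the monitored use case.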

Event & Alert Management allows you to handle thousands of events efficiently and in a unified way. This applies to cloud services, systems, users, applications, interfaces, and business processes, including the underlying IT infrastructure. It consumes data from the introduced monitoring use cases and calculates robust alerts to notify the concerned end users. Received alerts can be processed efficiently via a central, dashboard-oriented, and intuitive alert inbox. To reduce the number of alerts, vertical correlation is used: symptoms belonging to the same root cause are summarized into one alert. Every fired alert should be associated with an alert response procedure, such as problem context collection or an auto-healing activity.
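
Vertical correlation can be sketched as follows. The sketch assumes a hypothetical dependency map in which each component points to the component it runs on. Events are grouped under the lowest affected component in the stack, so symptoms higher up collapse into a single alert for the likely root cause. The component names and the `DEPENDS_ON` map are illustrative assumptions, and the map is assumed to be free of cycles.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical stack: each component maps to the component it runs on (None = bottom).
DEPENDS_ON: Dict[str, Optional[str]] = {
    "business_process:payroll": "interface:payroll_export",
    "interface:payroll_export": "cloud_service:employee_central",
    "cloud_service:employee_central": "network:customer_proxy",
    "network:customer_proxy": None,
}


def root_cause(component: str, affected: set, depends_on: Dict[str, Optional[str]]) -> str:
    """Walk down the stack and return the lowest component that also has an event."""
    root = component
    current = depends_on.get(component)
    while current is not None:
        if current in affected:
            root = current
        current = depends_on.get(current)
    return root


def correlate(events: List[Dict[str, str]],
              depends_on: Dict[str, Optional[str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group events into one alert per probable root-cause component."""
    affected = {event["component"] for event in events}
    alerts: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for event in events:
        alerts[root_cause(event["component"], affected, depends_on)].append(event)
    return dict(alerts)


events = [
    {"id": "e1", "component": "business_process:payroll"},
    {"id": "e2", "component": "interface:payroll_export"},
    {"id": "e3", "component": "network:customer_proxy"},
]
print(correlate(events, DEPENDS_ON))  # all three events end up under network:customer_proxy
```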

External monitoring tools, such as network management tools, can be easily integrated. This makes the associated metrics and events visible in the alert inbox for further automated or manual processing or for vertical correlation. Outbound integration with the most common Incident Management or Event & Alert Management systems needs to be provided in as standardized a way as possible. Examples of such systems are ServiceNow, HP Service Manager, and SAP Solution Manager IT Service Management. Event & Alert Management is relevant for cloud-centric as well as hybrid customers. It mainly addresses IT users. However, LOB users may also be interested in certain aspects, for example being notified in case of a certain problem.
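
To keep the outbound integration standardized, one option is to map every alert to a single neutral payload first. A thin adapter then translates that payload into the target tool's own fields. The sketch below assumes a hypothetical `OutboundIncident` schema with hypothetical field names. It does not reproduce the API of ServiceNow, HP Service Manager, or SAP Solution Manager.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OutboundIncident:
    # Hypothetical neutral schema; an adapter maps it to the target tool's fields.
    source: str
    short_description: str
    severity: int  # assumed scale: 1 (highest) to 4 (lowest)
    affected_component: str
    correlated_event_ids: List[str] = field(default_factory=list)


def to_outbound_payload(alert: dict) -> str:
    """Serialize a correlated alert into the neutral JSON payload."""
    incident = OutboundIncident(
        source="event-alert-management",
        short_description=f"{alert['rule']} on {alert['component']}",
        severity=alert.get("severity", 3),  # default severity is an assumption
        affected_component=alert["component"],
        correlated_event_ids=sorted(alert.get("event_ids", [])),
    )
    return json.dumps(asdict(incident), sort_keys=True)


print(to_outbound_payload({
    "rule": "HighErrorRate",
    "component": "interface:payroll_export",
    "severity": 2,
    "event_ids": ["e2", "e1"],
}))
```

With this design, a separate adapter per target system only has to rename fields. The alert side therefore stays unchanged when a tool is replaced.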

Incident Management

This best practice helps you create high-quality incident tickets to ensure faster resolution. The following describes the responsibilities of all roles involved in the process.

  • Reporter: Submits a request for incident resolution. Provides information for reporting the incident.
  • Operation Desk: Tracks open incidents and identifies any incident that requires increased focus to meet committed service levels. Handles day-to-day incident issues and escalates to the site lead and incident resolver groups as required to bring the resolution of the incidents back on schedule.
  • Incident Manager: Monitors and validates all incidents. Assures the quality and accuracy of incident information and ensures the transfer of information to the incident resolver.
  • Resolver Group: Responsible for resolving the incident.

The operation desk needs to create a ticket in the following format. The operation desk needs to check or review the severity or priority and categorization of the ticket for correctness.

In case of any mismatch, the change needs to be captured appropriately, either in the ticket log or in another offline document. The operation desk needs to acknowledge the ticket according to the current procedure and document it accordingly. The operation desk needs to perform the following steps to document or update a ticket:

  • The operation desk needs to diagnose the ticket, making use of the knowledge base / Known Error Database (KEDB) as required, and provide a correct and suitable resolution.
  • The operation desk needs to document the troubleshooting steps in the work log and update the relevant fields in the ticketing tool.
  • The operation desk needs to follow the appropriate steps, as defined, for changing the status of the ticket to Hold / Pending, or adhere to the Standard Operating Procedure (SOP) for its resolution.
  • The operation desk needs to update all relevant fields on resolution of the ticket.
  • The operation desk needs to follow the proper procedure if the ticket has incomplete information and update it following the defined process for the engagement.
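
The documentation steps above can be enforced in tooling. The following minimal sketch assumes a hypothetical status model: New, In Progress, Hold, Pending, Resolved. It writes every status change to the work log and requires a reason for Hold / Pending. It also refuses resolution until closure notes, summary notes, and business impact are filled in. Align the statuses and mandatory fields with your own ticketing tool and SOP.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumed status model; align it with your ticketing tool and SOP.
ALLOWED: Dict[str, Set[str]] = {
    "New": {"In Progress"},
    "In Progress": {"Hold", "Pending", "Resolved"},
    "Hold": {"In Progress"},
    "Pending": {"In Progress"},
    "Resolved": set(),
}
REQUIRED_ON_RESOLVE = ("closure_notes", "summary_notes", "business_impact")


@dataclass
class Ticket:
    number: str
    status: str = "New"
    work_log: List[str] = field(default_factory=list)
    field_values: Dict[str, str] = field(default_factory=dict)

    def set_status(self, new_status: str, reason: str = "") -> None:
        """Change the status if allowed and record the change in the work log."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status} -> {new_status} is not an allowed transition")
        if new_status in ("Hold", "Pending") and not reason:
            raise ValueError("a reason is required for Hold / Pending")
        if new_status == "Resolved":
            missing = [name for name in REQUIRED_ON_RESOLVE if not self.field_values.get(name)]
            if missing:
                raise ValueError(f"fields missing on resolution: {missing}")
        entry = f"{self.status} -> {new_status}"
        self.work_log.append(f"{entry}: {reason}" if reason else entry)
        self.status = new_status


ticket = Ticket("INC0001")
ticket.set_status("In Progress")
ticket.set_status("Pending", reason="waiting for log files from the reporter")
ticket.set_status("In Progress")
ticket.field_values.update(closure_notes="Role mapping corrected",
                           summary_notes="Permission issue", business_impact="Low")
ticket.set_status("Resolved")
print(ticket.work_log)
```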

If a ticket does not belong to the assigned team/resolver group, it needs to be routed in a timely manner following a defined reassignment procedure. The operation desk must escalate the ticket to the next level in a timely manner following a defined process. The operation desk needs to group similar tickets and tag them to a parent ticket. The operation desk needs to conduct a root cause analysis (RCA) for severity 1 and severity 2 tickets and then update the ticket accordingly.
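
Grouping and RCA tracking can be sketched under several assumptions. Tickets with the same configuration item (CI) and category count as similar. The input list is ordered by creation time, so the first ticket becomes the parent. Severity 1 is the highest. The similarity rule is an illustrative assumption; real grouping usually also needs a time window and human review.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Incident:
    number: str
    severity: int   # assumed scale: 1 is the highest severity
    ci: str         # configuration item
    category: str
    parent: Optional[str] = None

    @property
    def rca_required(self) -> bool:
        # An RCA is required for severity 1 and severity 2 tickets.
        return self.severity <= 2


def group_similar(incidents: List[Incident]) -> Dict[str, List[str]]:
    """Tag tickets with the same (CI, category) to the first one as parent.

    Assumes `incidents` is ordered by creation time. Returns parent -> children.
    """
    buckets: Dict[Tuple[str, str], List[Incident]] = defaultdict(list)
    for incident in incidents:
        buckets[(incident.ci, incident.category)].append(incident)
    groups: Dict[str, List[str]] = {}
    for similar in buckets.values():
        if len(similar) < 2:
            continue  # nothing to group
        parent, *children = similar
        for child in children:
            child.parent = parent.number
        groups[parent.number] = [child.number for child in children]
    return groups


incidents = [
    Incident("INC1", 3, "Employee Central", "Login"),
    Incident("INC2", 1, "Employee Central", "Login"),
    Incident("INC3", 2, "Payroll", "Export"),
]
print(group_similar(incidents))                         # {'INC1': ['INC2']}
print([i.number for i in incidents if i.rca_required])  # ['INC2', 'INC3']
```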

Establish a continuous improvement process for Incident Management.

Define:

  • Quality requirement: Set up standards (processes and procedures)
  • Preparation of ticket templates and checklists

Measure: Ticket quality evaluation method

Check: Regular quality reports and meetings to review and track progress (presented in team meetings)

Act:

  • Training and instructions based on the observations from the evaluations
  • Provide feedback and requirements to ServiceNow implementation
  • Engage CPS, IMC/FMC and ticket handling teams within the SDO

Establish ticket quality evaluation method using the following quality criteria.

Accuracy category:

  • Was the requester/initiator accurate in their request?
  • Is the response/action technically correct?
  • Is the response/action procedurally correct?
  • Does the resolution description match the actual resolution?

Content:

  • Write-up understandable
  • Description complete
  • Resolution reusable

Timeliness:

  • MPT - Completed within the SLA/SLO
  • Efficiently executed (i.e., no ping-pong)
  • Regular updates

Additional checks (Not affecting the end results, can be changed anytime):

  • Knowledge base candidate?
  • Number of reassignments less than 3?
  • Duplicate ticket?
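
The criteria above can be turned into a simple score. This sketch assumes hypothetical criterion keys and equal weights for all scored criteria. The additional checks (knowledge base candidate, fewer than 3 reassignments, duplicate) are reported alongside the score but, as stated above, do not affect it.

```python
from typing import Dict

# Scored criteria with hypothetical keys, grouped like the quality criteria above.
SCORED = {
    "accuracy": ["request_accurate", "technically_correct",
                 "procedurally_correct", "resolution_matches"],
    "content": ["understandable", "description_complete", "resolution_reusable"],
    "timeliness": ["within_sla", "no_ping_pong", "regular_updates"],
}


def evaluate(answers: Dict[str, bool], reassignments: int,
             kb_candidate: bool, duplicate: bool) -> dict:
    """Score one ticket; unanswered criteria count as not met."""
    per_category = {
        category: sum(answers.get(c, False) for c in criteria) / len(criteria)
        for category, criteria in SCORED.items()
    }
    met = sum(answers.get(c, False) for criteria in SCORED.values() for c in criteria)
    total = sum(len(criteria) for criteria in SCORED.values())
    return {
        "score": round(100 * met / total, 1),
        "per_category": per_category,
        # Additional checks are reported but never change the score.
        "additional": {
            "kb_candidate": kb_candidate,
            "reassignments_below_3": reassignments < 3,
            "duplicate": duplicate,
        },
    }


answers = {c: True for criteria in SCORED.values() for c in criteria}
answers["within_sla"] = False
print(evaluate(answers, reassignments=4, kb_candidate=True, duplicate=False)["score"])  # 90.0
```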

 

Parameter

Question

Classification of parameter

Ticket Analysis

Did the IMC check/review the priority, business services, CI, and categorization of the ticket for correctness?

Process Knowledge

In case of any mismatch, was the change captured appropriately, either in the ticket log or in another offline document?

Ticket Handling

Did the engineer acknowledge the ticket according to the current procedure and document it accordingly?

Process Knowledge

Did the engineer update the progress of their analysis daily or as per the SLA response time?

Ticket documentation/Updation

Did the engineer properly diagnose the ticket, making use of the knowledge base / Known Error Database (KEDB) as required, and provide a correct and the most suitable resolution?

Process Knowledge

Did the engineer document troubleshooting steps in the work log and update relevant fields suitably in the ticketing tool?

Ticket documentation/Updation

Did the engineer follow the appropriate steps, as defined, for changing the status of the ticket to "Hold" / "Pending", or adhere to the standard operating procedure (SOP) for its resolution?

Quality Tickets Updation

Ticket Updation

Did the engineer update all relevant fields, such as closure notes, summary notes, and business impact, on resolution of the ticket?

Technical Knowledge

Ticket documentation/Updation

Did the engineer follow the proper procedure in case the ticket had incomplete information, and update it following the defined process for the engagement?

Process Knowledge

 Ticket Re-assignment 

If the ticket did not belong to the assigned tower/resolver group, was it routed in a timely manner following a defined reassignment procedure?

Process Knowledge

 Ticket Escalation 

Was the ticket escalated to the next level in a timely manner following the defined process?

Quality Tickets Updation

 Ticket Grouping 

Were similar tickets grouped and tagged to a parent ticket?

Technical Knowledge

Ticket Updation

Was a proper RCA conducted for severity 1 and severity 2 tickets, and was the ticket updated accordingly?

Process Knowledge
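
The Check step calls for a regular quality report. For that report, the yes/no answers to these questions can be aggregated per classification across all audited tickets. In this sketch, the question IDs and the mapping of questions to classifications are hypothetical. Take the mapping from your own version of the table.

```python
from collections import defaultdict
from typing import Dict, List


def compliance_by_classification(audits: List[Dict[str, bool]],
                                 classification: Dict[str, str]) -> Dict[str, float]:
    """Percentage of 'yes' answers per classification across all audited tickets.

    `audits` holds one {question_id: answered_yes} dict per ticket;
    `classification` maps each question_id to its classification of parameter.
    """
    yes: Dict[str, int] = defaultdict(int)
    asked: Dict[str, int] = defaultdict(int)
    for audit in audits:
        for question, answered_yes in audit.items():
            cls = classification[question]
            asked[cls] += 1
            yes[cls] += int(answered_yes)
    return {cls: round(100 * yes[cls] / asked[cls], 1) for cls in asked}


# Hypothetical question IDs and mapping.
CLASSIFICATION = {
    "priority_reviewed": "Process Knowledge",
    "rca_done": "Process Knowledge",
    "tickets_grouped": "Technical Knowledge",
}
audits = [
    {"priority_reviewed": True, "rca_done": False, "tickets_grouped": True},
    {"priority_reviewed": True, "rca_done": True, "tickets_grouped": False},
]
print(compliance_by_classification(audits, CLASSIFICATION))
# {'Process Knowledge': 75.0, 'Technical Knowledge': 50.0}
```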

Please note: The PDC Community is dedicated to SAP partners, but we strongly recommend checking this community before opening an incident.

Infrastructure Monitoring

IT infrastructure monitoring is also part of customer-specific operations. However, we do not have out-of-the-box visibility into the infrastructure layer. This layer is increasingly covered by IaaS service providers, such as Amazon Web Services, Google Cloud Platform, or Microsoft Azure, which typically provide their own infrastructure tooling. Customers who operate the IT infrastructure themselves use a tool of their choice. The IT infrastructure management software market is quite fragmented and, to some extent, commoditized. From a tool perspective, SAP recommends consuming events and metrics from third-party tools and enabling vertical correlation scenarios. For cloud-centric customers, this topic is not relevant from a customer perspective. It addresses IT users exclusively.

SuccessFactors Cloud System Monitoring

There are several tools and resources available to monitor the health and wellness of your SuccessFactors Cloud System.

SuccessFactors Cloud Proactive Monitoring

The Cloud Availability Center (CAC) offers a personalized dashboard overview of the status of your SAP cloud products. It provides up-to-date status on incidents affecting your products, such as service disruptions, interruptions, degradation, and maintenance. It provides an at-a-glance view of current and historical cloud solution status, product and system views, planned events, and the latest notifications. Your S-user personalization in the SAP ONE Support Launchpad gives you a view of your products and the data centers, URLs, and tenants they reside on. This tool is designed to assist you with your productive systems. You can find a link to this tool and more information in the Cloud Availability Center FAQ, as well as in KBA 2548400 - Getting started with the SAP Cloud Availability Center (CAC).

SuccessFactors Cloud Communications

The Cloud System Notification Subscriptions (CSNS) application makes it easy to add, customize, and manage subscriptions to Cloud Availability Center notifications. Using this tool, SAP Cloud customers can stay informed and receive timely updates regarding their SAP Cloud Services. These updates include, but are not limited to, planned and unplanned downtimes and customer communication. You may access the tool for managing your Cloud System Notifications here. For more details on managing your Cloud System Notifications, please review the following KBAs:

See Also

Knowledge Base Articles:

Keywords

OCC; Operations Control Center; CCOE; Customer Center of Expertise; E2EHO; SFSF; SuccessFactors; Cloud Operations; Cloud Monitoring; Event Management; Incident Management , KBA , XX-SER-MCC , Mission Control Center - Knowledge Management , How To

Product

SAP SuccessFactors HXM Core all versions