3059591 - How to Troubleshoot Memory Saturation and Long Garbage Collection on Commerce Cloud Using Dynatrace

Symptom

Three types of memory-related alerts can appear on the Problems page in Dynatrace, as illustrated below.

The Memory resources exhausted and Long garbage-collection time alerts are the focus of this article, since they are reported at the service (pod) level and have a direct impact on application performance.

In contrast, the Memory saturation alert is reported at the host level and has no negative impact on application performance. In almost all cases where this alert is triggered, no manual corrective measure is necessary, for the reasons described below. The one rare exception is described in the Note at the end of this section.

First, we must understand the conditions leading to this alert. Dynatrace reports host memory saturation as a problem when both conditions below are true:
- Memory usage > 80%
  This threshold is defined by default in Dynatrace and cannot be customized. Near the lower bound of the threshold this may seem like a premature alert, but even 99% usage would not be a problem, for the reasons explained below.
- Page faults > 100 faults/s
  A page fault generally means that the application needs data that is not in physical memory (RAM) at that moment and has to retrieve it from the paging file on disk. Note also that there is an open issue on the Kubernetes GitHub about active page cache being counted against available memory, which may skew the numbers. In any case, from a pure monitoring perspective, page faults are normal and expected and are no cause for concern.
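
The two conditions above can be sketched as a simple check. This is only an illustration of the logic described here, not Dynatrace code: the function and constant names are hypothetical, and only the thresholds (80% and 100 faults/s) come from this article.

```python
# Thresholds as stated in this article. The names are illustrative
# assumptions and do not correspond to any Dynatrace API.
MEMORY_USAGE_THRESHOLD_PCT = 80.0
PAGE_FAULTS_THRESHOLD_PER_S = 100.0


def is_memory_saturated(memory_usage_pct: float, page_faults_per_s: float) -> bool:
    """Return True only when BOTH conditions are exceeded (hypothetical helper)."""
    return (
        memory_usage_pct > MEMORY_USAGE_THRESHOLD_PCT
        and page_faults_per_s > PAGE_FAULTS_THRESHOLD_PER_S
    )
```

For example, a host at 93.75% memory usage with 50 page faults per second would not satisfy the check, because only one of the two conditions is met.
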
For both points above, we can rest assured that Kubernetes manages the resources according to the CPU and memory limits of each individual service, ensuring that usage cannot exceed 100% (again with the rare exception described under the Note below).

Consider a simplified example in which a VM with 32 GiB of memory hosts the following containers:
- backgroundprocessing: 10 GiB
- backoffice: 10 GiB
- accstorefront: 10 GiB

Together, these pods account for 93.75% of the available memory (30 GiB of 32 GiB) and would trigger a Memory saturation alert. If we scale accstorefront up to 2 replicas, for example, Kubernetes cannot schedule the new pod on the same VM, so it finds or creates another VM to host the new container.
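
The arithmetic behind this example can be sketched as follows. This is a deliberate simplification under stated assumptions: each container's figure is treated as memory reserved on the VM, whereas the real Kubernetes scheduler works from resource requests and the node's allocatable capacity. The helper names are hypothetical.

```python
# Figures from the simplified example in this article (GiB).
VM_CAPACITY_GIB = 32
containers = {
    "backgroundprocessing": 10,
    "backoffice": 10,
    "accstorefront": 10,
}


def usage_pct(allocations: dict, capacity_gib: float) -> float:
    """Share of the VM's memory taken by the listed containers, in percent."""
    return 100.0 * sum(allocations.values()) / capacity_gib


def fits_on_vm(allocations: dict, new_container_gib: float, capacity_gib: float) -> bool:
    """Simplified placement check: does one more container fit on this VM?"""
    return sum(allocations.values()) + new_container_gib <= capacity_gib


print(usage_pct(containers, VM_CAPACITY_GIB))       # 93.75
print(fits_on_vm(containers, 10, VM_CAPACITY_GIB))  # False: a second accstorefront needs another VM
```
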

Kubernetes performs this kind of orchestration continuously in the background, so manual intervention is not advised unless absolutely necessary.

Note: Depending on the logging controller used in your environment, the memory limit may reach a value larger than the capacity of the VM. To verify this in Dynatrace, select the Kubernetes option in the left navigation menu, then click AutomationEngine and finally Analyze nodes.

On the resulting page, under the Node utilization section, filter for Memory and limits and sort the list by Memory limits, as illustrated below.

If any entry has a limit greater than 32 GB, please report this to the support team by creating a ticket in the CEC-HCS component. In all other cases, this alert can be ignored; keep the focus on the service-level alerts described below.
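
If the node list is copied out of Dynatrace by hand, the same check can be sketched in a few lines. The node names and values below are hypothetical sample data, and the helper is an illustration only; it does not call any Dynatrace or Kubernetes API. The 32 GB threshold is the one given in this article.

```python
# Threshold from this article; adjust if your VM size differs (assumption).
LIMIT_THRESHOLD_GB = 32.0

# Hypothetical sample rows, as they might be transcribed from the
# Node utilization list: (node name, memory limits in GB).
nodes = [
    ("node-a", 30.0),
    ("node-b", 34.5),
    ("node-c", 32.0),
]


def nodes_over_limit(rows, threshold_gb=LIMIT_THRESHOLD_GB):
    """Return the names of nodes whose memory limits exceed the threshold."""
    return [name for name, limit_gb in rows if limit_gb > threshold_gb]


print(nodes_over_limit(nodes))  # ['node-b']
```

Only entries strictly above the threshold are returned, matching the "greater than 32 GB" wording above.
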

Environment

SAP Commerce Cloud

Product

SAP Commerce Cloud all versions

Keywords

KBA, CEC-SCC-CLA-ENV-EMG, Environment Management, How To
