Symptom
The Replication Flow run failed with the following error message:
Transferring initial data failed due to the following error: Timeout occurred when the allocated worker was processing work order.
Environment
SAP Datasphere
Reproducing the Issue
Run the Replication Flow in the SAP Datasphere tenant.
Cause
27.03.2025:
When a work order (e.g., a transfer order for a data package) of a replication task hangs in the connection to the source system for more than 10 minutes, the underlying worker graph is terminated and not retried. As a result, the respective replication flow and all other replication flows scheduled on the same worker graph stop with the error message: “Timeout occurred when the allocated worker was processing work order.”
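The effect described above can be illustrated with a small model. This is only a sketch: the mapping of flows to worker graphs, the flow names, and the function name are hypothetical and do not reflect any SAP Datasphere API. Only the 10-minute limit and the "all flows on the same worker graph stop" behavior come from the description.

```python
from typing import Dict, List

TIMEOUT_MINUTES = 10  # limit named in the description above


def flows_stopped_by_timeout(
    graph_of_flow: Dict[str, str], hanging_flow: str, hang_minutes: float
) -> List[str]:
    """Return the flows that stop when `hanging_flow` hangs for `hang_minutes`.

    Models the behavior before the hotfix: exceeding the limit terminates the
    whole worker graph, so every flow scheduled on that graph stops.
    """
    if hang_minutes <= TIMEOUT_MINUTES:
        return []  # within the limit, nothing is terminated
    graph = graph_of_flow[hanging_flow]
    return sorted(f for f, g in graph_of_flow.items() if g == graph)


# Hypothetical schedule: two flows share one worker graph, one flow runs alone.
schedule = {"RF_SALES": "graph-1", "RF_HR": "graph-1", "RF_FIN": "graph-2"}
stopped = flows_stopped_by_timeout(schedule, "RF_SALES", hang_minutes=12)
```

In this hypothetical schedule, a hang in RF_SALES also stops RF_HR because both share a worker graph, while RF_FIN keeps running.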
This issue has been observed particularly during long-running initial loads and real-time delta loads (with delta interval = 0) from ABAP-based systems involving a cloud connector. The hanging connections causing the timeout after 10 minutes can be due to various reasons (network issues, load on the source ABAP system, poor timing during horizontal scaling of services, etc.). However, since the issue occurs sporadically, a general problem can be ruled out at the moment.
To handle such situations more resiliently, the hotfix makes replication flows automatically retry such timed-out work orders until they succeed. If a replication flow remains stuck in retries, no side effects are expected other than that the replication does not proceed. If the retries do not resolve the situation, it has to be analyzed manually via the support process.
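The retry behavior described above follows a common retry-until-success pattern. The sketch below is not the hotfix implementation; the exception class, function names, and the simulated transfer are assumptions made purely for illustration.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class WorkOrderTimeout(Exception):
    """Hypothetical error: the worker exceeded the time limit for a work order."""


def run_until_success(
    process: Callable[[], T],
    backoff_s: float = 0.0,
    max_attempts: Optional[int] = None,
) -> Tuple[T, int]:
    """Call process() again after every WorkOrderTimeout.

    max_attempts=None mirrors "retry until they succeed"; a finite value is
    only meant for tests and demos. Returns (result, attempts_used).
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return process(), attempt
        except WorkOrderTimeout:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # still failing: manual analysis is needed
            time.sleep(backoff_s)


def make_flaky_transfer(failures: int) -> Callable[[], str]:
    """Build a simulated transfer that times out `failures` times, then succeeds."""
    remaining = {"n": failures}

    def transfer() -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise WorkOrderTimeout("timed out while processing work order")
        return "package transferred"

    return transfer


result, attempts = run_until_success(make_flaky_transfer(2))
```

A work order that keeps timing out never leaves the loop when max_attempts is None, which corresponds to the "stuck in retries" case: nothing else is affected, but the replication does not proceed.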
Resolution
Current status and info from development team (further updates will be shared below):
31.03.2025 -
* The root cause was identified and fixed.
* The service is back to normal.
* Manual actions might be necessary.
A hotfix was delivered in the maintenance window of Mar 29 - Mar 30, 2025. This hotfix resolves the "Timeout occurred when the allocated worker was processing work order." issue.
According to our analysis, a manual resume of replications in "timeout error state" was often not necessary because they restarted automatically. Nevertheless, check the status of your replication flows in the Datasphere monitor.
In case replication objects still show the timeout error state, resume each object manually.
Errors with a timestamp on or before Mar 30, 2025 are expected, as they occurred before the hotfix was deployed.
In case you find newer timeout errors, please return your ticket to SAP immediately.
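The timestamp rule above can be applied to a list of timeout errors as in the following sketch. The record layout (object name plus error date) and the object names are hypothetical; how the error dates are collected from the monitor is not covered here. Only the cutoff date comes from this note.

```python
from datetime import date
from typing import Dict, List, Tuple

# Cutoff from this note: errors dated on or before Mar 30, 2025 are expected.
HOTFIX_CUTOFF = date(2025, 3, 30)


def triage_timeout_errors(errors: List[Tuple[str, date]]) -> Dict[str, List[str]]:
    """Split timeout errors into expected ones and ones to report to SAP."""
    result: Dict[str, List[str]] = {"expected": [], "report_to_sap": []}
    for object_name, error_date in errors:
        key = "expected" if error_date <= HOTFIX_CUTOFF else "report_to_sap"
        result[key].append(object_name)
    return result


# Hypothetical replication objects and error dates.
observed = [
    ("OBJ_ORDERS", date(2025, 3, 28)),
    ("OBJ_ITEMS", date(2025, 3, 30)),
    ("OBJ_CUSTOMERS", date(2025, 4, 2)),
]
triaged = triage_timeout_errors(observed)
```

Objects in the "expected" list only need a manual resume if they still show the timeout error state. Objects in "report_to_sap" indicate a timeout after the hotfix deployment.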
As replication flows are resilient against such errors, they will start picking up all missing data that piled up on the data source.
24.03.2025 - A mitigation attempt that changed the Kubernetes configuration worked only for a short time.
19.03.2025 - 23.03.2025 - UNDER INVESTIGATION
Keywords
KBA , DS-DI-RF , Replication Flows , CA-DI-IS-ABA-AC , ABAP Integration - ABAP Connectivity , Bug Filed