Symptom
-
A point-in-time recovery on a takeover high availability or DR site, using log backups from the former primary system, fails with the error below:
- LogPartition[/hana/log/SID/mnt00001/hdb00004.00003/:0]: #segments 30/30 processed, 30/30 loaded, 30 total, last loaded LogSegment[0:0x14adf80fa140/Writing,ts=2022-04-14 08:43:58.732268][GUID=f0a1f89f-eca3-20220414-055602-0da000000e/PrevGUID=9d2afb7f-ecf3-20220123-082625-0b14a2c070/PersGUID=f0a1f89f-eca3-20220414-055602-0da000000a/RestoreGUID=aeffa451-ecf3-20211121-103821-23a0000037]
[125853]{-1}[-1/-1] 2022-04-14 10:40:44.433930 w Logger RecoveryHandlerImpl.cpp(02667) : Log ends at 0x14adf80fa180, cannot recover until 0x14adf822a140 (inexact)
[117002]{-1}[-1/-1] 2022-04-14 10:40:45.906788 i PITRestart LogReplayCoordinator.cpp(00611) : Log Replay Coordinator (master, volume=4)
- lastReplayStepStartLogPos: 0x14adf802a140
Replay Step Information
- replayStepStatus: ReplayStep_Running
[117002]{-1}[-1/-1] 2022-04-14 10:40:55.906895 i PITRestart LogReplayCoordinator.cpp(00611) : Log Replay Coordinator (master, volume=4)
- lastReplayStepStartLogPos: 0x14adf802a140
Replay Step Information
- replayStepStatus: ReplayStep_Running
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614708 i PITRestart LogReplayCoordinator.cpp(00608) : Exit virtual void DataRecovery::LogReplayCoordinator::waitForFinishRecovery()(2417731547usec)
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614730 i PersistenceManag PersistenceManagerSPI.cpp(01147) : Notify RedoHandler listener about new state: 0
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614740 i CSARedo RedoMerge.cpp(02933) : Cancel ongoing merges that are not ready to be committed
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614745 i PersistenceManag PersistenceManagerSPI.cpp(01167) : Notification of RedoHandler listener finished
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.616404 i Logger RecoveryHandlerImpl.cpp(01927) : Wait for recovery queue threads
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624083 i Logger RecoveryHandlerImpl.cpp(01934) : Recovery finished at log position 0x14adf80fa180
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624121 i PITRestart LogReplayCoordinator.cpp(00148) : recoveryFinished(): vid=4, pos=0x14adf80fa180, reason=0x0
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624126 i PersistenceManag PersistenceManagerSPI.cpp(01147) : Notify RedoHandler listener about new state: 1
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624128 i CSARedo RedoMerge.cpp(02933) : Cancel ongoing merges that are not ready to be committed
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624130 i CSARedo RedoMerge.cpp(02954) : Unblock all async merges and OC before waiting for them
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624250 i PersistenceManag PersistenceManagerSPI.cpp(01167) : Notification of RedoHandler listener finished
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624258 i PersistenceManag LogReplayCoordinator.cpp(00464) : Coordinated log replay finished (master) ...
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624263 i PersistenceManag LogReplayCoordinator.cpp(00465) : - maxDiskPos : 0x14adf80fa180
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624264 i PersistenceManag LogReplayCoordinator.cpp(00466) : - replayPos : 0x14adf80fa180
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624265 i PersistenceManag LogReplayCoordinator.cpp(00469) : - replayTime: 14.04.2022-06.43.58 (1649918638723202)
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624275 i PITRestart LogReplayCoordinator.cpp(01233) : serviceFinishedReplay() myVid=4, isMaster=1, vid=4, isConsistent=1, slaveCount=1, servicesFinishedReplay=0
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624281 i PersistenceManag LogReplayCoordinator.cpp(01249) : PIT Service Finished Replay: 4 isConsistent:1 1/2
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624283 i PITRestart LogReplayCoordinator.cpp(02310) : Enter bool DataRecovery::LogReplayCoordinatorMaster::waitForSlaves()
Arg this = 0x00007f6c53686800
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632073 i PITRestart LogReplayCoordinator.cpp(01233) : serviceFinishedReplay() myVid=4, isMaster=1, vid=3, isConsistent=0, slaveCount=1, servicesFinishedReplay=1
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632092 i PersistenceManag LogReplayCoordinator.cpp(01249) : PIT Service Finished Replay: 3 isConsistent:0 2/2
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632095 i PITRestart LogReplayCoordinator.cpp(01253) : All slaves finished replay. Send global result to all slaves: result=0
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632097 i PITRestart LogReplayCoordinator.cpp(01263) : - slave: 3
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.632454 i PITRestart LogReplayCoordinator.cpp(02310) : Exit bool DataRecovery::LogReplayCoordinatorMaster::waitForSlaves()(8171usec)
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.632959 i assign TREXIndexServer.cpp(01269) : persistence take over with volume <id> but could not replay to end of log
exception 1: no.70029060 (PersistenceLayer/PersistenceSystem.cpp:542) TID: 117002 replay for takeover could not replay until latest log: reached timestamp: 1649918638723202 logpos: 22737423671680
exception throw location:
1: 0x00007f78d36c5d6a in PersistenceLayer::PersistenceSystem::start(NameServer::ServiceStartInfo const&, bool, bool)+0x946 at PersistenceSystem.cpp:542 (libhdbpersistence.so)
2: 0x00007f78d3682230 in PersistenceLayer::PersistenceFactory::startPersistence(NameServer::ServiceStartInfo&, bool, bool, DataAccess::TablePreloadWriteCallback*, ltt::smartptr_handle<DataAccess::DisasterRecoveryPrimaryCallback>, DataAccess::DisasterRecoveryESCallback*)+0xd0 at PersistenceFactory.cpp:553 (libhdbpersistence.so)
3: 0x00007f78f4a537e3 in PersistenceController::startup(PersistenceLayer::PERSISTENCE_MODE, NameServer::ServiceStartInfo*, bool, DataAccess::TablePreloadWriteCallback*, DataAccess::TablePreloadReadCallback*, Backup::RecoverCbc_Federation*)+0x5d0 at PersistenceController.cpp:729 (libhdblogger.so)
4: 0x00005618bdef1c69 in TRexAPI::TREXIndexServer::assign(NameServer::ServiceStartInfo&, bool, TREX_ERROR::TRexError&)+0x1525 at TREXIndexServer.cpp:1250 (hdbindexserver)
5: 0x00005618bdef7e97 in TRexAPI::AssignThread::run(void*)+0x33 at TREXIndexServer.cpp:545 (hdbindexserver)
6: 0x00007f78d2c82fd4 in TrexThreads::PoolThread::run()+0x970 at PoolThread.cpp:462 (libhdbbasement.so)
7: 0x00007f78d2c84e10 in TrexThreads::PoolThread::run(void*&)+0x10 at PoolThread.cpp:142 (libhdbbasement.so)
8: 0x00007f78cd8cdf85 in Execution::Thread::staticMainImp(Execution::Thread*)+0x531 at Thread.cpp:522 (libhdbbasis.so)
9: 0x00007f78cd8bd9e8 in Execution::pthreadFunctionWrapper(Execution::Thread*)+0x1e4 at ThreadInterposition.cpp:639 (libhdbbasis.so)
10: 0x00007f78ccf1fea5 in start_thread+0xc1 (libpthread.so.0)
11: 0x00007f78cb60695d in __clone+0x69 (libc.so.6)
exception type information:
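The trace prints log positions in hexadecimal, while the exception and the log backup file names use decimal. The following minimal Python sketch converts and compares the positions quoted above (the values are taken verbatim from the trace; reading "cannot recover until" as the minimum position required for a consistent recovery is an inference from the error text, not an official statement):

# Positions from the trace above; hex in the trace, decimal in the exception.
log_end = int("14adf80fa180", 16)   # "Log ends at 0x14adf80fa180"
needed  = int("14adf822a140", 16)   # "cannot recover until 0x14adf822a140 (inexact)"

print(f"available log ends at   : {log_end}")   # 22737423671680 = logpos in the exception
print(f"recovery needs up to    : {needed}")
print(f"gap (log-position units): {needed - log_end}")

The available log thus ends short of the position the recovery requires; the missing range corresponds to redo log that was never successfully backed up, as the backup.log excerpts below show.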
-
The most recent successful log backup of the failing service and its persistence volume <id> is recorded in the backup.log file:
'2022-04-13T23:54:39+02:00 P0046666 18024ebd648 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 100% 2147307520/2147307520'
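The two large decimal numbers in a log backup file name are the first and last log positions the file covers, and a point-in-time recovery needs an unbroken chain of these ranges up to the target position. A Python sketch to extract them follows (the file name is taken from the backup.log entry above; the log_backup_<volume>_<partition>_<startpos>_<endpos>.<backupid> pattern is the commonly observed naming convention, assumed here rather than an official contract):

import re

# File name from the backup.log entry above.
name = "log_backup_4_0_22728986116480_22729036424960.1649886889285"

# Assumed convention: log_backup_<volume>_<partition>_<startpos>_<endpos>.<backupid>
m = re.match(r"log_backup_(\d+)_(\d+)_(\d+)_(\d+)\.(\d+)$", name)
volume, partition, start_pos, end_pos, backup_id = (int(g) for g in m.groups())

print(f"volume {volume}, partition {partition}: covers positions {start_pos} .. {end_pos}")
# Here the chain stops at 22729036424960, well below the 22737423671680
# that the failing recovery above needs to reach.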
-
The backup.log file shows errors for the subsequent failed log backup attempts on this persistence (the same errors are also present in the HANA service trace file):
2022-04-13T23:54:52+02:00 P0046666 18024ec0545 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 79% 2549268480/3219742720
2022-04-13T23:54:52+02:00 P0046666 18024ec0545 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, BackupLogError
2022-04-13T23:54:52+02:00 P0046666 18024ec0545 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, Error during asynchronous file transfer (io_getevents), rc=28: No space left on device; $fileCallback$=[W] , buffer= 0x00006d8339139000, offset= 2549280768, size= 0/134217728, file= "<root>/.log_backup_4_0_22728986116480_22729036424960.1649886889285" ((open, mode= W, access= rw-r-----, flags= ASYNC|DIRECT|TRUNCATE|UNALIGNED_SIZE), factory= (root= "/hana/backup/SID/log/DB_SID/" (access= rw-r-----, flags= AUTOCREATE_PATH|DISKFULL_ERROR, usage= LOG_BACKUP, fs= xfs, config= (async_write_submit_active=on,async_write_submit_blocks=all,async_read_submit=on,num_submit_queues=1,num_completion_queues=1,size_kernel_io_queue=512,max_parallel_io_requests=64,min_submit_batch_size=16,max_submit_batch_size=64))) {shortRetries= 0, fullRetries= 0 (0/10)}
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, BackupLogStarted
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP to file: /hana/backup/SID/log/DB_SID/log_backup_4_0_22728986116480_22729254485696.1649886953025
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP start of progress monitoring, volumes: 1, bytes: 17175629824
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 0% 0/17175629824
...........
2022-04-14T00:50:32+02:00 P0046666 180251eb640 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 29% 4964343808/17175629824
2022-04-14T00:50:42+02:00 P0046666 180251eb640 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, BackupLogError
2022-04-14T00:50:42+02:00 P0046666 180251eb640 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, Error during asynchronous file transfer (io_getevents), rc=28: No space left on device; $fileCallback$=[W] , buffer= 0x00006bb6e88f1000, offset= 4964364288, size= 0/134217728, file= "<root>/.log_backup_4_0_22728986116480_22729254485696.1649890211392" ((open, mode= W, access= rw-r-----, flags= ASYNC|DIRECT|TRUNCATE|UNALIGNED_SIZE), factory= (root= "/hana/backup/SID/log/DB_SID/" (access= rw-r-----, flags= AUTOCREATE_PATH|DISKFULL_ERROR, usage= LOG_BACKUP, fs= xfs, config= (async_write_submit_active=on,async_write_submit_blocks=all,async_read_submit=on,num_submit_queues=1,num_completion_queues=1,size_kernel_io_queue=512,max_parallel_io_requests=64,min_submit_batch_size=16,max_submit_batch_size=64))) {shortRetries= 0, fullRetries= 0 (0/10)}
2022-04-14T01:50:33+02:00 P0046666 1802555fbca INFO BACKUP SAVE DATA started [1649893833674]
2022-04-14T01:50:33+02:00 P0046666 1802555fbca INFO BACKUP command: backup data differential for SID using backint ('/usr/sap/SID/SYS/global/hdb/backint/DB_SID/') TOOLOPTION 'NSR_UTL_FILE=/usr/sap/SID/SYS/global/hdb/opt/.tmp_hana_SID_129545.utl'
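The rc=28 errors ("No space left on device") show that the log backup target /hana/backup/SID/log/DB_SID/ ran full, so the log segments written after position 22729036424960 were never backed up and are not available to the recovery. As a quick free-space check on the backup target, a sketch (the path is the one from the log output above; the 10% warning threshold is illustrative only):

import shutil

# Log backup target from the backup.log errors above (rc=28 / ENOSPC).
backup_root = "/hana/backup/SID/log/DB_SID/"

usage = shutil.disk_usage(backup_root)
free_pct = 100 * usage.free / usage.total
print(f"{backup_root}: {usage.free / 2**30:.1f} GiB free ({free_pct:.1f}%)")
if free_pct < 10:  # illustrative threshold
    print("WARNING: log backup filesystem nearly full; log backups may fail with ENOSPC")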
Environment
SAP HANA, platform edition 2.0
Product
SAP HANA, platform edition 2.0
Keywords
KBA , HAN-DB-BAC , SAP HANA Backup & Recovery , HAN-DB-HA , SAP HANA High Availability (System Replication, DR, etc.) , HAN-DB-PER , SAP HANA Database Persistence , Problem