Symptom
-
A point-in-time recovery on a takeover high availability or DR site, using log backups from the former primary system, fails with the error below:
- LogPartition[/hana/log/SID/mnt00001/hdb00004.00003/:0]: #segments 30/30 processed, 30/30 loaded, 30 total, last loaded LogSegment[0:0x14adf80fa140/Writing,ts=2022-04-14 08:43:58.732268][GUID=f0a1f89f-eca3-20220414-055602-0da000000e/PrevGUID=9d2afb7f-ecf3-20220123-082625-0b14a2c070/PersGUID=f0a1f89f-eca3-20220414-055602-0da000000a/RestoreGUID=aeffa451-ecf3-20211121-103821-23a0000037]
[125853]{-1}[-1/-1] 2022-04-14 10:40:44.433930 w Logger RecoveryHandlerImpl.cpp(02667) : Log ends at 0x14adf80fa180, cannot recover until 0x14adf822a140 (inexact)
[117002]{-1}[-1/-1] 2022-04-14 10:40:45.906788 i PITRestart LogReplayCoordinator.cpp(00611) : Log Replay Coordinator (master, volume=4)
- lastReplayStepStartLogPos: 0x14adf802a140
Replay Step Information
- replayStepStatus: ReplayStep_Running
[117002]{-1}[-1/-1] 2022-04-14 10:40:55.906895 i PITRestart LogReplayCoordinator.cpp(00611) : Log Replay Coordinator (master, volume=4)
- lastReplayStepStartLogPos: 0x14adf802a140
Replay Step Information
- replayStepStatus: ReplayStep_Running
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614708 i PITRestart LogReplayCoordinator.cpp(00608) : Exit virtual void DataRecovery::LogReplayCoordinator::waitForFinishRecovery()(2417731547usec)
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614730 i PersistenceManag PersistenceManagerSPI.cpp(01147) : Notify RedoHandler listener about new state: 0
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614740 i CSARedo RedoMerge.cpp(02933) : Cancel ongoing merges that are not ready to be committed
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.614745 i PersistenceManag PersistenceManagerSPI.cpp(01167) : Notification of RedoHandler listener finished
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.616404 i Logger RecoveryHandlerImpl.cpp(01927) : Wait for recovery queue threads
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624083 i Logger RecoveryHandlerImpl.cpp(01934) : Recovery finished at log position 0x14adf80fa180
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624121 i PITRestart LogReplayCoordinator.cpp(00148) : recoveryFinished(): vid=4, pos=0x14adf80fa180, reason=0x0
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624126 i PersistenceManag PersistenceManagerSPI.cpp(01147) : Notify RedoHandler listener about new state: 1
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624128 i CSARedo RedoMerge.cpp(02933) : Cancel ongoing merges that are not ready to be committed
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624130 i CSARedo RedoMerge.cpp(02954) : Unblock all async merges and OC before waiting for them
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624250 i PersistenceManag PersistenceManagerSPI.cpp(01167) : Notification of RedoHandler listener finished
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624258 i PersistenceManag LogReplayCoordinator.cpp(00464) : Coordinated log replay finished (master) ...
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624263 i PersistenceManag LogReplayCoordinator.cpp(00465) : - maxDiskPos : 0x14adf80fa180
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624264 i PersistenceManag LogReplayCoordinator.cpp(00466) : - replayPos : 0x14adf80fa180
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624265 i PersistenceManag LogReplayCoordinator.cpp(00469) : - replayTime: 14.04.2022-06.43.58 (1649918638723202)
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624275 i PITRestart LogReplayCoordinator.cpp(01233) : serviceFinishedReplay() myVid=4, isMaster=1, vid=4, isConsistent=1, slaveCount=1, servicesFinishedReplay=0
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624281 i PersistenceManag LogReplayCoordinator.cpp(01249) : PIT Service Finished Replay: 4 isConsistent:1 1/2
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.624283 i PITRestart LogReplayCoordinator.cpp(02310) : Enter bool DataRecovery::LogReplayCoordinatorMaster::waitForSlaves()
Arg this = 0x00007f6c53686800
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632073 i PITRestart LogReplayCoordinator.cpp(01233) : serviceFinishedReplay() myVid=4, isMaster=1, vid=3, isConsistent=0, slaveCount=1, servicesFinishedReplay=1
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632092 i PersistenceManag LogReplayCoordinator.cpp(01249) : PIT Service Finished Replay: 3 isConsistent:0 2/2
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632095 i PITRestart LogReplayCoordinator.cpp(01253) : All slaves finished replay. Send global result to all slaves: result=0
[125457]{-1}[-1/-1] 2022-04-14 10:41:03.632097 i PITRestart LogReplayCoordinator.cpp(01263) : - slave: 3
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.632454 i PITRestart LogReplayCoordinator.cpp(02310) : Exit bool DataRecovery::LogReplayCoordinatorMaster::waitForSlaves()(8171usec)
[117002]{-1}[-1/-1] 2022-04-14 10:41:03.632959 i assign TREXIndexServer.cpp(01269) : persistence take over with volume <id> but could not replay to end of log
exception 1: no.70029060 (PersistenceLayer/PersistenceSystem.cpp:542) TID: 117002 replay for takeover could not replay until latest log: reached timestamp: 1649918638723202 logpos: 22737423671680
exception throw location:
1: 0x00007f78d36c5d6a in PersistenceLayer::PersistenceSystem::start(NameServer::ServiceStartInfo const&, bool, bool)+0x946 at PersistenceSystem.cpp:542 (libhdbpersistence.so)
2: 0x00007f78d3682230 in PersistenceLayer::PersistenceFactory::startPersistence(NameServer::ServiceStartInfo&, bool, bool, DataAccess::TablePreloadWriteCallback*, ltt::smartptr_handle<DataAccess::DisasterRecoveryPrimaryCallback>, DataAccess::DisasterRecoveryESCallback*)+0xd0 at PersistenceFactory.cpp:553 (libhdbpersistence.so)
3: 0x00007f78f4a537e3 in PersistenceController::startup(PersistenceLayer::PERSISTENCE_MODE, NameServer::ServiceStartInfo*, bool, DataAccess::TablePreloadWriteCallback*, DataAccess::TablePreloadReadCallback*, Backup::RecoverCbc_Federation*)+0x5d0 at PersistenceController.cpp:729 (libhdblogger.so)
4: 0x00005618bdef1c69 in TRexAPI::TREXIndexServer::assign(NameServer::ServiceStartInfo&, bool, TREX_ERROR::TRexError&)+0x1525 at TREXIndexServer.cpp:1250 (hdbindexserver)
5: 0x00005618bdef7e97 in TRexAPI::AssignThread::run(void*)+0x33 at TREXIndexServer.cpp:545 (hdbindexserver)
6: 0x00007f78d2c82fd4 in TrexThreads::PoolThread::run()+0x970 at PoolThread.cpp:462 (libhdbbasement.so)
7: 0x00007f78d2c84e10 in TrexThreads::PoolThread::run(void*&)+0x10 at PoolThread.cpp:142 (libhdbbasement.so)
8: 0x00007f78cd8cdf85 in Execution::Thread::staticMainImp(Execution::Thread*)+0x531 at Thread.cpp:522 (libhdbbasis.so)
9: 0x00007f78cd8bd9e8 in Execution::pthreadFunctionWrapper(Execution::Thread*)+0x1e4 at ThreadInterposition.cpp:639 (libhdbbasis.so)
10: 0x00007f78ccf1fea5 in start_thread+0xc1 (libpthread.so.0)
11: 0x00007f78cb60695d in __clone+0x69 (libc.so.6)
exception type information:
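The trace prints log positions in hexadecimal, while the exception and the log backup file names use decimal. The following minimal Python sketch converts and compares the positions quoted above (the values are taken verbatim from the trace; reading "cannot recover until" as the minimum position required for a consistent recovery is an inference from the error text, not an official statement):

# Positions from the trace above; hex in the trace, decimal in the exception.
log_end = int("14adf80fa180", 16)   # "Log ends at 0x14adf80fa180"
needed  = int("14adf822a140", 16)   # "cannot recover until 0x14adf822a140 (inexact)"

print(f"available log ends at   : {log_end}")   # 22737423671680 = logpos in the exception
print(f"recovery needs up to    : {needed}")
print(f"gap (log-position units): {needed - log_end}")

The available log thus ends short of the position the recovery requires; the missing range corresponds to redo log that was never successfully backed up, as the backup.log excerpts below show.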
-
The most recent successful log backup of the failing service and its persistence volume <id> is recorded in the backup.log file:
'2022-04-13T23:54:39+02:00 P0046666 18024ebd648 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 100% 2147307520/2147307520'
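The two large decimal numbers in a log backup file name are the first and last log positions the file covers, and a point-in-time recovery needs an unbroken chain of these ranges up to the target position. A Python sketch to extract them follows (the file name is taken from the backup.log entry above; the log_backup_<volume>_<partition>_<startpos>_<endpos>.<backupid> pattern is the commonly observed naming convention, assumed here rather than an official contract):

import re

# File name from the backup.log entry above.
name = "log_backup_4_0_22728986116480_22729036424960.1649886889285"

# Assumed convention: log_backup_<volume>_<partition>_<startpos>_<endpos>.<backupid>
m = re.match(r"log_backup_(\d+)_(\d+)_(\d+)_(\d+)\.(\d+)$", name)
volume, partition, start_pos, end_pos, backup_id = (int(g) for g in m.groups())

print(f"volume {volume}, partition {partition}: covers positions {start_pos} .. {end_pos}")
# Here the chain stops at 22729036424960, well below the 22737423671680
# that the failing recovery above needs to reach.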
-
The backup.log file shows errors for the subsequent failed log backup attempts on this persistence (the same errors are also present in the HANA service trace file):
2022-04-13T23:54:52+02:00 P0046666 18024ec0545 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 79% 2549268480/3219742720
2022-04-13T23:54:52+02:00 P0046666 18024ec0545 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, BackupLogError
2022-04-13T23:54:52+02:00 P0046666 18024ec0545 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, Error during asynchronous file transfer (io_getevents), rc=28: No space left on device; $fileCallback$=[W] , buffer= 0x00006d8339139000, offset= 2549280768, size= 0/134217728, file= "<root>/.log_backup_4_0_22728986116480_22729036424960.1649886889285" ((open, mode= W, access= rw-r-----, flags= ASYNC|DIRECT|TRUNCATE|UNALIGNED_SIZE), factory= (root= "/hana/backup/SID/log/DB_SID/" (access= rw-r-----, flags= AUTOCREATE_PATH|DISKFULL_ERROR, usage= LOG_BACKUP, fs= xfs, config= (async_write_submit_active=on,async_write_submit_blocks=all,async_read_submit=on,num_submit_queues=1,num_completion_queues=1,size_kernel_io_queue=512,max_parallel_io_requests=64,min_submit_batch_size=16,max_submit_batch_size=64))) {shortRetries= 0, fullRetries= 0 (0/10)}
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, BackupLogStarted
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP to file: /hana/backup/SID/log/DB_SID/log_backup_4_0_22728986116480_22729254485696.1649886953025
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP start of progress monitoring, volumes: 1, bytes: 17175629824
2022-04-13T23:55:53+02:00 P0046666 18024ecfe41 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 0% 0/17175629824
...........
2022-04-14T00:50:32+02:00 P0046666 180251eb640 INFO LOGBCKUP progress of service: indexserver, hostname:30040, volume: 4, 29% 4964343808/17175629824
2022-04-14T00:50:42+02:00 P0046666 180251eb640 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, BackupLogError
2022-04-14T00:50:42+02:00 P0046666 180251eb640 ERROR LOGBCKUP state of service: indexserver, hostname:30040, volume: 4, Error during asynchronous file transfer (io_getevents), rc=28: No space left on device; $fileCallback$=[W] , buffer= 0x00006bb6e88f1000, offset= 4964364288, size= 0/134217728, file= "<root>/.log_backup_4_0_22728986116480_22729254485696.1649890211392" ((open, mode= W, access= rw-r-----, flags= ASYNC|DIRECT|TRUNCATE|UNALIGNED_SIZE), factory= (root= "/hana/backup/SID/log/DB_SID/" (access= rw-r-----, flags= AUTOCREATE_PATH|DISKFULL_ERROR, usage= LOG_BACKUP, fs= xfs, config= (async_write_submit_active=on,async_write_submit_blocks=all,async_read_submit=on,num_submit_queues=1,num_completion_queues=1,size_kernel_io_queue=512,max_parallel_io_requests=64,min_submit_batch_size=16,max_submit_batch_size=64))) {shortRetries= 0, fullRetries= 0 (0/10)}
2022-04-14T01:50:33+02:00 P0046666 1802555fbca INFO BACKUP SAVE DATA started [1649893833674]
2022-04-14T01:50:33+02:00 P0046666 1802555fbca INFO BACKUP command: backup data differential for SID using backint ('/usr/sap/SID/SYS/global/hdb/backint/DB_SID/') TOOLOPTION 'NSR_UTL_FILE=/usr/sap/SID/SYS/global/hdb/opt/.tmp_hana_SID_129545.utl'
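The rc=28 errors ("No space left on device") show that the log backup target /hana/backup/SID/log/DB_SID/ ran full, so the log segments written after position 22729036424960 were never backed up and are not available to the recovery. As a quick free-space check on the backup target, a sketch (the path is the one from the log output above; the 10% warning threshold is illustrative only):

import shutil

# Log backup target from the backup.log errors above (rc=28 / ENOSPC).
backup_root = "/hana/backup/SID/log/DB_SID/"

usage = shutil.disk_usage(backup_root)
free_pct = 100 * usage.free / usage.total
print(f"{backup_root}: {usage.free / 2**30:.1f} GiB free ({free_pct:.1f}%)")
if free_pct < 10:  # illustrative threshold
    print("WARNING: log backup filesystem nearly full; log backups may fail with ENOSPC")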
Environment
SAP HANA, platform edition 2.0
Product
SAP HANA, platform edition 2.0
Keywords
KBA , HAN-DB-BAC , SAP HANA Backup & Recovery , HAN-DB-HA , SAP HANA High Availability (System Replication, DR, etc.) , HAN-DB-PER , SAP HANA Database Persistence , Problem