Remote incremental backups - timeouts

cPanelMichael

Administrator
Staff member
Apr 11, 2011
Hi @brt,

cPanel version 68 includes resolutions for both cases referenced in the earlier post. It's now available in the Current build tier, and is tentatively planned for the Release build tier next week.

Thank you.
 

uk01

Well-Known Member
Dec 31, 2009
Hi,
The errors described here seem to be the same ones we are having.
I tried the fix from the previous page (editing the rsync file to use a higher timeout); however, the error occurred again tonight.

Preview of transport errors log:
Unable to prune transport “xxxxxxx”
Error pruning “/home/xxxx-incremental/2017-10-24” from “xxxxxxx”: ssh slave failed: timed out

I checked, and the 2017-10-24 directory has been deleted even though it's telling me it timed out.
Is this related to the fix in v68? And are the backups OK even though this error occurred, or are they corrupt? (Bearing in mind it's pruning the "original" folder with the hardlinks.)



[2017-10-31 03:01:29 +0000] warn [cpbackup_transporter] Error pruning /home/xxxx-incremental/2017-10-24 from xxxxxx-rsync: ssh slave failed: timed out at /usr/local/cpanel/Cpanel/LoggerAdapter.pm line 27.
Cpanel::LoggerAdapter::warn(Cpanel::LoggerAdapter=HASH(0x1bfa350), "Error pruning /home/xxxx-incremental/2017-10-24 from xxxxxx-"...) called at /usr/local/cpanel/Cpanel/Backup/Queue.pm line 690
Cpanel::Backup::Queue::transport_backup::attempt_to_prune_destination(Cpanel::Backup::Queue::transport_backup=HASH(0x207f450), Cpanel::Transport::Files::Rsync=HASH(0x2a3e488), 7, undef, Cpanel::LoggerAdapter=HASH(0x1bfa350)) called at /usr/local/cpanel/Cpanel/Backup/Queue.pm line 226
Cpanel::Backup::Queue::transport_backup::process_task(Cpanel::Backup::Queue::transport_backup=HASH(0x207f450), cPanel::TaskQueue::Task=HASH(0x2a866b8), Cpanel::LoggerAdapter=HASH(0x1bfa350)) called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/TaskQueue.pm line 629
eval {...} called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/TaskQueue.pm line 632
cPanel::TaskQueue::__ANON__() called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/StateFile.pm line 237
eval {...} called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/StateFile.pm line 237
cPanel::StateFile::Guard::call_unlocked(cPanel::StateFile::Guard=HASH(0x26c1418), CODE(0x26dd698)) called at /usr/local/cpanel/3rdparty/perl/524/lib64/perl5/cpanel_lib/cPanel/TaskQueue.pm line 637
cPanel::TaskQueue::process_next_task(cPanel::TaskQueue=HASH(0x26c01b0)) called at /usr/local/cpanel/bin/cpbackup_transporter line 151
eval {...} called at /usr/local/cpanel/bin/cpbackup_transporter line 149
 

brt

Well-Known Member
Jul 9, 2015
US
cPanel Access Level
Root Administrator
This seems to be fixed functionally now (backups DO seem to be pruning themselves), but I'm still getting daily emails with the pruning error (Backup Transport Error).

Are there any updates on this matter (pruning backups)?
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
uk01 said: "Is this related to the fix in v68?"

The issue you described does appear related to internal case CPANEL-15398. The resolution is included in cPanel version 68 and ensures that the operations for the rsync backend use the default timeout setting so that destination servers with extremely slow disks (e.g. disk caching is disabled) can delete directories. I recommend updating to cPanel version 68 once it's in your build tier (or even sooner if you don't mind switching to the CURRENT build tier) to see if the issue persists.

uk01 said: "And are the backups OK even though this error occurred, or are they corrupt? (Bearing in mind it's pruning the "original" folder with the hardlinks.)"

The error suggests the backups were not pruned. Are you suggesting some of the backup directories are in fact pruned on the remote system?

brt said: "This seems to be fixed functionally now (backups DO seem to be pruning themselves), but I'm still getting daily emails with the pruning error (Backup Transport Error). Are there any updates on this matter (pruning backups)?"

Is this server already using cPanel version 68? cPanel 68 includes the fix:

Fixed case CPANEL-15398: Backups: ensure rsync operations use default timeout.

Thank you.
 

brt

Well-Known Member
Jul 9, 2015
US
cPanel Access Level
Root Administrator
@cPanelMichael - I guess I mixed up two different updates here. "CPANEL-15493: Make sure incremental dirs are removed when asked." is the one that seems to have fixed our remote server filling up. Before that, previous backups were failing to prune.

Now it appears they're pruning correctly, but are still reporting problems pruning.
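The retention step brt credits to CPANEL-15493 amounts to choosing which dated incremental directories fall outside the retention window. A minimal Python sketch of that selection, assuming directories are named YYYY-MM-DD (as in the /home/xxxx-incremental/2017-10-24 path quoted earlier) and that the newest `keep` runs are retained. The function name and behavior are illustrative only and are not cPanel's actual implementation.

```python
import re
from datetime import date

# Assumed naming: one directory per backup run, named YYYY-MM-DD.
DATED_DIR = re.compile(r"\d{4}-\d{2}-\d{2}")


def dirs_to_prune(names: list[str], keep: int) -> list[str]:
    """Return the dated directories older than the newest `keep` ones, oldest first.

    Entries that do not look like YYYY-MM-DD are ignored rather than pruned.
    """
    if keep < 0:
        raise ValueError("keep must be >= 0")
    dated = sorted(
        (date.fromisoformat(name), name)
        for name in names
        if DATED_DIR.fullmatch(name)
    )
    excess = max(len(dated) - keep, 0)
    return [name for _, name in dated[:excess]]


if __name__ == "__main__":
    runs = [f"2017-10-{day}" for day in range(24, 32)]  # eight nightly runs
    print(dirs_to_prune(runs, keep=7))  # ['2017-10-24']
```

Ignoring names that are not dates is a deliberate choice in this sketch: anything the selector does not recognize is left alone rather than deleted.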
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
brt said: "Now it appears they're pruning correctly, but are still reporting problems pruning."

To confirm, which version of cPanel is installed on this system?

Thank you.
 

LBJ

Well-Known Member
Nov 1, 2003
cPanel Access Level
DataCenter Provider
cPanelMichael said: "The error suggests the backups were not pruned. Are you suggesting some of the backup directories are in fact pruned on the remote system? ... Is this server already using cPanel version 68? cPanel 68 includes the fix: Fixed case CPANEL-15398: Backups: ensure rsync operations use default timeout."

We're running v68.0.21, and on two of our servers the remote prune succeeds, but we still receive the "Unable to prune transport" error.

Best regards,

LBJ
 

sp3ctre69

Well-Known Member
Aug 14, 2006
Our system seemed fixed after the update, but in the last week it has started happening again.
 

uk01

Well-Known Member
Dec 31, 2009
We see the following:

Remote destination 1:
3 servers running CentOS 7: work fine
1 server running CloudLinux (CentOS 6): never prunes

Remote destination 2:
3 servers running CentOS 7: alert that pruning failed, but do actually prune
1 server running CloudLinux (CentOS 6): never prunes
 

LBJ

Well-Known Member
Nov 1, 2003
cPanel Access Level
DataCenter Provider
On cPanel Stable v68.0.21, the timeout used for the final remote prune appears to be 30 rather than 300.

The logs, Rsync.pm, and the timing between the primary backup finishing and the error for a prune failure being generated all point to this.

For backups of servers with data well in excess of 1/2 TB, a timeout of 30 doesn't always allow a prune on remote SATA (non-NVMe/SSD) drives to complete. I would imagine most users with substantial data use economical non-NVMe/SSD drives for backups.

Best regards,

LBJ
 

uk01

Well-Known Member
Dec 31, 2009
LBJ said: "...data well in excess of 1/2 TB, a timeout of 30 doesn't always allow for a prune..."

Ours is under 500 GB, with several GB per server per night.
Good point on the SATA, we wouldn't use SSD for backup!
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
Hello,

Internal case CPANEL-17253 addresses an issue where if the backup transporter hits the destination timeout (e.g. MAXIMUM_TIMEOUT value in /var/cpanel/backups/config) on an upload attempt, then subsequent attempts will ignore the timeout entirely, causing remaining transports to be delayed or possibly skipped. This affects multiple transporter types (including rsync). The resolution is included with cPanel version 70:

Fixed case CPANEL-17253: Ensure timeout is set for each upload attempt.
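The fix described for CPANEL-17253 comes down to computing a fresh deadline for every upload attempt instead of letting later retries run without one. A minimal Python sketch of that pattern, assuming a hypothetical `upload(deadline)` callable that is responsible for honoring the deadline and raises `AttemptTimeout` when it passes. None of these names come from cPanel's code.

```python
import time
from typing import Callable


class AttemptTimeout(Exception):
    """Raised by the (hypothetical) upload callable when its deadline passes."""


def upload_with_retries(
    upload: Callable[[float], None], max_attempts: int, timeout: float
) -> int:
    """Try `upload(deadline)` up to `max_attempts` times; return the attempt that succeeded.

    The deadline is recomputed at the start of every attempt, so a timeout on
    one attempt never leaves the following attempts without a limit.
    """
    for attempt in range(1, max_attempts + 1):
        deadline = time.monotonic() + timeout  # fresh per-attempt deadline
        try:
            upload(deadline)
            return attempt
        except AttemptTimeout:
            continue  # this attempt ran out of time; retry with a new deadline
    raise AttemptTimeout(f"all {max_attempts} attempts timed out")
```

Using `time.monotonic()` keeps the deadline immune to wall-clock changes during a long backup window.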

Thank you.
 

LBJ

Well-Known Member
Nov 1, 2003
cPanel Access Level
DataCenter Provider
Yes, but that doesn't apply to the issue being discussed here.

The problem being encountered by the above posters is that the upload has completed and the prune on the remote location is also completely successful, but cPanel's rsync.pm bails out too early on the prune process (spawned by the backup) and generates a spurious error email.
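One way to avoid a spurious email like this is to re-check the destination before reporting a failure when the client-side timeout fires. A minimal Python sketch, assuming SSH access to the destination and a hypothetical `prune()` helper; this is not how cPanel's Rsync.pm works, only an illustration of the pattern. When `subprocess.run` times out it kills the local `ssh` client, and whether the remote `rm` keeps running afterwards is not guaranteed, so the sketch checks the path instead of assuming either outcome.

```python
import shlex
import subprocess
from typing import Callable

# A runner executes a shell command on the destination and returns its exit status.
Runner = Callable[[str, float], int]


def ssh_runner(host: str) -> Runner:
    """Build a runner that executes commands on `host` over ssh."""
    def run(command: str, timeout: float) -> int:
        # On timeout, subprocess.run kills the local ssh client and raises TimeoutExpired.
        return subprocess.run(["ssh", host, command], timeout=timeout).returncode
    return run


def prune(path: str, run: Runner, timeout: float, check_timeout: float = 30.0) -> str:
    """Delete `path` remotely; on a failure or timeout, re-check before reporting it."""
    quoted = shlex.quote(path)  # ssh hands the command to a remote shell
    try:
        status = run(f"rm -rf -- {quoted}", timeout)
    except subprocess.TimeoutExpired:
        status = None  # client gave up; the remote side may or may not have finished
    if status == 0:
        return "pruned"
    try:
        still_there = run(f"test -e {quoted}", check_timeout) == 0
    except subprocess.TimeoutExpired:
        return "unknown"
    return "failed" if still_there else "pruned (confirmed by re-check)"
```

For example, `prune("/home/xxxx-incremental/2017-10-24", ssh_runner("backup.example.com"), timeout=300)`, where the host name is a placeholder.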

Best regards,

LBJ
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
LBJ said: "The problem being encountered by the above posters is that the upload has completed and the prune on the remote location is also completely successful, but cPanel's rsync.pm bails out too early on the prune process (spawned by the backup) and generates a spurious error email."

Could you open a support ticket so we can take a closer look at an affected system? You can post the ticket number here and I'll update this thread with the outcome.

Thank you.
 

LBJ

Well-Known Member
Nov 1, 2003
cPanel Access Level
DataCenter Provider
Thank you for the offer, but we've patched rsync.pm on our servers to give it a more reasonable timeout on the prune process and it now works as expected.

Any of the above posters should be able to help you out with a sample system though.

Best regards,

LBJ
 

brt

Well-Known Member
Jul 9, 2015
US
cPanel Access Level
Root Administrator
I'm still getting a daily backup transport error email, saying Unable to prune transport -- but the backup completes and the destination(s) DO prune successfully; the oldest backup does delete as expected.

I believe I have the timeout set at the max, although deleting hundreds of thousands of files does take a long time.

I'm sick of this email. Please advise.
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
brt said: "I'm still getting a daily backup transport error email, saying Unable to prune transport -- but the backup completes and the destination(s) DO prune successfully; the oldest backup does delete as expected."

I encourage you to open a support ticket so we can take a closer look at the affected system to see what's happening. Let us know the ticket number and we'll update this thread with the outcome.

Thank you.
 

LBJ

Well-Known Member
Nov 1, 2003
cPanel Access Level
DataCenter Provider
brt said: "I'm still getting a daily backup transport error email, saying Unable to prune transport -- but the backup completes and the destination(s) DO prune successfully; the oldest backup does delete as expected. I believe I have the timeout set at the max, although deleting hundreds of thousands of files does take a long time. I'm sick of this email. Please advise."

G'day brt,

Can you please confirm that the error email is raised 60 minutes after the primary backup completes? You'll find the primary backup completion time in the last line of the current backup log at...

/usr/local/cpanel/logs/cpbackup/

We've found on all our boxes that, with the standard processing, anything spawned at the end of the primary backup process times out 60 minutes after the backup process spawns it. This includes the pruning process and anything run via the post-backup hook. Setting MAXIMUM_TIMEOUT high won't fix this.
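The check LBJ asks for (is the error raised about 60 minutes after the backup completes?) can be scripted. A minimal Python sketch, assuming the completion line in the cpbackup log carries the same `[YYYY-MM-DD HH:MM:SS +ZZZZ]` timestamp as the transporter warning quoted earlier in the thread; that is an assumption, so check your own log. The function names and the two-minute tolerance are illustrative.

```python
import re
from datetime import datetime

# Matches timestamps such as "[2017-10-31 03:01:29 +0000]".
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_stamp(line: str) -> datetime:
    """Return the first bracketed timestamp in a log line as an aware datetime."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z")


def minutes_between(backup_done_line: str, prune_error_line: str) -> float:
    """Minutes from the backup's completion line to the prune error line."""
    delta = parse_stamp(prune_error_line) - parse_stamp(backup_done_line)
    return delta.total_seconds() / 60


def near_60_minute_cutoff(minutes: float, tolerance: float = 2.0) -> bool:
    """True if the gap is within `tolerance` minutes of the 60-minute cutoff described above."""
    return abs(minutes - 60) <= tolerance
```

A gap consistently close to 60 minutes across several nights would support the spawned-process cutoff explanation rather than a per-operation rsync timeout.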

Best regards,

LBJ
 

brt

Well-Known Member
Jul 9, 2015
US
cPanel Access Level
Root Administrator
- Removed -
 