Backup Pruning Timeout

CanSpace

Well-Known Member
PartnerNOC
Nov 25, 2011
70
61
68
cPanel Access Level
DataCenter Provider
On servers where we are using the default cPanel backup feature, we have an issue where the backups are not successfully pruned. We get an email each night from cPanel saying the pruning timed out. So each night we have to go in and manually remove these old backups.

There is no issue with the remote machine - it is configured correctly and optimized. The backups are just very large (over 1TB), with many inodes (we are using rsync incremental backups). Is there some reason why the destination timeout is capped at 300 seconds? Can we increase this beyond 300 seconds by modifying the destination information file in /var/cpanel/backups?

Is there any other way around this rather annoying issue? We could perhaps write a post-backup hook to go in and prune the directory, but there must be an easier way.
 

CanSpace

To answer my own question, it seems like patching /usr/local/cpanel/Cpanel/Transport/Files/Rsync.pm to allow a timeout greater than 300 seconds would do the trick. I'm still not sure why cPanel limits this to 300 seconds.

This solution was inspired by the posts in this thread:

 

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
13,266
1,301
363
Houston
Hi @CanSpace


There's a relatively old inquiry I found with this exact question and the response from development was as follows:

It's important to note what kind of timeout is being hit. Based on the log entry, this is an idle connection timeout:

Code:
[2014-09-19 07:26:32 +0100] warn [cpbackup_transporter] Upload attempt failed: RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. at /usr/local/cpanel/Cpanel/LoggerAdapter.pm line 26
This means literally no packets made it either to or from the cPanel server in 5 minutes. Increasing this timeout wouldn't do any good, since this is 100% a network issue. If the transporter is actively transferring data, that timeout can be increased up to 2 days. The 300 seconds only applies if data stops flowing. Perhaps this needs to be documented better; we have a lot of different timeouts and they all signify different conditions.

Keep in mind that a currently-running backup transporter will prevent a new one from running. Continually increasing timeouts will eventually run into problems with backups getting deleted before they can be transported, and/or customer drives filling up while they wait in vain for backups to get moved.

In your case, does the transport log indicate the same or a similar issue?
 

CanSpace

[2020-01-27 14:53:37 -0500] info [cpbackup_transporter] cpbackup_transporter - Processing next task
[2020-01-27 14:53:37 -0500] info [cpbackup_transporter] Instantiating Object
[2020-01-27 14:53:37 -0500] info [cpbackup_transporter] Starting a "prune" operation on the "xxxx" destination ID "LXhr8rkO7BDl5cnFU7v0yprS".
[2020-01-27 14:53:37 -0500] info [cpbackup_transporter] Performing prune operation, retaining 2 items on: xxxx
[2020-01-27 14:53:38 -0500] info [cpbackup_transporter] Pruning backup directory: 2020-01-22, from xxxx
[2020-01-27 22:53:37 -0500] info [cpbackup_transporter] ERROR: Pruning 2020-01-22 from xxxx: time out reached
[2020-01-27 22:53:37 -0500] info [cpbackup_transporter] The system could not prune the “2020-01-22” directory due to an error.

Now that I look at it, it's not reaching the 300-second limit - it's reaching the 8-hour total destination backup limit. What might be causing this? If I log in to the server and delete the directory manually, it only takes a minute or so. Is there something else that could be disconnecting and killing the connection?
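
The 8-hour figure can be checked directly from the timestamps in the log. This is a minimal sketch, assuming GNU date (for the -d option); elapsed_seconds is a hypothetical helper name, not something cPanel provides:

```shell
#!/usr/bin/env bash
# Print the number of seconds between two log timestamps.
# Assumes GNU date, which can parse "YYYY-MM-DD HH:MM:SS +ZZZZ" via -d.
elapsed_seconds() {
    local start end
    start=$(date -d "$1" +%s) || return 1
    end=$(date -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# Start of the prune task and the "time out reached" line from the log above:
elapsed_seconds "2020-01-27 14:53:37 -0500" "2020-01-27 22:53:37 -0500"
```

For these two timestamps the result is 28800 seconds, which is exactly 8 hours.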

On another server:

[2020-01-27 12:04:29 -0500] info [cpbackup_transporter] cpbackup_transporter - Processing next task
[2020-01-27 12:04:29 -0500] info [cpbackup_transporter] Instantiating Object
[2020-01-27 12:04:29 -0500] info [cpbackup_transporter] Starting a "prune" operation on the "xxxx" destination ID "0lotB6lKc0bSDgR7OiWKYiXl".
[2020-01-27 12:04:29 -0500] info [cpbackup_transporter] Performing prune operation, retaining 2 items on: xxxx
[2020-01-27 12:04:29 -0500] info [cpbackup_transporter] Pruning backup directory: 2020-01-19, from xxxx
[2020-01-27 13:07:18 -0500] info [cpbackup_transporter] ERROR: Pruning 2020-01-19 from xxxx: ssh slave failed: timed out
[2020-01-27 13:07:18 -0500] info [cpbackup_transporter] The system could not prune the “2020-01-19” directory due to an error.

This one timed out after about 63 minutes (12:04:29 to 13:07:18), well short of 8 hours. What might be causing this? Also notice the errors are slightly different.
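
To compare the failure reasons across servers, the prune errors can be pulled out of each transporter log. This is a rough sketch that only relies on the line format shown in the excerpts above; prune_errors is a hypothetical helper name, and the log path in the usage comment is an assumption:

```shell
#!/usr/bin/env bash
# Print "timestamp | backup directory | reason" for each prune failure
# found in a transporter log. The pattern is based only on the
# "ERROR: Pruning <dir> from <destination>: <reason>" lines quoted above.
prune_errors() {
    sed -n -E 's/^\[([^]]+)\].*ERROR: Pruning ([^ ]+) from [^:]+: (.*)$/\1 | \2 | \3/p' "$1"
}

# Usage (log path is an assumption; adjust to your server):
#   prune_errors /usr/local/cpanel/logs/cpbackup_transporter.log
```

Destination names that contain a colon would not match this pattern and would need a looser expression.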
 

CanSpace

To answer my own question, this actually wasn't a timeout issue. If I log in to the backup server and su to the rsync backup user, I can see that one of the directories is set to -rw-r--r--, so the owner cannot delete it. cPanel is just reporting the wrong error.

The reason I didn't originally check this is because for some of our servers, cPanel actually does show when there is a permission error when pruning backups, whereas here it just shows a timeout.
 

cPanelLauren

Nice @CanSpace

I'm actually glad I deterred you from the rabbit hole of making the modifications you were initially intending to make.

644 for a directory is kind of unusual, but was the real problem that the ownership was incorrect? Anything owned by the cPanel user/group should be removable by that user.
 

CanSpace

The changes to be honest seem relatively straightforward - it's a fairly simple perl script.

Yes, it was owned by the same user, but the file did not have owner write permissions. When I su'd to that user I was not able to rm the file (permission denied), but I could (as the same user) chmod u+w the file and then remove it.

As mentioned, I did not think this was a permission issue because when this happens on another server of ours, the cPanel backup failed email actually shows the permission issue, as opposed to just showing a timeout error.
 

cPanelLauren

CanSpace said: "The changes to be honest seem relatively straightforward - it's a fairly simple perl script"
Yeah, I agree, and thankfully it's pretty easy to read. It's not difficult, but I think there would have been some further confusion when the timeout wasn't resolved.

CanSpace said: "As mentioned, I did not think this was a permission issue because when this happens on another server of ours, the cPanel backup failed email actually shows the permission issue, as opposed to just showing a timeout error."
The error that's actually being output is probably not a permission error. I was more curious about what the issue was than about whether or not it was a permissions error. A permission failure due to bad ownership may be something that isn't expected (though that specifically is doubtful).
 

CanSpace

Again the issue is that some files don't have user write permissions, so the user cannot delete them, and the rm -rf fails. Whereas the cPanel email just says "timed out" - this is what led me in the wrong direction.

On another server with the exact same issue (and the exact same backup destination server) the email actually does show the permission issues (i.e., cannot delete file, permission denied), so I assumed that if there actually was a permission issue, the email would have told me.

Not sure why cPanel has this inconsistency - but this issue has likely led a lot of people to think there is a timeout issue when it's really just a permission issue.
 

cPanelLauren

CanSpace said: "Again the issue is that some files don't have user write permissions, so the user cannot delete them, and the rm -rf fails. Whereas the cPanel email just says "timed out" - this is what led me in the wrong direction."
Ok, to clarify what I mean: If I log in as the user lauren to my server - it's a standard user with no special permissions, and create a file or directory with 644 permissions - because it is owned by my user and group I can remove that file.

There should absolutely not be an issue with that. Standard file permissions are 644 - if it were the case that the user was unable to remove files owned by them with permissions of 644 almost no one would be able to remove their own files.

What I'm trying to understand here is why this specific folder could not be removed. Either its attributes have been modified or it isn't owned by that user/group.
 

CanSpace

Sorry, I pasted the wrong permissions above. The directory was set to 400 (i.e., r--------), so the owner-write permission was not set, and therefore the user could not delete their own files (without first changing the permission on the directory to 600).

This is a major nuisance because many applications create files without user-write permissions (444 or 400) like Drupal does by default on the /sites/default directory. rvSiteBuilder does this too with some specific template directories. I had to go in and manually change all these directories to 600 or 644 and then the backup user was able to delete them successfully.

So every time a user installs Drupal and it gets backed up, cPanel will not be able to prune the backups properly.

This is likely a very common issue with cPanel backups, but since cPanel sends an error message saying there was a "timeout" instead of actually showing the permission issues, everyone assumes there is a timeout issue - which is likely why there are so many discussions about this.

I've fixed the permission issues and everything is working fine now, but it's only a matter of time until another Drupal installation causes the pruning issue again. The ideal solution to this would be for the rsync copy to make sure that files/directories it copies have at least permissions of 600.
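
Until something like that exists, one workaround is to restore the owner bits on the destination before the prune runs. This is a minimal sketch, meant to be run as the backup user on the backup server; restore_owner_bits is a hypothetical helper name and the path in the usage comment is an assumption. On Linux, unlinking an entry needs both write and search (execute) permission on the directory that contains it, so the sketch adds u+rwx to directories:

```shell
#!/usr/bin/env bash
# Give the owner read/write/search on every directory under the given path,
# so that a later "rm -rf" run by that owner can unlink the contents.
# "-exec ... \;" is used (rather than "+") so that each directory is fixed
# as soon as it is visited, before find tries to descend into it.
restore_owner_bits() {
    find "$1" -type d ! -perm -700 -exec chmod u+rwx {} \;
}

# Usage (hypothetical path to one dated backup directory):
#   restore_owner_bits /backups/2020-01-22 && rm -rf /backups/2020-01-22
```

Only directories are touched, because the mode of a file does not affect whether its owner can unlink it from a writable directory.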
 

cPanelLauren

Again,

While I can appreciate that the permissions were incorrectly relayed earlier, I think there is some confusion about what a user can and can't do. If my user:group is lauren:lauren and my file is owned by the same user ID and group ID (UID:GID), there are absolutely no restrictions on what my user can remove unless the file's attributes have been modified.

I can set the permissions of a file/directory my user owns to 000 and still be able to remove it.

For example:

Code:
$ mkdir permissions_test
$ chmod 400 permissions_test
$ stat permissions_test/
  File: ‘permissions_test/’
  Size: 4096          Blocks: 8          IO Block: 4096   directory
Device: fd01h/64769d    Inode: 2099840     Links: 2
Access: (0400/dr--------)  Uid: ( 1000/  lauren)   Gid: ( 1002/  lauren)
Access: 2020-01-30 14:32:49.387711162 -0600
Modify: 2020-01-30 14:32:49.387711162 -0600
Change: 2020-01-30 14:32:57.651711765 -0600
Birth: -
$ rm -rf permissions_test
$ stat permissions_test
stat: cannot stat ‘permissions_test’: No such file or directory
There are only two cases in which the user wouldn't be able to remove a file:

  • The file has had its attributes modified (i.e., it is immutable)
  • The file is not owned by the same UID/GID as the user.

What I believe is actually occurring here is that, because the folder does not have the execute bit set, the pruning operation tries to traverse the directory to see whether there is content below it, cannot, and times out doing so. It either doesn't report back an error or reports one that isn't the expected error. This isn't really a straightforward permissions error (though technically it still is one).
 

CanSpace

I don't think it's that simple.

If lauren:lauren creates a directory "dir1", goes into dir1 and creates a file "file1" (with permissions 777 or anything else; the permissions on the file do not matter), then goes up a level, chmods dir1 to 400, and tries to rm -rf dir1, the command will fail.

For example:

$ mkdir dir1
$ cd dir1
$ touch file1
$ cd ..
$ chmod 400 dir1
$ rm -rf dir1
rm: cannot remove ‘dir1/file1’: Permission denied

As you can see, the rm -rf fails. This is the command that the cPanel backup pruning process runs, and why it always fails.
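
To find which directories in a backup tree would make rm -rf fail this way, it should be enough to look for directories whose owner is missing the write or execute bit. This is a minimal sketch; list_blocking_dirs is a hypothetical helper name and the path in the usage comment is an assumption:

```shell
#!/usr/bin/env bash
# List directories under the given path where the owner is missing
# write or execute (search) permission. Entries inside such a directory
# cannot be unlinked by the owner, which is what makes "rm -rf" fail.
list_blocking_dirs() {
    # "-perm -300" matches when both owner-write and owner-execute are set;
    # negating it selects directories missing at least one of the two.
    # Errors from subtrees that cannot be traversed are discarded.
    find "$1" -type d ! -perm -300 -print 2>/dev/null
}

# Usage (hypothetical path to one dated backup directory):
#   list_blocking_dirs /backups/2020-01-22
```

When run as the backup user, anything nested below a reported directory may not be visible until that directory's permissions are fixed, so the check may need to be repeated after each fix.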
 