How do incremental backups work?

coursevector

Well-Known Member
Feb 23, 2015
162
28
68
cPanel Access Level
Root Administrator
I know there have been some posts before on how this works, but none really explain it in detail.

So I am currently using compressed backups and it's getting to the point where it's impacting the server and taking too long. I'd like to switch to incremental backups. Our backups are uploaded to S3.

Here is my first question: if I switch to incremental, when is the full backup created? For example, I set it to incremental on Jan 1 with daily incrementals, and I want to retain the last 30 days of backups. It uploads to S3, and the bucket has a retention policy to only keep the last 31 days (for example). So on March 30 I decide I need to restore a backup. Do I need all 90 days' worth of backups in order to restore the account? I'd assume not if the retention period is set to 30 days, but that means at some point it created a new full backup for the incrementals to build from. So on day 31, did it create a new full backup, or did it merge day 1 and day 2? And if it is merging the two oldest days, how can it do this if they are uploaded to S3 and not stored on the server itself?

My other question is: what if I have a full backup on Jan 1, it fails on the 3rd day, and I go to restore it on the 7th day? Will all the incrementals on the 4th, 5th, and 6th be corrupted since they build off the 3rd incremental?

Finally, how does this handle files that were deleted? Say I have an account with a large log file I want deleted. If I go to restore the account, will that file come back? (The same goes for directories.)

If someone could clear up how incrementals work in detail, I'd appreciate it.
 

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
13,266
1,300
363
Houston

coursevector

@cPanelLauren the first link doesn't explain much, just how to check the true disk space usage.
I will admit I did not see the rsync notice on the documentation page. I realize I won't be able to use S3 (unless cPanel adds rclone support for Amazon S3 in the future). But say I do decide to back up to an rsync-compatible device: could you explain how incrementals truly work? The documentation and that forum post don't answer my original questions. Thanks.
 

coursevector

cPanel incremental backups aren't great. For good incremental backups you can look into CDP or Acronis, or if you wish to go with free tools, Borg or restic.
Thanks for the input, although I'm particularly interested in cPanel account backups. I already back up the server using AWS snapshots right now, but I'd like to have a secondary method for individual accounts. That way I don't need to restore a server just to fix one account.

For those looking at this ticket in the future, some of those software names are vague and match a million hits. So here are links (which I hope are correct) to them:
CDP: (couldn't find this one)
Acronis (Windows): Backup Software & Data Protection Solutions - Acronis
Borg: BorgBackup – Deduplicating archiver with compression and authenticated encryption
Restic: restic · Backups done right!
 

cPanelLauren

Hi @coursevector

I know our community team is working on a blog post about this subject. I was hoping it would be published this week, but it looks like it might be a couple more days.

if I switch to incremental, when is the full backup created
The full backup is created when you switch to incremental backups, meaning that first backup run will create the full backup.

So, March 30, I decide I need to restore a backup, do I need all 90 days worth of backups in order to restore the account? I'd assume not if the retention period is set to 30 days, but that means at some point, it created a new full backup for the incrementals to build from.
You don't need all 90 days. In fact, the most recent backup would be enough, since it's hardlinked to the oldest version present on the server.

My other question is, what if i have a full backup on Jan 1, it fails on the 3rd day and I go to restore in on the 7th day. Will all the incrementals on the 4th, 5th, 6th be corrupted since they build off the 3rd incremental?
This works the same as with any backups: it's not going to allow you to restore past the point where it has valid backups, so if the 4th failed, your last restore point is the 3rd.

Finally, how does this handle files that were deleted? If i have an account with a large log file i want deleted. If i go to restore it, will that file come back? (same goes for directories)
You'll still have several points to restore from. Each day a new backup is created, the hardlinks are added for that day's backup process, so it's not just one flat file. To prove this, I switched my backups to incremental and modified the date on my local machine so I could get multiple days of backups present. Then I went into the restore interface:

backup_restoration.png


You can also see that they're present as separated dates in my backup directory:

Code:
# ls -lah
total 80K
drwx--x--x  6 root root 4.0K Jun 22 00:01 .
dr-xr-xr-x 20 root root 4.0K Jun 22 00:01 ..
drwx--x--x  4 root root 4.0K Jun 20 13:31 2019-06-20
drwx--x--x  4 root root 4.0K Jun 21 00:00 2019-06-21
drwx--x--x  4 root root 4.0K Jun 22 00:00 2019-06-22
drwx--x--x  2 root root 4.0K Jun 22 00:01 .meta
-rw-------  1 root root  56K Jun 22 00:01 transports.db
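To check whether files in two dated directories really share storage, you can compare inode numbers and link counts. The following is only a rough sketch, assuming GNU coreutils (`stat -c`, `du`); the temporary directory and the `site.tar` file are made up for illustration and are not how cPanel lays out account data.

```shell
#!/usr/bin/env bash
# Illustrative only: directory and file names are hypothetical.
work=$(mktemp -d)                      # remove "$work" when you are done
mkdir -p "$work/2019-06-20" "$work/2019-06-21"

# Day 1 stores the data; day 2 hard-links the unchanged file.
head -c 1048576 /dev/zero > "$work/2019-06-20/site.tar"
ln "$work/2019-06-20/site.tar" "$work/2019-06-21/site.tar"

# The same inode number and a link count of 2 mean both names share one copy.
inode_day1=$(stat -c '%i' "$work/2019-06-20/site.tar")
inode_day2=$(stat -c '%i' "$work/2019-06-21/site.tar")
links=$(stat -c '%h' "$work/2019-06-21/site.tar")
echo "inodes: $inode_day1 $inode_day2, link count: $links"

# GNU du counts a hard-linked file once per invocation, so measuring both
# directories together shows the combined usage rather than double-counting.
du -sh "$work/2019-06-20" "$work/2019-06-21"
```

Running `du` on each dated directory separately would report the full size for each one, which is why the per-directory totals can look larger than the space actually used.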
 

coursevector

You don't need all 90 days, in fact, the most recent backup would be enough, since it's hardlinked to the oldest version present on the server
OK, that kind of answers my question but avoids the root of what I'm trying to determine. I can't keep backups indefinitely; at some point retention rules kick in. For my example I was using 30 days and trying to restore 90 days after flipping the switch. I understand I may not need all 90 days, but... OK, I just had a thought, and maybe I'm misunderstanding it because of my experience with Macrium Reflect and how it handles incremental backups. Let me rephrase: with Macrium you create a full backup, then each incremental is based on changes from the full backup. I thought cPanel did something similar, but maybe not? Is it creating an incremental off the actual files each time? Well... on second thought it can't; that would be a differential and not an incremental.

So it must be an incremental off the full backup. So if I go 90 days out, only retain the last 30 days, when is the updated full backup created? You said I don't need all 90 days, so at some point a new full was created to be used 30 days out. What is the logic for creating an updated full backup?
 

cPanelLauren

So it must be an incremental off the full backup. So if I go 90 days out, only retain the last 30 days, when is the updated full backup created? You said I don't need all 90 days, so at some point a new full was created to be used 30 days out. What is the logic for creating an updated full backup?
To understand this you'll need to understand what a hardlink is. The resource here might be helpful: What is a hard link? -- definition by The Linux Information Project (LINFO)

When a file is hardlinked, it shares the same inode as the original file. You can remove the original file and retain the data, since the OS sees it no differently than if the file existed in both locations or, to put it more accurately, as the same file with different names, with the exception of storage calculation. This is how incremental backup retention works. For example, let's say I have incremental backups enabled and retention set to keep only two copies of the backups. When my system has made the third backup, it will prune the original, leaving the hardlinked backup as the data storage point, since that hardlink shares the same inode as the original. You can see this illustrated by the following:

The file is created:
Code:
bash-4.2# touch file
bash-4.2# ls -i file
14026022 file
The link is created and we can see the inode is the same:
Code:
bash-4.2# ln file newfile
bash-4.2# ls -i file newfile
14026022 file  14026022 newfile
The original file is removed:
Code:
bash-4.2# /bin/rm file
After removing the original file, we can see that the hardlinked file retains the same inode:
Code:
bash-4.2# ls -i file newfile
ls: cannot access file: No such file or directory
14026022 newfile
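The same idea can be scaled up to dated backup directories with a retention limit. The sketch below is only an approximation of hard-link snapshots in general, assuming bash and GNU coreutils; the `snapshot` function, the `retain` variable, and the file names are all hypothetical, and it handles a flat directory only. It is not cPanel's actual backup code.

```shell
#!/usr/bin/env bash
# Sketch of hard-link snapshots with retention. All names are hypothetical;
# cPanel's real backup code is not shown in this thread.
work=$(mktemp -d)                      # remove "$work" when you are done
src="$work/account"; dest="$work/backup"
mkdir -p "$src" "$dest"
retain=2                               # keep only the two newest snapshots

# snapshot DAY: copy new or changed files, hard-link unchanged ones to the
# previous snapshot, then prune snapshots beyond the retention count.
snapshot() {
  local day=$1 prev f name
  prev=$(ls -1 "$dest" | tail -n 1)    # newest existing snapshot, if any
  mkdir "$dest/$day"
  for f in "$src"/*; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    if [ -n "$prev" ] && [ -f "$dest/$prev/$name" ] \
        && cmp -s "$f" "$dest/$prev/$name"; then
      ln "$dest/$prev/$name" "$dest/$day/$name"   # unchanged: share the inode
    else
      cp "$f" "$dest/$day/$name"                  # new or changed: fresh copy
    fi
  done
  # Dated names sort chronologically, so the first entry is the oldest.
  while [ "$(ls -1 "$dest" | wc -l)" -gt "$retain" ]; do
    rm -rf "$dest/$(ls -1 "$dest" | head -n 1)"
  done
}

echo "home page" > "$src/index.html"
echo "big log"   > "$src/error.log"
snapshot 2019-06-20                    # first run: effectively the full backup

rm "$src/error.log"                    # the account owner deletes the log
snapshot 2019-06-21

echo "new page" > "$src/about.html"
snapshot 2019-06-22                    # third run prunes 2019-06-20

ls "$dest"
ls "$dest/2019-06-22"
cat "$dest/2019-06-22/index.html"
```

In this sketch every dated directory is a complete tree on its own, so pruning the oldest one never requires creating a new "full" backup, and a file deleted from the account is present only in snapshots taken before the deletion.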