Backups and broken RAID1

kenashkov

Active Member
Nov 23, 2006
33
0
156
Sofia, Bulgaria
cPanel Access Level
Root Administrator
Hello everyone,

One of our servers has two drives, with each partition mirrored in RAID1. Whenever the scheduled backup runs during the weekend, the array gets broken. As a result, the array starts rebuilding while the backup is still running, and the two processes together kill the server. I don't have proof, but it seems the backup causes the array to break (during normal operation the array is fine). What could be the reason for that?

Vesko Kenashkov
 

cPanelTristan

Quality Assurance Analyst
Staff member
Oct 2, 2010
7,607
43
348
somewhere over the rainbow
cPanel Access Level
Root Administrator
Hello Vesko,

What are you using to perform the backup operations? Is this cpbackup or something else being used?

Next, do you have any logs that show us the exact errors occurring during this time?

Thanks!
 

kenashkov

Hi Tristan,

I'm sorry I didn't specify this explicitly: we are using the cpbackup functionality over FTP. I was just wondering whether the load created by tar/gzip could have something to do with the RAID1 breaking. /var/log/messages contains no useful messages about why the array broke (just the sync messages).
Is there a setting to change the priority of the backup process, or do I have to change it manually each time with nice?

Vesko Kenashkov
 

cPanelTristan

Hi Vesko,

I'm not certain whether the load is causing the RAID1 to break, but to me it seems unlikely that high load alone would keep causing that issue.

As for changing the priority of the backup, there isn't an option to set or change the nice value of the backup process, but you might want to consider submitting a feature request for a nice setting for the cpbackup process in the WHM > Backup > Configure Backup area. I could imagine that running the backup at a lower priority might be beneficial to users who have load-based issues during the backup. The location for feature requests is the following:

Feature Requests for cPanel and WHM
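
Until such an option exists, the manual approach Vesko mentions (starting the backup under nice) can be sketched as below. This is only a sketch under assumptions: the helper name low_priority_command is made up for illustration, the /scripts/cpbackup path is an assumption that should be checked on the server, and ionice with the idle class is an addition that only has an effect with I/O schedulers that honour it (such as CFQ).

```python
import shlex


def low_priority_command(command, niceness=19):
    """Prefix a command with nice and ionice so it yields CPU and disk to other work.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    if not 0 <= niceness <= 19:
        raise ValueError("niceness must be between 0 and 19")
    # "ionice -c 3" selects the idle I/O class; child processes such as
    # tar and gzip inherit both the nice value and the I/O class.
    return ["nice", "-n", str(niceness), "ionice", "-c", "3"] + list(command)


if __name__ == "__main__":
    # Assumed path to the backup script; adjust it for your installation.
    cmd = low_priority_command(["/scripts/cpbackup"])
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it: import subprocess; subprocess.call(cmd)
```

The same argument list can be used in the cron entry that starts the backup, so the priority does not have to be changed by hand each time.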

Thanks.
 

JerrySmith

Active Member
Apr 21, 2011
35
0
56
Hello,

I am not a RAID expert, but I thought I would share my thoughts with you.

I have never heard of a backup process or high load causing a RAID to fail. It could very well cause a performance bottleneck, but it should not cause an actual failure of the RAID.

Are you using hardware or software RAID?

If you are using hardware RAID, it is possible that the controller card is failing and causes the array to fail when it is given a lot of I/O. I would have your datacenter replace the card to see if this resolves the issue.

Another potential cause would be one or more of the drives in the array failing. I would check smartctl to see if there are any errors on the drives; your datacenter can assist with this.
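
A minimal sketch of that SMART check, assuming smartmontools is installed, the script runs as root, and the drives are ATA/SATA (SCSI/SAS drives print a different health line). The function names and the /dev/sda device path are illustrative only.

```python
import re
import subprocess


def smart_health_passed(smartctl_output):
    """Return True or False for the overall-health verdict, or None if none is found."""
    match = re.search(
        r"overall-health self-assessment test result:\s*(\S+)", smartctl_output
    )
    if match is None:
        return None
    return match.group(1) == "PASSED"


def check_drive(device):
    """Run 'smartctl -H' on a device and interpret the verdict line."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return smart_health_passed(result.stdout)


if __name__ == "__main__":
    # Illustrative device name; list the members of your own array instead.
    print(check_drive("/dev/sda"))
```

A PASSED verdict does not rule out a marginal drive, so the full attribute table from smartctl -a (reallocated and pending sectors in particular) is still worth reading.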
 

kenashkov

Hello,

The RAID is software and no device is failing. As it only happens when a backup is ongoing, we decided there could be some connection between the two. We will be looking elsewhere for the problem...
Thank you for the comments.

Vesko Kenashkov
 

nobodyk

Well-Known Member
Aug 1, 2010
90
0
56
This is actually common with hard drives that are not qualified for RAID. What type of hard drives are you using?
 

JerrySmith

Hello,

Thanks for bringing that to my attention, nobodyk.

In my earlier reply, I forgot to mention that there is a big difference between consumer and RAID-class HDDs.

This could very well be causing his issue with software RAID.
 

kenashkov

Today is Sunday, the backup is running again, and a resync of the RAID1 volumes has just started.
So it appears that high I/O load can cause the RAID1 to resync... I simply can't find any other explanation (as these two events always coincide).

I'm not sure the issue is related to the hard drives, as this is software RAID and I imagine it should be hardware agnostic. Maybe there is an issue with the RAID implementation in the kernel.

I don't think the hard drives are anything special - bog-standard SATA 2 drives from WD.
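
To tell whether an array is actually broken or only being resynced, it can help to look at /proc/mdstat while the backup runs. The sketch below is an illustration, not a diagnosis: the SAMPLE text is made up (it is not output from the server in this thread) and the parser only covers the common layout. A status of [UU] with a progress line means both members are present and the array is being synchronised or checked, whereas [U_] means a member has dropped out. Some distributions also schedule periodic md consistency checks from cron, which may be reported as "check" or, on some kernels, as "resync", so the weekend timing may be worth comparing against the cron configuration.

```python
import re

# Illustrative sample only; not taken from the server discussed in this thread.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      488279488 blocks [2/2] [UU]
      [==>..................]  resync = 12.6% (61682432/488279488) finish=95.3min speed=74500K/sec

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array: {"degraded": bool, "activity": str or None}} from mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            arrays[current] = {"degraded": False, "activity": None}
            continue
        if current is None:
            continue
        if not line.strip():
            # A blank line ends the block that describes the current array.
            current = None
            continue
        # Member status looks like [UU]; an underscore marks a missing member.
        status = re.search(r"\[([U_]+)\]", line)
        if status and "_" in status.group(1):
            arrays[current]["degraded"] = True
        # Progress lines name the running operation, e.g. "resync = 12.6%".
        activity = re.search(r"\b(resync|recovery|check|reshape)\s*=", line)
        if activity:
            arrays[current]["activity"] = activity.group(1)
    return arrays


if __name__ == "__main__":
    # On a live system, read the real file instead of SAMPLE:
    # with open("/proc/mdstat") as handle: print(parse_mdstat(handle.read()))
    print(parse_mdstat(SAMPLE))
```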