  1. #1
    Member
    Join Date
    Jul 2003
    Posts
    129

    Bad Experience with R1Soft - Any replacements?

    At the end of May, we decided we needed a uniform backup solution for both our Windows and Linux shared servers so we could sleep easy at night. cPanel's backup solution simply causes too much I/O load, especially now that we have customers with 60-70 GB of files whose accounts we can no longer back up.

    So, we deployed R1Soft on our Windows servers to test and they worked great. But when it came to cPanel/Linux servers, we've had nothing but trouble and their support hasn't helped at all.

    There are several purposes to this post and I'll outline them before I go through my setup and experiences with R1Soft:

    1) To see if there is a replacement out there that integrates with cPanel well.
    2) To see if anyone from R1Soft can reply to this. Believe it or not, you have to get 'approved' to post on their own forums. It's good that R1Soft is putting up a forum for transparency, just to turn around and put a restriction on who can post there. Well, I'm in the queue. Just like I'm in their support queue too.
    3) To see if anyone experienced this and can help me resolve it.


    With the increasing number of issues we faced with I/O, I quickly took advantage of their special last month and put all of our shared servers on R1Soft. I figured backups are very important, so we played everything safe. Every server is a Dual Xeon Quad Core 5430/5450 with RAID 10 500 GB drives. The repository is external (unlike Bluehost, even though our issue sounds just like theirs) and is kept on a Xeon 3060 with 4x1TB drives. Only 4 hosts per server, so no issue with a "slow server" here.

    Now, Windows servers back up absolutely fine. I haven't tried restoring files but I'm assuming that will work as it has worked on our Linux servers. However, about 50% of our Linux servers' loads spike to 100+ several times during a backup procedure.

    For about 5-10 minutes, the load stays at 2-3 (we try to always keep our load under 1.0, but 2-3 ain't so bad considering the power of these servers). Then the load spikes to 100+. If we kill off all other processes and let it run, it will come back down to 2-3 after about 5 minutes, just to shoot up again 5-10 minutes later. If we don't jump on it, the server is down.
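
To put timestamps on a pattern like this, one option is to sample the load average during a backup window and flag each upward crossing of a threshold. The following is only a rough sketch: the function names, the threshold of 10 and the sampling interval are all assumptions for illustration, not anything R1Soft or cPanel ships.

```python
import os
import time


def find_spikes(samples, threshold=10.0):
    """Return (index, load) pairs where the load crosses above the threshold."""
    spikes = []
    previous = 0.0
    for index, load in enumerate(samples):
        # Only record the first sample of each excursion above the threshold.
        if load >= threshold and previous < threshold:
            spikes.append((index, load))
        previous = load
    return spikes


def sample_load(count, interval_seconds):
    """Collect `count` 1-minute load averages, one every `interval_seconds`."""
    samples = []
    for _ in range(count):
        samples.append(os.getloadavg()[0])  # first element is the 1-minute average
        time.sleep(interval_seconds)
    return samples
```

Calling `find_spikes(sample_load(360, 10))` would cover roughly an hour at one sample every ten seconds; comparing the spike indexes against the backup schedule shows whether the spikes really track the backup job.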

    We originally had the servers on daily backups and switched it to hourly thinking with less I/O at a time, issues would be resolved. But no, with hourly, we have to worry about the server going down every hour instead of every day.

    Financially, we're out $4000+ in R1Soft licenses and much more in backup server infrastructure and time. Several customers have already complained about the stability of the servers, and we pride ourselves on our reviews, stability and reliability. The backup processes are on hold for the time being.

    As for my experience with support, I read on their forums that if I generate logs using their bash script and send it in, they'll look at it. The ticket was opened early this month (SUPPORT-949324). The response I got was -

    There's an issue that we've been seeing lately that sounds a lot like what you're seeing. We're not sure what the root cause is yet, but it will cause the load on some Linux systems to go up, but not on others, even if identically configured. We have been testing the new driver for the 3.0 release against systems that exhibit these symptoms, and the 3.0 driver doesn't seem to be affected by it.

    One thing that has helped some customers is rolling back to an older kernel. This only seems to affect recent kernels in the RedHat/CentOS line. Rolling back to an older kernel may not be feasible for security reasons, but it may be something to try in the meantime while the development team works on getting the 3.0 release ready to go.
    What I got out of it: So, compromise security or compromise backups. I mean it really doesn't matter that you've paid this high price for a "reputable" backup software. Choose one or the other until our developers are done with v3.

    I went back and tried many other things (some are listed above). My last change was upgrading to their latest Server/Linux Agent pair (1.63 I believe and 2.15). Nothing changed. It's almost like clockwork: watch it, watch it, then all of a sudden the load is at 100.

    All of this is happening while our company is growing at a rate of over 300% and deploying servers left and right. I really don't have the time to fool around more with software that was advertised to me as "High Performance". Please someone let me know where we can go from here.
    Arvixe - Freedom of the web at your fingertips

  2. #2
    Member
    Join Date
    Jan 2007
    Posts
    10

    RE: this is a kernel bug not an r1soft bug

    I am familiar with the crazy high load you have experienced.

    It's a VERY RARE issue, but nonetheless very serious for those that experience it.

    This is a known issue in the Linux kernel. There is a bug open for it on kernel.org and it's unresolved, as it appears to be a complex combination of certain disk controller device drivers and disk I/O scheduling.

    Repeat - this is not a bug in R1Soft's product... our product does a lot of disk I/O and we can set it off under the right conditions, e.g. planets aligned, particular storage controllers, I/O schedulers, etc.

    Bug 12309 – Large I/O operations result in slow performance and high iowait times

    It's most commonly, but not always, seen in OpenVZ or Virtuozzo kernels.

    I have had two customers that experienced the issue on a couple of servers and by migrating to different hardware the issue completely went away... same r1soft rev... same kernel rev etc. Only thing changed is hardware and storage controller driver.
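
Since the scheduler and storage driver are the variables named here, it can help to record which I/O scheduler each disk is actually using before comparing an affected server with a healthy one. A minimal sketch, assuming the usual Linux sysfs layout where `/sys/block/<device>/queue/scheduler` lists the available schedulers with the active one in square brackets; the function names are made up for this example.

```python
import glob
import re


def active_scheduler(scheduler_text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    return match.group(1) if match else None


def schedulers_by_device(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its active I/O scheduler."""
    result = {}
    for path in glob.glob(pattern):
        # With the default pattern the device name is the fourth path component.
        device = path.split("/")[3]
        with open(path) as handle:
            result[device] = active_scheduler(handle.read())
    return result
```

Running `schedulers_by_device()` on two "identically configured" servers is a quick way to confirm that they really are identical on this point.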

    Our customers have been able to reproduce the same issue WITHOUT R1Soft even loaded on the system.

    Here are notes from one of our customers who has done exhaustive research into this issue:

    #######################
    5/12/2009
    Joe / David,

    In our testing we were able to reproduce the same condition that has been happening with openvz/cdp scenarios. It seems to relate to accessing large files. In the case scenario where we could replicate the condition, we tar'd a directory of ~25GB of files in the 5-20MB size range. Near the end of this test, the system in question exhibited the exact same signs as our systems which failed during the backup process. In order to get the IO to spin out of control we also ran some other intensive tasks which kept io ops around ~1,000/sec -- at the point of failure sar had logged ~6,250/sec.

    What's different about the failures with CDP is that sometimes the failures occurred at the very first stage of the backup, other times near the end of the backup.

    There is some online chatter as of late regarding the linux kernel and iowait issues: Bug 12309 – Large I/O operations result in slow performance and high iowait times

    Hopefully this information assists you in tracking down how you are triggering the failure.
    #######################
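
For anyone who wants to try the tar test described in those notes at a safe scale first, here is a rough sketch. It only covers the "create a directory of files and tar it" part; the other intensive tasks the customer ran in parallel are not reproduced, and the helper names and tiny default sizes are assumptions chosen for illustration. Scaling it toward ~25GB of 5-20MB files should only be done on a machine you can afford to lose.

```python
import os
import tarfile
import tempfile


def make_test_files(directory, file_count, file_size_bytes):
    """Create `file_count` files of random data, each `file_size_bytes` long."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for number in range(file_count):
        path = os.path.join(directory, f"blob_{number:05d}.bin")
        with open(path, "wb") as handle:
            handle.write(os.urandom(file_size_bytes))
        paths.append(path)
    return paths


def tar_directory(directory, archive_path):
    """Archive the directory without compression and return the archive size in bytes."""
    with tarfile.open(archive_path, "w") as archive:
        archive.add(directory, arcname=os.path.basename(directory))
    return os.path.getsize(archive_path)


def run_small_trial(file_count=4, file_size_bytes=1024 * 1024):
    """Build a small data set in a scratch directory, tar it, and clean up afterwards."""
    with tempfile.TemporaryDirectory() as scratch:
        data_dir = os.path.join(scratch, "data")
        make_test_files(data_dir, file_count, file_size_bytes)
        return tar_directory(data_dir, os.path.join(scratch, "data.tar"))
```

Watching `sar` or `iostat` while a larger run is in progress is what would show whether the box behaves the way the notes describe.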

    What I can also tell you is that we have done everything possible to work around this kernel bug, including rewriting our CDP device driver for our new CDP 3 product. We believe that by doing disk I/O a different way we no longer trigger the kernel scheduling issue.

    And I repeat again. This is a kernel bug not an r1soft bug.

    As far as posting on the forums, we started screening new forum members some time ago to stop spammers. I will look into why you were not approved and into the approval process itself.

    Regards,
    -David Wartell
    R1Soft Founder

  3. #3
    Member
    Join Date
    Jan 2007
    Posts
    10

    RE: your r1soft forum registration

    I think I found you arvand. Your registration is now approved in the r1soft forums. Your username was in a long list of addresses like: buyeffexor _at_ rxmednow.com

    Sorry for your registration not being approved. Thanks for letting us know we made a mistake.

    Like most forums we get a lot of spammer registrations mixed in with real ones, and it's very easy to not approve someone who is not a spammer.

  4. #4
    Member
    Join Date
    Sep 2004
    Posts
    887

    Some advanced Captcha would probably help you out, and would likely make your customers much happier when they are trying to get support.

    M

  5. #5
    Member
    Join Date
    Jul 2004
    Posts
    37

    We too are a R1SOFT customer... and have had these issues also... but as of late (after we UPGRADED, not downgraded, our kernels) it has been less of a problem.

    I can say that one thing that will cause these load issues is running MYSQL with the R1SOFT mySQL agent on a server where bad SQL already holds locks. When the agent does its FLUSH TABLES to grab a consistent database backup, the flush piles up behind the existing locks, everything snowballs, and the load jumps. You should run the mySQL processlist (or something like MONyog) during a backup to see if this is the cause.
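
A rough sketch of what that processlist check could look like if scripted. It assumes the stock `mysql` command-line client is installed and can log in without prompting (for example via `~/.my.cnf`), and it relies on `mysql --batch` printing tab-separated rows with a header line. The function names and the 30-second cutoff are invented for this example.

```python
import csv
import io
import subprocess


def parse_processlist(batch_output):
    """Parse tab-separated `SHOW FULL PROCESSLIST` output into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(batch_output), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def stuck_threads(rows, min_seconds=30):
    """Return non-idle threads that have been in their current state for a while."""
    stuck = []
    for row in rows:
        if row["Command"] == "Sleep":
            continue  # idle connections are not interesting here
        if row["Time"].isdigit() and int(row["Time"]) >= min_seconds:
            stuck.append(row)
    return stuck


def fetch_processlist():
    """Run the mysql client once and return its raw output (assumes passwordless login)."""
    completed = subprocess.run(
        ["mysql", "--batch", "-e", "SHOW FULL PROCESSLIST"],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout
```

Printing the `State` and `Info` columns of `stuck_threads(parse_processlist(fetch_processlist()))` every few seconds during a backup should make a lock pile-up obvious if that is what is happening.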

    Some other things we have seen are IO errors during the R1SOFT backup, which R1SOFT first tried to blame on a particular server, but even when we did upgrades (with 100% new hardware) to new servers, ADAPTEC controllers, and hard drives - we still get these from time to time.

    Another problem is the way INNODB backups are handled - the restore process is a problem... this being most true in a VZ environment.

    Another problem is that memory based MYISAM tables are backed up file system only, and not by the mySQL agent - so when doing a restore using the mySQL agent, you need to go through and manually restore the memory based table .FRM(s).

    I like R1SOFT. The company and the people. But in all honesty, they need to get this v3 release out the door - and stop just talking about it.

    Judging from the gentleman's original post, where he paid $4000, plus our purchases and many others I am sure, they should have the money for development.

    Spend the money - hire the people - get the focus - and get v3 out the door.

    We all are a bit tired of waiting for it. At this point, the problems we have have gone from minor annoyances to major hassles.

    So - R1SOFT, how about not delaying the release of v3 any longer - do what you need to do - and get it out SOONER.

    We are all tired of waiting.

    Smoge

  6. #6
    Member
    Join Date
    Jul 2004
    Posts
    37

    Quote Originally Posted by david-r1soft
    rewriting our CDP device driver for our new CDP 3 product.

    David - understand this... we don't care anymore.

    We just want it fixed.

    Get v3.0 out now.

    Enough of "when 3.0 comes out all will be ok" - we are tired of this rhetoric.

    Smoge

  7. #7
    Member
    Join Date
    Jul 2003
    Posts
    129

    Thank you for your response. You haven't really provided us with any solutions though.

    In any business, there are aspects of the business that decrease the quality of your product. We see it in our own web hosting business. You can't just turn around and blame it on bugs or other things.

    Your company chose to develop this product for Linux, so you are now a stakeholder in its faults and flaws. If the kernels are randomly failing during I/O, and your application relies heavily on I/O, as you have mentioned, your company must take note and develop workarounds.

    As the previous poster mentions, having a workaround 'coming out' doesn't help anybody. I just came back to this post after another one of the servers which I had thought was not spiking during buagent spiked to 40+ load. We just can't continue to do business this way...

  8. #8
    Member
    Join Date
    Jul 2003
    Posts
    129

    This may be the issue with our case?

    David,

    When deploying servers, several got the original partitioning "suggested by cPanel" based on what SoftLayer techs told us. In any case, the partition for /var was not big enough. So for mysql databases, we created /home/mysql and did a symbolic link from /var/lib/mysql to that directory.

    Could this cause the high load issue?
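
Whatever the answer turns out to be, it is easy to confirm where the MySQL data directory really lives and whether it shares a filesystem with /home. This is only a small sketch using standard `os` calls; the function names are invented here and it says nothing about how the R1Soft agent itself treats symlinks.

```python
import os


def describe_location(path):
    """Report whether `path` is a symlink, its resolved target, and the device it lives on."""
    real_path = os.path.realpath(path)
    return {
        "is_symlink": os.path.islink(path),
        "real_path": real_path,
        "device_id": os.stat(real_path).st_dev,
    }


def same_filesystem(path_a, path_b):
    """True when both paths resolve to the same mounted filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

On a server set up as described, `describe_location("/var/lib/mysql")` would be expected to report a symlink resolving under /home/mysql, and `same_filesystem("/var/lib/mysql", "/home")` would show that database and account I/O land on the same partition.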

  9. #9
    Member
    Join Date
    Jul 2004
    Posts
    37

    Quote Originally Posted by Arvand
    several got the original partitioning "suggested by cPanel" based on what SoftLayer techs told us.

    I know this was directed to David - but in general, as you perhaps know now, you really should not use the rather useless SoftLayer-recommended partitions.

    Instead, you should have a dedicated /var (10GB maybe) and /var/lib/mysql (depends, we usually do 40 or 50GB).

    As for your spiking - you can do a test yourself, if you don't have a way or desire to run mysql process list while your backups are running.

    Log in and run the mysql CLI

    mysql

    then type

    FLUSH TABLES;

    see how long it takes... if it takes a long time, then perhaps your mySQL server has a lot of locking going on (bad mySQL by your users).
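
If you would rather have a number than a feeling for "a long time", the same test can be timed from a script. A minimal sketch, again assuming the `mysql` client can log in without prompting (for example via `~/.my.cnf`); the helper names are hypothetical.

```python
import subprocess
import time


def time_command(argv):
    """Run a command to completion and return (elapsed_seconds, exit_code)."""
    started = time.monotonic()
    completed = subprocess.run(argv, capture_output=True, text=True)
    return time.monotonic() - started, completed.returncode


def time_flush_tables():
    """Time a single FLUSH TABLES issued through the mysql command-line client."""
    return time_command(["mysql", "-e", "FLUSH TABLES"])
```

Running `time_flush_tables()` once on a quiet server and once during peak traffic gives two figures to compare; a large gap would point toward lock contention rather than the backup agent itself.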

    But - I hope this side note by both of us does not side track the thread - that ...

    Show us the stuff! Get v3.0 out now.

  10. #10
    Member
    Join Date
    Jun 2004
    Posts
    78

    Thank you Arvand and smoge for your input.

    We have been considering this solution but still have yet to justify the high costs involved with deploying it on all systems.

    Since we certainly cannot afford to buy now and wait for a working solution, as you currently are, we will continue to watch this evolve and test other products, hoping something less expensive comes along.

  11. #11
    Member
    Join Date
    Jul 2004
    Posts
    37

    I called R1SOFT, spoke to them... and told them I replied in this thread about being tired of waiting for v3.

    We spoke for a while - maybe 30 minutes or so... more or less, it was indicated they are working on v3 - it's not ready - lots of people like v2....

    I also like v2... in fact, I just ordered another mySQL and Archive license today... it is a nice product..

    I just wish 3.0 was coming sooner - but with some things in life... wadda ya gonna do?

    It is a good product - it does work well - it's just that we want it all to be "perfect". Demands are high.

    I still recommend it.

    Smoge

  12. #12
    cPanel Staff mario-cPanel
    Join Date
    Oct 2007
    Location
    Houston, Texas, United States
    Posts
    59
    cPanel/Enkompass Access Level

    Website Owner

    Smoge, it is great to hear that you were able to speak to our ISV Partner about your concerns.

    Let us know if we can be of any further assistance.
    Mario Rodriguez
    cPanel.net
    Strategic Partner Manager
    mario@cPanel.net
    415-894-5882 / aim: cpanelmario

  13. #13
    Member
    Join Date
    Jul 2003
    Posts
    129

    David/Mario,

    Out of 40 or so servers, over 4-5 are doing this. Our night shift is working harder than our day shift because these servers are just going up and down constantly. We are losing customers. That was not the plan when we opted for R1Soft.

    What can we do? Over the ticket system, your staff member said that you "tried the 3.0 agent on a client computer and it fixed the issue". How much more business do we need to do to get this treatment? It's past experimental or buggy right now. This, as it stands, is unusable.

    And of course, we can't just not run it because we have data we are liable for. So it's a catch-22. Run it and you lose customers due to downtime. Don't run it and you lose customers and run the risk of getting sued due to no backups.

    Or, say screw it to how much we paid R1Soft and go back to the cPanel backup which was plagued with the same type of issues but on a weekly basis. At least it was weekly and not nightly.