The Community Forums

Interact with an entire community of cPanel & WHM users!

Bad Experience with R1Soft - Any replacements?

Discussion in 'Data Protection' started by Arvand, Jul 17, 2009.

Thread Status:
Not open for further replies.
  1. Arvand

    Arvand Well-Known Member
    PartnerNOC

    Joined:
    Jul 26, 2003
    Messages:
    130
    Likes Received:
    1
    Trophy Points:
    18
    At the end of May, we identified that we needed a uniform backup solution for both our Windows and Linux shared servers so we could sleep easy at night. cPanel's backup solution simply causes too much I/O load, especially now that we have customers with 60-70 GB of files whose accounts we can no longer back up.

    So, we deployed R1Soft on our Windows servers to test and they worked great. But when it came to cPanel/Linux servers, we've had nothing but trouble and their support hasn't helped at all.

    There are several purposes to this post and I'll outline them before I go through my setup and experiences with R1Soft:

    1) To see if there is a replacement out there that integrates with cPanel well.
    2) To see if anyone from R1Soft can reply to this. Believe it or not, you have to get 'approved' to post on their own forums. It's good that R1Soft is putting a forum up for transparency, just to turn around and restrict who can post there. Well, I'm in the queue. Just like I'm in their support queue too.
    3) To see if anyone experienced this and can help me resolve it.


    With the increasing number of issues we faced with I/O, I quickly took advantage of their special last month and put all of our shared servers on R1Soft. I figured backups are very important, so we played everything safe. Every server is a Dual Xeon Quad Core 5430/5450 with RAID 10 500 GB drives. The repository is external (unlike Bluehost, even though our issue sounds just like theirs) and is kept on a Xeon 3060 with 4x1TB drives. Only 4 hosts per server, so no issue with "slow server" here.

    Now, Windows servers back up absolutely fine. I haven't tried restoring files, but I'm assuming that will work as it has worked on our Linux servers. However, on about 50% of our Linux servers the load spikes to 100+ several times during backup procedures.

    For about 5-10 minutes, the load stays at 2-3 (we try to keep our load always under 1.0, but 2-3 ain't so bad considering the power of these servers).
    Then the load spikes to 100+. If we kill off all other processes and let it run, it will come back down to 2-3 after about 5 minutes, just to shoot up again 5-10 minutes later. If we don't jump on it, the server is down.
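
A minimal sketch of how the spike pattern described above could be caught early, assuming a Linux host where `/proc/loadavg` is readable. The threshold values and the function names are illustrative assumptions, not anything from R1Soft or cPanel.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def load_state(one_minute, warn=3.0, critical=20.0):
    """Classify a 1-minute load average.

    warn and critical are assumed thresholds: the post treats 2-3 as
    tolerable and 100+ as an outage, so anything in between is a judgment call.
    """
    if one_minute >= critical:
        return "critical"
    if one_minute > warn:
        return "elevated"
    return "ok"


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the kernel's load average file (Linux only)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())
```

Calling `load_state(read_loadavg()[0])` from a cron job or a loop during the backup window would give a chance to pause the backup agent before the server becomes unreachable.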

    We originally had the servers on daily backups and switched it to hourly thinking with less I/O at a time, issues would be resolved. But no, with hourly, we have to worry about the server going down every hour instead of every day.

    Financially, We're out $4000+ in R1Soft licenses and much more in backup server infrastructure and time. Several customers have already complained due to stability issues of the servers. And we pride ourselves in our reviews, stability and reliability. The backup processes are on hold for the time being.

    As for my experience with support, I read on their forums that if I generate logs using their bash script and send it in, they'll look at it. The ticket was opened early this month (SUPPORT-949324). The response I got was -

    What I got out of it: So, compromise security or compromise backups. I mean it really doesn't matter that you've paid this high price for a "reputable" backup software. Choose one or the other until our developers are done with v3.

    I went back and tried many other things (some are listed above). My last change was upgrading to their latest Server/Linux Agent pair (1.63 I believe and 2.15). Nothing changed. It's almost like clockwork. Watch it, watch it, then all of a sudden the load is at 100.

    All of this happening when our company is growing at a rate of over 300% and deploying servers left and right. I really don't have the time to fool around more with a software that was advertised to me as "High Performance". Please someone let me know where we can go from here.
     
  2. david-r1soft

    david-r1soft Member

    Joined:
    Jan 4, 2007
    Messages:
    10
    Likes Received:
    0
    Trophy Points:
    1
    RE: this is a kernel bug not an r1soft bug

    I am familiar with the crazy high load you have experienced.

    It's a VERY RARE issue, but nonetheless very serious for those that experience it.

    This is a known issue in the Linux kernel. There is a bug open for it on kernel.org and it's unresolved, as it appears to be a complex combination of certain disk controller device drivers and disk I/O scheduling.

    Repeat - this is not a bug in R1Soft's product... our product does a lot of disk I/O and we can set it off under the right conditions, e.g. planets aligned, particular storage controllers, I/O schedulers, etc.

    Bug 12309 – Large I/O operations result in slow performance and high iowait times
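
Since the I/O scheduler is named as one of the factors, it can help to record which scheduler each disk is using on affected and unaffected servers. The sketch below assumes a Linux kernel that exposes `/sys/block/<device>/queue/scheduler`, where the active scheduler is shown in square brackets; the function names are hypothetical.

```python
import glob
import os


def active_scheduler(text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def schedulers_by_device(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its active I/O scheduler."""
    result = {}
    for path in glob.glob(pattern):
        # Path layout: /sys/block/<device>/queue/scheduler
        device = path.split(os.sep)[3]
        with open(path) as handle:
            result[device] = active_scheduler(handle.read())
    return result
```

Comparing the output of `schedulers_by_device()` between a server that spikes and one that does not is only a data point, not a diagnosis, but it costs nothing to collect.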

    It's most commonly, but not always, seen in OpenVZ or Virtuozzo kernels.

    I have had two customers that experienced the issue on a couple of servers and by migrating to different hardware the issue completely went away... same r1soft rev... same kernel rev etc. Only thing changed is hardware and storage controller driver.

    Our customers have been able to reproduce the same issue WITHOUT R1Soft even loaded on the system.

    Here are notes from one of our customers who has done exhaustive research into this issue:

    #######################
    5/12/2009
    Joe / David,

    In our testing we were able to reproduce the same condition that has been happening with openvz/cdp scenarios. It seems to relate to accessing large files. In the case scenario where we could replicate the condition, we tar'd a directory of ~25GB of files in the 5-20MB size range. Near the end of this test, the system in question exhibited the exact same signs as our systems which failed during the backup process. In order to get the IO to spin out of control we also ran some other intensive tasks which kept io ops around ~1,000/sec -- at the point of failure sar had logged ~6,250/sec.

    What's different about the failures with CDP is that sometimes the failures occurred at the very first stage of the backup, other times near the end of the backup.

    There is some online chatter as of late regarding the linux kernel and iowait issues: Bug 12309 – Large I/O operations result in slow performance and high iowait times

    Hopefully this information assists you in tracking down how you are triggering the failure.
    #######################
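
The customer's test above relied on `sar` figures. As a rough complement, the share of CPU time spent in iowait can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. This is a sketch under the assumption that the fields follow the usual Linux order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are illustrative.

```python
def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep user..steal only; later guest fields are already
            # counted inside user/nice and would be double counted.
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter
```

Sampling `/proc/stat` a few seconds apart while a large `tar` or a backup runs would show whether iowait climbs before the load average does.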

    What I can also tell you is that we have done everything possible to work around this kernel bug, including rewriting our CDP device driver for our new CDP 3 product. We believe that by doing disk I/O a different way we no longer trigger the kernel scheduling issue.

    And I repeat again. This is a kernel bug not an r1soft bug.

    As far as posting on the forums, we started screening new forum members some time ago to stop spammers. I will look into why you were not approved, and into the approval process itself.

    Regards,
    -David Wartell
    R1Soft Founder
     
  3. david-r1soft

    david-r1soft Member

    Joined:
    Jan 4, 2007
    Messages:
    10
    Likes Received:
    0
    Trophy Points:
    1
    RE: your r1soft forum registration

    I think I found you arvand. Your registration is now approved in the r1soft forums. Your username was in a long list of addresses like: buyeffexor _at_ rxmednow.com

    Sorry for your registration not being approved. Thanks for letting us know we made a mistake.

    Like most forums we get a lot of spammer registrations mixed in with real ones, and it's very easy to not approve someone who is not a spammer.
     
  4. mtindor

    mtindor Well-Known Member

    Joined:
    Sep 14, 2004
    Messages:
    1,279
    Likes Received:
    36
    Trophy Points:
    48
    Location:
    inside a catfish
    cPanel Access Level:
    Root Administrator
    Some advanced Captcha would probably help you out, and would likely make your customers much happier when they are trying to get support.

    M
     
  5. smoge

    smoge Well-Known Member

    Joined:
    Jul 2, 2004
    Messages:
    52
    Likes Received:
    0
    Trophy Points:
    6
    We too are an R1SOFT customer... and have had these issues also... but as of late (after we UPGRADED, not downgraded, our kernels) it has been less of a problem.

    I can say, one thing that will cause these load issues is running MYSQL with the R1SOFT mySQL agent on a server that already has lock contention due to bad SQL. When the R1SOFT mySQL agent tries to do its FLUSH TABLES to grab a consistent database backup, the FLUSH and the existing locks will snowball, and load will jump. You should run mySQL processlist (or something like MONyog) to monitor this during a backup to see if this is the cause.
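
One way to act on the processlist suggestion is to capture `SHOW FULL PROCESSLIST` in the `mysql` client's tab-separated batch mode during a backup and pick out threads stuck waiting. The sketch below only parses such text. The list of state prefixes is an assumption that varies by MySQL version (older releases report `Locked`, newer ones `Waiting for table ...`), and the function names are hypothetical.

```python
# Assumed state prefixes that indicate a thread is blocked on a table
# lock or on a pending FLUSH TABLES; adjust for the MySQL version in use.
WAIT_MARKERS = ("Locked", "Waiting for table", "Flushing tables")


def parse_processlist(batch_text):
    """Parse tab-separated SHOW FULL PROCESSLIST output (header row first)."""
    lines = [line for line in batch_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def blocked_threads(rows, min_seconds=5):
    """Return rows that have been in a waiting state for at least min_seconds."""
    blocked = []
    for row in rows:
        state = row.get("State") or ""
        try:
            seconds = int(row.get("Time") or 0)
        except ValueError:  # batch output can contain the literal NULL
            seconds = 0
        if state.startswith(WAIT_MARKERS) and seconds >= min_seconds:
            blocked.append(row)
    return blocked
```

If the number of blocked threads grows steadily right after the backup agent issues its flush, that points at lock contention rather than at raw disk I/O.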

    Some other things we have seen are IO errors during the R1SOFT backup, which R1SOFT first tried to blame on a particular server, but even when we did upgrades (with 100% new hardware) to new servers, ADAPTEC controllers, and hard drives - we still get these from time to time.

    Another problem is the way INNODB backups are handled - the restore process is a problem... this being most true in a VZ environment.

    Another problem is that memory based MYISAM tables are backed up at the file system level only, and not by the mySQL agent - so when doing a restore using the mySQL agent, you need to go through and manually restore the memory based table .FRM(s).

    I like R1SOFT. The company and the people. But in all honesty, they need to get this v3 release out the door - and stop just talking about it.

    Between the $4000 the gentleman paid in his original post, our purchases, and many others I am sure - they should have the money for development.

    Spend the money - hire the people - get the focus - and get v3 out the door.

    We all are a bit tired of waiting for it. At this point, the problems we have have gone from minor annoyances to major hassles.

    So - R1SOFT, how about not delaying the release of v3 any longer - do what you need to do - and get it out SOONER.

    We are all tired of waiting.

    Smoge
     
  6. smoge

    smoge Well-Known Member

    Joined:
    Jul 2, 2004
    Messages:
    52
    Likes Received:
    0
    Trophy Points:
    6
    David - understand this... we don't care anymore.

    We just want it fixed.

    Get v3.0 out now.

    Enough of "when 3.0 comes out all will be ok" - we are tired of this rhetoric.

    Smoge
     
  7. Arvand

    Arvand Well-Known Member
    PartnerNOC

    Joined:
    Jul 26, 2003
    Messages:
    130
    Likes Received:
    1
    Trophy Points:
    18
    Thank you for your response. You haven't really provided us with any solutions though.

    In any business, there are aspects of the business that decrease the quality of your product. We see it in our own web hosting business. You can't just turn around and blame it on bugs or other things.

    Your company chose to develop this product for Linux, so you are now a stakeholder in its faults and flaws. If the kernels are randomly failing during I/O, and your application relies heavily on I/O, as you have mentioned, your company must take note and develop workarounds.

    As the previous poster mentions, having a workaround 'coming out' doesn't help anybody. I just came back to this post after another one of the servers which I had thought was not spiking during buagent spiked to 40+ load. We just can't continue to do business this way...
     
  8. Arvand

    Arvand Well-Known Member
    PartnerNOC

    Joined:
    Jul 26, 2003
    Messages:
    130
    Likes Received:
    1
    Trophy Points:
    18
    This may be the issue with our case?

    David,

    When deploying servers, several got the original partitioning "suggested by cPanel" based on what SoftLayer techs told us. In any case, the partition for /var was not big enough. So for mysql databases, we created /home/mysql and did a symbolic link from /var/lib/mysql to that directory.

    Could this cause the high load issue?
     
  9. smoge

    smoge Well-Known Member

    Joined:
    Jul 2, 2004
    Messages:
    52
    Likes Received:
    0
    Trophy Points:
    6
    I know this was directed to David - but in general, as you perhaps know now, you really should not use the rather useless Softlayer recommended partitions.

    Instead, you should have a dedicated /var (10GB maybe) and /var/lib/mysql (depends, we usually do 40 or 50GB).

    As for your spiking - you can do a test yourself, if you don't have a way or desire to run mysql process list while your backups are running.

    Login and run the mysql CLI

    mysql

    then type

    FLUSH TABLES;

    see how long it takes... if it takes a long time, then perhaps your mySQL server has a lot of locking going on (bad mySQL by your users).
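
The same test can be timed from a script so it can be repeated during and outside the backup window. This is a minimal sketch that assumes the `mysql` client is on the PATH and can log in without prompting (for example through a `~/.my.cnf` file); the function name and the injectable `run` and `clock` parameters are illustrative.

```python
import subprocess
import time


def time_flush_tables(run=subprocess.run, clock=time.monotonic):
    """Run FLUSH TABLES through the mysql CLI and return elapsed seconds.

    run and clock are parameters so the function can be exercised
    without a live MySQL server.
    """
    start = clock()
    # -e executes the statement and exits; check=True raises on failure.
    run(["mysql", "-e", "FLUSH TABLES;"], check=True)
    return clock() - start
```

A flush that returns in well under a second on a quiet server but takes many seconds while customers' queries are running suggests the locking problem described above.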

    But - I hope this side note by both of us does not side track the thread - that ...

    Show us the stuff! Get v3.0 out now.
     
  10. S-Combs

    S-Combs Well-Known Member

    Joined:
    Jun 10, 2004
    Messages:
    78
    Likes Received:
    0
    Trophy Points:
    6
    Thank you Arvand and smoge for your input.

    We have been considering this solution but still have yet to justify the high costs involved with deploying it on all systems.

    Since we certainly cannot afford to buy now and wait for a working solution as you currently are, we will continue to watch this evolve and test other products, hoping something less expensive will come along.
     
  11. smoge

    smoge Well-Known Member

    Joined:
    Jul 2, 2004
    Messages:
    52
    Likes Received:
    0
    Trophy Points:
    6
    I called R1SOFT, spoke to them... and told them I replied in this thread about being tired of waiting for v3.

    We spoke for a while - maybe 30 minutes or so... more or less, it was indicated they are working on v3 - it's not ready - lots of people like v2....

    I also like v2... in fact, I just ordered another mySQL and Archive license today... it is a nice product..

    I just wish 3.0 was coming sooner - but with some things in life... wadda ya gonna do?

    It is a good product - it does work well - it's just that we want it all to be "perfect". Demands are high.

    I still recommend it.

    Smoge
     
  12. Mario-cPanel

    Mario-cPanel Administrator
    Staff Member

    Joined:
    Oct 4, 2007
    Messages:
    72
    Likes Received:
    49
    Trophy Points:
    8
    Location:
    Houston, Texas, United States
    cPanel Access Level:
    Website Owner
    Smoge, it is great to hear that you were able to speak to our ISV Partner about your concerns.

    Let us know if we can be of any further assistance.
     
  13. Arvand

    Arvand Well-Known Member
    PartnerNOC

    Joined:
    Jul 26, 2003
    Messages:
    130
    Likes Received:
    1
    Trophy Points:
    18
    David/Mario,

    Out of 40 or so servers, 4-5 are doing this. Our night shift is working harder than our day shift because these servers are just going up and down constantly. We are losing customers. That was not the plan when we opted for R1Soft.

    What can we do? Over the ticket system, your staff member said that you "tried the 3.0 agent on a client computer and it fixed the issue". How much more business do we need to do to get this treatment? It's past experimental or buggy right now. This, as it stands, is unusable.

    And of course, we can't just not run it because we have data we are liable for. So it's a catch-22. Run it and you lose customers due to downtime. Don't run it and you lose customers and run the risk of getting sued due to no backups.

    Or, say screw it to how much we paid R1Soft and go back to the cPanel backup which was plagued with the same type of issues but on a weekly basis. At least it was weekly and not nightly.
     