The Community Forums


Success - Complete Clustered and Redundant cPanel with Load Balancing and Failover

Discussion in 'Bind / DNS / Nameserver Issues' started by mr2jzgte, Jan 26, 2005.

  1. mr2jzgte

    mr2jzgte Well-Known Member

    Joined:
    Jun 18, 2003
    Messages:
    51
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Florida
    **Note: this is for a 99.999% high availability project. Certain variables such as upstream connectivity cannot be helped. The only other option here is to have multiple incoming IP connections routed via BGP to handle the multihomed address, such as what we are using.

    Ok.
    Brief Overview.

    We were at a stage where we had 3000 clients on one aging machine. My boss approached me and gave me a project to undertake.

    Multiple clustered cPanel servers with expandability.

    After 2 months of solid work we've finally had complete success with this project. I'll provide some details here for anyone that's interested, and to prove that it works.

    Network diagram - best that I can do at the moment: [image]

    Components required

    2 x mid-level machines - lots of memory + dual GigE network cards (LVS front loaders)
    2-10 x mid-to-high-level machines + dual GigE cards (these are the actual cPanel machines)
    2 x low-to-mid-level machines - file storage servers, again with dual GigE cards
    2 x GigE switches

    In our case I used the following:
    2 x P3 1.3GHz Xeons with 1GB memory - 3 x 36GB SCSI disks in a mirrored array + hotswap
    2 x dual P4 2.4GHz Xeon machines with 3 x 36GB SCSI disks in a mirrored array + hotswap
    2 x dual P4 2.4GHz Xeons with 2 x 80GB SATA mirrored array (system drive) + 4 x 200GB SATA drives in a mirrored/striped array (380GB usable space)

    The network flow: data comes into the system via a multihomed address which carries the working subnet (a /28 in this case, expandable).

    This is all based on Red Hat Enterprise 3.0 with a 2.4 kernel.
    The 2 front loaders have identical configs, with the exception of the IP addresses.
    The 1st machine handles all the connections and return traffic. Should this machine go down, the 2nd LVS machine will assume all IPs and routing information.
    These machines run IPVS, heartbeat, ldirectord and that's about it, plus a few custom scripts for the network interfaces (using ripd to announce routes to all machines in case one switch goes down).
    IPVS handles all the port forwarding and load balancing.
    ldirectord checks services on the cPanel machines and, according to the setup, will drop port forwards to any dead services and route them through to the live machine.
    Should both cPanel machines be down, ldirectord modifies the IPVS table so that requests are handled by the LVS machine itself (e.g. an HTTP page with a system maintenance notice, and a local mail server which holds all mail in a spool ready to deliver when the servers come back up). Likewise for DNS - it runs off the same zone configs as the cPanel servers via an NFS share.
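
    To give an idea of the load balancer side, here's a minimal sketch of an ldirectord config for the HTTP service only - the addresses are made up for illustration (203.0.113.10 as the public VIP, 10.0.0.11/12 as the cPanel machines), not our actual ones, and the real config covers more ports:

        # /etc/ha.d/ldirectord.cf (sketch)
        checktimeout=10
        checkinterval=5
        virtual=203.0.113.10:80
                real=10.0.0.11:80 masq
                real=10.0.0.12:80 masq
                fallback=127.0.0.1:80 masq
                service=http
                # test page each real server must answer with "OK"
                request="/.ldirectord.html"
                receive="OK"
                scheduler=wlc
                protocol=tcp

    ldirectord maintains the IPVS table from this; the equivalent by hand would be ipvsadm -A / -a commands with -m (masquerading) for the LVS-NAT return path.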

    Backend - 2 x fileservers, again a similar setup to the above. One primary server rsyncs data to a 2nd server; if the first server goes down, data loss will be less than 10 minutes. These servers also run the MySQL cluster management services (4.1.9). Relatively dumb machines - all they do is serve files to the cPanel machines.
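
    The primary-to-secondary sync is nothing fancy - roughly a cron job like the one below (the paths and the backup hostname here are illustrative, not our exact entry):

        # /etc/crontab on the primary fileserver - push changes every 10 minutes
        */10 * * * * root rsync -a --delete /nfs/ backup-fs:/nfs/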

    cPanel machines - this is where things got tricky.

    I started with a base install of Red Hat Enterprise 3.0 with the current patchset and the 2.4.20-27 kernel.

    Mounts are as follows
    /home to nfs:/nfs/home
    /mounts/var to nfs:/nfs/var
    /mounts/etc to nfs:/nfs/etc
    /mounts/usr to nfs:/nfs/usr
    All hard mounts with intr (this lets apps stall and then start running again if the NFS servers go down), etc.

    This enabled the use of symlinks (by God, if I never see or have to use another symlink I'll be all too happy).
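
    In /etc/fstab terms it looks roughly like this ('nfs' being the fileserver hostname as above; the options shown are illustrative - the exact option string we ended up with is further down the thread):

        nfs:/nfs/home   /home         nfs   hard,intr,bg   0 0
        nfs:/nfs/var    /mounts/var   nfs   hard,intr,bg   0 0
        nfs:/nfs/etc    /mounts/etc   nfs   hard,intr,bg   0 0
        nfs:/nfs/usr    /mounts/usr   nfs   hard,intr,bg   0 0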

    I then proceeded to install cPanel 10r6 on both machines.
    After this was done I started the hard bit: modifying and symlinking.
    There were very few changes made to the cPanel program (zero, if I recall correctly).
    The only major changes were converting the system to work with MySQL 4.1.9 (which has a new password hashing method that caused all sorts of dramas for me till I figured out the legacy tables).
    The main problems here were with the eximstats app.
    I set it up so both machines use the same RSA keys, so that https/SSL certs work correctly.
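
    For anyone who hits the same MySQL 4.1 password hashing problem with legacy clients, the usual workaround (a sketch of the general fix, not necessarily exactly what I did) is to tell mysqld to keep the old-style hashes:

        # /etc/my.cnf
        [mysqld]
        old_passwords=1

    Accounts already converted to the new hash format may still need their passwords reset afterwards.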

    The cPanel machines have aliased lo:0-lo:10 with the /28 subnet, and all data returns via the LVS machines - this setup is known as LVS-NAT.

    All machines have 10.0.x.x addresses, with the exception of the 2 LVS machines, which have aliased interfaces for the /28 and multihomed addresses.
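
    The loopback aliasing is just the usual interface alias trick - something along these lines, with made-up addresses standing in for our /28:

        # on each cPanel machine, one alias per public address
        ifconfig lo:0 203.0.113.1 netmask 255.255.255.255 up
        ifconfig lo:1 203.0.113.2 netmask 255.255.255.255 up
        # ... and so on through lo:10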

    That's about all for now that I can think of. If anyone has specific questions I'll see what I can do about answering them. I'm considering doing up a step-by-step document for this process; I will be charging for this though, due to the amount of time and effort spent on this project.

    This setup is now live and hosting clients - no problems so far...

    Feel free to add comments / flames / disbelief etc..

    Things to do
    ** Set up monitoring software which emails admins about machine status.
    ** Work out why quotas are not working (they are working, except a zero value is being returned for used space on clients' accounts - not a real big concern with the space that we have)
    ** Finish off the SNMP monitoring programs
    ** Complete the app to handle failback FTP/POP3/etc. requests
     
    #1 mr2jzgte, Jan 26, 2005
    Last edited: Jan 26, 2005
  2. Lestat

    Lestat Well-Known Member

    Joined:
    Sep 13, 2003
    Messages:
    199
    Likes Received:
    0
    Trophy Points:
    16
    I am currently on a project very similar to yours. I just got the first machine up with AS3 on it as well, with the new cPanel. The only thing that differs is that I have a Nokia Checkpoint firewall to deal with. I have this cPanel box plugged into a DMZ unit and from the unit to the firewall (or it might be before the firewall - haven't looked that far), but all seems to be working so far. I've finally figured out how to get the webserver working on the outside along with the mail.

    On the cPanel machine I have 2 x 200GB SATA drives mirrored, and another machine on the way. The first cPanel machine is set up on a local IP with the extra external IP added in cPanel, so I can assign sites to it and everything reads properly on the outside. Now, my dilemma before I move forward is getting cPanel to also work internally so local users can get email and view websites. But this cPanel machine is also the DNS, so for a user to go out and then come back in is just not working. Some light on that fix would be nice.

    To move on: I have this other machine on its way and I will be clustering these 2 machines together, but they will be in 2 different facilities. I also have an NFS server at hand but haven't even begun to fathom implementing that yet. Well, that's just a little rundown of what I have on my plate; it sounds pretty similar to yours and I would like more details on this documentation you have in the works. Nice job from what I can see.
     
    #2 Lestat, Feb 11, 2005
    Last edited: Feb 11, 2005
  3. drw.net

    drw.net Registered

    Joined:
    Feb 6, 2005
    Messages:
    3
    Likes Received:
    0
    Trophy Points:
    1
    Documentation...??

    mr2jzgte, you stated:

    "I'm considering doing up a step by step document for this process. I wil be charging for this though due to the ammount of time and effort spent on this project."

    Are these docs done? If so, what are you charging for them?
     
  4. robertwagnervv

    robertwagnervv Registered

    Joined:
    Apr 24, 2005
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    1
    Docs....

    I'd be interested in a price on these docs as well. Sounds like a good setup.
     
  5. Doctor

    Doctor Well-Known Member

    Joined:
    Apr 26, 2003
    Messages:
    180
    Likes Received:
    0
    Trophy Points:
    16
    Count me in.
     
  6. cPanelBilly

    cPanelBilly Guest

    I like the setup you have done. However, there is one very large issue with it.

    You used NFS. You will have a lot of problems with file locking.
    Most likely people will start losing email and being unable to log in to POP/IMAP due to this.
     
  7. netwrkr

    netwrkr Well-Known Member

    Joined:
    Apr 12, 2003
    Messages:
    203
    Likes Received:
    0
    Trophy Points:
    16
    I'm wondering if using something like GFS from Red Hat would be a better alternative to NFS in the above solution?

    Tom
     
  8. mr2jzgte

    mr2jzgte Well-Known Member

    Joined:
    Jun 18, 2003
    Messages:
    51
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Florida

    By using the following options in the /etc/fstab file for the NFS mounts, I've overcome those problems:

    dev,rsize=8192,timeo=14,wsize=8192,intr,nolock,suid,exec,bg,hard

    We had a few issues without those options - the cppop service returned an error which locked up some mail clients. After a bit of debugging we resolved it with this.
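
    Put together, one of the mount lines in /etc/fstab ends up looking roughly like this (using the /home mount from my earlier post as the example):

        nfs:/nfs/home  /home  nfs  dev,rsize=8192,timeo=14,wsize=8192,intr,nolock,suid,exec,bg,hard  0 0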

    GFS would be a better solution, or Coda (though I had no end of dramas with Coda). As for GFS, at the time there was very little info I could find on it, and from memory I believe the licence fee was US$5k, which we couldn't justify spending.
     
  9. KatieBuller

    KatieBuller BANNED

    Joined:
    May 10, 2005
    Messages:
    65
    Likes Received:
    0
    Trophy Points:
    0
    How do you get the databases to replicate on the mysql cluster? I know it can mirror tables but not the actual databases, right?
     
  12. BianchiDude

    BianchiDude Well-Known Member
    PartnerNOC

    Joined:
    Jul 2, 2005
    Messages:
    619
    Likes Received:
    0
    Trophy Points:
    16
    Wouldn't that be even worse? Surely those files must be locked for a reason.
     
  13. cfsdigital-frankie

    Joined:
    Jan 7, 2003
    Messages:
    8
    Likes Received:
    0
    Trophy Points:
    1
    File locking is there for a reason, but how would one overcome this? I know in FreeBSD there are snapshots - I wonder if those could be used? Network Appliance filers have built-in tools for file locking and whatnot.
     
  14. DWHS.net

    DWHS.net Well-Known Member
    PartnerNOC

    Joined:
    Jul 28, 2002
    Messages:
    1,569
    Likes Received:
    6
    Trophy Points:
    38
    Location:
    LA, Costa RIca
    cPanel Access Level:
    Root Administrator
    If cPanel would just cluster email, then the load wouldn't be much of a factor. :mad:

    Good idea though guys, good luck!
     
  15. evilweasil

    evilweasil Member

    Joined:
    Oct 18, 2005
    Messages:
    9
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    Manchester
    How many sites is this handling? There was a mention of 3000 in there somewhere. I take it that with LVS/IPVS it's just a case of adding one line to the config file to add a new server, right? Do all of the servers read from the same partition on the same storage array?
     
  16. mr2jzgte

    mr2jzgte Well-Known Member

    Joined:
    Jun 18, 2003
    Messages:
    51
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Florida
    We're up to 6500 clients on the array. I'm about to add a 3rd application server, as we've decommissioned a machine identical to the current application servers.

    Adding it into the array is a simple matter of mirroring the current slave app server, modifying a few settings and IP addresses, and updating the /etc/sysconfig/ipvsadm file plus the HA stuff (see the sketch below).
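
    /etc/sysconfig/ipvsadm is just the saved ipvsadm rule set, so the change amounts to one extra '-a' line per virtual service. A sketch with made-up addresses (203.0.113.10 as the VIP, 10.0.0.11-13 as the app servers), the last line being the new 3rd box:

        -A -t 203.0.113.10:80 -s wlc
        -a -t 203.0.113.10:80 -r 10.0.0.11:80 -m -w 1
        -a -t 203.0.113.10:80 -r 10.0.0.12:80 -m -w 1
        -a -t 203.0.113.10:80 -r 10.0.0.13:80 -m -w 1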

    Yes, all the servers run from the same partition on the same storage array. As I said previously, I am running NFS.

    We do about 60,000 emails a day, averaged over 7 days.

    I still haven't done any documentation, due to other commitments and being flat out. I've had no firm offers from people regarding it, so if anyone's interested it's preferable to contact me via email and we can discuss what is needed.

    We've doubled our client base and the servers are handling the load really well.

    It's not exactly a cheap or easy setup, but once it's working it works brilliantly.

    Email - james@austdomains.com.au - I'll try and reply ASAP.
     
  17. mr2jzgte

    mr2jzgte Well-Known Member

    Joined:
    Jun 18, 2003
    Messages:
    51
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Florida
    Just a small update...

    I've just finished working on a solution which, for full redundancy, involves only 4 servers in total.

    For simple load balancing and service monitoring, only 3 machines are needed.

    How does this sound to you guys? It just removed a big portion of the cost. It's done by using GNBD on the 2 application servers, with non-shared mirrored storage in the cluster and GFS clients (rough sketch below). I'm still testing it.
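
    For anyone unfamiliar with GNBD, the basic plumbing looks something like this - the device name, export name and hostname are made up here, and this is only the export/import part, not the whole mirrored layout:

        # on the box exporting its local storage (gnbd server side)
        gnbd_serv
        gnbd_export -d /dev/sdb1 -e appstore

        # on the box importing it (GFS client side)
        gnbd_import -i app1.example.com
        mount -t gfs /dev/gnbd/appstore /mnt/appstore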

    I've also been testing the MySQL 5.x cluster, which seems a lot more promising than the previous versions.
     
  18. mr2jzgte

    mr2jzgte Well-Known Member

    Joined:
    Jun 18, 2003
    Messages:
    51
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Florida
    OK, I've just perfected the network RAID and GFS solution - it works perfectly as a failover and load balancing (active-active) solution.


    I'll be providing a reference site from a Swiss-based company shortly, which you can contact to confirm this is possible.

    If there are any questions regarding this setup then feel free to ask.

    To be clear, THIS IS NOT DNS CLUSTERING - this is true redundant and load-balanced failover. I'm also working on a front-end active-active solution (i.e. both load balancers are active and handling requests; the only addition in hardware is a BGP-capable switch). This system has ZERO points of failure other than the upstream provider, which can be mitigated by a decent provider.

    If you are interested, leave a message in this thread or send an email to james@austdomains.com.au

    Regards
    James
     
  19. Drew Nichols

    Drew Nichols Well-Known Member

    Joined:
    May 5, 2003
    Messages:
    96
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    SC
    Very intrigued -- this really adds value for many clients.
     
  20. depee

    depee Member

    Joined:
    Dec 5, 2005
    Messages:
    8
    Likes Received:
    0
    Trophy Points:
    1
    ------------------------------------------------------

    Everyone,
    I'm planning the same setup for an ISP in Africa. There are 2 sites involved.
    My Network:

    Please find attached JPG file


    LB = Load Balancers
    RS = Real Servers (With cPanel )
    SAN = SAN SWITCH
    DATA = data array, an HP MSA1000 with 174GB of data
    One site has the same config as above, except that the clustered nodes use shared storage (HP MSA1000).

    Questions

    Q1: Since DATA is shared here, how do you configure cPanel on RS1 and RS2?

    Q2: I wanted to use SteelEye LifeKeeper for HA + DR, but SteelEye LifeKeeper doesn't do HA for cPanel as a whole - it does it at the LAMP level, and they do not have a support kit for Exim and the other applications, just some kits.
    So I intended to use GFS, but I need directions on how to go about it. Any ideas?

    Q3: Many people have said GFS works great. I would like to know if GFS will provide failover over the WAN link - I mean, can RS3 and RS4 be active nodes in the cluster if I use GFS? And how do I replicate DATA? (These 2 sites are doing the exact same thing, and just need to fail over from one to the other.)

    Q4: Is there any other solution you can suggest apart from James' solution or the "Global FS" (GFS) solution?

    I would be glad to be contacted off-list for any commercial support.
    epeelea@gmail.com / +237 300 0682


    Much Regards,

    Daniel
     

    Attached Files:

    • san.JPG (12.8 KB)