(Note: this is for a 99.999% high-availability project. Certain variables, such as upstream connectivity, cannot be helped. The only other option here is to have multiple incoming IP connections routed via BGP to handle the multihomed address, such as what we are using.)
Ok.
Brief Overview.
We were at a stage where we had 3000 clients on one aging machine. My boss approached me and gave me a project to undertake.
Multiple clustered cPanel servers with expandability.
After 2 months of solid work we've finally had complete success with this project. I'll provide some details here for anyone that's interested, and to prove that it works.
Network diagram - best that I can do at the moment.
- Removed -
Components required:
2 x mid-level machines - lots of memory + dual GigE network cards (LVS front loaders)
2-10 x mid-to-high-level machines + dual GigE cards (these are the actual cPanel machines)
2 x low-to-mid-level machines - file storage servers, again with dual GigE cards
2 x GigE switches
In our case I used the following:
2 x P3 1.3GHz Xeon machines with 1GB memory - 3 x 36GB SCSI disks in a mirrored array + hot-swap
2 x dual P4 2.4GHz Xeon machines with 3 x 36GB SCSI disks in a mirrored array + hot-swap
2 x dual P4 2.4GHz Xeon machines with a 2 x 80GB SATA mirrored array (system drive) + 4 x 200GB SATA drives in a mirrored striped array (380GB usable space)
The network flow: data comes into the system via a multihomed address which carries the working subnet (a /28 in this case, expandable).
This is all based on Red Hat Enterprise Linux 3.0 with a 2.4 kernel.
The 2 front loaders are identical in config, with the exception of the IP addresses.
The 1st machine handles all the connections and return traffic. Should this machine go down, the 2nd LVS machine will assume all IPs and routing information.
These machines run IPVS, heartbeat, ldirectord and that's about it, plus a few custom scripts for the network interfaces (using ripd to announce routes to all machines in case one switch goes down).
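The failover piece is plain heartbeat v1 IP takeover. A minimal sketch of what that could look like, assuming hypothetical node names (lvs1/lvs2) and a placeholder VIP - the real config also carries the rest of the /28:

```sh
# Hypothetical /etc/ha.d/haresources shared by the two front loaders.
# lvs1 is the preferred node; if heartbeat declares it dead, lvs2
# takes over the virtual IP and starts ldirectord in its place.
cat > /etc/ha.d/haresources <<'EOF'
lvs1 IPaddr::203.0.113.10/28/eth0 ldirectord::ldirectord.cf
EOF
```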
IPVS handles all the port forwarding and load balancing.
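To illustrate, here's roughly what the IPVS table looks like for HTTP in ipvsadm terms. The VIP and real-server addresses are placeholders, and in practice ldirectord maintains these entries rather than a hand-run script:

```sh
# Virtual HTTP service on the public VIP, weighted least-connection scheduling
ipvsadm -A -t 203.0.113.10:80 -s wlc
# Real servers (the cPanel machines); -m = masquerading, i.e. LVS-NAT
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.11:80 -m
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.12:80 -m
```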
Ldirectord checks services on the cPanel machines and, according to the setup, will drop port forwards to any dead services and route them through to the live machine.
Should both cPanel machines be down, ldirectord modifies the IPVS table so that requests are handled by the LVS machine itself (e.g. an HTTP page with a system maintenance notice, and a local mail server which holds all mail in a spool ready to deliver when the servers come back up). Likewise for DNS - it runs off the same zone configs as the cPanel servers via an NFS share. A sketch of this fallback config is below.
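Something along these lines - a sketch only, with the VIP, real-server IPs and check page made up for the example:

```sh
# Hypothetical fragment of /etc/ha.d/ldirectord.cf showing the fallback:
# if every real server fails its health check, traffic goes to 127.0.0.1
# on the LVS box itself (the maintenance page / mail-spool services).
cat > /etc/ha.d/ldirectord.cf <<'EOF'
checktimeout=10
checkinterval=5
quiescent=no
virtual=203.0.113.10:80
        real=10.0.0.11:80 masq
        real=10.0.0.12:80 masq
        fallback=127.0.0.1:80 masq
        service=http
        request="/.lvs-check.html"
        receive="OK"
        scheduler=wlc
        protocol=tcp
EOF
```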
Backend - 2 x fileservers. Again, a similar setup to the above: one primary server which rsyncs data to a 2nd server. If the first server goes down, data loss will be less than 10 minutes. These servers also run the MySQL cluster management services (4.1.9). Relatively dumb machines - all they do is serve files to the cPanel machines.
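The replication itself is nothing fancy - conceptually just a frequent rsync from the primary to the standby (hostname and paths here are made up):

```sh
# Run from cron on the primary fileserver, e.g. every 10 minutes:
#   */10 * * * * root rsync -a --delete /nfs/ backup-fs:/nfs/
# -a preserves permissions/ownership/timestamps; --delete keeps the
# standby an exact mirror. The interval bounds data loss to ~10 min.
rsync -a --delete /nfs/ backup-fs:/nfs/
```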
cPanel machines - this is where things got tricky.
Started with a base install of Red Hat Enterprise Linux 3.0 with the current patchset and a 2.4.20-27 kernel.
Mounts are as follows:
/home to nfs:/nfs/home
/mounts/var to nfs:/nfs/var
/mounts/etc to nfs:/nfs/etc
/mounts/usr to nfs:/nfs/usr
etc.
All are hard mounts with intr (this enables apps to stall and then start running again if the NFS servers go down).
This shared-mount scheme is what enabled the use of symlinks (by God, if I never see or have to use another symlink, I'll be all too happy). A sketch of the mounts and a typical symlink is below.
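For the curious, the mounts boil down to fstab entries like these. The fileserver hostname "nfs" matches the notation above; the named directory in the symlink example is hypothetical:

```sh
# /etc/fstab on each cPanel node - hard,intr as described above:
#   nfs:/nfs/home  /home        nfs  hard,intr  0 0
#   nfs:/nfs/var   /mounts/var  nfs  hard,intr  0 0
#   nfs:/nfs/etc   /mounts/etc  nfs  hard,intr  0 0
#   nfs:/nfs/usr   /mounts/usr  nfs  hard,intr  0 0
mount -a -t nfs
# Typical symlink: relocate a directory both nodes must share onto NFS
# (hypothetical example - repeat for every shared path):
mv /var/named /mounts/var/named && ln -s /mounts/var/named /var/named
```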
I then proceeded to install cPanel 10r6 on both machines.
After this was done I started the hard bit: modifying and symlinking.
There were very few changes made to the cPanel program itself (zero, if I recall correctly).
The only major changes were converting the system to work with MySQL 4.1.9 (which has a new password-hashing method that caused all sorts of dramas for me until I figured out the legacy tables).
The main problem here was the eximstats app. A sketch of the legacy-password workaround is below.
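For anyone hitting the same wall: MySQL 4.1 switched to 41-character password hashes, and older client libraries (like the ones apps such as eximstats link against) can't authenticate against them. What it boils down to is the legacy hashing mode - the account name and password below are made up for the example:

```sh
# In /etc/my.cnf on the MySQL nodes, force old-style 16-char hashes:
#   [mysqld]
#   old_passwords=1
# Then re-hash any account already created with a new-style hash:
mysql -u root -p -e "SET PASSWORD FOR 'someuser'@'localhost' = OLD_PASSWORD('secret'); FLUSH PRIVILEGES;"
```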
I set it up so both machines use the same RSA keys, so that HTTPS/SSL certs work correctly.
The cPanel machines have lo:0-lo:10 aliased with the /28 subnet, and all data returns via the LVS machines; this setup is known as LVS-NAT.
All machines have 10.0.x.x addresses, with the exception of the 2 LVS machines, which have aliased interfaces for the /28 and multihomed addresses. A sketch of the interface setup on a cPanel node is below.
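Roughly like this, with hypothetical addresses - the key points are the /32 loopback aliases for the public IPs and the default route pointing back at the director so replies get NATed:

```sh
# Alias the public /28 onto lo:0-lo:10 (/32 netmask so nothing is ARPed
# onto the wire; addresses are placeholders from 203.0.113.16/28):
for i in $(seq 0 10); do
    ifconfig lo:$i 203.0.113.$((17 + i)) netmask 255.255.255.255 up
done
# Default route via the active LVS director (10.0.0.1 assumed) - this is
# what makes the return traffic flow back through the NAT:
route add default gw 10.0.0.1
```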
That's about all I can think of for now. If anyone has specific questions I'll see what I can do about answering them. I'm considering doing up a step-by-step document for this process; I will be charging for this though, due to the amount of time and effort spent on this project.
This setup is now live and hosting clients - no problems so far...
Feel free to add comments / flames / disbelief etc.
Things to do
** Set up monitoring software which emails admins about machine status.
** Work out why quotas are not working properly (they are working, except a zero value is being returned for used space on clients' accounts - not a real big concern with the space that we have).
** Finish off the SNMP monitoring programs.
** Complete the app to handle failback FTP/POP3/etc. requests.