SloanPeterson

Member
Mar 21, 2005
19
0
151
upcp is crashing my server

Once it hits 80% done, the load jumps to 900+ and the server eventually crashes.

Why does upcp do that?

It happens when it is doing the:
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
part

Why does upcp crash the server?
 

SloanPeterson

Member
Mar 21, 2005
19
0
151
It happens when cpanelsync is running. Clearly there must be something wrong with that script. Has anyone else had this problem?

Killing it once cpanelsync starts running, then rerunning
/scripts/updatenow
/scripts/upcp
seems to work OK, but what are we supposed to do when it runs via cron at 2 every morning?
 

SloanPeterson

Member
Mar 21, 2005
19
0
151
I was able to save it
........
........
........
........
........
........
........
........
........
........
........
........
........
........
^X^X^X^X^X^Y^Y^Y^Y^Y^X^X^X^X^X



^X^Y^Y

[3]+ Stopped /scripts/upcp --force
 

SloanPeterson

Member
Mar 21, 2005
19
0
151
Dammit, it happened again on another server:
14:10:03 up 10 days, 8:57, 2 users, load average: 722.25, 156.55, 51.46
 

neutro

Well-Known Member
Apr 11, 2004
70
1
158
Same here the other day. I ran upcp on 5 servers and two went down, with very high load like above. Both are Red Hat 9, exactly as described above.

Upcp got stuck at

...... securitycheck......

for me
 

mattwilks

Registered
Jan 8, 2004
2
0
151
I have had this problem on 12 out of 12 servers. I left them unattended while upcp was running, and I had to reboot them all.
 

thedavid

Well-Known Member
Nov 22, 2002
124
0
166
I've had this happen 2 times now - the first time I simply thought that it was a buggy upgrade, the second time I knew that there was something wrong with the sync process itself.

Here's what I see: the load jumps to 500+ when it hits '80%', at which point the entire server is inaccessible. dcpumon logs aren't generated at that point because the entire server is locked, and all other daemons die. Sometimes you can pull out of this by hitting Ctrl-C many times and waiting about 10 minutes. Other times it's too far gone and needs a reboot.

It's very frustrating...
 

porcupine

Well-Known Member
PartnerNOC
Apr 18, 2002
74
0
306
Toronto, Ontario
cPanel Access Level
DataCenter Provider
So far I've only seen this on CentOS 3.4 servers in our facility (and figured it was related to CentOS 3.4 specifically).

I've seen this on several different hardware configurations, and all of our OSes are installed by hand. Some failures came directly after a clean/fresh OS and cPanel install, some occurred after content was moved (or accounts restored from a remote location) to the servers.

We have a simple monitoring script that we run every 3 minutes to check if the load is over 10.0 and, if so, record the output of the following:

ps auxww
top -c (1 page)
lynx --dump localhost/whm-server-status
mysqladmin processlist
netstat -n
/usr/sbin/exiwhat
vmstat 1 -n10 (I believe)

This hasn't caught anything to date. I have left some servers down until I could get to the console, though, and they displayed kernel panics originating from /scripts/upcp. I tested the systems thoroughly, ran memtest86 for a day or two, etc., without any results.
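
For anyone wanting to set up something similar, here is a rough sketch of a load-triggered snapshot along the lines described above. It is not the actual script: the function names, the log directory, and reading the load from /proc/loadavg are all assumptions made for illustration, `top -b -n 1 -c` stands in for "top -c (1 page)", and `vmstat -n 1 10` is one reading of the half-remembered vmstat invocation.

```shell
#!/bin/sh
# Sketch of a load-triggered snapshot, meant to be run from cron every 3 minutes.
# THRESHOLD matches the 10.0 mentioned above; LOGDIR is a hypothetical location.
THRESHOLD=10.0
LOGDIR=/var/log/loadwatch

# Succeeds (exit 0) when load $1 is strictly greater than threshold $2.
# awk is used because plain sh arithmetic cannot compare decimals.
over_threshold() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Append the output of each diagnostic command to one timestamped file.
snapshot() {
    out="$LOGDIR/$(date +%Y%m%d-%H%M%S).log"
    mkdir -p "$LOGDIR"
    for cmd in \
        "ps auxww" \
        "top -b -n 1 -c" \
        "lynx --dump localhost/whm-server-status" \
        "mysqladmin processlist" \
        "netstat -n" \
        "/usr/sbin/exiwhat" \
        "vmstat -n 1 10"
    do
        echo "=== $cmd ===" >> "$out"
        # A missing or failing command should not stop the remaining ones.
        $cmd >> "$out" 2>&1 || true
    done
}

# Usage (assumes Linux, where /proc/loadavg starts with the 1-minute load):
#   load=$(cut -d' ' -f1 /proc/loadavg)
#   over_threshold "$load" "$THRESHOLD" && snapshot
```

One caveat with this approach: if the box locks up as quickly as reported in this thread, a check that only fires every 3 minutes may never get scheduled in time, which would be consistent with it not catching anything.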
 

i3903

Well-Known Member
Apr 27, 2003
62
0
156
We ran the update on 3 Xeons several days ago, and all 3 went down at the very point you mentioned above.
 

SloanPeterson

Member
Mar 21, 2005
19
0
151
How does this know if the load is 10 or above?
ps auxww
top -c (1 page)
lynx --dump localhost/whm-server-status
mysqladmin processlist
netstat -n
/usr/sbin/exiwhat
vmstat 1 -n10 (I believe)
 

porcupine

Well-Known Member
PartnerNOC
Apr 18, 2002
74
0
306
Toronto, Ontario
cPanel Access Level
DataCenter Provider
SloanPeterson said:
How does this know if the load is 10 or above?
ps auxww
top -c (1 page)
lynx --dump localhost/whm-server-status
mysqladmin processlist
netstat -n
/usr/sbin/exiwhat
vmstat 1 -n10 (I believe)
There's more to it than that; those are simply the commands it logs the output from. It uses the "uptime" command and parses the output to determine the current load average.
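
As a minimal sketch of that parsing step (not the actual script, which isn't shown here), something like the following pulls the 1-minute figure out of uptime-style output. The function name `load_1min` is made up for illustration, and it assumes the usual Linux format with "load average: " followed by comma-separated numbers, as in the uptime line posted earlier in this thread.

```shell
#!/bin/sh
# Print the 1-minute load average from uptime-style text read on stdin.
# Assumes the text contains "load average: N.NN, N.NN, N.NN".
load_1min() {
    sed -n 's/.*load average: *\([0-9.]*\).*/\1/p'
}

# Usage:
#   uptime | load_1min
```

The result is a decimal string, so comparing it against a threshold like 10 needs something that handles decimals (awk or bc, for example) rather than the shell's integer test.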

Notably, has anyone submitted a ticket on this? I've got two dual Xeons sitting idle right now that *need* to be in production replacing older servers. I'm probably going to attempt Fedora Core 3 tonight to see if it suffers the same fate as the CentOS 3.4 box did (Fedora Core 2 has drivers for the GbE NICs in the box, but they don't appear to work, and I can't be bothered troubleshooting something that may not work anyhow).
 

SloanPeterson

Member
Mar 21, 2005
19
0
151
porcupine said:
So far I've only seen this on CentOS 3.4 servers in our facility (and figured it was related to CentOS 3.4 specifically).
I have had it happen on CentOS, Fedora, and Red Hat. This problem is not specific to one OS.