abdelhost77

Well-Known Member
Apr 25, 2012
116
2
68
Morocco
cPanel Access Level
Root Administrator
Hello,

Since yesterday the CPU load on my server has been increasing and I cannot find out why. Here is the output of some commands:


top - 11:18:17 up 13:03, 1 user, load average: 16.92, 19.96, 19.63
Tasks: 253 total, 2 running, 248 sleeping, 0 stopped, 3 zombie
Cpu(s): 4.7%us, 1.3%sy, 0.0%ni, 42.5%id, 51.2%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 8023892k total, 7535632k used, 488260k free, 1523304k buffers
Swap: 10239992k total, 0k used, 10239992k free, 4117108k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22453 mysql 20 0 2386m 348m 5364 S 1.0 4.4 25:11.13 mysqld
6864 nobody 20 0 209m 34m 2368 S 0.3 0.4 0:00.05 httpd
6921 nobody 20 0 209m 34m 2208 S 0.3 0.4 0:00.04 httpd
6942 nobody 20 0 209m 34m 2292 S 0.3 0.4 0:00.02 httpd
7027 nobody 20 0 209m 34m 2348 S 0.3 0.4 0:00.02 httpd
7116 nobody 20 0 209m 34m 2332 S 0.3 0.4 0:00.03 httpd
7513 xxxx 20 0 111m 10m 6220 S 0.3 0.1 0:00.01 php
22578 root 20 0 14768 696 480 S 0.3 0.0 0:02.92 dovecot
23248 root 20 0 135m 35m 3464 S 0.3 0.5 0:29.52 httpd
1 root 20 0 19352 1528 1212 S 0.0 0.0 0:00.86 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:01.60 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:06.92 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.19 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 0:00.59 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 0:03.55 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:00.06 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:00.70 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
13 root 20 0 0 0 0 S 0.0 0.0 0:04.55 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 0:00.10 watchdog/2




IOTOP


Total DISK READ: 381.54 K/s | Total DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
6195 be/4 nobody 253.06 K/s 0.00 B/s 0.00 % 2.15 % httpd -k start -DSSL
6403 be/4 nobody 3.89 K/s 0.00 B/s 0.00 % 1.03 % httpd -k start -DSSL
6256 be/4 nobody 124.59 K/s 0.00 B/s 0.00 % 0.00 % httpd -k start -DSSL
6270 be/4 nobody 0.00 B/s 7.79 K/s 0.00 % 0.00 % httpd -k start -DSSL
6169 be/4 nobody 0.00 B/s 7.79 K/s 0.00 % 0.00 % httpd -k start -DSSL
6145 be/4 nobody 0.00 B/s 7.79 K/s 0.00 % 0.00 % httpd -k start -DSSL
382 be/4 biocuir 0.00 B/s 15.57 K/s 0.00 % 0.00 % pure-ftpd (UPLOAD)
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
3 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
5 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
8 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/1]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/1]



df -h

Filesystem Size Used Avail Use% Mounted on
/dev/sda2 39G 8.7G 28G 24% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 985M 51M 884M 6% /boot
/dev/sda7 386G 69G 298G 19% /home
/dev/sda6 4.9G 271M 4.3G 6% /tmp
/dev/sda3 20G 7.5G 11G 41% /usr





free -m


total used free shared buffers cached
Mem: 7835 7350 485 0 1486 4010
-/+ buffers/cache: 1852 5983
Swap: 9999 0 9999




I also found this in /var/log/messages:


Jul 21 11:20:29 hiver kernel: ata1: EH complete
Jul 21 11:20:30 hiver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 21 11:20:30 hiver kernel: ata1.00: BMDMA stat 0x24
Jul 21 11:20:30 hiver kernel: ata1.00: failed command: READ DMA
Jul 21 11:20:30 hiver kernel: ata1.00: cmd c8/00:08:10:7d:35/00:00:00:00:00/e5 tag 0 dma 4096 in
Jul 21 11:20:30 hiver kernel: res 51/40:07:11:7d:35/00:00:00:00:00/e5 Emask 0x9 (media error)
Jul 21 11:20:30 hiver kernel: ata1.00: status: { DRDY ERR }
Jul 21 11:20:30 hiver kernel: ata1.00: error: { UNC }
Jul 21 11:20:30 hiver kernel: ata1.00: configured for UDMA/133
Jul 21 11:20:30 hiver kernel: sd 0:0:0:0: [sda] Unhandled sense code
Jul 21 11:20:30 hiver kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 21 11:20:30 hiver kernel: sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Jul 21 11:20:30 hiver kernel: Descriptor sense data with sense descriptors (in hex):
Jul 21 11:20:30 hiver kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jul 21 11:20:30 hiver kernel: 05 35 7d 11
Jul 21 11:20:30 hiver kernel: sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Jul 21 11:20:30 hiver kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 05 35 7d 10 00 00 08 00





Please help me!
 

abdelhost77

51.2%wa


That is very high I/O wait. Is this a VPS or a dedicated server?
Did you have a backup running at that time, or some slow MySQL queries?


It is a dedicated server,
and no, there are no slow MySQL queries.



The I/O wait is not always high; see the top output below:

top - 15:02:44 up 16:47, 1 user, load average: 27.13, 28.10, 25.36
Tasks: 296 total, 3 running, 291 sleeping, 0 stopped, 2 zombie
Cpu(s): 17.1%us, 4.1%sy, 0.0%ni, 75.7%id, 2.1%wa, 0.0%hi, 1.1%si, 0.0%st
Mem: 8023892k total, 7565348k used, 458544k free, 794264k buffers
Swap: 10239992k total, 172k used, 10239820k free, 5130108k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22453 mysql 20 0 2450m 379m 5384 S 17.2 4.8 30:38.50 mysqld
31301 xxxx 20 0 136m 14m 6532 R 1.0 0.2 0:00.03 php-cgi
22589 dovecot 20 0 37612 2628 1804 S 0.7 0.0 0:02.43 imap-login
30339 nobody 20 0 209m 34m 2356 S 0.7 0.4 0:00.07 httpd
31265 xxxx 20 0 98060 15m 2416 S 0.7 0.2 0:00.02 cpsrvd-ssl
31273 xxxx 20 0 97988 15m 2416 S 0.7 0.2 0:00.02 cpsrvd-ssl
31275 xxxx 20 0 97980 15m 2416 S 0.7 0.2 0:00.02 cpsrvd-ssl
31279 xxxx 20 0 98136 15m 2416 S 0.7 0.2 0:00.02 cpsrvd-ssl
31281 xxxx 20 0 98136 15m 2416 S 0.7 0.2 0:00.02 cpsrvd-ssl
31285 xxxx 20 0 98348 15m 2408 S 0.7 0.2 0:00.02 cpsrvd-ssl
31299 xxxx 20 0 0 0 0 Z 0.7 0.0 0:00.02 php <defunct>
1336 root 20 0 0 0 0 R 0.3 0.0 0:59.18 kondemand/0
13543 xxxx 20 0 135m 1600 708 S 0.3 0.0 0:01.59 pure-ftpd
23539 root 20 0 97500 13m 1796 S 0.3 0.2 0:08.15 cpsrvd-ssl
25875 nobody 20 0 209m 35m 3012 S 0.3 0.5 0:00.26 httpd
28264 nobody 20 0 209m 34m 2356 S 0.3 0.4 0:00.08 httpd
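
As a side note, the `wa` (I/O wait) figure can be pulled out of saved top headers so that several samples can be compared over time. This is only a minimal sketch: `iowait_from_top` is a hypothetical helper name, and it assumes the older `Cpu(s):` header format shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the I/O-wait percentage from each "Cpu(s):"
# header line read on stdin (format as printed by the top in this thread).
iowait_from_top() {
    awk '/^Cpu\(s\):/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%wa,?$/) {
                v = $i
                sub(/%wa,?$/, "", v)   # strip the "%wa," suffix
                print v
            }
        }
    }'
}

# Example with the two header lines posted in this thread.
printf '%s\n' \
    'Cpu(s): 4.7%us, 1.3%sy, 0.0%ni, 42.5%id, 51.2%wa, 0.0%hi, 0.3%si, 0.0%st' \
    'Cpu(s): 17.1%us, 4.1%sy, 0.0%ni, 75.7%id, 2.1%wa, 0.0%hi, 1.1%si, 0.0%st' \
    | iowait_from_top
```

On a live system the same helper could probably be fed from `top -b -n 1`.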



I think it may be a hard disk issue.
Is it safe to reboot with the fsck option? => shutdown -Fr now
 

abdelhost77

Yes.

You can also run some hard disk tests with smartctl.



/usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sda


SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 001 001 051 Pre-fail Always FAILING_NOW 36753

ATA Error Count: 12694 (device log contains only the most recent five errors)
Error 12694 occurred at disk power-on lifetime: 16743 hours (697 days + 15 hours)
Error 12693 occurred at disk power-on lifetime: 16743 hours (697 days + 15 hours)
Error 12692 occurred at disk power-on lifetime: 16743 hours (697 days + 15 hours)
Error 12691 occurred at disk power-on lifetime: 16743 hours (697 days + 15 hours)
Error 12690 occurred at disk power-on lifetime: 16743 hours (697 days + 15 hours)

Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 16731 87391503
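
For reading the attribute table above: smartctl marks an attribute as failed when its normalized VALUE drops to or below THRESH. Here is a minimal sketch that lists such rows from saved smartctl output. The helper name `failing_smart_attrs` is made up for illustration, and the column positions are assumed to match the table above.

```shell
#!/bin/sh
# Hypothetical helper: read a smartctl attribute table on stdin and print
# every attribute whose WHEN_FAILED column (field 9) says FAILING_NOW.
failing_smart_attrs() {
    awk '$9 == "FAILING_NOW" {
        # $2 = attribute name, $4 = normalized value, $6 = threshold
        printf "%s value=%d thresh=%d\n", $2, $4, $6
    }'
}

# Example using the header and the row posted above.
failing_smart_attrs <<'EOF'
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 001 001 051 Pre-fail Always FAILING_NOW 36753
EOF
```

With the row above, this shows a normalized value of 1 against a threshold of 51.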



What do you think?
 

abdelhost77

Make a backup ASAP.

That hard disk will fail soon, and you will lose all the data on it.
Fortunately I have a backup from two days ago (D-2).

The DC will not replace the hard disk before tomorrow, and the server seems to work fine judging by how fast the websites open (not very slow), even though the load average is above 20! (Yes, I agree it is strange :) )

I'm thinking of rebooting with fsck (shutdown -Fr now); maybe there are just some corrupted filesystems that need to be repaired. But I read somewhere that fsck may get stuck, and then KVM or console access is needed to unblock it.

I do not have console access, the DC is not responsive today (Sunday), and I do not know which option is best. My goal is minimum downtime, as we have 500+ live cPanel accounts on that server.
 

ilaurens

Active Member
Jul 13, 2013
28
0
1
cPanel Access Level
Root Administrator
The DC will not replace the hard disk before tomorrow, and the server seems to work fine judging by how fast the websites open (not very slow), even though the load average is above 20! (Yes, I agree it is strange.)
Maybe, but the disk is having trouble reading sectors, and that is the reason for the high load. It has to retry several times before it gets the data.
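
The kernel log and the SMART self-test posted earlier appear to point at the same area of the disk. Below is a small sketch that converts the hex bytes from the log into a decimal sector number. `hex_bytes_to_lba` is a hypothetical helper, and it assumes that the last four bytes of the sense data (05 35 7d 11) and bytes 2 to 5 of the Read(10) CDB (05 35 7d 10) are the big-endian LBA.

```shell
#!/bin/sh
# Hypothetical helper: convert space-separated big-endian hex bytes
# into a decimal LBA (sector number).
hex_bytes_to_lba() {
    # Join the arguments and drop the spaces, e.g. "05 35 7d 11" -> 05357d11
    hex=$(printf '%s' "$*" | tr -d ' ')
    printf '%d\n' "0x$hex"
}

hex_bytes_to_lba 05 35 7d 11   # sector reported in the sense data
hex_bytes_to_lba 05 35 7d 10   # first sector of the failed Read(10)
```

This gives 87391505 and 87391504, right next to the 87391503 that smartctl reported as LBA_of_first_error, so the retries seem to be hitting one small bad region.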
 

abdelhost77

Maybe, but the disk is having trouble reading sectors, and that is the reason for the high load. It has to retry several times before it gets the data.

The CPU load has gone back to normal. I think maybe it was only a corrupted filesystem that got corrected? I do not know; I did not perform any action except backing up the hard disk to an external server.

Please see the top output below. Everything seems to work fine now, so I'm wondering whether it is still a good idea to replace the hard disk.


top - 23:55:17 up 1 day, 1:40, 2 users, load average: 0.24, 0.41, 0.43
Tasks: 240 total, 3 running, 235 sleeping, 0 stopped, 2 zombie
Cpu(s): 11.1%us, 2.8%sy, 0.0%ni, 82.2%id, 2.7%wa, 0.0%hi, 1.2%si, 0.0%st
Mem: 8023892k total, 7469112k used, 554780k free, 1318128k buffers
Swap: 10239992k total, 4004k used, 10235988k free, 3808156k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30002 user 20 0 0 0 0 Z 13.6 0.0 0:00.41 php <defunct>
22453 mysql 20 0 2578m 501m 4988 S 6.0 6.4 49:24.66 mysqld
30017 user 20 0 111m 10m 6152 R 1.0 0.1 0:00.03 php
28946 nobody 20 0 0 0 0 Z 0.7 0.0 0:00.16 httpd <defunct>
29562 nobody 20 0 210m 35m 2336 S 0.7 0.5 0:00.07 httpd
17 root 20 0 0 0 0 S 0.3 0.0 0:13.46 ksoftirqd/3
19 root 20 0 0 0 0 S 0.3 0.0 0:33.40 events/0
34 root 20 0 0 0 0 S 0.3 0.0 0:59.12 kblockd/0
14318 root 20 0 99920 4336 3328 S 0.3 0.1 0:02.08 sshd
14583 root 20 0 15164 1456 952 R 0.3 0.0 0:29.02 top
23248 root 20 0 135m 36m 3464 S 0.3 0.5 1:08.38 httpd
25225 user 20 0 135m 1528 684 S 0.3 0.0 0:00.72 pure-ftpd
27066 nobody 20 0 210m 35m 2356 S 0.3 0.5 0:00.07 httpd
28281 nobody 20 0 210m 35m 2392 S 0.3 0.5 0:00.23 httpd
28969 nobody 20 0 210m 35m 2364 S 0.3 0.5 0:00.10 httpd
28972 nobody 20 0 210m 35m 2376 S 0.3 0.5 0:00.12 httpd
29552 nobody 20 0 210m 35m 2300 S 0.3 0.5 0:00.07 httpd
29688 nobody 20 0 210m 35m 2340 S 0.3 0.5 0:00.03 httpd
29712 nobody 20 0 210m 35m 2296 S 0.3 0.4 0:00.03 httpd
29758 nobody 20 0 210m 35m 2352 S 0.3 0.5 0:00.03 httpd
29772 nobody 20 0 210m 35m 2348 S 0.3 0.5 0:00.05 httpd
29785 nobody 20 0 210m 35m 2280 S 0.3 0.4 0:00.02 httpd
29791 nobody 20 0 210m 35m 2292 S 0.3 0.4 0:00.02 httpd
1 root 20 0 19352 1136 908 S 0.0 0.0 0:01.08 init
 

abdelhost77

Just to be sure,

we also rebooted with the fsck option. Here are the boot log lines related to the filesystems:

...........
Checking filesystems
/dev/sda2: clean, 175011/2564096 files, 2430862/10240000 blocks
/dev/sda1: clean, 46/64000 files, 16938/256000 blocks
/dev/sda7: clean, 2871827/25665536 files, 19465422/102639616 blocks
/dev/sda6: clean, 30943/320000 files, 88745/1280000 blocks
/dev/sda3: clean, 156109/1281120 files, 1998422/5120000 blocks
[ OK ]
Remounting root filesystem in read-write mode: [ OK ]
Mounting local filesystems: [ OK ]
Enabling local filesystem quotas: [ OK ]
Enabling /etc/fstab swaps: [ OK ]
Entering non-interactive startup
Calling the system activity data collector (sadc):
Starting securetmp: *** Notice *** No loop module detected
If the loopback block device is built as a module, try running `modprobe loop` as root via ssh and running this script again.
If the loopback block device is built into the kernel itself, you can ignore this message.
Securing /tmp & /var/tmp
Securing /tmp... Done
Setting up /var/tmp... Done
Checking fstab for entries ...Done
Logrotate TMPDIR already configured
Process Complete
 

abdelhost77

In case anyone else encounters the same problem:
I think it may just be related to some RPM update from cPanel (a cPanel expert can confirm this), because when checking the crontab I found that the time when the disk errors disappeared and the load went down is almost the same as the execution time of:

/usr/local/cpanel/scripts/upcp

So it seems that the cPanel update somehow fixed the issue.

If someone else encounters the same problem, it may be a good idea to run
/usr/local/cpanel/scripts/upcp

What is your opinion?
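
One way to check the timing instead of guessing is to count the disk errors per hour in /var/log/messages and compare that with the cron time of upcp. This is a minimal sketch, assuming the syslog line format shown earlier in this thread; `media_errors_per_hour` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count kernel "media error" lines per hour.
# Reads the named log files, or stdin when no file is given.
media_errors_per_hour() {
    awk '/media error/ {
        split($3, t, ":")                    # $3 is HH:MM:SS
        count[$1 " " $2 " " t[1] ":00"]++    # key looks like: Jul 21 11:00
    }
    END { for (k in count) print k, count[k] }' "$@" | sort
}

# Example with two lines copied from the log posted earlier.
media_errors_per_hour <<'EOF'
Jul 21 11:20:30 hiver kernel: res 51/40:07:11:7d:35/00:00:00:00:00/e5 Emask 0x9 (media error)
Jul 21 11:20:30 hiver kernel: ata1.00: status: { DRDY ERR }
EOF
```

On the server it would be called as `media_errors_per_hour /var/log/messages`. An hour with no errors only shows that nothing was logged then; it does not by itself show that the disk is healthy.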
 

ilaurens

I second this question, because the earlier SMART error meant that the hard disk will break soon. The "less than 24 hours" indication does not literally mean it will break within one day; it is just an indicator that it will very likely break soon. That is why we said to make a backup of the data, to avoid any data loss.

Whether you follow the advice is up to you; it is just a suggestion. If the data is not important, then do not make a backup.