  1. #1
    Member dev.null
    Join Date
    May 2003
    Posts
    71

    kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]

    Wow. I just hit my first major problem with this box, which has been running fine for over a year.

    CentOS 5.2, 64bit.

    My box was completely locked up; the only thing I could do was hit reset to get it restarted. I looked in the logs and found this:

    Code:
    Sep  1 05:30:55 vhost3 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]
    Sep  1 05:30:55 vhost3 kernel: CPU 2:
    Sep  1 05:30:55 vhost3 kernel: Modules linked in: nfs lockd fscache nfs_acl sunrpc iptable_nat ip_nat deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc crypto_blkcipher md5 sha256 sh
    Sep  1 05:30:55 vhost3 kernel: libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
    Sep  1 05:30:55 vhost3 kernel: Pid: 5725, comm: exp2 Tainted: G      2.6.18-92.el5 #1
    Sep  1 05:30:55 vhost3 kernel: RIP: 0010:[<0000000000000001>]  [<0000000000000001>]
    Sep  1 05:30:55 vhost3 kernel: RSP: 0018:ffff81006bcabb20  EFLAGS: 00000246
    Sep  1 05:30:55 vhost3 kernel: RAX: 0000000000000000 RBX: ffff810063b3fcc0 RCX: 0000000000000001
    Sep  1 05:30:55 vhost3 kernel: RDX: 00000000000004d0 RSI: ffffffff884687d0 RDI: ffff8100338ade80
    Sep  1 05:30:55 vhost3 kernel: RBP: ffffffff80231b65 R08: 00000000d1b48344 R09: ffffffff80231b65
    Sep  1 05:30:55 vhost3 kernel: R10: 0000000080000000 R11: 00000000000003f8 R12: ffffffff804c9590
    Sep  1 05:30:55 vhost3 kernel: R13: ffff81006bcabb30 R14: 0000000000000003 R15: 00000000000004d0
    Sep  1 05:30:55 vhost3 kernel: FS:  0000000045975940(0000) GS:ffff81011bc3ae40(0063) knlGS:00000000f7eed6c0
    Sep  1 05:30:55 vhost3 kernel: CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    Sep  1 05:30:55 vhost3 kernel: CR2: 0000000000402ba0 CR3: 000000008a8a4000 CR4: 00000000000006e0
    Sep  1 05:30:55 vhost3 kernel: 
    Sep  1 05:30:55 vhost3 kernel: Call Trace:
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80232128>] ip_push_pending_frames+0x383/0x45e
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80242226>] udp_push_pending_frames+0x236/0x25b
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8005203c>] udp_sendmsg+0x4d3/0x5ce
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8011f4af>] socket_has_perm+0x5b/0x68
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80054924>] sock_sendmsg+0xf3/0x110
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff800b9da9>] delayacct_end+0x5d/0x86
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8000769e>] find_get_page+0x21/0x50
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80013325>] filemap_nopage+0x188/0x322
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80008b39>] __handle_mm_fault+0x4e9/0xe23
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8020d4de>] sys_sendto+0x11c/0x14f
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80066852>] do_page_fault+0x4fe/0x830
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80041922>] d_rehash+0x21/0x34
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8020d10f>] sock_attach_fd+0x8f/0xfd
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80005d36>] level2_kernel_pgt+0xd36/0x1000
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff800b3fd8>] audit_syscall_entry+0x16e/0x1a1
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80221cb1>] compat_sys_socketcall+0xf1/0x172
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80061618>] cstar_do_call+0x1b/0x65
    Sep  1 05:30:55 vhost3 kernel:
    This happened 7 more times within a 2-minute period, all with the same process ID and CPU. After that there is nothing in the log until the reboot.

    Then it happened again today, different CPU (and of course different process ID). Call stack looks the same.
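
    To double-check that every report really names the same CPU and PID, the soft lockup lines can be tallied with a few lines of Python 3. This is only a rough sketch: the function name count_lockups is made up for illustration, and the regex assumes nothing beyond the message format shown in the log above.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep  1 05:30:55 vhost3 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def count_lockups(lines):
    """Return a Counter keyed by (cpu, comm, pid) for every soft lockup line."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            key = (int(match.group("cpu")), match.group("comm"), int(match.group("pid")))
            counts[key] += 1
    return counts
```

    Feeding it the lines of the syslog file (usually /var/log/messages on CentOS, adjust if yours differs), for example count_lockups(open("/var/log/messages", errors="replace")), gives one count per (CPU, command, PID) combination.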

    I tried to run yum update and found that 274 packages need updating, but yum hangs on libpq.so.4 being needed (even though it's there), so it won't update... That's in another thread.

    Any ideas on the CPU block?

    Thanks!
    /dev/null
    Your local neighborhood null device.

  2. #2
    Member
    Join Date
    Aug 2009
    Location
    Houston, Tx
    Posts
    275

    CPU#2 stuck for 10s! [exp2:5725]

    Hello,

    I do see the issue that you are running into, and I am sorry you have had issues. Unfortunately, you would need to submit a ticket with your datacenter to take a look at this machine. I do not believe it has anything to do with cPanel. If you find that it does, or would like to submit a ticket with us, there is a link at the bottom of this post.

    Thank you,
    Matthew Curry

  3. #3
    Member dev.null
    Join Date
    May 2003
    Posts
    71

    Quote Originally Posted by cPanelMattCurry View Post
    Hello,

    I do see the issue that you are running into, and I am sorry you have had issues. Unfortunately, you would need to submit a ticket with your datacenter to take a look at this machine. I do not believe it has anything to do with cPanel. If you find that it does, or would like to submit a ticket with us, there is a link at the bottom of this post.

    Thank you,
    Matthew Curry
    I don't think it's cPanel per se. I'm just checking with you other server admins for advice on what to do.

    I am the datacenter guy... ;-D

    I've been running Linux servers for 10+ years and have never seen this problem before. I'm hoping someone will tell me "it's not your hardware going bad, this is a software/driver problem". That's the big one for me.

    I currently have a script in place that records all the processes and their IDs. Next lockup I'll know what the process is.
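
    A script like that can be quite small. The sketch below is one possible shape, not the actual script mentioned above: it assumes Python 3 is available (the stock Python on CentOS 5 is too old for this syntax), and the names snapshot and append_snapshot are purely illustrative. It reads the command name from /proc/<pid>/stat and syncs the log to disk so that a hard reset does not lose the last snapshot.

```python
import os
import time


def read_process(proc_root, pid):
    """Return (pid, comm) parsed from <proc_root>/<pid>/stat, or None if unreadable."""
    try:
        with open(os.path.join(proc_root, pid, "stat")) as fh:
            stat = fh.read()
    except OSError:
        return None  # the process exited between listdir() and open()
    # The format is "pid (comm) state ..."; comm may itself contain spaces or
    # parentheses, so cut at the first "(" and the last ")".
    start, end = stat.find("("), stat.rfind(")")
    if start == -1 or end == -1:
        return None
    return int(pid), stat[start + 1:end]


def snapshot(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for every numeric entry under proc_root."""
    procs = []
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            info = read_process(proc_root, entry)
            if info is not None:
                procs.append(info)
    return sorted(procs)


def append_snapshot(log_path, proc_root="/proc"):
    """Append one timestamped snapshot to log_path and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as fh:
        for pid, comm in snapshot(proc_root):
            fh.write(f"{stamp} {pid} {comm}\n")
        fh.flush()
        os.fsync(fh.fileno())  # a hard reset would otherwise lose buffered lines
```

    Calling append_snapshot() from cron every minute, with a log path of your choosing, would leave a record of which command owned the PID named in the next lockup message.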

    Thanks!
    /dev/null
    Your local neighborhood null device.

  4. #4
    d_t
    Member
    Join Date
    Sep 2003
    Location
    Bucharest
    Posts
    239

    It may be a problem with the RAID controller or storage system. Check whether there is any error message in the controller's BIOS. (I had a similar problem several months ago and it was a bad Adaptec controller.)

  5. #5
    Member hostmedic
    Join Date
    Apr 2003
    Location
    Ohio
    Posts
    556
    cPanel/Enkompass Access Level

    DataCenter Provider

    what does sar show

    what does sar show?

    # sar

    we want to see the I/O load most specifically

    you might want to recompile kernel as well

  6. #6
    Member dev.null
    Join Date
    May 2003
    Posts
    71

    Quote Originally Posted by d_t View Post
    It may be a problem with the RAID controller or storage system. Check whether there is any error message in the controller's BIOS. (I had a similar problem several months ago and it was a bad Adaptec controller.)
    No RAID, just a couple of SATA drives right off the motherboard.

    when you say "check if is any error message in controller's BIOS", do you mean in a log file or in BIOS on boot-up?

    Thanks!
    /dev/null
    Your local neighborhood null device.

  7. #7
    Member dev.null
    Join Date
    May 2003
    Posts
    71

    Quote Originally Posted by hostmedic View Post
    what does sar show?

    # sar

    we want to see the I/O load most specifically

    you might want to recompile kernel as well
    sar doesn't go back far enough (the last time it died was yesterday, and sar started at midnight).

    Next time it happens I'll be all over it like a cheap suit and let you know.

    Should I set sar to dump to a file via cron? (IOW does sar get reset at boot/shutdown?)

    Thanks!
    /dev/null
    Your local neighborhood null device.

  8. #8
    Member hostmedic
    Join Date
    Apr 2003
    Location
    Ohio
    Posts
    556
    cPanel/Enkompass Access Level

    DataCenter Provider

    not sure - don't think so

    it might - not sure.

    you could get it to log out - just to be safe.
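
    On the question of whether sar data survives a reboot: as far as I know, the sysstat package normally keeps one binary data file per day of the month under /var/log/sa (written by its own cron job), and sar -f replays a given file. Treat that directory and the helper names below as assumptions to verify on your own box; this is only a Python 3 sketch of how a previous day's I/O report could be pulled up.

```python
import datetime
import subprocess

SA_DIR = "/var/log/sa"  # assumed default sysstat data directory; check your install


def sa_file_for(day, sa_dir=SA_DIR):
    """Return the daily sysstat data file for a date, e.g. /var/log/sa/sa01."""
    return f"{sa_dir}/sa{day.day:02d}"


def sar_command(day, report_flag="-b", sa_dir=SA_DIR):
    """Build the sar command line that replays one day's data.

    -f selects the data file; -b asks for the I/O and transfer rate report
    (pass "-u" for CPU utilisation instead).
    """
    return ["sar", report_flag, "-f", sa_file_for(day, sa_dir)]


def show_day(day):
    """Run sar for the given day and return its text output."""
    result = subprocess.run(
        sar_command(day), capture_output=True, text=True, check=True
    )
    return result.stdout
```

    For yesterday's numbers, show_day(datetime.date.today() - datetime.timedelta(days=1)) would do. The daily files are only kept for a limited number of days, so copy any file you care about before it gets overwritten.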

    Is this the only server? It might be good to set things up so that the logs go elsewhere, just in case.

    Honestly, I think it's an issue with the kernel not liking a drive controller, but I have been known to be wrong before.

  9. #9
    d_t
    Member
    Join Date
    Sep 2003
    Location
    Bucharest
    Posts
    239

    Quote Originally Posted by dev.null View Post
    do you mean in a log file or in BIOS on boot-up?
    No, I mean the RAID controller's BIOS (it logs errors), but that doesn't apply to you if you don't have RAID.

  10. #10
    Member kran
    Join Date
    Jul 2003
    Location
    Colombia
    Posts
    75

    I'm having a similar problem

    I've looked at every possible cause I can think of ... I believe it might be the firewall, because it ran for many hours and, after reinstalling the firewall, I started having the same problem. It seems it runs out of swap space and locks up. This is what I get:

    Oct 3 01:20:03 tiburon kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:e0:81:34:cd:1d:00:d0:03:9c:68:0a:08:00 SRC=190.84.247.230 DST=66.197.xxx.xxx LEN=60 TOS=0x00 PREC=0x00 TTL=112 ID=23554 PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=56270
    Oct 3 01:20:04 tiburon kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kswapd0:185]
    Oct 3 01:20:04 tiburon kernel:
    Oct 3 01:20:04 tiburon kernel: Pid: 185, comm: kswapd0
    Oct 3 01:20:04 tiburon kernel: EIP: 0060:[<c049e068>] CPU: 1
    Oct 3 01:20:04 tiburon kernel: EIP is at dqput+0xda/0x15d
    Oct 3 01:20:04 tiburon kernel: EFLAGS: 00000202 Not tainted (2.6.18-164.el5 #1)
    Oct 3 01:20:04 tiburon kernel: EAX: 00000000 EBX: ea75dd80 ECX: f75b6400 EDX: 00000002
    Oct 3 01:20:04 tiburon kernel: ESI: 00000000 EDI: ffffffe2 EBP: f7f6af10 DS: 007b ES: 007b
    Oct 3 01:20:04 tiburon kernel: CR0: 8005003b CR2: 45d0e290 CR3: 0073b000 CR4: 000006d0
    Oct 3 01:20:04 tiburon kernel: [<c049e5d7>] dquot_drop+0x26/0x4c
    Oct 3 01:20:04 tiburon kernel: [<f8890b2e>] ext3_dquot_drop+0x3b/0x5d [ext3]
    Oct 3 01:20:04 tiburon kernel: [<c048aad3>] clear_inode+0x9f/0x104
    Oct 3 01:20:04 tiburon kernel: [<c048ad9a>] dispose_list+0x33/0xb1
    Oct 3 01:20:04 tiburon kernel: [<c048af94>] shrink_icache_memory+0x17c/0x1a4
    Oct 3 01:20:04 tiburon kernel: [<c045f2f2>] shrink_slab+0xd3/0x13c
    Oct 3 01:20:04 tiburon kernel: [<c045f67d>] kswapd+0x2a6/0x3ab
    Oct 3 01:20:04 tiburon kernel: [<c0434907>] autoremove_wake_function+0x0/0x2d
    Oct 3 01:20:04 tiburon kernel: [<c045f3d7>] kswapd+0x0/0x3ab
    Oct 3 01:20:04 tiburon kernel: [<c0434845>] kthread+0xc0/0xeb
    Oct 3 01:20:04 tiburon kernel: [<c0434785>] kthread+0x0/0xeb
    Oct 3 01:20:04 tiburon kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
    Oct 3 01:20:04 tiburon kernel: =======================
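
    To confirm or rule out the swap theory, SwapTotal and SwapFree can be sampled from /proc/meminfo in the run-up to a lockup. The following is a minimal Python 3 sketch under that assumption; the function names are made up for illustration, and the values are in kB as /proc/meminfo reports them.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_percent(info):
    """Return swap usage as a percentage, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total


def read_swap_usage(path="/proc/meminfo"):
    """Read the live meminfo file and return the current swap usage in percent."""
    with open(path) as fh:
        return swap_used_percent(parse_meminfo(fh.read()))
```

    Logging read_swap_usage() once a minute from cron would show whether swap really climbs toward 100% before kswapd0 gets stuck.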
