The Community Forums


kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]

Discussion in 'General Discussion' started by dev.null, Sep 2, 2009.

  1. dev.null

    dev.null Well-Known Member

    Joined:
    May 27, 2003
    Messages:
    75
    Likes Received:
    1
    Trophy Points:
    6
    Wow. I just hit my first major problem with this box, which has been running fine for over a year.

    CentOS 5.2, 64bit.

    My box was completely locked up, and the only thing I could do was hit reset to get it restarted. I looked in the logs and found this:

    Code:
    Sep  1 05:30:55 vhost3 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]
    Sep  1 05:30:55 vhost3 kernel: CPU 2:
    Sep  1 05:30:55 vhost3 kernel: Modules linked in: nfs lockd fscache nfs_acl sunrpc iptable_nat ip_nat deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc crypto_blkcipher md5 sha256 sh
    Sep  1 05:30:55 vhost3 kernel: libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
    Sep  1 05:30:55 vhost3 kernel: Pid: 5725, comm: exp2 Tainted: G      2.6.18-92.el5 #1
    Sep  1 05:30:55 vhost3 kernel: RIP: 0010:[<0000000000000001>]  [<0000000000000001>]
    Sep  1 05:30:55 vhost3 kernel: RSP: 0018:ffff81006bcabb20  EFLAGS: 00000246
    Sep  1 05:30:55 vhost3 kernel: RAX: 0000000000000000 RBX: ffff810063b3fcc0 RCX: 0000000000000001
    Sep  1 05:30:55 vhost3 kernel: RDX: 00000000000004d0 RSI: ffffffff884687d0 RDI: ffff8100338ade80
    Sep  1 05:30:55 vhost3 kernel: RBP: ffffffff80231b65 R08: 00000000d1b48344 R09: ffffffff80231b65
    Sep  1 05:30:55 vhost3 kernel: R10: 0000000080000000 R11: 00000000000003f8 R12: ffffffff804c9590
    Sep  1 05:30:55 vhost3 kernel: R13: ffff81006bcabb30 R14: 0000000000000003 R15: 00000000000004d0
    Sep  1 05:30:55 vhost3 kernel: FS:  0000000045975940(0000) GS:ffff81011bc3ae40(0063) knlGS:00000000f7eed6c0
    Sep  1 05:30:55 vhost3 kernel: CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    Sep  1 05:30:55 vhost3 kernel: CR2: 0000000000402ba0 CR3: 000000008a8a4000 CR4: 00000000000006e0
    Sep  1 05:30:55 vhost3 kernel: 
    Sep  1 05:30:55 vhost3 kernel: Call Trace:
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80232128>] ip_push_pending_frames+0x383/0x45e
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80242226>] udp_push_pending_frames+0x236/0x25b
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8005203c>] udp_sendmsg+0x4d3/0x5ce
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8011f4af>] socket_has_perm+0x5b/0x68
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80054924>] sock_sendmsg+0xf3/0x110
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff800b9da9>] delayacct_end+0x5d/0x86
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8000769e>] find_get_page+0x21/0x50
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80013325>] filemap_nopage+0x188/0x322
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80008b39>] __handle_mm_fault+0x4e9/0xe23
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8020d4de>] sys_sendto+0x11c/0x14f
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80066852>] do_page_fault+0x4fe/0x830
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80041922>] d_rehash+0x21/0x34
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff8020d10f>] sock_attach_fd+0x8f/0xfd
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80005d36>] level2_kernel_pgt+0xd36/0x1000
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff800b3fd8>] audit_syscall_entry+0x16e/0x1a1
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80221cb1>] compat_sys_socketcall+0xf1/0x172
    Sep  1 05:30:55 vhost3 kernel:  [<ffffffff80061618>] cstar_do_call+0x1b/0x65
    Sep  1 05:30:55 vhost3 kernel:
    
    This happened 7 more times within a 2-minute period, all with the same process ID and CPU. After that there is nothing in the log until the reboot.
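
    For anyone who wants to tally reports like these rather than count them by eye, here is a minimal sketch in Python. The function name `count_soft_lockups` and the regular expression are illustrative only, written against the message format shown in the log above; they are not part of any kernel or cPanel tooling.

    ```python
    import re
    from collections import Counter

    # Matches lines such as:
    #   kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]
    LOCKUP_RE = re.compile(
        r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
        r"\[(?P<comm>.+):(?P<pid>\d+)\]"
    )


    def count_soft_lockups(lines):
        """Return a Counter keyed by (cpu, comm, pid), one count per report."""
        counts = Counter()
        for line in lines:
            match = LOCKUP_RE.search(line)
            if match:
                key = (int(match.group("cpu")), match.group("comm"),
                       int(match.group("pid")))
                counts[key] += 1
        return counts
    ```

    Feeding it the lines of the syslog file (assumed here to be /var/log/messages, the usual default on CentOS) gives one entry per CPU/process pair, which makes it easy to see whether repeated lockups really involve the same process.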

    Then it happened again today, different CPU (and of course different process ID). Call stack looks the same.

    I tried to run yum update and found that 274 packages need updating, but yum hangs on libpq.so.4 being needed (even though it's there), so it won't update... That's on another thread.

    Any ideas on the CPU block?

    Thanks!
     
  2. MattCurry

    MattCurry Well-Known Member

    Joined:
    Aug 18, 2009
    Messages:
    275
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Houston, Tx
    CPU#2 stuck for 10s! [exp2:5725]

    Hello,

    I do see the issue that you are running into, and I am sorry you have had issues. Unfortunately, you would need to submit a ticket with your datacenter to have them take a look at this machine. I do not believe it has anything to do with cPanel. If you find that it does, or would like to submit a ticket with us, there is a link at the bottom of this post.

    Thank you,
    Matthew Curry
     
  3. dev.null

    dev.null Well-Known Member

    Joined:
    May 27, 2003
    Messages:
    75
    Likes Received:
    1
    Trophy Points:
    6
    I don't think it's cPanel per se. I'm just checking with you other server admins for advice on what to do.

    I am the datacenter guy... ;-D

    I've been running Linux servers for 10+ years and never saw this problem before. I'm hoping someone will tell me "it's not your hardware going bad, this is a software/driver problem". That's the big one for me.

    I currently have a script in place that records all the processes and their IDs. Next lockup I'll know what the process is.
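
    A script of this kind could look roughly like the sketch below. This is not the script from the post, just one possible way to do it in Python: it walks the numeric directories under /proc and records each PID with its command line. The function names `read_process` and `snapshot_processes` are made up for illustration, and the code assumes a Python 3 interpreter is available on the box.

    ```python
    import os


    def read_process(proc_root, pid):
        """Return a printable command for one PID, or None if it has gone away."""
        base = os.path.join(proc_root, pid)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as fh:
                cmdline = fh.read().replace(b"\0", b" ").strip()
            if cmdline:
                return cmdline.decode("utf-8", "replace")
            # Kernel threads have an empty cmdline; fall back to the name
            # in parentheses in the stat file, e.g. "185 (kswapd0) S ...".
            with open(os.path.join(base, "stat"), "rb") as fh:
                stat = fh.read().decode("utf-8", "replace")
            return "[" + stat[stat.index("(") + 1:stat.rindex(")")] + "]"
        except (OSError, ValueError):
            return None


    def snapshot_processes(proc_root="/proc"):
        """Return a sorted list of (pid, command) for every process found."""
        snapshot = []
        for entry in os.listdir(proc_root):
            if entry.isdigit():
                command = read_process(proc_root, entry)
                if command is not None:
                    snapshot.append((int(entry), command))
        return sorted(snapshot)
    ```

    Run from cron every minute or so, with each (pid, command) pair written out alongside a timestamp, this leaves a record on disk that can be matched against the PID in the next lockup message.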

    Thanks!
     
  4. d_t

    d_t Well-Known Member

    Joined:
    Sep 20, 2003
    Messages:
    243
    Likes Received:
    1
    Trophy Points:
    18
    Location:
    Bucharest
    It may be a problem with the RAID controller or the storage system. Check whether there is any error message in the controller's BIOS. (I had a similar problem several months ago, and it was a bad Adaptec controller.)
     
  5. hostmedic

    hostmedic Well-Known Member

    Joined:
    Apr 30, 2003
    Messages:
    559
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Washington Court House, Ohio, United States
    cPanel Access Level:
    DataCenter Provider
    what does sar show?

    # sar

    We want to see the I/O load most specifically.

    You might want to recompile the kernel as well.
     
  6. dev.null

    dev.null Well-Known Member

    Joined:
    May 27, 2003
    Messages:
    75
    Likes Received:
    1
    Trophy Points:
    6
    No RAID, just a couple of SATA drives right off the mobo.

    When you say to check for error messages in the controller's BIOS, do you mean in a log file or in the BIOS on boot-up?

    Thanks!
     
  7. dev.null

    dev.null Well-Known Member

    Joined:
    May 27, 2003
    Messages:
    75
    Likes Received:
    1
    Trophy Points:
    6
    sar doesn't go back far enough (the last time it died was yesterday, and sar starts at midnight).

    Next time it happens I'll be all over it like a cheap suit and let you know.

    Should I set sar to dump to a file via cron? (IOW does sar get reset at boot/shutdown?)

    Thanks!
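
    On the question of older sar data: with a stock sysstat install (an assumption about this box), samples are collected by a cron job into one file per day of the month, named saDD under /var/log/sa, and plain `sar` only reads today's file. Yesterday's data can then be read with `sar -f`. The small Python sketch below just builds that file name and command for a given date; the helper names `sa_file_for` and `sar_command` are hypothetical.

    ```python
    import posixpath
    from datetime import date


    def sa_file_for(day, sa_dir="/var/log/sa"):
        """Return the daily sysstat data file for `day`, e.g. .../sa01.

        Assumes the default saDD naming; adjust sa_dir if sysstat is
        configured differently.
        """
        return posixpath.join(sa_dir, "sa%02d" % day.day)


    def sar_command(day, extra_flags=()):
        """Build the argument list for reading one day's data with sar -f."""
        return ["sar"] + list(extra_flags) + ["-f", sa_file_for(day)]
    ```

    For the lockup on Sep 1 this yields `sar -f /var/log/sa/sa01`; passing `extra_flags=["-b"]` asks sar for its I/O and transfer-rate report instead of the default CPU view.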
     
  8. hostmedic

    hostmedic Well-Known Member

    Joined:
    Apr 30, 2003
    Messages:
    559
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Washington Court House, Ohio, United States
    cPanel Access Level:
    DataCenter Provider
    Not sure - don't think so -

    it might - not sure.

    You could get it to log out to a file, just to be safe.

    Is this the only server? It might be good to set things up so that the logs go elsewhere, just in case.

    Honestly, I think it's an issue with the kernel not liking a drive controller, but I have been known to be wrong before.
     
  9. d_t

    d_t Well-Known Member

    Joined:
    Sep 20, 2003
    Messages:
    243
    Likes Received:
    1
    Trophy Points:
    18
    Location:
    Bucharest
    No, I mean the RAID controller's BIOS (it logs errors), but that doesn't apply to you if you don't have RAID.
     
  10. kran

    kran Well-Known Member

    Joined:
    Jul 5, 2003
    Messages:
    74
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Colombia
    cPanel Access Level:
    Root Administrator
    I'm having a similar problem

    I've looked at every possible cause I can think of... I believe it might be the firewall, because it ran for many hours; after reinstalling the firewall, I started having the same problem. It seems it runs out of swap space and locks up. This is what I get:

    Oct 3 01:20:03 tiburon kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:e0:81:34:cd:1d:00:d0:03:9c:68:0a:08:00 SRC=190.84.247.230 DST=66.197.xxx.xxx LEN=60 TOS=0x00 PREC=0x00 TTL=112 ID=23554 PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=56270
    Oct 3 01:20:04 tiburon kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kswapd0:185]
    Oct 3 01:20:04 tiburon kernel:
    Oct 3 01:20:04 tiburon kernel: Pid: 185, comm: kswapd0
    Oct 3 01:20:04 tiburon kernel: EIP: 0060:[<c049e068>] CPU: 1
    Oct 3 01:20:04 tiburon kernel: EIP is at dqput+0xda/0x15d
    Oct 3 01:20:04 tiburon kernel: EFLAGS: 00000202 Not tainted (2.6.18-164.el5 #1)
    Oct 3 01:20:04 tiburon kernel: EAX: 00000000 EBX: ea75dd80 ECX: f75b6400 EDX: 00000002
    Oct 3 01:20:04 tiburon kernel: ESI: 00000000 EDI: ffffffe2 EBP: f7f6af10 DS: 007b ES: 007b
    Oct 3 01:20:04 tiburon kernel: CR0: 8005003b CR2: 45d0e290 CR3: 0073b000 CR4: 000006d0
    Oct 3 01:20:04 tiburon kernel: [<c049e5d7>] dquot_drop+0x26/0x4c
    Oct 3 01:20:04 tiburon kernel: [<f8890b2e>] ext3_dquot_drop+0x3b/0x5d [ext3]
    Oct 3 01:20:04 tiburon kernel: [<c048aad3>] clear_inode+0x9f/0x104
    Oct 3 01:20:04 tiburon kernel: [<c048ad9a>] dispose_list+0x33/0xb1
    Oct 3 01:20:04 tiburon kernel: [<c048af94>] shrink_icache_memory+0x17c/0x1a4
    Oct 3 01:20:04 tiburon kernel: [<c045f2f2>] shrink_slab+0xd3/0x13c
    Oct 3 01:20:04 tiburon kernel: [<c045f67d>] kswapd+0x2a6/0x3ab
    Oct 3 01:20:04 tiburon kernel: [<c0434907>] autoremove_wake_function+0x0/0x2d
    Oct 3 01:20:04 tiburon kernel: [<c045f3d7>] kswapd+0x0/0x3ab
    Oct 3 01:20:04 tiburon kernel: [<c0434845>] kthread+0xc0/0xeb
    Oct 3 01:20:04 tiburon kernel: [<c0434785>] kthread+0x0/0xeb
    Oct 3 01:20:04 tiburon kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
    Oct 3 01:20:04 tiburon kernel: =======================
     