wow. Just hit my first major problem with this box that has been running fine for over a year.
CentOS 5.2, 64bit.
My box was completely locked up; the only thing I could do was hit reset to get it restarted. I looked in the logs and found this:
Code:
Sep 1 05:30:55 vhost3 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]
Sep 1 05:30:55 vhost3 kernel: CPU 2:
Sep 1 05:30:55 vhost3 kernel: Modules linked in: nfs lockd fscache nfs_acl sunrpc iptable_nat ip_nat deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc crypto_blkcipher md5 sha256 sh
Sep 1 05:30:55 vhost3 kernel: libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Sep 1 05:30:55 vhost3 kernel: Pid: 5725, comm: exp2 Tainted: G 2.6.18-92.el5 #1
Sep 1 05:30:55 vhost3 kernel: RIP: 0010:[<0000000000000001>] [<0000000000000001>]
Sep 1 05:30:55 vhost3 kernel: RSP: 0018:ffff81006bcabb20 EFLAGS: 00000246
Sep 1 05:30:55 vhost3 kernel: RAX: 0000000000000000 RBX: ffff810063b3fcc0 RCX: 0000000000000001
Sep 1 05:30:55 vhost3 kernel: RDX: 00000000000004d0 RSI: ffffffff884687d0 RDI: ffff8100338ade80
Sep 1 05:30:55 vhost3 kernel: RBP: ffffffff80231b65 R08: 00000000d1b48344 R09: ffffffff80231b65
Sep 1 05:30:55 vhost3 kernel: R10: 0000000080000000 R11: 00000000000003f8 R12: ffffffff804c9590
Sep 1 05:30:55 vhost3 kernel: R13: ffff81006bcabb30 R14: 0000000000000003 R15: 00000000000004d0
Sep 1 05:30:55 vhost3 kernel: FS: 0000000045975940(0000) GS:ffff81011bc3ae40(0063) knlGS:00000000f7eed6c0
Sep 1 05:30:55 vhost3 kernel: CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Sep 1 05:30:55 vhost3 kernel: CR2: 0000000000402ba0 CR3: 000000008a8a4000 CR4: 00000000000006e0
Sep 1 05:30:55 vhost3 kernel:
Sep 1 05:30:55 vhost3 kernel: Call Trace:
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80232128>] ip_push_pending_frames+0x383/0x45e
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80242226>] udp_push_pending_frames+0x236/0x25b
Sep 1 05:30:55 vhost3 kernel: [<ffffffff8005203c>] udp_sendmsg+0x4d3/0x5ce
Sep 1 05:30:55 vhost3 kernel: [<ffffffff8011f4af>] socket_has_perm+0x5b/0x68
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80054924>] sock_sendmsg+0xf3/0x110
Sep 1 05:30:55 vhost3 kernel: [<ffffffff800b9da9>] delayacct_end+0x5d/0x86
Sep 1 05:30:55 vhost3 kernel: [<ffffffff8009dde2>] autoremove_wake_function+0x0/0x2e
Sep 1 05:30:55 vhost3 kernel: [<ffffffff8000769e>] find_get_page+0x21/0x50
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80013325>] filemap_nopage+0x188/0x322
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80008b39>] __handle_mm_fault+0x4e9/0xe23
Sep 1 05:30:55 vhost3 kernel: [<ffffffff8020d4de>] sys_sendto+0x11c/0x14f
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80066852>] do_page_fault+0x4fe/0x830
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80041922>] d_rehash+0x21/0x34
Sep 1 05:30:55 vhost3 kernel: [<ffffffff8020d10f>] sock_attach_fd+0x8f/0xfd
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80005d36>] level2_kernel_pgt+0xd36/0x1000
Sep 1 05:30:55 vhost3 kernel: [<ffffffff800b3fd8>] audit_syscall_entry+0x16e/0x1a1
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80221cb1>] compat_sys_socketcall+0xf1/0x172
Sep 1 05:30:55 vhost3 kernel: [<ffffffff80061618>] cstar_do_call+0x1b/0x65
Sep 1 05:30:55 vhost3 kernel:

This happens 7 more times within a 2 min period, all with the same process ID and CPU. After that there is no log until reboot.
Then it happened again today, different CPU (and of course different process ID). Call stack looks the same.
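In case it helps anyone compare incidents like these, here is a rough Python sketch that tallies soft lockup reports per CPU and process from a syslog file. The function names are made up for illustration, the regex assumes only the "BUG: soft lockup" line format seen in the log in this post, and /var/log/messages is an assumed log location.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 1 05:30:55 vhost3 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [exp2:5725]
# (format taken from the log in this post; other kernels may word it differently)
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft lockup reports per (cpu, command, pid) tuple."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            key = (int(match["cpu"]), match["comm"], int(match["pid"]))
            counts[key] += 1
    return counts


def summarize_file(path="/var/log/messages"):
    """Hypothetical helper: read a syslog file and print one line per tuple."""
    with open(path, errors="replace") as handle:
        counts = summarize_lockups(handle)
    for (cpu, comm, pid), n in sorted(counts.items()):
        print(f"CPU#{cpu} {comm}[{pid}]: {n} report(s)")
```

Calling summarize_file() on the saved log should show at a glance whether the repeats really are all on one CPU and one PID, and whether the second incident involves the same command name.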
I tried to do a yum update and found 274 packages need updating, but yum hangs on libpq.so.4 being needed (even though it's there), so it won't update. That's in another thread.
Any ideas on the CPU block?
Thanks!


