Webserver fails after graceful reboot

bitpt

Member
Sep 28, 2006
21
2
153
After a graceful server reboot, the server is unreachable: no ping, no boot, nothing.
Hardware
Intel Xeon E3-1230 v3 - 3.3 GHz - 32 GB RAM, 2 TB
OS
CentOS 7

I can do a hard reboot with IPMI, but the server does not boot normally.
However, the server does boot in rescue mode, with options for
FreeBSD 10.1 amd64, 10.2 or 11.0, and Ubuntu 14.04, 16.04 or 18.04.
How can I recover CentOS 7 with WHM/cPanel installed when the only rescue boot options are FreeBSD or Ubuntu?

Thanks for your help
 

kodeslogic

Well-Known Member
Apr 26, 2020
87
27
93
IN
cPanel Access Level
Root Administrator
If it is a managed server, your provider should take care of such issues on your behalf.
If it is an unmanaged server and you're not sure what the next step should be, you can consult one of the service providers listed under System Administration Services.
 

cPRex

Jurassic Moderator
Staff member
Oct 19, 2014
715
97
153
cPanel Access Level
Root Administrator
@kodeslogic is correct - if the system will not boot properly and won't even respond to ping, it is experiencing issues at a deeper level than the cPanel software and there needs to be intervention from the hosting provider or admin.

Was cPanel already installed on that system and you need to try and move your data? If so, we have a guide here that gives more details about performing disaster recovery work on a system: Full Disaster Recovery | cPanel & WHM Documentation
 

bitpt

It is an unmanaged server. I can now log in over SSH using the rescue boot (FreeBSD).
cPanel was working fine on this server, with 50 websites, until I did a graceful reboot. In Security Advisor there was a notice listing 2 processes, with a "graceful reboot" link to resolve the issue (which, in this case, crashed the server).

I don't know how to recover CentOS 7 and cPanel from a rescue boot (FreeBSD or Ubuntu).

The "Full Disaster Recovery" guide assumes access to the data on the HDD. In this case, until rescue mode can recover the CentOS boot, I can't do anything, or I lose all the data on the HDD.

This is the second time (on 2 servers) in the last 3 months that I have had issues with a reboot started from WHM.
 

cPRex

Security Advisor can let you know if services need a reboot, but that would not be the cause of the crash. The issue itself could be a problem with the physical hardware or the operating system, but that would need to be determined after some additional troubleshooting.

Even an unmanaged server should have some level of support if the system isn't booting properly, but that is up to your hosting agreement with your provider. It also seems odd to me that the recovery option they have chosen is FreeBSD when the server is running CentOS. The standard procedures for recovery mode are outlined here:


but I don't think they will be much help if you have FreeBSD.

If you need to have an admin check the system for you, the link that @kodeslogic provided to System Administration is our best recommendation.
 

bitpt

After a rescue boot into Ubuntu, the data disk can, in this case, be recovered.

# blkid
/dev/sda1: UUID="b94103e6-65ae-4917-b92c-2c47cd8bedd8" TYPE="ext4"
/dev/sda2: UUID="f435351dad-f978-4f8a-bb0c-a77457948eec" TYPE="ext4"
/dev/sda3: UUID="fb3de819-47a3-4039-acd9-982259b0915c" TYPE="swap"
/dev/sdb: PTTYPE="dos"
/dev/loop0: UUID="6f1ec65b-9c7d-4f86-9656-884671af1901" TYPE="ext3"

# lsblk

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
├─sda1 8:1 0 512M 0
├─sda2 8:2 0 1.8T 0 part /
└─sda3 8:3 0 1G 0 part [SWAP]
sdb 8:16 0 1.8T 0 disk
loop0 7:0 0 4G 0 loop /var/tmp

# mount /dev/sda2 /mnt/restore
The data in /mnt/restore can be copied to another location, and a new CentOS 7 boot partition can then be set up on sda1.
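
For anyone in the same situation, the copy step described above could look roughly like the sketch below. It only prints the commands instead of running them, so they can be reviewed first. The function name `rescue_copy_plan`, the `/home` path (where cPanel normally keeps account data) and the destination host are assumptions for illustration, not something taken from this server. Mounting read-only avoids writing to a disk that may be damaged, and `--numeric-ids` keeps the original UIDs even though the rescue system has different users.

```shell
#!/bin/sh
# rescue_copy_plan: print (do not run) the commands for copying data off the
# old root filesystem from a rescue boot.
#   $1 = source partition   (e.g. /dev/sda2, as in the lsblk output above)
#   $2 = mount point        (e.g. /mnt/restore)
#   $3 = rsync destination  (hypothetical host and path)
rescue_copy_plan() {
    src=$1
    mnt=$2
    dest=$3
    echo "mkdir -p $mnt"
    # Read-only mount: nothing is written to the old disk.
    echo "mount -o ro $src $mnt"
    # -a archive, -H hard links, -A ACLs, -X xattrs; numeric IDs preserved.
    echo "rsync -aHAX --numeric-ids $mnt/home/ $dest"
    echo "umount $mnt"
}
```

Example: `rescue_copy_plan /dev/sda2 /mnt/restore backup.example.com:/restore/home/` prints four commands to review and then run by hand. Whether the rescue image ships an rsync with ACL and xattr support needs to be checked on the rescue system itself.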
 

bitpt

A new failure after a new graceful reboot, this time on the new server we used to receive the data from the crashed server. I spent 3 days recovering data and websites to the other server and, once everything was running, got the yellow banner "You must reboot the server to apply software updates."
After the graceful reboot, and after waiting 45 minutes: no ping, nothing.
The server is ON and reachable over IPMI, with no errors in the logs.
Code:
IPMI LOG

Log Sequence Number: 2859

Detailed Description:

System is performing a CPU reset because of system power off, power on or a warm reset like CTRL-ALT-DEL.
But the system doesn't recover. It is a new server. We have worked with cPanel systems since 2005 and never had problems like this.
1. In September, a crash because the machine was running fsck, and a graceful reboot started from WHM doesn't check whether fsck is running. The boot failed, the server failed.
2. Last week, nobody knows why, because the logs report no failures.
3. And now a new one. We lost money and clients, and WHM/cPanel (whatever) has this issue again.
4. Now we need to relocate (AGAIN) 50 websites and more than 500 GB.

I am #$%%#"&"$ with WHM/cPanel.
We pay for a panel to avoid failures like this.... and
 

bitpt

After the third attempt, the server booted and works well... for now. There is nothing in the logs that I can point to as the cause.
After that, @kodeslogic investigated and saw that the file system check was taking too long.

Thank you!!!

Why? Disk problems?

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 213 212 021 Pre-fail Always - 4333
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 13
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 054 054 000 Old_age Always - 33946
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 13
16 Unknown_Attribute 0x0022 010 000 000 Old_age Always - 588792523628
183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 7
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 10
194 Temperature_Celsius 0x0022 119 114 000 Old_age Always - 31 (Min/Max 18/36)
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
241 Total_LBAs_Written 0x0032 200 200 000 Old_age Always - 220631763597
242 Total_LBAs_Read 0x0032 200 200 000 Old_age Always - 368160760031

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 27988 -
# 2 Short offline Completed without error 00% 23663 -
# 3 Short offline Completed without error 00% 184 -
# 4 Short offline Completed without error 00% 2 -
# 5 Vendor (0xdf) Completed without error 00% 2 -


SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 210 210 021 Pre-fail Always - 4458
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 13
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 061 060 000 Old_age Always - 29174
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 13
16 Unknown_Attribute 0x0022 007 000 000 Old_age Always - 382580832582
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 6
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 16
194 Temperature_Celsius 0x0022 121 115 000 Old_age Always - 29 (Min/Max 18/35)
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
241 Total_LBAs_Written 0x0032 200 200 000 Old_age Always - 32816623532
242 Total_LBAs_Read 0x0032 200 200 000 Old_age Always - 349764209050

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 27988 -
# 2 Short offline Completed without error 00% 23663 -
# 3 Short offline Completed without error 00% 184 -
# 4 Short offline Completed without error 00% 2 -
# 5 Vendor (0xdf) Completed without error 00% 2 -
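
As a rough way to read attribute tables like the two above, the sketch below flags any row whose normalized VALUE has dropped to or below THRESH, plus a nonzero RAW_VALUE on a few attributes that are commonly watched for failing sectors (IDs 5, 183, 197 and 198). The function name `smart_warnings` and the choice of IDs are assumptions for illustration, and the meaning of raw values is vendor-specific. Run against the first table it would report only Runtime_Bad_Block with a raw value of 1, and nothing for the second table.

```shell
#!/bin/sh
# smart_warnings: read `smartctl -A` attribute rows on stdin and print the
# attributes that look suspicious. Column layout assumed:
#   ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_warnings() {
    awk '
        # Only attribute rows start with a numeric ID and have >= 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1; name = $2
            value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
            if (thresh > 0 && value <= thresh)
                print name ": normalized value " value " at or below threshold " thresh
            else if ((id == 5 || id == 183 || id == 197 || id == 198) && raw > 0)
                print name ": raw value " raw
        }'
}
```

Example: `smartctl -A /dev/sda | smart_warnings`. An empty result only means none of these simple rules fired; it does not prove the disk is healthy.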

The filesystem is clean:

# dumpe2fs
Filesystem volume name: <none>
Last mounted on: /boot
Filesystem UUID: b94103e6-65ae-4917-b92c-2c47cd8bedd8
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 32768
Block count: 131072
Reserved block count: 6553
Free blocks: 94485
Free inodes: 32433
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 63
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Sun Nov 15 10:51:25 2020
Last mount time: Sun Nov 22 09:45:14 2020
Last write time: Sun Nov 22 09:45:14 2020
Mount count: 9
Maximum mount count: -1
Last checked: Sun Nov 15 10:51:25 2020
Check interval: 0 (<none>)
Lifetime writes: 224 MB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 31b0ffb3-ff30-4cc8-9300-36dc681b9396
Journal backup: inode blocks
Journal features: journal_64bit
Journal size: 16M
Journal length: 4096
Journal sequence: 0x00000172
Journal start: 1
 

cPRex

@bitpt - I'm so sorry you have to deal with all that! It's important to note that all the issues you've mentioned so far seem to be related to the server's hardware and disks, and not the cPanel software. If the system has issues booting or there are disk errors, that would indicate a problem with the machine at a lower level than the cPanel software.
 

bitpt

cPanel could detect, for example, whether the system is going to run an fsck, and estimate (+/-) the time to reboot.

cPanel could easily have a script that detects whether the system will run an fsck, for example, and report that the server may take up to x minutes to reboot. When cPanel shows a "graceful reboot" button, it could first check whether the filesystem is clean or not, or how many mounts remain before the next fsck. As it is, we lost hundreds of hours solving problems induced by a cPanel information button... am I right?

A warning before the reboot would, at least, be acceptable. Saying that "cPanel software is not to blame" when it puts a "Graceful Reboot" button in front of the user is more or less saying: don't trust cPanel, and check for yourself before doing what cPanel says. Either cPanel is not reliable and we have to check everything beforehand or, in cases that may cause damage, it should at least inform the user.

@cPRex A server runs fsck after x mounts, whether it has hardware problems or not. The fsck run time varies; for 1 TB of data, the check can take 3600 s (1 hour).
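
The kind of pre-reboot check described above can be sketched for ext2/3/4 filesystems by reading the superblock fields that `dumpe2fs -h` prints. This is a minimal sketch, not something cPanel ships: the function name `fsck_due` and the message wording are assumptions, and it only looks at the filesystem state and the mount counter, not at the time-based check interval. In the dumpe2fs output quoted earlier in the thread, "Maximum mount count: -1" and "Check interval: 0" mean that periodic checks are switched off for that filesystem, so a mount-count check would not be expected there.

```shell
#!/bin/sh
# fsck_due: read `dumpe2fs -h` output on stdin and report whether the next
# boot is likely to trigger a filesystem check, based on the filesystem
# state and the mount counter only.
fsck_due() {
    awk -F': *' '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        /^Filesystem state:/    { state = $2; sub(/[ \t]+$/, "", state) }
        END {
            if (state == "")
                print "unknown: no filesystem state found"
            else if (state != "clean")
                print "check likely: state is " state
            else if (max > 0 && count >= max)
                print "check likely: mount count " count " >= " max
            else
                print "no mount-count check pending"
        }'
}
```

Example: `dumpe2fs -h /dev/sda1 2>/dev/null | fsck_due`, run as root before a planned reboot. How long a check would then take still depends on the disk and the amount of data, so this gives a warning rather than a time estimate.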
 

cPRex

There's always a balance of warning users and encouraging functionality. For example, you can change your PHP version with EasyApache and MultiPHP, and that could cause issues for your site if it doesn't support the version you're switching to.

When I click the yellow "reboot" banner in WHM on my end, it does prompt with this:

Warning: This will reboot your system!

but I don't believe we want to get into any type of time estimates. If we did provide that information, and that estimate turned out to be inaccurate, that would cause other issues for the server admin.

The best thing to do is to always perform reboots or other maintenance that will cause downtime during non-peak hours for your system and users, to have as little impact as possible.