The Community Forums

Read only File system

Discussion in 'General Discussion' started by mike25, Jun 22, 2007.

  1. mike25

    mike25 Well-Known Member

    Joined:
    Aug 29, 2003
    Messages:
    83
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Raleigh NC, USA
    I have a server that switches to a read-only file system overnight, usually around the time cron.daily runs. I have read a few posts, and most seem to think this is the result of a faulty drive. The file system is set up on a hardware RAID 1 array, though. I ran a SMART check on the device and, from what I can tell, it reports no errors. What else could I look at to find the source of this?

    smartctl version 5.33 [x86_64-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF INFORMATION SECTION ===
    Device Model: ST3320620AS
    Serial Number: 4QF03BNB
    Firmware Version: 3.AAD
    User Capacity: 320,072,933,376 bytes
    Device is: Not in smartctl database [for details use: -P showall]
    ATA Version is: 7
    ATA Standard is: Exact ATA specification draft version not indicated
    Local Time is: Fri Jun 22 13:42:19 2007 EDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x82) Offline data collection activity
                                            was completed without error.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      ( 249) Self-test routine in progress...
                                            90% of test remaining.
    Total time to complete Offline
    data collection:                 ( 430) seconds.
    Offline data collection
    capabilities:                    (0x5b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 115) minutes.

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f  105   097   006    Pre-fail  Always   -            0
      3 Spin_Up_Time            0x0003  096   095   000    Pre-fail  Always   -            0
      4 Start_Stop_Count        0x0032  100   100   020    Old_age   Always   -            12
      5 Reallocated_Sector_Ct   0x0033  100   100   036    Pre-fail  Always   -            0
      7 Seek_Error_Rate         0x000f  079   060   030    Pre-fail  Always   -            98457581
      9 Power_On_Hours          0x0032  099   099   000    Old_age   Always   -            1601
     10 Spin_Retry_Count        0x0013  100   100   097    Pre-fail  Always   -            0
     12 Power_Cycle_Count       0x0032  100   100   020    Old_age   Always   -            18
    187 Unknown_Attribute       0x0032  100   100   000    Old_age   Always   -            0
    189 Unknown_Attribute       0x003a  100   100   000    Old_age   Always   -            0
    190 Unknown_Attribute       0x0022  073   062   045    Old_age   Always   -            454754331
    194 Temperature_Celsius     0x0022  027   040   000    Old_age   Always   -            27 (Lifetime Min/Max 0/24)
    195 Hardware_ECC_Recovered  0x001a  065   054   000    Old_age   Always   -            98291022
    197 Current_Pending_Sector  0x0012  100   100   000    Old_age   Always   -            0
    198 Offline_Uncorrectable   0x0010  100   100   000    Old_age   Offline  -            0
    199 UDMA_CRC_Error_Count    0x003e  200   200   000    Old_age   Always   -            0
    200 Multi_Zone_Error_Rate   0x0000  100   253   000    Old_age   Offline  -            0
    202 TA_Increase_Count       0x0032  100   253   000    Old_age   Always   -            0

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    Num  Test_Description  Status                         Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline     Self-test routine in progress  90%        1601             -

    SMART Selective self-test log data structure revision number 1
    SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
       1        0        0  Not_testing
       2        0        0  Not_testing
       3        0        0  Not_testing
       4        0        0  Not_testing
       5        0        0  Not_testing
    Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
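
For readers who want to check a report like the one above mechanically, here is a minimal Python sketch. It assumes the attribute rows follow the ten-column layout shown in this post; the names `parse_attributes`, `failing`, and `nonzero_sector_counters` are made up for illustration and are not part of smartmontools. It flags attributes whose normalized VALUE or WORST has reached THRESH, and separately reports non-zero raw counts for the three sector-related attributes (IDs 5, 197, 198).

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)

# Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
SECTOR_IDS = {5, 197, 198}

# A few rows copied from the report in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 027 040 000 Old_age Always - 27 (Lifetime Min/Max 0/24)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
"""


def parse_attributes(text):
    """Return one dict per attribute row found in the text; other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "when_failed": m.group(9),
                "raw": m.group(10).strip(),
            })
    return rows


def failing(rows):
    """Rows whose normalized value or worst has reached a non-zero threshold."""
    return [r for r in rows
            if r["thresh"] > 0 and min(r["value"], r["worst"]) <= r["thresh"]]


def nonzero_sector_counters(rows):
    """Sector-related rows whose raw value starts with a positive integer."""
    hits = []
    for r in rows:
        first = r["raw"].split()[0] if r["raw"] else "0"
        if r["id"] in SECTOR_IDS and first.isdigit() and int(first) > 0:
            hits.append(r)
    return hits


rows = parse_attributes(SAMPLE)
print("failing:", [r["name"] for r in failing(rows)])
print("sector counters:", [r["name"] for r in nonzero_sector_counters(rows)])
```

A clean result here only says the drive is not reporting trouble about itself; it does not rule out problems elsewhere in the storage path.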
     
  2. cPanelNick

    cPanelNick Administrator
    Staff Member

    Joined:
    Mar 9, 2015
    Messages:
    3,426
    Likes Received:
    2
    Trophy Points:
    38
    cPanel Access Level:
    DataCenter Provider
    Back up everything.

    Take the machine into single-user mode.

    fsck the drive.

    Boot it back up.

    If you still get an error, replace the drive and restore.
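
One way to tell whether the error has come back after the fsck and reboot is to check whether the kernel has remounted the filesystem read-only again. The following is a minimal Python sketch, assuming the Linux `/proc/mounts` layout (device, mount point, type, options, ...). The function name `read_only_mounts` and the sample mount points are hypothetical; the thread does not say where sda8 is mounted.

```python
def read_only_mounts(mounts_text, fstypes=("ext2", "ext3", "ext4")):
    """Return (device, mount_point, fstype) for each listed filesystem mounted read-only."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a mount entry
        device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; "ro" must be a whole option, not a substring.
        if fstype in fstypes and "ro" in options.split(","):
            found.append((device, mount_point, fstype))
    return found


# Hypothetical sample in /proc/mounts format (mount points are assumptions).
SAMPLE = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sda8 /home ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a live system the same function could be fed the real table with `read_only_mounts(open("/proc/mounts").read())`.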
     
  3. mike25

    mike25 Well-Known Member

    I found this in the messages log. Oddly, each occurrence was at the same time of day. Any help would be greatly appreciated.

    Jun 21 03:47:40 venus kernel: EXT3-fs error (device sda8): htree_dirblock_to_tree: bad entry in directory #10011395: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
    Jun 21 03:47:40 venus kernel: Aborting journal on device sda8.
    Jun 21 03:47:40 venus kernel: ext3_abort called.
    Jun 21 03:47:40 venus kernel: EXT3-fs error (device sda8): ext3_journal_start_sb: Detected aborted journal
    Jun 21 03:47:40 venus kernel: Remounting filesystem read-only



    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8): ext3_free_blocks_sb: bit already cleared for block 25442310
    Jun 22 03:47:52 venus kernel: Aborting journal on device sda8.
    Jun 22 03:47:52 venus kernel: ext3_abort called.
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8): ext3_journal_start_sb: Detected aborted journal
    Jun 22 03:47:52 venus kernel: Remounting filesystem read-only
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8): ext3_free_blocks_sb: bit already cleared for block 25442311
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8) in ext3_reserve_inode_write: Journal has aborted
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8) in ext3_truncate: Journal has aborted
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8) in ext3_reserve_inode_write: Journal has aborted
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8) in ext3_orphan_del: Journal has aborted
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8) in ext3_reserve_inode_write: Journal has aborted
    Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8) in ext3_delete_inode: Journal has aborted
    Jun 22 03:47:52 venus kernel: __journal_remove_journal_head: freeing b_committed_data
    Jun 22 03:47:52 venus kernel: __journal_remove_journal_head: freeing b_committed_data
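
To confirm the "same time every day" pattern across a longer log, the first EXT3 error of each day can be pulled out per device. This is a minimal Python sketch, assuming syslog lines in the format shown above and in chronological order; `first_error_per_day` is a name made up for this example.

```python
import re

# Matches both "EXT3-fs error (device X): func: ..." and "EXT3-fs error (device X) in func: ...".
LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"EXT3-fs error \(device (?P<device>\w+)\)(?::| in) (?P<func>\w+)"
)

# Lines copied from the log excerpt in this post.
SAMPLE = """\
Jun 21 03:47:40 venus kernel: EXT3-fs error (device sda8): htree_dirblock_to_tree: bad entry in directory #10011395: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jun 21 03:47:40 venus kernel: Aborting journal on device sda8.
Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8): ext3_free_blocks_sb: bit already cleared for block 25442310
Jun 22 03:47:52 venus kernel: EXT3-fs error (device sda8) in ext3_truncate: Journal has aborted
"""


def first_error_per_day(log_text):
    """Map (device, 'Mon DD') to (time, kernel function) of that day's first EXT3 error."""
    first = {}
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        key = (m.group("device"), f"{m.group('month')} {m.group('day')}")
        # setdefault keeps the earliest entry because the log is chronological.
        first.setdefault(key, (m.group("time"), m.group("func")))
    return first


for (device, day), (time, func) in first_error_per_day(SAMPLE).items():
    print(device, day, time, func)
```

The first error of each day is the interesting one; the later "Journal has aborted" lines are consequences of the journal already being shut down.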
     
  4. mike25

    mike25 Well-Known Member

    Thanks for the feedback. It seems we posted at the same time. Is your advice consistent with the journal errors I found? Since they occur at the same time each day, I wonder whether a cron job somewhere is triggering them.
     
  5. cPanelNick

    cPanelNick Administrator
    Staff Member

    Advice is still the same :)


    Actually, in your position, I'd get a new server, copy all the accounts off, and then try to fix the old one. With file system corruption I generally get very afraid to reboot, as sometimes you'll never see your files again.
     
  6. rpmws

    rpmws Well-Known Member

    Joined:
    Aug 14, 2001
    Messages:
    1,824
    Likes Received:
    5
    Trophy Points:
    38
    Location:
    back woods of NC, USA
    Back up everything BEFORE you fsck, and you may want to stop crond for now. It may be that whatever runs at that time is accessing some bad blocks. Good luck!
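
To narrow down which scheduled job might be touching the damaged area, it can help to list the system crontab entries that start close to the time of the errors. Below is a minimal Python sketch, assuming the `/etc/crontab` system format (minute, hour, day of month, month, day of week, user, command). The function name `jobs_near` and the sample entries are hypothetical, and only entries with a fixed numeric minute and hour are considered.

```python
def jobs_near(crontab_text, hour, minute, window=15):
    """Return (HH:MM, user, command) for entries starting within `window` minutes of the given time.

    Entries using wildcards, steps, or lists in the minute or hour field are skipped,
    and times are not wrapped around midnight.
    """
    target = hour * 60 + minute
    hits = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # e.g. environment assignments such as SHELL=/bin/bash
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # not a fixed time of day
        when = int(fields[1]) * 60 + int(fields[0])
        if abs(when - target) <= window:
            hits.append((f"{int(fields[1]):02d}:{int(fields[0]):02d}", fields[5], fields[6]))
    return hits


# Hypothetical crontab contents; the real schedule on the server is not given in the thread.
SAMPLE = """\
SHELL=/bin/bash
# run-parts
01 * * * * root run-parts /etc/cron.hourly
45 3 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
"""

print(jobs_near(SAMPLE, 3, 47))
```

Per-user crontabs use a six-field format without the user column, so they would need a slightly different split.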
     
  7. mike25

    mike25 Well-Known Member

    Thanks for the help, guys. The main drive is actually a hardware RAID 1 array. I ran the verify utility, which found some issues and rebuilt the array. I hope that fixes it; I guess I'll find out at 3:47. As far as backups are concerned, I have full cPanel backups on another drive. With that and the RAID mirror I think I should be all right, or am I missing something vital?
     
  8. adapter

    adapter Well-Known Member
    PartnerNOC

    Joined:
    Sep 17, 2003
    Messages:
    391
    Likes Received:
    0
    Trophy Points:
    16
    Hi,

    I have the same problem. How did you fix it?

    My server goes offline every morning at 4:25 AM. Checking /var/log/messages, I always find this error:

    Nov 18 04:12:57 venus snmpd[2978]: refused smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, password , descr Systems Management SNMP MIB Plug-in Manager
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7): htree_dirblock_to_tree: bad entry in directory #5210649: rec_len % 4 != 0 - offset=0, inode=1970562386, rec_len=28274, name_len=45
    Nov 18 04:12:58 venus kernel: Aborting journal on device sda7.
    Nov 18 04:12:58 venus kernel: ext3_abort called.
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
    Nov 18 04:12:58 venus kernel: Remounting filesystem read-only
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7) in start_transaction: Journal has aborted

    sda7 holds my /home partition. I have already run fsck on it, which found no errors, and I have also upgraded to the latest RHEL kernel, but the problem persists. How can I fix it?
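
The number after "directory #" in these kernel messages is the inode number of the affected directory, so it can be mapped back to a path to see which account or job is involved. Here is a minimal Python sketch of that lookup; `find_dir_by_inode` is a name made up for this example, and the `/home` mount point is taken from the post above. Walking the tree may itself touch the damaged directory, so treat this as something to run with care, ideally after backups are in place.

```python
import os


def find_dir_by_inode(root, inode):
    """Return the path of the directory under `root` with the given inode number, or None.

    The walk stays on the filesystem that `root` lives on, because inode
    numbers are only unique within a single filesystem.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, _filenames in os.walk(root):
        st = os.lstat(dirpath)
        if st.st_dev != root_dev:
            dirnames[:] = []  # do not descend into other mounted filesystems
            continue
        if st.st_ino == inode:
            return dirpath
    return None


# Example call, using the inode number from the log line above:
#     find_dir_by_inode("/home", 5210649)
```

Directories the current user cannot read are silently skipped by `os.walk`, so running it as root gives the most complete search.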
     
  9. sampride

    sampride Member

    Joined:
    Jul 8, 2005
    Messages:
    24
    Likes Received:
    0
    Trophy Points:
    1
  10. xufeng

    xufeng Member

    Joined:
    May 13, 2004
    Messages:
    14
    Likes Received:
    0
    Trophy Points:
    1
    Hi Mike and Adapter,

    Have you tried Sampride's method? How effective was it? Could you post a reply here? Thanks!

    I would like to know the result, since it may help other users resolve similar issues on new servers in the future.

    :)
     