SQUASHFS error and repair

I also experienced the SQUASHFS filesystem corruption reported elsewhere in this forum. Instead of using the EdgeMax rescue kit ( http://community.ubnt.com/t5/EdgeMAX/EdgeMax-rescue-kit-now-you-can-reinstall-EdgeOS-from-scratch/td... ), I removed the flash drive, used another Linux system to fsck and repair the ext3 partition, and copied a new Linux kernel and squashfs image from the latest EdgeMax upgrade image onto the flash drive.

Initial problem

After a 10-second power loss, my EdgeRouter Lite (EdgeMAX v. 1.5.0 beta 1) became unresponsive on eth0, eth1, and eth2. A factory reset with a paperclip did not revive the device. Connecting via the serial console revealed SQUASHFS errors followed by a kernel panic and a watchdog soft reset. Related posts on this forum point to filesystem corruption. All devices on the LAN are connected to TrippLite ISOBAR surge protectors, so, without further evidence, I assume the problem is due to ext3 journal corruption rather than damage from a voltage spike.

Serial console output

U-Boot 1.1.1 (UBNT Build ID: 4493936-g009d77b) (Build time: Sep 20 2012 - 15:48:51)
Octeon ubnt_e100# reset

Looking for valid bootloader image....
Jumping to start of image at address 0xbfc80000


U-Boot 1.1.1 (UBNT Build ID: 4493936-g009d77b) (Build time: Sep 20 2012 - 15:48:51)

BIST check passed.
UBNT_E100 r1:2, r2:14, serial #: DC9FDB283366
Core clock: 500 MHz, DDR clock: 266 MHz (532 Mhz data rate)
DRAM:  512 MB
Clearing DRAM....... done
Flash:  4 MB
Net:   octeth0, octeth1, octeth2

USB:   (port 0) scanning bus for devices... 1 USB Devices found
       scanning bus for storage devices...
  Device 0: Vendor:          Prod.: USB DISK 2.0     Rev: PMAP
            Type: Removable Hard Disk
            Capacity: 3700.6 MB = 3.6 GB (7579008 x 512)                      0
reading vmlinux.64
............................

5683792 bytes read
argv[2]: coremask=0x3
argv[3]: root=/dev/sda2
argv[4]: rootdelay=15
argv[5]: rw
argv[6]: rootsqimg=squashfs.img
argv[7]: rootsqwdir=w
argv[8]: mtdparts=phys_mapped_flash:512k(boot0),512k(boot1),64k@3072k(eeprom)
ELF file is 64 bit
Allocating memory for ELF segment: addr: 0xffffffff80100000 (adjusted to: 0x100000), size 0x59c5f0
Allocated memory for ELF segment: addr: 0xffffffff80100000, size 0x59c5f0
Processing PHDR 0
  Loading 567400 bytes at ffffffff80100000
  Clearing 351f0 bytes at ffffffff80667400
## Loading Linux kernel with entry point: 0xffffffff80458590 ...
Bootloader: Done loading app on coremask: 0x3
Linux version 3.4.27-UBNT (ancheng@ubnt-builder2) (gcc version 4.7.0 (Cavium Inc. Version: SDK_3_0_0 build 16) ) #1 SMP Fri May 2 01:05:41 PDT 2014
CVMSEG size: 2 cache lines (256 bytes)
Cavium Inc. SDK-3.0
bootconsole [early0] enabled
CPU revision is: 000d0601 (Cavium Octeon+)
Checking for the multiply/shift bug... no.
Checking for the daddiu bug... no.
Determined physical RAM map:
 memory: 0000000007800000 @ 0000000000700000 (usable)
 memory: 0000000007c00000 @ 0000000008200000 (usable)
 memory: 000000000fc00000 @ 0000000410000000 (usable)
 memory: 0000000000047000 @ 0000000000629000 (usable after init)
Wasting 88312 bytes for tracking 1577 unused pages
Placing 0MB software IO TLB between 8000000001707000 - 8000000001747000
software IO TLB at phys 0x1707000 - 0x1747000
Zone PFN ranges:
  DMA32    0x00000629 -> 0x000f0000
  Normal   0x000f0000 -> 0x0041fc00
Movable zone start PFN for each node
Early memory PFN ranges
    0: 0x00000629 -> 0x00000670
    0: 0x00000700 -> 0x00007f00
    0: 0x00008200 -> 0x0000fe00
    0: 0x00410000 -> 0x0041fc00
Cavium Hotplug: Available coremask 0x0
Primary instruction cache 32kB, virtually tagged, 4 way, 64 sets, linesize 128 bytes.
Primary data cache 16kB, 64-way, 2 sets, linesize 128 bytes.
Secondary unified cache 128kB, 8-way, 128 sets, linesize 128 bytes.
PERCPU: Embedded 10 pages/cpu @8000000001784000 s9216 r8192 d23552 u40960
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 67946
Kernel command line:  bootoctlinux $loadaddr coremask=0x3 root=/dev/sda2 rootdelay=15 rw rootsqimg=squashfs.img rootsqwdir=w mtdparts=phys_mapped_flash:512k(boot0),512k(boot1),64k@3072k(eeprom) console=ttyS0,115200
PID hash table entries: 2048 (order: 2, 16384 bytes)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Memory: 499340k/508188k available (3469k kernel code, 8848k reserved, 1812k data, 284k init, 0k highmem)
Hierarchical RCU implementation.
NR_IRQS:256
Calibrating delay loop (skipped) preset value.. 1000.00 BogoMIPS (lpj=5000000)
pid_max: default: 32768 minimum: 301
Security Framework initialized
Mount-cache hash table entries: 256
Checking for the daddi bug... no.
SMP: Booting CPU01 (CoreId  1)...
CPU revision is: 000d0601 (Cavium Octeon+)
Brought up 2 CPUs
NET: Registered protocol family 16
bio: create slab <bio-0> at 0
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Switching to clocksource OCTEON_CVMCOUNT
NET: Registered protocol family 2
IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
TCP established hash table entries: 16384 (order: 6, 262144 bytes)
TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
TCP: reno registered
UDP hash table entries: 256 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
NET: Registered protocol family 1
ERROR: octeon_pci_console_setup0 failed.
/proc/octeon_perf: Octeon performance counter interface loaded
squashfs: version 4.0 (2009/01/31) Phillip Lougher
Registering unionfs 2.5.11 (for 3.4)
msgmni has been set to 975
io scheduler noop registered
io scheduler cfq registered (default)
Serial: 8250/16550 driver, 6 ports, IRQ sharing disabled
loop: module loaded
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
OcteonUSB 16f0010000000.usbc: Octeon Host Controller
OcteonUSB 16f0010000000.usbc: new USB bus registered, assigned bus number 1
OcteonUSB 16f0010000000.usbc: irq 56, io mem 0x00000000
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
OcteonUSB: Registered HCD for port 0 on irq 56
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver libusual
octeon_wdt: Initial granularity 5 Sec
TCP: cubic registered
NET: Registered protocol family 17
NET: Registered protocol family 15
L2 lock: TLB refill 256 bytes
L2 lock: General exception 128 bytes
L2 lock: low-level interrupt 128 bytes
L2 lock: interrupt 640 bytes
L2 lock: memcpy 1152 bytes
1180000000800.serial: ttyS0 at MMIO 0x1180000000800 (irq = 34) is a OCTEON
console [ttyS0] enabled, bootconsole disabled
console [ttyS0] enabled, bootconsole disabled
1180000000c00.serial: ttyS1 at MMIO 0x1180000000c00 (irq = 35) is a OCTEON
Bootbus flash: Setting flash for 4MB flash at 0x1f800000
phys_mapped_flash: Found 1 x16 devices at 0x0 in 8-bit bank. Manufacturer ID 0x0000c2 Chip ID 0x0000a7
Amd/Fujitsu Extended Query Table at 0x0040
  Amd/Fujitsu Extended Query version 1.1.
phys_mapped_flash: Swapping erase regions for top-boot CFI table.
number of CFI chips: 1
3 cmdlinepart partitions found on MTD device phys_mapped_flash
Creating 3 MTD partitions on "phys_mapped_flash":
0x000000000000-0x000000080000 : "boot0"
0x000000080000-0x000000100000 : "boot1"
0x000000300000-0x000000310000 : "eeprom"
Waiting 15sec before mounting root device...
usb 1-1: new high-speed USB device number 2 using OcteonUSB
scsi0 : usb-storage 1-1:1.0
scsi 0:0:0:0: Direct-Access              USB DISK 2.0     PMAP PQ: 0 ANSI: 4
sd 0:0:0:0: [sda] 7579008 512-byte logical blocks: (3.88 GB/3.61 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] No Caching mode page present
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] No Caching mode page present
sd 0:0:0:0: [sda] Assuming drive cache: write through
 sda: sda1 sda2
sd 0:0:0:0: [sda] No Caching mode page present
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] Attached SCSI removable disk
kjournald starting.  Commit interval 3 seconds
EXT3-fs (sda2): warning: mounting fs with errors, running e2fsck is recommended
EXT3-fs (sda2): using internal journal
EXT3-fs (sda2): recovery complete
EXT3-fs (sda2): mounted filesystem with journal data mode
VFS: Mounted root (unionfs filesystem) on device 0:11.
Freeing unused kernel memory: 284k freed
SQUASHFS error: zlib_inflate error, data probably corrupt
SQUASHFS error: squashfs_read_data failed to read block 0x1e0a28
SQUASHFS error: Unable to read data cache entry [1e0a28]
SQUASHFS error: Unable to read page, block 1e0a28, size fe84
SQUASHFS error: Unable to read data cache entry [1e0a28]
SQUASHFS error: Unable to read page, block 1e0a28, size fe84
SQUASHFS error: Unable to read data cache entry [1e0a28]
SQUASHFS error: Unable to read page, block 1e0a28, size fe84
SQUASHFS error: Unable to read data cache entry [1e0a28]
SQUASHFS error: Unable to read page, block 1e0a28, size fe84
SQUASHFS error: Unable to read data cache entry [1e0a28]
SQUASHFS error: Unable to read page, block 1e0a28, size fe84
SQUASHFS error: zlib_inflate error, data probably corrupt
SQUASHFS error: squashfs_read_data failed to read block 0x1e0a28
SQUASHFS error: Unable to read data cache entry [1e0a28]
SQUASHFS error: Unable to read page, block 1e0a28, size fe84
Kernel panic - not syncing: No init found.  Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.

*** NMI Watchdog interrupt on Core 0x01 ***
        $0      0x0000000000000000      at      0x0000000010108ce0
        v0      0xffffffff80139d20      v1      0x0000000000000001
        a0      0xfffffffffffffffd      a1      0x0000000000000000
        a2      0x00000000000002d4      a3      0xffffffffffff00fe
        a4      0x0000000000000400      a5      0x0000000000000400
        a6      0x800000041c29f9c8      a7      0x800000041c29fa80
        t0      0x0000000000000000      t1      0x000000001000001f
        t2      0x800000041c29c000      t3      0x0000000000000000
        s0      0xffffffff80670000      s1      0x0000000000000001
        s2      0x0000000000200200      s3      0x0000000000100100
        s4      0xffffffff80139d40      s5      0x0000000000000069
        s6      0x0000000000000000      s7      0x800000041ff00000
        t8      0x0000000000000001      t9      0xffffffff80188e80
        k0      0x0000000000000000      k1      0x0000000000000000
        gp      0x800000041c0a4000      sp      0x800000041c0a7bb0
        s8      0xffffffff804ef488      ra      0xffffffff80141410
        err_epc 0xffffffff80139d40      epc     0xffffffff80139d40
        status  0x0000000010588ce4      cause   0x0000000040808800
        sum0    0x000000f000000000      en0     0x0000000000000000
*** Chip soft reset soon ***
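
If you capture the console session to a file, the repeated SQUASHFS lines can be condensed to see whether one block or many are failing. This is only a sketch: the function name squashfs_bad_blocks and the filename console.log are made up for illustration.

```shell
# Count how often each squashfs block is reported as unreadable
# in a saved console log.
# Usage: squashfs_bad_blocks console.log
squashfs_bad_blocks() {
    # Matches lines such as:
    #   SQUASHFS error: squashfs_read_data failed to read block 0x1e0a28
    grep -o 'failed to read block 0x[0-9a-fA-F]*' "$1" \
        | awk '{ print $NF }' \
        | sort \
        | uniq -c
}
```

On the log above this reports a single block, 0x1e0a28.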

 

Repair

I managed to repair the corrupt ext3 partition and re-write the squashfs image by removing the EdgeRouter flash drive and inserting it into an Ubuntu system. Below is a rather verbose summary of commands aimed at users with limited Linux experience.

## Remove EdgeRouter flash drive from router
## Insert flash drive into system


## Find the EdgeRouter flash drive's partitions
## Here, the partitions are located at /dev/sdd1 and /dev/sdd2;
## substitute your own device names in the commands that follow
sudo fdisk -l
# Disk /dev/sdd: 3880 MB, 3880452096 bytes
# 120 heads, 62 sectors/track, 1018 cylinders, total 7579008 sectors
# Units = sectors of 1 * 512 = 512 bytes
# Sector size (logical/physical): 512 bytes / 512 bytes
# I/O size (minimum/optimal): 512 bytes / 512 bytes
# Disk identifier: 0xaa2f0600
# 
#    Device Boot      Start         End      Blocks   Id  System
# /dev/sdd1            2048      292863      145408    c  W95 FAT32 (LBA)
# /dev/sdd2          292864     3710975     1709056   83  Linux
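
## Optional: sanity-check the fdisk listing. The "Blocks" column is in
## 1 KiB units, so with 512-byte sectors it should equal (End - Start + 1) / 2.
## A small sketch; the function name blocks_from_sectors is made up for
## illustration.

```shell
# Print the size in 1 KiB blocks of a partition given its first and
# last sector (inclusive), assuming 512-byte sectors.
blocks_from_sectors() {
    echo $(( ($2 - $1 + 1) / 2 ))
}

blocks_from_sectors 2048 292863      # sdd1 -> 145408
blocks_from_sectors 292864 3710975   # sdd2 -> 1709056
```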


## Optional: Confirm flash drive partitions are of type "vfat" and "ext3"
blkid
# /dev/sdd1: SEC_TYPE="msdos" UUID="0267-11C9" TYPE="vfat"
# /dev/sdd2: UUID="15062a93-e869-4dd6-adff-3e56bb772ab1" TYPE="ext3" SEC_TYPE="ext2"


## Optional: Create a backup image of the flash drive
## Note in this example the sdd1 and sdd2 partitions are on the sdd drive
## Note this operation can take many minutes to complete and does not report progress
sudo dd if=/dev/sdd of=edgerouter.image
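
## Before checking the partitions, make sure neither one is mounted; some
## desktop systems automount removable drives, and fsck should not repair a
## mounted filesystem. A minimal sketch, assuming the drive is /dev/sdd as in
## this example. is_mounted is not a standard command, just an illustrative
## helper.

```shell
# Return 0 if device $1 appears in the mount table, 1 otherwise.
# $2 optionally names an alternative mount table (default: /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

for part in /dev/sdd1 /dev/sdd2; do
    if is_mounted "$part"; then
        echo "$part is mounted; unmount it first: sudo umount $part"
    fi
done
```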


## Check for errors on flash drive partitions
## Example output below shows no errors
sudo fsck -f -n /dev/sdd1
# fsck from util-linux 2.20.1
# fsck.fat 3.0.26 (2014-03-07)
# /dev/sdd1: 4 files, 2778/36311 clusters

sudo fsck -f -n /dev/sdd2
# fsck from util-linux 2.20.1
# e2fsck 1.42.9 (4-Feb-2014)
# Pass 1: Checking inodes, blocks, and sizes
# Pass 2: Checking directory structure
# Pass 3: Checking directory connectivity
# Pass 4: Checking reference counts
# Pass 5: Checking group summary information
# /dev/sdd2: 620/213696 files (3.2% non-contiguous), 66138/427264 blocks


## If errors are found, try to automatically repair, e.g.:
sudo fsck -y /dev/sdd1
sudo fsck -y /dev/sdd2


## If errors cannot be fixed, consider:
##   1) Return to Ubiquiti under warranty
##   2) Purchase replacement flash drive that will fit inside EdgeRouter
##   3) Further repair beyond the scope of this post
## Otherwise, continue:


## Create temporary mountpoints
sudo mkdir /mnt/ubnt1 /mnt/ubnt2


## Mount flash drive partitions to mountpoints
sudo mount /dev/sdd1 /mnt/ubnt1
sudo mount /dev/sdd2 /mnt/ubnt2


## Optional: View filesystem disk usage of the mounted partitions
df -h
# /dev/sdd1       142M   11M  131M   8% /mnt/ubnt1
# /dev/sdd2       1.6G  201M  1.3G  14% /mnt/ubnt2


## Optional: List the contents of the mounted partitions
## Note files ending in 'o' are old copies for fallback during upgrade failure
ls -lah /mnt/ubnt1
# total 11M
# drwxr-xr-x 2 root root  16K Dec 31  1969 .
# drwxr-xr-x 1 root root   20 Jun 26 12:06 ..
# -rwxr-xr-x 1 root root 5.5M Jun 26 12:24 vmlinux.64
# -rwxr-xr-x 1 root root   33 Jun 26 11:21 vmlinux.64.md5
# -rwxr-xr-x 1 root root 5.5M Dec 24  2013 vmlinux.64o
# -rwxr-xr-x 1 root root   33 Mar 10 22:12 vmlinux.64o.md5

ls -lah /mnt/ubnt2
# total 127M
# drwxr-xr-x 6 root     root    4.0K Jun 26 11:24 .
# drwxr-xr-x 1 root     root      20 Jun 26 12:06 ..
# drwx------ 2 root     root     16K Dec 31  1969 lost+found
# -rw-r--r-- 1 root     root     64M Jun 26 11:24 squashfs.img
# -rw-r--r-- 1 root     root      33 Jun 26 11:24 squashfs.img.md5
# -rw-r--r-- 1 root     root     64M Mar 10 20:42 squashfs.o
# -rw-r--r-- 1 root     root      33 Mar 10 20:42 squashfs.o.md5
# -rw-r--r-- 1 root     root      46 May  5 18:17 version
# -rw-r--r-- 1 root     root      41 Mar 10 20:38 version.o
# drwxr-xr-x 8 root     crontab 4.0K Jun  1  2011 w
# drwxr-xr-x 9 root     crontab 4.0K May  3 13:21 w.o
# drwxr-xr-x 2 www-data root    4.0K Jun  1  2011 www


## Download the EdgeRouter firmware upgrade image
## Example below uses EdgeMax v1.5.0
wget http://www.ubnt.com/downloads/firmwares/edgemax/v1.5.0/ER-e100.v1.5.0.4677648.tar


## Expand firmware tar archive
tar -xvf ER-e100.v1.5.0.4677648.tar
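
## Optional: verify the extracted files against their md5 files before
## copying them. A minimal sketch, ASSUMING each .md5 file holds the lowercase
## hex digest as its first field (the 33-byte md5 files on the flash drive are
## consistent with a bare digest plus newline, but the format is an
## assumption). md5_matches is a helper name made up here.

```shell
# Return 0 if the MD5 of file $1 equals the first field stored in file $2.
md5_matches() {
    expected=$(awk 'NR == 1 { print $1 }' "$2")
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

for f in vmlinux.tmp squashfs.tmp; do
    # Skip quietly if the archive has not been extracted here.
    if [ -f "$f" ] && [ -f "$f.md5" ]; then
        if md5_matches "$f" "$f.md5"; then
            echo "$f: checksum OK"
        else
            echo "$f: checksum MISMATCH, download the archive again"
        fi
    fi
done
```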


## Copy linux kernel and md5 hash to first (vfat) partition of flash drive
sudo cp vmlinux.tmp /mnt/ubnt1/vmlinux.64
sudo cp vmlinux.tmp.md5 /mnt/ubnt1/vmlinux.64.md5


## Copy squashfs filesystem image, md5 hash, and version file
## to second (ext3) partition of flash drive
sudo cp squashfs.tmp /mnt/ubnt2/squashfs.img
sudo cp squashfs.tmp.md5 /mnt/ubnt2/squashfs.img.md5
sudo cp version.tmp /mnt/ubnt2/version
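
## Optional: confirm the copies are identical to the extracted files. Note
## this mainly catches an incomplete copy (for example, a full partition);
## the data may be read back from cache rather than from the flash itself.
## The helper name check_copy is illustrative only.

```shell
# Compare a source file ($1) with its copy on the flash drive ($2).
# Prints the result and returns 0 only if they are byte-for-byte identical.
check_copy() {
    if cmp -s "$1" "$2"; then
        echo "$2: matches $1"
    else
        echo "$2: DIFFERS from $1"
        return 1
    fi
}

# Only run the checks when the files from the steps above are present.
if [ -d /mnt/ubnt1 ] && [ -f vmlinux.tmp ]; then
    check_copy vmlinux.tmp /mnt/ubnt1/vmlinux.64
    check_copy squashfs.tmp /mnt/ubnt2/squashfs.img
fi
```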


## Unmount partitions
sudo umount /mnt/ubnt1 /mnt/ubnt2


## Delete temporary mountpoints
sudo rmdir /mnt/ubnt1 /mnt/ubnt2


## Remove flash drive from computer and reinsert it into the EdgeRouter
## Power on EdgeRouter, optionally using console cable to confirm
## successful boot.


## If you are temporarily using another router and have a DHCP lease from your ISP,
## don't forget to release the old DHCP lease if necessary.

 

The router is working normally now. If the corruption recurs, I'll consider replacing the flash drive.