Member
Posts: 198
Registered: ‎10-18-2010
Kudos: 300
Solutions: 2
Contributions: 1

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

Your instructions worked great. Very straightforward and simple. Awesome work.

Established Member
Posts: 919
Registered: ‎05-28-2012
Kudos: 183
Solutions: 6

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

[ Edited ]

I tried booting this after changing the USB drive and it just hangs on checking the boot partition.

I guess either the ERL doesn't like that drive, or the partition table on the drive isn't right for the EdgeMax.

Edit:

Tried creating a FAT partition and an ext3 partition with half the drive left blank, similar to the original drive, but it didn't seem to help.

Edit2:

Ended up booting without the USB drive inserted. Once the EMRK had fully booted and had an IP address, I re-inserted the drive.

This allowed me to run emrk-reinstall, which then partitioned the new drive and installed EdgeOS onto it.

The EdgeRouter Lite seems to be happily booting off the Kingston drive now.

 

New Member
Posts: 4
Registered: ‎07-05-2013

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

Thanks for the tools and instructions! A botched 1.4.1 upgrade led to discovering that my USB flash memory was bad. I replaced it with a Kingston 8GB drive and was able to restore directly to 1.4.1 using your process. Thankfully I had just backed up my config and was able to restore that too once the system was running.

Thanks again!

New Member
Posts: 2
Registered: ‎04-29-2014
Kudos: 1

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

Hey dmbaturin

I just wanted to say thanks a lot for this toolkit, it just saved me hours of fiddling about.  I really appreciate it!!!

Great work!

Tom

 

Veteran Member
Posts: 4,628
Registered: ‎03-12-2011
Kudos: 2272
Solutions: 112

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

So just a quick gotcha: I suspect this won't work on the ER8 or ER8 Pro, only on the ERL and ER-PoE.

The reason is that /dev/sda(1|2) is only used on the ERL/ER-PoE; the other models use /dev/mmcblk0p1 and /dev/mmcblk0p2.

Should be fairly simple to modify to make work with that, but ideally with some selector or something.
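
A minimal sketch of what such a selector could look like, written in Python purely for illustration. The device paths are the ones named above; the model strings and the function name `partitions_for` are hypothetical and not part of the rescue kit, and this only covers the device naming, not any other differences between the models.

```python
from typing import Tuple

# Hypothetical model names; only the device paths come from the post above.
SDA_MODELS = {"ERL", "ER-PoE"}      # storage appears as /dev/sda
MMC_MODELS = {"ER8", "ER8-Pro"}     # storage appears as /dev/mmcblk0


def partitions_for(model: str) -> Tuple[str, str]:
    """Return (first_partition, second_partition) for the given model."""
    if model in SDA_MODELS:
        return "/dev/sda1", "/dev/sda2"
    if model in MMC_MODELS:
        return "/dev/mmcblk0p1", "/dev/mmcblk0p2"
    # Refuse to guess: formatting the wrong device would be destructive.
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    for name in ("ERL", "ER8"):
        first, second = partitions_for(name)
        print(f"{name}: {first} {second}")
```

Raising on an unknown model rather than falling back to a default seems the safer choice here, since the caller is about to repartition whatever path comes back.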

Member
Posts: 209
Registered: ‎05-21-2013
Kudos: 254
Solutions: 8
Contributions: 3

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

Oh, it's not that easy. The ER8 also needs a newer kernel, and there weren't any Octeon SDK 3.0 binaries last time I checked. I also don't have a spare ER8 for experiments (since the ERL has a detachable USB stick, it was easy to fix it after unsuccessful attempts).
If you do have a spare ER8 and some time, you can try building the gcc source from SDK 3.0 and building the kernel with it.
Veteran Member
Posts: 4,628
Registered: ‎03-12-2011
Kudos: 2272
Solutions: 112

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

Ah, can't use the kernel that comes in the tarball? Guessing you need additional modules?

Member
Posts: 209
Registered: ‎05-21-2013
Kudos: 254
Solutions: 8
Contributions: 3

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

You can, but you still need a cross-compiler. The old SDK, the one the ERL kernel comes from, offered a precompiled toolchain and a (relatively) easy-to-use environment setup script.
Veteran Member
Posts: 4,628
Registered: ‎03-12-2011
Kudos: 2272
Solutions: 112

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)


dmbaturin wrote:
You can, but you still need a cross-compiler. The old SDK, the one the ERL kernel comes from, offered a precompiled toolchain and a (relatively) easy-to-use environment setup script.

Ah no, I mean can you not use the pre-built binary kernel from the ER-e200 firmware update tarball? (ie, the same kernel the EdgeRouter itself boots from).

Member
Posts: 209
Registered: ‎05-21-2013
Kudos: 254
Solutions: 8
Contributions: 3

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

Ah, I see. There are two problems: first, I'm not sure you can replace the initrd image in a binary kernel; second, there are proprietary Cavium modules included, so they need to be taken out of the initrd to avoid license problems.

Anyway, setting up a cross-environment is more tedious than hard. If you make statically linked GNU toolchain binaries or detailed instructions on how to set up the environment, this would be a valuable contribution per se. (:
Veteran Member
Posts: 4,628
Registered: ‎03-12-2011
Kudos: 2272
Solutions: 112

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

Only a licencing issue if you distribute it ;)

It may be possible to load an initrd via tftp without having to touch the vanilla kernel, go figure.

I might have a bit of a play and see if I can boot from uboot via tftp without relying on the flash at some point using only the vanilla kernel - should be able to test that on my spare dev ERL and, if it works, copy the concept on an ER8 (where my only "spare" one is serving my home network, which I generally prefer to be running :P).

Emerging Member
Posts: 42
Registered: ‎11-22-2013
Kudos: 28
Solutions: 2

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

I run my router on a UPS, but after a lengthy power outage the UPS turned off and I ended up with a squashfs error on reboot. I need to get it set up to shut things down on power failures, but they are so rare for me that I had kept putting it off.

Anyhow, this rescue kit was a piece of cake for me and had me rolling in just a few minutes.

thanks mate!

Regular Member
Posts: 710
Registered: ‎08-22-2013
Kudos: 189
Solutions: 15

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

[ Edited ]

Very helpful. Thanks for putting your efforts into this. Don't know exactly what caused my bootloader corruption, as I had shut the router down properly, only to put it on a UPS! Powered it back up and it wouldn't load my config. Rescue kit got the device back up and running. Thanks!

UBRSS, UEWA.


Milwaukee, WI
SuperUser
Posts: 4,561
Registered: ‎01-10-2012
Kudos: 1998
Solutions: 224

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

[ Edited ]

Ugh - I think I have a hardware issue :(  I get:

 

Formatting root partition
------------[ cut here ]------------
WARNING: at ¨:512 0xffffffff8131cd64()
Call Trace:[<ffffffff81107b28>] 0xffffffff81107b28

 [...]

Disabling lock debugging due to kernel taint

*** NMI Watchdog interrupt on Core 0x0 ***
$0 0x0000000000000000 at 0x0000000000000001
v0 0x0000000000000001 v1 0x0000000000010000
a0 0xa80000041f571498 a1 0x0000000000000000
a2 0x0000000000005bc9 a3 0xa80000041f5ef540
a4 0xa80000041f5ef580 a5 0xa80000041f5beb00
a6 0x00000000000f4240 a7 0xffffffff81f759a0
t0 0xa80000041e24ffe0 t1 0x0000000000008c00
t2 0xa80000041f434000 t3 0x0000000000000000
s0 0xa80000041ddabc00 s1 0xa80000041ddadc60
s2 0xa80000041f5711e0 s3 0xa80000041f41f540
s4 0xffffffffffffabd9 s5 0xa80000041f5ef6c0
s6 0x0000000000000001 s7 0xffffffff81880000
t8 0x0000000000000002 t9 0xffffffff812f6bd0
k0 0x0000000000000000 k1 0x0000000000000000
gp 0xa80000041e24c000 sp 0xa80000041e24f560
s8 0x0000000000000000 ra 0xffffffff81327ad0
err_epc 0xffffffff8110b2d4 epc 0xffffffff813baa70
status 0x0000000010488ce4 cause 0x0000000040808c00
sum0 0x010000f000008000 en0 0x0100400700008000
*** Chip soft reset soon ***

Sigh.  And apparently my unit is out of warranty :(  Shame, I really haven't used it that much - it's been on maybe three months :(

It doesn't always boot up right away either - I'll sometimes have to unplug it, wait a minute, plug it in - and about every fifth time it will then boot (which pops on the terminal display with no delay).  Very frustrating...

 

EDIT:  OK, I took the case off (it's out of warranty so what the heck), aimed a big fan at it, and after a few false starts it's back and the latest version is loaded!

Thanks to @dmbaturin for his excellent writeup. I didn't bother fiddling with my DHCP server - it's only four commands to bootstrap the thing from the TFTP server.

So it looks like a hardware issue of some sort, and it's heat related.  Obviously I can't deploy this thing like this so now I get to decide if I want to try again or go elsewhere :(

Established Member
Posts: 919
Registered: ‎05-28-2012
Kudos: 183
Solutions: 6

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

[ Edited ]

Tried another USB?


If the problem is happening during formatting/writing, it could be the flash going bad.

I've used a Kingston DT SE9 8GB, which fits (just about; it sits quite close to the edge of the case). I didn't pick this one for the ERL, it was just spare at the time my flash failed. Alternatively, I've heard good things about the SanDisk Cruzer Fit; I believe a few people on here have used them as replacement flash.

Others will probably work, but due to the way the USB port is mounted on the PCB it has to be a relatively slim drive.

 

You should be able to pick up a suitable USB drive fairly cheaply, so it might be worth a shot.

 



Veteran Member
Posts: 4,628
Registered: ‎03-12-2011
Kudos: 2272
Solutions: 112

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)


EricE wrote:

Ugh - I think I have a hardware issue :(  I get:

Formatting root partition
[...]
*** NMI Watchdog interrupt on Core 0x0 ***
[...]
*** Chip soft reset soon ***

[...] it's been on maybe three months [...]

So it looks like a hardware issue of some sort, and it's heat related. [...]


Sounds like the memory bug. You mention 3 months, that sounds like it's still in warranty?

What is the manufacture date of the device? If it's one of the first manufactured ones it could well be the known memory issue bug. Probably worth emailing support@ even if it is out of the warranty period as it was a known hardware issue that affected some of the earliest units and not something you could have caused/etc. The fact that it was sitting on a shelf for <x> time unused so you couldn't spot the issue earlier doesn't change that the issue would have surfaced fairly quickly if you had started using it straight away.

In the bootloader run "mtest", if it hangs or errors out then it's probably the memory bug. It should just count up indefinitely.

Regular Member
Posts: 710
Registered: ‎08-22-2013
Kudos: 189
Solutions: 15

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

When my recent ERL went down for the count it was hot as hades. It's an older plastic ERL, though my mtest kept counting up in iterations indefinitely. Although I got it back to life thanks to the rescue kit, I'm leery about putting it back out into prod. For now, I'm endorsing the 8 port model as it's rack mountable and has fans.

New Member
Posts: 26
Registered: ‎09-22-2013
Kudos: 5

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

[ Edited ]

I second the high temperature problem! I have two of the older plastic type, and one of the newer metal ones. They all get VERY hot.. the older plastic one actually has more internal aluminum heatspreaders.. 

The only ERL I had fail (corrupted file system multiple times) was, surprisingly, the newest metal case one. Sent it in for RMA, which was easy. Did you try to RMA yours? They may still take it back for this type of hardware failure.

Since observing +70C on the outside of the plastic case ERL with my IR thermometer, I have put small 12V DC 80mm fans blowing on all my ERLs from the side. Just get a small case fan for a computer and hardwire it to a 12V DC wall wart, then aim it at the ERL case. Seems to keep their temps MUCH more reasonable.

The lifetime of electronic components like the silicon devices and also ceramic multilayer capacitors is directly related to temperature. I love the ERL but their thermal design is inadequate if you ask me. External fan fixes this. I would not run one without an external fan after observing such high temps.

Oh, and one trick to keep the fan quiet: most 12V DC case fans will run at a lower speed at 5V. On one of my routers I am actually using a small 12V DC squirrel cage blower from an old broken iMac CPU cooler. Run at 5V it is almost inaudible, and I have it aimed at the bottom of the EdgeRouter Lite where the heatspreader is, which keeps the bottom barely above room temp.

Cheers!
Chester

Veteran Member
Posts: 4,628
Registered: ‎03-12-2011
Kudos: 2272
Solutions: 112

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)


hilo90mhz wrote:

The lifetime of electronic components like the silicon devices and also ceramic multilayer capacitors is directly related to temperature. I love the ERL but their thermal design is inadequate if you ask me. External fan fixes this. I would not run one without an external fan after observing such high temps. 


Eh, I've found the biggest thing affected by heat is big electrolytic capacitors (usually found in power circuitry), not the tiny ceramic caps.

I think the USB flash failures are just due to crappy flash (these things are built down to a price point, I've seen decent flash drives sold for more than an entire ERL) and unrelated to heat.

New Member
Posts: 26
Registered: ‎09-22-2013
Kudos: 5

Re: EdgeMax rescue kit (now you can reinstall EdgeOS from scratch)

[ Edited ]

You're right, electrolytic capacitors are also especially affected by heat, and because of their liquid electrolyte they can dry out when the rubber seals inevitably fail.

Ceramic caps do also fail though. I had an ASUS laptop motherboard fail, traced it to an internally shorted ceramic capacitor right near the CPU, removed the shorted cap, and the laptop works again.

I agree the USB drives are probably not the best quality and their lifetime does worry me. But I disagree about heat being unimportant in this issue; look up some charts of silicon lifetime versus temperature and you will see what I mean. The flash drives are just silicon like the CPU. In the standard ERL setup the flash drive is probably sitting at 70C 24/7, and most flash drives are not exposed to this sort of temperature for extended periods of time.
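
For a rough feel of the temperature dependence, lifetime charts of this kind are commonly based on the Arrhenius relation, where the acceleration factor between two temperatures is exp((Ea/k) * (1/T_cool - 1/T_hot)) with temperatures in kelvin. The sketch below is only an illustration: the activation energy of 0.7 eV is an assumed, commonly quoted ballpark rather than a measured value for the ERL's flash, and the 40C comparison point is arbitrary.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_cool_c: float, t_hot_c: float,
                           ea_ev: float = 0.7) -> float:
    """How much faster thermally activated wear runs at t_hot_c than at t_cool_c.

    ea_ev is the activation energy in eV; 0.7 is an assumed illustrative value.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # 70C is the case temperature reported in this thread; 40C is an assumed
    # "with a fan" temperature chosen only for comparison.
    factor = arrhenius_acceleration(40.0, 70.0)
    print(f"wear at 70C vs 40C: about {factor:.1f}x faster")
```

Under these assumptions the factor comes out a bit under 10x, which is the sort of gap those charts show; a different activation energy would move the number considerably.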
