
Distant Curve - A Story About Coming Unstuck...
Distant Curve Remote Area Communications prides itself on considered and redundant design for remote areas and challenging environments. Having said that, we've learnt to expect the unexpected, from cockatoos eating GPS units to wallabies jumping on our solar panels...
We had one such event today... The attached graph shows the temperature inside 2 of our sites in the central-northern Northern Territory (each 30 km apart). There are a couple of unusual things about this graph. The blue line (a site called 'Yakulla') shows a dramatic drop in temperature just after sunrise - that's the fans activating just before sunrise and pulling in cooler air, as they're meant to. The orange line (Kankawalla) shows no such drop - in fact it just continued to climb, peaking at nearly 50 degrees Celsius at about lunchtime. It looked to us like all 5 fans had stopped working, which is unusual in itself because they're on two separate circuits for redundancy and we use expensive fans with ceramic bearings for long life.
Logging into the cameras at about 7am this morning confirmed it - we could not hear the fans. With the equipment inside the boxes looking like it might reach 70 degrees Celsius today, we knew we needed to take some action. Flying out there is incredibly expensive - around $3000 return. But what else could we do? All of the gear inside each control box is industrial rated (we tend to use Rockwell, Allen-Bradley, Hirschmann etc.), but a total ventilation failure is nonetheless not only unusual but potentially problematic. So, time for some analysis. What was the cause? Stuck fan control relays? A problem with the controller? Hot temperatures causing a false trigger of the polyswitch fuses?
We custom designed the controllers for each site. We call these site controllers 'CurveIQ'. They're an embedded expert system that controls all the subsystems at each site, they're based around an Atmel processor, and they're at the heart of what makes our systems so reliable. They do things like control battery charging very accurately, monitor state of charge, autonomously interrogate devices and troubleshoot in the event of a link failure, and fix and report problems with redundancy subsystems.
Amongst many other things they control the fans based upon temperature, but we can also switch the fans on and off remotely, with about a 2 minute delay between executing the instruction and the fans following it. So we tried power cycling the fans a few times... Through the camera we could hear the fan control relays clicking when we executed these commands, but the fans still wouldn't start. That left us with 1 of 3 causes: a thermal effect on our (self-resetting) fuses, poor contact on our relays, or perhaps a progressive failure of our fans.
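As a rough picture of "temperature control with a remote override", here is a minimal sketch. It is not the CurveIQ firmware: the type names, the two thresholds and the use of hysteresis (separate switch-on and switch-off temperatures) are all assumptions made for illustration.

```cpp
// Hypothetical sketch, not the CurveIQ code.
// Remote commands take priority; otherwise the fans follow temperature,
// with two thresholds so they don't chatter on and off around one value.
enum class Override { None, ForceOn, ForceOff };

class FanController {
public:
    // onAboveC and offBelowC are illustrative thresholds in degrees Celsius.
    FanController(double onAboveC, double offBelowC)
        : onAboveC_(onAboveC), offBelowC_(offBelowC), fansOn_(false) {}

    // Returns true if the fans should be running after this reading.
    bool update(double tempC, Override remote) {
        if (remote == Override::ForceOn) {
            fansOn_ = true;
        } else if (remote == Override::ForceOff) {
            fansOn_ = false;
        } else if (tempC >= onAboveC_) {
            fansOn_ = true;
        } else if (tempC <= offBelowC_) {
            fansOn_ = false;
        }
        // Between the two thresholds the previous state is kept.
        return fansOn_;
    }

private:
    double onAboveC_;
    double offBelowC_;
    bool fansOn_;
};
```

With thresholds of, say, 35 and 30 degrees, a reading of 32 leaves the fans in whatever state they were already in, while a remote ForceOn or ForceOff wins regardless of temperature.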
So what could we do? If it was a polyswitch problem, it would probably resolve itself in a week or so when the temperatures drop - but it seemed really unlikely that this was the cause.
One huge benefit of having designed and written the code for CurveIQ ourselves is that we can get it to do novel stuff... So after a bit of thinking, we wrote an extra function in C++ and uploaded the new firmware through an encrypted tunnel, 3000 km into the desert, onto the Atmel chip of the controller. The code looked something like this:
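#include <avr/wdt.h>  // wdt_reset()

// Toggle the fan relay outputs 200 times (about 40 seconds in total).
void rattleRelays()
{
    for (int i = 0; i < 200; i++) {  // 200 cycles x 200 ms = 40 s
        wdt_reset();                 // kick the watchdog so it doesn't reset us mid-rattle
        PORTC = B11111100;           // bits 0 and 1 low (the two fan circuits), other pins high
        delay(100);
        PORTC = B11111111;           // all PORTC pins high again
        wdt_reset();
        delay(100);
    }
}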
The basic idea (or 'hope') was to rapidly open and close the relays on both sets of fans about 5 times a second for 40 seconds, to 'rattle' the fans and hopefully get them turning again. After uploading, we told the new function to execute. Straight away we could hear the relays rattling through the cameras - a sound a bit like a woodpecker - and gradually, after about 30 seconds, we started to hear the familiar 'hum' of the fans kicking in.
If you have a look at the graph you'll see the temperature dropped substantially after 12pm (when we executed 'the rattle'), and it continues to drop. It was a success.
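To check the numbers in that description: each pass through the loop holds the relay pins one way for 100 ms and the other way for 100 ms, so one full cycle takes 200 ms. A minimal sketch of the arithmetic, assuming 200 cycles; the struct and function names are made up for illustration, and the time taken by the instructions themselves is ignored.

```cpp
#include <cstdint>

// Hypothetical names; the figures come from the loop in the post.
struct RattleTiming {
    std::uint32_t cycles;  // number of on/off cycles
    std::uint32_t lowMs;   // delay with the relay pins low
    std::uint32_t highMs;  // delay with the relay pins high
};

// Total run time of the rattle in milliseconds.
constexpr std::uint32_t totalMs(RattleTiming t) {
    return t.cycles * (t.lowMs + t.highMs);
}

// Whole on/off cycles completed per second.
constexpr std::uint32_t cyclesPerSecond(RattleTiming t) {
    return 1000 / (t.lowMs + t.highMs);
}

constexpr RattleTiming kRattle{200, 100, 100};

static_assert(totalMs(kRattle) == 40000, "200 cycles of 200 ms is 40 s");
static_assert(cyclesPerSecond(kRattle) == 5, "one cycle per 200 ms is 5 per second");
```

The same layout also shows that wdt_reset() is called roughly every 100 ms, so the watchdog is serviced throughout the 40 seconds.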
We're now fairly certain that the problem was due to the recent dust storms in the area. It seems that the very fine desert dust (like talcum powder) got through the filters and into the fans' ceramic bearings, which have very fine tolerances, and clogged them. With the benefit of hindsight, the power consumption of the box suggests that the fans had been failing one at a time over the last few days, until this morning all 5 had failed.
We thought we were doing the right thing by using open ceramic bearings for long life, but in future we'll be using more traditional fans with grease-encapsulated, sealed sleeve bearings. Managing these remote sites sometimes feels a bit like managing something on Mars, but we've got enough redundancy in each system that we can address almost any problem. At no time during this event did our client lose connectivity, so for that reason I consider this unexpected failure to be a success. We've got a trip planned out there in March and the weather is due to cool down over the next 2 weeks or so, so we'll replace these fans when we're out there instead of being forced to rush out. In the meantime we'll just leave the fans running 24/7 - momentum usually keeps a dicky fan going.
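As a rough illustration of how box power consumption could hint at how many fans are still turning, here is a minimal sketch. It is not the CurveIQ code: the function name and every wattage figure are hypothetical, and it assumes a stopped fan draws roughly nothing, which may not hold for a fan that is stalled but still powered.

```cpp
#include <cmath>

// Hypothetical sketch: estimate how many fans are running from total box power.
//   measuredWatts - current power draw of the whole box
//   baselineWatts - draw of everything except the fans (assumed known)
//   wattsPerFan   - draw of one healthy, running fan (assumed known)
//   fanCount      - number of fans fitted
int estimateRunningFans(double measuredWatts, double baselineWatts,
                        double wattsPerFan, int fanCount) {
    if (wattsPerFan <= 0.0 || fanCount <= 0) {
        return 0;
    }
    // Whatever is left above the baseline is attributed to the fans.
    const double fanWatts = measuredWatts - baselineWatts;
    long n = std::lround(fanWatts / wattsPerFan);
    // Clamp to the physically possible range.
    if (n < 0) n = 0;
    if (n > fanCount) n = fanCount;
    return static_cast<int>(n);
}
```

Logged once a day, an estimate like this stepping down from 5 towards 0 would match the "failing one at a time" pattern described above.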
Location: Tennant Creek NT, Australia
Products used: airfiber-5x (28)
Description: A series of radio repeaters to take high speed internet to an industrial site in the Northern Territory