GPy refuses to LTE attach after running for months

curtis.hendrix

I had 4 GPys running in devices for about 6 months with firmware 1.18.2 (I don't know which "r" version), and at least 2 of them now refuse to attach to an LTE connection. Here's what I tried on one GPy:

Erase the MicroPython file system with os.mkfs('/flash')
Clear the modem blacklist with the following code:

from network import LTE
lte = LTE()
lte.send_at_cmd('AT+CRSM=214,28539,0,0,12,"FFFFFFFFFFFFFFFFFFFFFFFF"') 
lte.send_at_cmd('AT+CRSM=176,28539,0,0,12')

Manually try to connect to LTE in the REPL with:

from network import LTE
lte= LTE()
lte.reset()
lte.attach()

Every call to lte.isattached() returned False.

Reflashed modem firmware 41065.
Tried a different (known working) SIM card.
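
The manual attach attempt in the steps above can be wrapped in a polling loop with a timeout, which makes it easier to tell a slow attach from one that never happens. This is only a minimal sketch, assuming an lte object that exposes attach() and isattached() as used in this thread. The helper name wait_for_attach, the 120 s timeout and the 2 s poll interval are illustrative choices, not Pycom recommendations.

```python
import time


def wait_for_attach(lte, timeout_s=120, poll_s=2, sleep=time.sleep):
    """Call lte.attach() and poll lte.isattached() until attached or timed out.

    Returns the number of polls it took to attach, or None on timeout.
    `lte` is any object with attach() and isattached() methods.
    """
    lte.attach()
    polls = 0
    waited = 0
    while waited < timeout_s:
        polls += 1
        if lte.isattached():
            return polls
        sleep(poll_s)          # hypothetical default; tune for your network
        waited += poll_s
    return None
```

On a GPy this would be called as wait_for_attach(LTE()) after from network import LTE. A None result after a generous timeout matches the failure described here.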

I finally tried updating the firmware on the GPy (not the modem) to the latest (1.20.1.r1) and selected "Erase during update", and now the GPy with original SIM has no issues with LTE.

Is there a known bug with the 1.18.2 firmware where LTE has not been stable with long term use? Anyone else have this issue?

I should be getting the other GPys back in the near future to investigate further.

kjm

@kjm I spoke too soon. I've got 2 x GPYs now that won't attach & all I've done is to try the lte.send_at_cmd('AT+CEREG=2;+CEREG?;+CEREG=0') cmd on them. Seems an improbable result I know but 2 out of 2 is enough for me, I'm not about to try it on a 3rd GPY just to get the hat trick & prove my point!

kjm

@curtis-hendrix Gee that's a bit scary, I'm running 41065 on my GPY modems too. Modem updating seems to be a bit of a stumbling block for pycom. When I asked them why I can't get a copy of the 43818 modem firmware in my latest GPY purchase for my other GPYs they said it was because they're still testing the upgrade packages.

One thing I've learned about the monarch modems is they have a lot of sensitivity to the 3v3 rail. I've lost 2 modems from when I was powering the GPY via USB. It took a while for me to realise that the frequent resets we were experiencing were from dips in the 3v3 rail when the sqns tried to transmit. Since we've been running our GPYs from a LiPo we haven't had any more power related resets & haven't had any more modem failures.

curtis.hendrix

I'm starting to suspect something is going horribly wrong with the modem. I have 2 GPys that appear to be operating normally, but will not do an LTE connection. I've wiped / updated the firmware on them and switched the file system to LittleFS.

HOWEVER, neither one can do a modem firmware update. I'm using the UART Recovery method outlined here: https://docs.pycom.io/tutorials/lte/firmware/

The devices will enter the upgrade state just fine when calling sqnsupgrade.uart(True), and I don't have any issue getting sqnsupgrade.run('COM8', 'C:/Users/Curtis/Downloads/CATM1-41065/CATM1-41065/CATM1-41065.dup', 'C:/Users/Curtis/Downloads/CATM1-41065/CATM1-41065/updater.elf') to start.

However, both devices fail with Failed to start STP mode! after a long wait at <<< Welcome to the SQN3330 firmware updater [1.2.5] >>>.

So it appears there's an issue within the modem firmware (both were running CATM1-41065 before they died) that bricks the modem.

kjm

@curtis-hendrix said in GPy refuses to LTE attach after running for months:

Bringing this guy back from the dead

I ain't dead, not yet anyway. Although sometimes I think pycom are trying to kill me with frustration! Last month I spotted a little item in the pycom twitter feed that shows up sometimes on the right of this forum's homepage. They were rabbiting on about new GPYs being better RF calibrated, so I got hold of a new GPY. It came with v1.20.0.r4 firmware (43818 in the lte modem). Been running it now for a couple of weeks & it is much better behaved than anything I've seen so far. Not sure if this is down to it being one of the new calibrated variety or the different firmwares (currently using 1.20.1.r1/41065 in my other GPYs).

If I can find out where to get an upgdiff_41065-to-43818 I'll get all my GPYs on the same firmwares to see if the improved behaviour is down to the firmwares or the calibration.

securigy

@curtis-hendrix I ran some of these AT commands followed by lte.attach() for a while (it worked), and then I read somewhere, from a Pycom person, that all these AT commands (bar some of them) are obsolete and should not be used. So I commented them out, and my FiPy attaches just fine, although lately it takes longer on average. I am running the non-official 1.20.1.r1 provided by Robert_hh.

curtis.hendrix

Bringing this guy back from the dead because I got 2 more GPys back from the field with this same behavior and I'd LOVE to figure out why.

To summarize:

GPys are using Hologram in the US for cell service.
They are frequently power cycled. They run off wall power and are moved around at least once per week.
They were fully functioning when they were put out into the field.
After several months (4-8) of flawless operation, they refuse to connect via LTE. All calls to lte.connect() return false.

In response to @kjm, when I power up one of these problematic devices and execute the following:

from network import LTE
lte = LTE()
lte.reset()
mood=lte.send_at_cmd('at+cpin?').split('\r\n')[1]; print('mood =', mood)

I get mood = +CPIN: READY

I then executed lte.attach() and got back +SYSSTART for mood. I sent the AT command again but got back ERROR, as shown below.

>>> lte.attach()
>>> mood=lte.send_at_cmd('at+cpin?').split('\r\n')[1]; print('mood =', mood)
mood = +SYSSTART
>>> mood=lte.send_at_cmd('at+cpin?').split('\r\n')[1]; print('mood =', mood)
mood = ERROR
>>>

So it looks like something is going sideways in the attach process.

How can I figure out what's going wrong?

crumble

@kjm said in GPy refuses to LTE attach after running for months:

@andreas I was talking 20s to wakeup, attach/connect, read sensors, upload-to-server then detach/disconnect, which includes the program load time. Once we went above ver 1.18.1.r7 this time doubled. Some of it seemed to be a longer load time, but a lot of it was down to longer everything; it was like boot.py just ran much slower.

You can reduce the time for program loading by using precompiled *.mpy files or frozen modules. The frozen ones need more attention but can do more, for example if you decide to put some resources into the flash, like large immutable strings.

The mpy files don't need to be compiled at load time, so this is much faster. Additionally, you do not fragment your RAM, which in theory is a little faster as well.

With firmware 1.20.1 the mpy files will be again faster, because the new micropython version uses smaller mpy files.
The pure startup procedure shall be a lot faster with the latest version. The remaining time depends on how well you can use the waiting time for the network connections.

A further observation on file system problems. It seems to be intimately related to deepsleep. I now have 2 GPYs under continuous test. The first one deepsleeps for 9.5mins out of every 10 & does not bother to reboot if it has a problem, just runs a new cycle every 10 min. It has a lot of instances where it wakes up & decides it has a syntax error in boot.py or no boot.py, in which case it rebuilds the file system. For 144 deepsleeps/day it will usually crash within a day or two.

This does not sound like a file system problem. It seems that you are running out of memory. Precompiled files will help, as will a careful look at your allocations. Your memory may be too fragmented for compiling large files, so import any files that are not precompiled as early as possible.

If I am right, the SPI RAM cannot be addressed directly for all objects, so you still have to deal with the small internal RAM and its fragmentation.
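
One way to check the out-of-memory theory is to log the free heap around the large imports. This is a rough sketch only: report_heap is a made-up helper name, and gc.mem_free() is a MicroPython function that desktop Python lacks, so the code falls back to None there.

```python
import gc


def report_heap(label):
    """Run a collection, then print and return the free heap in bytes.

    Returns None where gc.mem_free() does not exist (e.g. desktop Python).
    """
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)  # MicroPython only
    free = mem_free() if mem_free else None
    print(label, "free heap:", free)
    return free
```

Calling report_heap('before imports') and report_heap('after imports') at the top of boot.py or main.py should show how much headroom is left when the compile step runs.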

but it often runs for a month or two before crashing the file system.

Close your logfiles before going to sleep. Add some waiting time, so that they are really closed.
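
A minimal sketch of that advice, assuming the log is a plain text file on /flash. The names append_log and sleep_safely and the 1 s settle time are illustrative only. The deepsleep argument is whatever function puts the board to sleep (machine.deepsleep on the device), passed in so the code can also be tried off the board.

```python
import os
import time


def append_log(path, line):
    """Append one line; the file is flushed and closed before returning."""
    with open(path, 'a') as f:
        f.write(line + '\n')
        f.flush()
    # os.sync() is not available on every port, so call it only if present.
    sync = getattr(os, 'sync', None)
    if sync:
        sync()


def sleep_safely(deepsleep, sleep_ms, settle_s=1, wait=time.sleep):
    """Wait briefly so pending writes can settle, then call deepsleep(sleep_ms)."""
    wait(settle_s)
    deepsleep(sleep_ms)
```

On the device the call would look like sleep_safely(machine.deepsleep, 570000) for the 9.5 minute sleep described above.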

Give erase_all a try. Some updates or bad files seem to need a full wipe rather than a simple overwrite, and this feature seems to help with a lot of strange-behaviour issues. It belongs in the firmware update GUI.

andreas

Dear @kjm,

thanks for sharing these details about your setup and the experiences around it. As we have never used 1.18 but started development on 1.20 already, we haven't had the same opportunity to compare boot times between these releases.

Saying this, it is obvious that you are hesitant to move forward to a new firmware which is slower from the perspective of the experiences you had.

However, I still encourage you to try freezing your Python modules into a custom-baked firmware as this will be totally resilient against file system corruption (this will probably also work on 1.18) and then check the runtimes again.

With some luck, you might get the runtime to an equal amount on 1.20 already. On our thing [1], we are finished within 26s for the whole cycle [2]. However, we are currently on WiFi instead of LTE, which might consume additional time for attaching to the network.

Will be happy to hear about your outcomes.

With kind regards,
Andreas.

[1] https://github.com/hiveeyes/hiveeyes-micropython-firmware
[2] https://community.hiveeyes.org/t/micropython-module-freezing/2445/10

kjm

@andreas I was talking 20s to wakeup, attach/connect, read sensors, upload-to-server then detach/disconnect, which includes the program load time. Once we went above ver 1.18.1.r7 this time doubled. Some of it seemed to be a longer load time, but a lot of it was down to longer everything; it was like boot.py just ran much slower.

A further observation on file system problems. It seems to be intimately related to deepsleep. I now have 2 GPYs under continuous test. The first one deepsleeps for 9.5mins out of every 10 & does not bother to reboot if it has a problem, just runs a new cycle every 10 min. It has a lot of instances where it wakes up & decides it has a syntax error in boot.py or no boot.py, in which case it rebuilds the file system. For 144 deepsleeps/day it will usually crash within a day or two.

The other GPY doesn't deepsleep; it simply loops the program with a while 1: statement at the same 6 cycles/hr as the deepsleeper. This GPY reboots itself if it can't get an lte connection, or the modem starts returning empty responses to lte.send_at_cmd('AT+VZWRSRP'), or the DS18B20 sensor won't give a temperature. On average it reboots about 45 times/day. Since it reloads boot.py from flash approximately 3 times less frequently, I expected it to last about 3 times longer, but it often runs for a month or two before crashing the file system.

I've swapped the GPYs over, so it's not down to hardware; there is something about deepsleep that is harder on the FAT file system than a machine.reset().

andreas

@kjm said in GPy refuses to LTE attach after running for months:

wakeup from deepsleep cycle of just 20s

Don't know if we are talking about the same thing, but we experienced similar startup times of up to 30(!) seconds with our relatively large datalogger program we are currently conceiving [1] and found this to be considerably slow.

If you are coming from embedded programming on the C/C++ level, you are used to having startup times in the microsecond or millisecond range.

Saying this, we have been extremely happy to see MicroPython 1.11 coming to the Pycom Firmware Release 1.20.1.r1 as this improves module loading time significantly.

Now, we are down to 5 seconds loading time [2] and we wouldn't look back.

[1] https://github.com/hiveeyes/hiveeyes-micropython-firmware
[2] https://community.hiveeyes.org/t/micropython-module-freezing/2445/10

kjm

@andreas Our problem is we are very keen on the release 1.18.1.r7 we're on now; it has a wakeup from deepsleep cycle of just 20s every 10 minutes. Any release after that has bloated out to double that time, with devastating consequences for our battery. Is there any way to get LittleFS in 1.18.1.r7?

andreas

Dear @kjm,

we can second the observations made by @tlanier, see also our journey at [1].

You should definitely go with LittleFS instead of FatFS. LittleFS is resilient to power failures, has improved wear leveling features and is more efficient in general.

With kind regards,
Andreas.

[1] https://community.hiveeyes.org/t/fipy-verliert-programm-nach-power-off-durch-leeren-lipo-file-system-corruption-through-brownout-conditions/2057

tlanier

@kjm We were also having file system corruption problems until we switched to the LittleFS. Since then we have not had issues.

kjm

@curtis-hendrix I've been tinkering with half a dozen different GPYs for a year now, the file system is fragile. The longest I've ever got out of any of 'em is a couple of months at 6 deepsleep cycles/hr. Mostly, file system corruption will lead to anything from your program returning 'syntax error' when it tries to run to the gpy rebuilding the file system with blank boot.py & main.py, but it can also cause bizarre 4g modem behaviour too.

One trick I've learned is that at+cpin? (a command normally used to check for sim presence) can tell you a lot about where things are at. If you run

mood=lte.send_at_cmd('at+cpin?').split('\r\n')[1]; print('mood =', mood)

and you get anything other than READY or +SYSSTART as the reply you can forget attachment, it ain't gonna happen!
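
The one-liner above raises an IndexError if the modem returns an empty or single-line response, so a slightly more defensive version may help when scripting the check. This is just a sketch: modem_mood and can_attach are made-up helper names, and the exact response layout ('\r\n'-separated lines with the status first) is an assumption based on the output shown in this thread.

```python
def modem_mood(response):
    """Return the first non-empty line of an AT response, or None if there is none."""
    for line in response.split('\r\n'):
        line = line.strip()
        if line:
            return line
    return None


def can_attach(mood):
    """Rule of thumb from this post: only READY or +SYSSTART looks promising."""
    return mood is not None and ('READY' in mood or mood == '+SYSSTART')
```

On the device, the input would be lte.send_at_cmd('at+cpin?'), for example can_attach(modem_mood(lte.send_at_cmd('at+cpin?'))).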

curtis.hendrix

@jcaron I think both were on Verizon.

jcaron

@curtis-hendrix From what I understand Hologram is just an MVNO, they use others' networks. Do you know which actual network each of the devices connects to?

Note that even when all on the same network, it may be the case that different parts of the network see changes at different times.

However, it's interesting that you saw the same issue at the remote location and back at base. Makes it less likely this is the issue (though not impossible).

robert-hh

@curtis-hendrix Yes, that would have been my suggestion, to completely wipe the devices, including a full erase of the flash, and do a clean new set-up.

curtis.hendrix

@jcaron All of my GPys are on Hologram. I have 4 total, 2 that are still functioning and 2 that stopped working. All of them had the same hardware and firmware.

I know it's not a lot of information to go on, especially since I "fixed" the one I have by updating the firmware. Once I get the other broken one back I'll hopefully be able to dig in deeper and figure out exactly what went sideways.

jcaron

@curtis-hendrix What network are you using? Do you have more devices using the same hardware/firmware and the same network which continued to work?

There have been instances in the past of networks changing settings on their side suddenly breaking things for devices which were running perfectly fine until then, it may be something similar in your case. The newer firmware may have different settings which are compatible with the new carrier settings while the old one wasn't.

This is all pure speculation of course, just a possibility to explore.
