LoPy4 Crash with NanoGateway
We are testing the NanoGateway with our stack and keep getting crashes. We are trying to narrow it down, but wanted to see if anyone had seen something similar and had any suggestions. The crash usually happens within 5 to 10 minutes of startup, sometimes sooner.
Guru Meditation Error: Core 1 panic'ed (LoadProhibited) . Exception was unhandled.
Register dump:
PC : 0x400fc078 PS : 0x00060d30 A0 : 0x800dd76c A1 : 0x3ffd8a20
A2 : 0x00000001 A3 : 0x00000001 A4 : 0x00000000 A5 : 0x00000001
A6 : 0x00000000 A7 : 0x00060023 A8 : 0x800fae6a A9 : 0x3ffe7ce0
A10 : 0x00000001 A11 : 0x00000001 A12 : 0x3f953bc0 A13 : 0x3f953bc0
A14 : 0x3ffe7e50 A15 : 0x3ffe7db0 SAR : 0x00000011 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000001 LBEG : 0x4009aa8c LEND : 0x4009aa97 LCOUNT : 0x00000000
Backtrace: 0x400fc078:0x3ffd8a20 0x400dd769:0x3ffd8a40 0x400dd7d3:0x3ffd8a60 0x400ddc75:0x3ffd8a90 0x40105a66:0x3ffd8ab0 0x400f3e2e:0x3ffd8ae0 0x400f07c5:0x3ffd8b10 0x400fbb43:0x3ffd8b30 0x400f3eb8:0x3ffd8bd0 0x400f07c5:0x3ffd8c10 0x400f082d:0x3ffd8c30 0x400fbbd9:0x3ffd8c50 0x400f3eb8:0x3ffd8cf0 0x400f07c5:0x3ffd8d70 0x400f07f2:0x3ffd8d90 0x400f1213:0x3ffd8db0 0x40108e25:0x3ffd8e40 0x40108e55:0x3ffd8e70 0x400f3e2e:0x3ffd8e90 0x400f07c5:0x3ffd8ec0 0x400fbb43:0x3ffd8ee0 0x400f3eb8:0x3ffd8f80 0x400f07c5:0x3ffd9010 0x400f07f2:0x3ffd9030 0x400ddd07:0x3ffd9050 0x400ddfd0:0x3ffd90f0 0x400dcf4b:0x3ffd9110
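For anyone wanting to turn a dump like this into function names: each Backtrace entry is a PC:SP pair, and the PC values can be passed to the toolchain's addr2line together with the ELF file of the firmware build that was running. The sketch below only extracts the PCs and assembles the argument list. The helper names and the ELF path are hypothetical, and it assumes the xtensa-esp32-elf toolchain is installed and that you have the ELF matching the flashed firmware.

```python
import re

# A backtrace entry looks like 0x400fc078:0x3ffd8a20 (program counter : stack pointer).
PAIR = re.compile(r"(0x[0-9a-fA-F]{8}):(0x[0-9a-fA-F]{8})")


def backtrace_pcs(panic_text):
    """Return the program counter addresses from the Backtrace part of a panic dump."""
    marker = panic_text.find("Backtrace:")
    if marker < 0:
        return []
    return [pc for pc, _sp in PAIR.findall(panic_text[marker:])]


def addr2line_args(elf_path, pcs):
    """Build an argument list for addr2line; elf_path is whatever ELF matches your build."""
    return ["xtensa-esp32-elf-addr2line", "-pfiaC", "-e", elf_path] + list(pcs)


dump = "LCOUNT : 0x00000000 Backtrace: 0x400fc078:0x3ffd8a20 0x400dd769:0x3ffd8a40"
pcs = backtrace_pcs(dump)
args = addr2line_args("application.elf", pcs)  # hypothetical file name
```

The resulting list can be handed to subprocess.run on the development machine. Without the matching ELF the addresses cannot be resolved, so this is only useful for a firmware build you have the symbols for.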
@jmarcelino Thanks. I was thinking of something like the FT2232H. I agree it is difficult, but this is for academic research, so I think it is worth it. :)
I have just read that page, but today the OpenOCD page listing the compatible JTAG adapters seems to be down.
I've used both an FTDI FT2232H adapter (you must disable the FTDI VCP driver on Mac/Linux) and a Segger J-Link.
That said, in practice debugging the MicroPython task over JTAG is difficult.
@seb Do you recommend any JTAG adapter for LoPy debugging?
JTAG TDO =
JTAG TDI =
JTAG TMS =
JTAG TCK =
JTAG TRST_N =
@juanma Thanks. The JTAG pins are broken out on the LoPy. You just have to cross-reference the ESP32 datasheet to work out which ones they are. When I get to that point I'll send them to you if you want.
@ssmith I have configured the Eclipse IDE following this guide, although the guide is not complete: you also have to add all the include paths of the project.
I don't know anything about JTAG, although I am looking for a solution too. It seems the JTAG pins are not listed in the LoPy datasheet, so it is going to be difficult.
A bit more digging into this. I changed the UDP_THREAD_CYCLE_MS constant in lorawan_nanogateway.py from 10 to 1000 and the LoPy ran overnight without issue. It seems like something is getting backed up and crashing. I've looked for memory leaks and stack overflows but haven't been able to find anything.
I think it's time to try to debug at the firmware level. Does anyone have instructions for setting up Eclipse and a JTAG probe to debug the firmware?
@daniel I tried setting the stack size up to 16K and saw no determinable difference in how long it takes to crash. This still feels like some sort of memory leak. We've tried monitoring mem_free() and mem_info() and can't detect anything. We do have a lot of things running in our system: WiFi, Bluetooth, LoRaWAN. We have not seen this type of crash running raw LoRa, though. We are still trying to narrow it down to the most basic items that make it crash.
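For reference, the kind of mem_free() monitoring described here can be sketched roughly as follows: force a collection before each reading so that only live objects count, then compare the first and last readings. The function names are hypothetical, and the reader is passed in as a parameter so the same code runs off the device. On the LoPy the reader would be gc.mem_free, which to my knowledge only reports the MicroPython heap, so memory used inside the firmware itself would not show up.

```python
import gc


def free_after_collect(read_free):
    """Run a garbage collection, then return the free-heap reading."""
    gc.collect()
    return read_free()


def watch(read_free, count, pause=lambda: None):
    """Take `count` readings, calling pause() after each one (e.g. a sleep)."""
    readings = []
    for _ in range(count):
        readings.append(free_after_collect(read_free))
        pause()
    return readings


def net_drop(readings):
    """Bytes of free heap lost between the first and the last reading."""
    return readings[0] - readings[-1]


# On the device (MicroPython only, gc.mem_free does not exist in CPython):
#   import time
#   readings = watch(gc.mem_free, 60, lambda: time.sleep(10))
#   print(net_drop(readings), min(readings))

# Off the device, a stand-in reader shows the behaviour.
fake = iter([1000, 900, 950, 800])
readings = watch(lambda: next(fake), 4)
lost = net_drop(readings)
```

A net drop that keeps growing across long runs would point to a leak on the Python side. A flat trace, as reported above, suggests looking elsewhere.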
@daniel I tried setting the stack size and it feels like it ran for a longer period of time but it did crash the same way. Is there a way to monitor the remaining stack size for a thread?
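For anyone repeating the stack size experiment, a minimal sketch is below. It assumes the firmware's _thread module behaves like the standard one, where stack_size(n) applies to threads created afterwards and returns the previous setting. The helper name is hypothetical, and this is not necessarily the change that was suggested for the NanoGateway constructor. It does not answer the question of how to read the remaining stack of a running thread.

```python
import _thread


def start_with_stack(size_bytes, target, args=()):
    """Start `target` in a new thread created with the requested stack size."""
    previous = _thread.stack_size(size_bytes)  # affects threads created from now on
    try:
        return _thread.start_new_thread(target, args)
    finally:
        _thread.stack_size(previous)  # restore the earlier setting for later threads


# Small demonstration: the lock is only used to wait for the worker to finish.
done = _thread.allocate_lock()
done.acquire()
results = []


def worker(value):
    results.append(value)
    done.release()


# 64 KiB is used here because desktop Python rejects sizes below 32 KiB;
# on the device the value tried in this thread was 16 KiB.
start_with_stack(64 * 1024, worker, (7,))
done.acquire()  # blocks until the worker has released the lock
```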
@daniel I'll try that right now.
@ssmith thanks for reporting this. What happens if you do:
In the constructor of the NanoGateway class?
I suspect that this is a stack overflow issue...
Yes, I'm running 1.15.0.b1
(sysname='LoPy', nodename='LoPy', release='1.15.0.b1', version='v1.8.6-849-baa8c33 on 2018-01-29', machine='LoPy with ESP32', lorawan='1.0.0')
Have you updated your device to the latest firmware? What is the output when you run:
import os
os.uname()