Sporadic Guru Meditation Errors on L01

Hi all!
In my current project, I use a custom base board with an L01. There are a couple of sensors and communication via WLAN (MQTT). The sensor data are read in separate threads. My main loop just checks whether new data are available and sends them to the MQTT broker.
So far, everything works fine. But, sporadically, sometimes after a few hours, sometimes after a few days, I get a Guru Meditation Error. Here's the log from my Otii:
141039800  >>>>>>Meteo_Sensor1 Sensor Data send to MQTT Broker at 20181128 21:35:12
141039804  gGas1: 139933.24 Ohms, Air Quality1: 72.58
141039810  Temperature1: 26.7 C, Pressure1: 1012 hPa, Humidity1: 22.48 %RH
141039869  ~~~~~~~Free memory: 2498880
141039898  Guru Meditation Error: Core 1 panic'ed (StoreProhibited)
141039901  . Exception was unhandled.
141039903  Register dump:
141039911  PC : 0x400f6b6c PS : 0x00060130 A0 : 0x800f709d A1 : 0x3ffdaa90
141039919  A2 : 0x0000002a A3 : 0x3ffdaab0 A4 : 0x3f9525f0 A5 : 0x00000003
141039927  A6 : 0x00000000 A7 : 0x00000000 A8 : 0x00000000 A9 : 0x3ffdaa70
141039934  A10 : 0x3f9532d0 A11 : 0x0000002b A12 : 0x00000000 A13 : 0x3f408c30
141039942  A14 : 0x3f408c30 A15 : 0x3f407ccc SAR : 0x0000001a EXCCAUSE: 0x0000001d
141039950  EXCVADDR: 0x0000002a LBEG : 0x400eae2c LEND : 0x400eae58 LCOUNT : 0x00026cdb
141039972  Backtrace: 0x400f6b6c:0x3ffdaa90 0x400f709a:0x3ffdaab0 0x400f8efb:0x3ffdaaf0 0x400f147d:0x3ffdab10 0x400fc857:0x3ffdab30 0x400f4b6c:0x3ffdabd0 0x400f147d:0x3ffdac10 0x400f14aa:0x3ffdac30 0x400ddf3f:0x3ffdac50 0x400de210:0x3ffdacf0 0x400dd043:0x3ffdad10
141039977  ================= CORE DUMP START =================  ================= CORE DUMP END =================
141041915  Rebooting...
OK, as long as the device reboots automatically, I could say "So what!!". But there's this little stinging pain in my head...
Any suggestions?
Cheers,
Thomas

@danielm
Not yet, but I will do it during this week.

@thosch42
v1.20.0.rc0 should implement: Fixed Issue with memory leak in thread lock creation
Did you already have a chance to test with this new release?

@elizar said in Sporadic Guru Meditation Errors on L01:
Can you hook up a scope and look for short voltage drops?
I logged voltage and main current with an Otii. Everything looks fine.

@thosch42 Do you possibly have a power problem? Can you hook up a scope and look for short voltage drops?

@danielm
Hmm, but if I power cycle or hard reset the device, everything works fine. So the file system doesn't seem to be corrupted.

@thosch42
Hi, I have seen this before, but I have no idea what this issue is related to. I think I tried erasing the entire file system at that point.

Hi all!
My problem reached the next level ;)
I simplified my code: it now sends packed binary data to three topics instead of JSON objects to 28 topics. Initially this seemed to help, but after more than 4 days of error-free operation, I got this (last lines of my Otii log):
385275549  Traceback (most recent call last):
385275552  File "main.py", line 642, in <module>
385275556  File "main.py", line 589, in <module>
385275561  File "/flash/lib/mqtt_robust.py", line 34, in publish
385275566  File "/flash/lib/mqtt_robust.py", line 32, in publish
385275570  File "/flash/lib/mqtt.py", line 111, in publish
385275576  AttributeError: 'MQTTClient' object has no attribute '_send_str'
385276819  Pycom MicroPython 1.18.1.r1 [v1.8.6849b0520f1] on 20180829; LoPy with ESP32
385276822  Type "help()" for more information.
The L01 seems to "forget" parts of the scripts. While the main loop crashed, the three sensor threads are still running perfectly.
Now I am thinking about putting the MQTT stuff in threads as well. Is there a safe way to check whether a thread is still running?
Cheers,
Thomas

@danielm said in Sporadic Guru Meditation Errors on L01:
Are you able to prepare simplified code which causes the crash and which could be shared?
I will do it at the weekend.
Cheers,
Thomas

I have had a similar experience with using threads. As a workaround, try to redesign your application to work without threads.
Anyway, this issue needs to be solved eventually. Are you able to prepare simplified code which causes the crash and which could be shared?
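
To illustrate what working without threads could look like, here is a rough sketch of a single main loop that polls each job when its period has elapsed. It is not taken from the original application: `Task`, `run_once` and `main_loop` are hypothetical names, and the sensor and MQTT functions passed in would be your own.

```python
import time


class Task:
    """A function that should be called every `period_s` seconds."""

    def __init__(self, func, period_s):
        self.func = func
        self.period_s = period_s
        self.next_due = 0.0   # due immediately on the first pass

    def run_if_due(self, now):
        """Call func if its period has elapsed; return True if it ran."""
        if now < self.next_due:
            return False
        self.next_due = now + self.period_s
        self.func()
        return True


def run_once(tasks, now=None):
    """One pass of the main loop; returns how many tasks were run."""
    if now is None:
        now = time.time()
    ran = 0
    for task in tasks:
        if task.run_if_due(now):
            ran += 1
    return ran


def main_loop(tasks, poll_s=0.1):
    """Run forever, checking all tasks every poll_s seconds."""
    while True:
        run_once(tasks)
        time.sleep(poll_s)
```

For example, `main_loop([Task(read_sensors, 10), Task(publish_pending, 60)])` would read the sensors every 10 seconds and publish every 60, assuming `read_sensors` and `publish_pending` are functions you provide. The trade-off is that a slow or blocking sensor read delays everything else in the loop, so each job has to return quickly.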