All timers seem to fail when a corrupted LoRa message is received on LoPy4
I've recently dug out my pair of LoPy4s after leaving them untouched for quite a while. One is on a PyTrack and the other is on an ExpBrd2. Both are experiencing intermittent timer failures that seem to occur at about the same time as a potentially corrupted LoRa message is received.
Both send status and stats via MQTT from a timer callback every 10 secs.
The LoPy4/PyTrack sends nav info via MQTT and via SOCK_RAW LoRa every 10 secs. The rate is governed by the GPS reading thread (i.e. it is not driven by a timer callback).
The LoPy4/ExpBrd2 sends a beacon over LoRa at a 33 second interval, driven by a timer callback. The odd interval is there to avoid inadvertent sync with the nav messages from the other device, though I think that may now be handled by a collision avoidance mechanism, which would be nice.
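As a rough check on that interval choice, the sketch below lists when a 10 second schedule and a 33 second schedule would fire at the same instant. It assumes, purely for illustration, that both devices start at t = 0 and that time on air is negligible. The function name is hypothetical and the code is written for desktop Python rather than the LoPy4 firmware.

```python
from math import gcd

def coincidences(period_a, period_b, horizon):
    """Return the times (s) up to horizon at which both schedules fire together.

    Assumes both schedules start at t = 0 and use integer periods in seconds.
    """
    lcm = period_a * period_b // gcd(period_a, period_b)
    return list(range(lcm, horizon + 1, lcm))

# Nav messages every 10 s, beacons every 33 s, over one hour.
print(coincidences(10, 33, 3600))
```

Under those assumptions the two schedules line up every 330 seconds, so one beacon in ten coincides with a nav message, whereas a 30 second beacon would coincide every time.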
Both respond to MQTT test messages by blinking their LED blue (using a timer callback to turn off the LED). The same mechanism is used to blink the LED red when LoRa sends a packet, and green when LoRa receives a packet. Most of this code is relatively old and appears to be mostly bug-free, though it does tend to be affected by quirks in particular firmware versions.
Both devices are currently updated to the following firmware:
(sysname='LoPy4', nodename='LoPy4', release='1.20.0.rc13', version='v1.9.4-94bb382 on 2019-08-22', machine='LoPy4 with ESP32', lorawan='1.0.2', sigfox='1.0.1')
The symptoms of the problem are as follows.
Whichever device becomes affected stops sending status and stats via MQTT (governed by timer callbacks), and its LED is left on and green (it should have been turned off by a timer callback). It continues to receive test messages via MQTT, changing the LED to blue as expected, but no timer callback turns it off, so it remains on and blue.
If the LoPy4/ExpBrd2 is affected, it stops sending LoRa beacons (governed by timer callbacks). It continues to receive nav messages from the LoPy4/PyTrack over LoRa.
If the LoPy4/PyTrack is affected, it continues to publish nav info over MQTT and LoRa. This mechanism is not governed by timers and continues to work.
When this failure mode begins, I see a message indicating that a LoRa packet with a bad HMAC-MD5 was received. It seems to me that something odd is happening that causes both a corrupted packet to come from the LoRa device and all the system timers to fail.
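For anyone unfamiliar with this kind of check, it can be sketched roughly as follows. The framing (payload followed by a 16-byte HMAC-MD5 tag), the key, and the function names are assumptions for illustration and not necessarily the packet format used here. The sketch uses desktop Python's hmac and hashlib modules, which may not be available in the same form on the LoPy4 firmware.

```python
import hashlib
import hmac

TAG_LEN = 16  # an MD5 digest is 16 bytes

def seal(key, payload):
    """Append an HMAC-MD5 tag to the payload (hypothetical framing)."""
    return payload + hmac.new(key, payload, hashlib.md5).digest()

def unseal(key, packet):
    """Return the payload if the tag verifies, otherwise None."""
    if len(packet) <= TAG_LEN:
        return None  # too short to hold a payload plus a tag
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.md5).digest()
    return payload if hmac.compare_digest(tag, expected) else None

key = b"shared-secret"  # placeholder key
packet = seal(key, b"nav:51.5,-0.1")
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]  # flip one bit in the payload
print(unseal(key, packet), unseal(key, corrupted))
```

A single flipped bit anywhere in the packet is enough to make the check fail, so a bad HMAC on its own does not say whether the corruption happened on air or inside the receiving device.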
For now, I have hacked my code to call machine.reset() when the HMAC fails on a received packet. This happens now and then, but the distributed mish-mash of stuff recovers and continues. This is, in general, a bad idea, as alien messages (guaranteed HMAC failure) would cause resets. I am considering dropping into deep sleep instead of calling machine.reset(), as this might be less time-consuming, but it is still not a good thing to do.
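A less drastic alternative might be to detect the actual symptom, namely that timer callbacks have stopped firing, from a code path that keeps running (such as the GPS thread or the main loop), and reset only then. The sketch below is plain Python with an injectable clock and stall handler so that it can be tried off-device. The class name, the 30 second timeout, and the idea of wiring the handler to machine.reset() are all assumptions, and none of this has been tested on a LoPy4.

```python
import time

class TimerHeartbeat:
    """Tracks when a timer callback last ran and flags a stall."""

    def __init__(self, timeout_s, clock=time.time, on_stall=None):
        self.timeout_s = timeout_s
        self.clock = clock
        self.on_stall = on_stall
        self.last_beat = clock()

    def beat(self):
        """Call from the periodic timer callback."""
        self.last_beat = self.clock()

    def check(self):
        """Call from a non-timer path; returns True if the timers look stalled."""
        stalled = (self.clock() - self.last_beat) > self.timeout_s
        if stalled and self.on_stall is not None:
            self.on_stall()  # on the device this could be machine.reset
        return stalled

# Simulated run with a fake clock: the callback beats at t = 10, then stops.
now = [0]
resets = []
hb = TimerHeartbeat(30, clock=lambda: now[0],
                    on_stall=lambda: resets.append(now[0]))
now[0] = 10
hb.beat()
now[0] = 35
print(hb.check())  # 25 s since the last beat, within the timeout
now[0] = 45
print(hb.check())  # 35 s since the last beat, treated as a stall
```

This way an alien packet with a bad HMAC would not trigger a reset by itself; only a real stall of the timer callbacks would. The Pycom firmware also documents a machine.WDT watchdog, which might serve a similar purpose if it were fed from a timer callback.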
Has anyone experienced anything similar, or can shed any light on the situation?
I have also seen another issue (though not frequently), where the LoPy4/ExpBrd2 stops transmitting its LoRa beacons. Manually breaking in via the REPL and calling machine.reset() does not solve the problem; only a physical power interruption seems to restore normal operation.
Cheers & Merry Xmas.