Experiences with LoPys
TLDR: Lots of waffle, some flakiness, a few questions. Something to read whilst waiting for answers to other questions.
I received my first lopy a couple of weeks ago and my second 3 days ago.
The first one is on a PyTrack. It is the "portable node" in my domestic "Internet of Bodges". It is battery powered using a USB power brick and is administered using telnet over wifi. This sits well with the role of "portable node", but is in fact primarily motivated by the fact that the USB UART on the PyTrack board does not play nicely with the drivers in my Ubuntu 16.04 LTS's 4.10.0-32 kernel, with the problem described in the topic on this forum entitled "failed to set dtr/rts".
Currently I have it running in an infinite loop, which does something like this:
if GPS data available:
- Reads all NMEA data from the GPS, and publishes each NMEA message as a distinct sub-topic over MQTT
else if just finished reading GPS:
- Reads accelerometer data and publishes over MQTT
- Reads roll,pitch,yaw data and publishes over MQTT
- Reads freemem and allocmem and publishes over MQTT
- Reads battery voltage and publishes over MQTT
- Considers whether or not to send beacon message via the LORA socket and if the time is right, sends one, publishing the sent message over MQTT.
- Reads from the LORA socket publishing anything it receives over MQTT, followed by the set of LORA stats for the previously received packet.
- Invokes the garbage collector
Some exceptions are handled locally within the loop, principally those associated with processing received LORA traffic. Any other exception terminates the main loop and is caught by an outer handler, which tries to publish a notification of impending doom (exceptions thrown here are caught and discarded), sets the LED to red and raises the exception again.
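For anyone trying to picture that structure, here is a minimal sketch of it in plain Python. It is not my actual code: `run_node`, `step`, `publish`, `set_led_red` and the topic name `status/fatal` are all made-up names standing in for the loop body, the MQTT publish call and the LED control.

```python
def run_node(step, publish, set_led_red):
    """Call step() forever; on an unhandled exception report it, flag it, re-raise."""
    try:
        while True:
            # One pass of the main loop. step() is assumed to handle the
            # LORA receive-processing exceptions itself.
            step()
    except Exception as exc:
        try:
            # Best-effort "impending doom" notice (hypothetical topic name).
            publish("status/fatal", repr(exc))
        except Exception:
            pass  # failures while reporting are deliberately discarded
        set_led_red()
        raise  # re-raise the original exception
```

Note that this only helps if an exception is actually raised; a hang inside a blocking call never reaches the handler.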
The 2nd one is on an ExpansionBoard2. It is hooked up to the Linux box on my coffee table via USB and is administered via the USB UART. I think of it as the "gateway node". It effectively does the same thing as the other one, except that it has no GPS or accelerometer, so it sits in a badly rate-governed loop, publishing some status info, occasionally beaconing on LORA and listening for LORA traffic, publishing whatever it has over MQTT.
More recently, I decided to try the AES crypto implementation with a view to protecting my traffic from eavesdroppers (principally myself and my house-mates, both of whom are known to be malicious in the extreme).
In general, all of this works fine, but only most of the time.
There are typically 3 scenarios in which things do not work fine.
The "portable node" experiences transient phases lasting 2-5 seconds, during which there is seemingly always GPS traffic to read, which means that the expected updates of accelerometer, rpy, freemem, etc. are not published over MQTT. This is most obviously seen when keeping an eye on a command line subscriber and seeing the cycling pattern of line lengths change. I've had a look at Wireshark packet captures when these quirks occur and can see TCP throughput drop to nothing for the duration. That's no real clue though, as that could be a response to congestion or packet loss, or a starvation of outbound traffic. Eventually the situation stabilizes, and it's back to the regular one-second cycle. I don't know what causes this. Has anyone experienced anything similar?
The "portable node" eventually stops working. Unless I manually intervene beforehand, this is the eventual fate of that lopy. When it has stopped working, it is IPv4 pingable, but no telnet connection can be established. A power cycle will restore a functioning state. There are no indications of an exception being thrown shown in the REPL via telnet. There is no "goodbye cruel world" message sent over MQTT. There is no setting of the LED to RED. During the last occurrence of this issue, the last thing said over MQTT was "lopy-0/gpspub/status/freemem 38736". It didn't get around to reporting on allocmem before it stopped working. Interestingly, the TCP connection between the lopy and the broker host remains open even now, 6 hours since the last published message from the lopy, so no "last will" is sent by the broker on behalf of the lopy, so my monitoring system never knows that the lopy is dead in the water.
I seemingly receive something over LORA that my code doesn't like processing. This is mostly the cause of doom for the "gateway node" though it can hit the "portable node" too.
The "gateway node" remains pingable, but is unresponsive on both the serial console REPL and telnet, even though a telnet session can still be established once appropriate credentials are provided.
Originally, I was simply sending ASCII plain text messages over LORA, for the purpose of confirming functionality. The feedback over MQTT let me know when LORA traffic was being sent and received, and gave me the stats for all the traffic too. Occasionally one or the other node would receive data from its LORA socket that had not been sent by its peer. These payloads clearly contained binary data which wasn't being handled well, either in my publishing code or in my console logging code. I attempted to instrument the code to catch exceptions thrown by this payload processing and hex print the payloads instead, but this proved just as prone to flakiness, so now I just catch any exceptions thrown by the payload processing code, garbage collect and continue.
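For reference, the kind of hex printing I had in mind looks roughly like the sketch below. The function name `printable_payload` and the `hex:` prefix are made up for illustration, and I am assuming the `binascii` module is importable under that name (on some MicroPython builds it is `ubinascii`). It checks the raw byte values before decoding, so it never relies on a decode error being raised.

```python
import binascii

def printable_payload(payload):
    """Return payload as text if it is plain printable ASCII, else as hex."""
    # Iterating over bytes yields integers, so no decoding is attempted
    # until every byte is known to be printable ASCII.
    if all(32 <= b < 127 for b in payload):
        return payload.decode("ascii")
    return "hex:" + binascii.hexlify(payload).decode("ascii")
```

Something like `printable_payload(sock.recv(64))` would then be published instead of the raw bytes, where `sock` is the LORA socket and 64 is an arbitrary read size.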
The last I heard from it, it said ...
20170821-10:35:08.358980 lopy-1/lora/recv -_b������mL�
That output is from mosquitto_sub piped through ts, which presumably timestamped a newline in the payload received over LORA (and probably mis-decrypted, as the origin is unknown). That node then became unresponsive with no further clues as to its fate.
The take away for me, from this, is that I need to contrive a proper framing protocol to carry my raw LORA traffic. That leads me to wonder about the type of service I'm getting from a socket(socket.AF_LORA, socket.SOCK_RAW). Are the semantics of a SOCK_RAW similar to that of SOCK_DGRAM, in that a socket read of enough bytes will yield an entire application data unit, as sent by the peer node, assuming that it fits within the MTU for the LORA modem? Or is it possible to read data from multiple transmission units in one call to read()?
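As a starting point, a framing scheme could look something like the sketch below: a marker, a length byte, the payload and a CRC32. Everything in it is an assumption made for illustration: the names `encode_frame` and `decode_frame`, the marker bytes, the 64-byte payload limit, and the availability of `binascii.crc32` on the LoPy firmware. It also assumes that one read returns exactly one transmitted packet, which is precisely the open question above; if a read can span packets, a stream parser would be needed instead.

```python
import binascii
import struct

MAGIC = b"\xa5\x5a"       # hypothetical 2-byte frame marker
HEADER_FMT = ">2sB"       # marker, payload length
TRAILER_FMT = ">I"        # CRC32 over header + payload
HEADER_SIZE = struct.calcsize(HEADER_FMT)
TRAILER_SIZE = struct.calcsize(TRAILER_FMT)
MAX_PAYLOAD = 64          # arbitrary limit chosen for this sketch

def encode_frame(payload):
    """Wrap payload in marker, length and CRC32."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too long")
    head = struct.pack(HEADER_FMT, MAGIC, len(payload))
    crc = binascii.crc32(head + payload) & 0xFFFFFFFF
    return head + payload + struct.pack(TRAILER_FMT, crc)

def decode_frame(data):
    """Return the payload, or None if data is not a valid frame."""
    if len(data) < HEADER_SIZE + TRAILER_SIZE:
        return None
    magic, length = struct.unpack(HEADER_FMT, data[:HEADER_SIZE])
    end = HEADER_SIZE + length
    if magic != MAGIC or length > MAX_PAYLOAD or len(data) != end + TRAILER_SIZE:
        return None
    (crc,) = struct.unpack(TRAILER_FMT, data[end:])
    if binascii.crc32(data[:end]) & 0xFFFFFFFF != crc:
        return None
    return data[HEADER_SIZE:end]
```

With this in place, anything for which `decode_frame` returns None can be counted and dropped without ever reaching the publishing or logging code. A CRC only guards against noise and foreign traffic, not against deliberate tampering.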
I have continued to receive "other traffic" despite changing frequencies and modes. I wonder if using socket.bind() and socket.connect() would have any effect on the reception of "other traffic"? Would the addresses used in bind and connect be used for separation of different application traffic on the same node, or is the address field intended to identify the node itself?
Does anyone have any comments on these observations, or answers to whatever questions I wrote?