How to interpret the output from the pygate and how to debug communication problems?
I have set up a pygate with a gpy fitted with a sim card, and was able to register the gateway on TTN and send LoRaWAN messages through the gateway (uplink and downlink). But when I test it for a longer time, some of the messages never come through, and at times the pygate drops out without coming back (LED stays green, but the gateway is no longer detected at TTN). So I need to debug this, and to do that I need to understand the output that the pygate-gpy is writing. What is the meaning of the following, or where can I find more information on this?
##### END #####
lorapf: WARN_ [up ] ignored out-of sync PUSH_ACK packet buff_ack[0:121] != token[0:122]
lorapf: WARN_ [up ] PUSH_ACK recieve timeout 1
lorapf: INFO_ [up ] received pkt from mote: 26012C56 (fcnt=147/93), RSSI -80.0
lorapf: WARN_ [up ] ignored out-of sync PUSH_ACK packet buff_ack[0:122] != token[0:123]
lorapf: WARN_ [up ] PUSH_ACK recieve timeout 1
lorapf: INFO_ [main] report
##### 2020-07-30 21:51:24 GMT #####
### [UPSTREAM] ###
# RF packets received by concentrator: 1
# CRC_OK: 100.00%, CRC_FAIL: 0.00%, NO_CRC: 0.00%
# RF packets forwarded: 1 (35 bytes)
# PUSH_DATA datagrams sent: 2 (334 bytes)
# PUSH_DATA acknowledged: 0.00%
### [DOWNSTREAM] ###
# PULL_DATA sent: 3 (100.00% acknowledged)
# PULL_RESP(onse) datagrams received: 0 (0 bytes)
# RF packets sent to concentrator: 0 (0 bytes)
# TX errors: 0
### [JIT] ###
[jit] queue is empty
### [GPS] ###
# GPS sync is disabled
##### END #####
And I noticed that when the gateway starts failing, all pygate output stays the same except that we then get "0.00% acknowledged":
### [DOWNSTREAM] ### # PULL_DATA sent: 3 (0.00% acknowledged)
Can this be detected in code that I run on the gpy, so that action can be taken by restarting something?
I am running the following firmware:
(sysname='GPy', nodename='GPy', release='1.20.2.rc10', version='v1.11-b2436d4 on 2020-07-01', machine='GPy with ESP32', pybytes='1.5.0', pygate='1.0.1')
And this is the code that I am running on the gpy:
# main.py -- put your code here!
from network import LTE
import time
import machine
from machine import RTC
import pycom

# Disable Heartbeat
pycom.heartbeat(False)

# Define callback function for Pygate events
def machine_cb (arg):
    evt = machine.events()
    if (evt & machine.PYGATE_START_EVT):
        # Green
        pycom.rgbled(0x103300)
    elif (evt & machine.PYGATE_ERROR_EVT):
        # Red
        pycom.rgbled(0x331000)
    elif (evt & machine.PYGATE_STOP_EVT):
        # RGB off
        pycom.rgbled(0x000000)

# register callback function
machine.callback(trigger = (machine.PYGATE_START_EVT | machine.PYGATE_STOP_EVT | machine.PYGATE_ERROR_EVT), handler=machine_cb)

# Connect to a NB-IoT Network
lte = LTE()
if lte.isattached():
    print("already attached")
else:
    print("attach")
    lte.attach(band=20, apn="att.iot") # for Orange, Belgium
    while not lte.isattached():
        time.sleep(0.5)
    print("Now attached to network")
print("connect")
lte.connect()
while not lte.isconnected():
    time.sleep(0.5)
print("Now connected")

# Sync time via NTP server for GW timestamps on Events
rtc = RTC()
rtc.ntp_sync(server="0.pool.ntp.org")

# Read the GW config file from Filesystem
fp = open('/flash/config.json','r')
buf = fp.read()

# Start the Pygate
machine.pygate_init(buf)
Sorry if I was not clear. The Pygate does work fine with the GPy; the only issue we are currently running into is that the LTE radio connection is not always sustained. This is the case with many wireless devices, where you simply cannot rely on always being connected. There are several factors (outside our control) that influence the stability.
We are currently working on tests to recover the connection gracefully once it has dropped. This issue is also at hand without the Pygate, but before we saw no use-cases where the connection had to be sustained for longer.
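Until such a recovery mechanism is available, a watchdog loop at the end of main.py is one possible stopgap. The following is only a minimal sketch, not Pycom's recovery method: `watch_link` and its parameters are hypothetical names, and it assumes that `machine.pygate_init()` returns while the packet forwarder keeps running in the background, and that `lte.isconnected()` actually turns false when the link drops (which may not hold for every kind of failure).

```python
import time

def watch_link(is_connected, recover, max_failures=3, interval_s=30,
               sleep=time.sleep, max_checks=None):
    """Poll is_connected(); call recover() after max_failures misses in a row.

    max_checks limits the number of polls (None means run forever); it is
    there mainly so the loop can be exercised off-device.
    Returns the number of times recover() was called.
    """
    failures = 0
    recoveries = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_connected():
            failures = 0          # link is up, reset the miss counter
        else:
            failures += 1
            if failures >= max_failures:
                recover()
                recoveries += 1
                failures = 0
        sleep(interval_s)
    return recoveries

# Hypothetical wiring on the GPy (assumes `lte` and `machine` from main.py):
# def recover():
#     machine.pygate_deinit()   # stop the packet forwarder
#     machine.reset()           # reboot so main.py re-attaches and restarts it
# watch_link(lte.isconnected, recover)
```

Requiring several consecutive misses avoids rebooting on a single short dropout; the threshold and polling interval are arbitrary starting values to tune against your own logs.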
Interesting that this "problem" pops up now; PyCom had over a year to test this essential feature! What was this video: fake or a 5 minute test? Sorry to be so displeased, but after one year of waiting and other problems without any help from PyCom last year (e.g. see https://community.hiveeyes.org/t/investigating-random-core-panics-on-pycom-esp32-devices/2480), this is the next step in the same direction: throwing hardware onto the market combined with bananaware.
@jand The same holds for us in most cases :) There are lots of different carriers out there that handle things slightly differently. Let me know if you find something that works for you!
@Gijs Thanks for the fast response. Since you are already investigating this, I will wait for the outcome of that.
I must say that I already used the same gpy with the same sim card (Orange, for NB-IoT) as an end node (before receiving the pygate). Also in that case I experienced that the LTE NB-IoT communication dropped after random amounts of time. I will now try to understand better what happens by running and logging tests. At least for that application I can put tests in my script and try to recover from it. The difficulty is knowing if the problem is with the modem or with the provider.
We are currently investigating the case of being connected to the LTE network for longer periods of time. It appears the network or the modem drops the connection after a random amount of time. Of course we should be able to detect this and recover. It is not a common use case to be connected through LTE-M or NB-IoT 24/7, but for the pygate this is necessary.
Generally, every 30 seconds, the gateway prints a "[main] report" with details about the upstream (node --> gateway --> TTN) and downstream (TTN --> gateway --> node) packets and their status. The CRC lines relate to the redundancy check on the node --> gateway link, and "RF packets forwarded" is the number of correct packets received and forwarded to TTN. The last line of the upstream section, "PUSH_DATA acknowledged", tells us the percentage of forwarded datagrams that were acknowledged. On our gateways, not all packets are always acknowledged by the TTN server, and through LTE we never get an ACK back, even if the packets show up in the TTN logs.
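To make the report fields concrete, here is a small sketch that pulls the two acknowledgement percentages out of a captured report. It assumes the report text has already been captured somewhere, for example from a serial log on a host computer; `parse_report` and `downlink_path_dead` are hypothetical helper names, not part of the Pygate firmware. Since PUSH_DATA acknowledgements may read 0.00% over LTE even when uplinks arrive, the sketch uses only the PULL_DATA figure as a failure hint.

```python
def parse_report(text):
    """Extract acknowledgement percentages from one [main] report."""
    stats = {"push_ack_pct": None, "pull_ack_pct": None}
    # Splitting on '#' works for both multi-line and flattened report text.
    for field in text.split("#"):
        field = field.strip()
        if field.startswith("PUSH_DATA acknowledged:"):
            # e.g. "PUSH_DATA acknowledged: 0.00%"
            value = field.split(":")[1].strip().rstrip("%")
            stats["push_ack_pct"] = float(value)
        elif field.startswith("PULL_DATA sent:"):
            # e.g. "PULL_DATA sent: 3 (100.00% acknowledged)"
            inside = field[field.index("(") + 1:]
            stats["pull_ack_pct"] = float(inside.split("%")[0])
    return stats

def downlink_path_dead(stats):
    """True when the report shows that no PULL_DATA was acknowledged."""
    return stats["pull_ack_pct"] == 0.0

sample = ("### [UPSTREAM] ### # PUSH_DATA datagrams sent: 2 (334 bytes) "
          "# PUSH_DATA acknowledged: 0.00% ### [DOWNSTREAM] ### "
          "# PULL_DATA sent: 3 (100.00% acknowledged)")
stats = parse_report(sample)
print(stats["push_ack_pct"], stats["pull_ack_pct"], downlink_path_dead(stats))
```

The parser avoids regular expressions and f-strings so that it should also run on older MicroPython builds.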