What is the 2020 state of play for LoPy4 as a single channel gateway?
-
I have a LoPy4 with the latest firmware, 3.1 expansion board and official antenna.
I unticked the PyBytes firmware option.
I'm using the official example here: https://github.com/pycom/pycom-libraries/tree/master/examples/lorawan-nano-gateway
I frequently see core dumps, and the "Downlink timestamp error!" message on downlinks when activating via OTAA.
-
There are many posts in this forum with the above issues, but most are years old. Do people have the LoPy/LoPy4 working reliably as a single channel gateway?
-
Is it possible for it to work as reliably as you could expect from a single channel with
a) TheThingsNetwork
b) loraserver.pycom.io
c) a self-hosted ChirpStack with custom timings, frequency plan or whatever
What is the recommended nanogateway code these days?
@robert-hh I'd like your input as you were a part of so many of the other posts on the subject.
-
-
@gazmcghee Late server responses seem to be a problem for the Asia-Pacific region. Even here, with the TTN Europe server almost in viewing distance (at least compared to Australian dimensions), the server responses do not arrive particularly early, not even for the 5 s window. They arrive just a few hundred ms before downlink time.
And it's not the link. Traceroute for router.eu.thethings.network reports ~30ms at the last line and 21 hops.
What I never addressed in the nanogateway code is the fact that the server can change the downlink message frequency and data rate. Using OTAA, the server can tell a node those settings. But using ABP, the node does not know them initially. I never looked into the transfer logs for that case. The TTN server might still use the uplink configuration for downlink service messages.
For me the non-blocking socket is fine, because I have an external watchdog which I feed right from the main loop. That way I know that this loop is running. Funny enough, I see on incidence that the watchdog had to act.
The -6000 value was the one I found accurate for LoPy4.
-
@robert-hh Thanks for all that. Yes, you are correct that the main thread timing is not critical. By the way, I did some logging of timings and found that the garbage collect always takes around 30 ms, and a 10 ms sleep can take up to 90 ms. I still think blocking sockets would be an improvement, but they aren't my main issue.
Blocking sockets would mean the _udp_thread literally consumes zero cpu and has zero timing interference while waiting for a packet, and it would respond to that packet immediately. If there were 10 packets in a row, it would process them all without really sleeping, while the original code could be delayed by a sleep and maybe a gc between packets. It could block with a timeout if there was a need to do something regularly, or just check for cancellation.
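A minimal sketch of the blocking-with-timeout idea described above, written against the standard CPython `socket` module so it can be tried on a PC. On the Pycom firmware the module is `usocket` and the exception is `usocket.timeout`, as in the posted gateway code. The names `udp_receive_loop`, `handle_packet`, `should_stop` and `demo` are made up for this illustration and are not part of the nano gateway.

```python
import socket
import time


def udp_receive_loop(sock, handle_packet, should_stop, timeout_s=1.0):
    """Receive datagrams with a blocking socket that wakes up on a timeout."""
    sock.settimeout(timeout_s)  # blocking mode with a timeout
    while not should_stop():
        try:
            data, _addr = sock.recvfrom(1024)  # waits in the OS until data arrives
        except socket.timeout:
            continue  # nothing arrived: loop round and re-check should_stop()
        handle_packet(data)  # handled immediately, no sleep between packets


def demo():
    """Send three datagrams over loopback and collect them."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(3):
        tx.sendto(bytes([i]), rx.getsockname())
    received = []
    deadline = time.monotonic() + 2.0  # safety net so the demo always ends
    udp_receive_loop(
        rx,
        received.append,
        lambda: len(received) >= 3 or time.monotonic() > deadline,
        timeout_s=0.2,
    )
    tx.close()
    rx.close()
    return received
```

The timeout only bounds how long the loop waits before re-checking the stop condition. A packet that arrives earlier is handled as soon as `recvfrom` returns.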
The main issue, after logging pretty much everything timing related, is the meshed network server. Regardless of whether I use 4G or VDSL, I regularly get pauses well over 5 s (up to 17 s once), which is past the target send time. The only way I know to fix that is to try another server (Asia) or set up my own server.
My current code is here https://github.com/buzzware/nano-gateway-2020/tree/master
With the spin wait before the actual send, the send normally starts within 1 ms of the target time. I don't know what the compensation should be, but I'm using your -6000 value.
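A rough sketch of the spin-wait technique, written for CPython so it runs on a PC: sleep coarsely until shortly before the target, then busy-wait for the remainder. The function name `wait_until` and the 2 ms spin margin are assumptions for illustration only. On the board, the equivalent would be built on the `utime` tick functions instead of `time.perf_counter_ns`.

```python
import time


def wait_until(target_ns, spin_ns=2_000_000):
    """Sleep coarsely, then busy-wait the last spin_ns nanoseconds.

    Returns the overshoot in nanoseconds, which is never negative.
    """
    while True:
        remaining = target_ns - time.perf_counter_ns()
        if remaining <= spin_ns:
            break
        # Coarse phase: sleeping frees the CPU but may overrun,
        # so stop spin_ns before the target.
        time.sleep((remaining - spin_ns) / 1e9)
    # Fine phase: poll the clock until the target is reached.
    while time.perf_counter_ns() < target_ns:
        pass
    return time.perf_counter_ns() - target_ns


target = time.perf_counter_ns() + 20_000_000  # 20 ms from now
overshoot_ns = wait_until(target)
print("overshoot: {} us".format(overshoot_ns // 1000))
```

The spin margin has to be larger than the worst-case overrun of the sleep call. With the up to 90 ms observed for a 10 ms sleep on the board, that margin would need to be generous there.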
I need to do more testing to see if that actually works for my third party nodes.
Cheers,
Gary
-
@gazmcghee Finally, that was my conclusion too: do not use the LoPy or LoPy4 as the main gateway, because of:
- the restriction to one channel. It is already bad here in Europe, where practically only three frequencies are used, and even worse in Australia and the US with their many channels.
- the unreliable timing. That is a little better for non-SPIRAM boards, but in general timings and ISR times are not reliable with ESP32/ESP8266 boards and the Espressif RTOS. In my tests, ISRs had a timing lag of up to 1 ms, with the majority being between 50 and 300 µs. The regular garbage collect is used to keep the time low for each of them. The sleep in the _udp_thread does not matter, since the downlink is triggered by an alarm. The downlink message only has to arrive early enough, about 50 ms before it is to be forwarded.
Your case of the 9 s delay of responses from TTN may have another reason. There was a post by a user who had the same problem and solved it by using another TTN router (Australia instead of Europe).
Edit: Another issue I have with the LoPy4 is that the hardware reset of the SX1276 chip is not connected to a GPIO pin, so it cannot be reset by software. Almost every time I want to use it or change the configuration I have to go through a power cycle. A simple hardware reset of the board is not sufficient. So the SX1276 reset pin seems not even to be connected to the LoPy4 reset.
-
Our priorities have changed, and I'm switching away from this attempt to get the nano gateway working with my third party nodes.
My latest code is here: https://github.com/buzzware/nano-gateway-2020
I never got my devices to activate with OTAA. The gateway would get the request, send it to TTN, get a reply and send it, but there was no activation. I'm considering it a failure because the meter doesn't start sending data packets.
Some observations :
- I regularly see a huge delay of up to 9 seconds in receiving the downlink activation data from TTN. This means the downlink window has already passed, so it has no hope of activating that time. I am in Perth, Western Australia, using the meshed router. I've tried both my 50 Mbps VDSL NBN internet, and Vodafone 4G through my phone (which is often faster).
- There is a large variability in all timings such that the 20ms downlink window is hard to hit.
- The _udp_thread uses a 10 or 20 ms sleep and non-blocking sockets, and also a garbage collect. It's possible that these pauses are causing significant variability in the timing between socket data arriving and it being responded to. In theory, it might work better to use blocking sockets and no sleep. Then it should react ASAP to incoming data, instead of waiting up to a full sleep.
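The arithmetic behind the first observation can be sketched as below. The 5 s join accept delay and the roughly 50 ms lead time are taken from other posts in this thread. The constant and function names are made up for this example.

```python
# Values taken from the thread; the names are illustrative only.
JOIN_ACCEPT_DELAY_US = 5_000_000  # OTAA downlink window opens 5 s after the uplink
MIN_LEAD_US = 50_000              # downlink should arrive ~50 ms before it is sent


def downlink_slack_us(server_delay_us, min_lead_us=MIN_LEAD_US):
    """Time to spare (in µs) when the server reply arrives server_delay_us
    after the uplink. Negative means the window can no longer be hit."""
    return JOIN_ACCEPT_DELAY_US - server_delay_us - min_lead_us


def can_schedule(server_delay_us):
    """True if the reply arrived early enough to be transmitted on time."""
    return downlink_slack_us(server_delay_us) >= 0


for delay_s in (0.5, 4.7, 9.0):
    delay_us = int(delay_s * 1_000_000)
    print(delay_s, "s ->", downlink_slack_us(delay_us), "us slack,",
          "ok" if can_schedule(delay_us) else "too late")
```

A reply that takes 9 s is about 4 s past the point where it could still be used, so no amount of gateway-side timing work helps in that case.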
Thanks @robert-hh for your assistance.
-
@gazmcghee said in What is the 2020 state of play for LoPy4 as a single channel gateway?:
I'm assuming you mean the LoPy4 should not get the same value as LoPy, the LoPy should get -6000
The values are negative. So the LoPy4 code starts sending the downlink message 5 ms earlier than the LoPy. The value was chosen from testing to ensure that sending the downlink message starts right in the 20 ms receive window. In these tests the LoPy4 needed longer between the call to send and actually putting RF onto the antenna. Timing precision was the main problem in using the LoPy4 as a nanogateway. For an OTAA downlink at SF7 and DR5, you have to hit a 20 ms time window 5 s after sending the message. That was difficult for the LoPy4. The LoPy, using only internal RAM, works better in matters of timing.
About tx_iq: I do not see any errors. It is still in the code and the code works with it. But maybe you use a different firmware version than me. I am using v1.20.1.r2 on the LoPy, and 1.20.2RC11 on the other boards, built from the Dev branch. There is no specific reason for having an older version on the LoPy besides the fact that this device normally sits quietly at the window, looking out for messages. If removing tx_iq works for you, just do it.
I abandoned the use of the LoPy as gateway. I still have it running here as a long-term test device doing something marginally useful, but the main gateway is a different make (IC330A + RPi). That runs rock solid.
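As an illustration of how the compensation shifts the start of transmission, here is a small sketch using the two values from the posted code (-1000 µs for the LoPy, -6000 µs for everything else). The helper names and the 32-bit wrap of the timestamp are assumptions made for this example, not part of the gateway code.

```python
# Compensation values as used in the posted gateway code (microseconds).
WINDOW_COMPENSATION_US = {"LoPy": -1000}
DEFAULT_COMPENSATION_US = -6000  # LoPy4, FiPy and other boards

TMST_MASK = 0xFFFFFFFF  # assumption: tmst is treated as a 32-bit µs counter


def window_compensation_us(board):
    """Return the compensation for a board name such as uos.uname()[0]."""
    return WINDOW_COMPENSATION_US.get(board, DEFAULT_COMPENSATION_US)


def tx_start_us(tmst_us, board):
    """Timestamp at which the send call should be issued."""
    return (tmst_us + window_compensation_us(board)) & TMST_MASK


print(tx_start_us(5_000_000, "LoPy"))   # 1 ms before the requested time
print(tx_start_us(5_000_000, "LoPy4"))  # 6 ms before the requested time
```

Because the lookup falls back to the default, a LoPy4 gets -6000 without needing its own entry, which matches the `if uos.uname()[0] == "LoPy"` test in the posted code.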
-
I've created this project https://github.com/buzzware/nano-gateway-2020 and @robert-hh I made a branch for integrating your code above, which isn't yet merged to master https://github.com/buzzware/nano-gateway-2020/tree/20201025-roberthh
-
@robert-hh
From before:
1) That's scary
2) I'm assuming you mean the LoPy4 should not get the same value as LoPy, the LoPy should get -6000
Additionally,
3) tx_iq seems to have disappeared as a parameter to lora.init. I get an error on the latest firmware. Should that be replaced by something? It is still in the docs.
With a modified version of your code, I am no longer getting core dumps or as many timing errors, but my third party OTAA device is still sending activation packets and no data packets. I assume that means the activation was not successful in the device.
One thing I am curious about is that I have the nano gateway configured for SF7BW125, but on the activation downlink it uses SF7BW500 as instructed by the server. Why not the same? The Seaslug code forces the bandwidth to 125, i.e. SF7BW125. Any comments?
Thanks for your help, I am motivated to solve this ASAP and will share the code on github.
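To make the SF7BW125 versus SF7BW500 question concrete, the sketch below splits such data rate strings and reports which radio setting differs. It is plain Python with made-up helper names. The posted gateway code does the equivalent parsing in `_dr_to_sf` and `_dr_to_bw`.

```python
def parse_datr(datr):
    """Split a data rate string such as 'SF7BW125' into (sf, bandwidth_khz)."""
    if not datr.startswith("SF") or "BW" not in datr:
        raise ValueError("unexpected data rate string: " + datr)
    sf_text, bw_text = datr[2:].split("BW", 1)
    return int(sf_text), int(bw_text)


def radio_differences(uplink_datr, downlink_datr):
    """Return the names of the settings that differ between two data rates."""
    up_sf, up_bw = parse_datr(uplink_datr)
    down_sf, down_bw = parse_datr(downlink_datr)
    differences = []
    if up_sf != down_sf:
        differences.append("spreading factor")
    if up_bw != down_bw:
        differences.append("bandwidth")
    return differences


print(radio_differences("SF7BW125", "SF7BW500"))
```

Logging this comparison for every downlink would show how often the server asks for settings other than the ones the gateway listens on.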
-
@gazmcghee I'm using that code on LoPy only. And for ticks_diff, you may be right. The code is old, and Pycom changed the order of the arguments for ticks_diff. It seems that I do not use that with downlink messages at all.
About the special case for LoPy: That's OK, since the LoPy is a little more responsive than the LoPy4 or FiPy. So sending the downlink message can start a little later on the LoPy.
-
@robert-hh I'm trying your code. It isn't working yet, but it could be multiple things. I've found 2 issues though :
(1) For the window_compensation, you test
if uos.uname()[0] == "LoPy"
I'm on a LoPy4. Should it be
if (uos.uname()[0] == "LoPy") or (uos.uname()[0] == "LoPy4")
?
(2) For both times you use utime.ticks_diff, I think you've got the args in reverse order. The signature is ticks_diff(new, old).
This becomes an infinite loop after tmst:
while utime.ticks_diff(utime.ticks_cpu(), tmst) > 0: pass
and this will give negative values that cause the Downlink Timestamp error :
t_us = utime.ticks_diff(utime.ticks_cpu(), utime.ticks_add(tmst, -15000))
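The effect of the argument order can be checked on a PC with a small emulation of the tick arithmetic. This is a sketch that assumes the ticks_diff(new, old) signature stated above. The 30-bit wrap period and the helper names are assumptions made for the emulation, not values taken from the Pycom firmware.

```python
TICKS_PERIOD = 1 << 30  # assumed wrap period, for this emulation only
HALF_PERIOD = TICKS_PERIOD // 2


def ticks_add(ticks, delta):
    """Emulated tick addition with wrap-around."""
    return (ticks + delta) % TICKS_PERIOD


def ticks_diff(new, old):
    """Emulated signed difference new - old, correct across a wrap."""
    return ((new - old + HALF_PERIOD) % TICKS_PERIOD) - HALF_PERIOD


def delay_until_us(now, tmst, lead_us=15000):
    """Microseconds from now until lead_us before tmst (positive = in the future)."""
    return ticks_diff(ticks_add(tmst, -lead_us), now)


def still_waiting(now, tmst):
    """Condition for a spin loop that should end once tmst is reached."""
    return ticks_diff(tmst, now) > 0


now, tmst = 1_000_000, 6_000_000
print(delay_until_us(now, tmst))                # 4985000: usable alarm delay
print(ticks_diff(now, ticks_add(tmst, -15000)))  # -4985000: the reversed order
```

With the reversed order the delay comes out negative, so a range check like `1000 < t_us < 10000000` fails and the timestamp error is logged even though the packet arrived in time.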
-
@kjm No, I'm on AU915. I haven't got it working yet either
-
@gazmcghee sorry to stick me nose into your thread but I'm unable to get a join accept with a lopy4 from a TTN AS923 gateway, https://forum.pycom.io/topic/6427/the-things-network/11. Just curious if the TTN gateway you're on is AS923 too?
-
@gazmcghee said in What is the 2020 state of play for LoPy4 as a single channel gateway?:
self.lora_sock.settimeout(1)
Yes, I'm still using that line. The code I use is below. You have to remove the watchdog stuff. Just look for lines with watchdog or Watchdog in it.

""" LoPy LoRaWAN Nano Gateway. Can be used for both EU868 and US915. """

import errno
import machine
import ubinascii
import ujson
import uos
import usocket
import utime
import _thread
import gc
from micropython import const
from network import LoRa
from network import WLAN
from machine import Timer
from watchdog_pulse import Watchdog

PROTOCOL_VERSION = const(2)

PUSH_DATA = const(0)
PUSH_ACK = const(1)
PULL_DATA = const(2)
PULL_ACK = const(4)
PULL_RESP = const(3)

TX_ERR_NONE = 'NONE'
TX_ERR_TOO_LATE = 'TOO_LATE'
TX_ERR_TOO_EARLY = 'TOO_EARLY'
TX_ERR_COLLISION_PACKET = 'COLLISION_PACKET'
TX_ERR_COLLISION_BEACON = 'COLLISION_BEACON'
TX_ERR_TX_FREQ = 'TX_FREQ'
TX_ERR_TX_POWER = 'TX_POWER'
TX_ERR_GPS_UNLOCKED = 'GPS_UNLOCKED'

UDP_THREAD_CYCLE_MS = const(10)
WDT_TIMEOUT = const(120)

STAT_PK = {
    'stat': {
        'time': '',
        'lati': 0,
        'long': 0,
        'alti': 0,
        'rxnb': 0,
        'rxok': 0,
        'rxfw': 0,
        'ackr': 100.0,
        'dwnb': 0,
        'txnb': 0
    }
}

RX_PK = {
    'rxpk': [{
        'time': '',
        'tmst': 0,
        'chan': 0,
        'rfch': 0,
        'freq': 0,
        'stat': 1,
        'modu': 'LORA',
        'datr': '',
        'codr': '4/5',
        'rssi': 0,
        'lsnr': 0,
        'size': 0,
        'data': ''
    }]
}

TX_ACK_PK = {
    'txpk_ack': {
        'error': ''
    }
}


class NanoGateway:
    """
    Nano gateway class, set up by default for use with TTN, but can be configured
    for any other network supporting the Semtech Packet Forwarder.
    Only required configuration is wifi_ssid and wifi_password which are used for
    connecting to the Internet.
    """

    def __init__(self, id, frequency, datarate, ssid, password, server, port, ntp_server='pool.ntp.org', ntp_period=3600):
        self.id = id
        self.server = server
        self.port = port

        self.frequency = frequency
        self.datarate = datarate

        self.ssid = ssid
        self.password = password

        self.ntp_server = ntp_server
        self.ntp_period = ntp_period

        self.server_ip = None

        self.rxnb = 0
        self.rxok = 0
        self.rxfw = 0
        self.dwnb = 0
        self.txnb = 0

        self.sf = self._dr_to_sf(self.datarate)
        self.bw = self._dr_to_bw(self.datarate)

        self.stat_alarm = None
        self.pull_alarm = None
        self.uplink_alarm = None

        self.wlan = None
        self.sock = None
        self.udp_stop = False
        self.udp_lock = _thread.allocate_lock()

        self.lora = None
        self.lora_sock = None

        self.rtc = machine.RTC()
        self.watchdog = Watchdog("P9", "P10")

    def start(self):
        """
        Starts the LoRaWAN nano gateway.
        """

        self._log('Starting LoRaWAN nano gateway with id: {}', self.id)

        # setup WiFi as a station and connect
        self.wlan = WLAN(mode=WLAN.STA)
        self._connect_to_wifi()

        # get a time sync
        self._log('Syncing time with {} ...', self.ntp_server)
        self.rtc.ntp_sync(self.ntp_server, update_period=self.ntp_period)
        while not self.rtc.synced():
            utime.sleep_ms(50)
        self._log("RTC NTP sync complete")

        # get the server IP and create an UDP socket
        self.server_ip = usocket.getaddrinfo(self.server, self.port)[0][-1]
        self._log('Opening UDP socket to {} ({}) port {}...', self.server, self.server_ip[0], self.server_ip[1])
        self.sock = usocket.socket(usocket.AF_INET, usocket.SOCK_DGRAM, usocket.IPPROTO_UDP)
        self.sock.setsockopt(usocket.SOL_SOCKET, usocket.SO_REUSEADDR, 1)
        self.sock.setblocking(False)

        # push the first time immediatelly
        self._push_data(self._make_stat_packet())

        # create the alarms
        self.stat_alarm = Timer.Alarm(handler=lambda t: self._push_data(self._make_stat_packet()), s=60, periodic=True)
        self.pull_alarm = Timer.Alarm(handler=lambda u: self._pull_data(), s=25, periodic=True)

        # start the watchdog
        self.watchdog.start(120)
        utime.sleep(1)
        self._log("Watchdog started")

        # start the UDP receive thread
        self.udp_stop = False
        _thread.start_new_thread(self._udp_thread, ())

        # initialize the LoRa radio in LORA mode
        self._log('Setting up the LoRa radio at {} Mhz using {}', self._freq_to_float(self.frequency), self.datarate)
        self.lora = LoRa(
            mode=LoRa.LORA,
            region=LoRa.EU868,
            frequency=self.frequency,
            bandwidth=self.bw,
            sf=self.sf,
            preamble=8,
            coding_rate=LoRa.CODING_4_5,
            tx_iq=True
        )

        # create a raw LoRa socket
        self.lora_sock = usocket.socket(usocket.AF_LORA, usocket.SOCK_RAW)
        self.lora_sock.setblocking(False)
        self.lora_tx_done = False

        self.lora.callback(trigger=(LoRa.RX_PACKET_EVENT | LoRa.TX_PACKET_EVENT), handler=self._lora_cb)

        if uos.uname()[0] == "LoPy":
            self.window_compensation = -1000
        else:
            self.window_compensation = -6000

        self._log('LoRaWAN nano gateway online')

    def stop(self):
        """
        Stops the LoRaWAN nano gateway.
        """

        self._log('Stopping...')

        # send the LoRa radio to sleep
        self.lora.callback(trigger=None, handler=None)
        self.lora.power_mode(LoRa.SLEEP)

        # stop the NTP sync
        self.rtc.ntp_sync(None)

        # cancel all the alarms
        self.stat_alarm.cancel()
        self.pull_alarm.cancel()

        # signal the UDP thread to stop
        self.udp_stop = True
        while self.udp_stop:
            utime.sleep_ms(50)

        # disable WLAN
        self.wlan.disconnect()
        self.wlan.deinit()

    def _connect_to_wifi(self):
        self.wlan.connect(self.ssid, auth=(None, self.password))
        while not self.wlan.isconnected():
            utime.sleep_ms(50)
        self._log('WiFi connected to: {}', self.ssid)

    def _dr_to_sf(self, dr):
        sf = dr[2:4]
        if sf[1] not in '0123456789':
            sf = sf[:1]
        return int(sf)

    def _dr_to_bw(self, dr):
        bw = dr[-5:]
        if bw == 'BW125':
            return LoRa.BW_125KHZ
        elif bw == 'BW250':
            return LoRa.BW_250KHZ
        else:
            return LoRa.BW_500KHZ

    def _sf_bw_to_dr(self, sf, bw):
        dr = 'SF' + str(sf)
        if bw == LoRa.BW_125KHZ:
            return dr + 'BW125'
        elif bw == LoRa.BW_250KHZ:
            return dr + 'BW250'
        else:
            return dr + 'BW500'

    def _lora_cb(self, lora):
        """
        LoRa radio events callback handler.
        """

        events = lora.events()
        if events & LoRa.RX_PACKET_EVENT:
            self.rxnb += 1
            self.rxok += 1
            rx_data = self.lora_sock.recv(256)
            stats = lora.stats()
            packet = self._make_node_packet(rx_data, self.rtc.now(), stats.rx_timestamp, stats.sfrx, self.bw, stats.rssi, stats.snr)
            self._push_data(packet)
            self._log('Received packet: {}', packet)
            self.rxfw += 1
        if events & LoRa.TX_PACKET_EVENT:
            self.txnb += 1
            lora.init(
                mode=LoRa.LORA,
                region=LoRa.EU868,
                frequency=self.frequency,
                bandwidth=self.bw,
                sf=self.sf,
                preamble=8,
                coding_rate=LoRa.CODING_4_5,
                tx_iq=True
            )

    def _freq_to_float(self, frequency):
        """
        MicroPython has some inprecision when doing large float division.
        To counter this, this method first does integer division until we
        reach the decimal breaking point. This doesn't completely elimate
        the issue in all cases, but it does help for a number of commonly
        used frequencies.
        """

        divider = 6
        while divider > 0 and frequency % 10 == 0:
            frequency = frequency // 10
            divider -= 1
        if divider > 0:
            frequency = frequency / (10 ** divider)
        return frequency

    def _make_stat_packet(self):
        now = self.rtc.now()
        STAT_PK["stat"]["time"] = "%d-%02d-%02d %02d:%02d:%02d GMT" % (now[0], now[1], now[2], now[3], now[4], now[5])
        STAT_PK["stat"]["rxnb"] = self.rxnb
        STAT_PK["stat"]["rxok"] = self.rxok
        STAT_PK["stat"]["rxfw"] = self.rxfw
        STAT_PK["stat"]["dwnb"] = self.dwnb
        STAT_PK["stat"]["txnb"] = self.txnb
        return ujson.dumps(STAT_PK)

    def _make_node_packet(self, rx_data, rx_time, tmst, sf, bw, rssi, snr):
        RX_PK["rxpk"][0]["time"] = "%d-%02d-%02dT%02d:%02d:%02d.%dZ" % (rx_time[0], rx_time[1], rx_time[2], rx_time[3], rx_time[4], rx_time[5], rx_time[6])
        RX_PK["rxpk"][0]["tmst"] = tmst
        RX_PK["rxpk"][0]["freq"] = self._freq_to_float(self.frequency)
        RX_PK["rxpk"][0]["datr"] = self._sf_bw_to_dr(sf, bw)
        RX_PK["rxpk"][0]["rssi"] = rssi
        RX_PK["rxpk"][0]["lsnr"] = snr
        RX_PK["rxpk"][0]["data"] = ubinascii.b2a_base64(rx_data)[:-1]
        RX_PK["rxpk"][0]["size"] = len(rx_data)
        return ujson.dumps(RX_PK)

    def _push_data(self, data):
        token = uos.urandom(2)
        packet = bytes([PROTOCOL_VERSION]) + token + bytes([PUSH_DATA]) + ubinascii.unhexlify(self.id) + data
        with self.udp_lock:
            try:
                self.sock.sendto(packet, self.server_ip)
            except Exception as ex:
                self._log('Failed to push uplink packet to server: {}', ex)

    def _pull_data(self):
        token = uos.urandom(2)
        packet = bytes([PROTOCOL_VERSION]) + token + bytes([PULL_DATA]) + ubinascii.unhexlify(self.id)
        with self.udp_lock:
            try:
                self.sock.sendto(packet, self.server_ip)
            except Exception as ex:
                self._log('Failed to pull downlink packets from server: {}', ex)

    def _ack_pull_rsp(self, token, error):
        TX_ACK_PK["txpk_ack"]["error"] = error
        resp = ujson.dumps(TX_ACK_PK)
        packet = bytes([PROTOCOL_VERSION]) + token + bytes([PULL_ACK]) + ubinascii.unhexlify(self.id) + resp
        with self.udp_lock:
            try:
                self.sock.sendto(packet, self.server_ip)
            except Exception as ex:
                self._log('PULL RSP ACK exception: {}', ex)

    def _send_down_link(self, data, tmst, datarate, frequency):
        """
        Transmits a downlink message over LoRa.
        """

        self.lora.init(
            mode=LoRa.LORA,
            region=LoRa.EU868,
            frequency=frequency,
            bandwidth=self._dr_to_bw(datarate),
            sf=self._dr_to_sf(datarate),
            preamble=8,
            coding_rate=LoRa.CODING_4_5,
            tx_iq=True
        )
        while utime.ticks_diff(utime.ticks_cpu(), tmst) > 0:
            pass
        self.lora_sock.settimeout(1)
        self.lora_sock.send(data)
        self.lora_sock.setblocking(False)
        self._log(
            'Sent downlink packet scheduled on {:.3f}, at {:,d} Hz using {}: {}',
            tmst / 1000000,
            frequency,
            datarate,
            data
        )

    def _udp_thread(self):
        """
        UDP thread, reads data from the server and handles it.
        """

        while not self.udp_stop:
            gc.collect()
            try:
                data, src = self.sock.recvfrom(1024)
                _token = data[1:3]
                _type = data[3]
                if _type == PUSH_ACK:
                    self._log("Push ack")
                elif _type == PULL_ACK:
                    self._log("Pull ack")
                elif _type == PULL_RESP:
                    self.dwnb += 1
                    ack_error = TX_ERR_NONE
                    tx_pk = ujson.loads(data[4:])
                    payload = ubinascii.a2b_base64(tx_pk["txpk"]["data"])
                    # depending on the board, pull the downlink message 1 or 6 ms upfronnt
                    tmst = utime.ticks_add(tx_pk["txpk"]["tmst"], self.window_compensation)
                    t_us = utime.ticks_diff(utime.ticks_cpu(), utime.ticks_add(tmst, -15000))
                    if 1000 < t_us < 10000000:
                        self.uplink_alarm = Timer.Alarm(
                            handler=lambda x: self._send_down_link(
                                payload,
                                tmst,
                                tx_pk["txpk"]["datr"],
                                int(tx_pk["txpk"]["freq"] * 1000 + 0.0005) * 1000
                            ),
                            us=t_us
                        )
                    else:
                        ack_error = TX_ERR_TOO_LATE
                        self._log('Downlink timestamp error!, t_us: {}', t_us)
                    self._ack_pull_rsp(_token, ack_error)
                    self._log("Pull rsp")
            except usocket.timeout:
                pass
            except OSError as ex:
                if ex.args[0] != errno.EAGAIN:
                    self._log('UDP recv OSError Exception: {}', ex)
            except Exception as ex:
                self._log('UDP recv Exception: {}', ex)

            if self.watchdog.status() == 0:
                self.watchdog.feed()
                self._log("Feeding the dog")

            # wait before trying to receive again
            utime.sleep_ms(UDP_THREAD_CYCLE_MS)

        # we are to close the socket
        self.sock.close()
        self.udp_stop = False
        self._log('UDP thread stopped')

    def _log(self, message, *args):
        """
        Outputs a log message to stdout.
        """

        print('[{:>10.3f}] {}'.format(
            utime.ticks_ms() / 1000,
            str(message).format(*args)
        ))
-
Thanks for replying.
I'm sticking with the LoPy4. My delay issue may have been because I'm in Australia but was using the Asia TTN server. I'm switching to the meshed AU server.
I have no idea why the core dumps occur.
One question I have for you is regarding this:
"The gamechanger is the line:
self.lora_sock.settimeout(1)
which changes the way the message queue deals with messages."
https://forum.pycom.io/topic/2924/not-solved-lora-timing-of-lopy-and-fipy/8?_=1593150795069
Do you still use that line? I can't find it anywhere on GitHub, and something suggested that its effect was superseded by a later firmware update.
-
@gazmcghee I have a LoPy1 working as single channel gateway on the Loriot network. It seems more reliable than a LoPy4. Using the same device at the TTN network I had quite a few core dumps. But that's long ago, and the software changed a few times in between.
I have switched to the TTN network again, with the configuration that turned out to be robust. So let's see how long that lasts. I have added a hardware watchdog and create a boot log file.