What is the 2020 state of play for LoPy4 as a single channel gateway?

gazmcghee

I have a LoPy4 with the latest firmware, 3.1 expansion board and official antenna.
I unticked the PyBytes firmware option.
I'm using the official example here : https://github.com/pycom/pycom-libraries/tree/master/examples/lorawan-nano-gateway

I commonly see core dumps, and the "Downlink timestamp error!" message on downlinks when activating via OTAA.

There are many posts in this forum with the above issues, but most are years old. Do people have the LoPy/LoPy4 working reliably as a single channel gateway?
Is it possible for it to work as reliably as you could expect from a single channel with
a) TheThingsNetwork
b) loraserver.pycom.io
c) a self hosted ChirpStack with custom timings, frequency plan or whatever
What is the recommended nanogateway code these days?

@robert-hh I'd like your input as you were a part of so many of the other posts on the subject.

robert-hh

@gazmcghee Late server responses seem to be a problem for the Asia-Pacific region. Even here, with the TTN Europe server almost within viewing distance (at least compared to Australian dimensions), the server responses do not arrive particularly early, not even for the 5 s window. They arrive just a few hundred ms before downlink time.
And it's not the link. Traceroute for router.eu.thethings.network reports ~30 ms at the last line and 21 hops.
What I never addressed in the nanogateway code is the fact that the server changes the downlink message frequency and data rate. Using OTAA, the server can tell a node that setting. But using ABP, the node initially does not know. I never looked into the transfer logs for that case. The TTN server might still use the uplink configuration for downlink service messages.
For me the non-blocking socket is fine, because I have an external watchdog which I feed right from the main loop. That way I know that this loop is running. Funny enough, I see on incidence that the watchdog had to act.
The -6000 value was the one I found accurate for LoPy4.

gazmcghee

@robert-hh Thanks for all that. Yes, you are correct that the main thread timing is not critical. By the way, I did some logging of timings and found the garbage collect always takes around 30 ms, and a 10 ms sleep can be up to 90 ms. I still think blocking sockets would be an improvement, but they aren't my main issue.

Blocking sockets would mean the _udp_thread consumes essentially no CPU and causes no timing interference while waiting for a packet, and it would respond to that packet immediately. If there were 10 packets in a row, it would process them all without really sleeping, while the original code could be delayed by a sleep and maybe a GC between packets. It could block with a timeout if there was a need to do something regularly, or just to check for cancellation.
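
As a rough illustration of the blocking-with-timeout idea, the sketch below uses the standard `socket` API as found in CPython. The names `receive_loop`, `handle_packet` and `should_stop` are hypothetical and not taken from the nano-gateway code, and `usocket` on Pycom firmware may signal a timeout differently, so treat this as a sketch under those assumptions.

```python
import socket


def receive_loop(sock, handle_packet, should_stop, timeout_s=1.0):
    """Block on the socket; wake only for data or to check for cancellation."""
    sock.settimeout(timeout_s)          # blocking mode with an upper bound
    handled = 0
    while not should_stop():
        try:
            data, src = sock.recvfrom(1024)
        except socket.timeout:
            continue                    # nothing arrived; re-check should_stop()
        handle_packet(data, src)        # react immediately, no sleep in between
        handled += 1
    return handled
```

With this shape there is no fixed sleep between packets, and the timeout only bounds how long a stop request can go unnoticed.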

After logging pretty much everything timing related, the main issue is the meshed network server. Regardless of whether I use 4G or VDSL, I regularly get pauses well over 5 s (up to 17 s once), which is past the target send time. The only way I know to fix that is to try another server (Asia) or set up my own server.

My current code is here https://github.com/buzzware/nano-gateway-2020/tree/master

With the spin wait before the actual send, the send is normally accurate to within 1 ms. I don't know what the compensation should be, but I'm using your -6000 value.
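
A minimal sketch of the spin-wait idea, written for desktop Python so it can be tried off the device: sleep coarsely, then busy-wait over the last stretch. The function name, the injected `now_us`/`sleep_us` callables and the 2 ms margin are illustrative assumptions, and tick wrap-around is ignored here (on the device, `utime.ticks_diff` would be needed for that).

```python
import time


def wait_until(target_us, now_us, sleep_us, spin_margin_us=2000):
    """Sleep coarsely, then busy-wait for the last spin_margin_us."""
    remaining = target_us - now_us()
    if remaining > spin_margin_us:
        sleep_us(remaining - spin_margin_us)    # cheap, but imprecise
    while now_us() < target_us:                 # precise, but burns CPU
        pass
    return now_us() - target_us                 # overshoot in microseconds


def desktop_now_us():
    return time.perf_counter_ns() // 1000


def desktop_sleep_us(us):
    time.sleep(us / 1_000_000)


# Wait 5 ms from now and report how far past the target the loop exited
overshoot = wait_until(desktop_now_us() + 5000, desktop_now_us, desktop_sleep_us)
print("overshoot in us:", overshoot)
```

The margin trades CPU time for precision: a larger margin tolerates a sloppier sleep but spins for longer.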

I need to do more testing to see if that actually works for my third party nodes.

Cheers,

Gary

robert-hh

@gazmcghee In the end, that was my result too: not using LoPy or LoPy4 as the main gateway, because of:

the restriction to one channel. It is already bad here in Europe, where practically only three frequencies are used, and even worse in Australia and the US with their many channels.
the unreliable timing. That is a little bit better for non-SPIRAM boards, but in general timings and ISR times are not reliable with ESP32/ESP8266 boards and the Espressif RTOS. In my tests, ISRs had a timing lag of up to 1 ms, with the majority being between 50 and 300 µs. The regular garbage collect is used to keep the time low for each of them. The sleep in the _udp_thread does not matter, since the downlink is triggered by an alarm. The downlink message only has to arrive early enough, like ~50 ms before it is to be forwarded.

Your case of the 9 s delay of responses from TTN may have another reason. There was a post by a user who had the same problem and solved it by using another TTN router (Australia instead of Europe).

Edit: Another issue I have with LoPy4 is that the hardware reset of the SX1276 chip is not connected to a GPIO pin, so it cannot be reset by software. Almost every time I want to use it or change the configuration I have to go through a power cycle. A simple hardware reset of the board is not sufficient, so the SX1276 reset pin seems not even to be connected to the LoPy4 reset.

gazmcghee

Our priorities have changed, and I'm switching away from this attempt to get the nano gateway working with my third party nodes.
My latest code is here https://github.com/buzzware/nano-gateway-2020

I never got my devices to activate with OTAA. The gateway would get the request, send it to TTN, get a reply and send it, but no activation followed. I consider it a failure because the meter doesn't start sending data packets.

Some observations :

I regularly see a huge delay of up to 9 seconds in receiving the downlink activation data from TTN. This means the downlink window has already passed, so it has no hope of activating that time. I am in Perth, Western Australia, using the meshed router. I've tried both my 50 Mbps VDSL NBN internet and Vodafone 4G through my phone (which is often faster).
There is a large variability in all timings, such that the 20 ms downlink window is hard to hit.
The _udp_thread uses a 10 or 20 ms sleep and non-blocking sockets, and also a garbage collect. It's possible that these pauses are causing significant variability in the timing between socket data arriving and it being responded to. In theory, it might work better to use blocking sockets and no sleep. Then it should react ASAP to incoming data, instead of waiting up to a full sleep.

Thanks @robert-hh for your assistance.

robert-hh

@gazmcghee said in What is the 2020 state of play for LoPy4 as a single channel gateway?:

I'm assuming you mean the LoPy4 should not get the same value as LoPy, the LoPy should get -6000

The values are negative. So LoPy4 code starts sending the downlink message 5 ms earlier than LoPy. The value was chosen from testing to ensure that sending the downlink message starts right in the 20 ms receive window. In these tests LoPy4 needed longer between calling for sending and actually putting RF onto the antenna. Timing precision was the main problem in using the LoPy4 as nanogateway. For an OTAA downlink at SF7 and DR5, you have to hit a 20 ms time window 5 s after sending the message. That was difficult for the LoPy4. The LoPy, using only internal RAM, works better in matters of timing.
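
To make the arithmetic concrete, here is a minimal sketch of how a compensation value shifts the send call relative to the requested timestamp. The -6000 µs value is the one from the posted code; the `rf_latency_us` figures, the helper names, and the simplification that the receive window opens exactly at `tmst` are illustrative assumptions, and tick wrap-around is ignored.

```python
RX_WINDOW_US = 20000    # receive window length mentioned above (20 ms)


def send_call_time(tmst_us, compensation_us):
    """Time at which the gateway calls send(); the compensation is negative."""
    return tmst_us + compensation_us


def hits_window(tmst_us, compensation_us, rf_latency_us):
    """True if RF starts inside [tmst, tmst + RX_WINDOW_US)."""
    rf_start = send_call_time(tmst_us, compensation_us) + rf_latency_us
    return tmst_us <= rf_start < tmst_us + RX_WINDOW_US


tmst = 5_000_000        # requested send time in microseconds

# Assumed 7 ms between the send call and RF on the antenna: 1 ms into the window
print(hits_window(tmst, -6000, rf_latency_us=7000))   # True

# Assumed 3 ms latency with the same compensation: RF starts 3 ms too early
print(hits_window(tmst, -6000, rf_latency_us=3000))   # False
```

The point of the sketch is only that the compensation has to match the board's real call-to-RF latency; a value tuned for one board can miss the window on another.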
About tx_iq: I do not see any errors. It is still in the code and the code works with it. But maybe you use a different firmware version than me. I am using v1.20.1.r2 on the LoPy, and 1.20.2RC11 on the other boards, built from the Dev branch. There is no specific reason for having an older version on the LoPy besides the fact that this device normally sits quietly at the window, looking out for messages. If removing tx_iq works for you, just do it.

I abandoned the use of the LoPy as gateway. I still have it running here as a long-term test device doing something marginally useful, but the main gateway is a different make (IC330A + RPi). That runs rock solid.

gazmcghee

I've created this project https://github.com/buzzware/nano-gateway-2020 and @robert-hh I made a branch for integrating your code above, which isn't yet merged to master https://github.com/buzzware/nano-gateway-2020/tree/20201025-roberthh

gazmcghee

@robert-hh
From before :

thats scary
I'm assuming you mean the LoPy4 should not get the same value as LoPy, the LoPy should get -6000

Additionally,
3) tx_iq seems to have disappeared as a parameter to lora.init - I get an error on the latest firmware. Should that be replaced by something? It is still in the docs.

With a modified version of your code, I am no longer getting core dumps or as many timing errors, but my third party OTAA device is still sending activation packets and no data packets. I assume that means the activation was not successful in the device.

One thing I am curious about is that I have the nano gateway configured to SF7BW125, but on the activation downlink it's using SF7BW500 as instructed by the server. Why not the same? The Seaslug code forces the bandwidth to 125, i.e. SF7BW125. Any comments?
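
For reference, the `datr` string the server sends with a downlink (for example `SF7BW500`) carries both the spreading factor and the bandwidth, and the posted gateway code applies whatever the server requests. A minimal sketch of splitting such a string, using a hypothetical `parse_datr` helper rather than the gateway's own `_dr_to_sf` and `_dr_to_bw`:

```python
def parse_datr(datr):
    """Split e.g. 'SF7BW125' into (7, 125): spreading factor, bandwidth in kHz."""
    if not datr.startswith("SF") or "BW" not in datr:
        raise ValueError("unexpected datr: %r" % datr)
    sf_text, bw_text = datr[2:].split("BW", 1)
    return int(sf_text), int(bw_text)


uplink = parse_datr("SF7BW125")      # what the gateway listens on
downlink = parse_datr("SF7BW500")    # what the server asked for

print(uplink, downlink)
print("same SF:", uplink[0] == downlink[0], "same BW:", uplink[1] == downlink[1])
```

Logging both values side by side like this makes it easy to see when the server asks for a downlink setting that differs from the uplink configuration.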

Thanks for your help, I am motivated to solve this ASAP and will share the code on github.

robert-hh

@gazmcghee I'm using that code on LoPy only. And for ticks_diff, you may be right. The code is old, and Pycom changed the order of the arguments for ticks_diff. It seems that I do not use that with downlink messages at all.
About the special case for LoPy: That's OK, since LoPy is a little bit more responsive than LoPy4 or FiPy. So sending the downlink message can start a little bit later for LoPy.

gazmcghee

@robert-hh I'm trying your code. It isn't working yet, but it could be multiple things. I've found 2 issues though :

(1) For the window_compensation, you test

if uos.uname()[0] == "LoPy"

I'm on a LoPy4. Should it be

if (uos.uname()[0] == "LoPy") or (uos.uname()[0] == "LoPy4")

?
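
For clarity, this is what the two variants of the test would assign, written as plain functions that take the board name as a string. The helper names are hypothetical, and the board names are assumed to be what `uos.uname()[0]` reports on each device.

```python
def compensation_original(board):
    """The test as posted: only the plain LoPy gets the smaller value."""
    return -1000 if board == "LoPy" else -6000


def compensation_proposed(board):
    """The variant asked about: LoPy4 is treated like LoPy."""
    return -1000 if board in ("LoPy", "LoPy4") else -6000


for board in ("LoPy", "LoPy4", "FiPy"):
    print(board, compensation_original(board), compensation_proposed(board))
```

The only board whose value differs between the two variants is the LoPy4: -6000 with the posted test, -1000 with the proposed one.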

(2) For both times you use utime.ticks_diff, I think you've got the args in reverse order. The signature is ticks_diff(new,old).
This becomes an infinite loop after tmst :

while utime.ticks_diff(utime.ticks_cpu(), tmst) > 0:
            pass

and this will give negative values that cause the Downlink Timestamp error :

t_us = utime.ticks_diff(utime.ticks_cpu(), utime.ticks_add(tmst, -15000))
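
To make the sign convention concrete, here is a pure-Python emulation of wrap-around tick arithmetic, assuming the `ticks_diff(new, old)` order and an illustrative 32-bit tick period. The real period and argument order depend on the firmware build, and the helper bodies below are stand-ins, not Pycom's implementation.

```python
TICKS_PERIOD = 1 << 32            # assumption: 32-bit microsecond counter
TICKS_MAX = TICKS_PERIOD - 1
TICKS_HALF = TICKS_PERIOD // 2


def ticks_add(ticks, delta):
    """Add a (possibly negative) delta with wrap-around."""
    return (ticks + delta) & TICKS_MAX


def ticks_diff(new, old):
    """Signed difference new - old, valid across one wrap-around."""
    return ((new - old + TICKS_HALF) & TICKS_MAX) - TICKS_HALF


now = 1_000_000                   # current tick value, in microseconds
tmst = 6_000_000                  # requested send time, 5 s in the future

print(ticks_diff(tmst, now))      # 5000000: positive while tmst is still ahead
print(ticks_diff(now, tmst))      # -5000000: reversed order flips the sign

# Delay until 15 ms before the send time, as the alarm setup intends
print(ticks_diff(ticks_add(tmst, -15000), now))   # 4985000
```

Under these semantics a wait loop would read `while ticks_diff(tmst, ticks_cpu()) > 0`, and the alarm delay would be `ticks_diff(ticks_add(tmst, -15000), ticks_cpu())`.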

gazmcghee

@kjm No, I'm on AU915. I haven't got it working yet either

kjm

@gazmcghee sorry to stick me nose into your thread but I'm unable to get a join accept with a lopy4 from a TTN AS923 gateway, https://forum.pycom.io/topic/6427/the-things-network/11. Just curious if the TTN gateway you're on is AS923 too?

robert-hh

@gazmcghee said in What is the 2020 state of play for LoPy4 as a single channel gateway?:

self.lora_sock.settimeout(1)

Yes, I'm still using that line. The code I use is below. You have to remove the watchdog stuff. Just look for lines with watchdog or Watchdog in it.

""" LoPy LoRaWAN Nano Gateway. Can be used for both EU868 and US915. """

import errno
import machine
import ubinascii
import ujson
import uos
import usocket
import utime
import _thread
import gc
from micropython import const
from network import LoRa
from network import WLAN
from machine import Timer
from watchdog_pulse import Watchdog

PROTOCOL_VERSION = const(2)

PUSH_DATA = const(0)
PUSH_ACK = const(1)
PULL_DATA = const(2)
PULL_ACK = const(4)
PULL_RESP = const(3)

TX_ERR_NONE = 'NONE'
TX_ERR_TOO_LATE = 'TOO_LATE'
TX_ERR_TOO_EARLY = 'TOO_EARLY'
TX_ERR_COLLISION_PACKET = 'COLLISION_PACKET'
TX_ERR_COLLISION_BEACON = 'COLLISION_BEACON'
TX_ERR_TX_FREQ = 'TX_FREQ'
TX_ERR_TX_POWER = 'TX_POWER'
TX_ERR_GPS_UNLOCKED = 'GPS_UNLOCKED'

UDP_THREAD_CYCLE_MS = const(10)
WDT_TIMEOUT = const(120)


STAT_PK = {
    'stat': {
        'time': '',
        'lati': 0,
        'long': 0,
        'alti': 0,
        'rxnb': 0,
        'rxok': 0,
        'rxfw': 0,
        'ackr': 100.0,
        'dwnb': 0,
        'txnb': 0
    }
}

RX_PK = {
    'rxpk': [{
        'time': '',
        'tmst': 0,
        'chan': 0,
        'rfch': 0,
        'freq': 0,
        'stat': 1,
        'modu': 'LORA',
        'datr': '',
        'codr': '4/5',
        'rssi': 0,
        'lsnr': 0,
        'size': 0,
        'data': ''
    }]
}

TX_ACK_PK = {
    'txpk_ack': {
        'error': ''
    }
}


class NanoGateway:
    """
    Nano gateway class, set up by default for use with TTN, but can be configured
    for any other network supporting the Semtech Packet Forwarder.
    Only required configuration is wifi_ssid and wifi_password which are used for
    connecting to the Internet.
    """

    def __init__(self, id, frequency, datarate, ssid, password, server, port, ntp_server='pool.ntp.org', ntp_period=3600):
        self.id = id
        self.server = server
        self.port = port

        self.frequency = frequency
        self.datarate = datarate

        self.ssid = ssid
        self.password = password

        self.ntp_server = ntp_server
        self.ntp_period = ntp_period

        self.server_ip = None

        self.rxnb = 0
        self.rxok = 0
        self.rxfw = 0
        self.dwnb = 0
        self.txnb = 0

        self.sf = self._dr_to_sf(self.datarate)
        self.bw = self._dr_to_bw(self.datarate)

        self.stat_alarm = None
        self.pull_alarm = None
        self.uplink_alarm = None

        self.wlan = None
        self.sock = None
        self.udp_stop = False
        self.udp_lock = _thread.allocate_lock()

        self.lora = None
        self.lora_sock = None

        self.rtc = machine.RTC()

        self.watchdog = Watchdog("P9", "P10")

    def start(self):
        """
        Starts the LoRaWAN nano gateway.
        """

        self._log('Starting LoRaWAN nano gateway with id: {}', self.id)

        # setup WiFi as a station and connect
        self.wlan = WLAN(mode=WLAN.STA)
        self._connect_to_wifi()

        # get a time sync
        self._log('Syncing time with {} ...', self.ntp_server)
        self.rtc.ntp_sync(self.ntp_server, update_period=self.ntp_period)
        while not self.rtc.synced():
            utime.sleep_ms(50)
        self._log("RTC NTP sync complete")

        # get the server IP and create an UDP socket
        self.server_ip = usocket.getaddrinfo(self.server, self.port)[0][-1]
        self._log('Opening UDP socket to {} ({}) port {}...', self.server, self.server_ip[0], self.server_ip[1])
        self.sock = usocket.socket(usocket.AF_INET, usocket.SOCK_DGRAM, usocket.IPPROTO_UDP)
        self.sock.setsockopt(usocket.SOL_SOCKET, usocket.SO_REUSEADDR, 1)
        self.sock.setblocking(False)

        # push the first time immediately
        self._push_data(self._make_stat_packet())

        # create the alarms
        self.stat_alarm = Timer.Alarm(handler=lambda t: self._push_data(self._make_stat_packet()), s=60, periodic=True)
        self.pull_alarm = Timer.Alarm(handler=lambda u: self._pull_data(), s=25, periodic=True)

        # start the watchdog
        self.watchdog.start(120)
        utime.sleep(1)
        self._log("Watchdog started")

        # start the UDP receive thread
        self.udp_stop = False
        _thread.start_new_thread(self._udp_thread, ())

        # initialize the LoRa radio in LORA mode
        self._log('Setting up the LoRa radio at {} MHz using {}', self._freq_to_float(self.frequency), self.datarate)
        self.lora = LoRa(
            mode=LoRa.LORA,
            region=LoRa.EU868,
            frequency=self.frequency,
            bandwidth=self.bw,
            sf=self.sf,
            preamble=8,
            coding_rate=LoRa.CODING_4_5,
            tx_iq=True
        )

        # create a raw LoRa socket
        self.lora_sock = usocket.socket(usocket.AF_LORA, usocket.SOCK_RAW)
        self.lora_sock.setblocking(False)
        self.lora_tx_done = False

        self.lora.callback(trigger=(LoRa.RX_PACKET_EVENT | LoRa.TX_PACKET_EVENT), handler=self._lora_cb)
        
        if uos.uname()[0] == "LoPy":
            self.window_compensation = -1000
        else:
            self.window_compensation = -6000

        self._log('LoRaWAN nano gateway online')

    def stop(self):
        """
        Stops the LoRaWAN nano gateway.
        """

        self._log('Stopping...')

        # send the LoRa radio to sleep
        self.lora.callback(trigger=None, handler=None)
        self.lora.power_mode(LoRa.SLEEP)

        # stop the NTP sync
        self.rtc.ntp_sync(None)

        # cancel all the alarms
        self.stat_alarm.cancel()
        self.pull_alarm.cancel()

        # signal the UDP thread to stop
        self.udp_stop = True
        while self.udp_stop:
            utime.sleep_ms(50)

        # disable WLAN
        self.wlan.disconnect()
        self.wlan.deinit()

    def _connect_to_wifi(self):
        self.wlan.connect(self.ssid, auth=(None, self.password))
        while not self.wlan.isconnected():
            utime.sleep_ms(50)
        self._log('WiFi connected to: {}', self.ssid)

    def _dr_to_sf(self, dr):
        sf = dr[2:4]
        if sf[1] not in '0123456789':
            sf = sf[:1]
        return int(sf)

    def _dr_to_bw(self, dr):
        bw = dr[-5:]
        if bw == 'BW125':
            return LoRa.BW_125KHZ
        elif bw == 'BW250':
            return LoRa.BW_250KHZ
        else:
            return LoRa.BW_500KHZ

    def _sf_bw_to_dr(self, sf, bw):
        dr = 'SF' + str(sf)
        if bw == LoRa.BW_125KHZ:
            return dr + 'BW125'
        elif bw == LoRa.BW_250KHZ:
            return dr + 'BW250'
        else:
            return dr + 'BW500'

    def _lora_cb(self, lora):
        """
        LoRa radio events callback handler.
        """

        events = lora.events()
        if events & LoRa.RX_PACKET_EVENT:
            self.rxnb += 1
            self.rxok += 1
            rx_data = self.lora_sock.recv(256)
            stats = lora.stats()
            packet = self._make_node_packet(rx_data, self.rtc.now(), stats.rx_timestamp, stats.sfrx, self.bw, stats.rssi, stats.snr)
            self._push_data(packet)
            self._log('Received packet: {}', packet)
            self.rxfw += 1
        if events & LoRa.TX_PACKET_EVENT:
            self.txnb += 1
            lora.init(
                mode=LoRa.LORA,
                region=LoRa.EU868,
                frequency=self.frequency,
                bandwidth=self.bw,
                sf=self.sf,
                preamble=8,
                coding_rate=LoRa.CODING_4_5,
                tx_iq=True
                )

    def _freq_to_float(self, frequency):
        """
        MicroPython has some imprecision when doing large float division.
        To counter this, this method first does integer division until we
        reach the decimal breaking point. This doesn't completely eliminate
        the issue in all cases, but it does help for a number of commonly
        used frequencies.
        """

        divider = 6
        while divider > 0 and frequency % 10 == 0:
            frequency = frequency // 10
            divider -= 1
        if divider > 0:
            frequency = frequency / (10 ** divider)
        return frequency

    def _make_stat_packet(self):
        now = self.rtc.now()
        STAT_PK["stat"]["time"] = "%d-%02d-%02d %02d:%02d:%02d GMT" % (now[0], now[1], now[2], now[3], now[4], now[5])
        STAT_PK["stat"]["rxnb"] = self.rxnb
        STAT_PK["stat"]["rxok"] = self.rxok
        STAT_PK["stat"]["rxfw"] = self.rxfw
        STAT_PK["stat"]["dwnb"] = self.dwnb
        STAT_PK["stat"]["txnb"] = self.txnb
        return ujson.dumps(STAT_PK)

    def _make_node_packet(self, rx_data, rx_time, tmst, sf, bw, rssi, snr):
        RX_PK["rxpk"][0]["time"] = "%d-%02d-%02dT%02d:%02d:%02d.%dZ" % (rx_time[0], rx_time[1], rx_time[2], rx_time[3], rx_time[4], rx_time[5], rx_time[6])
        RX_PK["rxpk"][0]["tmst"] = tmst
        RX_PK["rxpk"][0]["freq"] = self._freq_to_float(self.frequency)
        RX_PK["rxpk"][0]["datr"] = self._sf_bw_to_dr(sf, bw)
        RX_PK["rxpk"][0]["rssi"] = rssi
        RX_PK["rxpk"][0]["lsnr"] = snr
        RX_PK["rxpk"][0]["data"] = ubinascii.b2a_base64(rx_data)[:-1]
        RX_PK["rxpk"][0]["size"] = len(rx_data)
        return ujson.dumps(RX_PK)

    def _push_data(self, data):
        token = uos.urandom(2)
        packet = bytes([PROTOCOL_VERSION]) + token + bytes([PUSH_DATA]) + ubinascii.unhexlify(self.id) + data
        with self.udp_lock:
            try:
                self.sock.sendto(packet, self.server_ip)
            except Exception as ex:
                self._log('Failed to push uplink packet to server: {}', ex)

    def _pull_data(self):
        token = uos.urandom(2)
        packet = bytes([PROTOCOL_VERSION]) + token + bytes([PULL_DATA]) + ubinascii.unhexlify(self.id)
        with self.udp_lock:
            try:
                self.sock.sendto(packet, self.server_ip)
            except Exception as ex:
                self._log('Failed to pull downlink packets from server: {}', ex)

    def _ack_pull_rsp(self, token, error):
        TX_ACK_PK["txpk_ack"]["error"] = error
        resp = ujson.dumps(TX_ACK_PK)
        packet = bytes([PROTOCOL_VERSION]) + token + bytes([PULL_ACK]) + ubinascii.unhexlify(self.id) + resp
        with self.udp_lock:
            try:
                self.sock.sendto(packet, self.server_ip)
            except Exception as ex:
                self._log('PULL RSP ACK exception: {}', ex)

    def _send_down_link(self, data, tmst, datarate, frequency):
        """
        Transmits a downlink message over LoRa.
        """

        self.lora.init(
            mode=LoRa.LORA,
            region=LoRa.EU868,
            frequency=frequency,
            bandwidth=self._dr_to_bw(datarate),
            sf=self._dr_to_sf(datarate),
            preamble=8,
            coding_rate=LoRa.CODING_4_5,
            tx_iq=True
            )
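        # Busy-wait until tmst. NOTE: as posted, this appears to assume the
        # older ticks_diff argument order; with ticks_diff(new, old) semantics
        # the two arguments would need to be swapped.
        while utime.ticks_diff(utime.ticks_cpu(), tmst) > 0:
            pass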
        self.lora_sock.settimeout(1)
        self.lora_sock.send(data)
        self.lora_sock.setblocking(False)
        self._log(
            'Sent downlink packet scheduled on {:.3f}, at {:,d} Hz using {}: {}',
            tmst / 1000000,
            frequency,
            datarate,
            data
        )

    def _udp_thread(self):
        """
        UDP thread, reads data from the server and handles it.
        """

        while not self.udp_stop:
            gc.collect()
            try:
                data, src = self.sock.recvfrom(1024)
                _token = data[1:3]
                _type = data[3]
                if _type == PUSH_ACK:
                    self._log("Push ack")
                elif _type == PULL_ACK:
                    self._log("Pull ack")
                elif _type == PULL_RESP:
                    self.dwnb += 1
                    ack_error = TX_ERR_NONE
                    tx_pk = ujson.loads(data[4:])
                    payload = ubinascii.a2b_base64(tx_pk["txpk"]["data"])
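                    # depending on the board, pull the downlink message 1 or 6 ms up front
                    tmst = utime.ticks_add(tx_pk["txpk"]["tmst"], self.window_compensation)
                    # delay until 15 ms before tmst; with ticks_diff(new, old)
                    # semantics the two arguments would need to be swapped
                    t_us = utime.ticks_diff(utime.ticks_cpu(), utime.ticks_add(tmst, -15000))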
                    if 1000 < t_us < 10000000:
                        self.uplink_alarm = Timer.Alarm(
                            handler=lambda x: self._send_down_link(
                                payload,
                                tmst, tx_pk["txpk"]["datr"],
                                int(tx_pk["txpk"]["freq"] * 1000 + 0.0005) * 1000
                            ),
                            us=t_us
                        )
                    else:
                        ack_error = TX_ERR_TOO_LATE
                        self._log('Downlink timestamp error!, t_us: {}', t_us)
                    self._ack_pull_rsp(_token, ack_error)
                    self._log("Pull rsp")
            except usocket.timeout:
                pass
            except OSError as ex:
                if ex.args[0] != errno.EAGAIN:
                    self._log('UDP recv OSError Exception: {}', ex)
            except Exception as ex:
                self._log('UDP recv Exception: {}', ex)

            if self.watchdog.status() == 0:
                self.watchdog.feed()
                self._log("Feeding the dog")

            # wait before trying to receive again
            utime.sleep_ms(UDP_THREAD_CYCLE_MS)

        # we are to close the socket
        self.sock.close()
        self.udp_stop = False
        self._log('UDP thread stopped')

    def _log(self, message, *args):
        """
        Outputs a log message to stdout.
        """

        print('[{:>10.3f}] {}'.format(
            utime.ticks_ms() / 1000,
            str(message).format(*args)
            ))
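
The `_push_data`, `_pull_data` and `_ack_pull_rsp` methods above all build the same 12-byte header: a protocol version byte, a 2-byte random token, a packet type byte and the 8-byte gateway id. A small desktop-Python sketch of that layout for a PULL_DATA packet follows; the helper names and the gateway id in the usage lines are made up for illustration.

```python
import binascii
import os

PROTOCOL_VERSION = 2
PULL_DATA = 2


def make_pull_data(gateway_id_hex, token=None):
    """Build a PULL_DATA packet: version, token, type, gateway id."""
    token = os.urandom(2) if token is None else token
    return (bytes([PROTOCOL_VERSION]) + token + bytes([PULL_DATA])
            + binascii.unhexlify(gateway_id_hex))


def parse_header(packet):
    """Split a packet into the fields the UDP thread looks at."""
    return {
        "version": packet[0],
        "token": packet[1:3],
        "type": packet[3],
        "rest": packet[4:],
    }


pkt = make_pull_data("0011223344556677", token=b"\x12\x34")   # made-up gateway id
print(len(pkt), parse_header(pkt))
```

Matching the token of an ACK against the token of the request is a simple way to measure server round-trip time when chasing late responses.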

gazmcghee

Thanks for replying.
I'm sticking with the LoPy4. My delay issue may have been because I'm in Australia but was using the Asia TTN server. I'm switching to the meshed AU server.
I have no idea why the core dumps occur.

One question I have for you is regarding this :

"The gamechanger is the line:
self.lora_sock.settimeout(1)
which changes the way the message queue deals with messages."
https://forum.pycom.io/topic/2924/not-solved-lora-timing-of-lopy-and-fipy/8?_=1593150795069

Do you still use that line? I can't find it anywhere on GitHub, and something suggested that its effect was superseded by a later firmware update.

robert-hh

@gazmcghee I have a LoPy1 working as single channel gateway on the Loriot network. It seems more reliable than a LoPy4. Using the same device on the TTN network I had quite a few core dumps. But that's long ago, and the software changed a few times in between.
I switched again to the TTN network, with the configuration that turned out robust. So let's see how long that lasts. I have added a hardware watchdog and create a boot log file.
