Pygate - ethernet reliability issues



  • Hi,

    I've been using my Pygate with PoE for a while now, and I've noticed that it frequently stops communicating over Ethernet and has to be reset or power-cycled to get it working again. My code is based on the examples in the documentation.

    I've not tried the same tests with wifi yet.

    Has anyone else experienced this?

    Thanks

    Andrew



  • @Jaap-Crezee This looks like a good starting point. You probably want to add a loop and send a counter in the client, and print the counter on both client (TX and RX) and on the server, with timestamps.

    It's regular Python but I don't think there's much to change.

    Other examples here and here if needed.
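
    A minimal sketch of the counter loop described above, written as regular Python for two hosts on the network. The function names `run_echo_server` and `run_client` are made up for this illustration, and the code assumes a standard CPython `socket` module; the MicroPython build on the device may need small changes (for example in how timeouts are reported).

```python
import socket
import threading
import time


def run_echo_server(sock, stop_event):
    """Echo each datagram back to its sender and log what was received."""
    sock.settimeout(0.2)  # wake up regularly so stop_event is noticed
    while not stop_event.is_set():
        try:
            data, addr = sock.recvfrom(1024)
        except socket.timeout:
            continue
        print("%.3f server RX %s from %s"
              % (time.time(), data.decode("ascii", "replace"), addr[0]))
        sock.sendto(data, addr)


def run_client(server_addr, count=5, interval=1.0, timeout=2.0):
    """Send an incrementing counter; return {counter: rtt_seconds or None}."""
    results = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for counter in range(count):
            payload = str(counter).encode()
            sent_at = time.time()
            sock.sendto(payload, server_addr)
            print("%.3f client TX %d" % (sent_at, counter))
            results[counter] = None
            deadline = sent_at + timeout
            while results[counter] is None:
                remaining = deadline - time.time()
                if remaining <= 0:
                    print("%.3f client RX timeout for %d" % (time.time(), counter))
                    break
                sock.settimeout(remaining)
                try:
                    data, _ = sock.recvfrom(1024)
                except socket.timeout:
                    continue  # the loop re-checks the deadline and logs the timeout
                if data == payload:
                    results[counter] = time.time() - sent_at
                    print("%.3f client RX %d rtt=%.1f ms"
                          % (time.time(), counter, results[counter] * 1000))
                else:
                    # A late reply to an earlier counter: log it and keep waiting.
                    print("%.3f client RX late %s"
                          % (time.time(), data.decode("ascii", "replace")))
            time.sleep(interval)
    return results
```

    To use it, bind a UDP socket on one side and pass it to `run_echo_server`, then call `run_client` with that address from the other side. A counter that shows up in the client TX log but never in the server RX log was lost on the way to the server; one that the server logged but the client never received was lost on the way back.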



  • @jcaron @gijs @peterp
    I tried another network, on a (much) older Sitecom 10/100 Mbps switch connected to a different Ethernet port on my computer.
    That network does not have the noise which is present on my "normal" office lan.

    I am now seeing this:

    64 bytes from 192.168.0.2: icmp_seq=386 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=387 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=388 ttl=255 time=1.90 ms
    64 bytes from 192.168.0.2: icmp_seq=389 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=390 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=391 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=392 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=393 ttl=255 time=1.96 ms
    64 bytes from 192.168.0.2: icmp_seq=394 ttl=255 time=1.92 ms    # <== until here it worked without any of the previous problems;
    64 bytes from 192.168.0.2: icmp_seq=395 ttl=255 time=1.92 ms    # <== at this point in time I attach my other LAN to the sitecom switch
    64 bytes from 192.168.0.2: icmp_seq=413 ttl=255 time=1.29 ms    # <== here I disconnect that cable
    64 bytes from 192.168.0.2: icmp_seq=414 ttl=255 time=1.90 ms
    64 bytes from 192.168.0.2: icmp_seq=415 ttl=255 time=1.90 ms
    64 bytes from 192.168.0.2: icmp_seq=416 ttl=255 time=2.14 ms
    64 bytes from 192.168.0.2: icmp_seq=417 ttl=255 time=1.69 ms
    64 bytes from 192.168.0.2: icmp_seq=418 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=419 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=420 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=421 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=422 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=423 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=424 ttl=255 time=1.30 ms
    64 bytes from 192.168.0.2: icmp_seq=425 ttl=255 time=1.94 ms    # <== at this point in time I attach my other LAN to the sitecom switch again
    64 bytes from 192.168.0.2: icmp_seq=434 ttl=255 time=1.31 ms    # <== here I disconnect that cable again
    64 bytes from 192.168.0.2: icmp_seq=435 ttl=255 time=2.10 ms
    64 bytes from 192.168.0.2: icmp_seq=436 ttl=255 time=1.93 ms
    ^C
    --- 192.168.0.2 ping statistics ---
    436 packets transmitted, 372 received, +36 errors, 14.6789% packet loss, time 438231ms
    rtt min/avg/max/mdev = 1.186/3.154/419.886/21.812 ms, pipe 3
    [jaap@jaap ~ ]$ 
    

    This makes me at least 90% sure that this is actually a software issue somewhere in the Pygate/Pycom module, related to incoming network traffic.
    Doesn't this still ring a bell? I could potentially debug this, but there is currently not enough datasheet information available about the components used, especially the PoE module. I couldn't even find information about the required PoE voltage and typical current usage. But that's a bit off-topic for now.
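
    The windows of lost replies can be pulled out of a saved ping log automatically. The following is a small sketch, assuming Linux `ping` output in the format shown above; `find_gaps` and `loss_percent` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import re

# Matches reply lines such as "64 bytes from 192.168.0.2: icmp_seq=394 ...".
# Error lines ("From ... Destination Host Unreachable") do not match.
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+)")


def find_gaps(lines):
    """Return (last_seen, next_seen, missing_count) for each jump in icmp_seq."""
    gaps = []
    highest = None
    for line in lines:
        match = REPLY_RE.search(line)
        if match is None:
            continue
        seq = int(match.group(1))
        if highest is not None and seq > highest + 1:
            gaps.append((highest, seq, seq - highest - 1))
        if highest is None or seq > highest:
            highest = seq
    return gaps


def loss_percent(transmitted, received):
    """Packet loss as ping reports it in its summary line."""
    return 100.0 * (transmitted - received) / transmitted
```

    Applied to the first log above, this reports 17 missing replies between icmp_seq 395 and 413 and 8 between 425 and 434, which are the two periods in which the other LAN was attached. `loss_percent(436, 372)` gives the 14.6789% from the summary line.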

    PS 15 minutes later without the other lan:

    [jaap@jaap ~ ]$ ping 192.168.0.2
    PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
    64 bytes from 192.168.0.2: icmp_seq=1 ttl=255 time=129 ms
    64 bytes from 192.168.0.2: icmp_seq=2 ttl=255 time=1.22 ms
    64 bytes from 192.168.0.2: icmp_seq=3 ttl=255 time=1.88 ms
    64 bytes from 192.168.0.2: icmp_seq=4 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=5 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=6 ttl=255 time=1.95 ms
    64 bytes from 192.168.0.2: icmp_seq=7 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=8 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=9 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=10 ttl=255 time=1.96 ms
    64 bytes from 192.168.0.2: icmp_seq=11 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=12 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=13 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=14 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=15 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=16 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=17 ttl=255 time=1.91 ms
    64 bytes from 192.168.0.2: icmp_seq=18 ttl=255 time=1.92 ms
    64 bytes from 192.168.0.2: icmp_seq=19 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=20 ttl=255 time=1.62 ms
    64 bytes from 192.168.0.2: icmp_seq=21 ttl=255 time=1.93 ms
    64 bytes from 192.168.0.2: icmp_seq=22 ttl=255 time=1.94 ms
    64 bytes from 192.168.0.2: icmp_seq=23 ttl=255 time=1.94 ms
    64 bytes from 192.168.0.2: icmp_seq=24 ttl=255 time=1.94 ms
    ^C
    --- 192.168.0.2 ping statistics ---
    24 packets transmitted, 24 received, 0% packet loss, time 23043ms
    rtt min/avg/max/mdev = 1.219/7.172/128.881/25.378 ms
    [jaap@jaap ~ ]$ 
    

    @jcaron said in Pygate - ethernet reliability issues:

    Otherwise it could be quite a long debug. As stated above, I would start by trying to determine if the issue is on the RX or TX side, by either switching on Ethernet on the WiPy or using a simple UDP echo server with logging, to see if the delay (and loss) happens on one side or the other.

    Do we have some Pycom/python examples at hand to try within 30 seconds?



  • @Jaap-Crezee you may want to try a direct connection to your computer to see what happens if there’s no (or little, some computers can be quite chatty) other traffic. You would also probably get information about Ethernet issues via ifconfig stats. I would also run a tcpdump during the ping tests to see if there’s anything else going on.

    Mmmmmhhh just wondering if we’re not faced with a good old duplex negotiation issue. Haven’t seen one in nearly two decades, but a mixup between half and full duplex can lead to similar terrible performance. Not quite sure it would lead to that weird latency, though.

    Otherwise it could be quite a long debug. As stated above, I would start by trying to determine if the issue is on the RX or TX side, by either switching on Ethernet on the WiPy or using a simple UDP echo server with logging, to see if the delay (and loss) happens on one side or the other.

    After that... there are quite a few places where there could be issues so it’s going to be quite some work to pinpoint it. I’m really puzzled by the recurring 7-8 second latency though. I have a hard time trying to guess where that could possibly come from.



  • @Gijs @peterp @jcaron I tried both wifi_on_boot & pybytes_on_boot without any difference, unfortunately.
    I am not only seeing ICMP "slowness", FTP is also very slow as is telnet which makes me suspicious about the network functionality in general:

    [jaap@jaap /data/work/gateway ]$ wget ftp://micro:python@10.0.0.99/flash/boot.py
    --2020-11-20 22:12:43--  ftp://micro:*password*@10.0.0.99/flash/boot.py
               => 'boot.py.1'
    Connecting to 10.0.0.99:21... connected.
    Logging in as micro ... Logged in!
    ==> SYST ... done.    ==> PWD ... done.
    ==> TYPE I ... done.  ==> CWD (1) /flash ... done.
    ==> SIZE boot.py ... 233
    ==> PASV ... done.    ==> RETR boot.py ... done.
    Length: 233 (unauthoritative)
    
    boot.py.1                                                                     100%[=================================================================================================================================================================================================>]     233  --.-KB/s    in 0.02s   
    
    2020-11-20 22:14:24 (14.2 KB/s) - 'boot.py.1' saved [233]
    
    [jaap@jaap /data/work/gateway ]$ 
    

    Almost 2 minutes fetching a file of 233 bytes (!?!).
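
    As a quick check of that figure, the two timestamps in the wget log can be subtracted. This is only a sketch of the arithmetic; `elapsed_seconds` is a made-up helper name.

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Wall-clock seconds between two timestamps in wget's log format."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the first and last lines of the wget log above.
total = elapsed_seconds("2020-11-20 22:12:43", "2020-11-20 22:14:24")
reported_transfer = 0.02  # seconds, as printed by wget for the 233-byte payload
print("wall clock: %.0f s, payload transfer: %.2f s" % (total, reported_transfer))
```

    The wall-clock time comes out at 101 seconds, while wget reports only 0.02 s for the payload itself. That suggests, though it does not prove, that nearly all of the time went into connection setup and the FTP command round trips rather than into moving the data.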

    64 bytes from 10.0.0.99: icmp_seq=176 ttl=255 time=696 ms
    64 bytes from 10.0.0.99: icmp_seq=177 ttl=255 time=849 ms
    64 bytes from 10.0.0.99: icmp_seq=178 ttl=255 time=872 ms
    64 bytes from 10.0.0.99: icmp_seq=181 ttl=255 time=1681 ms
    64 bytes from 10.0.0.99: icmp_seq=183 ttl=255 time=1630 ms
    64 bytes from 10.0.0.99: icmp_seq=185 ttl=255 time=1208 ms
    64 bytes from 10.0.0.99: icmp_seq=186 ttl=255 time=1004 ms
    64 bytes from 10.0.0.99: icmp_seq=187 ttl=255 time=1301 ms
    64 bytes from 10.0.0.99: icmp_seq=190 ttl=255 time=1541 ms
    64 bytes from 10.0.0.99: icmp_seq=191 ttl=255 time=1478 ms
    64 bytes from 10.0.0.99: icmp_seq=192 ttl=255 time=1310 ms
    64 bytes from 10.0.0.99: icmp_seq=194 ttl=255 time=1319 ms
    64 bytes from 10.0.0.99: icmp_seq=195 ttl=255 time=2241 ms
    64 bytes from 10.0.0.99: icmp_seq=196 ttl=255 time=2537 ms
    64 bytes from 10.0.0.99: icmp_seq=198 ttl=255 time=1766 ms
    64 bytes from 10.0.0.99: icmp_seq=199 ttl=255 time=1489 ms
    64 bytes from 10.0.0.99: icmp_seq=200 ttl=255 time=1457 ms
    64 bytes from 10.0.0.99: icmp_seq=205 ttl=255 time=1785 ms
    64 bytes from 10.0.0.99: icmp_seq=206 ttl=255 time=2072 ms
    64 bytes from 10.0.0.99: icmp_seq=207 ttl=255 time=1369 ms
    64 bytes from 10.0.0.99: icmp_seq=208 ttl=255 time=1272 ms
    64 bytes from 10.0.0.99: icmp_seq=209 ttl=255 time=1272 ms
    64 bytes from 10.0.0.99: icmp_seq=210 ttl=255 time=1273 ms
    64 bytes from 10.0.0.99: icmp_seq=211 ttl=255 time=2153 ms
    64 bytes from 10.0.0.99: icmp_seq=212 ttl=255 time=2333 ms
    64 bytes from 10.0.0.99: icmp_seq=213 ttl=255 time=1342 ms
    64 bytes from 10.0.0.99: icmp_seq=214 ttl=255 time=1342 ms
    64 bytes from 10.0.0.99: icmp_seq=215 ttl=255 time=1342 ms
    64 bytes from 10.0.0.99: icmp_seq=216 ttl=255 time=1289 ms
    64 bytes from 10.0.0.99: icmp_seq=217 ttl=255 time=1289 ms
    64 bytes from 10.0.0.99: icmp_seq=218 ttl=255 time=1288 ms
    64 bytes from 10.0.0.99: icmp_seq=219 ttl=255 time=2212 ms
    64 bytes from 10.0.0.99: icmp_seq=220 ttl=255 time=2378 ms
    64 bytes from 10.0.0.99: icmp_seq=221 ttl=255 time=1406 ms
    64 bytes from 10.0.0.99: icmp_seq=222 ttl=255 time=1401 ms
    64 bytes from 10.0.0.99: icmp_seq=223 ttl=255 time=1410 ms
    64 bytes from 10.0.0.99: icmp_seq=224 ttl=255 time=1305 ms
    64 bytes from 10.0.0.99: icmp_seq=225 ttl=255 time=1229 ms
    64 bytes from 10.0.0.99: icmp_seq=226 ttl=255 time=1240 ms
    64 bytes from 10.0.0.99: icmp_seq=227 ttl=255 time=2055 ms
    64 bytes from 10.0.0.99: icmp_seq=228 ttl=255 time=2113 ms
    64 bytes from 10.0.0.99: icmp_seq=231 ttl=255 time=1369 ms
    64 bytes from 10.0.0.99: icmp_seq=232 ttl=255 time=1248 ms
    64 bytes from 10.0.0.99: icmp_seq=233 ttl=255 time=1246 ms
    64 bytes from 10.0.0.99: icmp_seq=234 ttl=255 time=1245 ms
    64 bytes from 10.0.0.99: icmp_seq=235 ttl=255 time=2209 ms
    64 bytes from 10.0.0.99: icmp_seq=239 ttl=255 time=1232 ms
    64 bytes from 10.0.0.99: icmp_seq=241 ttl=255 time=1216 ms
    64 bytes from 10.0.0.99: icmp_seq=242 ttl=255 time=1211 ms
    64 bytes from 10.0.0.99: icmp_seq=247 ttl=255 time=1991 ms
    64 bytes from 10.0.0.99: icmp_seq=248 ttl=255 time=1860 ms
    64 bytes from 10.0.0.99: icmp_seq=249 ttl=255 time=1487 ms
    ^C
    --- 10.0.0.99 ping statistics ---
    251 packets transmitted, 149 received, +1 duplicates, 40.6375% packet loss, time 255131ms
    rtt min/avg/max/mdev = 1.551/941.478/2536.884/553.739 ms, pipe 3
    [jaap@jaap ~ ]$ 
    

    Here is my new boot.py

    # boot.py -- run on boot-up
    
    import pycom
    import machine
    
    from network import ETH
    
    pycom.wifi_on_boot(False)
    pycom.pybytes_on_boot(False)
    
    e = ETH()
    e.ifconfig(config=('10.0.0.99', '255.255.0.0', '10.0.0.1', '10.0.0.1'))
    

    I can hardly believe this problem could be solved by adding more code, like a fully configured LoraWan gateway.
    Having said that, if that does help anyway, we are seeing some kind of race- or timing-condition issue.
    Either one should probably be investigated in more detail, identified, and fixed somehow. Because currently, this prevents me from determining if the Pycom hardware (& software) would be a viable solution for us; it should provide stable LoraWan gateway functionality.

    PS: The switch is a "normal" unmanaged TP-Link TL-SG103 8-port 10/100/1000 Mbps switch, which has proven stable for every device I connected to it over the last two years, until I connected this Pygate + Pycom + PoE module.
    I have tried multiple cables which are known to be okay.
    PS2 I somewhat recognize this behavior from previous projects involving an Arduino & ENC28J60 SPI ethernet Module...


  • Global Moderator

    We had some good experiences with that, but it did not resolve the issue completely. We also tried simulating a noisy network here, but unfortunately that did not 'improve' (shorten) the delay before the slow ping times start.
    Let me know of your results!
    Gijs



  • Hi @Gijs @peterp @jcaron ,

    Thank you for your input. I will try the two disable commands. Could it also be related to receiving a lot of UDP broadcast messages?
    I am on a noisy network. I could also try moving to a less noisy network (although I would argue that a device should not get slower by receiving some broadcasts; we are talking about 0.5-1kbyte/s here.). And we are not always in control of the networks we should be connecting to.
    I will get back once I have some results.


  • Global Moderator

    @Jaap-Crezee We are (sort of) aware of this issue and we can reproduce it, but we have not been able to 'trigger' slow ping times on demand. My colleague @peterp looked into this for a long time, but we could not pinpoint the cause. In our case, it takes hours or even days of pinging the device before it gets into this state. Yours seems to get into this state of slow ping responses relatively quickly. The only difference between your setup and ours is (I believe) the network infrastructure, which makes me think the cause can be found there.

    The suggestions of @jcaron sound like a good place to start (I think we evaluated that as well, but I don't remember the outcome exactly). We tried a whole assortment of approaches, but none resulted in a properly triggered reproduction.

    Note that we are not able to reproduce it when using the product as it was meant to be (A LoRa gateway connected through wired internet / PoE), only when we are throwing pings at it.

    Let me know if you have any suggestions!
    Gijs



  • @Jaap-Crezee Have you tried setting pycom.wifi_on_boot(False)? This could possibly explain the initial packet loss, though probably not the 8-second latency and the later packet loss!

    Also, I believe Pygate builds don't include Pybytes, but if they do, pycom.pybytes_on_boot(False) would be a good idea.

    It would probably be interesting to run a simple UDP client+server to check in which direction the packet loss/latency occurs.

    Is the Ethernet switch managed? Do you see any errors or link bounces on that?



  • @Gijs Dear Pycom,

    I have got a new Pygate, Wipy & POE module connected and fully updated to the latest (stable) firmware as of this evening.
    I am using a 5V 2.5A USB power supply connected to the USB-C port of the PyGate.
    There is an ethernet connection to a 1Gbps switch on my desk which works perfectly fine with other devices.
    I have this code on my device:

    [jaap@jaap ~ ]$ curl ftp://micro:python@10.0.0.99/flash/boot.py
    # boot.py -- run on boot-up
    
    from network import ETH
    e = ETH()
    e.ifconfig(config=('10.0.0.99', '255.255.0.0', '10.0.0.1', '10.0.0.1'))
    
    [jaap@jaap ~ ]$ curl ftp://micro:python@10.0.0.99/flash/main.py
    # main.py -- put your code here!
    

    No more. No gateway yet.

    Now I am seeing this erratic behaviour:

    [jaap@jaap ~ ]$ ping 10.0.0.99
    PING 10.0.0.99 (10.0.0.99) 56(84) bytes of data.
    64 bytes from 10.0.0.99: icmp_seq=6 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=7 ttl=255 time=1.88 ms
    64 bytes from 10.0.0.99: icmp_seq=20 ttl=255 time=35.6 ms
    64 bytes from 10.0.0.99: icmp_seq=21 ttl=255 time=2.23 ms
    64 bytes from 10.0.0.99: icmp_seq=22 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=23 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=24 ttl=255 time=1.90 ms
    64 bytes from 10.0.0.99: icmp_seq=25 ttl=255 time=1.96 ms
    64 bytes from 10.0.0.99: icmp_seq=26 ttl=255 time=1.77 ms
    64 bytes from 10.0.0.99: icmp_seq=27 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=28 ttl=255 time=1.92 ms
    64 bytes from 10.0.0.99: icmp_seq=29 ttl=255 time=1.95 ms
    64 bytes from 10.0.0.99: icmp_seq=30 ttl=255 time=1.90 ms
    64 bytes from 10.0.0.99: icmp_seq=31 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=32 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=33 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=34 ttl=255 time=1.94 ms
    64 bytes from 10.0.0.99: icmp_seq=35 ttl=255 time=1.90 ms
    64 bytes from 10.0.0.99: icmp_seq=36 ttl=255 time=1.92 ms
    64 bytes from 10.0.0.99: icmp_seq=37 ttl=255 time=12.0 ms
    64 bytes from 10.0.0.99: icmp_seq=38 ttl=255 time=34.9 ms
    64 bytes from 10.0.0.99: icmp_seq=39 ttl=255 time=57.0 ms
    64 bytes from 10.0.0.99: icmp_seq=40 ttl=255 time=80.0 ms
    64 bytes from 10.0.0.99: icmp_seq=41 ttl=255 time=7.20 ms
    64 bytes from 10.0.0.99: icmp_seq=42 ttl=255 time=5.80 ms
    64 bytes from 10.0.0.99: icmp_seq=44 ttl=255 time=142 ms
    64 bytes from 10.0.0.99: icmp_seq=45 ttl=255 time=6.89 ms
    64 bytes from 10.0.0.99: icmp_seq=46 ttl=255 time=188 ms
    64 bytes from 10.0.0.99: icmp_seq=47 ttl=255 time=7378 ms
    64 bytes from 10.0.0.99: icmp_seq=48 ttl=255 time=7243 ms
    64 bytes from 10.0.0.99: icmp_seq=49 ttl=255 time=7204 ms
    64 bytes from 10.0.0.99: icmp_seq=55 ttl=255 time=7971 ms
    64 bytes from 10.0.0.99: icmp_seq=59 ttl=255 time=7729 ms
    64 bytes from 10.0.0.99: icmp_seq=60 ttl=255 time=7906 ms
    64 bytes from 10.0.0.99: icmp_seq=62 ttl=255 time=7935 ms
    64 bytes from 10.0.0.99: icmp_seq=72 ttl=255 time=7772 ms
    64 bytes from 10.0.0.99: icmp_seq=73 ttl=255 time=8232 ms
    64 bytes from 10.0.0.99: icmp_seq=74 ttl=255 time=8203 ms
    64 bytes from 10.0.0.99: icmp_seq=75 ttl=255 time=8165 ms
    64 bytes from 10.0.0.99: icmp_seq=76 ttl=255 time=8636 ms
    64 bytes from 10.0.0.99: icmp_seq=77 ttl=255 time=8595 ms
    64 bytes from 10.0.0.99: icmp_seq=78 ttl=255 time=8449 ms
    64 bytes from 10.0.0.99: icmp_seq=82 ttl=255 time=8755 ms
    64 bytes from 10.0.0.99: icmp_seq=89 ttl=255 time=8752 ms
    64 bytes from 10.0.0.99: icmp_seq=90 ttl=255 time=8712 ms
    64 bytes from 10.0.0.99: icmp_seq=92 ttl=255 time=8642 ms
    64 bytes from 10.0.0.99: icmp_seq=95 ttl=255 time=8353 ms
    64 bytes from 10.0.0.99: icmp_seq=96 ttl=255 time=8306 ms
    64 bytes from 10.0.0.99: icmp_seq=97 ttl=255 time=8159 ms
    64 bytes from 10.0.0.99: icmp_seq=100 ttl=255 time=8235 ms
    64 bytes from 10.0.0.99: icmp_seq=101 ttl=255 time=8275 ms
    From 10.0.1.3 icmp_seq=109 Destination Host Unreachable
    From 10.0.1.3 icmp_seq=110 Destination Host Unreachable
    From 10.0.1.3 icmp_seq=111 Destination Host Unreachable
    64 bytes from 10.0.0.99: icmp_seq=105 ttl=255 time=8333 ms
    64 bytes from 10.0.0.99: icmp_seq=106 ttl=255 time=8304 ms
    64 bytes from 10.0.0.99: icmp_seq=107 ttl=255 time=8254 ms
    64 bytes from 10.0.0.99: icmp_seq=112 ttl=255 time=10331 ms
    64 bytes from 10.0.0.99: icmp_seq=113 ttl=255 time=9633 ms
    ^C
    --- 10.0.0.99 ping statistics ---
    122 packets transmitted, 56 received, +3 errors, 54.0984% packet loss, time 123939ms
    rtt min/avg/max/mdev = 1.770/4161.969/10331.318/4164.290 ms, pipe 11
    [jaap@jaap ~ ]$ 
    

    How to debug this?


  • Global Moderator

    Good to hear the issue got solved!
    I believe there are still people out there that do have problems with the PyEthernet dropping the connection, so thanks for your take on the situation!
    Also good to hear the TTN configuration works better for you. I will work on putting this in the docs!
    Thanks
    Gijs



  • Hi @Gijs,
    I have now also found the solution for the Ethernet reliability problem.
    It's because of the WiFi that I use to ping. I suspect the Ubiquiti Nano HD AccessPoint as the cause. If I send a ping via Ethernet, this is always answered 100% by the PyGate. Even after several days of continuous operation. At first I was not aware of the problem with the access point, because the ping losses occurred from several computers, all of which were connected via WLAN. Only after I ran the ping tests from a stationary computer did I no longer have any problems.
    So sorry for the noise ...

    Best wishes
    Peter



  • Hi @gijs,
    Please excuse the late response, but I wanted to make sure everything works now.
    As you recommended, I used the config.json for EU868 from the TTN for my PyGate. So it seems to work perfectly so far. The devices I tested need about 5-10 seconds to join the TTN via OTAA.
    Thank you very much for your support.

    Best wishes
    Peter

    {
    	"SX1301_conf": {
    		"lorawan_public": true,
    		"clksrc": 1,
    		"clksrc_desc": "radio_1 provides clock to concentrator for most devices except MultiTech. For MultiTech set to 0.",
    		"antenna_gain": 2,
    		"antenna_gain_desc": "antenna gain, in dBi",
    		"radio_0": {
    			"enable": true,
    			"type": "SX1257",
    			"freq": 867500000,
    			"rssi_offset": -166.0,
    			"tx_enable": true,
    			"tx_freq_min": 863000000,
    			"tx_freq_max": 870000000
    		},
    		"radio_1": {
    			"enable": true,
    			"type": "SX1257",
    			"freq": 868500000,
    			"rssi_offset": -166.0,
    			"tx_enable": false
    		},
    		"chan_multiSF_0": {
    			"desc": "Lora MAC, 125kHz, all SF, 868.1 MHz",
    			"enable": true,
    			"radio": 1,
    			"if": -400000
    		},
    		"chan_multiSF_1": {
    			"desc": "Lora MAC, 125kHz, all SF, 868.3 MHz",
    			"enable": true,
    			"radio": 1,
    			"if": -200000
    		},
    		"chan_multiSF_2": {
    			"desc": "Lora MAC, 125kHz, all SF, 868.5 MHz",
    			"enable": true,
    			"radio": 1,
    			"if": 0
    		},
    		"chan_multiSF_3": {
    			"desc": "Lora MAC, 125kHz, all SF, 867.1 MHz",
    			"enable": true,
    			"radio": 0,
    			"if": -400000
    		},
    		"chan_multiSF_4": {
    			"desc": "Lora MAC, 125kHz, all SF, 867.3 MHz",
    			"enable": true,
    			"radio": 0,
    			"if": -200000
    		},
    		"chan_multiSF_5": {
    			"desc": "Lora MAC, 125kHz, all SF, 867.5 MHz",
    			"enable": true,
    			"radio": 0,
    			"if": 0
    		},
    		"chan_multiSF_6": {
    			"desc": "Lora MAC, 125kHz, all SF, 867.7 MHz",
    			"enable": true,
    			"radio": 0,
    			"if": 200000
    		},
    		"chan_multiSF_7": {
    			"desc": "Lora MAC, 125kHz, all SF, 867.9 MHz",
    			"enable": true,
    			"radio": 0,
    			"if": 400000
    		},
    		"chan_Lora_std": {
    			"desc": "Lora MAC, 250kHz, SF7, 868.3 MHz",
    			"enable": true,
    			"radio": 1,
    			"if": -200000,
    			"bandwidth": 250000,
    			"spread_factor": 7
    		},
    		"chan_FSK": {
    			"desc": "FSK 50kbps, 868.8 MHz",
    			"enable": true,
    			"radio": 1,
    			"if": 300000,
    			"bandwidth": 125000,
    			"datarate": 50000
    		},
    		"tx_lut_0": {
    			"desc": "TX gain table, index 0",
    			"pa_gain": 0,
    			"mix_gain": 8,
    			"rf_power": -6,
    			"dig_gain": 0
    		},
    		"tx_lut_1": {
    			"desc": "TX gain table, index 1",
    			"pa_gain": 0,
    			"mix_gain": 10,
    			"rf_power": -3,
    			"dig_gain": 0
    		},
    		"tx_lut_2": {
    			"desc": "TX gain table, index 2",
    			"pa_gain": 0,
    			"mix_gain": 12,
    			"rf_power": 0,
    			"dig_gain": 0
    		},
    		"tx_lut_3": {
    			"desc": "TX gain table, index 3",
    			"pa_gain": 1,
    			"mix_gain": 8,
    			"rf_power": 3,
    			"dig_gain": 0
    		},
    		"tx_lut_4": {
    			"desc": "TX gain table, index 4",
    			"pa_gain": 1,
    			"mix_gain": 10,
    			"rf_power": 6,
    			"dig_gain": 0
    		},
    		"tx_lut_5": {
    			"desc": "TX gain table, index 5",
    			"pa_gain": 1,
    			"mix_gain": 12,
    			"rf_power": 10,
    			"dig_gain": 0
    		},
    		"tx_lut_6": {
    			"desc": "TX gain table, index 6",
    			"pa_gain": 1,
    			"mix_gain": 13,
    			"rf_power": 11,
    			"dig_gain": 0
    		},
    		"tx_lut_7": {
    			"desc": "TX gain table, index 7",
    			"pa_gain": 2,
    			"mix_gain": 9,
    			"rf_power": 12,
    			"dig_gain": 0
    		},
    		"tx_lut_8": {
    			"desc": "TX gain table, index 8",
    			"pa_gain": 1,
    			"mix_gain": 15,
    			"rf_power": 13,
    			"dig_gain": 0
    		},
    		"tx_lut_9": {
    			"desc": "TX gain table, index 9",
    			"pa_gain": 2,
    			"mix_gain": 10,
    			"rf_power": 14,
    			"dig_gain": 0
    		},
    		"tx_lut_10": {
    			"desc": "TX gain table, index 10",
    			"pa_gain": 2,
    			"mix_gain": 11,
    			"rf_power": 16,
    			"dig_gain": 0
    		},
    		"tx_lut_11": {
    			"desc": "TX gain table, index 11",
    			"pa_gain": 3,
    			"mix_gain": 9,
    			"rf_power": 20,
    			"dig_gain": 0
    		},
    		"tx_lut_12": {
    			"desc": "TX gain table, index 12",
    			"pa_gain": 3,
    			"mix_gain": 10,
    			"rf_power": 23,
    			"dig_gain": 0
    		},
    		"tx_lut_13": {
    			"desc": "TX gain table, index 13",
    			"pa_gain": 3,
    			"mix_gain": 11,
    			"rf_power": 25,
    			"dig_gain": 0
    		},
    		"tx_lut_14": {
    			"desc": "TX gain table, index 14",
    			"pa_gain": 3,
    			"mix_gain": 12,
    			"rf_power": 26,
    			"dig_gain": 0
    		},
    		"tx_lut_15": {
    			"desc": "TX gain table, index 15",
    			"pa_gain": 3,
    			"mix_gain": 14,
    			"rf_power": 27,
    			"dig_gain": 0
    		}
    	},
    	"gateway_conf": {
    		"gateway_ID": "<FILL ME IN>",
    		"server_address": "router.eu.thethings.network",
    		"serv_port_up": 1700,
    		"serv_port_down": 1700,
            "keepalive_interval": 10,
    		"stat_interval": 30,
    		"push_timeout_ms": 100,
    		"forward_crc_valid": true,
    		"forward_crc_error": false,
    		"forward_crc_disabled": false,
    		"servers": [ {
    			"server_address": "router.eu.thethings.network",
    			"serv_port_up": 1700,
    			"serv_port_down": 1700,
    			"serv_enabled": true
    		} ]
    	}
    }
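
    The "desc" strings in this file can be cross-checked against the numbers: each channel's centre frequency is the "freq" of the radio it points to plus the channel's "if" offset. The sketch below does that for a parsed `SX1301_conf` object. `channel_frequencies` is a hypothetical helper written for this illustration, and it assumes the key layout used in the file above.

```python
def channel_frequencies(sx1301_conf):
    """Map each enabled chan_* entry to its centre frequency in Hz."""
    freqs = {}
    for name, chan in sx1301_conf.items():
        if not name.startswith("chan_") or not isinstance(chan, dict):
            continue
        if not chan.get("enable", False):
            continue
        radio = sx1301_conf["radio_%d" % chan["radio"]]
        freqs[name] = radio["freq"] + chan["if"]
    return freqs


# A cut-down copy of the values posted above.
example = {
    "radio_0": {"enable": True, "freq": 867500000},
    "radio_1": {"enable": True, "freq": 868500000},
    "chan_multiSF_0": {"enable": True, "radio": 1, "if": -400000},
    "chan_multiSF_7": {"enable": True, "radio": 0, "if": 400000},
    "chan_FSK": {"enable": True, "radio": 1, "if": 300000},
}
for name, hz in sorted(channel_frequencies(example).items()):
    print("%s: %.1f MHz" % (name, hz / 1e6))
```

    For the three channels in the example this gives 868.1 MHz, 867.9 MHz and 868.8 MHz, matching the "desc" fields. End devices and gateway need to agree on these frequencies.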
    
    


  • @Gijs: Ok, understood. I'll update my PyGate in a few minutes and see what's going on. :-)

    Thanks, Peter


  • Global Moderator

    Sorry for the confusion!
    The configuration files are very similar in the end, but I feel the configuration provided by TTN is slightly better than the one we created ourselves. Let me know about your experiences! They also have examples for more regions than we currently do ;)

    Best,
    Gijs



  • @Gijs: I'm confused now... :-)
    Which config.json should I use?
    The one mentioned in your last reply with the additions or this one from your documentation?

    Thanks, Peter


  • Global Moderator

    @tronto The PyEthernet has a 10/100 Mbps Ethernet controller (running in 10 Mbps mode) that talks over an SPI bus at 8 MHz. Over that data link, it is likely that some incoming packets will get dropped. I've had mine running over the weekend and it is still connected to TTN (on a Netgear gigabit switch). I have heard that a 10 Mbps device on a switch could possibly drag all traffic through the switch down to 10 Mbps (I have yet to see this reproduced with the PyEthernet, though). Another issue can be related to the single jumper on the PyEthernet (close to the Ethernet port). Some (older) PoE adapters require a minimum amount of current to be drawn, or they will switch off. The Pygate generally draws less than that minimum, causing the PoE to drop out for a second; inserting the jumper connects a default resistor load.
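
    A back-of-the-envelope comparison of the two rates mentioned above, as a sketch only. It assumes plain SPI moving one bit per clock cycle with no command or addressing overhead, and a maximum-size untagged Ethernet frame of 1518 bytes; the real controller's overhead and buffering are not covered here.

```python
def transfer_time_ms(frame_bytes, bits_per_second):
    """Time in milliseconds to move one frame at the given bit rate."""
    return frame_bytes * 8 * 1000.0 / bits_per_second


ETHERNET_LINE_RATE = 10_000_000  # 10 Mbps mode, as stated in the post
SPI_CLOCK_RATE = 8_000_000       # 8 MHz SPI; assumes one bit per clock
MAX_FRAME_BYTES = 1518           # assumption: largest untagged Ethernet frame

on_wire = transfer_time_ms(MAX_FRAME_BYTES, ETHERNET_LINE_RATE)
over_spi = transfer_time_ms(MAX_FRAME_BYTES, SPI_CLOCK_RATE)
print("on the wire: %.4f ms, over SPI: %.4f ms" % (on_wire, over_spi))
```

    Under these assumptions a full-size frame takes about 1.21 ms to arrive on the wire but about 1.52 ms to read out over SPI, so a sustained burst of incoming traffic could arrive faster than the host can drain it.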

    @schmelpe, I looked into the OTAA issue myself as well, and could only reproduce it (as the original problem states) by bodging the frequency settings. Moreover, the OTAA join accept is still sent out when the logs mention 'Ignored not rejected'. If you are experiencing LoRa OTAA issues at EU868MHz with the frequency settings set correctly on all devices, please do let us know! (NB: I'll take up the firmware versioning with the firmware team; it seems they forgot to upload it!)

    I am currently working on testing different frequency settings using the configuration files from here: https://github.com/TheThingsNetwork/gateway-conf and adding

    		"gateway_ID": "XXXX",
    		"keepalive_interval": 10,
    		"stat_interval": 30,
    		"push_timeout_ms": 100,
    		"forward_crc_valid": true,
    		"forward_crc_error": false,
    		"forward_crc_disabled": false
    

    at the bottom of the config file, within the closing brackets. Let me know if that works better for you! (I'll add it to the other thread as well.)
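
    The manual edit described above could also be scripted. This sketch assumes the downloaded file has a top-level "gateway_conf" object, as the config.json posted elsewhere in this thread does; `add_pygate_keys` and `EXTRA_GATEWAY_KEYS` are names invented for the example.

```python
import copy
import json

# The keys listed in the post above, apart from gateway_ID.
EXTRA_GATEWAY_KEYS = {
    "keepalive_interval": 10,
    "stat_interval": 30,
    "push_timeout_ms": 100,
    "forward_crc_valid": True,
    "forward_crc_error": False,
    "forward_crc_disabled": False,
}


def add_pygate_keys(config, gateway_id):
    """Return a copy of config with the extra keys added under gateway_conf."""
    merged = copy.deepcopy(config)
    gateway_conf = merged.setdefault("gateway_conf", {})
    gateway_conf["gateway_ID"] = gateway_id
    for key, value in EXTRA_GATEWAY_KEYS.items():
        # Values already present in the downloaded file are left untouched.
        gateway_conf.setdefault(key, value)
    return merged


# Stand-in for a downloaded file; in practice this would come from json.load().
ttn_config = json.loads(
    '{"gateway_conf": {"servers": [{"server_address": '
    '"router.eu.thethings.network", "serv_port_up": 1700, '
    '"serv_port_down": 1700, "serv_enabled": true}]}}'
)
print(json.dumps(add_pygate_keys(ttn_config, "XXXX"), indent=2))
```

    The "XXXX" placeholder stands for the gateway's own ID, as in the post. Using `setdefault` means a value the TTN file already defines wins over the defaults listed here, which is a design choice of this sketch rather than something stated in the thread.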



  • Hi @Gijs ,
    no, I don't think so.
    Please look at this Thread: https://forum.pycom.io/topic/6206/cannot-make-join-otaa-timely-with-pygate-and-lopy4.
    I'm using a WiPy3 with firmware 1.20.2rc10 and have exactly the same problem.
    OTAA JOIN requests over the Pygate do not work for me (EU868); OTAA JOIN requests with my other TTN gateway work fine within two or three attempts.
    You'll find my code in that thread.
    config.json:

    {
    	"SX1301_conf": {
    		"lorawan_public": true,
    		"clksrc": 1,
    		"antenna_gain": 0,
    		"radio_0": {
    			"enable": true,
    			"type": "SX1257",
    			"freq": 867500000,
    			"rssi_offset": -164.0,
    			"tx_enable": true,
    			"tx_freq_min": 863000000,
    			"tx_freq_max": 870000000
    		},
    		"radio_1": {
    			"enable": true,
    			"type": "SX1257",
    			"freq": 868500000,
    			"rssi_offset": -164.0,
    			"tx_enable": false
    		},
    		"chan_multiSF_0": {
    			"enable": true,
    			"radio": 1,
    			"if": -400000
    		},
    		"chan_multiSF_1": {
    			"enable": true,
    			"radio": 1,
    			"if": -200000
    		},
    		"chan_multiSF_2": {
    			"enable": true,
    			"radio": 1,
    			"if": 0
    		},
    		"chan_multiSF_3": {
    			"enable": true,
    			"radio": 0,
    			"if": -400000
    		},
    		"chan_multiSF_4": {
    			"enable": true,
    			"radio": 0,
    			"if": -200000
    		},
    		"chan_multiSF_5": {
    			"enable": true,
    			"radio": 0,
    			"if": 0
    		},
    		"chan_multiSF_6": {
    			"enable": true,
    			"radio": 0,
    			"if": 200000
    		},
    		"chan_multiSF_7": {
    			"enable": true,
    			"radio": 0,
    			"if": 400000
    		},
    		"chan_Lora_std": {
    			"enable": true,
    			"radio": 1,
    			"if": -200000,
    			"bandwidth": 250000,
    			"spread_factor": 7
    		},
    		"chan_FSK": {
    			"enable": true,
    			"radio": 1,
    			"if": 300000,
    			"bandwidth": 125000,
    			"datarate": 50000
    		},
    		"tx_lut_0": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 9,
    			"dig_gain": 3
    		},
    		"tx_lut_1": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 9,
    			"dig_gain": 3
    		},
    		"tx_lut_2": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 9,
    			"dig_gain": 3
    		},
    		"tx_lut_3": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 9,
    			"dig_gain": 3
    		},
    		"tx_lut_4": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 9,
    			"dig_gain": 3
    		},
    		"tx_lut_5": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 9,
    			"dig_gain": 3
    		},
    		"tx_lut_6": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 9,
    			"dig_gain": 3
    		},
    		"tx_lut_7": {
    			"pa_gain": 0,
    			"mix_gain": 6,
    			"rf_power": 11,
    			"dig_gain": 3
    		},
    		"tx_lut_8": {
    			"pa_gain": 0,
    			"mix_gain": 5,
    			"rf_power": 13,
    			"dig_gain": 2
    		},
    		"tx_lut_9": {
    			"pa_gain": 0,
    			"mix_gain": 8,
    			"rf_power": 14,
    			"dig_gain": 3
    		},
    		"tx_lut_10": {
    			"pa_gain": 0,
    			"mix_gain": 6,
    			"rf_power": 15,
    			"dig_gain": 2
    		},
    		"tx_lut_11": {
    			"pa_gain": 0,
    			"mix_gain": 6,
    			"rf_power": 16,
    			"dig_gain": 1
    		},
    		"tx_lut_12": {
    			"pa_gain": 0,
    			"mix_gain": 9,
    			"rf_power": 17,
    			"dig_gain": 3
    		},
    		"tx_lut_13": {
    			"pa_gain": 0,
    			"mix_gain": 10,
    			"rf_power": 18,
    			"dig_gain": 3
    		},
    		"tx_lut_14": {
    			"pa_gain": 0,
    			"mix_gain": 11,
    			"rf_power": 19,
    			"dig_gain": 3
    		},
    		"tx_lut_15": {
    			"pa_gain": 0,
    			"mix_gain": 12,
    			"rf_power": 20,
    			"dig_gain": 3
    		}
    	},
    
    	"gateway_conf": {
    		"gateway_ID": "myID",
    		"server_address": "router.eu.thethings.network",
    		"serv_port_up": 1700,
    		"serv_port_down": 1700,
    		"keepalive_interval": 10,
    		"stat_interval": 30,
    		"push_timeout_ms": 100,
    		"forward_crc_valid": true,
    		"forward_crc_error": false,
    		"forward_crc_disabled": false
    	}
    }
    
    

    I'll reflash the current firmware 1.20.2 today, but I don't think it will help because it's the same as 1.20.2rc10.
    There is no new firmware for the PyGate available. 1.20.2rc10 seems to be the last one.

    Best regards
    Peter



  • @Gijs My issue seems to be related to Ethernet communication. I don't seem to have the same issue when using WiFi for communication (and power from PoE). I have tried two different brands of switch (ubnt + cisco) with the same result: an eventual Ethernet dropout.

    I believe both switches saw the POE module as 10FDX if that makes any difference.


  • Global Moderator

    Hi,
    I believe the issue is not heat related: I just tested the Pygate (with the LoRa concentrator disabled) and PyEthernet, applied heat with a heat gun, and checked the ping times. I was not able to find any differences with about 100 °C of heat applied.

    Now I did some further testing, and sometimes pings do not get through, with the gateway enabled or disabled. I cannot seem to reproduce the case of @tronto, where the ping times increase significantly after a few timeouts.

    Pings are generally <10 ms on our network, with a few going above that.

    Also, I see no relation in the lost pings compared to the Pygate dropping any lora packets, but please do let us know of your findings!

    Btw @schmelpe, you mention that OTAA not joining is a known Pygate problem; could you elaborate? I believe the issue you are referring to has to do with frequency plan settings not being in sync.

    Best,
    Gijs

