LoPy not obeying data rate requests from network

jcaron

Hi all,

I have an issue with LoPys seemingly not obeying LoRaWAN ADR data rate requests.

I just tested with a LoPy running 1.16.0.b1 (but the problem seems to date at least as far back as 1.9.2.b2). The network requests an SF12 data rate, and the LoPy acknowledges it, but it continues to send at SF7.

I do have adr=True set in the LoRa constructor, and I don't explicitly set the data rate when sending.

This is without even going to deep sleep or using the nvram_save/nvram_restore methods.

It seems this was already raised here: https://forum.pycom.io/topic/2493/experience-with-lopy-and-adr-adaptive-data-rate but there was no response.

Has anyone else had similar issues, or found a solution?

Thanks!

jcaron

Hi @daniel, just wanted to make sure these two issues will be addressed in the next release:

save/restore channel mask
fix MAC command buffers restore

Let me know if you need/want issues filed in GitHub and/or PRs.

jcaron

Oh BTW @daniel you fell into the same trap I did... :-( Sadly UpLinkFrequency is not filled out and is always 0, so lora.stats() does not reflect the actual TX frequency. Digging out the frequency is quite a bit more difficult; retrieving the channel should be somewhat easier.

jcaron

@daniel for the MAC command save/restore across deep sleep, the issue is that in modlora.c, when the MAC command buffers are restored, the length is set to 15 before calling modlora_nvs_get_blob. However, the buffers are 128 bytes long, so those calls fail, and the buffers are (silently) not restored.

Easy fix: set length to 128 (it should really be LORA_MAC_COMMAND_MAX_LENGTH, so it should probably be retrieved together with the buffer pointers) before each of the calls to modlora_nvs_get_blob.
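
To illustrate the failure mode, here is a minimal Python model of the restore. It assumes modlora_nvs_get_blob behaves like ESP-IDF's nvs_get_blob, which refuses to copy when the supplied length is smaller than the stored blob. The get_blob function and the "mac_cmd_buf" key are hypothetical stand-ins, not names from the firmware.

```python
MAC_COMMAND_BUFFER_SIZE = 128  # buffer size quoted in the post


def get_blob(store, key, length):
    """Hypothetical stand-in for modlora_nvs_get_blob.

    Returns the stored blob, or None when `length` is too small to hold it
    (assumed to mirror the invalid-length failure of nvs_get_blob).
    """
    blob = store.get(key)
    if blob is None or length < len(blob):
        return None  # this is the failure that was silently ignored
    return blob


# A saved MAC command buffer, as it would sit in NVRAM before deep sleep.
nvs = {"mac_cmd_buf": bytes(MAC_COMMAND_BUFFER_SIZE)}

restored_short = get_blob(nvs, "mac_cmd_buf", 15)
restored_full = get_blob(nvs, "mac_cmd_buf", MAC_COMMAND_BUFFER_SIZE)
```

With a length of 15 the restore returns nothing; with the full buffer size it succeeds.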

jcaron

Regarding the delay it takes to send a frame, it looks like it's the network sending a 5 second delay in the Join Accept message. I'll take that one up with them.
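
A rough sketch of the arithmetic, assuming the send only returns once the receive windows have been handled (an assumption about the firmware, not something confirmed here). In LoRaWAN class A, RX1 opens RxDelay seconds after the uplink ends and RX2 opens one second after RX1. The function name and the 1.5 second uplink figure are illustrative.

```python
def seconds_until_rx2_opens(time_on_air_s, rx1_delay_s):
    """Time from the start of the uplink until the RX2 window opens.

    Class A timing: RX1 opens rx1_delay_s after the uplink ends,
    and RX2 opens one second after RX1.
    """
    return time_on_air_s + rx1_delay_s + 1.0


UPLINK_S = 1.5  # rough SF12 uplink duration used elsewhere in this thread

default_delay = seconds_until_rx2_opens(UPLINK_S, rx1_delay_s=1.0)
network_delay = seconds_until_rx2_opens(UPLINK_S, rx1_delay_s=5.0)
```

With the 5 second delay, RX2 only opens 7.5 seconds after the uplink starts, which is in the same range as the roughly 8 seconds observed.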

So the open issues are:

channels mask save/restore
issue with LinkADRAns sent after a save/sleep/restore cycle.

Continuing my investigations on that one...

jcaron

Also confirmed that if a LinkADRAns is queued before sleep, there's an issue when it is sent after sleep. The network sees there was a MAC command, but states "unknown" instead of the decoded MAC command, so I suppose there's a problem with it.

Haven't quite traced the details of this yet, though. The relevant data seems to be correctly saved to and restored from NVRAM, but it must somehow get overwritten somewhere between the restore and the actual send.

jcaron

Just confirmed the channels issue.

LoPy 1, EU868 region. Starts with the default channels (868.1, 868.3, 868.5). The network sends a CFList in the Join Accept adding three channels (865.1, 865.3, 865.5).

Just after join, sending frames uses all 6 channels.

Just do nvram_save() and nvram_restore() and the LoPy reverts to sending only on the original channels.

It appears even though the channels themselves are saved and restored, the channel mask isn't, and is reset to the default 3 channels.

Saving the channel mask is non-trivial: its size is region-dependent and there does not seem to be an interface to get it at the moment, so it requires changes to all regions to feed the information back to the relevant places...
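
As a sketch of what goes wrong, the mask can be modelled as one bit per channel index. The helper names are made up for illustration, and the real mask layout in the stack is region-specific.

```python
DEFAULT_CHANNELS_HZ = [868100000, 868300000, 868500000]
CFLIST_CHANNELS_HZ = [865100000, 865300000, 865500000]  # added by the Join Accept


def mask_for(channel_count):
    """Return a mask with the lowest `channel_count` bits set (bit i = channel i)."""
    return (1 << channel_count) - 1


def usable_channels(channels, mask):
    """Return the frequencies whose bit is set in the mask."""
    return [freq for i, freq in enumerate(channels) if mask & (1 << i)]


# The channel list itself survives nvram_save()/nvram_restore()...
channels = DEFAULT_CHANNELS_HZ + CFLIST_CHANNELS_HZ

# ...but the mask falls back to the default three channels.
mask_after_join = mask_for(len(channels))
mask_after_restore = mask_for(len(DEFAULT_CHANNELS_HZ))

after_join = usable_channels(channels, mask_after_join)
after_restore = usable_channels(channels, mask_after_restore)
```

All six frequencies are usable after the join, but only the three defaults after the restore, even though the channel definitions are still present.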

jcaron

Not directly related to the original problem, but it seems there's an issue again with keeping the frequencies received from the network. Not sure yet if that's related to deep sleep or not, and haven't confirmed the details, but network logs only show the three basic frequencies (for EU868) since I upgraded.

Any indication of where this could be coming from?

If that's indeed again a firmware problem, this would reinforce my remark above: making sure the LoRaWAN stack is fully tested and passes compliance tests should really be part of the release process to avoid all these issues.

jcaron

Hi @daniel, thanks for the update.

That's indeed an approach. I feel the LoRaWAN spec is quite unclear as to what "ADR enabled" means (it basically says either the node or the network can decide to use it), but having it work when adr is set would probably be fine for me. I haven't checked: if adr is not set, would the stack reject a LinkADRReq for a different data rate, or acknowledge it but ignore it?

Also, even if adr is set, there are probably scenarios where you would want to set an initial data rate but then let the device/network control it up and down.
While you're in lora.stats, adding TX time on air, TX power and TX frequency would be useful. The first two are easy, the last one seems a little bit trickier than I expected. Adding whether a MAC command was received in response could also be useful.
Using SF12, it always takes about 8 seconds, and a bit more if there's a downlink in response (8.5 and 8.7 seconds depending on the received downlink; I haven't quite checked if that's consistent with the downlink length). Unless I'm mistaken, it should be about 1.5 seconds for the uplink, up to 2 seconds for RX2, plus the downlink reception time if any.
Haven't checked the details, but the network showed it had received a MAC command in the following uplink (after save/sleep/restore), but it wasn't properly decoded. I'll try it again later and let you know the details.
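
For the uplink estimate above, here is a sketch of the usual time-on-air formula from the Semtech SX127x datasheets. The 20-byte PHY payload in the usage lines is an assumption for illustration (13 bytes of LoRaWAN overhead plus a few application bytes), as are the 125 kHz bandwidth and 4/5 coding rate defaults.

```python
import math


def lora_time_on_air(payload_len, sf, bw_hz=125000, coding_rate=1,
                     preamble_symbols=8, explicit_header=True, crc=True):
    """Return the LoRa time on air in seconds for a PHY payload of payload_len bytes.

    coding_rate is 1..4 for 4/5..4/8. Low data rate optimisation is assumed
    on for SF11/SF12 at 125 kHz, where the symbol time exceeds 16 ms.
    """
    t_sym = (2 ** sf) / bw_hz
    low_dr_opt = 1 if (sf >= 11 and bw_hz == 125000) else 0
    implicit = 0 if explicit_header else 1
    numerator = 8 * payload_len - 4 * sf + 28 + (16 if crc else 0) - 20 * implicit
    denominator = 4 * (sf - 2 * low_dr_opt)
    payload_symbols = 8 + max(math.ceil(numerator / denominator) * (coding_rate + 4), 0)
    return (preamble_symbols + 4.25 + payload_symbols) * t_sym


toa_sf12 = lora_time_on_air(20, 12)  # about 1.32 s
toa_sf7 = lora_time_on_air(20, 7)    # about 0.057 s
```

For a short frame this gives a bit over 1.3 seconds at SF12, in line with the rough 1.5 second figure.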

A few things that could be helpful:

add an option on the LoRa constructor to enable debug mode, and log (just printf) the raw packets received/sent with all details (SF/DR, frequency, power, and full packet hex dump...) when the mode is enabled.
I would pass the equivalent of lora.events(), and possibly lora.stats() directly as a parameter to the callback. LoRa is slow enough that we shouldn't quite risk a race condition, but still...
add callbacks for successful join and MAC command received.
maybe an option to read the current data rate (getsockopt maybe, but it should retrieve the actual data rate used by LoRaMac, not the one set via setsockopt).

An important remark: the Semtech stack is tested and certified, but there have been quite a few instances where higher-level issues (deep sleep, this data rate issue...) have broken LoRaWAN compliance. Not sure how that works (i.e. if that involves $$$), but making sure releases undergo actual compliance tests (both with and without deep sleep, and in the case of deep sleep, both the "external" (Deep Sleep Shield / PyTrack / PySense) and "internal" deep sleep) would help make sure the modules actually work as intended in that respect. With all the additional regions that's quite a lot of tests, but it would definitely help.

Thanks!

daniel

@jcaron thanks for the detailed feedback!

I have a patch ready with the following fixes:

Correct ADR functionality by simply doing the following in modlora.c:

-                        // set the data rate before checking if Tx is possible
-                        mibReq.Type = MIB_CHANNELS_DATARATE;
-                        mibReq.Param.ChannelsDatarate = task_cmd_data.info.tx.dr;
-                        LoRaMacMibSetRequestConfirm( &mibReq );
+                        if (!lora_obj.adr) {
+                            // set the data rate before checking if Tx is possible
+                            mibReq.Type = MIB_CHANNELS_DATARATE;
+                            mibReq.Param.ChannelsDatarate = task_cmd_data.info.tx.dr;
+                            LoRaMacMibSetRequestConfirm( &mibReq );
+                        }

sftx value in lora.stats() also updated for unconfirmed frames.
For me Tx time is correct, but I have added a small patch to improve it a bit, especially when a downlink is received.
The LoRa join logic when the DR is forced is fixed as well: it will attempt the selected data rate and will lower it on the following attempts, as specified by the LoRaWAN specification.
What's the issue with NVRAM save/restore?

Cheers,
Daniel

jcaron

An alternative to the method in the previous post could be (and that would probably be a lot cleaner) to have setsockopt of the data rate just send the MIB_CHANNELS_DATARATE MIB set request, rather than store that data rate in the socket and pass it along with every send. Haven't quite explored the ramifications of such a change.

jcaron

OK, so I checked a bit what happens in the code, and there is indeed an issue in esp32/mods/modlora.c:

in lora_socket_socket, a default data rate is attached to the socket upon creation (DR5/SF7 for the EU868 region)
this data rate is passed along when sending
in TASK_LoRa, case E_LORA_CMD_LORAWAN_TX, the data rate is sent to the LoRaMAC code in all cases via a MIB_CHANNELS_DATARATE MIB set request

The end result is that if you don't specify a data rate at all, it is reset to the default on every send, regardless of what the network requests (in LinkADRReq MAC commands) or what the network conditions dictate (ADR_ACK_LIMIT + ADR_ACK_DELAY uplinks without a downlink). Likewise, if you set a data rate via setsockopt, it stays set forever.
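
A toy Python model of that behaviour, just to make the sequence explicit. The class and function names are invented for illustration and do not match the firmware or the Semtech API.

```python
DEFAULT_SOCKET_DR = 5  # DR5/SF7, the EU868 socket default described above


class MacModel:
    """Toy stand-in for the LoRaMAC layer's current data rate."""

    def __init__(self):
        self.channels_datarate = DEFAULT_SOCKET_DR

    def on_link_adr_req(self, dr):
        # The network requests a new data rate and the MAC accepts it.
        self.channels_datarate = dr


def send(mac, socket_dr):
    """Model one uplink and return the data rate actually used."""
    # Mirrors the unconditional MIB_CHANNELS_DATARATE set before every TX.
    mac.channels_datarate = socket_dr
    return mac.channels_datarate


mac = MacModel()
mac.on_link_adr_req(0)  # network requests DR0/SF12, device acknowledges
used = send(mac, DEFAULT_SOCKET_DR)
```

Even though the request was accepted, the next uplink goes out at DR5/SF7 again.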

My solution is to:

set the default data rate in lora_socket_socket to special value 0xff
in TASK_LoRa, avoid the MIB_CHANNELS_DATARATE MIB set request if the data rate is set to 0xff, and reset the data rate to 0xff afterwards (so the setsockopt only applies to the next uplink, not all)
in LoRaMac.c:LoRaMacMcpsRequest, add a few checks for 0xff as well
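
Sketched in Python rather than C, the first two points could look roughly like this. It is a model of the idea only: the class names are invented, and the extra checks in LoRaMacMcpsRequest are not represented.

```python
DR_UNSET = 0xFF  # sentinel: the application has not requested a data rate


class Socket:
    def __init__(self):
        self.dr = DR_UNSET  # new default instead of DR5

    def setsockopt_dr(self, dr):
        self.dr = dr


class Mac:
    def __init__(self, default_dr=0):
        self.channels_datarate = default_dr  # the MAC's own default

    def on_link_adr_req(self, dr):
        self.channels_datarate = dr


def send(mac, sock):
    """Model one uplink and return the data rate actually used."""
    if sock.dr != DR_UNSET:
        mac.channels_datarate = sock.dr  # one-shot override from setsockopt
        sock.dr = DR_UNSET               # applies to the next uplink only
    return mac.channels_datarate


mac, sock = Mac(), Socket()
first = send(mac, sock)   # nothing set: MAC default
sock.setsockopt_dr(5)
second = send(mac, sock)  # explicit request is honoured
mac.on_link_adr_req(3)
third = send(mac, sock)   # network request is no longer overwritten
```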

The LoRaMAC code still has its own default data rate. So if you don't set anything, you initially get DR0/SF12, until the network says otherwise (LinkADRReq). If you set a data rate via setsockopt (or the data rate is changed by the network via LinkADRReq), it will stay that way until the network says otherwise or the ADR_ACK_LIMIT + ADR_ACK_DELAY limit is reached (move one step towards SF12, i.e. one DR lower), if the data rate is not already SF12.
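
The ADR_ACK_LIMIT + ADR_ACK_DELAY fallback can be sketched as a small state machine. The values 64 and 32 are the LoRaWAN 1.0.x defaults as I recall them, and this is a simplified reading of the spec, not the stack's actual code; next_state is a made-up name.

```python
ADR_ACK_LIMIT = 64  # assumed LoRaWAN 1.0.x default
ADR_ACK_DELAY = 32  # assumed LoRaWAN 1.0.x default
MIN_DR = 0          # DR0 = SF12 in EU868


def next_state(dr, adr_ack_cnt, downlink_received):
    """Advance the fallback state by one uplink.

    Returns (dr, adr_ack_cnt, adr_ack_req) to use for the next uplink.
    """
    if downlink_received:
        return dr, 0, False  # any downlink resets the counter
    adr_ack_cnt += 1
    if adr_ack_cnt >= ADR_ACK_LIMIT + ADR_ACK_DELAY and dr > MIN_DR:
        # Step towards SF12 once per ADR_ACK_DELAY uplinks without a reply.
        if (adr_ack_cnt - ADR_ACK_LIMIT) % ADR_ACK_DELAY == 0:
            dr -= 1
    return dr, adr_ack_cnt, adr_ack_cnt >= ADR_ACK_LIMIT


def simulate(start_dr, silent_uplinks):
    """Run a number of uplinks with no downlink and return the final data rate."""
    dr, cnt = start_dr, 0
    for _ in range(silent_uplinks):
        dr, cnt, _req = next_state(dr, cnt, downlink_received=False)
    return dr
```

Starting at DR5 with no downlinks at all, simulate(5, 95) still returns 5, simulate(5, 96) returns 4 and simulate(5, 128) returns 3.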

Let me know if you want an issue filed in GitHub and/or a pull request for those changes.

By the way, while you're in there:

the logic for data rates in joins seems quite broken as well. Like for regular sends, it will stay stuck at the DR specified in the lora.join call, or DR5/SF7 if you don't specify anything, rather than move to different data rates.
the sftx value in lora.stats() is valid only for raw LoRa and confirmed LoRaWAN frames. It's not updated for unconfirmed LoRaWAN frames.
it could be useful to report the TX power, frequency and time on air as well.
I don't understand why it takes 7-8 seconds to send a LoRaWAN frame, even a few bytes at SF7. It should take less than 3 seconds at SF7, and less than 4 at SF12.
it looks like there's still an issue with nvram save/restore of ACKs of MAC commands.

Let me know if you want issues filed in GitHub for those problems.
