LTE Modem Lockup. All methods throw "OSError: the requested operation failed"



  • Hi have been struggling to get reliable LTE communications for several months now, and it would appear from other posts, so have many others. Whilst having worked around many of the issues (including severe lack of pycom documentation), I have now pinned down at least one major problem that in my opinion renders the LTE module unusable in a commercial product.

    The basic issue I'm encountering is as follows. At some point (for no apparent reason) virtually ALL LTE methods, except init() start throwing OSError exceptions ("the requested operation failed"), including LTE.deinit() and LTE.reset(). The only solution I have found is machine.reset() which is not really a valid solution in a commercial product. Having trawled through the C source code, it would appear this would happen if the Modem stops responding to the uart so the lte methods can do nothing but abort by throwing the exception.

    Initially I thought this may have been a result of other code in my application (i.e, multiple threads, heavy CANBUS traffic/interrupts, etc because some of the LTE methods do interfere with interrupts and background tasks by blocking all threads. I have however stripped away all but the essential LTE code and have included a basic script below that can hopefully demonstrate the issue. The program can run anywhere from 10 mins to several hours before it then fails catastrophically. Yes I know I could catch the exception but other than machine.reset() I have found no other way of recovering from the situation. There are a multitude of reasons why machine.reset() is not an acceptable solution (which I don't think I need to go into here).

    My test Script below, is fairly basic if you strip away all the print statements, it essentially connects the LTE, sends a tiny UDP packet (to an echo server), waits for a reply, disconnects and powers down the modem and then sleeps (time.sleep()) for approx 1 minute and then starts all over again.

    Please note: you will need to set the UDP echo server IP address to a valid echo server before running this script if you want to receive a reply, although being UDP, not getting a reply is quite valid and the LTE module should handle this properly anyway (i.e. not throw exceptions).

    from network import LTE
    import socket
    import time
    
    def init_lte():    
      print("Initialising LTE ... ", end='')        
      global lte          
      if lte:            
        lte.init(psm_period_value=1, psm_period_unit=LTE.PSM_PERIOD_1H, 
                 psm_active_value=5, psm_active_unit=LTE.PSM_ACTIVE_2S )    
      else:
        lte = LTE(psm_period_value=1, psm_period_unit=LTE.PSM_PERIOD_1H, 
                  psm_active_value=5, psm_active_unit=LTE.PSM_ACTIVE_2S )                                                                                          
      print("OK")            
    
    def attach_lte():
      print("Attaching to LTE network ", end='')
      lte.attach()
      while(True):
        if lte.isattached():
          print(" OK")
          return
        print('.', end='')
        time.sleep(1)    
    
    def connect_lte():  
      if lte.isattached():  
        print("Connecting on LTE network ", end='')
        lte.connect()
        timeout = 60
        while(timeout > 0):
          if not lte.isattached():
            print("Abort: LTE no longer attached")
            return
          if lte.isconnected():
            print(" OK")
            return
          print('.', end='')
          time.sleep(1)
          timeout -= 1
        print("Timeout")                  
    
    def disconnect_lte():
      if lte.isconnected():
        print("Disconnecting LTE ... ", end='')
        lte.disconnect()
        print("OK" if not lte.isconnected() else "Failed")
    
    def deinit_lte():
      print("De-initialising LTE ... ", end='')         
      lte.deinit(detach=False, reset=False)                
      print("OK")
    
    def send_data():    
      print("Sending UDP Packet")  
      udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)       
      udp_socket.settimeout(0) # non blocking    
      udp_socket.sendto("TICK{}".format(tick), server_addr)
    
      print("Waiting for Reply ", end='')  
      for i in range(20):          
    
        if not lte.isattached():
          print("Abort: LTE no longer attached")
          return
    
        if not lte.isconnected():
          print("Abort: LTE no longer connected")
          return
    
        reply = udp_socket.recv(1024).decode('utf-8')
        if reply:          
          print("\nGot Reply: {}".format(reply))    
          udp_socket.close()
          return; 
        if i == 10:  # retry sending
          print("Retry Sending", end='')      
          udp_socket.sendto("TICK{}-2".format(tick), server_addr)    
    
        time.sleep(1)         
        print(".", end='')      
    
      print("Timeout")  
      udp_socket.close()
    
    # Program starts here.
    print("Begin")
    
    server_addr=('176.32.103.205', 7) # Need to set this to a valid UDP Echo Server
    lte = None
    tick = 1
    sleep_secs = 60
    
    while True:        
        print("==================================================")
        print("Tick: ", tick)
    
        init_lte()
        attach_lte()        
        connect_lte() 
        
        if lte.isconnected():          
          send_data()  
    
        disconnect_lte()    
        deinit_lte()    
    
        print("Sleeping for", sleep_secs, "secs")
        for i in range(sleep_secs):   # allow keyboard interrupt to terminate app sooner
          time.sleep(1)
        tick += 1  
    
    

    Often the fist fail is in LTE.disconnect() after timing out from not receiving a reply as shown in the output below. I suspect an issue in the modem socket send functionality but I am only guessing.

    ==================================================
    Tick:  64
    Initialising LTE ... OK
    Attaching to LTE network  OK
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply .
    Got Reply: tick64
    Disconnecting LTE ... OK
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  65
    Initialising LTE ... OK
    Attaching to LTE network  OK
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..........Retry Sending..........Timeout
    Disconnecting LTE ... ?Traceback (most recent call last):
      File "<stdin>", line 110, in <module>
      File "<stdin>", line 49, in disconnect_lte
    OSError: the requested operation failed
    ?>
    Pycom MicroPython 1.20.2.r1 [v1.11-a5aa0b8] on 2020-09-09; GPy with ESP32
    Type "help()" for more information.
    

    My setup is as follows:
    GPy: with firmware 1.20.2.r1
    Modem: UE5.0.0.0d, LR5.1.1.0-47510
    M2MOne Sim on Telstra network

    I sincerely hope someone at Pycom takes an interest in fixing this. With a robust LTE library, the GPy could be a great platform with serious potential. Without it, its more of a tinker toy.

    I suspect the modem firmware may also be part of the problem if it has been customised for pycom. There appear to be custom AT commands in the C source that don't apear in any of the sequans documentation or anywhere on the internet. I believe ver 47510 is the latest modem firmware, but it is hard to actually find that out. The latest one available for download seems to be 41065 but a tiny one line on one of the pycom pages states "there is no MAJOR difference" between these two versions. Which is not exactly NO difference?


    Log in to reply
     


  • @peterp I'm not sure if anyone has actually reproduced the issue yet, so I am posting another output transcript where this program has failed in the hope it could shed some light as to what's happening.

    In summary the output below shows the script terminating, followed by some manual lte commands via the REPL terminal, followed by calling machine.reset() and some more lte commands that now succeed. Then restarting the original script which runs for 9 loops until it terminates again. It is worth noting that on the re-run, ALL 9 ticks were actually received by the echo server even though several of the replies never made it back.

    ==================================================
    Tick:  42
    Initialising LTE ... OK
    Attaching to LTE network .. OK
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply .
    Got Reply: tick42
    Disconnecting LTE ... OK
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  43
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network Abort: LTE no longer attached
    Disconnecting LTE ... OK
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  44
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network ♦Traceback (most recent call last):
      File "<stdin>", line 110, in <module>
      File "<stdin>", line 39, in connect_lte
    OSError: the requested operation failed
    ♦>
    Pycom MicroPython 1.20.2.r1 [v1.11-a5aa0b8] on 2020-09-09; GPy with ESP32
    Type "help()" for more information.
    >>>
    >>> lte.isattached()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: the requested operation failed
    >>> lte.isconnected()
    False
    >>> lte.reset()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: the requested operation failed
    >>> lte.attach()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: the requested operation failed
    >>> lte.detach()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: the requested operation failed
    >>> lte.deinit()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: the requested operation failed
    >>> lte.init()
    >>> lte.attach()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: the requested operation failed
    >>> import machine
    >>> machine.reset()
    ets Jun  8 2016 00:22:57
    
    rst:0x7 (TG0WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
    configsip: 0, SPIWP:0xee
    clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
    mode:DIO, clock div:1
    load:0x3fff8020,len:8
    load:0x3fff8028,len:2140
    ho 0 tail 12 room 4
    load:0x4009fa00,len:19760
    entry 0x400a05bc
    Pycom MicroPython 1.20.2.r1 [v1.11-a5aa0b8] on 2020-09-09; GPy with ESP32
    Type "help()" for more information.
    >>> from network import LTE
    >>> lte = LTE()
    >>> lte.attach()
    >>> lte.isattached()
    True
    >>> lte.connect()
    >>> lte.isconnected()
    True
    >>> lte.deinit()
    >>> Running d:\svn\other\pycom\smh_test\test_lte5.py
    
    >>>
    >>>
    Begin
    ==================================================
    Tick:  1
    Initialising LTE ... OK
    Attaching to LTE network .... OK
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..........Retry Sending..........Timeout
    Disconnecting LTE ... Failed
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  2
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..........Retry Sending..........Timeout
    Disconnecting LTE ... Failed
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  3
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..
    Got Reply: tick3
    Disconnecting LTE ... OK
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  4
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply .....
    Got Reply: tick4
    Disconnecting LTE ... OK
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  5
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..
    Got Reply: tick5
    Disconnecting LTE ... OK
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  6
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..........Retry Sending..........Timeout
    Disconnecting LTE ... Failed
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  7
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply .
    Got Reply: tick7
    Disconnecting LTE ... OK
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  8
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..........Retry Sending..........Timeout
    Disconnecting LTE ... Failed
    De-initialising LTE ... OK
    Sleeping for 60 secs
    ==================================================
    Tick:  9
    Initialising LTE ... OK
    LTE is attached
    Connecting on LTE network  OK
    Sending UDP Packet
    Waiting for Reply ..........Retry Sending..........Timeout
    Disconnecting LTE ... ♦Traceback (most recent call last):
      File "<stdin>", line 115, in <module>
      File "<stdin>", line 53, in disconnect_lte
    OSError: the requested operation failed
    ♦>
    Pycom MicroPython 1.20.2.r1 [v1.11-a5aa0b8] on 2020-09-09; GPy with ESP32
    Type "help()" for more information.
    >>>
    

    Summary:
    Tick 42 - Perfect
    Tick 43 - Aborted because isattached() returned false while waiting for isconnected() after connect()
    Tick 44 - Crashed during connect()
    Manual lte.isattached() - OSError
    Manual lte.isconnected() - Returned false
    Manual lte.reset() - OSError
    Manual lte.attach() - OSError
    Manual lte.deinit() - OSError
    Call machine.reset()
    Manual lte.attach() - OK
    Manual lte.isattached() - True
    Manual lte.connect() - OK
    Manual lte.isconnected() - True
    Manual lte.deinit() - OK
    Run Script
    Ticks 3,4,5,7 - Perfect
    Ticks 1,2,6,8 - Ok but didn't get replies even though echo server did receive data.
    Tick 9 - Died with OSerror in lte.disconnect()



  • @peterp Thanks, I did not know this. I added the following "check if attached" code to my attach_lte() function and can confirm that yes the lte does appear to be already attached after re-initialising, and you therefore don't always need to call lte.attach()

    def attach_lte():
      if lte.isattached():
        print("LTE is attached")
        return
      print("Attaching to LTE network ", end='')
      lte.attach()
      ...
    
    


  • @ruok4t said in LTE Modem Lockup. All methods throw "OSError: the requested operation failed":

    @peterp Yes machine.reset() is sufficient to allow you to then restart the program (without power cycling). To be clear, when this script fails, it terminates with the exception. Via the REPL terminal, attempting lte commands manually will throw the exception and/or re-starting the program will fail immediately. If however you call machine.reset() and then re-start the script it will be happy gain.

    Interesting. Thanks for the confirmation!



  • @ruok4t said in LTE Modem Lockup. All methods throw "OSError: the requested operation failed":

    The use of PSM greatly reduces the time it takes to re-attach to the network, most of the time a reattach would be within a second.

    Just a small remark for this. When PSM works properly, no re-attach should be needed at all. lte.isattached() will return true and you can even save the call to lte.attach()



  • @kjm Yes I'm aware of the other OSErrors, and they do indeed occur if the modem is in the wrong state, and my real application code does indeed handle these. All of those (so far) have been recoverable, without resetting the machine, "the requested operation failed" exception however appears to imply the modem is no longer responding to AT commands (from my understanding of the C source) and so there is no way of recovering from this, I can see. The exception handling was not included in the script I posted purely in an attempt to keep it as simple as possible.

    The fact that pycom acknowledge the isconnected() and isattached() methods can return incorrect values, and they are aware of other ways of discovering the state is disappointing in itself. Perhaps they could FIX those functions by catching the state changes internally (or at least mention this in the actual documentation, not only in an example). In any case, you still need to handle those exceptions as there is no guarantee that the modem will still be ready for coms in between checking isconnected() and sending the data.

    As for PSM, my real application has to stay powered for many hours at a time but only needs to sends LTE data occasionally (tiny packets anywhere from 10 secs apart to several hours a part). I have discovered through hours of trial and error, trying to keep lte attached permanently was problematic compared to attaching when needed and then disconnecting. when no longer needed. The use of PSM greatly reduces the time it takes to re-attach to the network, most of the time a reattach would be within a second. The other major advantage is. if not initialising with PSM, my network provider (Telstra) would log completed data sessions, and round up the data usage PER attachment to 1kb chunks. So I was billed for 1kb of data for every attach() / detach(), which in many cases would have only sent <50 bytes.
    Using PSM, Telstra keep the session active between attachments until no data for 2 hours or so.



  • @peterp Yes machine.reset() is sufficient to allow you to then restart the program (without power cycling). To be clear, when this script fails, it terminates with the exception. Via the REPL terminal, attempting lte commands manually will throw the exception and/or re-starting the program will fail immediately. If however you call machine.reset() and then re-start the script it will be happy gain.



  • Hi @ruok4t

    Just to be sure, I'd like to double check something.

    The only solution I have found is machine.reset() which is not really a valid solution in a commercial product.

    Can you confirm that machine.reset() gets you out of the "the requested operation failed" situation? Or are you power cycling the FiPy/Gpy (unplugging/replugging USB or power supply)? I understand how both cases would be frustrating from a users perspective, but I want to be sure, because this is a big difference for the reproduction!



  • Given the vagaries of radio, it's not unusual for the lte to loose attachment &/or connection. Sadly the lte.isattached() & lte.isconnected() status indicators are not aware of this & pycom now suggest using a callback https://docs.pycom.io/tutorials/networks/lte/ to check where your modem is at connectivity wise.

    Your code has to be able to deal with unannounced changes in modem status if you don't want OSErrors. For example trying to connect or disconnect a previously attached modem that has become unattached will give an OSError. Attempting to reset a connected modem will also throw an OSError (because you can't send at commands when the modem is in a data state).

    In all my lte tinkerings I've only ever found one modem OSError I couldn't blame on unannounced changes in modem status & I'm pretty sure that was due to a power supply brown out. Nothing but removing power for a few minutes was able to cure that one.

    Lastly I'm intrigued by your use of psm. I've found psm really hard to understand & useless for reducing power drain, if that is even it's intended purpose, when the modem is left on & attached during deepsleep.


Log in to reply
 

4 out of 9

Pycom on Twitter