Guru Meditation Error - Pycom WiPy



  • @oligauc Hmm

    Maybe I am missing something, but _verify_connection_state in MQTTMSgHandler has the code below.

    self.connect() is in the scope of the MQTTMSgHandler instance - I can't see how self.connect() can be referring to the instance in MQTTClient class.

        def _verify_connection_state(self):
            elapsed = time.time() - self._start_time
            if not self._waiting_ping_resp and elapsed > self._ping_interval:
                if self._connection_state == mqttConst.STATE_CONNECTED:
                    self._pingSent=False
                    self._send_pingreq()
                    self._waiting_ping_resp=True
                elif self._connection_state == mqttConst.STATE_DISCONNECTED:
                    self.connect()
    
                self._start_time = time.time()
            elif self._waiting_ping_resp and (self._connection_state == mqttConst.STATE_CONNECTED or elapsed > self._mqttOperationTimeout):
                if not self._pingSent:
                    if self._ping_failures <= self._ping_cutoff:
                        self._ping_failures+=1
                    else:
                        self.connect()
                else:
                    self._ping_failures=0
    
                self._start_time = time.time()
                self._waiting_ping_resp=False
    


  • @oligauc Output is:

    >>> os.uname()
    (sysname='LoPy4', nodename='LoPy4', release='1.18.1.r7', version='v1.8.6-849-d1c5ea9 on 2018-12-17', machine='LoPy4 with ESP32', lorawan='1.0.2', sigfox='1.0.1')
    


  • @milan Please can you post the output of:

    import os
    os.uname()
    


  • @timh self.connect is defined in the MQTTClient class and is passed to the MsgHandler class using self._msgHandler=msgHandler.MsgHandler(self._recv_callback, self.connect)

    Your connect function creates a socket, but it does not send the connect packet to the AWS server.
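
    The pattern described above, where the client hands its own bound connect method to the handler, can be sketched as follows. This is only an illustration: both class bodies, the on_disconnect method and the connect_calls counter are hypothetical stand-ins, not the library's code.

```python
class MsgHandler:
    """Hypothetical stand-in for the library's message handler."""

    def __init__(self, recv_callback, connect_helper):
        self._recv_callback = recv_callback
        # Bound method of the client, kept under a private name.
        self._connect_helper = connect_helper

    def on_disconnect(self):
        # Calls back into the client that created this handler.
        return self._connect_helper()


class MQTTClient:
    """Hypothetical stand-in for the library's client."""

    def __init__(self):
        self.connect_calls = 0
        # Same shape as the line quoted in the post above.
        self._msgHandler = MsgHandler(self._recv_callback, self.connect)

    def _recv_callback(self, message):
        pass

    def connect(self):
        # A real client would open the socket and send the CONNECT packet.
        self.connect_calls += 1
        return True


client = MQTTClient()
client._msgHandler.on_disconnect()
print(client.connect_calls)
```

    Because self.connect is a bound method, the handler does not need a connect method of its own; it only needs to call the reference it was given.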



  • @timh When the handler sees a disconnection, _connect_helper() is called, which in turn calls connect in the mqtt client class.

    The poll registration is done each time a new socket is created. Look at createSocketConnection

    The firmware has an embedded garbage collector
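
    The point about poll registration can be illustrated with a small sketch: when a socket is replaced, the old registration is dropped and the new socket is registered. This is an assumption-laden example, not the library's createSocketConnection. PolledConnection and its methods are hypothetical, and it uses the desktop select module with local socket pairs so it can run without a board (on the device the module would be uselect).

```python
import select
import socket


class PolledConnection:
    """Hypothetical helper that keeps a poll object in sync with its socket."""

    def __init__(self):
        self._poll = select.poll()
        self._sock = None

    def attach(self, sock):
        # Drop the registration of the previous socket before closing it,
        # then register the replacement.
        if self._sock is not None:
            self._poll.unregister(self._sock)
            self._sock.close()
        self._sock = sock
        self._poll.register(sock, select.POLLIN)

    def readable(self, timeout_ms=100):
        # True if the currently attached socket has data waiting.
        return len(self._poll.poll(timeout_ms)) > 0


# Demonstration with local socket pairs instead of a real broker.
first_local, first_remote = socket.socketpair()
second_local, second_remote = socket.socketpair()

conn = PolledConnection()
conn.attach(first_local)
conn.attach(second_local)  # simulated reconnect
second_remote.send(b"ping")
print(conn.readable())
```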



  • ... and one more! Had five Wipys running over the weekend, one died after roughly 37000 sent messages to AWS:

    Guru Meditation Error: Core 1 panic'ed (Unknown reason)

    And that was it. I'd really appreciate a fix here.



  • @timh I have been going over the AWS MQTT msgHandler further. I see a number of other bugs.

    For instance, if the handler sees a disconnection (and tries to reconnect; of course connect() is missing), the poll registration on the original socket still exists. No poll registration is created when a new socket is created.

    When I finish reviewing, I will post an updated message handler separately.

    Some of my earlier changes (above) don't make things worse but are incorrect/incomplete.

    T



  • @milan The master branch on GitHub, https://github.com/pycom/pycom-micropython-sigfox, almost always reflects the latest build, which you can install with pyupgrade. The version number is in the file https://github.com/pycom/pycom-micropython-sigfox/blob/master/esp32/pycom_version.h.
    I am not sure which commit to check out to get 1.18.1.r7. The log is not detailed with respect to that file.



  • @robert-hh OK, but I can't find this release on https://github.com/pycom/pycom-micropython-sigfox/releases
    Where can I find the source for this release?



  • @milan You have to build your own firmware to get the .elf file. It will be in esp32/build/WIPY/release



  • @robert-hh Unfortunately I can't share code with you, sorry. The firmware version is v1.18.1.r7. Also, where can I find the .elf file for this version? I couldn't find it on GitHub.

    @Martinnn @timh I'm not using AWSIoT or threads, but the same thing happens to me anyway. I hope we will get some help on this; the device is so unreliable this way.



  • @timh I also suspect AWSIoT or threads. I'll look into your suggestions - thanks!



  • @martinnn I wonder if your guru meditation could be due to the AWS MQTT lib (threads).

    I found I had to make some fixes to that library. (I use the MQTTClient and MQTTMessageHandler classes to connect to a non-AWS service that uses TLS.)

    I found that there were problems.

    For instance, I found that self.connect() is called in _verify_connection_state(), but the connect method isn't defined.

    That in itself shouldn't cause a guru, but the next problem could lead to one: thread issues.

    I found that if a disconnection occurs, the underlying thread of the message handler isn't killed.

    In fact, I think there is no mechanism in that code to explicitly disconnect reliably and reconnect, so threads are left running over time. This eventually causes a fault.

    I added connect and modified disconnect:

    def disconnect(self):
        if self._sock:
            self._sock.close()
            self._sock = None
        self._conn_state_mutex.acquire()
        self._exitRequest = True
        self._conn_state_mutex.release()

    def connect(self):
        self.disconnect()
        self._conn_state_mutex.acquire()
        self._exitRequest = False
        self._conn_state_mutex.release()
        self.createSocketConnection()
    

    I added a flag in the handler to mark a request to close the thread, and added a guard in the io_thread loop:

    if self._exitRequest:
        _thread.exit()
        raise mqttConst.MQTTDisconnected
    

    This seems to have made the library more robust, at least for me.

    There still seems to be one other runaway thread issue, but my WDT catches that one.
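
    The exit-flag idea above can be shown in isolation. The sketch below is a simplified, hypothetical worker (IoWorker, request_exit and the stopped attribute are made up for illustration) that checks a lock-protected flag on every pass and ends its thread by returning from the loop. It is not the library's io_thread.

```python
import _thread
import time


class IoWorker:
    """Hypothetical I/O loop that stops when an exit flag is set."""

    def __init__(self):
        self._lock = _thread.allocate_lock()
        self._exit_request = False
        self.stopped = False

    def request_exit(self):
        with self._lock:
            self._exit_request = True

    def _should_exit(self):
        with self._lock:
            return self._exit_request

    def run(self):
        while not self._should_exit():
            time.sleep(0.01)  # placeholder for socket reads and writes
        # Returning from the function ends the thread.
        self.stopped = True


worker = IoWorker()
_thread.start_new_thread(worker.run, ())
worker.request_exit()

# Wait briefly for the loop to notice the flag.
deadline = time.time() + 2
while not worker.stopped and time.time() < deadline:
    time.sleep(0.01)
print(worker.stopped)
```

    Returning from the thread function is used here instead of _thread.exit(). Since _thread.exit() works by raising SystemExit, a statement placed directly after it in the same branch would not be reached.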



  • Same here. Guru meditation error on Wipy, latest release FW, sending stuff every 2s to AWS via the builtin AWSIoT library. Also using I2C and asynchronous communication. Happens rarely though, but a killer for a device deployed in the field (it's dead then until someone presses the reset button). We'll add a hardware watchdog for this.



  • @milan That certainly should not happen with Python. Do you have more information, such as the piece of code that crashed and the firmware version?

