Interrupt/callback design pattern?



  • Hi guys,

    I'm in need of some advice.

    I'm writing a simple application with various Bluetooth-based interrupts. What is the best approach to handling those callbacks (or a button's interrupt)? I can simply do what I need to do inside the callback's handler, but I guess that's not a good idea. Interrupt handlers should be short, am I right? The memory available to interrupts is limited, am I right?

    Now, in Arduino, you can always define an app state and check it in your loop(). Here we don't have a loop, but we can make one using a while True: in main.py and check a shared state object. However, I cannot get it to work: the CPU gets overwhelmed and doesn't allow interrupts to work properly. Has anyone done anything like this?
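
    One way to sketch the shared-state loop described above, assuming the callbacks only set flags and the loop sleeps briefly on each pass so that it does not monopolise the CPU. The names AppState, on_button and run_loop are hypothetical, and time.sleep stands in for MicroPython's time.sleep_ms:

```python
import time


class AppState:
    """Flags shared between callbacks and the main loop."""

    def __init__(self):
        self.button_pressed = False
        self.presses_handled = 0


state = AppState()


def on_button(arg=None):
    # Callback body: record the event and return immediately.
    state.button_pressed = True


def run_loop(max_iterations, poll_interval_s=0.01):
    # A real main.py would use `while True:`; the iteration bound
    # only keeps this sketch finite.
    for _ in range(max_iterations):
        if state.button_pressed:
            state.button_pressed = False
            state.presses_handled += 1  # the slow work would go here
        # Sleeping on every pass yields the CPU to other tasks.
        time.sleep(poll_interval_s)


on_button()                 # simulate one interrupt
run_loop(max_iterations=3)
print(state.presses_handled)
```

    This only illustrates the structure; it is not a fix for the crashes discussed in this thread.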

    P.S.: I'm asking because I'm having trouble managing my application's functionality. It works erratically and most of the time ends in Guru Meditation Errors that are hard to explain. My only explanation, after 3 days of pulling my hair out, is that these happen because of the complexity of the interrupt handlers. For example, I have 40k of free memory, but I cannot write to a file without getting "Guru Meditation Error of type IllegalInstruction occurred on core 0. Exception was unhandled." I have tested it in simple threads, nested threads, nested threads inside interrupt handlers... They all work nicely, but it won't work inside my application.

    Thanks,
    M



  • @crankshaft
    I have faced a lot of crashes and have managed to work around some of them with various tactics (it shouldn't have to be this way, though). This one is different. The crashes you mentioned are of type LoadProhibited, but this is an IllegalInstruction exception. I spent over a week rewriting my whole application with a completely different architecture that uses a while True loop in the main thread, so that no actual functionality runs in any thread, interrupt or callback; everything happens in the main thread. In the end, I get the same error as before. I wish the Pycom team would give me some support, but I guess they are very busy these days.
    My guess is that some of the built-in functions or objects (especially the WiFi and BLE classes) are NOT thread safe; they are designed to be run on a separate core (not in threads) and they are causing serious memory issues with other code. I have a rigorous logging mechanism for all my functions using try/except/finally blocks, so I should not see any unhandled exceptions, but they happen nonetheless.
    One more interesting thing I have seen: sometimes allocating memory for a variable requests an enormous amount of space (obviously an underflow in the allocation size), and that immediately causes LoadProhibited errors. I have seen sporadic exceptions of this kind in which an int, for example, requires 32004304343409 bytes to allocate (fictional number)!
    Memory management is a very serious and profound aspect of every language/compiler, and it is obvious to me that the MicroPython implementation used for the xPy boards is seriously flawed in a way that makes a production-ready application a very challenging task for any developer.



  • @abilio - Hi, I am also facing regular crashes of the same type as reported here: https://forum.pycom.io/topic/584/crash-when-running-thread

    Pretty much all my attempts at using interrupts / threads result in a crash and similar backtrace.



  • @abilio
    Hi,

    Could you add this dump to your list of dumps to investigate:

    Guru Meditation Error of type IllegalInstruction occurred on core 0. Exception was unhandled.
    Register dump:
    PC : 0x4018a1d2 PS : 0x00060031 A0 : 0x8004864d A1 : 0x3ffc0770
    A2 : 0x3ffbd978 A3 : 0x3ffc0a94 A4 : 0x800d9e99 A5 : 0x3ffb8360
    A6 : 0x3ffc34f4 A7 : 0x00000004 A8 : 0x800154fd A9 : 0x00000000
    A10 : 0x3ffd3320 A11 : 0x3ffd3320 A12 : 0x800d8ec8 A13 : 0x3ffd3320
    A14 : 0x00000000 A15 : 0x3ffc3530 SAR : 0x00000020 EXCCAUSE: 0x00000000
    EXCVADDR: 0x00000000 LBEG : 0x4000c46c LEND : 0x4000c477 LCOUNT : 0xffffffff

    Backtrace: 0x4018a1d2:0x3ffc0770 0x4004864d:0x3ffc0790 0x40054ed9:0x3ffc07b0 0x400839f9:0x3ffc07d0 0x400811d4:0x3ffc0800

    Here is the uname() result:

    (sysname='WiPy', nodename='WiPy', release='1.5.0.b2', version='v1.8.6-412-gf55ba50 on 2017-01-28', machine='WiPy with ESP32')

    This happens when I try to write to a file and close the file stream. For example, here is the code that causes the crash. (It is not easily reproducible: it works fine when the MCU load is low, but crashes when execution reaches it after a few interrupts and callbacks.)

        # Fragment of a function body; assumes `import gc` and `import time`
        # at module level and that setting_str holds the text to save.
        try:
            time.sleep_ms(5)
            setting_file = open("w.s", mode = 'w')
            print("=========== File Opened.")
            time.sleep_ms(5)
            setting_file.write(setting_str)
            print("=========== File Written.")
            time.sleep_ms(100)
            setting_file.flush()
            print("=========== File Flushed.")
            time.sleep_ms(100)
            # Error happens here
            setting_file.close()
            print("=========== File Closed.")
            time.sleep_ms(5)
            gc.collect()
            print("Preserved setting.")
            return True
        except Exception as ex:
            print("Unknown error in preserving WiFi Setting.")
            template = "An exception of type {0} occurred. Arguments:\n{1!r}"
            message = template.format(type(ex).__name__, ex.args)
            print(message)
            return False
        finally:
            gc.collect()
    

    Now, the line where "=========== File Closed." should be printed is never reached (under the circumstances above). And when the device reboots, my app in the flash is cleared and replaced with the default firmware. I'm sure it is not a memory issue; we have ample free memory at this point.

    Do you have any idea why?



  • @mohpor, right now the device is using just one of the cores. To find out what is going on, you can, for example, build the C code for that specific version, open it using gdb, and do disassemble ADDR (where ADDR is any of the addresses from the dump backtrace). That will tell you where in the C code the exception happened. From there we get an idea on how the bug was triggered, and we can proceed to find a solution for it.

    LoadProhibited sounds like an attempt to access an invalid address (probably a pointer-related issue). Doing the disassemble normally gives more detailed information about the error.
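
    As a rough sketch of the first step, the addresses can be pulled out of a "Backtrace:" line with a few lines of Python. Each frame in the dump is a PC:SP pair, and the PC values are what goes into gdb. The helper name backtrace_pcs is hypothetical, and the sample line is the IllegalInstruction backtrace posted in this thread:

```python
def backtrace_pcs(line):
    """Return the program-counter addresses from a 'Backtrace:' line."""
    prefix = "Backtrace:"
    if not line.startswith(prefix):
        raise ValueError("not a backtrace line")
    pcs = []
    for frame in line[len(prefix):].split():
        pc, _sp = frame.split(":")  # each frame is PC:SP
        pcs.append(int(pc, 16))
    return pcs


dump_line = (
    "Backtrace: 0x4018a1d2:0x3ffc0770 0x4004864d:0x3ffc0790 "
    "0x40054ed9:0x3ffc07b0 0x400839f9:0x3ffc07d0 0x400811d4:0x3ffc0800"
)

for pc in backtrace_pcs(dump_line):
    # One gdb command per frame, innermost frame first.
    print("disassemble 0x%08x" % pc)
```

    The printed commands are only meaningful against a build of the exact firmware version that produced the dump.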



  • @abilio
    Chances are it will be resolved before you find the cause. (How are you tracking down the problem, btw?)
    The point of my question is: why is this happening? I've read somewhere that the RTOS raises it because the dual-core system has objected to the command. Do you know anything about that?



  • @mohpor, well, it doesn't matter; it was just a just-in-case question so that we could test any possible solution. Anyhow, the dump should be good enough to find the cause.



  • @abilio
    I'm continuously changing my code to try to solve the crash (adding locks, gc.collect() calls, delays, ...), so there is no stable version that I can send over.
    The only thing I can add is that I have both the BLE and WiFi (STA) stacks in memory.



  • @mohpor, ok, I'll put it in the list of dumps to investigate. Do you happen to have a minimal working example to reproduce it?



  • @abilio
    Sorry.

    (sysname='WiPy', nodename='WiPy', release='1.5.0.b2', version='v1.8.6-412-gf55ba50 on 2017-01-28', machine='WiPy with ESP32')



  • @mohpor, what's the version number in this one? The addresses in the dump are only valid for your specific version.



  • @abilio
    Thanks for the reply.

    I have since moved from the long interrupt handlers to a while True loop in main.
    But I keep getting errors.
    Could you confirm, once and for all, when LoadProhibited happens?
    For example:

    Guru Meditation Error of type LoadProhibited occurred on core 0. Exception was unhandled.
    Register dump:
    PC : 0x400dea6c PS : 0x00060630 A0 : 0x800df77b A1 : 0x3fff5e30
    A2 : 0x0000002c A3 : 0x00000001 A4 : 0x00060820 A5 : 0x3fff6118
    A6 : 0x01010001 A7 : 0x3ffdb6fc A8 : 0x00000001 A9 : 0x00004b1e
    A10 : 0x0000fb20 A11 : 0x3ffc39f8 A12 : 0x3ffc39f8 A13 : 0xb33f0000
    A14 : 0xb33fffff A15 : 0x00060623 SAR : 0x00000008 EXCCAUSE: 0x0000001c
    EXCVADDR: 0x00004b36 LBEG : 0x4000c349 LEND : 0x4000c36b LCOUNT : 0xffffffff
    Backtrace: 0x400dea6c:0x3fff5e30 0x400df77b:0x3fff5e50 0x401583d1:0x3fff60d0 0x40156c9f:0x3fff6110
    Rebooting...
    ets Jun 8 2016 00:22:57



  • @mohpor,

    The new interrupt mechanism is based on a thread that receives the interrupt sources in a queue and fires the Python callbacks accordingly. That way you don't have the limitations that other MicroPython ports have. Still, you have to keep the handler as short (in execution time) as possible, because the queue is limited.
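
    To illustrate why a limited queue calls for short handlers, here is a minimal plain-Python sketch. It is not the firmware's actual implementation; the names EventQueue, on_interrupt and drain are hypothetical, and the sketch is single-threaded, so it does no locking:

```python
class EventQueue:
    """Fixed-capacity FIFO; events are dropped when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.dropped = 0

    def put(self, event):
        if len(self.items) >= self.capacity:
            self.dropped += 1  # queue full: the event is lost
            return False
        self.items.append(event)
        return True

    def get(self):
        return self.items.pop(0) if self.items else None


events = EventQueue(capacity=4)
handled = []


def on_interrupt(source):
    # Short handler: enqueue an identifier and return.
    events.put(source)


def drain():
    # Called from the main thread, where slow work is acceptable.
    while True:
        event = events.get()
        if event is None:
            break
        handled.append(event)


for n in range(6):  # six interrupts arrive before the queue is drained
    on_interrupt("irq%d" % n)
drain()
print(handled, events.dropped)
```

    With a capacity of four, the first four events are handled and the last two are counted as dropped, which is what happens when the consumer falls behind the interrupt sources.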

    Can you please state the firmware version you're using and copy all the information dumped by the Error, so we can look into the problem with more detail?

    Thanks in advance

